University of Calgary PRISM: University of Calgary's Digital Repository

Graduate Studies The Vault: Electronic Theses and Dissertations

2020-06-04 Integrative Structural Model of DNA-PKcs in the Initial Steps of Non-Homologous End Joining

Hepburn, Morgan Rose

Hepburn, M. R. (2020). Integrative Structural Model of DNA-PKcs in the Initial Steps of Non-Homologous End Joining (Unpublished doctoral thesis). University of Calgary, Calgary, AB. http://hdl.handle.net/1880/112161

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Integrative Structural Model of DNA-PKcs in the Initial Steps of Non-Homologous End Joining

by

Morgan Rose Hepburn

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

GRADUATE PROGRAM IN BIOCHEMISTRY AND MOLECULAR BIOLOGY

CALGARY, ALBERTA

JUNE, 2020

© Morgan Rose Hepburn 2020

Abstract

Non-homologous end joining (NHEJ) performs untemplated repair of DNA double strand breaks (DSBs). Despite the lack of a template, intricate repair, coordinated by the core NHEJ factors, can repair breaks with minimal to no alterations. Initiating repair, Ku70/80 binds to the free DNA ends and interacts with the large protein kinase, DNA dependent protein kinase catalytic subunit (DNA-PKcs), forming the holoenzyme DNA-PK. Holoenzymes can synapse across the break to tether the DNA ends. Assembly of the initial synaptic complex and its role in NHEJ is poorly understood, as final ligation requires a structural rearrangement of this initial complex. To better understand DNA-PKcs’ role in NHEJ, an integrative structural model of DNA-PKcs in the initial stages of NHEJ was developed using mass spectrometry (MS) techniques.

Due to technical challenges working with DNA-PKcs, each of the MS techniques was optimized for the system. Hydrogen deuterium exchange (HX) methods were optimized on a nano-spray HX system, allowing for differential HX analysis of bead-bound DNA-PKcs complexes with high sequence coverage and a 5X reduction in protein consumption. Reversible crosslinking and peptide fingerprinting (RCAP) was optimized to allow for direct detection of DNA binding peptides, using a single sample. Finally, given the benefits of DNA-PKcs complex assembly on beads to limit heterogeneity, an on-bead crosslinking method was developed. Mass Spec Studio was used to accurately identify many crosslinks, which can be utilized for a label-free quantitation comparison of states.

Using HX-MS to explore DNA-PKcs conformational changes from binding to activation of the kinase, an allosteric pathway was identified in DNA-PKcs connecting DNA binding with the kinase domain. Nucleotide loading of the kinase domain revealed that DNA-PK occupies a tensed state when active. From integrative structural modelling with the XL-MS restraints, a model with a precision of 13.5Å was reported, revealing a symmetric DNA-PK dimer with head-to-head interactions. In our synaptic model, the DNA ends are positioned with a large offset, protected by a previously uncharacterized plug domain of DNA-PKcs. We propose that the initial formation of the synaptic complex allows for hierarchical processing of DNA ends and assembly of a core NHEJ scaffold.

Preface

Work in this thesis appears in the following peer-reviewed or submitted publications:

1) Sarpe, V., Rafiei, A., Hepburn, M., Ostan, N., Schryvers, A. B., Schriemer, D. C. (2016) High sensitivity crosslink detection coupled with integrative structure modeling in the Mass Spec Studio. Mol. Cell. Proteomics, 15(9), 3071-3080.

2) Sheff, J. G., Hepburn, M., Yu, Y., Lees-Miller, S. P., Schriemer, D. (2017) Nanospray HX-MS configuration for structural interrogation of large protein systems. Analyst. 142, 904-910 (Co-First Author).

3) Saltzberg, D.J., Hepburn, M., Pilla K.B., Schriemer, D.C., Lees-Miller, S.P., Blundell, T.L., Sali, A. (2019) SSEThread: Integrative threading of the DNA-PKcs sequence based on data from chemical cross-linking and hydrogen deuterium exchange. Prog Biophys Mol Biol. 147, 92-102.

4) Hepburn, M., Saltzberg, D.J., Lee, L., Fang, S., Atkinson, C., Strynadka, N.C.J., Sali, A., Lees-Miller, S.P., Schriemer, D.C. (2020) The active DNA-PK holoenzyme occupies a tensed state in a staggered synaptic complex. Submitted Manuscript.

Of the work presented in this thesis, my role included experimental design, data collection, data analysis, and interpretation, unless otherwise stated. Sections of chapter 2 were adapted from Sheff et al. (2017), with permission from The Royal Society of Chemistry. The nano-spray HX source, tested in chapter 2 for use on large protein systems, was developed and optimized by Dr. J. Sheff. For the HX analysis in a slurry format, Dr. J. Sheff helped in method development for application of the caffeine and back-exchange corrections. In chapter 4, V. Sarpe was responsible for all Mass Spec Studio development. DNA-PKcs used for the HX experiments (chapters 2 and 5), DNA footprinting (chapters 3 and 5), and initial crosslinking was purified by Yaping Yu (Lees-Miller Lab).

Data from the submitted manuscript appears throughout the thesis (Chapters 2-4), but the full manuscript is presented in Chapter 5. The manuscript was co-written by myself and Dr. D.C. Schriemer, with contributions, for integrative modelling methods, from Dr. D. Saltzberg. Integrative modelling of DNA-PK synapsis with IMP was completed by Dr. D. Saltzberg, with Dr. A. Sali. Dr. D. Saltzberg also aided in the interpretation of the modelling results. Drs. S. Fang, L. Lee, and S.P. Lees-Miller assisted in protein expression and isolation. Drs. C. Atkinson and N.C.J. Strynadka supported with electron microscopy.

Acknowledgements

First off, I would like to thank my supervisor Dr. David Schriemer for his support and guidance throughout my graduate program. It was a long haul, so thank-you for all your patience and encouragement throughout this project.

I would also like to thank my committee members, Dr. Susan Lees-Miller and Dr. Kenneth Ng, for thoughtful discussion, input and support for my project throughout the years.

As well, thank-you to Dr. Susan Lees-Miller, Dr. Yaping Yu, Dr. Shujuan Fang, Dr. Sarvan K. Radhakrishnan, and all the other members of the Lees-Miller lab, for the supply of various NHEJ proteins throughout the years, and for allowing me to come in and use their space for purification and western blots.

Thank-you to Dr. Daniel Saltzberg and Dr. Andrej Sali, for all their invaluable work in generating the integrative model of the DNA-PK synaptic complex. A special thank-you to Dr. Daniel Saltzberg for taking the time to meet and discuss the modeling, interpretation, and just generally any questions and concerns I had.

Thank-you to Dr. Claire Atkinson and Dr. Natalie Strynadka, for allowing me to come learn how to prepare samples for negative stain electron microscopy. A special thanks to Dr. Claire Atkinson, for taking the electron microscopy images.

I would like to thank all the members of the Schriemer lab, past and present, for making it such a great work environment to come to, and for thoughtful discussion on projects and methods development. Specifically, I would like to thank Dr. Martial Rey for teaching me all about HX when I first joined the lab. Thank-you to Dr. Joey Sheff, who designed the nano-spray HX configuration; my results would not have been nearly as exciting without it. I would also like to thank Dr. Linda Lee for all the proteins you have supplied throughout the years, as well as assisting with the DNA-PKcs purification. Finally, I would like to thank Vlad Sarpe and the rest of the Mass Spec Studio development team, for all you do making data analysis easier for us.

Last but not least, I would like to thank my friends and family for support throughout the years. Specifically, I would like to thank my husband, for making me breakfast every day to fuel my science, bringing dinner when experiments run late, and even helping with Matlab code for data analysis.

Table of Contents

Abstract
Preface
Acknowledgements
Table of Contents
List of Tables
List of Figures and Illustrations
List of Symbols, Abbreviations and Nomenclature

Chapter 1: Introduction
1.1 DNA Double Strand Break Repair
1.2 Non-Homologous End Joining
1.2.1 End Detection and Tethering
1.2.2 End Processing
1.2.3 Ligation
1.2.4 Summary of NHEJ
1.3 Integrative Structural Biology
1.4 ISB Model of DNA-PKcs in the Initial Steps of NHEJ
1.4.1 Available Data and Restraints
1.4.2 Hydrogen Exchange Mass Spectrometry (HX-MS)
1.4.3 Crosslinking Mass Spectrometry (XL-MS)
1.4.4 Bringing it all Together

Chapter 2: HX Methods for Ultra Large Protein Systems
2.1 Introduction
2.2 Methods
2.2.1 Nano-spray HX
    Nano-spray vs Traditional Micro
    Kinetics Analysis of DNA-PKcs using nanoHX
    Calculating Peptide Protection Factors
2.2.2 Nano-spray HX for Bead Bound DNA-PKcs
    Capture of DNA-PKcs on DNA Bound to Beads
    Differential HX Analysis of DNA-PKcs (DNA Bound and Free)
    Deuterium Corrections
2.3 Results and Discussion
2.3.1 Nano-spray HX reduces protein consumption and improves the peptide map
2.3.2 High DNA-PKcs sequence coverage persists through a HX-kinetics evaluation
2.3.3 On bead HX of DNA-PKcs identifies changes in deuterium uptake upon binding to DNA
2.3.4 Conclusions

Chapter 3: DNA Footprinting to Identify DNA Binding Sites
3.1 Introduction
3.2 Methods
3.2.1 Formaldehyde footprinting: RCAP
3.3 Results and Discussion
3.3.1 Defining DNA binding peptides
3.3.2 Verification of the chosen DNA binding data cut-offs
3.3.3 Conclusions

Chapter 4: Optimizing Crosslinking for DNA-PKcs
4.1 Introduction
4.2 Methods
4.2.1 Enrichment of Crosslinked Peptides
4.2.2 Crosslinking DNA-PKcs
4.2.3 DNA-PKcs and Ku70/80 Protein Purification for Crosslinking
4.2.4 Crosslinking Bead-Bound DNA-PK Complexes
    DNA-PK Capture and Crosslinking
    Crosslink Quantification and Comparison
4.3 Results and Discussion
4.3.1 Mass Spec Studio for Crosslinked Peptide Identification
4.3.2 DNA-PKcs Crosslinks can be Identified without Enrichment
4.3.3 Purification of DNA-PKcs and Ku70/80 for XL-MS
4.3.4 Quantitative Crosslinking of Bead Bound DNA-PK
4.3.5 Conclusions

Chapter 5: Integrative Structural Model of DNA-PKcs in the Initial Steps of Non-Homologous End Joining
5.1 Introduction
5.2 Methods
5.2.1 Protein Production and 2kb DNA Preparation
    DNA-PK Purification
    PAXX Purification
    Biotinylated 2kb Preparation
5.2.2 Complex Formation and Isolation
5.2.3 Hydrogen Deuterium Exchange Mass Spectrometry
5.2.4 DNA Footprinting
5.2.5 Negative Stain Electron Microscopy
5.2.6 Crosslinking Mass Spectrometry
5.2.7 Integrative Structure Modeling
    Gathering Data
    Representing Subunits and Translating Data into Spatial Restraints
    Sampling
    Analyzing and Validating the Models
5.2.8 Localizing Ku70/80 termini and PAXX
5.3 Results
5.3.1 Assembling the DNA-PK Holoenzyme for Structural Mass Spectrometry
5.3.2 DNA flexes the arm and is constrained by a plug domain
5.3.3 An allosteric pathway between DNA binding sites and the kinase domain
5.3.4 Nucleotide loading primes the allosteric pathway
5.3.5 PAXX engages the allosteric pathway
5.3.6 Crosslinking the Synaptic Complex
5.3.7 Structure of the DNA-PK synaptic complex
5.3.8 Ku70/80 C-terminal regions define a supportive base
5.4 Discussion

Chapter 6: Summary and Future Directions
6.1 Summary
6.2 Future Considerations

Bibliography

Appendices
Appendix A: Copyright and Permissions
Appendix B: Deuterium Uptake and Change for Peptides in the Differential HX-MS Analyses Presented
Appendix C: Crosslink Identifications Used in Modelling

List of Tables

Table 1.1. Common Restraints that can be used for ISB.
Table 2.1. Evaluation of Traditional and Nano-spray DNA-PKcs Maps.
Table 2.2. Summary of the Kinetics Evaluation of DNA-PKcs.
Table 2.3. Caffeine Determined D2O Labelling.
Table 2.4. Average Standard Deviation of Deuteration Measurements with Correction.
Table 2.5. Summary of the Differential HX of DNA-PKcs DNA Binding.
Table 4.1. DNA Constructs for DNA-PK Capture.
Table 5.1. DNA Constructs.
Table 5.2. Summary of the Differential HX-MS Binding Experiments.

List of Figures and Illustrations

Figure 1.1. DSB repair pathway choice.
Figure 1.2. DSB Repair by NHEJ.
Figure 1.3. Structural Elements of the Long-Range Synaptic Complex.
Figure 1.4. Dimerization of DNA-PK.
Figure 1.5. Structural Elements of the Ku70/80:XLF:XRCC4:LigIV Complex.
Figure 1.6. Interactions of the Processing Enzymes.
Figure 1.7. Structural Data for Integrative Structural Biology for NHEJ.
Figure 1.8. Resolution of Inputs Effect the Final Resolution of the ISB Model.
Figure 1.9. General HX-MS Workflow.
Figure 1.10. ISB Model of the Ribosome Translation Initiation Complex.
Figure 1.11. General XL-MS Workflow.
Figure 1.12. Effect of Number of Crosslink Restraints on Model Accuracy and Precision.
Figure 2.1. Nano-spray HX System.
Figure 2.2. Expansion of the Isotopic Distribution with Deuterium Uptake.
Figure 2.3. Determining the Lowest Rate of Exchange that can be Accurately Measured.
Figure 2.4. DNA Constructs Used for the Pull-Down of DNA-PKcs.
Figure 2.5. Correction of Deuterium Labelling with Light/Heavy Caffeine.
Figure 2.6. Representative Spectrum of an Overlapped Peptide That is Rescued by Peak Picking in Mass Spec Studio.
Figure 2.7. Representative Kinetics Curves for the Selection of D2O Labelling Time.
Figure 2.8. DNA-PKcs Protection Factors.
Figure 2.9. DNA-PKcs Pull-Down on DNA Bound to Streptavidin Agarose Beads.
Figure 2.10. Comparison of Correction Methods.
Figure 2.11. Nano-spray HX-MS evaluation of DNA-PKcs binding to DNA.
Figure 3.1. Typical Workflow for MS Peptide/Protein Identification.
Figure 3.2. Determining Rigorous Cut-Offs for Blunt DNA Binding Peptides.
Figure 3.3. Blunt DNA Binding Peptides at Varying Intensity Cut-offs.
Figure 3.4. Applying Defined Cut-off to Determine OH DNA Binding Peptides.
Figure 3.5. Verification of Formaldehyde DNA Footprinting.
Figure 4.1. Crosslinks Identified by Mass Spec Studio show the Paired Light/Heavy Isotopic Distribution.
Figure 4.2. Evaluation of the DNA-PKcs Crosslinks.
Figure 4.3. Enrichment of BSA Crosslinked Peptides.
Figure 4.4. SEC Enrichment of DNA-PKcs Crosslinks.
Figure 4.5. Comparison of Unique DSS Crosslinks for DNA-PKcs Identified With and Without Enrichment.
Figure 4.6. Purification of DNA-PKcs and Ku70/80 from 100L HeLa Cells.
Figure 4.7. Distribution of Charge States from a Tryptic Digestion of DNA-PKcs.
Figure 4.8. Assessment of MS Measurement Deviation.
Figure 5.1. Diagram of Biotinylated 2kb Preparation.
Figure 5.2. Complex assembly for nanoHX-MS and XL-MS.
Figure 5.3. DNA-PK is a complex conformational switch.
Figure 5.4. NanoHX-MS evaluation of DNA-PKcs binding to DNA.
Figure 5.5. DNA foot-printing.
Figure 5.6. NanoHX-MS evaluation of DNA-PKcs binding to Ku70/80.
Figure 5.7. NanoHX-MS evaluation of DNA-PKcs binding to AMP-PNP.
Figure 5.8. NanoHX-MS evaluation of nucleotide loaded DNA-PK binding to PAXX.
Figure 5.9. Negative stain EM of DNA-PK synaptic complex preparations.
Figure 5.10. Crosslinking of the DNA-PK synaptic complex.
Figure 5.11. A subset of detected DSS crosslinks on DNA-PK bound to DNA.
Figure 5.12. Unique dimer crosslink spectra.
Figure 5.13. Integrative structure modelling of the synaptic complex.
Figure 5.14. Multi-scale representation of synaptic complex components.
Figure 5.15. Integrative models of the synaptic complex.
Figure 5.16. Structural models of clusters not representative of the synaptic state.
Figure 5.17. Positioning the Ku70/80 C-terminal regions and PAXX in the synaptic model.
Figure 5.18. Nucleotide loading induces tension in DNA-PK.
Figure 5.19. A model of long-range DNA-PK synapsis in the context of NHEJ.
Figure 5.20. Structural assembly supporting the long-range synaptic complex.
Figure 6.1. Synaptic Dimer Interface.

List of Symbols, Abbreviations and Nomenclature

Symbol: Definition
ABC: Ammonium Bicarbonate
ACN: Acetonitrile
Alt-EJ: Alternative End Joining
AMP-PNP: Adenylyl-Imidodiphosphate
AP: Affinity Purification
APBS: Adaptive Poisson-Boltzmann Solver
APLF: Aprataxin and PNK-Like Factor
APTX: Aprataxin
ATM: Ataxia Telangiectasia Mutated
BRCT: BRCA1 C-Terminal
BRET: Bioluminescence Resonance Energy Transfer
BSA: Bovine Serum Albumin
BSP5: Bis(succinimidyl)penta(ethylene glycol)
BX: Back Exchange
CRIMP: Crosslink analysis
D2O: Deuterium Oxide
DDA: Data Dependent Acquisition
DEAE: Diethylaminoethyl
DMSO: Dimethyl Sulfoxide
DNA-PKcs: DNA Dependent Protein Kinase Catalytic Subunit
dRMSD: Distance Root-Mean-Square Deviation
DSBs: Double Strand Breaks
DSG: Disuccinimidyl Glutarate
DSS: Disuccinimidyl Suberate
EDC: 1-ethyl-3-(3-dimethylaminopropyl) Carbodiimide Hydrochloride
EDTA: Ethylenediaminetetraacetic Acid
EM: Electron Microscopy
EMSA: Electrophoretic Mobility Shift Assays
FA: Formic Acid
FAT: FRAP-ATM-TRRAP
FDR: False Discovery Rate
FHA: Fork-Head Associated
FRET: Förster Resonance Energy Transfer
GFP: Green Fluorescent Protein
HEAT: Huntingtin, Elongation factor, PP2A, subunit-TOR
HEPES: 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
HR: Homologous Recombination
HSW: High Salt Wash
HX: Hydrogen Deuterium Exchange
IP: Immunoprecipitation
IR: Ionizing Radiation
ISB: Integrative Structural Biology
KCl: Potassium Chloride
LC: Liquid Chromatography
LFQ: Label-Free Quantification
LigIV: DNA Ligase IV
MC: Monte Carlo
mg: Milligram
MgCl2: Magnesium Chloride
MS: Mass Spectrometry
MSS: Mass Spec Studio
NaCl: Sodium Chloride
NCE: Normalized Collision Energy
NHEJ: Non-Homologous End Joining
NMR: Nuclear Magnetic Resonance
OH: Overhang
PAXX: Paralog of XRCC4 and XLF
PBZ: PAR-Binding Zinc-Finger
PCD: Percent Corrected Deuterium
PCR: Polymerase Chain Reaction
PF: Protection Factor
PIC: Peptide Ion Chromatogram
pmol: Picomole
PMSF: Phenylmethylsulfonyl Fluoride
PNKP: Polynucleotide Kinase/Phosphatase
RCAP: Reversible Cross-linking and Peptide Fingerprinting
rNep-II: Recombinant Nepenthesin-II
SAP: SAF-A/B, Acinus, and PIAS
SASA: Solvent Accessible Surface Area
SAXS: Small Angle X-ray Scattering
SCX: Strong Cation Exchange
SEC: Size Exclusion Chromatography
sec: Second
SP: Sulphopropyl
SSA: Single Strand Annealing
ssDNA: Single Strand DNA
TDP1: Tyrosyl DNA Phosphodiesterase
Tris: Tris(hydroxymethyl)aminomethane
UV: Ultraviolet
WCE: Whole Cell Extract
WRN: Werner Protein
XIC: Extracted Ion Chromatogram
XLF: XRCC4 Like Factor
XL-MS: Crosslinking Mass Spectrometry
XRCC4: X-ray Repair Cross-Complementing Protein 4

Chapter 1: Introduction

1.1 DNA Double Strand Break Repair

When DNA double strand breaks (DSBs) occur, they must be repaired to prevent cell death, but also must be repaired accurately to maintain genome stability. Depending on the cell cycle stage, the DSB will be repaired by one of the two major DSB repair pathways, homologous recombination (HR) or non-homologous end joining (NHEJ) [1]. HR is the most error-free way to repair a DSB, using the sister chromatid for templated repair (Figure 1.1) [2]. But because of the requirement for the sister chromatid template, HR is only active when the sister chromatids are present, with maximal activity in mid S and G2 [3–5]. Throughout the cell cycle, DSBs can be repaired by NHEJ [4,5]. Though not as error-free as HR, NHEJ, an untemplated repair pathway, repairs DSBs quickly with little to no alterations to the DNA (Figure 1.1) [6,7]. As a back-up to these major pathways, DSBs are repaired by alternative end joining (Alt-EJ) or single strand annealing (SSA) (Figure 1.1) [1,8]. Like NHEJ, these back-up pathways do not use the sister chromatid for accurate repair of DSBs, but they are much more error prone than NHEJ [1,8]. The difference in the fidelity of repair between these back-up pathways and NHEJ is resection (Figure 1.1) [1]. Resection, without the sister chromatid templated repair that occurs in HR, causes deletions, insertions, and mutations [1].

Figure 1.1. DSB repair pathway choice. Summary of each DSB repair pathway. C-NHEJ stands for classical NHEJ, referred to as NHEJ throughout. Reprinted from Trends in Cell Biology (PMID: 26437586), copyright (2016) with permission from Elsevier.

The cytotoxicity of DSBs is often employed in treating cancer. Ionizing radiation (IR) is one such technique that induces large amounts of complex DNA damage, including DSBs. If the cancer cell is unable to repair all the breaks, the unrepaired DSBs will signal for cell death. The DSBs caused by IR are primarily repaired by NHEJ, so the efficacy of therapy can be improved with the addition of NHEJ inhibitors [9–14]. Even in the absence of additional DNA damage, these DSB repair pathways are more commonly being targeted as monotherapies [12,15,16]. As DSB repair pathways are being increasingly targeted for cancer therapy, we must consider how to inhibit repair by these major pathways without shifting repair to the error-prone back-up pathways [17–19]. Shifting repair to the error-prone pathways will help drive genome instability and mutation, a hallmark of cancer [20]. The major difference between NHEJ and the back-up repair pathways is resection [1]. So, when developing drugs to target NHEJ, ideally, we want to stop repair while still protecting DNA from resection. Understanding the assembly of the NHEJ complex would provide insight into how it is able to repair a wide range of DSBs while still maintaining protection of the DNA. With a structural model of NHEJ, targets could be identified that inhibit DNA repair, while still protecting DNA ends from resection and initiation of the back-up repair pathways.

1.2 Non-Homologous End Joining

Repair by NHEJ proceeds through three steps: end detection and tethering, end processing, and ligation [21] (Figure 1.2). NHEJ consists of core proteins that are required for the repair of all types of DSB lesions, and a set of accessory proteins that function in NHEJ but are not necessarily required for all repair. The set of core proteins includes Ku70/80, DNA dependent protein kinase catalytic subunit (DNA-PKcs), X-ray repair cross-complementing protein 4 (XRCC4) dimer, XRCC4 like factor (XLF) dimer, and DNA ligase IV (LigIV) [22]. Accessory proteins include the end processing enzymes and proteins which stabilize the core factors [23,24]. Between all the interactions that can occur within the core complex and the interactions with accessory proteins, NHEJ has been a challenging complex to study structurally. Combining many different structural studies, we can evaluate the current structural understanding of NHEJ, and determine what is still missing in the characterization.

Figure 1.2. DSB Repair by NHEJ. Cartoon representation of NHEJ DSB repair. Black bars on the ends of the DNA represent un-ligatable DNA ends. For ease of representation, complexes are shown sequentially, but it is likely a much more dynamic repair pathway.

1.2.1 End Detection and Tethering

Ku70/80, a heterodimer consisting of ~70kDa and ~80kDa proteins, is the first complex to bind the break, and has no sequence specificity [25] (Figure 1.2). The high abundance of Ku70/80 [26], and high affinity for DNA [27], allows DSBs to be quickly detected and bound [26]. Despite structural similarity between the Ku70 and Ku80 subunits, Ku70/80 loads onto the DNA in the same direction each time, with Ku70 closest to the DNA end and Ku80 internal [28]. The crystal structure of Ku70/80 bound to DNA shows a bridge, consisting of both Ku70 and Ku80, encircling the DNA (Figure 1.3A) [29]. Each of the Ku molecules has a C-terminal domain that is connected to the core through a flexible linker [30–32]. The Ku70 C-terminal domain is present in the structure of Ku70/80 without DNA, interacting with Ku80 [29]. Separately, the Ku70 C-terminal domain structure was solved by nuclear magnetic resonance (NMR) [31] (Figure 1.3A). This Ku70 C-terminal domain, more commonly referred to as the SAP (SAF-A/B, Acinus, and PIAS) domain, is a proposed DNA binding domain thought to prevent inward translocation of Ku70/80 on the DNA [31,33,34]. Due to the flexibility of the Ku80 C-terminal linker region, the Ku80 C-terminal region was removed before crystallization, so it is not present in either the DNA bound or free Ku70/80 structures [29]. The Ku80 C-terminal region structure was solved separately by NMR [32] as well (Figure 1.3A). From this structure, a protein interaction domain was identified that could be responsible for the interaction of Ku80 with other NHEJ proteins [32], including DNA-PKcs [35]. Not present in the NMR structure are the last 28 residues of Ku80, which are known to interact with DNA-PKcs [35–37] (Figure 1.3A).

Figure 1.3. Structural Elements of the Long-Range Synaptic Complex. Available structures for A) Ku70/80 (PDB: 1JEY (2.5Å) [29], 1RW2 [32], 1JJR [31]), B) PAXX (PDB: 3WTD (2.35Å) [38]), C) DNA-PKcs (PDB: 5LUQ (4.3Å) [39]), and D) DNA-PK (PDB: 5Y3R (6.6Å) [40]). The bars below the structures show the regions of sequence represented in the structure (color matching the corresponding structure/domain). Grey on the bar shows regions not represented in the structure. Length of the bars is proportional to the protein sequence length. Because of the large size of DNA-PKcs, the same length on the sequence bar represents 2X as much sequence compared to the bars for the other proteins. Known interactions, discussed in the text, are shown as boxes on the sequence bar, with the color corresponding to the interacting protein/protein domain. The circle on each of the DNA-PKcs molecules highlights the movement of the N-terminal arm. When DNA-PKcs is not bound to DNA, the N-terminal region sits below the circle (C), but when DNA bound it moves up into the circle, in contact with the FAT domain (D).

Once at the break, Ku70/80 assembles many of the other core NHEJ proteins (Figure 1.2) [26]. In particular, DNA-PKcs is recruited to interact with DNA bound Ku70/80 and form the active holoenzyme DNA-PK [35–37,40,41], which can be stabilized by the accessory proteins Aprataxin and PNK-Like Factor (APLF) [42] and the Paralog of XRCC4 and XLF dimer (PAXX) [43]. Binding of Ku70/80 and DNA-PKcs to the break can protect the DNA ends from resection [44]. At the break and ATP-loaded, the DNA-PKcs kinase is now activated and can phosphorylate one of its most important targets, itself [45–48]. Two major in vivo DNA-PKcs auto-phosphorylation clusters, ABCDE (T2609, S2612, T2620, S2624, T2638, T2647) and PQR (S2023, S2029, S2041, S2053, S2056), which are required for efficient repair, have been shown to regulate access to DNA ends [46,49–52]. The kinase domain of DNA-PKcs is surrounded by the FAT (FRAP-ATM-TRRAP) and FAT-C-terminal domains, forming the ‘head’ region of DNA-PKcs (Figure 1.3C). This head region is positioned above the double ring structures formed by the middle and N-terminal HEAT (Huntingtin, Elongation factor, PP2A, subunit-TOR) domains (Figure 1.3C) [39]. The DNA-PKcs structure confirms that the auto-phosphorylation of the PQR site on DNA-PKcs involves trans-autophosphorylation [46,52,53], since true auto-phosphorylation would require a massive conformational change to bring the kinase to the base of the middle HEAT ring of DNA-PKcs (Figure 1.3C) [39]. As for phosphorylation of the ABCDE site on DNA-PKcs, the structure does not reveal whether it could be auto-phosphorylated [52], as it is located on the large segment of sequence (2576-2773) that was not present in the crystal structure due to a high degree of disorder [39] (Figure 1.3C).

Despite the inclusion of the Ku80 C-terminal region while generating the DNA-PKcs crystal structure, only a few helices attributed to the Ku80 C-terminal region were identified, near the PQR site, but no sequence could be assigned [39] (Figure 1.3C). Using cryo electron microscopy (EM) and the high-resolution structures of DNA-PKcs and Ku70/80, the structure of the complete DNA-PK was determined [40]. The cryo-EM structure shows two interaction interfaces between DNA-PKcs and the core of Ku70/80 (Figure 1.3D) [40]. Upon binding of DNA-PKcs, the Ku70/80 molecule moves inward on the DNA and DNA-PKcs now occupies the terminal end of the DNA [40,54]. The DNA extending through the central cavity of Ku70/80 comes under the N-terminal HEAT domain, causing a displacement of the N-terminal region up towards the FAT domain (Figure 1.3) [40]. This displacement of the N-terminal HEAT domain induces a conformational change in the kinase domain that could be important for activation [40]. Despite the use of full length proteins for the cryo-EM structure, neither of the Ku70/80 C-terminal regions could be localized in the DNA-PK structure [40].

When each side of the break is loaded with DNA-PK, DNA-PKcs can dimerize in order to tether the DNA ends across the break43,44,55–57. This initial tethering event has been termed the long-range synaptic complex, as there is a separation greater than 100Å between the DNA ends56. Formation of the initial long-range synapsis minimally requires Ku70/80 and DNA-

PKcs56, and can be stabilized by the accessory factor PAXX43, which is also recruited to

Ku7038,58,59 (Figure 1.3B). PAXX is a newly discovered NHEJ factor that stabilizes core NHEJ proteins, with poorly defined overlapping functions with XLF, to promote efficient NHEJ repair38,58,60,61. Low resolution small angle X-ray scattering (SAXS) and EM both demonstrate symmetrical DNA-PK dimers, but there is conflicting dimerization interfaces dependent on the

DNA structure used for complex assembly62,63 (Figure 1.4). On DNA that was blocked on one end with a “Y” structure, the DNA-PK models showed dimerization base to base (Figure

1.4A,B)62,63. For DNA with one end blocked with a hairpin structure, DNA-PK was proposed to dimerize head to head (Figure 1.4C)62. Of note, the same head to head dimerization of

9

Figure 1.4. Dimerization of DNA-PK. A) DNA-PK dimerization as seen by negative stain EM on DNA blocked on one end with a Y structure63. Above are selected images of the DNA-PK dimer, and below a model of the dimer complex. Colouring of the model is as follows; Ku70/80 in red, DNA-PKcs HEAT repeats in orange, DNA-PKcs FAT in blue, DNA-PKcs kinase in pink, and DNA-PKcs FAT-C in blue. B-D) Dimerization as seen by SAXS62. DNA-PKcs is shown as two regions: the head, containing the kinase domain, shown in gold, and the base/palm shown in grey. Ku70/80 is shown in pink. Above is the P(r) function, then below a cartoon representation of the complex. Below the cartoon is the SAXS envelope. The cryo-EM DNA-PKcs structure used for reference, below the SAXS envelopes, is from Williams et al. 200864 B)DNA-PK dimerization on DNA blocked on one end with a Y structure. C) DNA-PK dimerization on DNA blocked on one end with a hairpin structure. D) Concentration induced DNA-PKcs dimerization. Section A was adapted from Molecular Cell (PMID: 16713581), copyright (2006) with permission from Elsevier. Sections B-D were adapted from The Journal of Biological Chemistry (PMID: 19893054), copyright (2010) with permission from The American Society for Biochemistry and Molecular Biology

DNA-PKcs was seen in the concentration induced DNA-PKcs dimers in the absence of DNA

(Figure 1.4D)62.

After the formation of the long-range synaptic complex, a transition to a short-range synaptic complex appears to follow56. This transition requires DNA-PKcs kinase activity and the other core factors XLF, XRCC4 and LigIV56. These core factors are recruited to the break by Ku70/8026 (Figure 1.2). XRCC4 and LigIV form a complex in which the linker between the two BRCT (BRCA1 C-Terminal) domains of LigIV interacts with the stalk of XRCC465,66

(Figure 1.5D). The XRCC4:LigIV complex is recruited to the break by Ku70/80, dependent on the presence of the LigIV BRCT domains67. XRCC4:LigIV can then be stabilized at the break by the accessory protein APLF, which binds Ku80 and tethers to XRCC4 through its forkhead-associated (FHA) domain, which interacts with phosphorylated XRCC468–71 (Figure 1.5). XLF is rapidly recruited to the break through interactions with Ku8072, where the C-terminal region of

XLF interacts with a pocket on Ku80 that is more accessible in the DNA bound form69 (Figure

1.5A). In addition to its interaction with Ku70/80, XLF interacts with the XRCC4:LigIV complex through interactions between the head domains of XRCC4 and XLF73,74 (Figure 1.5C).

This head to head interaction of XRCC4 and XLF is able to repeat, creating a filament structure that can tether DNA ends in vitro73–77. Filaments of XRCC4 and XLF can adopt many different pitches, some even so large they would be able to accommodate DNA-PK within the center of the filament, and could possibly tether the break over this large complex23. Even with this complex set of interactions that occur between the core NHEJ proteins, it is unclear how they could be organized into a larger NHEJ complex. Noticeably lacking are the interactions of the core factors, minus Ku70/80, with DNA-PKcs. The assembly of all the core NHEJ proteins

Figure 1.5. Structural Elements of the Ku70/80:XLF:XRCC4:LigIV Complex. Available structures for A) Ku70/80 (PDB: 1JEY (2.5Å)29), B) APLF (PDB: 5W7X (2.0Å)78, 2KQB79, 2KQC79), C) XLF:XRCC4 (PDB: 3SR2 (3.97Å)73), D) XRCC4:LigIV BRCT (PDB: 3II6 (2.40Å)66), and E) LigIV catalytic core (PDB: 6BKF (3.25Å)80). Spherical representation on the Ku70/80 structure highlights the known interaction sites for, the APLF Ku80 binding peptide (aquamarine), and XLF Ku80 binding peptide (blue)69. The bars below the structures show the regions of sequence represented in the structure (color matching the corresponding structure). Grey on the bar shows large regions not represented in the structure. Length of the bars is proportional to the protein sequence length. Known interactions, discussed in the text, are shown as boxes on the sequence bar, with the color corresponding to the interacting protein. PBZ domain in APLF is PAR-binding zinc-finger.

stabilizes synapsis and can increase DNA-PKcs auto-phosphorylation43,55,81, but how they interact to perform these functions is still poorly understood.

1.2.2 End Processing

The formation of a DSB often creates non-ligatable ends, especially in the case of breaks induced by IR, so DNA ends must be processed to allow for the final ligation step (Figure 1.2).

Phosphorylation of the ABCDE cluster on DNA-PKcs, by Ataxia Telangiectasia Mutated (ATM) or DNA-PKcs, is thought to allow access to the processing enzymes46,51,82 (Figure 1.2). There are many different processing enzymes that can function in NHEJ, requiring various interactions with the core proteins83 (Figure 1.6). Artemis, a nuclease that can also open DNA hairpins84, is recruited to the break through interactions with DNA-PKcs85, but it has also been seen to interact with LigIV86,87. Many of the other processing enzymes share similar modes of recruitment, suggesting some competition for binding of these proteins88 (Figure 1.6). The DNA polymerases

µ and λ, which can fill in gaps during repair24, are recruited to the break by Ku70/8089. Like

LigIV, these polymerases contain a BRCT domain that is thought to be responsible for the recruitment to Ku70/8083. In addition to interacting with Ku70/80, polymerase λ was found to also interact with PAXX, XLF and XRCC490. The Werner protein (WRN), a DNA helicase, may bind to Ku70/80 at the same sites as APLF or XLF, based on identified binding motifs83,91, as well as interact with the XRCC4:LigIV complex92. Tyrosyl DNA phosphodiesterase (TDP1), which removes 3′ phosphotyrosine24, also interacts with Ku70/80 and XLF, though the recruitment site is not as well defined93. In addition to overlapping binding sites on Ku70/80, polynucleotide kinase/phosphatase (PNKP), Aprataxin (APTX), and APLF all contain an FHA domain which binds to phosphorylated XRCC483. Interestingly, for the PNKP

Figure 1.6. Interactions of the Processing Enzymes. Representation of the many interactions that can occur between processing enzymes and the core NHEJ proteins. Bars are proportional to the length of the processing protein. The lines show interactions between the processing proteins and the core proteins (boxes) that are discussed in the text. Lines that originate from colored regions on the bar represent known binding sites going to the matched color in the core proteins. The two boxes in Ku80 represent the known XLF (green) and APLF (aquamarine) binding sites. The red circle represents the XRCC4 phosphorylation.

XRCC4:LigIV complex, only one PNKP was found to bind to the XRCC4 dimer, suggesting the XRCC4:LigIV complex could bind two different FHA-containing proteins at a time94. End processing is a complex step within NHEJ with many protein interactions that can occur (Figure

1.6). But, despite this complexity there seems to be some order to how the processing enzymes are utilized, allowing minimal alterations to the DNA and repairing breaks with the fewest processing steps possible6,95.

In preparation for ligation, trans auto-phosphorylation of DNA-PKcs at the PQR cluster somehow removes the blockade to ligation46, and the long-range synaptic complex transitions to the short-range synaptic complex56 (Figure 1.2). For ease of representation in Figure 1.2,

DNA-PKcs auto-phosphorylation and the transition to the short-range complex are shown sequentially, but this is not necessarily how it occurs. A conformational change in DNA-PKcs upon phosphorylation62 could cause the release of DNA-PKcs from the DNA45,62, but it could also be a rearrangement of DNA-PKcs on the DNA that allows the two DNA ends to come closer together56. This new short-range synaptic complex is tethered by XLF interacting with two

XRCC4:LigIV complexes96. End processing could occur in either the long or the short-range synaptic complexes (Figure 1.2), though it was recently shown that processing by polymerases µ and λ was restricted to the short-range complex97. Limiting processing by the polymerases to the short-range synaptic complex, where they can use the other strand as a template, may be one of the ways in which NHEJ maintains the fidelity of the DNA upon repair97.

1.2.3 Ligation

The final step in NHEJ, ligation, is completed by LigIV in complex with XRCC4 (Figure

1.2). LigIV is a unique ligase in that it can ligate over gaps and accommodate base alterations6,98,99. This unique substrate tolerance comes from an extra segment of sequence in the

DNA binding domain, relative to other ligases, that allows for flexibility in the binding of mismatched substrates99. The ability to ligate over diverse ends is another way NHEJ limits processing, even allowing errors to be corrected after joining by pathways such as base excision repair6. After ligation Ku70/80 appears to be trapped on the sealed DNA. The removal of

Ku70/80 is thought to involve ubiquitination of Ku80 but this is still a poorly understood process100,101.

1.2.4 Summary of NHEJ

For each of the steps in NHEJ, there are structures for the core proteins, as well as structures for certain pairwise interactions (Figures 1.3 and 1.4). But a common trait among

NHEJ proteins, core and accessory, is disorder. Whether it is disorder in the middle of the protein as with DNA-PKcs, flexible linkers as seen with Ku70/80 and LigIV, or disordered tails like XRCC4 and XLF, this disorder makes it a challenge to generate high-resolution structures of larger NHEJ complexes, or even pairwise interactions (Figures 1.3 and 1.4). Though we do know a fair number of interaction sites within the NHEJ complex, the well-defined binding sites are often restricted to just one of the binding partners or located in these disordered regions, so there is not enough information to determine the structural assembly. Not only does the disorder make it challenging to build the larger assembly of NHEJ, but as the complexes become larger, new interactions may occur that could change the binary interactions that have been observed.

Assembly of these large NHEJ complexes adds another layer of complexity to the structural analysis. Can these complexes be assembled sequentially, adding purified proteins, or will they need to be purified from endogenous sources, and what are relevant sub-complexes?

The NHEJ system minimally includes the initial long-range synaptic complex, the assembly of the core NHEJ proteins, the dynamic interactions of the end processing enzymes,

and the short-range synaptic complex. A complete structural analysis of these complexes would answer important functional questions about NHEJ. For instance, knowing the structure of both the long and short-range synaptic complexes would inform on the functional significance of the observed two-stage synapsis, and how the transition between the two could occur56. With the assembly of the long-range synaptic complex, the role of DNA-PKcs phosphorylation in the assembly and regulation of the complex, the access to processing enzymes, and the involvement in the transition to the short-range complex could be understood. Further, the structure of any of these larger complexes could provide information on how processing enzymes are allowed access to the DNA while still ensuring genomic fidelity.

The work presented in this thesis focuses on the initial steps of NHEJ, from assembly of

DNA-PK on DNA and activation of the kinase to formation of the initial long-range synaptic complex. A model of the initial stages of NHEJ is important to understand how DNA-PK binds to DNA in a way that protects DNA ends from resection, thus helping to maintain the fidelity of the DNA, while still allowing access to processing enzymes. The current understanding of the assembly of DNA-PK comes from two static structural techniques, X-ray crystallography and cryo-EM, but does begin to reveal an allosteric mechanism of DNA-PK activation. With the importance of the allosteric activation of the kinase, the dynamics of binding and activation of DNA-PK should be further explored to understand how they relate to function, and possible release or re-arrangement of DNA-PKcs to allow for final ligation. Finally, a structural model of the long-range synaptic complex could elucidate the function of this observed two-stage synapsis, its possible role in DNA-PKcs trans-auto-phosphorylation, and whether it positions DNA ends for processing. Given the structural challenges of these large and dynamic

NHEJ complexes, a single technique is unlikely to generate the required structural details for these complexes, so multiple structural techniques will need to be combined.

1.3 Integrative Structural Biology

To meet the challenges of studying complex protein systems, integrative structural biology

(ISB) was developed102,103. ISB allows information from many different structural techniques to be combined to generate a model for large protein complexes102,104. Generally, ISB is divided into four steps: gathering data, translating data into an appropriate system representation and spatial restraints, sampling models that satisfy the restraints, and finally evaluating and validating the models102,105,106. Many different types of data can be incorporated into ISB, each offering unique structural information about the system102,104,107 (Figure 1.7). Not all structural techniques are equal, so the system representation or spatial restraint applied to each set of gathered data should reflect the precision of the data102,104,107 (Figure 1.8). Precision of the input representations and restraints ultimately affects the precision of the final model102. Consider the two ISB models of the nuclear pore complex: the first, low precision model (~60Å)108 was developed with a set of lower precision structural techniques, whereas building a higher precision model (~9Å)109 required many higher precision structural techniques to be incorporated102 (Figure 1.8).

In representing the system, the components that are going to be modelled are defined: proteins that are part of the system are identified, as well as the stoichiometry107 (Figure 1.7).

With the defined components, the next consideration is how to portray each for modelling in a way that represents the available information102. For example, in the low resolution model of the nuclear pore complex, they did not have crystal structures of subunits so the subunits were represented very coarsely, where the proteins were represented by large beads, dependent on the protein size, roughly assembled as globular or elongated shapes based on ultra-centrifugation

data102,108,110 (Figure 1.8). Many different representations can be chosen for one system, allowing disordered regions or less well-defined subunits to still be modelled alongside higher resolution structures in the larger complex102.
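As a minimal sketch of such a coarse representation, the snippet below sizes spherical beads from the number of residues they cover. The average residue mass (110 Da) and protein partial specific volume (about 0.73 cm³/g, or roughly 1.21 Å³/Da) are generic textbook assumptions, and the function names are hypothetical; this is not the procedure used in the nuclear pore complex models.

```python
import math

AVG_RESIDUE_MASS_DA = 110.0       # assumed average mass per residue
SPECIFIC_VOLUME_A3_PER_DA = 1.21  # assumed: ~0.73 cm^3/g expressed in cubic Å per Da


def bead_radius(n_residues):
    """Radius (Å) of a sphere with the approximate volume of n_residues residues."""
    volume = n_residues * AVG_RESIDUE_MASS_DA * SPECIFIC_VOLUME_A3_PER_DA
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def coarse_grain(n_residues, residues_per_bead):
    """Split a sequence into beads; returns a list of (residues_in_bead, radius) tuples."""
    beads = []
    remaining = n_residues
    while remaining > 0:
        size = min(residues_per_bead, remaining)
        beads.append((size, bead_radius(size)))
        remaining -= size
    return beads


# A hypothetical 250-residue disordered region at 100 residues per bead.
beads = coarse_grain(250, 100)
for size, radius in beads:
    print(f"{size} residues -> radius {radius:.1f} Å")
```

Choosing fewer residues per bead gives a finer representation, which is only justified where the available data are precise enough to position the beads.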

Once again, the restraints generated from the structural data must be consistent with the precision of the technique102 (Figure 1.8). Examples of the types of restraints used for ISB are listed in Table 1.1107. If we consider a restraint from affinity purification (AP) versus a restraint from crosslinking mass spectrometry (XL-MS), both can tell us about interactions that occur in the complexes but do so at different resolutions (Figure 1.8)107. Connectivity restraints can be generated from AP-MS, while XL-MS can generate distance restraints107 (Table 1.1). The difference in how these restraints are satisfied relates to the resolution of the structural technique. Connectivity restraints can be satisfied if one protein from the AP data interacts with another protein from the AP data, so there could be many configurations of proteins that satisfy these restraints, consistent with the low-resolution data used to generate the restraint107. A distance restraint, generated from XL-MS, is satisfied if the two linked residues are not separated by more than a maximum distance, set based on the crosslinker used111. Because this restraint restricts the interaction/proximity to specific residues, which is an appropriate resolution for the technique, fewer models can satisfy the XL-MS distance restraint. Spatial restraints are combined into a scoring function that is used to score the models in sampling102.
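The difference between the two restraint types can be illustrated with a small sketch. The particle coordinates, bead radii, 30Å crosslink limit, contact tolerance and penalty value below are all hypothetical choices for illustration, and the scoring function is a simple sum rather than the scoring used by any particular ISB package.

```python
import math

# Hypothetical particle centres (Å) and radii (Å); values are illustrative only.
coords = {"A": (0.0, 0.0, 0.0), "B": (25.0, 0.0, 0.0), "C": (25.0, 80.0, 0.0)}
radii = {"A": 15.0, "B": 15.0, "C": 15.0}


def distance(p, q):
    return math.dist(coords[p], coords[q])


def distance_violation(p, q, max_dist):
    """Upper-bound distance restraint: 0 if satisfied, otherwise the excess in Å."""
    return max(0.0, distance(p, q) - max_dist)


def in_contact(p, q, tolerance=5.0):
    """Two particles 'interact' if their surfaces are within the assumed tolerance."""
    return distance(p, q) <= radii[p] + radii[q] + tolerance


def connectivity_satisfied(proteins):
    """True if every protein is linked to the others directly or indirectly."""
    proteins = list(proteins)
    seen, stack = {proteins[0]}, [proteins[0]]
    while stack:
        current = stack.pop()
        for other in proteins:
            if other not in seen and in_contact(current, other):
                seen.add(other)
                stack.append(other)
    return len(seen) == len(proteins)


def score(crosslinks, pulldowns, max_dist=30.0, penalty=100.0):
    """Sum of crosslink violations plus a fixed penalty per unsatisfied pulldown set."""
    total = sum(distance_violation(p, q, max_dist) for p, q in crosslinks)
    total += sum(penalty for group in pulldowns if not connectivity_satisfied(group))
    return total


print(score(crosslinks=[("A", "B"), ("B", "C")], pulldowns=[("A", "B", "C")]))
```

The connectivity check only asks whether a contact path exists, so many arrangements pass it, whereas the distance restraint penalizes a specific pair by how far it exceeds the limit.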

Sampling explores the many different configurations of the model, to identify models which are consistent with the identified spatial restraints102. If a configuration is not sampled, it cannot be evaluated for its consistency with the spatial restraints, so sampling must be

Figure 1.7. Structural Data for Integrative Structural Biology for NHEJ. Representation of the structure above corresponds to the resolution of each type of data. SPR is surface plasmon resonance, Y2H is yeast two-hybrid, PCA is protein-fragment complementation assay, Cryo-ET is cryo-electron tomography, and H/D exchange is hydrogen/deuterium exchange. Reprinted from Molecular & Cellular Proteomics (PMID: 20507923), copyright (2010) with permission from American Society for Biochemistry & Molecular Biology.

Figure 1.8. Resolution of Inputs Affects the Final Resolution of the ISB Model. Shown is each type of data that went into modelling the nuclear pore complex in 2007 and 2018108–110. The blue bar below each data type shows the range of resolution of data from that technique. Below are the corresponding ISB models of the nuclear pore complex, generated in 2007108 with low resolution structural data (left), and generated in 2018109 with the addition of higher resolution data (right). Adapted from Cell (PMID: 31150619), copyright (2019) with permission from Elsevier.

Table 1.1. Common Restraints that can be used for ISB. Reprinted from Molecular & Cellular Proteomics (PMID: 20507923), copyright (2010) with permission from American Society for Biochemistry & Molecular Biology.

Restraint | Description | Source of information
--- | --- | ---
Excluded volume restraint | Prevents steric clashes between system particles | Physical first principles
Geometric complementarity restraint | Restrains a protein interface to the tightest possible packing | Physical first principles
Statistical potential restraint | Restrains a structure to have contact frequencies similar to those in structurally defined complexes | Physical first principles, all previously determined protein structures
Distance restraint | Restrains the distance between two particles | FRET, BRET, cross-linking, homology to a known structure
Protein localization restraint | Restrains a protein to a specific position | Immuno-EM, gold labeling, GFP labeling
Protein connectivity restraint | Restrains all proteins in a set to interact directly or indirectly | Affinity purification
Angle restraint | Restrains the angle between three particles | EM, SAXS, homology to a known structure
Complex diameter restraint | Restrains the distance between the two most distant particles in a protein or complex | EM, SAXS
Symmetry restraint | Maintains the same configuration of equivalent particles across multiple symmetry units | EM, SAXS, homology to a known structure
EM quality-of-fit restraint | Restrains the model to overlap with a density map | EM, SAXS
Radial distribution function restraint | Restrains the correlation between experimentally measured and computed radial distribution functions | SAXS

BRET (Bioluminescence resonance energy transfer), GFP (green fluorescent protein)

sufficiently thorough102. But with large or complex systems it is not reasonable to try to sample all possible configurations102. Stochastic sampling is generally employed in ISB to minimize bias and generate a set of models102. One requirement for a good final model involves a test for convergence102. For a good model, we would expect convergence to a small number of solutions. A lack of convergence of models could result from insufficient spatial restraints for the representation that was chosen105. Whether it is from low resolution restraints or a limited number of restraints, lack of convergence would suggest that you should return to the first stage of modelling and gather more data that could help models to converge, or rebuild your representation105. If the models do converge then they can be validated, looking for both satisfaction of the data used to generate the model as well as satisfaction of data that were not used to generate the model102.
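A toy version of stochastic sampling is sketched below, assuming a single mobile particle in two dimensions, one upper-bound distance restraint (30Å, a hypothetical value) to a particle fixed at the origin, and a Metropolis Monte Carlo scheme. The step size, temperature and seeds are arbitrary illustrative choices, not parameters from any published ISB protocol.

```python
import math
import random

MAX_DIST = 30.0  # Å, hypothetical upper bound of the single distance restraint


def score(model):
    """Squared violation of the distance restraint to a particle fixed at the origin."""
    d = math.hypot(model[0], model[1])
    return max(0.0, d - MAX_DIST) ** 2


def sample(seed, steps=2000, step_size=5.0, temperature=1.0):
    """Metropolis Monte Carlo; returns (best_model, best_score, final_model)."""
    rng = random.Random(seed)
    current = (100.0, 0.0)  # arbitrary start that violates the restraint
    current_score = score(current)
    best, best_score = current, current_score
    for _ in range(steps):
        trial = (current[0] + rng.uniform(-step_size, step_size),
                 current[1] + rng.uniform(-step_size, step_size))
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score, current


# Two independent runs: both can satisfy the restraint yet end at different positions.
run_a = sample(seed=1)
run_b = sample(seed=2)
spread = math.dist(run_a[2], run_b[2])
print(run_a[1], run_b[1], round(spread, 1))
```

With only one restraint, every position within 30Å of the origin scores equally well, so independent runs agree on the score but not on the coordinates. This is the behaviour that additional or more precise restraints are meant to remove.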

1.4 ISB Model of DNA-PKcs in the Initial Steps of NHEJ

When building an ISB model, you should consider what resolution is needed to answer the question that is being asked102. Depending on the resolution that is needed, different structural techniques can be used to generate data107 (Figure 1.8). In building a model of DNA-

PKcs in the initial stages of NHEJ, we hope to determine the role conformational changes have in DNA-PKcs binding and activation, how DNA-PK binding and synapsis can protect DNA ends from resection while still allowing access to processing enzymes, and finally if synapsis contributes to trans auto-phosphorylation of DNA-PKcs. To answer these questions, we will need to localize specific regions in the system, such as the kinase and phosphorylation sites, so a low-resolution model (>30Å) will not be enough to answer our biological questions. There have been many different structural techniques applied to NHEJ, so with the available structural information, could an ISB model of sufficiently high resolution answer our biological questions?

1.4.1 Available Data and Restraints

Though obtaining high-resolution structures of protein complexes is challenging, requiring large amounts of homogeneous protein solutions, high-resolution structures of all structural elements of the synaptic complex, including the synapsis stabilizer PAXX, are available in whole or in part (Figure 1.3). These high-resolution structures are important as they allow for a high-resolution representation of the subunits when modelling102. Despite the availability of these structures, there are still large amounts of sequence missing from each structure. Sections without structure will have to be coarse-grained to some degree in the representation, which reduces the resolution of these elements in the model. In addition to providing high-resolution representation of the subunits when modelling, the DNA-PK structure provides high-resolution information on the interaction between Ku70/80 and DNA-PKcs, so

DNA-PK can be modelled as one unit rather than assembling the individual proteins.

Both SAXS and EM, which provide quality of fit restraints for modelling (Table 1.1), have been applied to the DNA-PK synaptic complex, generating low resolution models62,63.

SAXS and EM restraints are valuable for generating higher resolution models (Figure 1.8)102, but the ambiguity of the DNA-PK synaptic models that are generated (dependent on DNA substrate (Figure 1.4)) makes these SAXS and EM data inappropriate as restraints in modeling105.

Even though these data cannot be used as a spatial restraint in modelling, they could be applied to validate models in the final stage. The SAXS and EM data do still offer an important piece of data for representing the system, that is stoichiometry. Despite different DNA-PK synapsis orientations being detected, both suggest a dimer consisting of two DNA-PK molecules55,62,63.

Another restraint that could be generated from structural techniques and used to inform an ISB model for NHEJ, involves connectivity. These connectivity restraints are generated from

several different structural techniques, including pull-downs, immunoprecipitation (IP), and electrophoretic mobility shift assays (EMSA). Connectivity restraints can be refined when these techniques are combined with mutations/deletions, allowing the interaction site to be restricted to specific residues/regions of protein. Two such connectivity restraints, not already represented in the DNA-PK structure, are the interaction of the Ku80 C-terminal region with DNA-PKcs35, and the interaction of the PAXX C-terminal region with Ku7059. The binding sites for these interactions are only well defined for one of the proteins, so they would only represent a very low-resolution restraint, similar to the scale for affinity purification seen in Figure 1.8.

Finally, the last restraint that could be used to model DNA-PK synapsis would be from the recent Förster Resonance Energy Transfer (FRET) experiments56. FRET can be used to generate a distance restraint between the incorporated fluorescent molecules107 (Table 1.1). So, in the case of the FRET data available, the distance restraint would be satisfied if in the synaptic complex the DNA ends were separated by greater than 100Å. Though the DNA strand is seen in the DNA-PK structure40, we do not know if that represents all DNA binding that can occur, as there is some evidence that DNA-PKcs can splay open the DNA ends, which is not seen in the structure55,112,113. The uncertainty in the position of the DNA ends impedes the use of this FRET-based distance restraint in modelling the complex. If the DNA end positions in the crystal structure differ from the positions present when the FRET data were generated on a longer piece of DNA, then the restraint could lead to the incorrect selection of models that satisfy the distance restraint but in which the distance is not accurately represented105. As with the ambiguous SAXS and EM data, the uncertainty associated with the FRET restraint makes it inappropriate to use in modeling, but it could still be used in the final validation of the models.

Looking at the available data for ISB modelling of the DNA-PK synaptic complex, there is high-resolution data when it comes to the representation of the system, but the spatial restraints that can be used in modelling are both low-resolution and sparse. With this combination of data, high resolution representation and sparse low-resolution restraints, it is unlikely we would be successful in generating a converging set of models at the required resolution, based on the sparseness of the restraints. So, if we are going to accurately model

DNA-PK to answer our biological questions, what type of data should be added to the currently available data? The good news is we already have a high-resolution representation of the proteins for the questions, so we do not have to worry about generating any high-resolution structures.

Instead we can focus on higher-precision (residue level; Figure 1.7) structural techniques to generate structural information that will help answer our biological questions. As such, two MS techniques, hydrogen/deuterium exchange MS (HX-MS) and XL-MS, will be used to generate additional data to answer our biological questions. Each of these techniques provides higher resolution data that will help build our understanding of DNA-PKcs in the initial stage of NHEJ. HX-MS provides peptide level information on binding and conformational changes114,115, and XL-MS provides residue level distance restraints for modelling102. Both techniques are also able to provide structural information for disordered regions of proteins, which cannot be monitored by many of the other higher resolution structural techniques.

1.4.2 Hydrogen Exchange Mass Spectrometry (HX-MS)

The ability of HX-MS to monitor dynamics of proteins makes it an ideal technique to study the dynamic activation of DNA-PKcs in NHEJ114,116–120. HX-MS measures deuterium uptake in proteins. Since HX is labelling the amide hydrogens, information for the entire protein, except proline, can be obtained, including information for disordered regions that are not present

in the high-resolution structures. To begin, the proteins are incubated with deuterium oxide to allow for the exchange of amide hydrogens for the heavier deuterium (Figure 1.9). After a specified labelling time the exchange reaction is quenched by decreasing the pH (~2.5) and temperature (<10°C)121. Quenching the reaction slows the exchange of deuterium and hydrogen but some exchange still occurs, so remaining steps in the experiment must be done under quench conditions and as quickly as possible to prevent back-exchange of the deuterium label to hydrogen121. To increase the resolution of the deuterium uptake measurement, proteins are digested to peptides (Figure 1.9), allowing for the measurement of the deuterium uptake at the peptide level rather than the whole protein.

Under quench conditions, trypsin, the typical specific protease used for proteomics, does not work. Digestion for HX-MS requires a protease that cleaves rapidly at low pH and temperature. The most common protease employed is pepsin which has non-specific cleavage and functions at a low pH114,122, but other proteases are available that meet these requirements such as recombinant Nepenthesin-II (rNep-II)123. Peptides are separated by reverse phase liquid chromatography (LC) before MS analysis (Figure 1.9). The MS spectra can be analyzed by various software tools to determine the deuterium uptake for that peptide, which can be calculated by measuring the change in the average mass of the peptide114 (Figure 1.9). Now with the calculated deuterium uptake values for the peptides of a protein, what does the method inform us about structure and protein dynamics?
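The uptake calculation described above can be sketched as follows, assuming each isotopic envelope is given as a list of (m/z, intensity) pairs and that the charge state of the peptide is known. The peak lists are invented for illustration, the function names are hypothetical, and no back-exchange correction is applied.

```python
def centroid_mz(peaks):
    """Intensity-weighted average m/z of an isotopic envelope of (m/z, intensity) pairs."""
    total_intensity = sum(intensity for _, intensity in peaks)
    return sum(mz * intensity for mz, intensity in peaks) / total_intensity


def deuterium_uptake(unlabelled, labelled, charge):
    """Uptake in Da: the centroid shift in m/z multiplied by the charge state."""
    return (centroid_mz(labelled) - centroid_mz(unlabelled)) * charge


# Invented envelopes for a doubly charged peptide (isotope peaks spaced by 0.5 m/z).
unlabelled_peaks = [(500.0, 100.0), (500.5, 60.0), (501.0, 20.0)]
labelled_peaks = [(501.0, 100.0), (501.5, 60.0), (502.0, 20.0)]

uptake = deuterium_uptake(unlabelled_peaks, labelled_peaks, charge=2)
print(f"Deuterium uptake: {uptake:.2f} Da")
```

In practice the labelled envelope also broadens rather than simply shifting, but the centroid difference is computed in the same way.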

The base catalyzed exchange of the hydrogen for deuterium proceeds as below124:

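$$\mathrm{N\text{-}H_{closed}} \underset{k_{cl}}{\overset{k_{op}}{\rightleftharpoons}} \mathrm{N\text{-}H_{open}} \underset{\mathrm{D_2O}}{\overset{k_{ch}}{\rightleftharpoons}} \mathrm{N\text{-}D_{open}} \underset{k_{op}}{\overset{k_{cl}}{\rightleftharpoons}} \mathrm{N\text{-}D_{closed}} \qquad \text{(equation 1.1)}$$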

Figure 1.9. General HX-MS Workflow. Spectral analysis is showing the measurement of the deuterium incorporation. The arrows show the centroid mass of the peptide with and without deuterium. The change in centroid mass (labelled – unlabelled) is used to calculate the deuterium incorporation. Reprinted from Nature Methods (PMID: 31249422), copyright (2019) with permission from Springer Nature

where $k_{op}$ and $k_{cl}$ correspond to the rates of opening and closing for structural conformations that allow for exchange of the amide hydrogen, and $k_{ch}$ is the intrinsic rate of exchange of hydrogen for deuterium. From this reaction mechanism the observed rate of exchange ($k_{HX}$) is124:

$$k_{HX} = \frac{k_{op}\,k_{ch}}{k_{op} + k_{cl} + k_{ch}} \qquad \text{(equation 1.2)}$$

It is then assumed that the closed state is the preferred state for a protein, so the rate of closing would be much faster than the rate of opening leading to a simplified equation below124.

$$k_{HX} = \frac{k_{op}\,k_{ch}}{k_{cl} + k_{ch}} \qquad \text{(equation 1.3)}$$

This equation is further simplified when considering the exchange for structured proteins. For structured proteins it is assumed that the rate of closing is much greater than the rate of exchange:124

$$k_{HX} = \frac{k_{op}}{k_{cl}}\,k_{ch} \qquad \text{(equation 1.4)}$$

With this simplified equation to describe the rate of exchange of a structured peptide/protein, we see that the observed exchange rate is related to the equilibrium constant for deprotection of the amide ($K_{op}$)124:

$$K_{op} = \frac{k_{op}}{k_{cl}} \qquad \text{(equation 1.5)}$$
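To see how little the successive simplifications change the observed rate when closing is fast, the three rate expressions (equations 1.2 to 1.4) can be compared numerically. The rate constants below are arbitrary illustrative values chosen so that closing is much faster than both opening and chemical exchange; they are not measured values for any protein.

```python
def k_hx_full(k_op, k_cl, k_ch):
    """Observed exchange rate from the full expression (equation 1.2)."""
    return k_op * k_ch / (k_op + k_cl + k_ch)


def k_hx_closed_favoured(k_op, k_cl, k_ch):
    """Simplification for k_cl much greater than k_op (equation 1.3)."""
    return k_op * k_ch / (k_cl + k_ch)


def k_hx_structured(k_op, k_cl, k_ch):
    """Further simplification for k_cl much greater than k_ch (equation 1.4)."""
    return (k_op / k_cl) * k_ch


# Illustrative rate constants in 1/s.
k_op, k_cl, k_ch = 1.0, 1.0e4, 10.0

full = k_hx_full(k_op, k_cl, k_ch)
closed = k_hx_closed_favoured(k_op, k_cl, k_ch)
structured = k_hx_structured(k_op, k_cl, k_ch)
relative_error = abs(structured - full) / full

print(f"full: {full:.6e}, eq 1.3: {closed:.6e}, eq 1.4: {structured:.6e}")
print(f"relative error of eq 1.4: {relative_error:.3%}")
```

With these values the simplified rate differs from the full expression by roughly 0.1%. The approximation degrades as the opening or chemical exchange rates approach the closing rate.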

If deuterium uptake is related to the rate of deprotection of the amide, what are the structural features that influence the rate of opening? There have been several attempts to relate observed deuterium uptake with structure, but the current models of deuterium exchange are not able to accurately predict deuterium uptake from an observed structure125–130. So, unfortunately, with a set of predicted/modelled structures, consistency with the measured deuterium uptake cannot be accurately used to score or validate these structures. But, from these models of deuterium exchange we see that deuterium uptake correlates with hydrogen bonding and to a lesser extent with solvent accessible surface area (SASA)125–130. So, both hydrogen bonding and SASA must have some effect on the rate of structural fluctuations that present the amide hydrogen for exchange.

Since structure cannot be inferred from the observed deuterium uptake, HX experiments are usually conducted as differential experiments, looking at the changes in deuterium uptake between two states (most often bound and unbound) (Figure 1.9). When binding occurs, there is the formation of a new binding site and conformational changes may be induced by binding.

Both new interactions and conformational changes will influence the hydrogen bonding and

SASA of the protein and thus influence the deuterium uptake. So, changes in deuterium uptake upon binding can be used to identify, at a peptide level, binding sites as well as regions of the protein that undergo conformational change upon binding114,115.

Since both conformational changes and binding cause changes in deuterium uptake, without prior knowledge of binding sites, the observed changes in deuterium uptake can be structurally ambiguous, and for that reason are dangerous to include as uninterpreted restraints in ISB.

However, HX-MS data can be used successfully when structure is available, testing if the final models generally agree with the observed HX-MS data. Some caveats regarding the technique must be considered in data interpretation. The measured peptide deuterium uptake represents the average deuterium uptake for that peptide from all protein molecules in solution. If the sample represents a heterogeneous solution of states, the measured deuterium uptake will represent all those states, which in the case of partial binding/complex formation could result in the inability to detect changes due to binding. Assembly of the long-range synaptic complex will likely

30 generate a heterogeneous solution of states (whole complexes, partial complexes, and single proteins), with the highest reported formation of DNA-PK synaptic complexes in vitro being only 30% of the observed complexes55. Because of this heterogeneity, and the ambiguous nature of HX data, HX-MS is not the right technique for investigating the long-range synaptic complex but is ideally suited to study the conformational changes that must accompany the assembly of the DNA-PK complex. To generate restraints that can be used to model the long-range synaptic complex, crosslinking mass spectrometry (XL-MS) will be used.

1.4.3 Crosslinking Mass Spectrometry (XL-MS)

Distance restraints from XL-MS have already been successfully used as ISB restraints for modelling protein complexes108,109,111,131–137, so they should be effective for modelling of the long-range synaptic complex. XL-MS generates a set of distance restraints, setting a maximum distance by which crosslinked residues can be separated131,135,136. As mentioned above, assembling the long-range synaptic complex is likely to generate a heterogeneous solution of states. In XL-MS, like HX-MS, the crosslinking data do represent crosslinks for all states in the sample. However, a single crosslink does not represent the average of all states; rather, it represents one of the sub-states in the sample. So, some sample heterogeneity in complexes/complex assembly can be tolerated111,136–140. Consider an example from Erzberger et al. (2014), where the gathered data consist of high-resolution structures and crosslinking. The authors were able to develop a model of the ribosome translation initiation complex with an average precision of ~36Å (Figure

1.10)131. As with most structural techniques, ISB does not generate a uniform solution, so depending on the information available for different proteins in the complex, the precision of proteins within the model can be different102. When individual proteins are considered in the

Erzberger model, some parts show the interactions/protein positions with a precision of 15Å131.


For XL-MS, proteins and protein complexes are crosslinked with a crosslinker of known reactivity and spacer length (Figure 1.11). Depending on the reactivity of the crosslinker, different residues can be targeted, but the most commonly used reactive groups primarily couple lysine residues141. The reaction of the crosslinker with the protein is covalent, so unlike HX-MS, processing after labelling can include protracted digestions and enrichment techniques.

Crosslinked proteins are digested, usually with trypsin, generating crosslinked peptides along with the usual tryptic linear peptides (Figure 1.11). Generally, crosslinked peptides represent only a small fraction of the peptides present in the sample141, so to increase their detection they can be enriched from the free peptides (Figure 1.11). The most common techniques that are used for enrichment are size exclusion chromatography (SEC), strong cation exchange (SCX), and affinity tags141. SEC and SCX both are enrichments based on the fact that crosslinked peptides contain two peptides, causing them to be larger than free peptides for SEC enrichment142, and have higher charge for SCX enrichment143. Affinity enrichment is more specific than SEC/SCX, as it is a targeted enrichment based on an enrichment tag present on the spacer/linker of the crosslink (Figure 1.11). After enrichment crosslinks are analyzed by LC-MS (Figure 1.11).

Finally, there are various software tools available to identify the crosslinked peptides and generate a list of crosslinked residues (Figure 1.11).

The covalent reaction of the crosslinker with protein offers many advantages for downstream sample processing, but because the reaction is covalent, there is the possibility that the incorporation of the crosslinker can alter the native state of the protein144.

These alterations to structure are a problem when the modified structure is crosslinked again, generating a crosslink that is not representative of the protein structure144. In other words, a crosslinker can also kinetically trap a protein during an extended reaction, where one crosslink


Figure 1.10. ISB Model of the Ribosome Translation Initiation Complex. A) Cluster analysis of one ensemble (ensemble 2 not shown). Model localization densities of B) cluster 1 and C) cluster 2. Adapted from Cell (PMID: 25171412), copyright (2014) with permission from Elsevier


Figure 1.11. General XL-MS Workflow. Reprinted from Nature Structural & Molecular Biology (PMID: 30374081), copyright (2018) with permission from Springer Nature

traps a conformational change, which now can be crosslinked again. The final product can be a state that is not necessarily representative of the conformational ensemble in solution145. This problem is inherent to the slow reactivities of the commonly employed crosslinkers, which are often reacted for minutes to hours146. High-resolution input structures provide a check on the disruption of the structure, particularly through the evaluation of intra-protein crosslink satisfaction. Nonetheless, we should always remain mindful of trapping non-native structures when interpreting our models. The use of photoactivatable crosslinkers would help address protein trapping, as they react very quickly, and thus should impart minimal secondary crosslinking of the system145. Despite the advantages of fast reactivity, photoactivated crosslinkers are not used very often, as they have lower yields145, and the non-specific linking of residues makes it harder to accurately identify crosslinked peptides135.
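The evaluation of intra-protein crosslink satisfaction mentioned above can be sketched as follows. This is a minimal illustration and not the workflow used in this thesis: the function names, the toy coordinates, and the 30 Å Cα-Cα cut-off are assumptions chosen for the example, and the appropriate cut-off depends on the crosslinker spacer length and the residues involved.

```python
import math

def crosslink_satisfaction(coords, crosslinks, max_distance=30.0):
    """Return (fraction_satisfied, violations) for a set of crosslinks.

    coords       : dict mapping residue number -> (x, y, z) of its C-alpha atom
    crosslinks   : list of (residue_i, residue_j) pairs
    max_distance : assumed upper bound on the C-alpha distance, in angstroms
    """
    violations = []
    for i, j in crosslinks:
        distance = math.dist(coords[i], coords[j])
        if distance > max_distance:
            violations.append((i, j, round(distance, 1)))
    total = len(crosslinks)
    satisfied = total - len(violations)
    fraction = satisfied / total if total else 0.0
    return fraction, violations

# Hypothetical coordinates and crosslinks, for illustration only.
toy_coords = {10: (0.0, 0.0, 0.0), 25: (12.0, 5.0, 0.0), 80: (40.0, 0.0, 0.0)}
toy_links = [(10, 25), (10, 80), (25, 80)]
fraction, violated = crosslink_satisfaction(toy_coords, toy_links)
print(f"{fraction:.0%} of crosslinks satisfied; violations: {violated}")
```

A low satisfaction rate against a trusted high-resolution structure would suggest that some crosslinks report on a perturbed or trapped state, although a violated crosslink can also reflect genuine flexibility.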

Though crosslinks can be challenging to detect, in the right circumstances as few as 4-5 crosslinks are able to generate an accurate model (Figure 1.12C). With the addition of more data, the model does not necessarily become more accurate, but the precision can be increased

(Figure 1.12C). This is not to say that 4-5 crosslinks represent a magic number that will work for all systems, but that in a tightly controlled biochemical preparation you may not need to detect large numbers of crosslinks at the interface to generate an accurate model. With that said, if the precision of the model is insufficient to address the question being asked, then additional data should be added to improve the model (Figure 1.12). These additional data could arise from improved crosslink detection protocols, or through the application of other crosslinking chemistries141, such as hetero- and homo-bifunctional acidic-reactive147–149 or photo-activatable145 crosslinkers, that offer complementary data sets111,145,150,151.


Figure 1.12. Effect of Number of Crosslink Restraints on Model Accuracy and Precision. A) Localization density map of Nup84, showing the location of the structure Nup145c-Sec13 (PDB 3IKO). The known interface for Nup145c-Sec13 was modelled with various numbers of crosslink restraints. B) Total score of the model plotted against Cα dRMSD (distance root-mean-square deviation) for the dimer models vs the structure. C) Comparison of the dimer models to the structure. The red line (mean) corresponds to the accuracy of the model, and the precision of the model is represented by the box. DSS is disuccinimidyl suberate and EDC is 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide hydrochloride. Reprinted from Molecular & Cellular Proteomics (PMID: 25161197), copyright (2014) with permission from American Society for Biochemistry and Molecular Biology


1.4.4 Bringing it all Together

The work presented in this thesis attempts to build an accurate structure-function model for the initial stages in the NHEJ process, focusing on the role of DNA-PK in end protection and formation of the long-range synaptic complex. HX-MS was used to develop an understanding of the conformational changes in these initial steps and how they relate to the function of DNA-PK in NHEJ. XL-MS restraints together with the full armamentarium of available structures were used to generate an ISB model of the essential end protection complex and the initial long-range synaptic complex. Each of the MS techniques provided valuable information for NHEJ, but first required significant optimizations to meet the unique challenges of this system. Specifically, chapter two presents an advanced nanoHX-MS configuration ideal for large protein systems, and describes how key challenges were overcome to allow for HX analysis of these large NHEJ complexes. Chapters three and four include the benchmarking and method optimization of two different crosslinking methods. With the optimized HX and XL methods in hand, I describe in chapter five an extensive conformational analysis of the assembled monomeric DNA-PK system and present a model for the long-range synaptic state using ISB methods.


Chapter 2: HX Methods for Ultra Large Protein Systems

2.1 Introduction

HX-MS is a valuable tool for studying the dynamics of proteins114,115,118,120,124.

Traditional HX-MS methods have worked well for protein systems up to 150kDa, but as the proteins and complexes that are being studied become larger and larger, we start to approach the limitations of this technique. To study the large protein kinase DNA-PKcs (469kDa) there are three main challenges to using HX-MS. The first challenge is the short gradient required to prevent back-exchange. With a short gradient and an increasing number of peptides, more spectral overlap will occur. Improvements such as subzero LC152,153, which allows for the use of longer gradients while keeping back-exchange at a minimum, and ion mobility154, which allows for a secondary separation of peptides prior to MS analysis, have been developed to address the current limitations of chromatography. While subzero LC and ion mobility could help “solve” the separations problem, they do not address the second problem of working with DNA-PKcs, namely, sample requirements. For a single replicate at one labeling time point, an HX-MS experiment can consume anywhere from 10-100 picomole (pmol) of protein119,152,155,156. Though consumption in the pmol scale seems small relative to more traditional structural techniques that require milligrams (mg) of purified protein157, the purification of DNA-PKcs from an endogenous source means less protein is available, and isolating sufficient amounts carries a high cost. Given that purification of DNA-PKcs from 100L cell pellet through multiple columns takes ~2 months158, DNA-PKcs must be thoughtfully consumed, limiting sample consumption per experiment. Lastly, since HX measures the solution average of deuterium uptake, how should DNA-PKcs complexes be assembled to ensure a homogeneous solution of states? If multiple forms are present in the sample then the deuterium measurements become an average of all states, increasing noise and making it more challenging to detect changes in deuterium measurements.

To address the first two challenges of HX-MS for DNA-PKcs, we developed and optimized a nano-spray HX-MS system. Nano-spray with columns incorporated into the spray tip, already commonly employed in proteomics159, allows for decreased sample consumption and improved chromatographic peak widths160. Despite the advantages of nano-spray, over traditional micro flow, it is not commonly employed in HX-MS. One of the major challenges of incorporating nano-spray into HX is keeping the column cold enough to limit back exchange.

Our nano-spray HX system consists of a cooler that can be attached to a Sciex Nanospray III source to maintain quench temperatures at the column (Figure 2.1)161. The switch to a nano-spray system reduces chromatographic peak widths to ~6 seconds (sec), compared to 20 sec with our traditional flow set-up161. Back-exchange for the nano-spray HX system was comparable to the back-exchange observed with traditional flow set-ups, with average retention of 75% of the deuterium label, confirming that the cooler was effectively chilling the column161. On a small sample protein, full protein sequence coverage was obtained injecting just 3 femtomole of protein using the nano-spray HX system161. With the improved chromatographic performance and increased sensitivity on small proteins, our nano-spray HX system was tested on DNA-PKcs, to see if the improvements would allow for HX-MS analysis of large proteins/protein complexes.

Improvement in chromatography will help with spectral overlap and sample consumption, but it may not be enough for full coverage of large proteins in HX analysis. In addition to our nano-spray HX system, there are other methodological changes we can make to improve sequence coverage. Traditionally, labelling of the protein is done at high deuterium


Figure 2.1. Nano-spray HX System. a) Diagram of the nano-spray HX system. The nano-spray cooler system, temperature controlled with the Peltier element, is mounted on the column rail encasing the picofrit column to keep separation at quench temperatures. A and B in the diagram show the possible positions for the trap column: A in an external cooler, and B within the nano-spray cooler. For the following experiments, the trap column is at position A. Circles with X’s are valves. b) Temperature in the cooler at different source interface temperatures: red 75°C, green 150°C, and purple 225°C. At each interface temperature the nano-spray cooler can keep most of the column around the set point of 2°C, with the exception of the very tip of the column. Reprinted from Analyst (PMID:28154854), copyright (2017), with permission from The Royal Society of Chemistry.

oxide (D2O) concentrations (~90%). Reducing the deuterium labelling percentage (<50%) reduces the expansion of the isotopic distribution162. This reduced expansion helps maintain the intensity of the peaks, allowing lower intensity peptides to be detected162 (Figure 2.2). The reduction in mass shift with lower deuterium labelling could also help reduce spectral overlap of the labelled peptides (Figure 2.2). Another change that can be made is using a different method to calculate deuterium uptake. Deuterium uptake is generally measured as the change in the centroid mass (equation 2.1)163,

$$\text{Percent Deuteration} = \frac{m - m_0}{m_{100} - m_0} \times 100\% \quad \text{(equation 2.1)}$$

where $m$ is the experimental centroid mass, $m_0$ is the centroid mass of the peptide not deuterated, and $m_{100}$ is the centroid mass of the fully deuterated peptide163. If there is a partially overlapping peptide it will skew the centroid mass, making the peptide unusable for deuterium quantification. To allow for the use of peptides with minor overlap, our in-house software, Mass

Spec Studio (MSS)164, calculates fitted deuteration values. For these fitted values, deuterium incorporation is calculated using linear least squares methods, based on a manual selection of 2 or more peaks in the correct distribution165. With the ability to select non-overlapping peaks in

MSS for deuterium quantitation, overlapped peptides can be rescued. So, to improve peptide detection and sequence coverage, nano-spray HX analysis will be combined with low deuterium labelling, and deuterium uptake quantification with MSS.
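The centroid-based calculation in equation 2.1 can be illustrated with a short sketch. This is not the MSS implementation (which uses fitted values); the function names and the example masses below are hypothetical and chosen only to show the arithmetic.

```python
PROTON_MASS = 1.007276  # Da

def centroid_mass(mz_values, intensities, charge):
    """Intensity-weighted centroid of an isotopic envelope, as a neutral mass."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    centroid_mz = sum(mz * i for mz, i in zip(mz_values, intensities)) / total
    return (centroid_mz - PROTON_MASS) * charge

def percent_deuteration(m, m0, m100):
    """Equation 2.1: centroid shift relative to the fully deuterated control."""
    return (m - m0) / (m100 - m0) * 100.0

# Hypothetical doubly charged envelope (illustration only).
mass = centroid_mass([500.0, 500.5, 501.0], [1.0, 2.0, 1.0], charge=2)

# Hypothetical centroid masses for undeuterated, labelled, and fully
# deuterated forms of one peptide.
pct = percent_deuteration(m=1003.0, m0=1000.0, m100=1008.0)
print(f"centroid mass {mass:.4f} Da, percent deuteration {pct:.1f}%")
```

Because the centroid is an intensity-weighted mean over every peak in the envelope, any signal contributed by an overlapping peptide shifts it directly, which is why overlapped peptides cannot be quantified this way.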

Use of the nano-spray HX system with low deuterium labelling and MSS analysis should address the first two challenges of HX for large proteins like DNA-PKcs: spectral overlap and sample consumption. However, since DNA-PKcs in the context of NHEJ is to be studied, DNA-

PKcs complexes were assembled on DNA bound to beads through affinity, simulating one side


Figure 2.2. Expansion of the Isotopic Distribution with Deuterium Uptake. Simulated deuterium uptake of a protein with 100 exchangeable sites. Each of the curves corresponds to the isotopic distribution of the protein at the time (sec) listed above the curve. With the incorporation of more deuterium, up to ~50%, isotopic distribution becomes broader resulting in a lower maximum intensity. Incorporating more deuterium, above 50%, the isotopic distribution begins to narrow. Adapted from the Journal of Mass Spectrometry (PMID:18523973), copyright (2008), with permission from John Wiley and Sons.

of a double strand break. Assembling the complexes on DNA should ensure that the protein being analyzed is saturated with DNA. The unbound protein can be washed away, removing a key source of heterogeneity. However, assembling DNA-PKcs complexes on DNA-bound beads means HX labeling in a slurry, presenting the possibility of labelling error arising from differences in residual volume left on the beads. In this chapter, I describe an optimized nanoHX-MS analysis of DNA-PKcs (DNA-free and DNA-bound) and test different methods for correcting the volume uncertainty arising from the required slurry format.

2.2. Methods

2.2.1. Nano-spray HX.

Nano-spray vs Traditional Micro.

To compare sequence coverage and sample consumption requirements of the traditional flow set up to the nano-spray HX system, a map for DNA-PKcs was generated with each configuration. For the traditional flow set up, DNA-PKcs was digested with rNep-II for 3 min at

10°C. After digestion 5 pmol was loaded on to the sample loop of a liquid chromatography (LC) system. All LC proceeded at 4°C using an Eksigent nano-Ultra 2D HPLC (Sciex). Peptides were concentrated on a 250 µm x 2 cm self-packed (Jupiter Beads, C18 3.6 µm, Phenomenex) trap column and washed for 3 min at 10 µL/min with solvent A (3% acetonitrile (ACN) with 0.23% formic acid (FA)). Separation proceeded on a self-packed 10 cm x 100 µm column (Jupiter Beads, C18 3.6 µm), with a 10 minute 10-40% solvent B (97% ACN, 0.23% FA) gradient at 4 µL/min. Data were collected on a Sciex Triple TOF 5600 fitted with the DuoSpray ion source.

MS/MS data were collected fragmenting the top 20 with a collisional energy of 40. Five more replicates were run in a recursive “gas-phase fragmentation” method, targeting 300-500 m/z,


500-600 m/z, 600-700 m/z, 700-900 m/z and 900-1250 m/z. To validate the map three replicates were run, collecting only MS data.

For the nano-spray map, DNA-PKcs was digested with rNep-II for 3 min at 10°C. 0.5 pmol of digest was loaded onto a 1 µL loop. As before, chromatography, including the nano-spray cooler, was held at 4°C. The sample was concentrated on a Pepmap100 C18 nanoviper trap column (75 µm x 2 cm, 3 µm Beads, Thermo Scientific), and washed 2 min at 5 µL/min.

Peptides were separated on a 7 cm x 75 µm self packed (Jupiter Beads, C18 3.6 µm,

Phenomenex) picofrit column (New Objective), with a 10 minute 10-40% solvent B gradient at

500 nL/min. Data were collected on a Sciex Triple TOF 5600 equipped with a Sciex Nanospray III source. The data for the map and validation were collected as for the traditional flow, but only the top 10 peaks were selected for fragmentation, to allow for at least five cycles over the narrowed peak widths (~6s).

Data for both the traditional and nano-spray map were searched using Mascot (Version

2.4.0). Data were searched on a self-curated database which included DNA-PKcs and possible contaminant proteins as well as rNep-II. Search parameters included: non-specific digestion, with a peptide mass tolerance of +/- 10 ppm, a fragment mass tolerance of 0.02 Da, searching for b and y fragments. Peptides with a false discovery rate (FDR) of 5% were kept and peptides from all replicates were pooled, to generate a peptide list to be validated on the MS only replicates. The peptide list from Mascot was validated using MSS, removing low intensity peptides and peptides with extensive overlap, to generate the final peptide map.

Kinetics Analysis of DNA-PKcs using nanoHX

DNA-PKcs was labelled in 40% deuterium oxide (D2O) solution (10 mM Tris pH 7.5, 75 mM KCl, 5 mM MgCl2, with 1 mM DTT added fresh) at 25°C, at the specified time point (0.5,


1, 5, 15, or 60 min), the sample was quenched, adding 100 mM glycine HCl pH 2.5, and frozen in liquid nitrogen. Before analysis samples were thawed and digested with rNep-II for 3 min at 10°C. Digested peptides were analyzed on the nano-spray system as described above for the protein map. Data were collected on a Sciex Triple TOF 5600, collecting only MS1 data. Each time point was run in triplicate, and a single 24 hr labelling sample was also collected. Deuterium uptake was quantified with the HX-MS application in MSS164 (V1), using the nano-spray peptide map for the peptide identifications. Average deuterium uptake values were generated with the statistics package of MSS164.

Calculating Peptide Protection Factors

For each peptide, using Matlab (R2016a), average deuterium values were plotted versus time and fit with single and double exponential equations:

$$D = M - A e^{-kt} \quad \text{(equation 2.1)}$$

$$D = M - B e^{-k_1 t} - C e^{-k_2 t} \quad \text{(equation 2.2)}$$

D is the measured average deuterium uptake, M is the maximum deuterium uptake (set as the deuterium uptake at 24 hrs), and t is the time. The rates of exchange ($k$, $k_1$, $k_2$) and the coefficients (A, B, C) were determined from the fits. Peptides for which the single exponential fit gave a good r2 value (>0.97), or for which the r2 of the single fit was better than that of the double fit, were assigned rates of exchange as determined by the single fit. The remaining peptides were assigned rates of exchange determined by the double fit. To determine the intrinsic rate of exchange for each peptide, the theoretical deuterium uptake of each peptide at different time points was calculated with previously measured random coil rates166. These theoretical deuterium uptakes were plotted and the same single exponential (equation 2.1) was fit to each. The protection factor (PF) was

calculated by dividing the intrinsic rate of exchange ($k_{ch}$) by the observed rate of exchange ($k_{HX}$)124:

$$PF = \frac{k_{ch}}{k_{HX}} \quad \text{(equation 2.3)}$$

For the peptides best fit with the double exponential (equation 2.2) two PFs were assigned, one for each observed rate of exchange. Given the wide range of PF values that are observed, the final value is reported as the Log(PF). A subset of the data (363 peptides) were not able to be fit with either a single or a double exponential and were manually assigned to one of three categories: not protected, fully protected, and no protection in time scale of the experiment. For peptides which showed no protection (193 peptides), deuterium uptake at 0.5 min ≈ 60 min ≈ 24 hrs. Peptides that showed full protection (135 peptides), deuterium uptake at 60 min and 24hrs was approximately zero. The last category, no protection in the time scale of the experiment (35 peptides), the deuterium uptake at 0.5 min ≈ deuterium uptake at 60 min but is less than deuterium uptake at 24 hrs. The peptides which showed no protection were used to determine the lowest rate of exchange that could be accurately measured. All the deuterium values for the no protection peptides were plotted on the same graph and a linear fit determined (Figure 2.3). The higher limit of the 95% confidence interval for the slope was used to set the minimum slope that could be accurately measured, 0.00054 Deuterium/min (Figure 2.3). The minimum slope that could be measured was used to define the peptides that are fully protected within the accuracy of our measurements. With the extreme Log(PF) values removed, because they were now defined as fully protected, the highest Log(PF) measured was 4.6. So, all peptides classified as fully protected were assigned a Log(PF) value of 5. For the peptides that were classified as not


Figure 2.3. Determining the Lowest Rate of Exchange that can be Accurately Measured. All peptides that were manually assigned as fully protected were plotted as deuterium uptake versus time. A linear fit was determined with the Matlab (R2016a) Curve Fitting Tool giving a slope of 0.00041 with 95% confidence bounds of 0.000028 and 0.00054. The red line in the middle of the box indicates the median of the data. The whiskers extend out to the minimum and maximum values. Outliers are plotted as red plus symbols.

protected a Log(PF) of 0 was assigned. Peptides showing no protection in the timescale of the experiment were assigned two Log(PF) values, 0 and 5, as part of the peptide shows no protection (deuterium 0.5 min = 60 min) but part of the peptide very slowly exchanges

(deuterium 60 min < 24 hrs).
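The single exponential fit and the protection factor calculation (equations 2.1 and 2.3) can be sketched in a few lines. Note the assumptions: the fits in this work were done in Matlab, whereas this sketch swaps in a simple log-linear least-squares fit with M held fixed; the synthetic data, the intrinsic rate of 500 per min, and the use of base-10 logarithms for Log(PF) are illustrative choices, not values from the thesis.

```python
import math

def fit_single_exponential(times, uptake, max_uptake):
    """Fit D = M - A*exp(-k*t) with M fixed at max_uptake.

    Rearranging gives ln(M - D) = ln(A) - k*t, so A and k follow from an
    ordinary least-squares line through (t, ln(M - D)). Points with D >= M
    are skipped because the logarithm is undefined there.
    """
    pts = [(t, math.log(max_uptake - d))
           for t, d in zip(times, uptake) if d < max_uptake]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points below the maximum uptake")
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope  # (A, k)

def log_protection_factor(k_ch, k_hx):
    """Equation 2.3 on a log scale (base 10 assumed here)."""
    return math.log10(k_ch / k_hx)

# Synthetic uptake curve at the labelling times used in this chapter (min).
times_min = [0.5, 1, 5, 15, 60]
M, A_true, k_true = 10.0, 8.0, 0.05
uptake = [M - A_true * math.exp(-k_true * t) for t in times_min]

A_fit, k_fit = fit_single_exponential(times_min, uptake, M)
log_pf = log_protection_factor(k_ch=500.0, k_hx=k_fit)  # k_ch is hypothetical
print(f"A = {A_fit:.3f}, k = {k_fit:.4f} per min, Log(PF) = {log_pf:.2f}")
```

The log-linear form weights late, low-signal time points more heavily than a direct nonlinear fit does, so with noisy data the two approaches can give different rates.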

2.2.2 Nano-spray HX for Bead Bound DNA-PKcs

Capture of DNA-PKcs on DNA Bound to Beads

The DNA-PKcs pull-down procedure was adapted from Hammel et al 201062.

Streptavidin agarose beads (Solulink, agarose ultra performance) were removed from the bead slurry and washed three times with HE buffer (10 mM HEPES pH 7.0, 1 mM EDTA). DNA (100 pmol) was bound to the beads at a 1:8 ratio with biotin for 10 min at room temperature, to disperse the DNA over the beads and limit interactions between DNA strands. Two different types of DNA were used to pull down DNA-PKcs: blunt and overhang (OH) (Integrated DNA

Technologies) (Figure 2.4). Additional biotin was added for 5 min to ensure that all streptavidin sites were saturated.

To prepare for protein binding, beads were washed 5 times with DNA binding buffer (50 mM Tris pH 7.5, 75 mM KCl, 5 mM MgCl2, 5% glycerol, and added fresh 1 mM DTT and 0.5 mM PMSF). After washing, to block non-specific interactions with the beads, BSA in DNA binding buffer was added for 5 min. Purified DNA-PKcs (13 pmol) was added to the DNA bound beads and equilibrated 15 min on ice. To remove unbound DNA-PKcs the beads were washed 5 times with wash buffer (10 mM Tris pH 7.5, 75 mM KCl, 5 mM MgCl2, 3% glycerol

(v/v), 0.2 µM caffeine, and 1 mM DTT added fresh). 10 times the volume of beads was used for each wash. After the final wash, 5X sample loading buffer (250 mM Tris pH 6.8, 10% SDS, 30% glycerol(v/v), 0.05% Bromophenol Blue (w/v)) and wash buffer was added to the beads. The


Figure 2.4. DNA Constructs Used for the Pull-Down of DNA-PKcs.

beads were heated for 5 min at 97°C before running on an 8% low-bis gel, with a control lane containing a known amount of DNA-PKcs. For Western analysis the gel was transferred to a nitrocellulose membrane and probed with an antibody to DNA-PKcs (Monoclonal Ab 42-47, Lees-Miller Lab).

Quantitation was done with Image Studio Lite (version 5.2).

Differential HX Analysis of DNA-PKcs (DNA Bound and Free)

To enable accurate HX analysis of DNA-PKcs on beads, the volume of buffer left on the beads requires determination. To correct for labelling errors, an established method using light and heavy (isotopically labelled) caffeine was evaluated167. For this correction, light caffeine is included in the buffers/protein solution prior to labelling, and the heavy caffeine is included in the labeling buffer, then the two are mixed as they normally would be for the HX-MS experiment (Figure 2.5)167. The intensity for each of the light and heavy caffeine can be measured, and the ratio between the light and heavy can be used to correct for the actual deuterium labelling percentage (Figure 2.5)167.

The beads were prepared by the pull-down procedure above, without the addition of

DNA or protein. After the final washing step, the supernatant was carefully removed and wash buffer containing heavy caffeine (13C labelled), instead of light caffeine, was added. 100 mM glycine was added, and then the beads were spun down, and the supernatant loaded for LC-MS analysis.

To measure the caffeine in the sample a targeted LC-MS/MS was completed, fragmenting caffeine (195.086 m/z) and heavy caffeine (198.097 m/z)167. The relative amount of light: heavy caffeine was quantified using Peak View (Version 1.2, Sciex) by measuring the area under the

XIC (extracted ion chromatogram) for fragment ion 138 m/z and 140 m/z (light and heavy respectively). Various amounts of the wash buffer with the heavy caffeine were added until the area under the XIC for each of the fragment ions was approximately equal.


Figure 2.5. Correction of Deuterium Labelling with Light/Heavy Caffeine. a) Diagram of correction with light/heavy caffeine. The protein solution, with light caffeine (L), is mixed with the D2O labeling solution, containing the heavy caffeine (H), to label the protein as normal for an HX-MS experiment. The sample would then be quenched, digested, and analyzed by LC-MS as normal. Caffeine can be measured in the same LC-MS experiment as the HX or separately. Depending on the observed ratio of heavy to light, you can determine if the protein was labelled with more (left), or less (right) deuterium than anticipated, and apply that correction to the data (middle). b) Structures of light (left) and heavy carbon 13 labelled (right) caffeine. Reprinted from Analytical Chemistry (PMID:25427063), copyright (2014), with permission from American Chemical Society.


The caffeine is included to help account for any errors in D2O labelling percentage, but there is another source of error that can occur, namely differences in back exchange. To correct for errors in back-exchange, three peptides (the BX peptides bradykinin, angiotensin I and leucine enkephalin) were included that have been previously used for such purposes168.

A map for DNA-PKcs, including the BX peptides, was generated as before (section

2.2.1) with minor changes in the LC set-up to allow for larger volumes of sample to be loaded.

That is, the sample loop was increased to 5 µL, and the trap was changed to a self-packed 200

µm x 2.5 cm trap (Jupiter Beads, C18 3.6 µm, Phenomenex). The sample was loaded and washed on the trap for 2 minutes at 10 µL/min. For separation the column was increased to a 7 cm x 150

µm self packed (Jupiter Beads, C18 3.6 µm, Phenomenex) picofrit column (New Objective), the gradient was kept the same, but the flow rate was increased to 1000 nL/min. These changes only had a small effect on sample consumption, requiring an increase in the DNA-PKcs loaded to 1 pmol. MS data collection and peptide identification routines remained the same.

For HX analysis, all DNA-PKcs complexes were pulled down immediately before analysis. To the pulled down DNA bound DNA-PKcs samples, HX buffer was added (10 mM

Tris pH 7.5, 75 mM KCl, 5 mM MgCl2, 0.2 µM Caffeine, 0.25 µM BX peptides, 3% glycerol

(v/v), and 1 mM DTT added fresh). For the DNA free samples, beads went through the pull-down procedure, with no DNA or DNA-PKcs added, then free DNA-PKcs in HX buffer was added to the washed beads. Labelling buffer (90% D2O with 10 mM Tris pH 7.5, 75 mM KCl, 5 mM MgCl2, 0.2 µM Heavy Caffeine, 0.125 µM BX peptides, 3% glycerol (v/v), and 1 mM DTT added fresh) was added to each, labelling at ~45% D2O for 5 min at 25°C. Samples were then quenched and digested on bead, adding rNep-II in 100 mM glycine-HCl pH 2.5. Digestion proceeded for 3 min at 10°C. After digestion the beads were spun down and half of the supernatant was loaded onto the sample loop to be analyzed by LC-MS; the other half of the supernatant was frozen in liquid nitrogen to measure the caffeine standard in a separate LC-MS run. The LC-MS proceeded as for the map, collecting MS1 data only. MS data were analyzed as before for the kinetic analysis (section 2.2.1). Data were collected in quadruplicate for each

DNA-PKcs sample (DNA-PKcs bound to blunt DNA, DNA-PKcs bound to overhang DNA, and

DNA-PKcs free).

To measure the caffeine standards a targeted MS/MS was done as above when determining the buffer left on bead. Control samples for the caffeine correction were run, where matched volumes of HX buffer and labelling buffer were mixed to determine the ratio of light: heavy caffeine that corresponded to 45% D2O labelling.

Deuterium Corrections

Peak View (Version 1.2, Sciex) was used to measure the area under the XIC for a fragment ion of each of the light and heavy caffeine, 138 m/z and 140 m/z respectively. The corrected percent deuterium labelling was calculated with:

$$\%L_{ex} = \%L_{C}\left(\frac{ratio_{C}}{ratio_{ex}}\right) \qquad \text{(equation 2.4)}$$

where $\%L_{ex}$ is the calculated experimental deuterium labelling percentage, $ratio_{C}$ is the average ratio of XIC area light:heavy caffeine for the control samples, $ratio_{ex}$ is the ratio of XIC area light:heavy caffeine for the experimental sample, and $\%L_{C}$ is the percent deuterium label of the control sample, which was made to match the expected deuterium labelling, in this case 45%. The percent corrected deuterium (PCD) was then calculated using the calculated deuterium labelling (equation 2.5).

$$PCD = \frac{\%D}{\left(\dfrac{\%L_{ex}}{100}\right)} \qquad \text{(equation 2.5)}$$
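As an illustration of equations 2.4 and 2.5, the short Python sketch below applies the two steps to made-up numbers. The function names and the example ratios are hypothetical and are not taken from the thesis data; %D is assumed here to be the measured percent deuteration of a peptide.

```python
def corrected_labelling(ratio_control, ratio_exp, pct_label_control=45.0):
    """Equation 2.4: %L_ex = %L_C * (ratio_C / ratio_ex)."""
    if ratio_control <= 0 or ratio_exp <= 0:
        raise ValueError("XIC area ratios must be positive")
    return pct_label_control * (ratio_control / ratio_exp)


def percent_corrected_deuterium(pct_d, pct_label_exp):
    """Equation 2.5: PCD = %D / (%L_ex / 100)."""
    return pct_d / (pct_label_exp / 100.0)


# Hypothetical values for illustration only.
ratio_c = 1.20   # mean light:heavy XIC area ratio of the 45% D2O controls
ratio_ex = 1.25  # light:heavy XIC area ratio of one experimental sample

pct_l_ex = corrected_labelling(ratio_c, ratio_ex)   # about 43.2% D2O
pcd = percent_corrected_deuterium(10.8, pct_l_ex)   # about 25.0
print(f"%L_ex = {pct_l_ex:.1f}, PCD = {pcd:.1f}")
```

A sample that received slightly less labelling buffer than intended shows a higher light:heavy ratio than the control, so its calculated labelling percentage drops below 45% and its PCD is scaled up accordingly.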


After correction, the data were once again analyzed using the Statistical Analysis app in MSS to identify peptides with significant changes in deuterium uptake. Peptides with a significant change in deuterium were mapped to the 5Y3R DNA-PK structure40 using Chimera169.

2.3 Results and Discussion

2.3.1 Nano-spray HX reduces protein consumption and improves the peptide map

With both nano-spray and the traditional flow, high sequence coverage, represented by many peptides, is achieved for the DNA-PKcs maps (Table 2.1). Redundancy, which provides the average number of measurements for each amino acid, shows that for each map the amino acids have multiple observations. While the traditional flow map has an acceptable sequence coverage and a large number of peptides, the nano-spray map outperforms it in each of the criteria used to evaluate peptide maps. The nano-spray map covers 6% more of the protein sequence. This increase in sequence coverage seems relatively modest, but the nano-spray map has almost 500 more peptides, with an increase of 1.5 in the redundancy of the measurements. The increase in peptides comes from the chromatographic improvement alone, allowing for moderate overlap in peptides. This increase in peptides becomes important in the transition to HX experiments. Generally, with the labelling of proteins there is attrition in the peptides. This loss could occur through increasing overlap as the isotopic distribution is expanded, signal splitting with the expanding isotopic distribution162, as well as the peptide not being found in all replicates. Loss of a useable peptide cannot be predicted, so I aim for the highest number of peptides and sequence coverage when generating the peptide map, knowing some coverage will be lost. Not only does the nano-spray map outperform the traditional flow map, it does so while consuming 10X less protein (Table 2.1). Thus, the improvement in both map quality and consumption makes the nano-spray HX system ideal for analysis of DNA-PKcs.


Table 2.1. Evaluation of Traditional and Nano-spray DNA-PKcs Maps.

| Flow Configuration | Injection (pmol) | Validated Peptides | Coverage (%) | Redundancy a |
|---|---|---|---|---|
| Traditional Micro-flow | 5 | 1312 | 89.8 | 3.7 |
| Nano-spray HX system | 0.5 | 1769 | 95.1 | 5.2 |

a Redundancy is the average number of measurements per amino acid, calculated as the sum of all amino acids that are represented by the validated peptides divided by the total number of amino acids in the system.
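The coverage and redundancy metrics defined for Table 2.1 can be sketched as follows. This is a minimal illustration on a toy peptide list; the function name, the (start, end) input format and the toy protein are assumptions and do not reflect how MSS computes these values internally.

```python
def map_statistics(peptides, protein_length):
    """Return (coverage %, redundancy) for a peptide map.

    peptides: iterable of (start, end) residue numbers, 1-based and inclusive.
    """
    covered = set()
    residue_observations = 0
    for start, end in peptides:
        if not 1 <= start <= end <= protein_length:
            raise ValueError(f"peptide {start}-{end} outside 1-{protein_length}")
        covered.update(range(start, end + 1))
        residue_observations += end - start + 1
    coverage_pct = 100.0 * len(covered) / protein_length
    # Sum of residues across all peptides divided by the protein length.
    redundancy = residue_observations / protein_length
    return coverage_pct, redundancy


# Toy 20-residue protein with four hypothetical peptides.
toy_peptides = [(1, 8), (5, 12), (10, 16), (10, 14)]
coverage, redundancy = map_statistics(toy_peptides, protein_length=20)
print(coverage, redundancy)
```

In the toy case 16 of 20 residues are covered (80%), and 28 residue observations spread over 20 residues give a redundancy of 1.4.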


Table 2.2. Summary of the Kinetics Evaluation of DNA-PKcs.

| Data Set | DNA-PKcs Kinetics |
|---|---|
| Reaction Details | DNA-PKcs labelled with 40% D2O pH 7.5 |
| Labelling Times (min) | 0.5, 1, 5, 15, 60 |
| Replicates | 3 Biological |
| Number of Peptides | 1172 |
| Sequence Coverage | 90.4 |
| Average Peptide Length | 11.7 |
| Redundancy | 3.3 |
| Repeatability (average standard deviation, D) | 0.04 |

Table as per community guidelines122


2.3.2 High DNA-PKcs sequence coverage persists through a HX-kinetics evaluation

Completing a five-time-point kinetics evaluation of DNA-PKcs, the protection factors for 1172 peptides are calculated, representing 90% of the DNA-PKcs sequence (Table 2.2). These results again stress the importance of having a map with many peptides, as almost 600 peptides are lost throughout the analysis. Given the high sequence coverage and number of peptides in our starting map, this loss has a modest effect on sequence coverage (-5%), while still maintaining a redundancy of 3.3 (Table 2.2). Of the 1172 peptides, ~40% were rescued with the manual peak selection and fitted deuteration of MSS (Figure 2.6). The nano-spray HX system shows high reproducibility, with an average standard deviation of 0.04 deuterium. From the kinetics experiment, the optimum labelling time, to be used for subsequent DNA-PKcs HX experiments, was determined to be 5 min based on the deuterium uptake curves (both single and double exponential) (Figure 2.7).
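For orientation, single and double exponential uptake curves are commonly written in the following generic forms. This is a sketch only; the symbols $D_{max}$, $A$, $B$, $k$, $k_1$ and $k_2$ are assumed here for illustration, and the exact parameterization used for the fits is the one described in the kinetic analysis methods (section 2.2.1).

$$D(t) = D_{max}\left(1 - e^{-kt}\right)$$

$$D(t) = A\left(1 - e^{-k_1 t}\right) + B\left(1 - e^{-k_2 t}\right)$$

Here $D(t)$ is the deuterium uptake at labelling time $t$, and each rate constant describes one population of exchanging amides in the peptide. A labelling time is informative when it falls on the rising part of such curves for most peptides, so that both increases and decreases in uptake remain detectable.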

Comparing the observed PF to the DNA-PKcs structure, many regions that are not present in the structure (498-524, 810-845, 2577-2773, and 3200-3226) correspond with peptides that show no protection (Figure 2.8). Regions missing from a structure are generally highly dynamic, which is consistent with the low protection factors, as they would exchange similarly to a disordered peptide. Unlike a globular protein, DNA-PKcs, especially in the HEAT regions (amino acids 1-2802), does not have a well-protected core region. The common pattern of alternating low and high protection factors for DNA-PKcs peptides is consistent with the solenoid structures present in DNA-PKcs39,40 (Figure 2.8). Beyond these general interpretations it is challenging to relate protection factors to structure, but the few correlations we do see provide confidence that the protection factors measured with the nano-spray HX system reflect the known structure. Nano-spray HX can be used to obtain high sequence coverage for DNA-PKcs throughout an entire kinetics experiment. With success on DNA-PKcs in a solution-based experiment, the nano-spray HX system was used for more complex HX experiments.

Figure 2.6. Representative Spectrum of an Overlapped Peptide That is Rescued by Peak Picking in Mass Spec Studio. Grey boxes behind the peaks are the expected isotopic distribution of the non-deuterated peptide. Peak m/z boxes outlined in orange show the non-overlapping m/z peaks that are selected; the remaining unselected peaks are not considered in the fitted deuteration. The orange curve shows the calculated fitted deuteration curve based on the selected peaks.

Figure 2.7. Representative Kinetics Curves for the Selection of D2O Labelling Time. Representative uptake curves from the kinetics evaluation of DNA-PKcs with nano-spray HX. The top two show curves fit with a single exponential and the bottom two a double exponential. The red line highlights deuterium uptake at 5 min.

Figure 2.8. DNA-PKcs Protection Factors. Protection factors for DNA-PKcs from the kinetics analysis of DNA-PKcs with the nano-spray HX system. Each bar represents a peptide. Peptides with two protection factors are shown on the plot twice, plotted at each observed protection factor. The red lines at the bottom and top of the plot show peptides that were manually assigned to either full protection (log(PF) = 5) or no protection (log(PF) = 0). Light blue bars highlight regions of structure greater than 25 amino acids that are missing from the DNA-PKcs crystal structure39

2.3.3 On bead HX of DNA-PKcs identifies changes in deuterium uptake upon binding to DNA

HX-MS provides the solution average of deuterium uptake for proteins. To ensure that changes upon binding can be detected, we want the majority of the proteins in the bound form. DNA-PKcs was pulled down on two different types of DNA: blunt and OH (Figure 2.9). The pull-down on beads was dependent on DNA, as minimal non-specific interactions were observed when DNA was not included (Figure 2.9). Approximately 6-10 pmol of DNA-PKcs can be pulled down, starting from 13 pmol of DNA-PKcs, which was sufficient for subsequent nano-spray HX analysis (Figure 2.9). Of note, the increased DNA-PKcs pulled down on OH DNA versus blunt DNA was not reproducible (Figure 2.9); more often the amount of DNA-PKcs pulled down on each type of DNA was similar. From the caffeine measurements, the residual volume of buffer on the beads was determined to be approximately equal to the bead volume, in this case 2.5 µL.

Completing the HX analysis of DNA-PKcs on beads, changes in deuterium uptake upon binding to DNA were detected without any correction of the data (Figure 2.10A). For the back-exchange correction, 2 of the 3 BX peptides were not useable due to overlap and ion suppression. A single value would not provide a high-confidence correction, so no back-exchange correction was done. However, the ability to detect changes without the back-exchange correction suggested that back-exchange was well controlled experimentally. Analysis of the light and heavy caffeine showed a small labelling error between samples, with a standard deviation of 2.6% (Table 2.3). Correcting the PCD with caffeine labelling actually increased the spread in the data (Table 2.4) and shifted the values more positively (Figure 2.10B). The spread introduced by the caffeine correction could also arise from its being a single-point correction, so a multipoint correction was developed and applied to the data to see if it could improve the correction.

Figure 2.9. DNA-PKcs Pull-Down on DNA Bound to Streptavidin Agarose Beads. Above, the western blot from the pull-down of DNA-PKcs on DNA, adding 13 pmol of purified protein to beads with ~100 pmol of DNA bound. Below, quantitation of the western blot, comparing the signal for the pull-downs to the known lane with 5 pmol loaded.

Figure 2.10. Comparison of Correction Methods. Statistical analysis of DNA-PKcs binding to blunt DNA with A) no correction of the deuteration values, B) caffeine correction applied to the deuteration values, and C) unchanged peptide multipoint correction applied to the deuteration values. Volcano (left) and Woods (right) plots show peptides with a significant change in deuterium uptake (red and blue dots and bars), determined as previously described155.
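The spread in labelling can be checked directly from the twelve caffeine-determined values listed in Table 2.3. The sketch below pools all replicates across the three states, which is an assumption; the thesis does not state which estimator or grouping was used for the quoted value.

```python
import statistics

# Corrected labelling (%) per replicate, transcribed from Table 2.3.
labelling_pct = {
    "Free DNA-PKcs": [45.8, 42.4, 44.7, 51.3],
    "DNA-PKcs on Blunt DNA": [42.1, 45.1, 42.4, 45.7],
    "DNA-PKcs on OH DNA": [46.5, 43.3, 42.8, 42.2],
}

all_values = [v for reps in labelling_pct.values() for v in reps]
mean_label = statistics.mean(all_values)
sd_sample = statistics.stdev(all_values)       # n - 1 estimator
sd_population = statistics.pstdev(all_values)  # n estimator
print(f"mean = {mean_label:.2f}%")
print(f"sd (sample) = {sd_sample:.2f}%, sd (population) = {sd_population:.2f}%")
```

Pooled this way, the two estimators bracket the 2.6% reported in the text, and the mean sits close to the intended 45% labelling.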

For the multipoint correction, DNA-PKcs peptides from the sample itself were used. Ten peptides were selected for the correction from the pool of peptides that showed no statistically significant deuteration change between states, a non-zero deuterium uptake, and clean non-overlapping spectra. The deuterium uptake for these peptides was analyzed and the replicate which consistently showed the lowest deuterium values was selected as the base replicate. For each peptide, a correction factor was determined by dividing the deuterium uptake of that peptide by the deuterium uptake of the same peptide in the base replicate. The average of all ten correction factors was then used as the final correction factor and applied to all peptides in the corresponding replicate, using:

$$D_{corrected} = \frac{D_{exp}}{C_{pep}} \qquad \text{(equation 2.6)}$$

where $D_{corrected}$ is the calculated corrected deuterium uptake, $D_{exp}$ is the measured deuterium uptake, and $C_{pep}$ is the correction factor determined with the ten correction peptides.
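The multipoint correction and equation 2.6 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names and data are hypothetical, only three reference peptides are used instead of ten, and the base replicate is picked by lowest mean uptake as a simple stand-in for "consistently lowest".

```python
from statistics import mean


def multipoint_factors(reference_uptake):
    """Return (base replicate, {replicate: C_pep}).

    reference_uptake maps a replicate name to the deuterium uptake values of
    the same reference peptides, listed in the same order for every replicate.
    """
    # Assumption: the base replicate is the one with the lowest mean uptake.
    base = min(reference_uptake, key=lambda rep: mean(reference_uptake[rep]))
    base_values = reference_uptake[base]
    factors = {}
    for rep, values in reference_uptake.items():
        per_peptide = [d / d_base for d, d_base in zip(values, base_values)]
        factors[rep] = mean(per_peptide)  # average factor = C_pep
    return base, factors


def apply_correction(uptake, c_pep):
    """Equation 2.6: D_corrected = D_exp / C_pep."""
    return [d / c_pep for d in uptake]


# Hypothetical uptake values for three unchanged reference peptides.
reference = {
    "rep1": [2.0, 4.0, 1.0],
    "rep2": [2.2, 4.4, 1.1],
    "rep3": [2.1, 4.2, 1.05],
}
base, factors = multipoint_factors(reference)
corrected_rep2 = apply_correction([2.2, 4.4, 1.1, 3.3], factors["rep2"])
print(base, factors, corrected_rep2)
```

In this toy case rep2 reads about 10% high on every reference peptide, so all of its peptides, including ones outside the reference set, are scaled down by the same factor.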

Correction using the unchanged peptides reduces the spread in the data (Table 2.4) and does not introduce a new shift but brings the distribution of data closer to center (Figure 2.10C). Correcting the data with the unchanged peptides does not significantly alter the observed changes (Figure 2.10), but reduces the standard deviation of the measurements (Table 2.4). The consistency of the corrected data with the uncorrected data provides confidence that the data are not being over-corrected. The improvement in the standard deviation of the measurements and the centering of the data distribution demonstrate that volume corrections are indeed important for slurry-based HX experiments. All data were then treated with the multipoint peptide correction.

Table 2.3. Caffeine Determined D2O Labelling

| Protein State | Replicate | Corrected Labelling (%) |
|---|---|---|
| Free DNA-PKcs | 1 | 45.8 |
| Free DNA-PKcs | 2 | 42.4 |
| Free DNA-PKcs | 3 | 44.7 |
| Free DNA-PKcs | 4 | 51.3 |
| DNA-PKcs on Blunt DNA | 1 | 42.1 |
| DNA-PKcs on Blunt DNA | 2 | 45.1 |
| DNA-PKcs on Blunt DNA | 3 | 42.4 |
| DNA-PKcs on Blunt DNA | 4 | 45.7 |
| DNA-PKcs on OH DNA | 1 | 46.5 |
| DNA-PKcs on OH DNA | 2 | 43.3 |
| DNA-PKcs on OH DNA | 3 | 42.8 |
| DNA-PKcs on OH DNA | 4 | 42.2 |

Table 2.4. Average Standard Deviation of Deuteration Measurements with Correction

| Correction Method | Average Standard Deviation (PCD) |
|---|---|
| Not Corrected | 1.58 |
| Caffeine | 1.71 |
| Unchanged Peptides | 1.17 |

For the nano-spray HX analyses of DNA-PKcs on beads, high sequence coverage, monitoring over 1350 peptides, was achieved (Table 2.5). Using the statistics app in MSS, for a change in deuterium to be considered significant it must pass a two-tailed t-test at a p-value threshold of 0.05, and have a change greater than ±2 standard deviations of the unchanging peptides (p>0.05)155 (Figure 2.11A, left). From the resulting volcano plot we identified the peptides that have a significant change in deuterium (points colored red and blue), but this does not provide the structural context for the differences within the protein sequence (Figure 2.11A, left). To explore the significant changes in the context of the protein sequence we use Woods plots. For the Woods plot, the change in deuterium is plotted versus the protein sequence, with each bar representing a peptide, where the length of the bar is consistent with the length of the peptide, and the coloring of peptides corresponds to the volcano plot (blue and red significant differences, grey not significant) (Figure 2.11A, right).
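The two-part significance test described above can be sketched as follows. This is not the MSS implementation: it assumes a pooled-variance Student's t-test with four replicates per state (6 degrees of freedom, two-tailed critical value 2.447 at p = 0.05), and it takes the standard deviation of the differences of the peptides that fail the t-test as the spread of the unchanging peptides. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

T_CRIT_DF6 = 2.447  # two-tailed critical t value, alpha = 0.05, df = 4 + 4 - 2


def pooled_t(free, bound):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(free), len(bound)
    sp2 = ((na - 1) * variance(free) + (nb - 1) * variance(bound)) / (na + nb - 2)
    return (mean(bound) - mean(free)) / sqrt(sp2 * (1 / na + 1 / nb))


def classify(peptides, t_crit=T_CRIT_DF6, n_sd=2.0):
    """peptides maps a label to (free replicates, bound replicates) in PCD."""
    diffs = {p: mean(b) - mean(a) for p, (a, b) in peptides.items()}
    passes_t = {p: abs(pooled_t(a, b)) > t_crit for p, (a, b) in peptides.items()}
    # Spread of the peptides that fail the t-test sets the magnitude threshold.
    unchanged = [diffs[p] for p in peptides if not passes_t[p]]
    threshold = n_sd * stdev(unchanged)
    return {p: passes_t[p] and abs(diffs[p]) > threshold for p in peptides}


# Hypothetical deuteration values, four replicates per state.
peptides = {
    "pep_A": ([20.0, 21.0, 19.5, 20.5], [10.0, 11.0, 10.5, 9.5]),
    "pep_B": ([15.0, 16.0, 14.0, 15.5], [15.2, 15.8, 14.4, 15.0]),
    "pep_C": ([30.0, 31.0, 29.0, 30.5], [30.5, 29.5, 31.0, 29.8]),
    "pep_D": ([8.0, 9.0, 7.5, 8.5], [8.4, 8.6, 7.9, 8.9]),
}
results = classify(peptides)
print(results)
```

Requiring both conditions means a peptide with a tiny but very reproducible difference is not called significant, and neither is a peptide with a large but noisy difference.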

Looking at the significant changes observed in the context of the sequence, we start to see distinct categories of changes: large changes in deuterium showing many overlapping peptides, smaller changes in deuterium but still showing overlapping peptides, and a small subset of what may be very small or spurious changes (Figure 2.11A, right). Beyond understanding the changes that occur at the sequence level, we can also map the results to protein structures to begin to understand how these changes relate to structure. The third category of changes was not mapped to structure, as it could also contain a small number of indistinguishable spectral overlaps.


Table 2.5. Summary of the Differential HX of DNA-PKcs DNA Binding.

| Data Set | DNA-PKcs Binding Blunt DNA | DNA-PKcs Binding OH DNA |
|---|---|---|
| Reaction Details | Labelled with 45% D2O pH 7.5 on beads ± Blunt DNA | Labelled with 45% D2O pH 7.5 on beads ± OH DNA |
| Labelling Time (min) | 5 | 5 |
| Replicates for Each State | 4 Biological | 4 Biological |
| Number of Peptides | 1480 | 1354 |
| Sequence Coverage (%) | 94 | 92 |
| Average Peptide Length | 12.5 | 12.5 |
| Redundancy | 4.3 | 3.9 |
| Repeatability (average standard deviation, PCD) | 1.17 | 1.15 |
| Significant Differences in Deuterium Uptake (PCD) | 2.21 | 2.36 |

Table as per community guidelines122


Figure 2.11. Nano-spray HX-MS evaluation of DNA-PKcs binding to DNA. A) Changes in deuterium uptake of DNA-PKcs upon binding to blunt DNA. Volcano plot (left) and the corresponding Woods plot (right) highlight peptides with significant changes in deuteration. Significantly deprotected peptides are shown in red and significantly protected peptides are shown in blue. The boxes highlight two classes of change that are observed: purple, large change in deuteration; yellow, smaller change in deuteration. Arrows highlight representative peptides that show spurious changes in deuteration. Changes upon binding B) blunt DNA and C) OH DNA mapped to structure. Peptides with significant changes, colored as above, are mapped to the DNA-PK structure with Ku70/80 hidden40. DNA-PKcs is colored in three sections: light blue, N-terminal HEATs; teal, middle HEATs; and steel blue, the head containing the FAT, Kinase and FAT-C domains. Regions of structure not represented in the HX map are colored grey.


However, to err on the side of caution, I retained small changes in deuteration if arising from the only peptide representing that region.

With the changes mapped to the structure we can begin to explore what binding might be doing structurally. Changes in deuterium uptake are discussed as stabilizations and destabilizations. A stabilization occurs when, upon binding, there is protection from exchange, whether through direct binding or an induced conformational change, causing less deuterium uptake in that region (blue peptides). A destabilization occurs when, upon binding, the amide hydrogens become deprotected, usually through a conformational change, causing more deuterium uptake in that region (red peptides). Comparing the observed changes in DNA-PKcs upon binding to DNA, the DNA binding site in the structure aligns with a group of changes with a large change in deuterium and many overlapping peptides (Figure 2.11). Most of these changes correspond to a stabilization upon binding that is consistent with the presence of the DNA protecting these peptides from exchange. The same DNA binding site is stabilized when DNA-PKcs binds to OH DNA (Figure 2.11C). Here, we know the DNA binding site from the available structure, but if such information were not available, we would only know that these large changes could represent a binding site or a dramatic conformational change. The same ambiguity is true for the remaining changes. The interpretation of all the changes seen when DNA-PKcs binds DNA will be discussed in more detail in chapter 5, but as a proof of concept, the agreement of the deuterium changes seen when DNA-PKcs binds to each type of DNA, and the stabilization at the observed binding site, confirm that the HX analysis can be done on bead-bound samples (Figure 2.11).


2.3.4 Conclusions

Our nano-spray HX system and data analysis with MSS meet the challenges of HX on large protein systems like DNA-PKcs. Improvement in chromatography increases the number of peptides that can be used for HX analysis, while consuming 10X less protein than a traditional flow set-up. MSS allows a significant number of peptides (~40%) that still have minor overlaps to be used for deuterium measurements. Even with a minor modification to the LC set-up of the nano-spray HX system, allowing for more dilute samples to be loaded, it still consumes 5X less protein than a micro-flow set-up. Pull-down of DNA-PKcs on DNA-bound beads allows control of the stoichiometry of complexes for HX analysis, in this case ensuring all DNA-PKcs molecules were DNA bound. A multipoint correction using unchanged peptides provides an effective method of controlling volume inaccuracies arising from the labelling step, improving the detection of changes in deuterium uptake. HX analysis of DNA-PKcs on beads was able to identify biologically relevant changes in deuterium uptake upon binding DNA. With success in a single protein system, nano-spray HX of on-bead complexes can be used for the analysis of larger NHEJ protein complexes centered around DNA-PKcs.


Chapter 3: Formaldehyde DNA Footprinting to Identify DNA Binding Sites

3.1 Introduction

When conducting the HX analysis of DNA-PKcs bound to DNA, the structure of DNA-bound DNA-PKcs was not yet known, and we sought to support the HX data with a true footprinting method. Given the ambiguous nature of HX changes, reduced labelling does not necessarily mean a binding site, as protection can also be caused by a conformational change114,115. Not only is there ambiguity in interpretation without other information, but HX can also miss binding sites if coverage is incomplete. Quite simply, if there is binding in a region but no peptides are detected in that region, there is no information on binding. Further, if binding is weak, so that only a substoichiometric amount of the protein is bound, it becomes challenging to detect changes in deuterium uptake, and the binding site might not be seen114. Binding sites can also be missed if the interaction primarily occurs with side chains, causing no change to the stability of the amide hydrogen114. There are other concerns: while binding is generally thought to result in protection, if binding induces a deprotecting conformational change at the binding site, there may be a net zero effect in protection upon binding, and even in rare cases deprotection at or very near the binding site170, causing the binding site to be missed115.

So, though protection at the N-terminal of DNA-PKcs was detected (Chapter 2), it does not mean the entire DNA binding site was mapped, as it was not the only change that was seen throughout the molecule (Figure 2.9). To determine which sites of change correspond to the DNA binding sites and which were conformational changes, and to more confidently map the binding site in general, we sought to identify regions of DNA-PKcs that interact with DNA using a footprinting method.

Though there are well-developed methods to identify the sequence of DNA that interacts with a protein, such as chromatin immunoprecipitation sequencing171, or the protein footprint on the DNA28,44,54, there are not many methods commonly used to identify the amino acids of a protein that interact with DNA. One method that has been used, for both RNA and DNA, is crosslinking the protein and DNA with ultraviolet (UV) radiation172–175. With UV crosslinking, peptides with the addition of nucleotide(s) can be identified, providing direct evidence of the amino acid-DNA interaction172,175. The major limitations of this method are the low abundance of protein-DNA (or RNA) crosslinks induced by UV and the challenge of detecting such modifications by mass spectrometry. Given the low abundance of protein-DNA crosslinks, experiments require high starting amounts of sample with subsequent enrichment to identify these crosslinks172–175. Given our limitations in sample supply for DNA-PKcs, UV crosslinking was not feasible.

While less direct, reversible cross-linking and peptide fingerprinting (RCAP) techniques can identify peptides that interact with RNA or DNA using low starting amounts of protein176,177. In RCAP, proteins are crosslinked to RNA with formaldehyde and then digested176. Peptides that are crosslinked to the RNA are enriched, and after enrichment the crosslinks can be reversed and the peptides analyzed by MS using proteomics176. This technique has been applied to a virus protein to identify DNA binding regions, using just 10 µg of protein per experiment177. When RCAP was applied to both RNA176,178 and DNA177 binding proteins, MS identification of DNA binding peptides relied only on MS1 level identifications. That is, the peptides were identified based only on accurate mass. To increase the confidence of peptide identification, particularly as the number of possible peptides increases, standard LC-MS/MS methods can be employed (Figure 3.1). In the DNA binding study of the virus proteins177, a second RCAP experiment was conducted on the virion with a protein of interest. Here, the MS data collected included peptide fragmentation, because with more peptides from other proteins in the virion, MS1 was likely insufficient for an accurate peptide identification177. The same will certainly be true for DNA-PKcs. Here I describe the use of RCAP176 with a modified MS protocol to identify DNA binding sites in DNA-PKcs.

Figure 3.1. Typical Workflow for MS Peptide/Protein Identification. Adapted from Journal of the American Society of Nephrology (PMID: 29724882), copyright (2018) with permission from American Society of Nephrology.

3.2. Methods

3.2.1 Formaldehyde footprinting: RCAP

Purified full-length DNA-PKcs, supplied by the Lees-Miller lab in a Tris buffer, was exchanged into a HEPES buffer using a heparin column (Heparin HiTrap, GE Healthcare). The protein was loaded onto the column and washed with HEPES buffer (50 mM HEPES pH 7.5, 100 mM KCl, with 1 mM DTT added fresh), then eluted with a 20 min 0-100% gradient of HEPES Buffer B (50 mM HEPES pH 7.5, 750 mM KCl, and 1 mM DTT added fresh). DNA-protein complexes were assembled by mixing DNA-PKcs with DNA, either blunt or OH (Figure 2.4), in footprinting DNA binding buffer (50 mM HEPES pH 7.5, 75 mM KCl, 5 mM MgCl2, with 1 mM DTT added fresh). Samples were crosslinked with 0.1% formaldehyde for 5 min at room temperature. Crosslinking was quenched by adding 2 M glycine for 5 min. For digestion, samples were diluted with 100 mM ammonium bicarbonate (ABC) and digested with trypsin overnight. After digestion, DNA with the crosslinked peptides was precipitated by adding sodium acetate to 0.3 M and 2 volumes of cold ethanol, then storing at -80°C overnight. DNA was pelleted by centrifugation at 21000g for 30 min. The pellet was washed three times with cold 75% ethanol, pelleting the DNA after each wash. All the wash was removed, and any remaining supernatant was evaporated before dissolving the DNA-peptides in resuspension buffer (25 mM sodium acetate pH 5.2, 200 mM NaCl). To reverse the formaldehyde crosslinks the sample was heated at 70°C for 1 hr. Finally, the samples were cleaned up with C18 ZipTips (Millipore) before running on LC-MS. Peptides were separated on a 50 cm EASY-Spray™ LC Column (Thermo), with a 25 min gradient of 3-50% Orbitrap Solvent B (98% ACN, 0.1% FA), at 300 nL/min. Data dependent acquisition (DDA) data were collected on an Orbitrap Lumos (Thermo Scientific). Data collection parameters included m/z 350-1200, resolution of 120000 for MS1 and 15000 for MS2, and charge states 2-8 selected for HCD fragmentation at 35% normalized collision energy (NCE). Samples were prepared with a matched control that went through the same experimental workup, with the only difference being that no formaldehyde crosslinking was applied. The control and the formaldehyde crosslinked samples were each analyzed in triplicate (biological replicates).

To identify peptides, data were searched using PEAKS 8.5 against a self-curated list of proteins, allowing up to 3 missed cleavage sites and variably oxidized methionine, with a peptide tolerance of 10 ppm and a fragment tolerance of 0.02 Da. Peptides passing a 0.1% false discovery rate (FDR) threshold were accepted. Identifications from the crosslinked and control samples were pooled to generate a peptide list for quantification. The HX app in MSS164, which also exports quantification measurements, was used to determine the summed isotopic intensity for each peptide. Various cut-offs were applied to the data to determine the appropriate criteria for a peptide to be considered selectively enriched, and thus DNA binding. To explore the cut-offs relative to known DNA binding, selectively enriched peptides were mapped to the 5Y3R DNA-PK structure40 with Chimera169.


3.3 Results and Discussion

3.3.1 Defining DNA binding peptides

With any enrichment, there is a set of identifications that are true interactions, as well as identifications which come from non-specific interactions. How do you distinguish a true interaction from background? To define a rigorous cut-off for DNA binding peptides, various criteria were applied to peptides enriched upon formaldehyde crosslinking to blunt DNA. The first graph in Figure 3.2 (A) shows the intensity of the identified peptides. The peptides for the formaldehyde crosslinked and the control samples are not matched; instead, each set (formaldehyde or control) is sorted from highest to lowest intensity. Looking at the intensity of peptides found in the formaldehyde sample versus the control, peptides were indeed enriched in the crosslinked samples (Figure 3.2A). Some of this increased intensity was due to the selective enrichment of formaldehyde crosslinked peptides, but we are likely seeing higher background in these samples versus the control, as non-specific interactions could be crosslinked at low levels with formaldehyde, and there may even be non-specific interactions of free peptides with the crosslinked peptides.

In identifying interactions, we want to optimize the number of true interactions while minimizing the number of false interactions (accepted background). The intensity of peptides from the control samples (i.e. not crosslinked) provides information on the intensity we might expect for non-specific interactions. So, different cut-offs based on the intensity in the control sample were tested, to determine an appropriate cut-off (Figure 3.2A). As the cut-off was lowered from 98% to 95%, most of the new peptides above the cut-off showed similar intensity to the matched control peptides, with an average intensity ratio (formaldehyde: control) of 4.3.


Figure 3.2. Determining Rigorous Cut-Offs for Blunt DNA Binding Peptides. A) Intensity cut-off plot. The graph shows the average intensity for peptides found in each sample, sorted highest to lowest intensity. Error bars are ± 1 standard deviation (n=3). The horizontal dashed red lines show the respective intensity cut-offs, from top to bottom, 100, 99, 98, and 95. The vertical red lines show where the respective intensity cut-off intersects the formaldehyde crosslinked intensity curve. B) Enrichment cut-off. The ratio for each of the peptides from the formaldehyde crosslinked sample that were above the 98% intensity cut-off was plotted. The red line shows the score cut-off set at 30-fold enrichment.


These new peptides are probably mostly background peptides, thus an intensity cut-off of 98% was chosen to minimize false positives.

However, not all peptides above the chosen 98% cut-off are highly enriched above the control samples, thus a second cut-off threshold was determined based on the enrichment ratio. To set the enrichment cut-off, the intensity ratios of the peptides that pass the 98% intensity-based cut-off were plotted and the approximate apex of the curve (manually chosen) was selected as the cut-off, representing a conservative 30-fold enrichment (Figure 3.2B). To explore the effects of these filtering steps and test the robustness of the cut-offs, the peptides that pass the intensity cut-off and the fold enrichment cut-off were mapped to the DNA-PK structure (Figure 3.3). Lowering the intensity cut-off to 98% produces little change in the peptides identified as possible DNA binding sites (Figure 3.3). That is, as the intensity cut-off was lowered, most of the new peptides support peptides identified at the 100% cut-off, either overlapping with or clustering in the same region (Figure 3.3). This agreement of peptides across the 100-98% cut-offs gives us confidence that, as the intensity cut-off is lowered to 98%, we are still primarily identifying true interactions.
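The two-stage filter can be sketched as below. Several details are assumptions made for illustration: the percent cut-off is interpreted as a nearest-rank percentile of the control intensities (so 100% is the most intense control peptide), peptides missing from the control are compared against the lowest control intensity, and all names and numbers are hypothetical.

```python
import math


def intensity_cutoff(control_intensities, percentile):
    """Nearest-rank percentile of the control intensities (percentile in (0, 100])."""
    ranked = sorted(control_intensities)
    rank = max(1, math.ceil(len(ranked) * percentile / 100))
    return ranked[rank - 1]


def select_binding_peptides(crosslinked, control, percentile=98, min_fold=30.0):
    """crosslinked / control map a peptide to its mean summed isotopic intensity."""
    cutoff = intensity_cutoff(control.values(), percentile)
    floor = min(control.values())  # assumed stand-in for peptides absent from the control
    selected = {}
    for peptide, intensity in crosslinked.items():
        if intensity <= cutoff:
            continue  # stage 1: must exceed the control-derived intensity cut-off
        ratio = intensity / control.get(peptide, floor)
        if ratio >= min_fold:  # stage 2: fold enrichment over the control
            selected[peptide] = ratio
    return selected


# Hypothetical intensities; an 80% cut-off is used because the toy set is tiny.
control = {"p1": 100.0, "p2": 200.0, "p3": 400.0, "p4": 800.0, "p5": 1000.0}
crosslinked = {"p1": 9000.0, "p2": 500.0, "p3": 30000.0, "p5": 5000.0, "p6": 950.0}
binding = select_binding_peptides(crosslinked, control, percentile=80, min_fold=30.0)
print(binding)
```

In the toy data, p5 and p6 clear the intensity cut-off but are removed by the enrichment stage, which mirrors why the second threshold is needed: an intense peptide is not necessarily an enriched one.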

3.3.2 Verification of the chosen DNA binding data cut-offs

Having set rigorous cut-offs based on the data observed for blunt DNA binding, the two-stage cut-off was applied to the OH DNA binding data (Figure 3.4). With the similarity in the observed HX changes in DNA-PKcs upon binding to blunt and OH DNA (Figure 2.11), we would expect DNA-PKcs to bind the OH DNA in a fashion similar to the blunt DNA. So, if our two-stage cut-offs are sensitively and selectively identifying DNA binding peptides, we would expect similar peptides to be identified in the OH data set. Indeed, using the two-stage cut-offs, many of the same sites were considered DNA binding (Figure 3.5B). As always when working with DNA-PKcs, limiting sample consumption is important. Thus, the two-stage cut-off was tested on a single blunt sample to see if the cut-offs are rigorous enough to function without replicates. The identified DNA binding peptides from a single sample were very similar to the peptides identified from the triplicate analysis (Figure 3.5C). The slight differences do point to some uncertainty but, given the core set of peptides that are identified as DNA binding in all cases (blunt, OH, single replicate), we can be confident that the chosen cut-offs are primarily identifying true interactions.

Figure 3.3. Blunt DNA Binding Peptides at Varying Intensity Cut-offs. Peptides that are considered DNA binding with a A) 100%, B) 99%, and C) 98% intensity cut-off. DNA binding peptides (purple) mapped to the 5Y3R structure with Ku70/80 hidden40. DNA-PKcs shown in teal, and missing regions greater than 30 amino acids represented as proportional length boxes.

Figure 3.4. Applying Defined Cut-off to Determine OH DNA Binding Peptides. Top) Intensity cut-off. The graph shows the average intensity for peptides found in each sample, sorted highest to lowest intensity. Blue, the formaldehyde crosslinked (n=2), and green, the control (n=3). Error bars are ± 1 standard deviation. The red line shows the 98% cut-off determined from the control sample. Below) Determining the ratio cut-off. The ratio for each of the peptides from the formaldehyde crosslinked sample that were above the 98% intensity cut-off was plotted. The red line shows the score cut-off set at 40-fold enrichment.

Figure 3.5. Verification of Formaldehyde DNA Footprinting. DNA-PKcs DNA binding peptides identified by formaldehyde footprinting: A) Blunt DNA, B) OH DNA, and C) Blunt DNA single replicate. Peptides that were enriched after formaldehyde footprinting are shown in purple on a teal DNA-PKcs (5Y3R with Ku70/80 hidden40). D) Electrostatic potential of DNA-bound DNA-PKcs, calculated in Chimera169 with PDB2PQR179,180 and the Adaptive Poisson-Boltzmann Solver (APBS)181

When the RCAP-generated binding site is compared to the known DNA binding site (DNA-PK structure40), peptides in the known DNA binding site are certainly identified, but the entire DNA binding site and the “footprint” detected by HX (Figure 2.11) are not mapped (Figure 3.5). To be sure, a DNA binding site might be missed because the peptides do not pass the conservative cut-offs we applied. Setting rigorous cut-offs gives us confidence that identified peptides are true interactions, but can also lead to false negatives. Further, extensive crosslinking is another way in which a DNA binding site could go undetected177. Samples are digested while still crosslinked to DNA, but the presence of crosslinks can block digestion, specifically blocking lysine cleavage sites182, generating large peptides that are challenging to detect. So, true DNA binding sites are detected with RCAP, but they are partial binding sites, reflective of a lysine-dominant technique.

Interestingly, RCAP also identified sites well outside the known binding site (Figure 3.5). Some of these sites would be identified even if the strictest 100% intensity cut-off was used (Figure 3.3A). Comparing the sites that are considered DNA binding to the electrostatic potential of DNA-PKcs, we see that the identified DNA binding regions align with positive regions of DNA-PKcs, like the known DNA binding site (Figure 3.5), demonstrating that these could be true DNA binding sites. The consistency in the identified DNA binding sites with two different pieces of DNA, and the alignment with positive regions of DNA-PKcs, provide evidence that formaldehyde footprinting with our rigorous two-stage cut-offs can be used to identify DNA binding sites at a peptide level.

3.3.3 Conclusions

Though RCAP is a lower resolution technique, it benefits from being a direct site determination tool, and we are able to use it to identify peptides that interact with DNA, even with just a single replicate. The DNA binding sites identified by RCAP do not necessarily represent the entire binding site, as DNA binding peptides can be missed, but they can still be used to accurately identify partial DNA binding sites and support our HX data. When comparing the known binding site of DNA-PKcs to the DNA footprinting, regions outside the known binding site were found to be DNA binding. Our rigorous cut-offs try to capture only true interactions, but we cannot rule out non-specific protein-DNA interactions in solution. RCAP will work best in conjunction with other structural techniques, to ensure that the observed DNA binding sites have biological significance. Like other MS techniques, DNA footprinting has the advantage of being able to provide information on entire proteins, not being limited by the dynamics of the protein system. So, even though there is an available structure of DNA-PK bound to DNA40, formaldehyde DNA footprinting can be used to provide DNA binding information on regions not represented in the structure, and on how binding changes with larger segments of DNA.

Chapter 4: Optimizing Crosslinking for DNA-PKcs

4.1 Introduction

XL-MS is one of the most commonly employed MS methods in ISB111,131–134,136. XL restraints provide a maximum distance between residues104,135,183. One of the advantages of XL-MS is the ability to generate restraints for weaker interactions135,183,184. Not only can XL-MS capture weak interactions, XL can also tolerate more heterogeneity in samples than techniques such as HX-MS. With the increasing complexity of the DNA-PKcs complexes being studied, XL-MS will be an important technique for building a model of DNA-PKcs in NHEJ. Importantly, XL-MS and HX-MS do not provide the same structural information, so the combined data from both will only serve to generate a better model185. The most obvious synergistic relationship between HX and XL would be confirmation of binding sites that remain ambiguous in the HX experiments, or the identification of weak binding sites that could be missed by HX115. But HX and XL would also complement each other in identifying conformational changes. While comparative HX identifies regions of structure that undergo conformational changes upon binding, quantitative XL (i.e., comparing two states) can identify structural changes that occur upon binding140,186–189, giving a directionality to the conformational changes that occur.

Even though XL-MS is a widely used technique, there are still challenges that require consideration. In a crosslinking experiment, after digestion of the sample, crosslinks represent a small fraction of the peptides present, with some suggesting as little as 0.1% of the peptides are coupled141. Their low abundance makes it challenging to detect these crosslinked peptides.

Several techniques have been employed to enrich crosslinked peptides, including SEC190,191, SCX143,184,191–193, and affinity enrichment194–197. The enrichment of crosslinks using SCX and SEC is not specific to the crosslinker; rather, they rely on the increased charge (SCX) and size (SEC) that come with being linked. Thus, these techniques can be applied to samples linked by any type of crosslinker. But because these techniques separate based on crude peptide characteristics, there can still be many free peptides in the enriched fractions. SEC and SCX techniques usually rely on column separation, often requiring high amounts of starting material (50-200 µg), followed by MS analysis of the inevitable multiple fractions190,191. For a more specific enrichment of crosslinked peptides, affinity tags can be incorporated into the linker region183.

Biotin is one of the most commonly used affinity tags, with its strong binding to streptavidin/avidin, but incorporation of the bulky biotin tags on the linker can have negative effects on crosslinking (e.g. biased coupling)198. To overcome the negative effects, crosslinkers with click-chemistries incorporated into the linker have been developed that allow the bulky affinity tag to be incorporated after protein crosslinking has occurred195,196. With affinity based enrichment, starting amounts of protein as low as 15 µg have been used to identify 4X more crosslinks than without enrichment195. Despite the advantages of these click-chemistry enrichable crosslinkers they are not yet commercially available, so crosslink enrichment by SEC or SCX is more often utilized141.

Another aspect that can affect the number of detectable crosslinks is the linker length. As the length increases, the number of possible crosslinks can increase151. But as the linker length is increased the value of the crosslink restraint decreases, as now many different configurations could satisfy the lower-resolution restraint. On the other hand, if the linker is too short, there are fewer crosslinks that can be detected, and the sparseness in the data may not allow the model to converge105. Thus, a linker length must be chosen that balances detectable crosslinks with value for modelling151.

After crosslinks are generated and detected, the crosslinked peptides must be accurately identified, which is not trivial. When searching for crosslinked peptides, all free peptides as well as all possible crosslink pairs must be considered, so the search space grows quadratically as the protein sequence in the database increases (n² complexity)199. Many different software packages have been developed to identify protein crosslinks, with various techniques for database reduction, some of which rely on the use of specially designed crosslinkers142,200–203. A crosslinking application (CRIMP) for our in-house software (MSS) was developed with a database reduction strategy that can be applied to almost any crosslinker and data type199. The user-friendly interface of MSS and sophisticated filtering methods allow for rapid manual validation of data, considering the quality of the XIC, MS1 spectrum, and annotated fragment spectrum, while also highlighting any conflicting peptides199.
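As a rough illustration of the n² complexity noted above, the sketch below counts the unordered peptide pairs that an exhaustive crosslink search would have to consider. It assumes that a peptide may also pair with itself (for example, two copies of the same peptide linked together); the function name and the peptide counts are hypothetical.

```python
def crosslink_pair_count(n_peptides: int) -> int:
    """Unordered peptide pairs, allowing a peptide to pair with itself."""
    return n_peptides * (n_peptides + 1) // 2

# Hypothetical peptide library sizes: a tenfold larger library gives
# roughly a hundredfold larger pair search space.
for n in (100, 1_000, 10_000):
    print(n, crosslink_pair_count(n))
```

This is why database reduction before pairing, rather than enumeration of every theoretical pair, matters for large proteins and complexes.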

Here, like the previous techniques, crosslinking methods were benchmarked on DNA-PKcs alone, ensuring that relevant information can be obtained for the large protein before extending to the larger complex. Each of the following parameters was assessed: the ability of MSS to identify crosslinked peptides, the need for enrichment of crosslinked peptides, and the effects of crosslinker length. To extend XL to the larger DNA-PKcs complexes, assembly of the complexes had to be considered once again. The on-bead enrichment of DNA-PKcs worked well for HX to ensure that all protein was DNA bound (section 2.3.3), limiting one source of heterogeneity. Given the advantage of a bead-based assembly, crosslinking was tested in the same format. The reproducibility of crosslinking on beads was then explored to see if comparative quantitative crosslinking could be used as an additional source of information to guide modelling, as the complex is expanded.

4.2 Methods

4.2.1 Enrichment of Crosslinked Peptides

Enrichment strategies were first tested on crosslinked Bovine Serum Albumin (BSA) samples following the protocol in Leitner et al.204. Briefly, BSA (1 mg/mL), dissolved in 20 mM HEPES pH 8.3, 75 mM KCl, 5 mM MgCl2, was crosslinked with 1 mM DSS for 30 min at 37°C, before quenching with ammonium bicarbonate. The sample was dried down and dissolved in 8 M urea, then reduced and alkylated. Urea was diluted to 1 M for digestion with trypsin overnight at 37°C. Digestion was quenched by acidifying the solution with FA. Samples were then cleaned up with solid phase extraction HyperSep SpinTips (Thermo Scientific) for subsequent crosslink enrichment. For SEC, samples were dissolved in SEC buffer (30% ACN, 0.1% FA) and injected onto a Superdex Peptide PC 3.2/30 column (GE Healthcare). After loading, peptides were separated at a flow rate of 100 µL/min with an Agilent 1100 chromatography system. Fractions were collected from the SEC column every minute, and then evaporated. For SCX, samples were dissolved in SCX buffer A (30% ACN, 0.1% FA). The crosslinked peptides were then separated on a PolySulfoethyl A column (100 mm x 1 mm, 3 µm particles, PolyLC) with a 25-minute gradient of SCX solvent B (0.5 M ammonium formate pH 3.0, 30% ACN) on an Agilent 1200 HPLC. The gradient eluted peptides in 2 steps, holding at 25% and 40% SCX solvent B. SCX fractions were collected every 2 min, then evaporated. For both enrichment techniques, samples were reconstituted in 0.1% FA for LC-MS analysis. Samples were loaded onto a Pepmap100 C18 nanoviper trap column (75 µm x 2 cm, 3 µm beads, Thermo Scientific). Peptides were then separated on a 10 cm x 75 µm self-packed (Aeris Peptide XB-C18, 3.6 µm particle size, Phenomenex) picofrit column (New Objective). Separation occurred over a 30 min 5-60% Orbitrap Solvent B (98% ACN, 0.1% FA) gradient, at 300 nL/min (EasyLC1000, Thermo Scientific Inc.). MS/MS data were collected on an Orbitrap Velos (Thermo Scientific Inc.), collecting 400-2000 m/z in MS1 with a resolution of 60 000. MS2 data, at a resolution of 7500, were collected with HCD (40% NCE) fragmentation of the top 5 peaks, rejecting charge states +1 and +2. Data were searched using the CRIMP (crosslink analysis) application in MSS199, using default parameters (MS1 and MS2 accuracy ±10 ppm, and percent E value threshold of 10), against a database that only included the BSA sequence.

4.2.2 Crosslinking DNA-PKcs

DNA-PKcs purified from HeLa cells, supplied by the Lees-Miller lab in a Tris buffer, was exchanged to a HEPES buffer. Buffer exchange occurred either through exchange on a heparin column as in section 3.2.1, or through multiple rounds of concentration and dilution on a 30 kDa cut-off membrane (Amicon Ultra 0.5 mL, Millipore), both resulting in ~60-70% sample loss. Three different crosslinkers were used to crosslink DNA-PKcs in HEPES buffer (20 mM HEPES pH 7.5, 75 mM KCl, 5 mM MgCl2, and 0.5 mM DTT added fresh): disuccinimidyl suberate (DSS) (Creative Molecules Inc.) with a linker length of 11.4 Å, disuccinimidyl glutarate (DSG) (Creative Molecules Inc.) with a linker length of 7.7 Å, and Bis(succinimidyl)penta(ethylene glycol) (BSP5, Thermo Scientific) with a linker length of 21.7 Å. Each crosslinker was used in separate crosslinking experiments. Crosslinking proceeded for 30 min, at 37°C for DSS and DSG, and 30°C for BSP5, with shaking. Different crosslinker concentrations were used to increase the identifiable crosslinks (~1:2, 1:1, and 2:1 total lysine concentration to crosslinker concentration). The crosslinking reaction was quenched with excess ABC (50 mM final concentration) after 20 min. DNA-PKcs was digested overnight with trypsin, using a 1:30 enzyme to substrate ratio (w/w), at 37°C. Digestion was quenched with FA and then the sample was lyophilized. For a subset of samples, crosslinks were SEC enriched as for BSA (section 4.2.1).
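The lysine-to-crosslinker ratios above translate into a crosslinker concentration once the molar lysine concentration is known. The following is a minimal sketch of that arithmetic only; the protein concentration, molecular weight, and lysine count used in the example are hypothetical placeholders, not values taken from this work.

```python
def crosslinker_conc_mM(protein_mg_per_mL, protein_mw_kDa, lysines_per_protein,
                        lysine_parts, crosslinker_parts):
    """Crosslinker concentration (mM) for a lysine:crosslinker molar ratio.

    mg/mL divided by kDa gives mmol/L, e.g. 1 mg/mL of a 100 kDa protein is 0.01 mM.
    """
    protein_mM = protein_mg_per_mL / protein_mw_kDa
    lysine_mM = protein_mM * lysines_per_protein
    return lysine_mM * crosslinker_parts / lysine_parts

# Hypothetical example: 1 mg/mL of a 500 kDa protein carrying 200 lysines.
for lys, xl in ((1, 2), (1, 1), (2, 1)):
    conc = crosslinker_conc_mM(1.0, 500.0, 200, lys, xl)
    print(f"{lys}:{xl} lysine:crosslinker -> {conc:.2f} mM crosslinker")
```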

All samples were reconstituted in 0.1% FA for MS analysis. LC-MS analysis proceeded the same as for the BSA samples above, except a longer gradient (90 min) was used for the non-enriched samples. Crosslinks were identified with CRIMP-MSS using the default search parameters, searching a database that only included DNA-PKcs. One DSS replicate was also searched with MeroX/StavroX, using the software’s default parameters, for comparison205.

4.2.3 DNA-PKcs and Ku70/80 Protein Purification for Crosslinking

With the loss of DNA-PKcs to buffer exchange (~60%), a purification of DNA-PKcs and Ku70/80 from HeLa cells, following a protocol based on Goodarzi and Lees-Miller158 and finishing in an XL compatible HEPES buffer, was undertaken. A 100 L HeLa frozen cell pellet was purchased from the National Cell Culture Centre. The cells were thawed and spun at 10 000 g for 15 min at 4°C, to isolate the nuclear fraction. The nuclear pellet was gently dissolved in 100 mL of high salt buffer (50 mM Tris pH 8.0, 5% glycerol (v/v), 0.2 mM EDTA, 10 mM MgCl2, 400 mM KCl, with protease inhibitors (0.1 mM benzamidine, 0.2 mM PMSF, 0.2 µg/µL each of pepstatin, aprotinin and leupeptin), and 1 mM DTT added fresh) to extract nuclear proteins. Samples were spun down again, and the supernatant kept. The pellet was extracted two more times with 50 mL high salt buffer, and supernatants pooled. Gels and Western blots were run to confirm the presence of DNA-PKcs (Monoclonal Ab 42-47, Lees-Miller Lab) and Ku70/80 (Ku80 Abcam ab33242) in the high salt extraction supernatant.

Pooled high salt supernatant was then applied to a series of columns to isolate DNA-PKcs and Ku70/80. For each column, fractions were collected and analyzed by Coomassie stain and western blot to identify the DNA-PKcs and Ku70/80 containing fractions with which to continue the purification. Dialysis was performed between each column enrichment to adjust the solution to the correct starting salt concentration. Initial column purifications were done in TB (50 mM Tris-HCl pH 8.0, 5% glycerol, 0.2 mM EDTA, with protease inhibitors and 1 mM DTT added fresh), with a specified concentration of KCl (i.e., TB50 is TB with 50 mM KCl). The first three columns, DEAE (self-packed 15 cm x 5 cm, GE Healthcare), SP Sepharose (self-packed 10 cm x 5 cm, GE Healthcare), and ssDNA (self-packed 4 cm x 5 cm, Sigma), were run by gravity flow. Protein was applied slowly to the equilibrated column to bind, then washed with a low salt buffer (TB75, TB50 and TB100 respectively). For the first two columns, proteins were eluted in two steps with TB175 and TB750. For the ssDNA column, a gradient of TB100-TB750 was applied to elute the proteins.

After the ssDNA column, the volume/protein content was reduced sufficiently to use smaller columns on an NGC Chromatography system (Biorad). For each column, proteins were slowly applied to the column to bind, washed with a low salt buffer, and then eluted with an increasing salt gradient. DNA-PKcs and Ku70/80 were still present in the same fractions after the ssDNA column. To separate the two proteins, they were applied to a Heparin HiTrap column (5 mL, GE Healthcare). Once on the column, DNA-PKcs and Ku70/80 were separated with a 60 min 10-60% TB750 (with 0.02% Tween) gradient. With Ku70/80 and DNA-PKcs separated, the subsequent columns, MonoQ (HR5/5, Sigma) and MonoS (HR5/5, Sigma), were each run twice, once for the Ku70/80 containing fractions and once for the DNA-PKcs containing fractions. For purification on the final columns, the buffer was switched to HB (50 mM HEPES pH 7.5, 5% glycerol (v/v), 0.2 mM EDTA, with protease inhibitors and 1 mM DTT added fresh). For the MonoQ column, proteins were separated with a 40 min 10-30% HB750 (with 0.02% Tween) gradient. For the MonoS column, proteins were separated with a 40 min 0-100% HB750 gradient. At this point the desired purification was not quite achieved, so the protein was applied to the MonoS column again, except this time the HB buffer pH was increased to 8.0, to match the pH of the TB buffer normally used for purification, and separated on a 30 min 10-40% HB750 gradient. After the second MonoS column the desired protein purification was achieved, and DNA-PKcs and Ku70/80 fractions were separately pooled. The pooled protein was concentrated on a 30 kDa cut-off Vivaspin 6 Microconcentrator (GE Healthcare). To confirm purity, the purified proteins were digested with trypsin and analyzed by LC-MS. The digest was searched using Mascot against the Swiss-Prot Human database, allowing up to 2 missed cleavages, MS tolerance ±10 ppm, MS/MS tolerance ±0.06 Da, and oxidation of methionine as a variable modification.

4.2.4 Crosslinking Bead-Bound DNA-PK Complexes

DNA-PK Capture and Crosslinking

DNA-PK was pulled down on 100 bp DNA (Table 4.1, UCDNA Synthesis Lab) attached to streptavidin agarose beads. The pull-down was as for the HX pull-downs (section 2.2.2), except the DNA binding buffer was replaced with a HEPES version (50 mM HEPES, 75 mM KCl, 5 mM MgCl2, 5% glycerol, 1 mM DTT added fresh), which was also used as the wash buffer. To assemble DNA-PK, Ku70/80 and DNA-PKcs (from the purification above) were mixed in equimolar amounts and added to the captured DNA, then assembly and washing proceeded as for the HX samples (section 2.2.2). After capture of complexes, crosslinking was completed on beads, adding DSS (dissolved in DMSO, Thermo Scientific) to a final concentration of 1:1 or 1:2, lysine concentration : DSS concentration. The ratio of DSS to lysine considered all the lysine residues present on the captured proteins as well as the lysine content of the streptavidin agarose beads, approximated from the supplier-reported biotin binding capacity. Crosslinking proceeded for 30 min at 37°C, with shaking. ABC was added to a final concentration of 50 mM to quench crosslinking for 20 min at 37°C. The sample was then digested on bead with trypsin overnight at 37°C. After digestion, the beads were spun down, and the supernatant collected. The

Table 4.1. DNA Constructs for DNA-PK Capture.

Name | Sequence*
100bp-Biotin | Biotin-GCA CAA TCT CGC GCA ACG CGT CAG TGG GCT GAT CAT TAA CTA TCC GCT GGA TGA CCA GGA TGC CAT TGC TGT GGA AGC TGC CTG CAC TAA TGT TCC G
100bp | CGG AAC ATT AGT GCA GGC AGC TTC CAC AGC AAT GGC ATC CTG GTC ATC CAG CGG ATA GTT AAT GAT CAG CCC ACT GAC GCG TTG CGC GAG AAG ATT GTG C

*Sequences written 5' to 3'

beads were washed 2X with 50 mM ABC, pooling the washes with the supernatant. FA was added to the pooled supernatant to quench digestion, and the sample was evaporated. Samples were then cleaned up with C18 ZipTips (Millipore), before being reconstituted in 0.1% FA for MS analysis. Peptides were separated on a 50 cm EASY-Spray™ LC Column (Thermo Scientific), with a 70-minute gradient of 5-25% Orbitrap solvent B (98% acetonitrile, 0.1% FA) on an EasyLC1000 nanoLC (Thermo Scientific). DDA data were collected on an Orbitrap Lumos (Thermo Scientific), with MS1 375-1600 m/z at a resolution of 120 000, then charge states +4 and up were selected for fragmentation with HCD (NCE 30%), at a resolution of 30 000. Two technical replicates of each sample were collected. Crosslinks were identified using CRIMP-MSS, with a manually curated protein list including proteins of interest and low-level contaminants, and default search parameters. Crosslinks identified by MSS which passed the 1% FDR were manually validated.

Crosslink Quantification and Comparison

The area under the XIC for each crosslink was exported from MSS with the crosslink identifications. For each replicate, the area under the peptide ion chromatogram (PIC) (500-1000 m/z) was determined with Xcalibur Qual Browser (V2.2, Thermo Scientific). Each crosslink XIC area was normalized to the PIC for the replicate, by dividing the observed XIC area by the replicate PIC area. To compare the intensity of each crosslink between replicates, the fold change was calculated using the equation below:

$$\text{Fold Change} = \log_2\left(\frac{\text{Normalized XIC Area Replicate 1}}{\text{Normalized XIC Area Replicate 2}}\right) \qquad \text{(equation 3.1)}$$

The reproducibility of the crosslink measurements from the technical replicates was evaluated by looking at the distribution of the fold changes. Fold change values were plotted as a histogram and fit with a Gaussian distribution in MATLAB (R2016a), to determine the standard deviation associated with the technical replicates.
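The normalization and fold change calculation described above can be sketched as follows. This is an illustration only: the crosslink labels and areas are hypothetical, and the sample mean and standard deviation are used here as a simple stand-in for the Gaussian fit to the histogram that was performed in MATLAB.

```python
import math
import statistics

def normalize(xic_areas, pic_area):
    """Divide each crosslink XIC area by the replicate's PIC area."""
    return {xl: area / pic_area for xl, area in xic_areas.items()}

def log2_fold_changes(rep1, rep2):
    """log2(replicate 1 / replicate 2) for crosslinks quantified in both replicates."""
    shared = rep1.keys() & rep2.keys()
    return {xl: math.log2(rep1[xl] / rep2[xl]) for xl in sorted(shared)}

# Hypothetical XIC areas and PIC areas for two technical replicates.
rep1 = normalize({"K10-K50": 200.0, "K20-K80": 100.0, "K5-K9": 50.0}, pic_area=1000.0)
rep2 = normalize({"K10-K50": 400.0, "K20-K80": 400.0, "K5-K9": 50.0}, pic_area=2000.0)

fold_changes = log2_fold_changes(rep1, rep2)
values = list(fold_changes.values())
print(fold_changes)
print("mean:", statistics.mean(values), "sd:", statistics.stdev(values))
```

A distribution centred near zero with a small standard deviation would indicate that the technical replicates are reproducible; the width of that distribution sets the scale for what counts as a meaningful change between states.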

4.3 Results and Discussion

4.3.1 Mass Spec Studio for Crosslinked Peptide Identification

As protein complexes being studied become larger and more complex, we need crosslinking software that can efficiently identify crosslinks from large databases, without the need for specialty crosslinkers. Specialty crosslinkers, isotopically labelled or cleavable, do offer clever ways to reduce database size142,184,206. But, whether through signal splitting with isotopic crosslinkers, or the requirement for complex MS acquisition in the case of cleavable crosslinkers184, these specialty crosslinkers can have a negative effect on the data of these already challenging to detect crosslinks. With simple crosslinkers, for these large but still relatively simple systems (complexes assembled from purified proteins) more crosslinks could be identified if we had software that could search large databases efficiently and accurately.

Though in principle DNA-PKcs is a simple system, a single purified protein, its large amount of unique sequence (4128 aa) means that searching this system is similar to searching a protein complex. Data from crosslinking DNA-PKcs with light/heavy DSS were searched with MSS to test its ability to identify crosslinks. To identify crosslinked peptides, MSS generates a peptide library from the provided protein sequences with the user-defined charge states and modifications. The full peptide library is then searched against each MS2 spectrum using a revised version of OMSSA207, adapted for high resolution MS2 data208. Only peptides with acceptable E-scores (which can be made more or less restrictive by the user) are retained as candidates for that spectrum199. The candidates (dead ends, free peptides) are then filtered based on the precursor mass of the spectrum199. A refined scoring step is then applied to the filtered candidates based on peak assignment in MS1 and MS2, now considering all possible fragments generated from the crosslinked peptides, not just each linear peptide199. With this MS2 database reduction strategy in MSS, large complexes (even small proteomes) can be easily accommodated on a standard computer199. For comparison, the same DNA-PKcs crosslinked data were searched with another freely available crosslinking software, MeroX (formerly StavroX)205. MeroX starts by generating a list of all peptides for the given proteins. Using the peptide list, it then generates a list of all theoretical crosslink masses205. Those masses are then compared to the observed precursor masses205. From there the observed fragment spectra are compared to the theoretical fragment spectra of that crosslink205. Since the initial search in MeroX requires generation of a database that contains all theoretical crosslinks, this software is generally used for smaller protein systems (1-10 proteins), as the search space would become quadratically larger with more peptides205.

For more complex protein systems, MeroX has a search based on cleavable crosslinkers206. The search with MSS took ~35 minutes on a personal computer, while the same search with MeroX took ~1 h. MeroX identified 15 unique crosslink sites, while MSS was able to identify 61 unique crosslink sites. Of the 15 crosslink sites MeroX identified, MSS identified 14. To ensure accurate identifications of crosslinked peptides, crosslinking was done with a mixture of light and heavy isotopically labelled DSS crosslinker. With light/heavy isotopically labelled crosslinkers, true crosslinked peptides will have two isotopic distributions (separated by an amount dependent on the incorporation of the heavy element): a light distribution, from the two peptides crosslinked by the light version of the crosslinker, and a heavy distribution, from the two peptides crosslinked by the heavy version of the crosslinker. Although the presence of these paired isotopic distributions was not used to search for crosslinks, each of the crosslinks identified by MSS, including the ones not identified by MeroX, did indeed have the isotopic pair, showing they were true crosslinked peptides (Figure 4.1). Based on the agreement with MeroX, and the presence of the light/heavy distributions, we can be confident that CRIMP is correctly identifying crosslinked peptides with its MS2 based database reduction strategy.
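The light/heavy check described above amounts to looking for a partner peak at a predictable m/z offset: the mass difference between the heavy and light crosslinker divided by the precursor charge. The sketch below illustrates that test. The mass shift is passed in as a parameter because the heavy label used is not specified in this section; the 12.0753 Da value in the example (twelve deuterium substitutions) and the peak list are assumptions for illustration, and the 10 ppm tolerance simply mirrors the default MS1 accuracy quoted in the methods.

```python
def has_isotopic_partner(light_mz, charge, peak_mzs, mass_shift_da, tol_ppm=10.0):
    """True if a peak lies at the expected heavy-labelled m/z within tol_ppm."""
    expected = light_mz + mass_shift_da / charge
    tolerance = expected * tol_ppm * 1e-6
    return any(abs(mz - expected) <= tolerance for mz in peak_mzs)

# Hypothetical 4+ precursor at m/z 800.0000; the heavy partner is expected
# near 800.0000 + 12.0753 / 4 = 803.0188.
ms1_peaks = [800.0000, 800.2507, 803.0190, 803.2697]
print(has_isotopic_partner(800.0, 4, ms1_peaks, mass_shift_da=12.0753))
```

Because higher charge states compress the spacing between the light and heavy forms, the charge must be known before the partner peak can be located.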

To further validate CRIMP identifications, the remaining crosslinking data, including BSP5, which is not available as an isotopically labelled version, were searched. Of the crosslinks that can be mapped to the structure (~55% of the observed crosslink set), most agree with the DNA-PKcs structure39, with a crosslink satisfaction of ~95% (Figure 4.2), again supporting that the crosslinks are accurately identified by MSS. Modelling proteins with crosslinking restraints has to balance limiting the distance restraint with the number of crosslinks that are observed135,151. Not surprisingly, as the linker length is increased more crosslinks could be observed (Figure 4.2). On balance, we decided that DSS will be used for subsequent crosslinking experiments. MSS allows for flexibility in crosslinker type, since its database reduction strategy is independent of the crosslinker. So, if insufficient crosslinks are identified with DSS, crosslinking can be repeated with DSG to tighten the restraint, BSP5 to increase the number of restraints, or any number of different crosslinker reactivities that are commercially available151.
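Crosslink satisfaction can be illustrated with a short sketch that uses the distance cut-offs listed in the Figure 4.2 caption (25 Å for DSG, 30 Å for DSS, 50 Å for BSP5). It assumes that the distance is a straight-line distance between one representative atom per residue (for example Cα), which is an assumption of this example rather than a detail given here; the residue numbers and coordinates are hypothetical.

```python
import math

# Distance cut-offs from the Figure 4.2 caption, in angstroms.
CUTOFF_ANGSTROM = {"DSG": 25.0, "DSS": 30.0, "BSP5": 50.0}

def satisfaction(crosslinks, coords, crosslinker):
    """Return (satisfied, mappable) counts for a list of residue pairs.

    coords maps residue number -> (x, y, z); pairs with a residue missing
    from the structure cannot be mapped and are excluded from both counts.
    """
    cutoff = CUTOFF_ANGSTROM[crosslinker]
    mappable = [(a, b) for a, b in crosslinks if a in coords and b in coords]
    satisfied = [(a, b) for a, b in mappable
                 if math.dist(coords[a], coords[b]) < cutoff]
    return len(satisfied), len(mappable)

# Hypothetical coordinates: residue 999 lies in an unresolved region.
coords = {10: (0.0, 0.0, 0.0), 50: (10.0, 0.0, 0.0), 80: (0.0, 40.0, 0.0)}
crosslinks = [(10, 50), (10, 80), (50, 999)]
print(satisfaction(crosslinks, coords, "DSS"))
print(satisfaction(crosslinks, coords, "BSP5"))
```

The same 40 Å pair that violates the DSS cut-off is satisfied under the BSP5 cut-off, which illustrates the trade-off between the number of usable restraints and how tightly each one constrains the model.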

4.3.2 DNA-PKcs Crosslinks can be Identified without Enrichment

The two commonly used crosslink enrichment techniques, SEC and SCX, were evaluated on crosslinked BSA samples before testing on DNA-PKcs. For each of the enrichment strategies, a shift in the TIC to later times is seen in the fractions where the crosslinked peptides should be eluting (early SEC fractions and late SCX fractions), which is consistent with the enrichment of the large crosslinked peptides in these fractions (Figure 4.3). Looking at the number of crosslinks identified in each fraction, it appears that SCX enriched for more crosslinks, but many

Figure 4.1. Crosslinks Identified by Mass Spec Studio Show the Paired Light/Heavy Isotopic Distribution. Representative crosslink identification from MSS199. Top left shows the XIC for the parent ion. Top right shows the MS spectrum with the light and heavy isotopic distributions highlighted with green boxes. Below is the corresponding fragment spectrum, which shows fragments for peptides on each side of the crosslink.

Figure 4.2. Evaluation of the DNA-PKcs Crosslinks. Circos plots, modified from xVis209, show all unique crosslink sites identified for each crosslinker (linker length in brackets). The red bars on the plot indicate missing regions of the DNA-PKcs structure >30 amino acids; crosslinks to these regions cannot be mapped to the structure. The corresponding crosslinks that can be mapped to the DNA-PKcs structure (5LUQ39) are shown to the right. Crosslinks were mapped to the structure with Xlink Analyzer210. The blue bars show crosslinks that satisfy the crosslinker distance (less than 25 Å for DSG, 30 Å for DSS, and 50 Å for BSP5); red bars are crosslinks that are not satisfied. DNA-PKcs is colored in 3 sections: light blue for the N-terminal HEATs, teal for the middle HEATs, and green for the head.

Figure 4.3. Enrichment of BSA Crosslinked Peptides. A) Enrichment of crosslinked peptides with SEC. The top graph shows the absorbance at 260nm as the peptides elute from the SEC column. Boxes show the fractions collected, with the TIC for the corresponding fraction below. B) Enrichment of crosslinked peptides with SCX. The top graph shows the absorbance as the peptides are eluted with increasing salt. The boxes show the fractions collected, with the TIC for the corresponding fraction below. For both A and B, the crosslinks listed are all identifications from MSS and do not represent the unique crosslink sites.

of the crosslinks identified in SCX were repeats. So, when comparing the unique crosslink sites from each, SEC outperforms SCX, with 164 unique crosslink sites being identified from the SEC samples versus 71 unique crosslink sites from SCX. Both techniques did enrich for crosslinks, but more unique ones were identified in the SEC sample, so SEC was chosen as the enrichment technique to be tested on DNA-PKcs samples.
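The distinction between all crosslink identifications and unique crosslink sites can be made concrete with a small sketch. It assumes each identification is recorded as a pair of linked residue positions plus the fraction it came from; that record layout and the example values are hypothetical.

```python
def unique_sites(identifications):
    """Collapse repeated identifications to unique, order-independent site pairs."""
    return {tuple(sorted((site_a, site_b))) for site_a, site_b, *_ in identifications}

# Hypothetical identifications: (residue A, residue B, fraction).
identifications = [
    (12, 45, "F15"),
    (45, 12, "F16"),   # same site pair, reported in the opposite order
    (12, 45, "F16"),   # same site pair, found again in another fraction
    (100, 200, "F15"),
]
print(len(identifications), "identifications,", len(unique_sites(identifications)), "unique sites")
```

Counting unique sites rather than raw identifications prevents a method that repeatedly detects the same few crosslinks across fractions from appearing to outperform one that finds more distinct restraints.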

With any extra separation step there will be sample loss. To evaluate if this additional enrichment step is beneficial for DNA-PKcs crosslink identification, DSS crosslink identifications with and without SEC enrichment were compared. As with the BSA samples, the TICs of the early DNA-PKcs SEC fractions were shifted to later time points, consistent with enrichment of the large crosslinked peptides (Figure 4.4). Comparing the unique crosslink sites identified by SEC (pooling identifications from SEC fractions 15-19) and the non-enriched sample, more crosslinks were identified in the non-enriched sample (Figure 4.5). The underperformance of SEC enrichment could be due to sample loss when using a minimal amount of starting protein, causing low abundance crosslinks to be lost to non-specific interactions with the column. But, even using the minimum amount of protein for SEC, it still consumes 4X the amount of protein required for non-enriched samples. Since many crosslinked peptides are low abundance and near the limit of detection, increasing the sample will help increase crosslink identifications, bringing more crosslinks into the detectable range. In an effort to balance identification with protein consumption, crosslink identification could be increased by starting with more protein for the non-enriched samples, while still consuming less than an SEC sample. Given that many DNA-PKcs crosslinks are identified without enrichment from free peptides, and in the case of limiting amounts of protein more crosslinks are identified without

Figure 4.4. SEC Enrichment of DNA-PKcs Crosslinks. Top graph shows the separation of peptides with the SEC column monitoring the absorbance at 280nm. The boxes show the fractions that were analyzed by MS. Below are the corresponding TICs for each fraction. Crosslinks listed are all crosslinks identified, not unique crosslink sites.

Figure 4.5. Comparison of Unique DSS Crosslinks for DNA-PKcs Identified With and Without Enrichment. The proportional Venn diagram was generated with BioVenn211

enrichment, subsequent DNA-PKcs crosslinking experiments were done without enrichment to conserve protein.

4.3.3 Purification of DNA-PKcs and Ku70/80 for XL-MS

While DNA-PKcs protein can be conserved without the need for crosslink enrichment, the need to buffer exchange meant that for every experiment nearly ¾ of the protein was being wasted on buffer exchange. If multiple states are going to be considered for quantitative crosslinking, continuing with buffer exchange of the DNA-PKcs, and later Ku70/80, would limit the number of samples that could be analyzed with the available DNA-PKcs stock. To ensure there was enough protein for crosslinking analysis of multiple states and replicates, a purification of DNA-PKcs and Ku70/80 was completed ending in a buffer compatible with XL. From the purification, 3.2mg of DNA-PKcs (1.71mg/mL), and 1.35mg Ku70/80 (2.6mg/mL), were isolated (Figure 4.6). From the MS analysis of the purified proteins, the contaminating band seen for Ku70/80 was determined to be Vigilin (Uniprot: Q00341) (Figure 4.6). The contamination in the Ku70/80 should not have a negative effect on the crosslinking as Vigilin is not known to interact with either DNA-PKcs or Ku70/80, but Vigilin was included in the crosslink searches to prevent misidentifications. Though there are no contaminant bands seen for the purified DNA-PKcs (Figure 4.6), by MS there are low levels of Leucine-rich PPR motif containing protein, so to prevent misidentifications it was also included in the crosslink search protein database.

4.3.4 Quantitative Crosslinking of Bead Bound DNA-PK

To test crosslinking on beads, DNA-PK was assembled on 100 bp DNA captured on streptavidin agarose beads. The length of DNA was increased relative to HX pull-downs (section


Figure 4.6. Purification of DNA-PKcs and Ku70/80 from 100 L HeLa Cells. Purification of proteins from the nuclear high salt wash (HSW), through a series of columns to the final purified proteins (right) (WCE = whole cell extract and M = markers). On the right, the first arrow shows the lane containing purified DNA-PKcs. The second arrow shows the lane containing purified Ku70/80.


2.2.2) to allow for assembly of DNA-PK, which fully occupies the 28bp of DNA54. The increased DNA length was selected to prevent non-specific crosslinks between DNA-PK and the immobilized streptavidin. In this first pass at crosslinking DNA-PK on beads, 258 unique DSS crosslinking sites were identified, a striking improvement over the 74 unique sites from the initial DSS crosslinking of DNA-PKcs (Figure 4.3). Several modifications to the crosslinking and MS protocol likely contributed to this increase in crosslink identification. First, only the light version of DSS was used. Since the ability of MSS to correctly identify crosslinked peptides had been validated on DNA-PKcs alone, the light/heavy isotopic pair was no longer needed. Using only the light version of DSS increases the abundance of the crosslinks, as the signal is no longer split between the light and heavy versions. Having only one isotopic distribution also means that more crosslinks can be sampled, as the mass spectrometer does not spend time fragmenting both the light and heavy versions of the same crosslink. Second, charge states 1+ to 3+ were rejected. As expected, the distribution of charge states from a tryptic digest of DNA-PKcs shows that many tryptic peptides still carry a 3+ charge (Figure 4.7). Another member of the lab found that most of the crosslinks present as a 3+ charge state are also found as the 4+ charge state (verbal communication), so rejecting 3+ charge states should not reduce crosslink identifications. With charge states 1+ to 3+ rejected, less time is spent fragmenting tryptic peptides, allowing more crosslinked peptides to be fragmented. Finally, MS was performed on the Orbitrap Lumos instead of the Orbitrap Velos; the improved sensitivity of the Lumos allows more of the low-abundance crosslinks to be detected.

To evaluate whether crosslinking of DNA-PK is amenable to label-free quantification (LFQ)212,213, the reproducibility of the MS measurements of crosslinks was assessed. While LFQ is not the


Figure 4.7. Distribution of Charge States from a Tryptic Digestion of DNA-PKcs. Plot was generated by RawMeat (Vast Scientific).

most precise comparative technique, LFQ will allow easy comparison between many different states without any experimental modifications213,214. For the on-bead crosslinked samples, the reproducibility of label-free quantification was evaluated by comparing the PIC-normalized intensity between two technical replicates. The distribution of the fold change between the PIC-normalized intensities of the two replicates was fit with a Gaussian distribution, which showed a small standard deviation of 0.536 (Figure 4.8). Since these replicates are expected to be the same, this fit defines the normal variance of the MS measurements. This observed spread was used to define a conservative fold-change cut-off for later quantitative comparisons: a crosslink must show a fold change greater than ±3σ (1.61 fold change) to be considered significantly enriched (Figure 4.8).
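The ±3σ threshold follows directly from the reported standard deviation. The short Python sketch below reproduces that arithmetic and shows one simple way the spread between two replicates could be estimated. It assumes the fold changes are expressed on a log2 scale, and it uses the sample median and standard deviation as a stand-in for the Gaussian fit; the function names and the example values are hypothetical.

```python
import statistics

REPORTED_SIGMA = 0.536  # standard deviation of the Gaussian fit reported in the text


def three_sigma_cutoff(sigma):
    """Return the +/-3 sigma threshold on the fold-change scale."""
    return 3.0 * sigma


def replicate_spread(log2_fold_changes):
    """Estimate the center and spread of replicate-vs-replicate fold changes.

    The sample median and standard deviation are used here as a simple
    stand-in for a Gaussian fit (an assumption of this sketch).
    """
    center = statistics.median(log2_fold_changes)
    sigma = statistics.stdev(log2_fold_changes)
    return center, sigma


cutoff = three_sigma_cutoff(REPORTED_SIGMA)
print(f"cut-off = {cutoff:.2f}")  # prints "cut-off = 1.61"
```

A crosslink whose centered fold change falls inside ±1.61 is therefore indistinguishable from replicate-to-replicate variation under this criterion.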

4.3.5 Conclusions

MSS analysis allows for the correct identification of DNA-PKcs crosslinked peptides that agree with the observed DNA-PKcs structure. The MS2-based database reduction in CRIMP-MSS means DNA-PKcs complexes can be expanded without significantly increasing the search time, and different crosslinkers can be used if crosslinking with DSS yields insufficient data for modeling. Comparison of the non-enriched and SEC-enriched samples shows that, at least in the case of DNA-PKcs, enrichment is not absolutely required to detect a large number of crosslinks.

Given the sample loss associated with enrichment, combining crosslink enrichment with enrichment of DNA-bound complexes would not be reasonable, as it could require anywhere from 100-200 µg of protein per crosslinking sample. Since we do not need to rely on enrichment of the crosslinked peptides, we elected to focus enrichment on capturing DNA-bound complexes, as was done for HX, to remove one source of heterogeneity in the XL measurements.

Purification of DNA-PKcs and Ku70/80 into an XL-compatible buffer means 70% less protein


Figure 4.8. Assessment of MS Measurement Deviation. The distribution of the fold change between the normalized XIC area of two crosslinking replicates. Yellow curve is the fit Gaussian distribution with the fit parameters in the top right corner. The solid red line shows the median and the dashed red lines ±3σ.

will be consumed per experiment, as well as providing sufficient stock for a thorough XL analysis of multiple DNA-PKcs complexes. The revised on-bead crosslinking method showed marked improvement in crosslink identifications, as well as reproducibility in the crosslink measurements. This revised method can be applied to the expanding DNA-PKcs complexes, generating a large number of crosslinks that can be used for LFQ comparison of states.


Chapter 5: Integrative Structural Model of DNA-PKcs in the Initial Steps of Non-Homologous End Joining

5.1 Introduction

DNA double strand breaks (DSBs) must be repaired to prevent cell death, but also must be repaired accurately to maintain genome stability. Homologous recombination (HR) is the most error-free mechanism to repair a double strand break, but it is only active when sister chromatids are available as a template, whereas non-homologous end joining (NHEJ) is active throughout the cell cycle. NHEJ involves untemplated ligation. This seemingly simple function requires a complex multi-protein process to maintain the highest fidelity possible. Several processing factors are required, depending on the chemical and structural properties of the break, and the order in which the factors are engaged for repair has a direct impact on the process and outcomes of NHEJ6,95,99. DSBs, regardless of sequence29, are quickly bound by the highly abundant Ku70/80 heterodimer, followed by the recruitment of DNA-dependent protein kinase catalytic subunit (DNA-PKcs) to form the large holoenzyme DNA-PK (~620kDa)40. The recruitment of a DNA-PK to each DNA end prevents resection44 and sets the stage for the formation of a dimeric complex (i.e. the synaptic state) that tethers the broken ends55. The complex also initiates the recruitment of DNA processing factors and ultimately supports ligation of the broken ends23.

The synaptic complex appears to contribute an organizational and gating function to NHEJ. Repair involves an iteration through different processing events that remodel the DNA ends prior to ligation95. The entire process requires several structural factors to present the ends for ligation the instant they are sufficiently processed6,95,99. The synaptic complex consists of matched holoenzymes across the break55 that recruit core NHEJ factors such as X-ray repair cross-complementing protein 4 (XRCC4), XRCC4-like factor (XLF), paralog of XRCC4 and XLF (PAXX), and Ligase IV (LigIV)43,81. In particular, an XLF-XRCC4-LigIV subcomplex65,73,74 participates in scaffolding the break site through interactions with Ku70/8067,72, and the recruitment of PAXX (minimally involving an interaction with Ku7038,58,59) adds to stabilization.

The complexity of the preligation state raises questions regarding the overall organization of components and how processing events are regulated in such a system. Recently, compelling single-molecule FRET studies have supported the existence of long-range and short-range synaptic structures56. The long-range structure maintains the separation of DNA ends by ≥100 Å56. It transitions to a short-range complex that appears to moderate at least some of the processing functions (as well as ligation by LigIV56,96,97). This staging of the synaptic state argues strongly for a well-organized and hierarchical approach to repair, where LigIV activity is held in abeyance until a processing program is organized and initiated. Subsequent studies also appear to favor such a model97, but it is unclear how synapsis can serve an organizational role and coordinate a transition to a ligase-competent short-range state, and indeed why the ends need to be held far apart in a long-range complex in the first place.

Core NHEJ components are involved in the transition56. Specifically, the short-range complex consists of a well-defined ternary complex involving the XLF dimer interacting with two XRCC4-LigIV subcomplexes96. The complex creates an environment where LigIV is held proximal to the break and poised to repair it upon sufficient end-processing96,97. There seems to be no explicit role for DNA-PKcs in ligation, as studies indicate that (auto)phosphorylation triggers its release prior to ligation45,62. Most models of NHEJ repair maintain the presence of DNA-PKcs up to ligation215, but the recent two-stage model suggests that it is retained throughout the process, perhaps undergoing a conformational change to manage the transition56. It is also unclear how end-protection by DNA-PK is relieved as repair proceeds and how this release is coordinated with a requirement for tethering and positioning of processing factors throughout the process.

A structural model of the long-range synaptic complex is required to understand how the system supports end-joining, but studies have proven challenging given its size and structural heterogeneity. Recently, a structure of DNA-PKcs was solved by X-ray crystallography at 4.3 Å resolution, involving 89% of the sequence39. It was extended to include most of Ku70/80 and a short dsDNA sequence using cryo-electron microscopy (EM), at 6.6 Å resolution40. Together, these structures reveal a “head” feature consisting of intertwined FAT (FRAP-ATM-TRRAP), FAT-C-terminal and kinase domains assembled on a large N-terminal solenoid that forms a double ring shape. The DNA is presented in a channel formed by Ku70/80 and the solenoid, involving surprisingly few contacts with the catalytic subunit, but bending the N-terminal HEAT (Huntingtin, Elongation factor, PP2A, subunit-TOR) repeats towards the head structure40.

Attempts have been made using EM to generate structures of a synaptic state with limited results63,216, because of the strong orientational bias of DNA-PK on sample grids. Here, we used mass spectrometry and integrative structure modeling to track the conformational changes in the holoenzyme from DNA binding to activation, producing a model of the long-range synaptic complex at 13.5Å precision. This model presents the two holoenzymes in a head-to-head orientation with a considerable offset. It rationalizes key elements of the transition to a short-range repair complex, identifies a new plug domain with a role in regulating the transition, and localizes unassigned structural elements (Ku80 C-terminal and PAXX).


5.2 Methods

5.2.1 Protein Production and 2kb DNA Preparation

DNA-PK Purification

DNA-PKcs and Ku70/80 were purified as described in section 4.2.3.

PAXX Purification

Human PAXX (C9orf142, Accession number NM_183241), expressed and purified by S. Fang in the Lees-Miller lab and L. Lee in the Schriemer lab, was amplified from a HeLa cell cDNA library and cloned into pGEX6P1 vector between BamH I/Xho I sites and transformed into E. coli (BL21). Cells were induced with IPTG (0.2 mM) for 10 hours, lysed, clarified and extracted with Glutathione Sepharose 4B. Bound protein was eluted with glutathione (20 mM), the GST tag removed with PreScission protease (GE Healthcare) and the cleaved product polished on a HiTrap Heparin HP column. Purity was confirmed by SDS-PAGE and Western blot.

Biotinylated 2kb Preparation

Internally biotinylated substrates were prepared following Graham et al. (2016)56, with slight modifications. For clarity, the method is diagrammed in Figure 5.1 and all DNA constructs used (UCDNA Synthesis Lab, unless stated otherwise) are shown in Table 5.1. Using pET28a as the initial template, a specified tandem cut-site was inserted that would allow for the incorporation of an internal biotin. To add the cut-site, two primers were designed with the desired cut-site sequence (LSO78R and LSO77F), along with corresponding primers for the pET28a plasmid (LSO71F and LSO72R). These primers were used to generate two polymerase chain reaction (PCR) products, PCR1 and PCR2, which were ~1 kb long and contained the cut-site sequence at the end. Then using gel purified (Geneaid gel/PCR DNA extraction kit) PCR1 and PCR2 as


Figure 5.1. Diagram of Biotinylated 2kb Preparation. The red section represents the desired sequence to be added to the plasmid. The added sequence contains the tandem cut-sites for insertion of LSO73, the ssDNA with an internal biotin (small black circle).


Table 5.1. DNA Constructs.

DNA | Sequence*
LSO71F | CGGAACATTAGTGCAGGCAGC
LSO72R | GCGTAATGGCTGGCCTGTTG
LSO73 | /5Phos/TGAGGGATATCGAA/iBiodUK/TCCTGCAGGC
LSO76F | TGAGGGATATCGAATCCTGCAGGC
LSO77F | TGAGGGATATCGAATCCTGCAGGCTGAGGACCACCACCACCACTGAGATC
LSO78R | CCTCAGCCTGCAGGATTCGATATCCCTCAGCGACCCATTTGCTGTCCAC
28bp | CTCAGGCGTTGACGACAACCCCTCGCCC
28bp-Biotin | Biotin GGGCGAGGGGTTGTCGTCAACGCCTGAG

*Sequences written 5` to 3`. iBiodUK is the internal biotin modification.

templates, PCR with LSO71F and LSO72R was completed to generate a 2 kb piece of DNA with the desired cut-sites in the middle, PCR3. PCR3 was then gel purified (Geneaid gel/PCR DNA extraction kit). To insert the cut-site sequence into pET28a, both pET28a and PCR3 were digested with BclI and BlpI (New England BioLabs), then ligated, transformed into DH5α bacteria and plated on 2YT with Kam50. Uptake of the correct plasmid was confirmed with colony PCR, using primers LSO76F and LSO72R. Colonies with successful PCR were grown up and a plasmid prep completed (GenElute Plasmid Miniprep Kit, Sigma). Resulting plasmids were sequenced (University of Calgary DNA Sequencing) to ensure correct insertion of the cut-site. The new plasmid, with the tandem cut-site, is hereafter referred to as PSD108.

To prepare the 2 kb DNA, PSD108 was used as a template for PCR with LSO71F and LSO72R. The resulting 2 kb DNA was digested with Nb.BbvCI (New England BioLabs) for 2 h at 37°C, which cuts at the tandem cut-site, leaving a ssDNA segment for addition of the internally biotinylated ssDNA LSO73 (Integrated DNA Technologies). After digestion, a 10X excess of LSO73 was added, and the sample was heated to 80°C for 20 min to denature Nb.BbvCI, then slowly cooled to allow annealing of LSO73. LSO73 was then ligated into the plasmid with T4 DNA ligase (New England BioLabs) overnight. The final DNA product was cleaned up with an E.Z.N.A Cycle-Pure Kit (Omega Bio-Tek), and concentration/purity measured by Nanodrop (Thermo Scientific).

5.2.2 Complex Formation and Isolation

Complexes to be studied by nano-spray HX were DNA-PK, DNA-PK with AMP-PNP (adenylyl-imidodiphosphate), and DNA-PK with AMP-PNP and PAXX. The complexes to be studied by XL were the non-synaptic states (DNA-PK, and DNA-PK with AMP-PNP) and the synaptic states (DNA-PK, DNA-PK with AMP-PNP, DNA-PK with PAXX, and DNA-PK with PAXX and AMP-PNP). DNA-PKcs complexes were assembled as in section 2.2.2 for HX analysis, and section 4.2.4 for XL-MS analysis, with minor modifications. First, complexes were assembled on larger DNA constructs. A 100bp DNA was used for the HX samples and non-synaptic XL samples (Table 4.1). For the synaptic XL samples, the internally biotinylated 2kb DNA described above was used (section 5.2.1). Next, since more than a single protein was being added to the DNA, all the purified proteins were combined before adding to the captured DNA, with DNA-PKcs and Ku70/80 at equimolar amounts and PAXX in excess. Finally, for the AMP-PNP samples, 1mM AMP-PNP was included in the DNA binding buffer, protein buffer, and the wash buffer.

5.2.3 Hydrogen Deuterium Exchange Mass Spectrometry

A peptide map for DNA-PK and PAXX was generated as in section 2.2.2. The only change was that the data were also searched with PEAKS 8.5, using the same parameters as the Mascot search, and all peptide identifications (PEAKS and Mascot) were combined. The nano-spray HX-MS analysis of the expanded DNA-PKcs complexes (DNA-PK, DNA-PK with AMP-PNP, and DNA-PK with AMP-PNP and PAXX) proceeded as in section 2.2.2. The only modifications were the removal of the BX peptides from all buffers and the addition of 1 mM AMP-PNP to the labelling buffer for AMP-PNP-containing samples. For the differential HX analysis of DNA-PKcs binding to DNA, the data from section 2.2.2 were used. The final deuterium measurements for all data were corrected using a set of 10 peptides that showed no change upon binding, as in section 2.3.3.

5.2.4 DNA Footprinting

DNA footprinting of the DNA-PK complex on two different DNA substrates, short and long, proceeded as in section 3.2.1, with a single control and crosslink replicate for each substrate. The short DNA substrate was a 28bp double strand DNA (UCDNA Synthesis Lab, Table 5.1); the long DNA substrate was the 2kb DNA described above (section 5.2.1) without incorporation of the internal biotin. Selectively enriched peptides were determined using the criteria defined in section 3.3.1: a 98% intensity cut-off and an enrichment cut-off. Footprinting data presented for DNA-PKcs binding DNA is from section 3.2.1, with the intensity and enrichment cut-offs applied.

5.2.5 Negative Stain Electron Microscopy

DNA-PKcs (5 pmol) and Ku70/80 (2.5 pmol) were combined with 2 kb DNA (2.5 pmol) in DNA binding buffer +/- 1 mM AMP-PNP. For PAXX samples, 7.5 pmol of PAXX was included. Complexes were equilibrated on ice for 15 min, diluted 1:5 with DNA binding buffer, and 3 µL of sample was immediately applied to a glow-discharged carbon film 400 mesh copper grid (Electron Microscopy Sciences, Hatfield PA). The sample was washed twice with water, then stained with 2% uranyl acetate. Excess stain was blotted off and the grids were allowed to dry before imaging on a 120 kV Talos microscope with a BM-Ceta camera (Thermo Scientific).

5.2.6 Crosslinking Mass Spectrometry

Crosslinking of the bead-bound complexes (section 5.2.2) was completed as in section 4.2.4, with minor modifications as follows. For the PAXX-containing samples, crosslinking was also done at concentrations of 1:3 and 1:6 (lysine:DSS). In addition to analysis on the nanoLC Orbitrap Lumos (Thermo Scientific), each sample was analyzed on a nanoLC Orbitrap Velos to saturate crosslink identifications. Parameters for the nanoLC Orbitrap Velos were: separation of peptides with a 60 minute gradient of 5-30% Orbitrap solvent B, MS1 resolution 60000, MS2 resolution 7500, and triggering of the top 12 peaks for fragmentation with HCD (NCE 35%), rejecting charge states 1-3. Crosslink quantitation proceeded as in section 4.2.4, comparing non-synaptic DNA-PK to synaptic DNA-PK, non-synaptic DNA-PK with AMP-PNP to synaptic DNA-PK with AMP-PNP, and synaptic DNA-PK to synaptic DNA-PK with PAXX. Crosslinks that were not identified in both states were quantified with a match-between-runs approach213: a crosslinked peptide list was created, the area under the XIC was measured with the HX app in MSS (V1), and the fold change was calculated as before. Crosslinks with no corresponding feature in the given comparison were assigned a maximal log2(ratio) of 9. After adjusting for the center of the fold change distribution, the previously determined ±3σ cut-off (section 4.3.4) was applied to each comparison to identify the enriched crosslinks in each state.
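The comparison logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the MSS implementation: the center of the distribution is taken as the median of the measured log2 ratios, crosslinks missing from one state keep the ±9 sentinel without centering, and the function names, labels and example areas are hypothetical.

```python
import math
import statistics

MAX_LOG2_RATIO = 9.0   # value assigned when one state has no matching feature
CUTOFF = 3 * 0.536     # +/-3 sigma from the replicate analysis (section 4.3.4)


def log2_ratio(area_a, area_b):
    """Return log2(B/A), limited to +/-9 and set to +/-9 if one area is zero."""
    if area_a > 0 and area_b > 0:
        ratio = math.log2(area_b / area_a)
        return max(-MAX_LOG2_RATIO, min(MAX_LOG2_RATIO, ratio))
    if area_b > 0:
        return MAX_LOG2_RATIO
    if area_a > 0:
        return -MAX_LOG2_RATIO
    raise ValueError("crosslink has no signal in either state")


def classify(xic_areas):
    """Map each crosslink id to 'enriched_in_a', 'enriched_in_b' or 'unchanged'."""
    ratios = {xl: log2_ratio(a, b) for xl, (a, b) in xic_areas.items()}
    measured = [r for r in ratios.values() if abs(r) < MAX_LOG2_RATIO]
    center = statistics.median(measured) if measured else 0.0
    calls = {}
    for xl, ratio in ratios.items():
        # Sentinel ratios are left uncentered (assumption of this sketch).
        shifted = ratio if abs(ratio) >= MAX_LOG2_RATIO else ratio - center
        if shifted > CUTOFF:
            calls[xl] = "enriched_in_b"
        elif shifted < -CUTOFF:
            calls[xl] = "enriched_in_a"
        else:
            calls[xl] = "unchanged"
    return calls


# Hypothetical normalized XIC areas: (state A, state B) per crosslink.
example = {
    "xl1": (100.0, 100.0),
    "xl2": (100.0, 800.0),
    "xl3": (100.0, 0.0),
    "xl4": (100.0, 110.0),
    "xl5": (100.0, 90.0),
}
calls = classify(example)
```

In this toy data set only xl2 (eight-fold higher in state B) and xl3 (absent from state B) pass the cut-off; the remaining crosslinks fall within replicate variation.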

Inter-dimer crosslinks were quantified using Xcalibur Qual Browser (V2.2, Thermo Scientific). An XIC for the monoisotopic mass of the crosslinked peptide was generated, then the MS1 spectrum was extracted for the entire XIC peak width. The area under the isotopic peaks was exported and summed. For one crosslink, one of the isotopic peaks overlapped with another peptide, so that peak was excluded from the sum. The summed isotopic intensity was normalized by dividing by the PIC for that replicate.
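As a minimal sketch of this normalization, the Python function below sums the exported isotopic peak areas, optionally skips peaks flagged as overlapped, and divides by the PIC of the replicate. The function name, argument names and example numbers are hypothetical; the actual values were exported from Xcalibur.

```python
def normalized_crosslink_intensity(isotope_peak_areas, pic, excluded=()):
    """Sum isotopic peak areas (skipping excluded indices) and divide by the PIC.

    isotope_peak_areas: areas of the isotopic peaks, monoisotopic peak first.
    pic: normalization value for the replicate.
    excluded: indices of peaks overlapped by another peptide.
    """
    if pic <= 0:
        raise ValueError("PIC must be positive")
    kept = [area for i, area in enumerate(isotope_peak_areas) if i not in excluded]
    return sum(kept) / pic


# Hypothetical example: the third isotopic peak (index 2) is overlapped.
value = normalized_crosslink_intensity([4.0, 3.0, 2.0, 1.0], pic=10.0, excluded={2})
```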

5.2.7 Integrative Structure Modeling

Integrative structure determination proceeded through four stages105,106: (1) gathering data, (2) representing subunits and translating data into spatial restraints, (3) sampling of structural components to produce an ensemble of structures that satisfy the restraints and (4) analyzing and validating the ensemble structures and data. The integrative structure modeling protocol (stages 2, 3 and 4) was scripted using the Python modeling interface (PMI), which is a library for modeling macromolecular complexes based on the open-source integrative modeling platform (IMP) version 2.11.0 (https://integrativemodeling.org). Analysis of ensembles was performed using the PMI_analysis (https://github.com/salilab/PMI_analysis) and imp-sampcon (https://github.com/salilab/imp-sampcon) libraries. The specific procedures detailed below are an updated version of previously described protocols106,217.

Gathering Data

In stage 1, the set of data and information to be used in modeling was gathered. The cryo-EM structure 5Y3R supplied atomic coordinates for the majority of DNA-PKcs and the N-terminal domains of Ku70 and Ku8040 and localized the binding interfaces of Ku70 and Ku80 on DNA-PKcs. Additional structures for the C-terminal globular domains were obtained from structures 1JJR (Ku7031) and 1RW2 (Ku8032). Chemical cross-links were collected on six constructs of the synaptic complex, as described above.

Representing Subunits and Translating Data into Spatial Restraints

In stage 2, the molecular representation of the system components was constructed, and the data translated into spatial restraints to score alternative structures. The molecular representation of the synaptic complex components must be sufficiently precise to allow for an accurate definition of spatial restraints given the data and biological interpretation of the model, yet sufficiently simple for efficient sampling of alternative models. This general guidance was applied to the representation of the synaptic complex in two ways. First, we represented the complex components with atomic models in a multi-scale fashion, using 1-10 residues per bead. Second, components of the complex for which atomic models are available were treated as rigid bodies during structural sampling, within which the distances between pairs of beads are fixed. The rigid body comprising the components of the cryo-EM structure fixes the intra-holoenzyme interaction interfaces among Ku70, Ku80, and DNA-PKcs. All rigid bodies were consistent with the crosslinks between residues within the rigid bodies. Sequence segments missing from the atomic structures were represented by flexible strings of beads.

With this representation, input information from Stage 1 was translated into spatial restraints, which assess the parsimony between a piece of information and the structures of the components defined by the model representation. The following restraints were defined:

1) Excluded volume restraints were applied to the largest bead representation for each residue by applying a harmonic potential to the distance between sphere surfaces of close beads as described previously111. Each bead radius is determined via a statistical measure based on the total mass of the residues the bead represents110.

2) Sequence connectivity restraints were applied to all consecutive beads in a molecule using a harmonic upper bound on the distance between the bead centers. The center of the harmonic potential is 3.6 Å times the number of residues between the first residue of the N-terminal bead and last residue of the C-terminal bead.

3) DSS chemical crosslinks were converted into distance restraints between crosslinked residues, relying on a Bayesian scoring function131 using a cross-linker arm length of 35 Å. The restraint was formulated to consider ambiguity of the cross-linked sites due to the presence of two copies of each protein in the system (Copy1 and Copy2). For each observed cross-linked residue pair, four alternative assignments were constructed: the intra-molecular pairs (Residue1::Copy1--Residue2::Copy1 and Residue1::Copy2--Residue2::Copy2) and inter-molecular pairs (Residue1::Copy1--Residue2::Copy2 and Residue1::Copy2--Residue2::Copy1). An “ambiguous” crosslink restraint was then evaluated against the model structure by multiplying the scores for all alternative restraint assignments134. A pair of crosslinked peptides that contain the same residue were treated as unambiguously inter-molecular and for these crosslinks (4085-4085 and 4084-4085 in DNA-PKcs), only the inter-molecular assignments were considered.

4) Differential HX data that support areas of interaction between PAXX and Ku70 were used to create a restraint between the two molecules. An upper-harmonic potential with a mean of 4 Å and spring constant of 0.05 Å was used to restrain the minimum distance between the set of all PAXX residues and the set of residues in Ku70 that exhibited differential HDX (residues 413-442, 462-476 and 494-528).
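Two ingredients of the restraints listed above lend themselves to a short illustration: the harmonic upper bound used for sequence connectivity and for the PAXX-Ku70 HX restraint, and the enumeration of copy assignments for an ambiguous crosslink. The Python sketch below is independent of IMP and is not the PMI implementation. The 0.5 prefactor is a convention chosen for this sketch, the residue count between beads is read here as a difference in residue numbers, and all function names are hypothetical.

```python
import itertools


def harmonic_upper_bound(distance, center, k=1.0):
    """Penalty that is zero up to `center` and grows quadratically beyond it."""
    excess = max(0.0, distance - center)
    return 0.5 * k * excess ** 2


def connectivity_center(first_residue, last_residue, per_residue=3.6):
    """Upper-bound center in Angstrom: 3.6 A per residue spanned by two beads."""
    return per_residue * (last_residue - first_residue)


def ambiguous_assignments(res1, res2, copies=(1, 2)):
    """All (residue, copy) pairings for a crosslink in a two-copy system."""
    return [((res1, c1), (res2, c2))
            for c1, c2 in itertools.product(copies, repeat=2)]


def intermolecular_only(assignments):
    """Keep only assignments that link two different copies."""
    return [a for a in assignments if a[0][1] != a[1][1]]


# HX-style restraint from the text: mean 4 A, spring constant 0.05.
hx_penalty = harmonic_upper_bound(distance=6.0, center=4.0, k=0.05)

# Four alternatives for a generic crosslink, two for a same-residue crosslink.
all_pairs = ambiguous_assignments(100, 200)
inter_pairs = intermolecular_only(ambiguous_assignments(4085, 4085))
```

Because the penalty vanishes below the center, such a restraint only discourages configurations in which the restrained distance is too long and leaves closer contacts unpenalized.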

Sampling

In stage 3, with the scoring function and representation in hand, alternative structures of the synaptic complex were sampled. These structures were generated using Markov-Chain Monte Carlo (MC) enhanced by replica exchange, as implemented in IMP, for the set of movable objects as defined in the system representation. The MC moves included a random translation and rotation of rigid bodies with a maximum of 4 Å and 0.04 radians, respectively, per step, and random translations for each flexible bead of a maximum of 4 Å per step. Each of the six samples was modeled as a separate system. To model each system, 30-50 independent simulations, each using 16 replicas with reduced temperature values equally distributed between 1.0 and 2.5, were initiated. Each simulation began by translating each rigid body and flexible bead by a random vector of up to 150 Å in magnitude. The positions of flexible bead coordinates were then optimized using steepest descent minimization for 500 steps, with a maximum displacement of 2.0 Å per step. Subsequently, a burn-in phase of 10,000 MC steps followed by 2M MC steps in each production run was generated. Snapshots from production runs were saved every 10,000 frames, resulting in 20,000 frames per run and ensembles of 600,000 to 1,000,000 structures produced per sample. For modeling of the synaptic complex, each independent simulation required 4-10 days on a single 16-core Intel Xeon processor with speeds between 2.3 and 2.8 GHz.
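The replica temperatures and the ensemble sizes quoted for the sampling stage follow from simple arithmetic, sketched below in Python. The sketch assumes that "equally distributed" means linearly spaced temperatures and takes the stated 20,000 saved frames per run at face value; the function names are hypothetical and no IMP calls are involved.

```python
def temperature_ladder(t_min=1.0, t_max=2.5, n_replicas=16):
    """Linearly spaced reduced temperatures for replica exchange."""
    step = (t_max - t_min) / (n_replicas - 1)
    return [t_min + i * step for i in range(n_replicas)]


def ensemble_size(n_runs, frames_per_run=20_000):
    """Number of saved structures for a sample with n_runs independent runs."""
    return n_runs * frames_per_run


ladder = temperature_ladder()
smallest = ensemble_size(30)   # 600,000 structures
largest = ensemble_size(50)    # 1,000,000 structures
```

With 16 replicas spanning 1.0 to 2.5, neighboring replicas differ by 0.1 in reduced temperature.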

Analyzing and Validating the Models

Finally, in stage 4, the hundreds of thousands of models generated for each of the six samples were analyzed for convergence and clustered. Each independent simulation was first analyzed for equilibration of score values (total score, crosslinking score, and excluded volume score). Only structures generated after equilibration of all values were considered further, reducing the population by 40-60%, resulting in 1,176,638 post-equilibrium structures over all six samples. This set of structures was then analyzed for sampling precision based on RMSD clustering using pyRMSD218, as described previously219. Because of the large size of each model structure and the large number of structures, performing pairwise RMSD calculations on the entire set of models was not feasible. Instead, post-equilibration models for each of the six states were split into two independent groups (by random assignment of the 30-50 independent simulations into the A group or B group) and each group split into equal-sized subsets such that the combination of the A and B subsets contained ~2,000 models. The pairwise RMSD and sampling precision for each A and B subset pair were then calculated and models clustered at a threshold equal to the computed sampling precision. Across all subsets in all six samples, the number of clusters and cluster populations were similar, indicating that the computed subset sampling precision estimates were a fair representation of the population sampling precision, which averaged ~30 Å.
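The splitting scheme used for the sampling-precision estimate can be sketched as follows. This is an illustrative Python outline, not the imp-sampcon code: runs are shuffled and divided evenly into A and B groups, and the models of each group are dealt into the same number of subsets so that each A/B subset pair holds roughly 2,000 models. The function names, the even split and the fixed seed are assumptions of this sketch.

```python
import random


def split_runs(run_ids, seed=0):
    """Randomly assign independent simulations to group A or group B."""
    rng = random.Random(seed)
    shuffled = list(run_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def make_subset_pairs(models_a, models_b, pair_size=2000):
    """Deal models into paired A/B subsets of about `pair_size` models in total."""
    n_pairs = max(1, round((len(models_a) + len(models_b)) / pair_size))
    subsets_a = [models_a[i::n_pairs] for i in range(n_pairs)]
    subsets_b = [models_b[i::n_pairs] for i in range(n_pairs)]
    return list(zip(subsets_a, subsets_b))


# Hypothetical example: 40 runs and 10,000 post-equilibration models.
group_a, group_b = split_runs(range(40))
pairs = make_subset_pairs(list(range(5000)), list(range(5000, 10000)))
```

Each pair can then be clustered independently, and agreement in cluster number and population across pairs indicates that the subset estimates are representative of the full ensemble.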

To represent subsets equally within the clustered models, each subset was then clustered at 30 Å, resulting in 2-5 clusters per subset. The centroid structure from each subset cluster was extracted, resulting in 2062 model structures from the combined six samples. These combined structures were then clustered based on the pairwise RMSD of their DNA-PKcs coordinates using hierarchical linkage clustering as implemented in the Python library scikit-learn (v0.20.3)220. This clustering resulted in seven sets of models. Four of these sets represented less than 10,000 models of the equilibrated set of ~1.2 million and were thus ignored. The remaining three clusters represented 399,500, 142,223 and 656,934 models and were analyzed further as described in our results. Localization densities were used to represent the final solutions, as described108.

5.2.8 Localizing Ku70/80 termini and PAXX

To position the Ku80 C-terminal region in the DNA-PK structure, HADDOCK docking based on crosslinks identified between Ku80 (residues 594-732) and DNA-PKcs (section 5.2.6) was performed. A modified structure for the Ku80 C-terminal region32 (PDB 1RW2) without the variable N-terminal region in 1RW2 was used. The C-terminal 22 amino acid residues were added to the 1RW2 structure as a disordered peptide, and optimized with ModLoop221, retaining the structure with the best score. The resulting refined C-terminal region was docked to the DNA-PKcs structure (PDB 5Y3R) with the HADDOCK webserver222; the crosslinks were added as unambiguous restraints, and the added C-terminal residues were considered as fully flexible. The top scoring model from docking the Ku80 C-terminal region to DNA-PK was added to the DNA-PK structure. Using Modeller, the linker region between the Ku80 core domain and the C-terminal region was added to the structure, and optimized with ModLoop221,223, keeping the best scoring model. The stabilizations in DNA-PK upon binding to PAXX (section 5.2.3) were then mapped to this new DNA-PK structure with full length Ku80. Based on the HX stabilizations, the crosslink between the Ku80 core and PAXX (section 5.2.6), and observed space complementarity, PAXX was manually placed in the DNA-PK model. To position the Ku70 SAP domain in the DNA-PK synaptic model, the localization densities were evaluated for their proximity to DNA, and the Ku70 SAP was manually placed in the model.

5.3 Results

5.3.1 Assembling the DNA-PK Holoenzyme for Structural Mass Spectrometry

We reconstituted a series of complexes using DNA-PKcs and Ku70/80 (Figure 5.2). Biotinylated dsDNA was used to capture DNA-PKcs, Ku70/80 plus DNA-PKcs (thus forming the holoenzyme), nucleotide-bound holoenzyme, and finally PAXX-stabilized holoenzyme. By controlling the dispersion of the biotinylated capture DNA with excess free biotin, these pull-downs model one-ended DNA breaks of various lengths: 25 bp (blunt-end and overhang) and 100 bp (blunt-end). Loading methods favored strong enrichment of the intended states and protein capture was confirmed by Western blot (Figure 5.2). Each state was analyzed by nanoHX-MS161 and excellent sequence coverage was obtained (Table 5.2). Differential HX-MS experiments were conducted (e.g., free DNA-PKcs vs. DNA-bound DNA-PKcs) to determine the conformational transitions associated with stepwise assembly of the holoenzyme. The peptides with significant changes in deuteration were mapped to the DNA-PK cryo-EM structure (PDB 5Y3R)40 (Appendix B). Figure 5.3A provides a reference to aid in the assessment of deuteration changes. Approximately 11% of the holoenzyme sequence is not resolved in available structures40, but as MS techniques can detect these regions, they are mapped on the structure as bars, scaled to their length. Especially noteworthy is a long 198-residue loosely-structured segment containing the ABCDE phosphorylation sites (residues 2609-2647) that regulates the interaction of DNA-PKcs with Ku70/80 and promotes DNA end-processing46,51,82. Both Ku70 and Ku80 have smaller structured C-termini connected to the heterodimer through short disordered


Figure 5.2. Complex assembly for nanoHX-MS and XL-MS.

(A) SDS-PAGE of human PAXX purified from E. coli cell lysate. The arrow indicates the purified PAXX used for subsequent experiments. Western blot analysis of protein/protein complexes pulled down with streptavidin agarose for B) HX-MS and C) XL-MS analysis.


Table 5.2. Summary of the Differential HX-MS Binding Experiments.

Data Set                                            DNA-PKcs Binding        DNA-PK Binding          Nucleotide Loaded DNA-PK
                                                    Ku70/80                 AMP-PNP                 Binding PAXX

Reaction Details                                    Labelled with 45% D2O,  Labelled with 45% D2O,  Labelled with 45% D2O,
                                                    pH 7.5, on beads        pH 7.5, on beads        pH 7.5, on beads
                                                    ± Ku70/80               ± AMP-PNP               ± PAXX
Labelling Time (min)                                5                       5                       5
Replicates for Each State                           4 Biological            4 Biological            4 Biological
Number of Peptides                                  803                     1863                    1809
Sequence Coverage (%)                               77                      91                      91
Average Peptide Length                              11.8                    11.9                    11.9
Redundancy                                          2.2                     3.8                     3.7
Repeatability (average standard deviation, PCD)     0.89                    0.82                    0.97
Significant Differences in Deuterium Uptake (PCD)   2.79                    2.09                    2.07

Table as per community guidelines122.
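The peptide-level summary statistics reported in Table 5.2 can be illustrated with a minimal sketch. The function name `coverage_stats`, the toy peptide list, and in particular the definition of redundancy used here (total peptide residues divided by the number of covered residues) are assumptions made for illustration only; the definitions behind the table follow the community guidelines122 and may differ, for example in how non-exchanging residues are treated.

```python
def coverage_stats(peptides, sequence_length):
    """Summarize a peptide map.

    peptides: list of (start, end) residue numbers, 1-based and inclusive.
    sequence_length: total number of residues in the protein(s) analyzed.
    """
    covered = set()        # residues seen in at least one peptide
    total_residues = 0     # residues summed over all peptides (with overlap)
    for start, end in peptides:
        covered.update(range(start, end + 1))
        total_residues += end - start + 1
    return {
        "n_peptides": len(peptides),
        "coverage_pct": 100.0 * len(covered) / sequence_length,
        "mean_length": total_residues / len(peptides),
        # Assumed definition: average number of peptides per covered residue.
        "redundancy": total_residues / len(covered),
    }


# Toy example with made-up peptides on a hypothetical 20-residue sequence.
stats = coverage_stats([(1, 10), (5, 12), (11, 16)], sequence_length=20)
print(stats)
```

With real data, the peptide list would come from the HX-MS peptide map of each state, and the sequence length would be the combined length of the proteins in that state.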


Figure 5.3. DNA-PK is a complex conformational switch. (A) Orientation of the DNA-PK (5Y3R) complex, complete with linked Ku70 (1JJR) and Ku80 (1RW2) C-terminal structures. Boxes represent missing regions of structure greater than 30 amino acid residues. The figure presents the head, consisting of a kinase domain encased within the FAT/FAT-C domains. The middle HEATs together with the N-terminal HEATs form loosely concentric rings that present a cavity in the base of DNA-PKcs. DNA binds under the N-terminal HEAT and protrudes into the cavity. Coloring is preserved for all HX figures, except for the head (all steel blue for clarity). Differential HX analysis of (B) DNA-PKcs upon binding to DNA, (C) DNA-PKcs upon binding to Ku70/80 and DNA, (D) DNA-PK upon binding to AMP-PNP, and (E) DNA-PK upon binding to PAXX. Peptides destabilized upon binding are shown in red. Peptides that are stabilized upon binding are shown in blue. Expanded analyses are shown in Figures 5.4, 5.6-8, and a list of peptides analyzed for each binding event, with PCD uptake values and change in deuterium statistics, can be found in Appendix B.

regions29,31,32. Crystallographic analysis suggests that elements of the Ku80 C-terminus may interact with DNA-PKcs near the PQR domain39,40, which is another site of phosphorylation thought to regulate progression to ligation46.

5.3.2 DNA flexes the arm and is constrained by a plug domain

We first explored the effect of DNA binding upon DNA-PKcs. The changes in deuteration are well-defined, with the most prominent change found in the N-terminal HEAT repeats (Figure 5.3B, Figure 5.4), which have an arm-like morphology. The pattern of change is striking: a large stabilization in the first 450 residues is coupled with an even larger destabilization in residues 363-388. The stabilizations at the N-terminus agree with the biochemical evidence of binding224 and the known DNA binding site in the holoenzyme structure (PDB 5Y3R)40. The major difference between this structure and the DNA-PKcs structure (PDB 5LUQ)39 is a rotation of the N-terminal HEAT repeats toward the FAT domain, involving a “switch point” at residue 382 of DNA-PKcs in the elbow (Figure 5.3A)40. The HX data indicate that DNA binding alone is enough to induce this arm rotation (“flexing”): the inner elbow is stabilized and the outer elbow is destabilized (Figure 5.3B). Ku70/80 is not required to induce this conformational change. Arm displacement appears to propagate to the very N-terminal end of the domain. Two patches of stabilization are seen, consistent with the end of the arm knuckling into the FAT domain. This deuteration pattern occurs independent of the type of DNA used to capture the protein (either blunt-end or overhang, Figure 5.4).

Interestingly, stabilizations in the arm are not the only ones that occur upon binding to DNA. Lower-magnitude stabilizations were also detected in the middle HEAT domain, at the base of DNA-PKcs across from the DNA binding site and flanking the large disordered region (Figure 5.3B, left). These additional stabilizations could represent induced conformational changes, but they may also indicate novel, secondary DNA binding sites. To explore these possibilities further we conducted DNA footprinting experiments, which return binding-site information at a peptide level of resolution (Figure 5.5). We identified peptides in the N-terminal arm as expected. In addition, we also identified peptides that form an extended interaction surface, including the central disordered region (residues 2577-2773) and part of the FAT domain. The map is consistent with most of the stabilizations seen in the HX data. The extended interaction site is once again independent of the nature of the DNA used, whether blunt or overhang. Furthermore, when the DNA is constrained through the addition of Ku70/80, the same regions are footprinted as well as the linker between the structured regions in Ku80. While it is difficult to localize the extended binding sites with precision, the footprint is clearly confined to one face of DNA-PKcs, consistent with the electrostatic potential on the surface of the protein (Figure 5.5). The engagement of the large disordered domain (a.a. 2577-2773) suggests it “plugs” the center of the middle HEAT domain, to block the DNA.

There are other minor changes in deuteration upon DNA binding. One involves a long-range destabilization in the kinase near the activation loop. It suggests that DNA binding, either indirectly through the motion of the N-terminal arm or more directly through an extended DNA binding channel, can influence the kinase domain. To explore this allosteric effect further, we conducted a nanoHX-MS analysis on the entire holoenzyme.

5.3.3 An allosteric pathway between DNA binding sites and the kinase domain

The addition of Ku70/80 induces a widespread conformational response in DNA-PKcs (Figure 5.3C, Figure 5.6). The changes in deuteration are so extensive that interpretation beyond the domain level is challenging. Ku binding induces conformational changes throughout the molecule, including the kinase domain ~80 Å away. As expected, Ku strongly reduces


Figure 5.4. NanoHX-MS evaluation of DNA-PKcs binding to DNA. Changes in DNA-PKcs conformational status upon binding to (A) blunt 25bp DNA and (B) overhang DNA, 25bp with a 15nt tail. Volcano plots (left figure) and the corresponding Woods plot (right figure) highlight peptides with significant changes in deuteration, determined as previously described155. Significantly deprotected peptides are shown in red and significantly protected peptides are shown in blue. Peptides are mapped to the DNA-PKcs structure with Ku70/80 hidden (5Y3R40, bottom figure). Regions of structure not represented in the HX map are colored grey; coloring otherwise as in Figure 5.3.


Figure 5.5. DNA foot-printing. Peptides selectively enriched after formaldehyde-based crosslinking, DNA-peptide chimera precipitation, and release. (A) DNA-PKcs on 25bp blunt DNA. (B) DNA-PKcs on overhang DNA (25bp with a 15 nucleotide overhang). (C) DNA-PK on 28bp blunt DNA. (D) DNA-PK on 2kb DNA. All significantly detected peptides are shown in purple. The red region highlights selectively enriched peptides that match to a proposed RNA binding site on DNA-PK225. (E) DNA-PK electrostatic potential surface, calculated with PDB2PQR179,180 and APBS (Adaptive Poisson-Boltzmann Solver)181 in Chimera, where blue represents positive polarity and red negative polarity. Orientation and structural layout as in Figure 5.3. DNA-PKcs40 is represented as a single color (teal).


Figure 5.6. NanoHX-MS evaluation of DNA-PKcs binding to Ku70/80. Changes in DNA-PKcs conformational status upon binding to Ku70/80. Volcano plot (left) and the corresponding Woods plot (right) highlight peptides with significant changes in deuteration, determined as previously described155. Significantly deprotected peptides are shown in red and significantly protected peptides are shown in blue. Peptides are mapped to DNA-PK structure40 (5Y3R, bottom figure). Regions of structure not represented in the HX map are colored grey; coloring otherwise as in Figure 5.3.

deuteration at the binding interface with DNA-PKcs, which comprises elements of the N-terminal arm and middle HEAT repeats. Additional conformational changes are seen in the elbow of the N-terminal arm, consistent with an arm flexion that is “locked in” upon Ku binding.

We used a slightly longer piece of DNA (100bp) to avoid restricting the formation of the complex and to allow for possible secondary interactions with DNA (a 25bp strand would only occupy the main channel). The secondary DNA binding sites are accentuated in the HX data and supported by DNA footprinting experiments on an even longer DNA construct (Figure 5.5).

The two remaining notable effects include a widespread destabilization that occurs throughout the molecule and a stabilization at the base of DNA-PKcs (Figure 5.3C). For the first, the destabilizations affect all domains, even the plug. The plug becomes disordered further in a set of helices that precede the ABCDE phosphorylation sites226. For the second, the addition of Ku70/80 induces a set of stabilizations in the base of DNA-PKcs. These changes may support the placement of the Ku80 C-terminus in the region, near the PQR site (Figure 5.3C)39.

5.3.4 Nucleotide loading primes the allosteric pathway

Collectively, these findings suggest a long-range allosteric pathway connecting Ku70/80, the wider DNA binding site and the kinase domain. To test this hypothesis further, we added an ATP mimic to the holoenzyme and repeated the nanoHX-MS analysis. We reasoned that, if an allosteric pathway exists, the addition of nucleotide would likely activate conformational changes along a similar trajectory. As the addition of ATP induces phosphorylation and release of DNA-PKcs from the DNA62,215,227, we used the non-hydrolysable adenylyl-imidodiphosphate (AMP-PNP) to characterize the conformational state immediately prior to hydrolysis and phosphorylation. Upon nucleotide binding, a conformational response was observed that cascades from the binding site all the way to Ku70/80 (Figure 5.3D, Figure 5.7). As anticipated,


Figure 5.7. NanoHX-MS evaluation of DNA-PKcs binding to AMP-PNP. Changes in the DNA-PKcs conformational status upon binding to AMP-PNP. Volcano plot (left) and the corresponding Woods plot (right) show peptides with significant changes in deuteration, determined as previously described155. Significantly deprotected peptides are shown in red and significantly protected peptides are shown in blue. Peptides are mapped to the DNA-PK structure40 (5Y3R, bottom figure). Regions of structure not represented in the HX map are colored grey; coloring otherwise as in Figure 5.3.

the nucleotide induces a strong local stabilization. It propagates throughout the kinase and includes elements of the FAT domain. More distal conformational changes involve the N-terminal arm at the shoulder and elbow. The changes in Ku70/80 are remarkable, particularly in their respective C-termini. For the most part, the heterodimer is stabilized upon nucleotide binding, including the narrow bridge over the DNA and both C-termini. For Ku80, there is a patch at the extreme C-terminal end that is destabilized, which correlates with a similar destabilization in its putative binding site on DNA-PKcs, near the PQR region. These changes in stabilization suggest a repositioning/weakening of the binding site in the kinase-active state. The strong stabilization of the Ku70 C-terminal SAP domain has no definite structural rationalization at this point. However, our DNA footprinting experiments identified this region as a DNA binding site (Figure 5.5), which is supported by previous studies31,33,34. Thus, the HX data suggest that the kinase can influence DNA binding directly. This linkage is more dramatically highlighted by observing the effect of nucleotide binding on the “plug”. The entire domain becomes disordered, including its DNA binding site, suggesting the plug exists in a conformationally-relaxed mode in the kinase-active state where DNA binding is weakened at least partially112.

Taken together, the kinase-active DNA-PKcs is a complex conformational switch that appears to organize the DNA end in an extended channel on a single side of the protein. The kinase domain engages a long-range allosteric pathway that connects the nucleotide binding pocket with all points of DNA binding, and the kinase regulates interactions with the Ku70/80 C-termini at its base, likely through a mechanism involving N-terminal arm transitions.


5.3.5 PAXX engages the allosteric pathway

We next determined if additional proteins known to interact with the holoenzyme could influence its conformational behavior. In anticipation of a structural analysis of the DNA-PK synaptic complex, we chose the homodimer PAXX, a known stabilizer of core NHEJ proteins38 and the synaptic complex43 to conduct another nanoHX-MS analysis. PAXX binding to nucleotide-loaded holoenzyme only induces stabilizations (Figure 5.3E, Figure 5.8). These stabilizations are once again distributed throughout the structure, but in highly localized regions.

Most notably, regions of the Ku70 core domain are stabilized, which supports an earlier study that showed the C-terminus of PAXX interacting with the Ku70 subunit59. This interaction would locate PAXX at the base of the holoenzyme. In support of this hypothesis, we observe a corresponding stabilization in the Ku80 C-terminal domain and linker, and one at the base of DNA-PKcs. The interaction induces stabilization of the DNA binding sites, including elements of the N-terminal arm, the plug, and the extended channel. Once again, the nucleotide binding site is associated with these changes, likely through the conformational effects on the N-terminal arm.

5.3.6 Crosslinking the Synaptic Complex

To place the unresolved elements and better understand how the allosteric network is engaged in the context of a double-strand break, we turned to XL-MS to generate additional data for integrative modelling of the long-range complex structure. We used a DNA substrate modelled after Graham et al.56, involving a 2kb stretch of DNA with an internal biotin for affinity capture. This substrate supports synapsis through the circularization of the DNA. We confirmed synapsis using negative-stain EM, albeit with some evidence of aggregation (Figure 5.9). We assembled several states and used quantitative crosslinking to explore the relationship


Figure 5.8. NanoHX-MS evaluation of nucleotide loaded DNA-PK binding to PAXX. Changes in the DNA-PK conformational status upon binding to PAXX. Volcano plot (left) and the corresponding Woods plot (right) highlight peptides demonstrating significant changes in deuteration, determined as previously described155. Significantly deprotected peptides are shown in red and significantly protected peptides are shown in blue. Peptides are mapped to the DNA-PK structure40 (5Y3R, bottom figure). Regions of structure not represented in the HX map are colored grey; coloring otherwise as in Figure 5.3.


Figure 5.9. Negative stain EM of DNA-PK synaptic complex preparations. Representative negative stain electron microscopy images of synapsis of DNA-PK on a 2kb DNA substrate. (A) DNA-PK, (B) DNA-PK + AMP-PNP, (C) DNA-PK + PAXX. The left panel in each row highlights a representative synaptic structure (large yellow arrow). The middle panel highlights a representative image of multiple Ku70/80 molecule loading on DNA (small yellow arrows showing individual Ku dimers). The right panel highlights a representative image of aggregation. Raw images were processed with ImageJ228.

between the holoenzyme conformational changes and synapsis. For the synaptic states, we crosslinked the holoenzyme, holoenzyme with AMP-PNP, holoenzyme with PAXX, and holoenzyme with AMP-PNP and PAXX. Nominally non-synaptic control samples were assembled on 100 bp DNA, including holoenzyme and holoenzyme with AMP-PNP. For each state, 200-300 unique DSS crosslinks, mostly between pairs of lysine residues, were identified and mapped on Circos plots using xVis209 (Figure 5.10A; the full list of identified crosslinks can be found in Appendix C). The maps of all states appear remarkably similar. We consider in the first place the subset of crosslinks that are satisfied by a single holoenzyme (regardless of the possibility that they may, in fact, span the two holoenzymes in a synaptic complex). On average, 42% of the crosslinks can span sites on the known holoenzyme structure40 (Figure 5.11A). We also observe 3 crosslinks between the Ku80 C-terminal region and the base of DNA-PKcs, near the PQR sites around residue 1985, which supports the HX data in localizing the Ku80 C-terminal region to the base. Many of the remaining crosslinks involve nominally disordered regions, primarily residues 2577-2773 of the DNA-PKcs plug. Numerous crosslinks are formed between this region, the N-terminal arm, and the middle HEAT repeats (Figure 5.11B). No crosslinks are found between the plug and either the kinase or FAT domain, despite their proximity in the holoenzyme structure (Figure 5.11B).
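The idea of a crosslink being "satisfied" by a single holoenzyme structure can be sketched as a simple distance check. This is an illustrative sketch only: the function name `classify_crosslinks`, the data layout, and the toy coordinates are hypothetical, and the 35 Å cutoff is the value quoted in the Figure 5.11 caption. The analysis reported here used Xlink Analyzer210, not this code.

```python
import math

CUTOFF_A = 35.0  # distance cutoff (Å) quoted in the Figure 5.11 caption


def classify_crosslinks(coords, crosslinks, cutoff=CUTOFF_A):
    """Count crosslinks that a single structure can satisfy.

    coords: {(chain, residue_number): (x, y, z)} for resolved residues only.
    crosslinks: list of ((chain, res), (chain, res)) pairs.
    Returns counts of satisfied, violated, and unmapped crosslinks, where
    "unmapped" means at least one residue is not resolved in the structure.
    """
    counts = {"satisfied": 0, "violated": 0, "unmapped": 0}
    for a, b in crosslinks:
        if a not in coords or b not in coords:
            counts["unmapped"] += 1
        elif math.dist(coords[a], coords[b]) < cutoff:
            counts["satisfied"] += 1
        else:
            counts["violated"] += 1
    return counts


# Toy data with made-up coordinates (not taken from PDB 5Y3R).
toy_coords = {("A", 10): (0.0, 0.0, 0.0),
              ("A", 50): (20.0, 0.0, 0.0),
              ("A", 200): (0.0, 60.0, 0.0)}
toy_links = [(("A", 10), ("A", 50)),    # 20 Å apart: satisfied
             (("A", 10), ("A", 200)),   # 60 Å apart: violated
             (("A", 10), ("A", 300))]   # residue 300 unresolved: unmapped
print(classify_crosslinks(toy_coords, toy_links))
```

Dividing the satisfied count by the total number of crosslinks gives a fraction comparable in spirit to the 42% quoted above, with crosslinks to unresolved regions counted as not satisfied.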

Crosslinking also confirms a nucleotide-induced conformational change in the N-terminal arm. In the DNA-free and nucleotide-free monomeric state, the “hand” at the N-terminal end is too far to crosslink to the FAT domain (~50 Å), but in the DNA-bound form of the nucleotide-free holoenzyme, we observe a crosslink between these regions (residues 99-3196) (Figure 5.11A). We also observe it in the nucleotide-free form of the synaptic complex. Interestingly,


Figure 5.10. Crosslinking of the DNA-PK synaptic complex. (A) Circos plots of crosslinks found in multiple states. Intra-protein crosslinks are shown in grey and inter-protein crosslinks in purple. (B) Comparison of crosslink abundance between states. Crosslinks enriched in the synaptic (and PAXX-bound) states are shown in blue. Crosslinks enriched in the non-synaptic (and PAXX-free) control states are shown in green. Non-significant changes are represented in grey. Redundant crosslinks (i.e. different peptides for identical linked residues) preserved for completeness. Extreme values (log2(ratio)=±9) represent crosslinks found in only one state. (C) Quantitative evaluation of a unique head-to-head (left) and a base-to-base (right) dimer crosslink. Data represent aggregated and normalized LC-MS feature intensities; error bars = SEM (n=3).


Figure 5.11. A subset of detected DSS crosslinks on DNA-PK bound to DNA. (A) Crosslinks satisfied by the structured elements of a DNA-PK monomer from an analysis of the non-synaptic holoenzyme (nucleotide-free), mapped to the 5Y3R structure using Xlink Analyzer210. Blue rods represent satisfied crosslinks (less than 35 Å). (B) Left structure: crosslinking attachment points shown as red spheres for the large disordered region of DNA-PKcs (residues 2577-2774, not shown). Right structure: the availability of lysine residues for crosslinking, also shown as red spheres. Coloring of the DNA-PK structure as in Figure 5.3.


Figure 5.12. Unique dimer crosslink spectra. (A) Representative DDA data for the 4+ charge state of the head-to-head dimer crosslink (m/z 498.261). Top left is the MS precursor spectrum. Bottom is the corresponding MS/MS spectrum, with an expansion of m/z 510-770 (top right). (B) Representative DDA data for the 4+ charge state of the base-to-base dimer crosslink (m/z 797.418). Top left is the MS parent spectrum. Bottom is the corresponding MS/MS spectrum, with an expansion of m/z 800-1450 (top right). Annotated spectra show the fragments for each peptide (y and b series) as well as the crosslinked fragments (b_β, b_α, y_β, and y_α) and internal fragments in teal. All spectra from the CRIMP plug-in from the Mass Spec Studio (www.msstudio.ca).

upon nucleotide binding, this crosslink is reduced in intensity for the holoenzyme and it completely vanishes in the synaptic state. PAXX binding does not return it.

Quantitative analysis was extended to the rest of the crosslinks (Figure 5.10B), to characterize the synaptic states prior to integrative modeling. Two crosslinks stand out as unambiguously inter-holoenzyme, where the same peptide sequence is found on each side of the crosslink (Figure 5.12). One crosslink supports a head-to-head interaction, linking residues 4085-4085 in the kinase domain. It is absent in the non-synaptic control, whereas the addition of PAXX increased its abundance considerably (Figure 5.10C). Loading the kinase with AMP-PNP generated the highest abundance. In fact, AMP-PNP binding appears to drive low levels of synapsis even in the nominally non-synaptic control. This is entirely possible. The design of the control attempts to disperse DNA evenly on the beads, but the preparation likely possesses some distribution in the dispersion, allowing a subset to dimerize. Surprisingly, the addition of PAXX to the nucleotide-loaded holoenzyme increased the intensity in this crosslink relative to the nucleotide-free state, but not the nucleotide-bound state. Taken together with the HX data, the similarity in the crosslinking for the two nucleotide-loaded states (Figure 5.10C middle) suggests one conformational state is shared between the two forms. PAXX may slightly alter the conformation of the head domain and thus reduce crosslinking locally in this region.

The other unambiguously inter-holoenzyme crosslink supports a base-to-base interaction, linking residues 1869-1869 in the middle HEAT repeats. The crosslink was identified in all states, including at significant levels in the non-synaptic controls (Figure 5.10C). The intensity patterns are very different from the head-to-head crosslink: the intensity decreases upon adding AMP-PNP and/or PAXX. In preliminary modeling runs, we observed that the base-to-base crosslink is almost never satisfied. Thus, the crosslink is likely an artifact indicative of aggregation (Figure 5.9).
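The state-to-state comparison of crosslink abundance (Figure 5.10B) can be illustrated with a minimal sketch of a log2 ratio that assigns a fixed extreme value to crosslinks detected in only one state. The ±9 value is taken from the Figure 5.10 caption; the function name `log2_ratio`, the sign convention (positive meaning enriched in the first state), and the handling of crosslinks absent from both states are assumptions for illustration, not a description of the software actually used.

```python
import math

SENTINEL = 9.0  # extreme log2(ratio) used for one-state crosslinks (Figure 5.10 caption)


def log2_ratio(intensity_a, intensity_b, sentinel=SENTINEL):
    """Return log2(A/B) for summed crosslink intensities in states A and B.

    Crosslinks detected in only one state get +sentinel (A only) or
    -sentinel (B only). Returns None if the crosslink is absent from both.
    Assumed convention: A is the synaptic state, B is the control state.
    """
    if intensity_a > 0 and intensity_b > 0:
        return math.log2(intensity_a / intensity_b)
    if intensity_a > 0:
        return sentinel
    if intensity_b > 0:
        return -sentinel
    return None


# Made-up intensities for three hypothetical crosslinks.
for name, a, b in [("xl_1", 8.0e6, 2.0e6), ("xl_2", 5.0e5, 0.0), ("xl_3", 0.0, 1.0e6)]:
    print(name, log2_ratio(a, b))
```

A statistical test on replicate intensities would still be needed to decide which ratios are significant; that step is not shown here.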

Finally, we note that our crosslinking data localize PAXX near the base of DNA-PKcs, in support of the HX results in Figure 5.3E. Crosslinks are scant but there is a link between the N-terminus of PAXX and the Ku80 C-terminal core structure (Figure 5.10A). Surprisingly, no crosslinks are found to Ku70 where the PAXX tail is known to bind. However, in agreement with the HX data, it appears that a PAXX dimer and Ku70/80 synergistically stabilize the base of a DNA-PKcs monomer and drive a holoenzyme conformation that can synapse across a double strand break, especially when nucleotide-bound.

5.3.7 Structure of the DNA-PK synaptic complex

Armed with the crosslinking data, the available high-resolution holoenzyme structure40, the Ku70/80 termini structures31,32, and the physical constraints of connectivity and excluded volume, we performed integrative structure modeling using the Integrative Modeling Platform (IMP) program229, to determine the orientation of the holoenzyme in the synaptic form (Figure 5.13, 5.14). We constructed separate models for each of the six crosslinked states, with each model consisting of two copies each of DNA-PKcs, Ku70 and Ku80. Each crosslink was allowed to be satisfied by either an inter- or intra-holoenzyme distance. In total, 1.2 million configurations of the synaptic complex were assessed. For each dataset, the head-to-head crosslink (4085-4085) was applied to filter the ensemble, followed by structural clustering. Approximately 200 representatives of each cluster for each dataset were combined, and clustered again to identify common states among all six crosslinked states (Figure 5.15A). Two distinct clusters satisfying the head-to-head crosslink emerged, along with a more diffuse third cluster that did not. The dominant cluster of solutions represents ~30% of the total configurations and has a precision of


Figure 5.13. Integrative structure modelling of the synaptic complex. Integrative structure modelling of the complex proceeded through four stages110, including (1) gathering information, (2) representing the system and translating data into spatial restraints, (3) sampling structures, and (4) analyzing and validating the ensemble of structures and data. The integrative structure modelling protocol was scripted using the Python modeling interface of the open-source IMP package229, version 2.10 (https://integrativemodeling.org).


Figure 5.14. Multi-scale representation of synaptic complex components. The components of the synaptic complex were modeled using a multi-scale scheme guided by the available input structures. Regions of sequence described in a high-resolution structure (structured) were modeled with beads comprising one residue each, while all other regions were modeled by beads of up to 5-10 residues per bead. (A) Sequence plots for each molecular component of the holoenzyme. The colored blocks on the bottom show the structured regions that are represented by one residue per bead. The circles above the blocks represent individual beads used to represent both the structured and non-structured regions of sequence. The system was further divided into four rigid bodies, as shown by the dotted boxes. (B) The multi-scale representation of the four rigid bodies in the holoenzyme (including PAXX) and the molecular components that they encompass. (C) Creating a multi-scale representation. For rigid body 2, a portion of the sequence is structured and represented at one residue per bead (orange block) and up to ten residues per bead (large circles). The remaining structure is represented only by beads of up to five residues per bead (smaller circles on the N- and C-termini).
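The multi-scale scheme described in the Figure 5.14 caption can be sketched as a partition of a sequence into beads: one bead per residue where a structure is available, and coarser beads elsewhere. This is a simplified illustration with a hypothetical function name (`make_beads`) and toy ranges; it does not use the IMP Python interface with which the actual representation was scripted.

```python
def make_beads(length, structured, coarse_size=10):
    """Partition residues 1..length into beads given as (first, last) tuples.

    structured: list of (start, end) inclusive residue ranges that are
        resolved in a high-resolution structure (one residue per bead).
    coarse_size: maximum number of consecutive unstructured residues
        grouped into a single bead.
    """
    in_structure = [False] * (length + 1)  # index 0 is unused
    for start, end in structured:
        for res in range(start, end + 1):
            in_structure[res] = True

    beads = []
    res = 1
    while res <= length:
        if in_structure[res]:
            beads.append((res, res))       # one residue per bead
            res += 1
        else:
            last = res
            # Extend the coarse bead until it is full or hits a structured residue.
            while (last < length and not in_structure[last + 1]
                   and last - res + 1 < coarse_size):
                last += 1
            beads.append((res, last))
            res = last + 1
    return beads


# Toy 12-residue chain with residues 4-6 structured and 5-residue coarse beads.
print(make_beads(12, [(4, 6)], coarse_size=5))
```

The `coarse_size` parameter corresponds to the "up to 5-10 residues per bead" quoted in the caption; which value applies to which region is not specified here and would need to be set per component.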


Figure 5.15. Integrative models of the synaptic complex. (A) Model reduction and clustering analysis of the synaptic models from each crosslinked sample. From the set of ~4 million sampled structures of the synaptic complex, ~1.2 million remained after equilibration testing. Subsets of these models were generated for each dataset and clustered at 30 Å. The resulting 2062 cluster centroids were then combined and clustered again, resulting in three major structural classes. Clusters highlighted in red satisfy the head-to-head dimer crosslink. The grey cluster does not satisfy the head-to-head dimer crosslink. (B) Detailed representation of the average structure from Cluster 1, showing the localization densities (light/dark grey surfaces) fit with the DNA-PK structures (5Y3R) where the two DNA-PKcs molecules are shown in teal and dark slate grey, the Ku70s in shades of yellow, and Ku80s in shades of orange. The localization density for the plug domain (representing DNA-PKcs residues 2577-2773) is highlighted in red.


13.5Å. This cluster is comprised of models from every synaptic data set, with the dominant contributions coming from the nucleotide-loaded forms. It includes some contribution from a nominal control state, the nucleotide-loaded holoenzyme on 100 bp DNA. This finding is not surprising, given the head-to-head crosslink observation in this sample (Figure 5.10C).
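The filter-then-cluster logic applied to the model ensemble can be sketched as follows. This is a toy illustration under stated assumptions: models are lists of bead coordinates, the crosslink filter is a plain distance cutoff between two chosen beads, and clustering is a simple greedy threshold scheme on coordinate RMSD without superposition. The function names and the filter cutoff are hypothetical, and the clustering procedure actually used with IMP may differ; only the 30 Å clustering threshold is taken from the Figure 5.15 caption.

```python
import math


def rmsd(model_a, model_b):
    """Coordinate RMSD between two models with the same bead order (no superposition)."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model_a, model_b))
    return math.sqrt(total / len(model_a))


def filter_by_crosslink(models, bead_i, bead_j, cutoff):
    """Keep models in which beads bead_i and bead_j lie within cutoff (Å)."""
    return [m for m in models if math.dist(m[bead_i], m[bead_j]) <= cutoff]


def greedy_cluster(models, threshold):
    """Assign each model to the first cluster whose seed is within threshold (Å)."""
    clusters = []  # list of (seed_model, [member indices])
    for index, model in enumerate(models):
        for seed, members in clusters:
            if rmsd(seed, model) <= threshold:
                members.append(index)
                break
        else:
            clusters.append((model, [index]))
    return [members for _, members in clusters]


# Four made-up two-bead models; bead 0 and bead 1 stand in for the crosslinked pair.
toy_models = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)],   # crosslinked beads too far apart
    [(50.0, 0.0, 0.0), (60.0, 0.0, 0.0)],
]
kept = filter_by_crosslink(toy_models, 0, 1, cutoff=35.0)  # assumed cutoff
print(len(kept), greedy_cluster(kept, threshold=30.0))
```

The relative population of each cluster is then simply the number of its members divided by the number of models retained.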

Strikingly, the structure places the holoenzymes side-to-side, across an interaction interface that begins at the kinase and FAT domains and runs along the middle HEAT repeats (Figure 5.15B). We did not explicitly include DNA strands in the modeling, but we fit the model with the holoenzyme structure (PDB 5Y3R) to orient them. This fitting places the incoming broken DNA strands in parallel, but staggers their ends by ~105 Å. In this configuration, the two DNA surfaces where the “plug” domains are located are found on opposite sides of the synaptic complex.

The smaller cluster of solutions that satisfies the head-to-head crosslink represents 12.1% of all models with a precision of 15.3 Å. This solution orients one holoenzyme at a right angle to the other along its long axis, while still maintaining contact between the FAT/kinase domains of DNA-PKcs. The interaction interface does not involve any additional DNA-PKcs contacts. Rather, the Ku70/80 heterodimers contact each other extensively. We fit the holoenzyme structure as above, but here we found that the distal DNA ends (i.e. the ones opposite the “break”) would collide at Ku70/80 (Figure 5.16). Thus, the smaller cluster cannot represent the synaptic state. Our EM data suggest that some DNA strands could have multiple Ku heterodimers loading on a section of dsDNA, forming “beads on a string” (Figure 5.9). This phenomenon has been observed previously55 and explains how a subset of solutions could position two Ku heterodimers close together.


Figure 5.16. Structural models of clusters not representative of the synaptic state. Top: second cluster of solutions (15.3Å precision and 144K models). Bottom: third cluster of solutions (30.5Å precision and 675K models). As in Figure 5.15, two DNA-PK molecules were fit in the localization densities, with the same coloring of DNA-PK molecules as used previously.


The remaining models belong to a cluster with low precision (30.5 Å), placing the holoenzymes in an approximate face-to-face position, where the base of one molecule interacts with the loosely structured “plug” of the other (Figure 5.16). The crosslinks that support the cluster are mostly contained in the “plug” and other disordered regions. It is likely that this cluster is representative of the aggregated states in the sample, whose presence was indicated by the negative stain EM data. Not surprisingly, a fraction of the models from all six crosslinked samples populate this diffuse cluster, as aggregates are expected in all cases (Figure 5.15).

Taken together, the data suggest a structure of the synaptic complex that brings the ends of the DNA break together in a staggered manner, separated by a distance of ~100 Å, and protected by a well-defined plug domain.

5.3.8 Ku70/80 C-terminal regions define a supportive base

We then inspected how the remaining structural elements could be placed within the model. Integrative modeling localizes the structured Ku80 C-terminal domain internal to the synaptic complex (Figure 5.17A). To refine its position using interaction energies and shape complementarity, we used HADDOCK, aided by the 5 crosslinks identified between the Ku80 C-terminus and DNA-PKcs. Top-scoring clusters docked the Ku80 C-terminus directly under DNA-PKcs in the central cavity made by the middle and N-terminal HEAT repeats, readily accommodating the linker to the Ku80 core domain. This aspect of the model is well-supported by the HX data and DNA footprinting (Figure 5.3D, Figure 5.5). Interestingly, although the crosslinking data for PAXX are too sparse for precise modeling, the available crosslinking data situate the structured head domains neatly in the gap between the Ku80 C-terminal domain and main Ku70/80 heterodimer (Figure 5.17B). Placement in this location also helps to rationalize the HX data for the interaction between the PAXX C-terminus and Ku70.


Figure 5.17. Positioning the Ku70/80 C-terminal regions and PAXX in the synaptic model. (A) Positioning of the Ku80 C-terminal region. Light and dark grey show the localization densities for the Ku80 C-terminal regions from the Cluster 1 synaptic model (Figure 5.15), corresponding to the respective light and dark DNA-PK structures. Separate HADDOCK modeling exercises (middle) illustrate a cluster of solutions based on a subset of crosslinking data, with the scores (top) and top 3 corresponding structures (bottom) shown. The Ku80 C-terminal region is displayed in light orange, and the DNA-PK molecule is colored as in Figure 5.3. The linker distances (from the C-terminus of the Ku80 core to the N-terminus of the Ku80 C-terminal domain) are shown, calculated with Jwalk230. To position the Ku80 C-terminal region in the synaptic model, the top scoring HADDOCK model was overlaid with the Cluster 1 synaptic model (right). (B) Positioning of PAXX in the synaptic model. The synaptic model shown is the cluster from modelling the synaptic + AMP-PNP + PAXX crosslinking data set that corresponds to the Cluster 1 synaptic model (Figure 5.15). The synaptic structure was augmented with the best scoring model of the Ku80 C-terminal domain, with its linker inserted using Modeller221,223. Light and dark grey show the localization densities for PAXX, corresponding to the respective light and dark DNA-PK structures (left). Stabilizations seen in the HX-MS data upon binding to PAXX are mapped to one half of the dimer (middle), colored as in Figure 5.8. The crosslink between Ku80 and PAXX, together with these stabilizations, was used to manually position PAXX (PDB 3WTD38, green), overlapping with the Ku80 linker at the base of DNA-PKcs (right). (C) Position of the Ku70 SAP C-terminal region. Light and dark grey show the localization densities for the Ku70 SAP C-terminal regions from the Cluster 1 synaptic model (Figure 5.15), corresponding to the respective light and dark DNA-PK structures (left). The DNA foot-printing results for the Ku70 SAP domain and linker (middle), together with the localization densities from IMP, support the placement of the Ku70 C-terminal SAP region at the extreme ends of the complex (right).


Finally, based on crosslinking alone, the Ku70 SAP domain is also difficult to locate with precision, but DNA footprinting supports its localization directly behind the Ku heterodimer (Figure 5.17C), which is consistent with its suspected role in DNA binding to prevent inward movement of the DNA-PK complex31,34.

5.4 Discussion

Using integrative modeling and structural mass spectrometry methods, we determined the structure of the long-range synaptic complex and the conformational changes in the holoenzyme that could accompany a transition out of the long-range state. Our HX data confirm the localization of the primary DNA binding site in the center channel and, together with quantitative crosslinking, highlight the influence of DNA binding on the conformational state of the N-terminal arm. This arm undergoes a major flexion at the elbow upon DNA binding. Ku70/80 supports the conformational change in the arm and completes an allosteric pathway between the kinase and the DNA-binding site. The degree of activation is remarkable. Conformational change is transmitted throughout the holoenzyme, confirming that DNA-PKcs is a large conformational switch under the control of the kinase domain. Nucleotide loading generates the active kinase, which appears to force the arm towards a conformation closer to the DNA-free form. In other words, nucleotide loading generates a tensioned state, as the arm is pushed away from the FAT domain even when DNA is bound. The tensioned state is also characterized by weakened interactions with the Ku80 C-terminus and increased disorder in the plug domain (Figure 5.18).

Our structural model of the synaptic complex is broadly consistent with a density map determined by negative-stain EM at 33Å resolution63, although there are also key differences. Both representations identify a two-fold axis of symmetry, with the DNA-PKcs subunits offset along the axis. However, while both orient the ends of the two DNA strands in parallel, our model positions the ends much farther apart from each other (~105Å) and places the Ku heterodimers much farther away from the interface than the EM model. Our model is also consistent with the head-to-head dimerization of other members of the phosphatidylinositol 3-kinase related kinase family, ATM231–233 and ATR234.

Figure 5.18. Nucleotide loading induces tension in DNA-PK. Free DNA-PKcs exists in a relaxed state, with the N-terminal arm distal from the head. Binding of DNA and Ku70/80 pushes the DNA-PKcs N-terminal arm into a flexed state. Loading the ATP analog places the arm in tension, where a nucleotide-induced conformational change acts against the DNA-induced flexing of the arm. The tense state destabilizes the plug domain.

Our model also reveals new insights into how the DNA end is presented within the holoenzyme. The “exit channel” for the DNA in each monomer is blocked by the large, presumably disordered region, effectively forming a space-filling plug or barrier, preventing the DNA from extruding further from the channel. It may also serve to distort the incoming DNA end, directing it towards the FAT domain, splaying DNA over a wider surface for processing functions55,112,113. This hypothesis would rationalize the secondary binding sites observed by HX (Figure 5.3B, left); however, non-specific binding of the DNA strand cannot be ruled out. We note that the upper reaches of the secondary site (Figure 5.5) have been recently suggested to form part of an RNA binding domain225. Either way, the electrostatic potential surface of DNA-PKcs is decidedly polar, which suggests the scaffold is capable of orienting DNA across a wide contact surface on the face opposite from Ku70/80. The structure also reveals that the Ku80 C-terminal domain and linker participate in binding DNA at the base of the complex, partly filling the hole formed by the middle and N-terminal HEAT repeats. This placement defines a PAXX binding site in the same region, running from Ku70 along the base of DNA-PKcs to the Ku80 C-terminal end. The assembly of these components places the DNA ends in a well-protected pocket, consistent with the major role of DNA-PKcs as an end protector44,46. Finally, the active kinase generates the most competent state for synapsis, likely by reorganizing the dimerization interface (Figure 5.3D). It appears that the tensioned state is preserved in the synaptic complex because the N-terminal arm does not engage the FAT domain, even with PAXX added.


Our integrative model and conformational analysis allow us to address some unanswered questions about the organization of downstream repair functions. The long distance between the DNA ends is consistent with the lack of FRET observed between labeled DNA ends56, supporting our model of the long-range complex. Even if the ends were splayed over the secondary sites on the plug-side of the holoenzyme, the distance separating them would still exceed 100Å. Therefore, the structure of this long-range complex explains why DNA end-processing might require a synaptic state. The large separation of the DNA ends presents an opportunity for the complex to assemble downstream machinery prior to committing to repair (Figure 5.19). The complexity of the damage that can occur upon a double-strand break necessitates repair by a wide range of enzymes to remove the damage and allow for ligation (e.g., Artemis, FEN1, PNKP, Werner protein, MRN, and DNA polymerases24); moreover, each end could require its own combination of enzymes. Separating the ends and placing them in the center of a broad surface allows for larger and more complex lesions (e.g., hairpins and long overhangs) to be accommodated and for recruiting the necessary accessory enzymes such as Artemis235. In this context, the dynamics of the plug are likely significant. Phosphorylation of the ABCDE sites in the plug domain (or perhaps simply nucleotide turnover) may connect kinase activity to the factors required for engaging the extreme ends of the DNA break.
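The compatibility between an end separation above 100Å and the absence of a FRET signal can be illustrated with the standard Förster relation, E = 1/(1 + (r/R0)^6). The sketch below is illustrative only: the Förster radius of 55Å is an assumed value of typical magnitude for a donor/acceptor dye pair and is not taken from the cited study, and the function and constant names are hypothetical.

```python
def fret_efficiency(distance_angstrom: float, forster_radius_angstrom: float) -> float:
    """Förster transfer efficiency, E = 1 / (1 + (r / R0)**6)."""
    if distance_angstrom < 0 or forster_radius_angstrom <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    ratio = distance_angstrom / forster_radius_angstrom
    return 1.0 / (1.0 + ratio ** 6)


# Assumed Förster radius in Å (hypothetical; not a value from the cited FRET work).
ASSUMED_R0 = 55.0

if __name__ == "__main__":
    # Compare a short separation, r = R0, and the ~105 Å separation in the model.
    for r in (40.0, 55.0, 105.0):
        print(f"r = {r:5.1f} Å -> E = {fret_efficiency(r, ASSUMED_R0):.3f}")
```

Under this assumed radius, the efficiency at ~105Å is about 2%, which would be difficult to distinguish from background, whereas ends brought within a few tens of Ångströms would give a strong signal.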

The large separation of ends has further implications for end-processing. End-joining necessitates the recruitment of XLF, XRCC4 and LigIV in a scaffolding role across the break, and the presentation of LigIV for ligation56,96. The synaptic complex can nucleate their recruitment, perhaps through each holoenzyme as an intermediate, ultimately to form a supporting structure with the stoichiometry that has been recently demonstrated96 (Figure 5.19, Figure 5.20). FRET studies have shown that a measure of processing takes place in the context of a short-range complex97. The dimensions of this complex are uncertain, but it is difficult to see how this complex could involve DNA-PKcs. A large reorientation would be required to bring the ends close together, requiring the remodeling of a large interface (>10,000Å² in buried surface area). There are two possible alternative mechanisms. In one, the ends are enzymatically unblocked (if required) in the long-range complex to the point where they can exit the base of DNA-PKcs and move closer together for further processing and ligation. Here, DNA-PKcs would retain a role in positioning the downstream repair machinery until a later stage in processing. In the other, DNA-PKcs would be released from the break immediately after the recruitment of the scaffold and the coordination of any unblocking activity, to permit further processing and ligation on the minimal scaffold. Either mechanism requires the timely exit of DNA-PKcs, which can be facilitated by tension release in the synaptic complex. Specifically, we propose that phosphorylation (or even nucleotide turnover) provides sufficient free energy in the form of released arm tension to eject DNA-PKcs from the DNA and expose the ends to a short-range complex215.

Figure 5.19. A model of long-range DNA-PK synapsis in the context of NHEJ. The initial (or long-range) synaptic complex protects the DNA ends with plug domains, under conformational control of the kinase domains, preventing resection and premature end-processing. The base of the central cavity is blocked by the Ku80 C-terminus and PAXX, completing the sheltering of DNA ends (left panel). Processing factors (recruited directly or in trans) unblock DNA ends as needed and scaffolding factors are recruited to prime the transition to a short-range synaptic complex, through an intermediate (center panel) that generates the expected stoichiometry (right panel). A space-filling structural model is found in Figure 5.20.

Figure 5.20. Structural assembly supporting the long-range synaptic complex. Beginning with the initial synaptic complex stabilized by PAXX, the structure shown is the same as in Figure 5.15B (with the localization densities hidden, except for the plug domain). The positions of the Ku70/80 C-terminal regions and PAXX are from Figure 5.17. The XLF-XRCC4-LigIV complexes were assembled by aligning the XRCC4-LigIV BRCT structure (PDB 3II6)66 to the XLF-XRCC4 structure (PDB 3SR2)73. Although the placement of the XLF-XRCC4-LigIV complex is not data driven, the complex was positioned such that the distance from the XLF binding site on Ku80 (highlighted in blue on the Ku80 molecules) to the C-terminus of the XLF structure does not violate the length of the XLF tail (~60 residues). There are no structural conflicts in this model. Coloring of structures is as in Figure 5.19.

The synaptic structure raises new questions that need to be explored in future studies. For example, the autophosphorylation of the ABCDE sites in cis is readily explained through conformational transitions in the plug domain, but it is much harder to rationalize how autophosphorylation in trans could occur at the PQR sites46, given their extreme distance from the catalytic sites. Such phosphorylation could require some relay mechanism that invokes other intermediates or perhaps it occurs during release. It is also conceivable that a large conformational change in DNA-PKcs could drive closer contact for autophosphorylation. The capture of additional accessory factors (e.g., XLF, XRCC4 and LigIV) and integrative structural analysis should allow us to determine how the long-range synaptic complex recruits these factors and transitions to the next stages of repair.


Chapter 6: Summary and Future Directions

6.1 Summary

NHEJ is the major pathway that repairs DSBs caused by IR. With DNA-PK being targeted for increasing the efficacy of IR, or as a monotherapy to treat cancer13,15,16,236, a structural understanding of DNA-PK in NHEJ could allow for a more targeted approach to inhibiting DNA-PK. Though DSB repair by NHEJ is presented in three steps (end detection and tethering, end processing, and ligation237), the actual process is likely much more dynamic, involving large interacting protein assemblies. Considerable effort has been expended to generate high-resolution structures for the core NHEJ proteins, but the size and dynamics of the larger NHEJ complexes make further structural analysis challenging. This thesis aimed to develop a model of DNA-PKcs in the initial stages of NHEJ, using optimized HX-MS and XL-MS methods, in a manner that overcomes some of the limitations of conventional structural techniques.

In chapter 2, HX-MS methods were tested and developed for the analysis of large proteins and their interactions, such as DNA-PKcs (+/- DNA). With the combined improvements provided by the nano-spray HX system, accurate deuterium quantification in MSS, spectral complexity reduction using reduced % D2O, and multipoint volume correction for deuterium uptake, HX-MS can be achieved on ultra-large systems, even offering high sequence coverage and redundancy. The combination provided a large (5-10X) decrease in sample consumption requirements, sufficient to support the work presented in this thesis. The ability to do HX-MS in a slurry format (i.e., DNA captured on beads) allows the DNA to serve as a platform for the assembly of larger NHEJ complexes, increasing control over heterogeneity by allowing unbound proteins to be washed away.


In chapter 3, RCAP (formaldehyde footprinting) with a rigorous two-stage cut-off allowed for the direct identification of DNA binding sites. Although some uncertainty remains in the identified DNA binding sites (i.e., partial coverage), the results of formaldehyde footprinting are ideal for combination with other data, to ensure the full binding site is determined.

Chapter 4 shows that CRIMP-MSS correctly identifies crosslinked peptides, using a database reduction strategy that can easily accommodate the purified NHEJ complexes (and beyond). Additionally, a large number of crosslinks can be identified for DNA-PKcs without absolutely requiring enrichment from the free peptides. Because a significant amount of protein was lost to buffer exchange, DNA-PKcs and Ku70/80 were purified into an XL-compatible buffer to limit sample loss. As with the HX-MS experiments, the crosslinking samples could benefit from the assembly of complexes on beads, limiting signal from free proteins. A revised on-bead protocol for XL of DNA-PK was developed to identify many crosslinks, which could be used for an LFQ-based comparison of states.

Upon optimization of all biochemical and MS methods, I was able to build a model of DNA-PKcs in the initial stages of NHEJ in chapter 5. The first question to be addressed was the role conformational changes have in DNA binding and activation. From the HX analysis of the assembly of DNA-PK on one side of the break, an allosteric axis was identified linking the movement of the N-terminal arm, FAT/kinase, and the plug domain (a newly discovered feature). When DNA-PKcs binds to DNA with Ku70/80, the N-terminal arm adopts a flexed state, opening the kinase domain for activation. Upon nucleotide loading of the kinase, DNA-PK appears to enter a tensed state. The kinase is stabilized by the nucleotide binding, leading to a partial movement of the N-terminal arm away from the FAT domain, as indicated by loss of the crosslink between the two domains. In the tensed state the plug domain is partially displaced, becoming more exposed. Binding of PAXX to the nucleotide-loaded DNA-PK partially reverses some of the tension, stabilizing the N-terminal arm and the plug domain, but it is not able to restore the crosslink between the N-terminal arm and the FAT domain. Overall, we see that DNA-PK is a large conformational switch, with binding of any of the cofactors causing changes well beyond their binding site.

Next, we addressed how binding and long-range synapsis protect DNA ends from resection while still allowing access to processing enzymes, and whether synapsis contributes to trans auto-phosphorylation of DNA-PKcs. This set of questions was best answered by our ISB model. With the XL-MS data and available structures, we were able to generate an ISB model of DNA-PK long-range synapsis at a global precision of 13.5Å for dimeric DNA-PK. The other elements modelled (Ku70/80 C-terminal regions and PAXX) were more crudely localized in the ISB model, but were positioned with greater confidence using a combination of alternative modeling procedures and manual assignment. The model shows a symmetrical dimer of DNA-PK molecules head-to-head, with a few contacts extending into the middle HEAT repeats. This model places the DNA ends in a staggered position separated by >100Å. Despite the staggered positioning of the DNA ends, the ends are well protected within the center of the DNA-PKcs molecule, with the central cavity blocked by the plug domain and the cavity at the base blocked by the Ku80 C-terminal region and PAXX. End processing proteins might then be recruited to the Ku70/80 molecule in trans, ready to bind the free DNA end when exposed. Exposure of DNA ends could be caused either by phosphorylation of the plug on the ABCDE site, or even by the nucleotide-loaded tensioned state causing a partial displacement of the plug domain and the Ku80 C-terminal region. Interestingly, we propose that the short-range complex could be assembled using the long-range synaptic complex as a scaffold. However, from our model the DNA-PKcs molecules are not positioned in a manner that obviously allows for trans auto-phosphorylation of either the ABCDE or PQR cluster. To achieve trans auto-phosphorylation of the PQR cluster would require a massive conformational change, which we do not rule out.

It is interesting to speculate as to the significance of these findings from a mechanistic and therapeutic perspective. Many of the small molecule inhibitors developed as potential cancer therapies target the kinase domain of DNA-PKcs238. Given the binding to the kinase, it is possible that the use of such inhibitors could regulate a tensioned state, like AMP-PNP. In this tensioned state we saw a displacement of the plug domain that might regulate accessibility to the DNA ends, or could increase the accessibility of the ABCDE cluster for phosphorylation by ATM, which then allows access to processing enzymes46,50,51,82. If the inhibited DNA-PKcs molecule is trapped on the DNA, alternative repair pathways could be restricted from access as the ends are still blocked, but if these inhibitors cause dissociation it may in fact allow for repair by alternative pathways. Although results are not consistent across studies, one group found that a DNA-PKcs inhibitor does not actually cause cell death due to unrepaired DSBs, as the DSBs were still being repaired by the other NHEJ factors not involving DNA-PKcs239. It would be important to know if this DNA-PKcs independent NHEJ is as error free as with DNA-PKcs, as well as the effect of DNA-PKcs inhibitors on DNA binding more specifically. If our model is right in proposing that DNA-PK assembly and synapsis protects the ends, we might expect DNA-PKcs independent repair (whether DNA-PKcs independent NHEJ or alternative NHEJ) to be more error prone.

An alternative to targeting the kinase of DNA-PKcs could be targeting the allosteric pathway that was identified. If a molecule could be designed to trap the kinase in the flexed state, it could possibly inhibit kinase activity, as well as keep DNA-PKcs trapped on the DNA in order to sustain a protected state. Targeting a conformational change may offer less concern about specificity, as kinase inhibitors can also target other kinases in the same family15,238. Further, it also has the opportunity to be specific to the DNA-PKcs role in NHEJ.

6.2 Future Considerations

It would be compelling to test the final ISB model by disrupting the interface with mutations, but unfortunately at the precision of the model we cannot identify interacting residues. For the observed interface in the model, low resolution regions of possible interactions were selected in the structure (Figure 6.1B). When looking at the intensity of the dimer crosslink, it appeared that a conformational change in binding to AMP-PNP favored the synaptic state. With the addition of PAXX, relative to AMP-PNP alone, the synaptic state was reduced, presumably because PAXX seems to reverse some of the conformational changes that were seen upon binding to AMP-PNP. Interestingly, one of the conformational change reversals occurs within the possible interaction sites: 2822-3035 (Figure 6.1). This region also shows decreased crosslinking in synaptic samples, suggesting it might be more buried upon synapsis (consistent with a possible interface), making this one of the more compelling “interface” regions to test. This still leaves a large region that could interact (over 200 residues), but possibly with multiple sequence alignments or the examination of surface accessible residues and charges this could be trimmed to a reasonable number of testable residues.
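One way such a trimming step might be prototyped is a simple filter over per-residue annotations. The sketch below is hypothetical throughout: the `Residue` record, the relative solvent accessibility values, and the 0.25 exposure cut-off are illustrative assumptions, and the toy residues are invented rather than taken from the DNA-PKcs sequence or the model.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Residues that carry a formal charge at neutral pH (Asp, Glu, Lys, Arg).
CHARGED = {"D", "E", "K", "R"}


@dataclass(frozen=True)
class Residue:
    position: int             # residue number
    one_letter: str           # amino acid, one-letter code
    rel_accessibility: float  # relative solvent accessibility, 0..1 (assumed input)


def candidate_interface_residues(
    residues: Iterable[Residue],
    start: int,
    end: int,
    min_accessibility: float = 0.25,  # assumed exposure cut-off
) -> List[Residue]:
    """Keep charged, surface-exposed residues inside the region [start, end]."""
    return [
        r for r in residues
        if start <= r.position <= end
        and r.one_letter in CHARGED
        and r.rel_accessibility >= min_accessibility
    ]


# Invented toy annotations, for illustration only.
toy = [
    Residue(2830, "K", 0.60),  # charged and exposed: kept
    Residue(2831, "L", 0.70),  # exposed but not charged: dropped
    Residue(2900, "E", 0.05),  # charged but buried: dropped
    Residue(3100, "R", 0.80),  # outside the region: dropped
]
selected = candidate_interface_residues(toy, 2822, 3035)
```

A conservation score from a multiple sequence alignment could be added as a further field and filter in the same way.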

Building our model of DNA-PKcs in the initial stages of NHEJ only begins to scratch the surface of the assembly of NHEJ. With my enhancements of the MS protocols for the on-bead capture, the next step would be to continue the buildup of the complex, for both HX and XL analyses. However, unless the complexes can form a stable synaptic complex, all assemblies may only be amenable to XL-MS. Nevertheless, the next logical expansion of the synaptic complex would be to include XLF and XRCC4, which have been seen to stabilize synapsis43, allowing us to test if the short-range complex could actually form under the long-range synaptic complex. We could also test different synapsis stabilizers to evaluate the accuracy of our synaptic model. The addition of Artemis to the DNA-PK complex would be interesting to study as it is one of the few processing enzymes that is known to interact with DNA-PKcs85.

Figure 6.1. Synaptic Dimer Interface. A) Surface for the interface between the synaptic DNA-PK molecules was generated with Intersurf240 in Chimera. The surface is shown with a gradient of red to blue with red being the closest interactions and blue the furthest. B) Interaction surface rotated 90° with the DNA-PK molecules removed (left) and with the possible interacting regions (899-901, 2409-2574, 2822-3035, 3823-3881, 4071-4095) overlaid in cyan (right).

As appealing as it may be to assemble whole complexes from purified proteins, for the simplicity of knowing exactly what is in the system, at some point the relevance of these modelled complexes will need to be determined in the cell. Also, the effects of relevant post-translational modifications on the system should be explored. Achieving a structural MS analysis of NHEJ complexes extracted directly from cells would require a member of the NHEJ complex to be tagged for pull down and crosslinking, or alternatively crosslinking in the cell and then pulling the complexes down. These types of samples will be much more complex than the assembled protein complexes, so adaptations in crosslinking methods would need to be made accordingly. These could include the identification of pulled down proteins, to restrict the number of proteins that must be searched for crosslinked peptides, and a peptide enrichment step, which could make use of enrichable crosslinkers if available. Ultimately, if the database cannot be reduced sufficiently through pull-down of relevant proteins, cleavable crosslinkers can be employed for effective database reduction.
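The motivation for database reduction can be made concrete with a rough count of the candidate peptide pairs a crosslink search has to consider, which grows approximately with the square of the number of peptides in the database. The sketch below is illustrative only: the function names are hypothetical, the peptide counts per protein are invented placeholders, and real search engines apply further filters (precursor mass tolerance, residue specificity) that are not modeled here.

```python
from typing import Dict


def candidate_pairs(n_peptides: int) -> int:
    """Number of unordered peptide pairs, including a peptide paired with itself."""
    if n_peptides < 0:
        raise ValueError("n_peptides must be non-negative")
    return n_peptides * (n_peptides + 1) // 2


def search_space(peptides_per_protein: Dict[str, int]) -> int:
    """Candidate pairs for a database built from the given proteins."""
    return candidate_pairs(sum(peptides_per_protein.values()))


# Placeholder peptide counts (invented for illustration, not measured values).
purified_complex = {"DNA-PKcs": 400, "Ku70": 60, "Ku80": 70}
pulldown_sample = {f"protein_{i}": 50 for i in range(200)}

small = search_space(purified_complex)  # 530 peptides
large = search_space(pulldown_sample)   # 10,000 peptides
fold_increase = large / small
```

With these placeholder numbers, a roughly 19-fold increase in peptides produces a search space several hundred times larger, which is why restricting the protein list or using cleavable crosslinkers becomes important for cell-derived samples.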

I think one of the largest questions that needs to be addressed for NHEJ relates to the phosphorylation of DNA-PKcs. The main question our model could not answer was how trans auto-phosphorylation occurs. This is not a trivial matter to study, even with our MS methods. One possible approach involves a time course of crosslinking with a photoactivatable crosslinker: crosslinking the protein rapidly at specified time points to see how the crosslinking changes with increasing amounts of phosphorylation. Not only do we not understand what happens to DNA-PKcs as it is phosphorylated, we also do not know what could cause such a large conformational change to support trans auto-phosphorylation across the break.

With respect to methods development, the multipoint peptide correction should be included in all future HX-MS experiments to account for labelling errors. Using peptides from the protein system being studied, it can easily be incorporated into any data set, allowing more accurate detection of deuterium change. When conducting LFQ of the crosslinked peptides, I was surprised to realize how many crosslinks that were a “unique” identification were not actually unique crosslinks (i.e., they were actually present in the other dataset as well). Whether the cause is undersampling on the MS, the crosslink intensity being just below the fragmentation intensity cut-off, or simply a missed identification in the software, I think LFQ methods should be employed routinely in crosslink analysis. Oftentimes, modelling decisions about the quality of the crosslink identification are made based on the number of identifications131 and how many replicates a given crosslink was found in137. This could become especially important when modelling two different states. If a crosslink is in both samples, but only identified in one, it will only be used for modelling one state, creating a possible bias. This is not to say that LFQ should be done for all crosslinking experiments, but if crosslink identifications are going to be used to eliminate or weight data based on reproducibility, or for modeling of multiple states, it should be considered. Admittedly this was not something done in my own data set when comparing the crosslinked states, but in retrospect it would be something I would want to do. I do acknowledge that LFQ of crosslinked peptides can be a laborious task, as it is not incorporated in many software tools yet, but it is something that we are working towards incorporating into MSS.
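The potential bias from identification-only comparisons can be illustrated with a small check that flags crosslinks identified in just one state but quantifiable in both. This is a hedged sketch with made-up data structures: the function name, the string keys, the intensity dictionaries, and the detection floor are hypothetical and do not reflect how MSS or any other software represents crosslinks.

```python
from typing import Dict, Set


def rescued_crosslinks(
    identified_a: Set[str],
    identified_b: Set[str],
    intensity_a: Dict[str, float],
    intensity_b: Dict[str, float],
    floor: float,
) -> Set[str]:
    """Crosslinks identified in only one state but with intensity >= floor in both.

    These would look state-specific by identification alone, although the
    quantitative signal says they are present in both states.
    """
    identified_in_one_only = identified_a ^ identified_b  # symmetric difference
    return {
        xl for xl in identified_in_one_only
        if intensity_a.get(xl, 0.0) >= floor and intensity_b.get(xl, 0.0) >= floor
    }


# Invented toy data: keys are arbitrary crosslink labels, values arbitrary intensities.
ids_state_1 = {"xl_1", "xl_2", "xl_3"}
ids_state_2 = {"xl_1", "xl_4"}
lfq_state_1 = {"xl_1": 9e5, "xl_2": 4e5, "xl_3": 2e5, "xl_4": 3e5}
lfq_state_2 = {"xl_1": 8e5, "xl_2": 3e5, "xl_4": 5e5}

rescued = rescued_crosslinks(ids_state_1, ids_state_2, lfq_state_1, lfq_state_2, floor=1e5)
```

In this toy case, `xl_2` and `xl_4` would be restored as restraints shared by both states, while `xl_3` remains specific to the first state.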


Bibliography

1. Ceccaldi, R., Rondinelli, B. & D’Andrea, A. D. Repair Pathway Choices and Consequences at the Double-Strand Break. Trends Cell Biol. 26, 52–64 (2015). 2. Prakash, R., Zhang, Y., Feng, W. & Jasin, M. Homologous Recombination and Human Health: The Roles of BRCA1, BRCA2, and Associated Proteins. Cold Spring Harb. Perspect. Biol. 7, 1–27 (2015). 3. Karanam, K., Kafri, R., Loewer, A. & Lahav, G. Quantitative Live Cell Imaging Reveals a Gradual Shift between DNA Repair Mechanisms and a Maximal Use of HR in Mid S Phase. Mol. Cell 47, 320–329 (2012). 4. Hustedt, N. & Durocher, D. The control of DNA repair by the cell cycle. Nat. Cell Biol. 19, 1–9 (2017). 5. Rothkamm, K., Krüger, I., Thompson, L. H., Kru, I. & Lo, M. Pathways of DNA Double- Strand Break Repair during the Mammalian Cell Cycle. Mol. Cell. Biol. 23, 5706–5715 (2003). 6. Waters, C. a et al. The fidelity of the ligation step determines how ends are resolved during nonhomologous end joining. Nat. Commun. 5, 1–11 (2014). 7. Bétermier, M., Bertrand, P. & Lopez, B. S. Is Non-Homologous End-Joining Really an Inherently Error-Prone Process? PLoS Genet. 10, 1–9 (2014). 8. Iliakis, G., Murmann, T. & Soni, A. Alternative end-joining repair pathways are the ultimate backup for abrogated classical non-homologous end-joining and homologous recombination repair: Implications for the formation of chromosome translocations. Mutat. Res. Toxicol. Environ. Mutagen. 793, 166–175 (2015). 9. Ismail, I. H. et al. SU11752 inhibits the DNA-dependent protein kinase and DNA double- strand break repair resulting in ionizing radiation sensitization. Oncogene 23, 873–882 (2004). 10. Gustafsson, A.-S., Abramenkovs, A. & Stenerlöw, B. Suppression of DNA-dependent protein kinase sensitize cells to radiation without affecting DSB repair. Mutat. Res. - Fundam. Mol. Mech. Mutagen. 769, 1–10 (2014). 11. Mamo, T. et al. Inhibiting DNA-PK CS radiosensitizes human osteosarcoma cells. Biochem. Biophys. Res. Commun. 486, 307–313 (2017). 12. 
O’Connor, M. J. Targeting the DNA Damage Response in Cancer. Mol. Cell 60, 547–560 (2015). 13. Yang, C. et al. NU7441 Enhances the Radiosensitivity of Liver Cancer Cells. Cell Physiol Biochem 38, 1897–1905 (2016). 14. Weterings, E. et al. A novel small molecule inhibitor of the DNA repair protein Ku70/80. DNA Repair (Amst). 43, 98–106 (2016). 15. Sishc, B. J. & Davis, A. J. The role of the core non-homologous end joining factors in carcinogenesis and cancer. (Basel). 9, 1–30 (2017). 16. Doherty, R. E., Bryant, H. E., Valluru, M. K., Rennie, I. G. & Sisley, K. Increased non- homologous end joining makes -pk a promising target for therapeutic intervention in uveal melanoma. Cancers (Basel). 11, 1–13 (2019). 17. Khanna, A. DNA Damage in Cancer Therapeutics: A Boon or a Curse? Cancer Res. 75, 2133–2139 (2015). 18. Nickoloff, J. A., Jones, D., Lee, S. H., Williamson, E. A. & Hromas, R. Drugging the

170

Cancers Addicted to DNA Repair. J. Natl. Cancer Inst. 109, 1–13 (2017). 19. Soni, A. et al. Requirement for Parp-1 and DNA ligases 1 or 3 but not of Xrcc1 in chromosomal translocation formation by backup end joining. Nucleic Acids Res. 42, 6380–6392 (2014). 20. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646– 674 (2011). 21. Wang, C. & Lees-Miller, S. P. Detection and repair of ionizing radiation-induced DNA double strand breaks: new developments in nonhomologous end joining. Int. J. Radiat. Oncol. Biol. Phys. 86, 440–449 (2013). 22. Ochi, T., Wu, Q. & Blundell, T. L. The spatial organization of non-homologous end joining: From bridging to end joining. DNA Repair (Amst). 17, 98–109 (2014). 23. Liang, S. et al. Achieving selectivity in space and time with DNA double-strand-break response and repair: molecular stages and scaffolds come with strings attached. Struct. Chem. 28, 161–171 (2017). 24. Serrano-Benítez, A., Cortés-Ledesma, F. & Ruiz, J. F. “An End to a Means”: How DNA- End Structure Shapes the Double-Strand Break Repair Process. Front. Mol. Biosci. 6, 1–9 (2020). 25. Zhang, W.-W. & Yaneva, M. On the mechanisms of Ku protein binding to DNA. Biochem. Biophys. Res. Commun. 186, 574–579 (1992). 26. Grundy, G. J., Moulding, H. A., Caldecott, K. W. & Rulten, S. L. One ring to bring them all--the role of Ku in mammalian non-homologous end joining. DNA Repair (Amst). 17, 30–38 (2014). 27. West, R. B., Yaneva, M. & Lieber, M. R. Productive and nonproductive complexes of Ku and DNA-dependent protein kinase at DNA termini. Mol. Cell. Biol. 18, 5908–5920 (1998). 28. Yoo, S., Kimzey, A. & Dynan, W. S. Photocross-linking of an oriented DNA repair complex: Ku bound at a single DNA end. J. Biol. Chem. 274, 20034–20039 (1999). 29. Walker, J. R., Corpina, R. A. & Goldberg, J. Structure of the Ku heterodimer bound to DNA and its implications for double-strand break repair. Nature 412, 607–614 (2001). 30. Thompson, R. 
F., Walker, M., Siebert, C. A., Muench, S. P. & Ranson, N. A. An introduction to sample preparation and imaging by cryo-electron microscopy for structural biology. Methods 100, 3–15 (2016). 31. Zhang, Z. et al. The three-dimensional structure of the C-terminal DNA-binding domain of human Ku70. J. Biol. Chem. 276, 38231–38236 (2001). 32. Zhang, Z. et al. Solution structure of the C-terminal domain of Ku80 suggests important sites for protein-protein interactions. Structure 12, 495–502 (2004). 33. Aravind, L. & Koonin, E. V. SAP - A putative DNA-binding motif involved in chromosomal organization. Trends Biochem. Sci. 25, 112–114 (2000). 34. Hu, S., Pluth, J. M. & Cucinotta, F. A. Putative binding modes of Ku70-SAP domain with double strand DNA: A molecular modeling study. J. Mol. Model. 18, 2163–2174 (2012). 35. Gell, D. & Jackson, S. P. Mapping of protein-protein interactions within the DNA- dependent protein kinase complex. Nucleic Acids Res. 27, 3494–3502 (1999). 36. Bennett, S. M., Woods, D. S., Pawelczak, K. S. & Turchi, J. J. Multiple protein-protein interactions within the DNA-PK complex are mediated by the C-terminus of Ku 80. Int. J. Biochem. Mol. Biol. 3, 36–45 (2012).

171

37. Singleton, B. K., Torres-Arzayus, M. I., Rottinghaus, S. T., Taccioli, G. E. & Jeggo, P. A. The C terminus of Ku80 activates the DNA-dependent protein kinase catalytic subunit. Mol. Cell. Biol. 19, 3267–3277 (1999). 38. Ochi, T. et al. PAXX, a paralog of XRCC4 and XLF, interacts with Ku to promote DNA double-strand break repair. Science (80-. ). 347, 185–188 (2015). 39. Sibanda, B. L., Chirgadze, D. Y., Ascher, D. B. & Blundell, T. L. DNA-PKcs structure suggests an allosteric mechanism modulating DNA double-strand break repair. Science (80-. ). 355, 520–524 (2017). 40. Yin, X. et al. Cryo-EM structure of the human DNA-PK holoenzyme. Cell Res. 27, 1341– 1350 (2017). 41. Gottlieb, T. M., Jackson, P. & Jackson, S. P. The DNA-dependent protein kinase: requirement for DNA ends and association with Ku antigen. Cell 72, 131–142 (1993). 42. Hammel, M. et al. An intrinsically disordered APLF links Ku, DNA-PKcs, and XRCC4- DNA ligase IV in an extended flexible non-homologous end joining complex. J. Biol. Chem. 291, 26987–27006 (2016). 43. Wang, J. L. et al. Dissection of DNA double-strand-break repair using novel single- molecule forceps. Nat. Struct. Mol. Biol. 25, 482–487 (2018). 44. Weterings, E. et al. The role of DNA dependent protein kinase in synapsis of DNA ends. Nucleic Acids Res. 31, 7238–7246 (2003). 45. Uematsu, N. et al. Autophosphorylation of DNA-PKCS regulates its dynamics at DNA double-strand breaks. J. Cell Biol. 177, 219–229 (2007). 46. Jiang, W. et al. Differential Phosphorylation of DNA-PKcs Regulates the Interplay between End-Processing and End-Ligation during Nonhomologous End-Joining. Mol. Cell 58, 172–185 (2015). 47. Dobbs, T. A., Tainer, J. A. & Lees-Miller, S. P. A structural model for regulation of NHEJ by DNA-PKcs autophosphorylation. DNA Repair (Amst). 9, 1307–1314 (2010). 48. Ding, Q. Q. et al. 
Autophosphorylation of the catalytic subunit of the DNA-dependent protein kinase is required for efficient end processing during DNA double-strand break repair. Mol. Cell. Biol. 23, 5836–5848 (2003). 49. Cui, X. et al. Autophosphorylation of DNA-Dependent Protein Kinase Regulates DNA End Processing and May Also Alter Double-Strand Break Repair Pathway Choice. Mol. Cell. Biol. 25, 10842–10852 (2005). 50. Block, W. D. et al. Autophosphorylation-dependent remodeling of the DNA-dependent protein kinase catalytic subunit regulates ligation of DNA ends. Nucleic Acids Res. 32, 4351–4357 (2004). 51. Neal, J. A. et al. Unraveling the Complexities of DNA-Dependent Protein Kinase Autophosphorylation. Mol. Cell. Biol. 34, 2162–2175 (2014). 52. Douglas, P. et al. Identification of in vitro and in vivo phosphorylation sites in the catalytic subunit of the DNA-dependent protein kinase. Biochem. J. 368, 243–251 (2002). 53. Meek, K., Douglas, P., Cui, X., Ding, Q. Q. & Lees-Miller, S. P. trans Autophosphorylation at DNA-dependent protein kinase’s two major autophosphorylation site clusters facilitates end processing but not end joining. Mol. Cell. Biol. 27, 3881–3890 (2007). 54. Yoo, S. & Dynan, W. S. Geometry of a complex formed by double strand break repair proteins at a single DNA end: recruitment of DNA-PKcs induces inward translocation of

Ku protein. Nucleic Acids Res. 27, 4679–4686 (1999). 55. DeFazio, L. G., Stansel, R. M., Griffith, J. D. & Chu, G. Synapsis of DNA ends by DNA- dependent protein kinase. EMBO J. 21, 3192–3200 (2002). 56. Graham, T. G. W., Walter, J. C. & Loparo, J. J. Two-Stage Synapsis of DNA Ends during Non-homologous End Joining. Mol. Cell 61, 850–858 (2016). 57. Merkle, D., Block, W. D., Yu, Y., Lees-Miller, S. P. & Cramb, D. T. Analysis of DNA- dependent protein kinase-mediated DNA end joining by two-photon fluorescence cross- correlation spectroscopy. Biochemistry 45, 4164–4172 (2006). 58. Xing, M. et al. Interactome analysis identifies a new paralogue of XRCC4 in non- homologous end joining DNA repair pathway. Nat. Commun. 6, 1–12 (2015). 59. Tadi, S. K. et al. PAXX Is an Accessory c-NHEJ Factor that Associates with Ku70 and Has Overlapping Functions with XLF. Cell Rep. 17, 541–555 (2016). 60. Balmus, G. et al. Synthetic lethality between PAXX and XLF in mammalian development. Genes Dev. 30, 2152–2157 (2016). 61. Liu, X., Shao, Z., Jiang, W., Lee, B. J. & Zha, S. PAXX promotes KU accumulation at DNA breaks and is essential for end-joining in XLF-deficient mice. Nat. Commun. 8, 1– 13 (2017). 62. Hammel, M. et al. Ku and DNA-dependent protein kinase dynamic conformations and assembly regulate DNA binding and the initial non-homologous end joining complex. J. Biol. Chem. 285, 1414–1423 (2010). 63. Spagnolo, L., Rivera-Calzada, A., Pearl, L. H. & Llorca, O. Three-dimensional structure of the human DNA-PKcs/Ku70/Ku80 complex assembled on DNA and its implications for DNA DSB repair. Mol. Cell 22, 511–519 (2006). 64. Williams, D. R., Lee, K.-J., Shi, J., Chen, D. J. & Stewart, P. L. Cryo-EM Structure of the DNA-Dependent Protein Kinase Catalytic Subunit at Subnanometer Resolution Reveals α Helices and Insight into DNA Binding. Structure 16, 468–477 (2008). 65. Sibanda, B. L. et al. Crystal structure of an Xrcc4-DNA ligase IV complex. Nat. Struct. Biol. 8, 1015–1019 (2001). 66. 
Wu, P. et al. Structural and Functional Interaction between the Human DNA Repair Proteins DNA Ligase IV and XRCC4. Mol. Cell. Biol. 29, 3163–3172 (2009). 67. Costantini, S., Woodbine, L., Andreoli, L., Jeggo, P. A. & Vindigni, A. Interaction of the Ku heterodimer with the DNA ligase IV/Xrcc4 complex and its regulation by DNA-PK. DNA Repair (Amst). 6, 712–722 (2007). 68. Grundy, G. J. et al. APLF promotes the assembly and activity of non-homologous end joining protein complexes. EMBO J. 32, 112–125 (2013). 69. Nemoz, C. et al. XLF and APLF bind Ku80 at two remote sites to ensure DNA repair by non-homologous end joining. Nat. Struct. Mol. Biol. 25, 971–980 (2018). 70. Macrae, C. J., McCulloch, R. D., Ylanko, J., Durocher, D. & Koch, C. A. APLF (C2orf13) facilitates nonhomologous end-joining and undergoes ATM-dependent hyperphosphorylation following ionizing radiation. DNA Repair (Amst). 7, 292–302 (2008). 71. Shirodkar, P., Fenton, A. L., Meng, L. & Koch, C. A. Identification and functional characterization of a Ku-binding motif in aprataxin polynucleotide kinase/phosphatase- like factor (APLF). J. Biol. Chem. 288, 19604–19613 (2013). 72. Yano, K. et al. Ku recruits XLF to DNA double-strand breaks. EMBO Rep. 9, 91–96

(2008). 73. Hammel, M. et al. XRCC4 protein interactions with XRCC4-like factor (XLF) create an extended grooved scaffold for DNA ligation and double strand break repair. J. Biol. Chem. 286, 32638–32650 (2011). 74. Ropars, V. et al. Structural characterization of filaments formed by human Xrcc4-Cernunnos/XLF complex involved in nonhomologous DNA end-joining. Proc. Natl. Acad. Sci. U. S. A. 108, 12663–12668 (2011). 75. Andres, S. N. et al. A human XRCC4-XLF complex bridges DNA. Nucleic Acids Res. 40, 1868–1878 (2012). 76. Reid, D. A. et al. Organization and dynamics of the nonhomologous end-joining machinery during DNA double-strand break repair. Proc. Natl. Acad. Sci. 112, E2575–E2584 (2015). 77. Brouwer, I. et al. Sliding sleeves of XRCC4-XLF bridge DNA and connect fragments of broken DNA. Nature 535, 566–569 (2016). 78. Kim, K., Pedersen, L. C., Kirby, T. W., Derose, E. F. & London, R. E. Characterization of the APLF FHA-XRCC1 phosphopeptide interaction and its structural and functional implications. Nucleic Acids Res. 45, 12374–12387 (2017). 79. Eustermann, S. et al. Solution structures of the two PBZ domains from human APLF and their interaction with poly(ADP-ribose). Nat. Struct. Mol. Biol. 17, 241–243 (2010). 80. Kaminski, A. M. et al. Structures of DNA-bound human ligase IV catalytic core reveal insights into substrate binding and catalysis. Nat. Commun. 9, 1–12 (2018). 81. Cottarel, J. et al. A noncatalytic function of the ligation complex during nonhomologous end joining. J. Cell Biol. 200, 173–186 (2013). 82. Cui, X. et al. Autophosphorylation of DNA-Dependent Protein Kinase Regulates DNA End Processing and May Also Alter Double-Strand Break Repair Pathway Choice. Mol. Cell. Biol. 25, 10842–10852 (2005). 83. Frit, P. et al. Plugged into the Ku-DNA hub: The NHEJ network. Prog. Biophys. Mol. Biol. 147, 62–76 (2019). 84. Chang, H. H. Y.
& Lieber, M. R. Structure-Specific nuclease activities of Artemis and the Artemis:DNA-PKcs complex. Nucleic Acids Res. 44, 4991–4997 (2016). 85. Ma, Y., Pannicke, U., Schwarz, K. & Lieber, M. R. Hairpin opening and overhang processing by an Artemis/DNA-dependent protein kinase complex in nonhomologous end joining and V(D)J recombination. Cell 108, 781–794 (2002). 86. Ochi, T., Gu, X. & Blundell, T. L. Structure of the catalytic region of DNA ligase IV in complex with an artemis fragment sheds light on double-strand break repair. Structure 21, 672–679 (2013). 87. De Ioannes, P., Malu, S., Cortes, P. & Aggarwal, A. K. Structural basis of DNA ligase IV-Artemis interaction in nonhomologous end-joining. Cell Rep. 2, 1505–1512 (2012). 88. Rulten, S. L. & Grundy, G. J. Non-homologous end joining: Common interaction sites and exchange of multiple factors in the DNA repair process. BioEssays 39, 1–12 (2017). 89. Mahajan, K. N., McElhinny, S. A. N., Mitchell, B. S. & Ramsden, D. A. Association of DNA polymerase μ (pol μ) with Ku and ligase IV: role for pol μ in end-joining double-strand break repair. Mol. Cell. Biol. 22, 5194–5202 (2002). 90. Craxton, A. et al. PAXX and its paralogs synergistically direct DNA polymerase λ activity

in DNA repair. Nat. Commun. 9, 1–16 (2018). 91. Orren, D. K. et al. A functional interaction of Ku with Werner exonuclease facilitates digestion of damaged DNA. Nucleic Acids Res. 29, 1926–1934 (2001). 92. Kusumoto, R. et al. Werner protein cooperates with the XRCC4-DNA ligase IV complex in end-processing. Biochemistry 47, 7548–56 (2008). 93. Heo, J. et al. TDP1 promotes assembly of non-homologous end joining protein complexes on DNA. DNA Repair (Amst). 30, 28–37 (2015). 94. Aceytuno, R. D. et al. Structural and functional characterization of the PNKP-XRCC4-LigIV DNA repair complex. Nucleic Acids Res. 45, 6238–6251 (2017). 95. Chang, H. Y. et al. Different DNA end configurations dictate which NHEJ components are most important for joining efficiency. J. Biol. Chem. 291, 24377–24389 (2016). 96. Graham, T. G. W., Carney, S. M., Walter, J. C. & Loparo, J. J. A single XLF dimer bridges DNA ends during nonhomologous end joining. Nat. Struct. Mol. Biol. 25, 877–884 (2018). 97. Stinson, B. M., Moreno, A. T., Walter, J. C. & Loparo, J. J. A Mechanism to Minimize Errors during Non-homologous End Joining. Mol. Cell 77, 1–12 (2019). 98. Gu, J. et al. XRCC4:DNA ligase IV can ligate incompatible DNA ends and can ligate across gaps. EMBO J. 26, 1010–1023 (2007). 99. Conlin, M. P. et al. DNA Ligase IV Guides End-Processing Choice during Nonhomologous End Joining. Cell Rep. 20, 2810–2819 (2017). 100. Postow, L. et al. Ku80 removal from DNA through double strand break-induced ubiquitylation. J. Cell Biol. 182, 467–479 (2008). 101. Postow, L. Destroying the ring: Freeing DNA from Ku with ubiquitin. FEBS Lett. 585, 2876–2882 (2011). 102. Rout, M. P. & Sali, A. Principles for Integrative Structural Biology Studies. Cell 177, 1384–1403 (2019). 103. Ward, A. B., Sali, A. & Wilson, I. A. Integrative structural biology. Science 339, 913–915 (2013). 104. Alber, F., Förster, F., Korkin, D., Topf, M. & Sali, A.
Integrating diverse data for structure determination of macromolecular assemblies. Annu. Rev. Biochem. 77, 443–477 (2008). 105. Schneidman-Duhovny, D., Pellarin, R. & Sali, A. Uncertainty in integrative structural modeling. Curr. Opin. Struct. Biol. 28, 96–104 (2014). 106. Webb, B. et al. Integrative structure modeling with the Integrative Modeling Platform. Protein Sci. 27, 245–258 (2018). 107. Lasker, K. et al. Integrative Structure Modeling of Macromolecular Assemblies from Proteomics Data. Mol. Cell. Proteomics 9, 1689–1702 (2010). 108. Alber, F. et al. The molecular architecture of the nuclear pore complex. Nature 450, 695– 701 (2007). 109. Kim, S. J. et al. Integrative structure and functional anatomy of a nuclear pore complex. Nature 555, 475–482 (2018). 110. Alber, F. et al. Determining the architectures of macromolecular assemblies. Nature 450, 683–694 (2007). 111. Shi, Y. et al. Structural Characterization by Cross-linking Reveals the Detailed Architecture of a Coatomer-related Heptameric Module from the Nuclear Pore Complex. Mol. Cell. Proteomics 13, 2927–2943 (2014).

112. Jovanovic, M. & Dynan, W. S. Terminal DNA structure and ATP influence binding parameters of the DNA-dependent protein kinase at an early step prior to DNA synapsis. Nucleic Acids Res. 34, 1112–1120 (2006). 113. Leuther, K. K., Hammarsten, O., Kornberg, R. D. & Chu, G. Structure of DNA-dependent protein kinase : implications for its regulation by DNA. EMBO J. 18, 1114–1123 (1999). 114. Percy, A. J., Rey, M., Burns, K. M. & Schriemer, D. C. Probing protein interactions with hydrogen/deuterium exchange and mass spectrometry-a review. Anal. Chim. Acta 721, 7– 21 (2012). 115. Konermann, L., Vahidi, S. & Sowole, M. A. Mass spectrometry methods for studying structure and dynamics of biological macromolecules. Anal. Chem. 86, 213–232 (2014). 116. Pan, Y., Piyadasa, H., O’Neil, J. D. & Konermann, L. Conformational dynamics of a membrane transport protein probed by H/D exchange and covalent labeling: The glycerol facilitator. J. Mol. Biol. 416, 400–413 (2012). 117. Sheff, J. G. et al. Novel allosteric pathway of Eg5 regulation identified through multivariate statistical analysis of hydrogen-exchange mass spectrometry (HX-MS) ligand screening data. Mol. Cell. Proteomics 16, 428–437 (2017). 118. Ramirez-Sarmiento, C. A. & Komives, E. A. Hydrogen-deuterium exchange mass spectrometry reveals folding and allostery in protein-protein interactions. Methods 144, 43–52 (2018). 119. Joseph, R. E., Wales, T. E., Fulton, D. B., Engen, J. R. & Andreotti, A. H. Achieving a Graded Immune Response: BTK Adopts a Range of Active/Inactive Conformations Dictated by Multiple Interdomain Contacts. Structure 25, 1481–1494 (2017). 120. Skinner, J. J. et al. Protein dynamics viewed by hydrogen exchange. Protein Sci. 21, 996– 1005 (2012). 121. Walters, B. T., Ricciuti, A., Mayne, L. & Englander, S. W. Minimizing back exchange in the hydrogen exchange-mass spectrometry experiment. J. Am. Soc. Mass Spectrom. 23, 2132–2139 (2012). 122. Masson, G. R. et al. 
Recommendations for performing, interpreting and reporting hydrogen deuterium exchange mass spectrometry (HDX-MS) experiments. Nat. Methods 16, 595–602 (2019). 123. Yang, M. et al. Recombinant Nepenthesin II for Hydrogen/Deuterium Exchange Mass Spectrometry. Anal. Chem. 87, 6681–6687 (2015). 124. Konermann, L., Pan, J. & Liu, Y.-H. Hydrogen exchange mass spectrometry for studying protein structure and dynamics. Chem. Soc. Rev. 40, 1224–1234 (2011). 125. Devaurs, D. et al. Coarse-Grained Conformational Sampling of Protein Structure Improves the Fit to Experimental Hydrogen-Exchange Data. Front. Mol. Biosci. 4, 1–14 (2017). 126. Persson, F. & Halle, B. How amide hydrogens exchange in native proteins. Proc. Natl. Acad. Sci. 112, 10383–10388 (2015). 127. Liu, T. et al. Quantitative Assessment of Protein Structural Models by Comparison of H/D Exchange MS Data with Exchange Behavior Accurately Predicted by DXCOREX. J. Am. Soc. Mass Spectrom. 23, 43–56 (2012). 128. Park, I.-H. et al. Estimation of Hydrogen-Exchange Protection Factors from MD Simulation Based on Amide Hydrogen Bonding Analysis. J. Chem. Inf. Model. 55, 1914–1925 (2015).

129. Skinner, J. J., Lim, W. K., Bédard, S., Black, B. E. & Englander, S. W. Protein hydrogen exchange: Testing current models. Protein Sci. 21, 987–995 (2012). 130. McAllister, R. G. & Konermann, L. Challenges in the interpretation of protein h/d exchange data: a molecular dynamics simulation perspective. Biochemistry 54, 2683–2692 (2015). 131. Erzberger, J. P. et al. Molecular Architecture of the 40S⋅eIF1⋅eIF3 Translation Initiation Complex. Cell 158, 1123–1135 (2014). 132. Lasker, K. et al. Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc. Natl. Acad. Sci. 109, 1380–1387 (2012). 133. Ostan, N. K. H. et al. Lactoferrin binding protein B – a bi-functional bacterial receptor protein. PLoS Pathog. 13, 1–20 (2017). 134. Shi, Y. et al. A strategy for dissecting the architectures of native macromolecular assemblies. Nat. Methods 12, 1135–1138 (2015). 135. Yu, C. & Huang, L. Cross-Linking Mass Spectrometry: An Emerging Technology for Interactomics and Structural Biology. Anal. Chem. 90, 144–165 (2018). 136. Politis, A. et al. A mass spectrometry-based hybrid method for structural modeling of protein complexes. Nat. Methods 11, 403–406 (2014). 137. Gutierrez, C. et al. Structural dynamics of the human COP9 signalosome revealed by cross-linking mass spectrometry and integrative modeling. Proc. Natl. Acad. Sci. U. S. A. 117, 4088–4098 (2020). 138. Wang, X. et al. Molecular details underlying dynamic structures and regulation of the human 26S proteasome. Mol. Cell. Proteomics 16, 840–854 (2017). 139. Herzog, F. et al. Structural probing of a protein phosphatase 2A network by chemical cross-linking and mass spectrometry. Science 337, 1348–1352 (2012). 140. Yu, C. et al. Probing H2O2-mediated structural dynamics of the human 26s proteasome using quantitative cross-linking mass spectrometry (QXL-MS). Mol. Cell. Proteomics 18, 954–967 (2019). 141. Steigenberger, B., Albanese, P., Heck, A. J. R. & Scheltema, R. A. 
To Cleave or Not To Cleave in XL-MS? J. Am. Soc. Mass Spectrom. 31, 196–206 (2020). 142. Leitner, A., Walzthoeni, T. & Aebersold, R. Lysine-specific chemical cross-linking of protein complexes and identification of cross-linking sites using LC-MS/MS and the xQuest/xProphet software pipeline. Nat. Protoc. 9, 120–137 (2014). 143. Fritzsche, R., Ihling, C. H., Götze, M. & Sinz, A. Optimizing the enrichment of cross- linked products for mass spectrometric protein analysis. Rapid Commun. Mass Spectrom. 26, 653–658 (2012). 144. Mendoza, V. L. & Vachet, R. W. Probing Protein Structure by Amino Acid-Specific Covalent Labeling and Mass Spectrometry. Mass Spectrom. Rev. 28, 785–815 (2009). 145. Ziemianowicz, D. S., Ng, D., Schryvers, A. B. & Schriemer, D. C. Photo-Cross-Linking Mass Spectrometry and Integrative Modeling Enables Rapid Screening of Antigen Interactions Involving Bacterial Transferrin Receptors. J. Proteome Res. 18, 934–946 (2019). 146. Sinz, A. Chemical cross-linking and mass spectrometry to map three-dimensional protein structures and protein-protein interactions. Mass Spectrom. Rev. 25, 663–682 (2006). 147. Leitner, A. et al. Chemical cross-linking/mass spectrometry targeting acidic residues in proteins and protein complexes. Proc. Natl. Acad. Sci. U. S. A. 111, 9455–9460 (2014).

148. Gutierrez, C. B. et al. Developing an acidic residue reactive and sulfoxide-containing MS-cleavable homobifunctional cross-linker for probing protein-protein interactions. Anal. Chem. 88, 8315–8322 (2016). 149. Fioramonte, M. et al. XPlex: An Effective, Multiplex Cross-Linking Chemistry for Acidic Residues. Anal. Chem. 90, 6043–6050 (2018). 150. Rafiei, A. & Schriemer, D. C. A crosslinking protocol for integrative structural modeling activities. Anal. Biochem. 586, 1–8 (2019). 151. Hofmann, T., Fischer, A. W., Meiler, J. & Kalkhof, S. Protein structure prediction guided by crosslinking restraints – A systematic evaluation of the impact of the crosslinking spacer length. Methods 89, 79–90 (2015). 152. Wales, T. E., Fadgen, K. E., Eggertson, M. J. & Engen, J. R. Subzero Celsius separations in three-zone temperature controlled hydrogen deuterium exchange mass spectrometry. J. Chromatogr. A 1523, 275–282 (2017). 153. Venable, J. D., Okach, L., Agarwalla, S. & Brock, A. Subzero Temperature Chromatography for Reduced Back-Exchange and Improved Dynamic Range in Amide Hydrogen/Deuterium Exchange Mass Spectrometry. Anal. Chem. 84, 9601–9608 (2012). 154. Iacob, R. E., Murphy III, J. P. & Engen, J. R. Ion mobility adds an additional dimension to mass spectrometric analysis of solution-phase hydrogen/ deuterium exchange. Rapid Commun. Mass Spectrom. 22, 2898–2904 (2008). 155. Bennett, M. J., Barakat, K., Huzil, J. T., Tuszynski, J. & Schriemer, D. C. Discovery and characterization of the laulimalide-microtubule binding mode by mass shift perturbation mapping. Chem. Biol. 17, 725–734 (2010). 156. Mayne, L. et al. Many overlapping peptides for protein hydrogen exchange experiments by the fragment separation-mass spectrometry method. J. Am. Soc. Mass Spectrom. 22, 1898–905 (2011). 157. Wlodawer, A., Minor, W., Dauter, Z. & Jaskolski, M. Protein crystallography for aspiring crystallographers or how to avoid pitfalls and traps in macromolecular structure determination. FEBS J. 
280, 5705–5736 (2013). 158. Goodarzi, A. A. & Lees-Miller, S. P. Biochemical characterization of the ataxia-telangiectasia mutated (ATM) protein from human cells. DNA Repair (Amst). 3, 753–767 (2004). 159. Gaspari, M. & Cuda, G. Nano LC-MS/MS: A Robust Setup for Proteomics Analysis. in Nanoproteomics: Methods and Protocols, Methods in Molecular Biology (eds. Weil, R. J. & Toms, S. A.) 790, 115–126 (Springer Science, 2011). 160. Wang, L. & Smith, D. L. Downsizing improves sensitivity 100-fold for hydrogen exchange-mass spectrometry. Anal. Biochem. 314, 46–53 (2003). 161. Sheff, J. G., Hepburn, M., Yu, Y., Lees-Miller, S. P. & Schriemer, D. Nanospray HX-MS configuration for structural interrogation of large protein systems. Analyst 142, 904–910 (2017). 162. Slysz, G. W., Percy, A. J. & Schriemer, D. C. Restraining expansion of the peak envelope in H/D exchange-MS and its application in detecting perturbations of protein structure/dynamics. Anal. Chem. 80, 7004–7011 (2008). 163. Zhang, J., Ramachandran, P., Kumar, R. & Gross, M. L. H/D Exchange Centroid Monitoring is Insufficient to Show Differences in the Behavior of Protein States. J. Am. Soc. Mass Spectrom. 24, 450–453 (2013).

164. Rey, M. et al. Mass Spec Studio for Integrative Structural Biology. Structure 22, 1538–1548 (2014). 165. Chik, J. K., Vande Graaf, J. L. & Schriemer, D. C. Quantitating the statistical distribution of deuterium incorporation to extend the utility of H/D exchange MS data. Anal. Chem. 78, 207–214 (2006). 166. Bai, Y., Milne, J. S., Mayne, L. & Englander, S. W. Primary Structure Effects on Peptide Group Hydrogen Exchange. Proteins 17, 75–86 (1993). 167. Sheff, J. G. & Schriemer, D. C. Toward Standardizing Deuterium Content Reporting in Hydrogen Exchange-MS. Anal. Chem. 86, 11962–11965 (2014). 168. Zhang, Z., Zhang, A. & Xiao, G. Improved protein hydrogen/deuterium exchange mass spectrometry platform with fully automated data processing. Anal. Chem. 84, 4942–4949 (2012). 169. Pettersen, E. F. et al. UCSF Chimera - A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004). 170. Sowole, M. A., Alexopoulos, J. A., Cheng, Y.-Q., Ortega, J. & Konermann, L. Activation of ClpP protease by ADEP antibiotics: insights from hydrogen exchange mass spectrometry. J. Mol. Biol. 425, 4508–4519 (2013). 171. Kidder, B. L., Hu, G. & Zhao, K. ChIP-Seq: Technical considerations for obtaining high-quality data. Nat. Immunol. 12, 918–922 (2011). 172. Kramer, K. et al. Photo-cross-linking and high-resolution mass spectrometry for assignment of RNA-binding sites in RNA-binding proteins. Nat. Methods 11, 1064–1070 (2014). 173. Tiss, A., Barre, O., Michaud-Soret, I. & Forest, E. Characterization of the DNA-binding site in the ferric uptake regulator protein from Escherichia coli by UV crosslinking and mass spectrometry. FEBS Lett. 579, 5454–5460 (2005). 174. Doneanu, C. E., Gafken, P. R., Bennett, S. E. & Barofsky, D. F. Mass spectrometry of UV-cross-linked protein-nucleic acid complexes: identification of amino acid residues in the single-stranded DNA-binding domain of human replication protein A. Anal. Chem. 76, 5667–5676 (2004). 175. Steen, H.
& Jensen, O. N. Analysis of protein-nucleic acid interactions by photochemical cross-linking and mass spectrometry. Mass Spectrom. Rev. 21, 163–182 (2002). 176. Vaughan, R. C. & Kao, C. C. Mapping Protein–RNA Interactions by RCAP, RNA-Cross-Linking and Peptide Fingerprinting. RNA Nanotechnol. Ther. Methods Protocols, Methods Mol. Biol. 1297, 225–236 (2015). 177. Perez-Vargas, J. et al. Isolation and Characterization of the DNA and Protein Binding Activities of Adenovirus Core Protein V. J. Virol. 88, 9287–9296 (2014). 178. Vaughan, R., Fan, B., You, J.-S. & Kao, C. C. Identification and functional characterization of the nascent RNA contacting residues of the hepatitis C virus RNA-dependent RNA polymerase. RNA 18, 1541–1552 (2012). 179. Dolinsky, T. J., Nielsen, J. E., McCammon, J. A. & Baker, N. A. PDB2PQR: An automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res. 32, 665–667 (2004). 180. Dolinsky, T. J. et al. PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 35, 522–525 (2007). 181. Baker, N. A., Sept, D., Joseph, S., Holst, M. J. & McCammon, J. A. Electrostatics of

nanosystems: Application to microtubules and the ribosome. Proc. Natl. Acad. Sci. U. S. A. 98, 10037–10041 (2001). 182. Toews, J., Rogalski, J. C., Clark, T. J. & Kast, J. Mass spectrometric identification of formaldehyde-induced peptide modifications under in vivo protein cross-linking conditions. Anal. Chim. Acta 618, 168–183 (2008). 183. O’Reilly, F. J. & Rappsilber, J. Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat. Struct. Mol. Biol. 25, 1000–1008 (2018). 184. Liu, F., Rijkers, D. T. S., Post, H. & Heck, A. J. R. Proteome-wide profiling of protein assemblies by cross-linking mass spectrometry. Nat. Methods 12, 1179–1184 (2015). 185. Walzthoeni, T., Leitner, A., Stengel, F. & Aebersold, R. Mass spectrometry supported determination of protein complex structure. Curr. Opin. Struct. Biol. 23, 252–260 (2013). 186. Chen, Z. A., Fischer, L., Cox, J. & Rappsilber, J. Quantitative Cross-linking/Mass Spectrometry Using Isotope-labeled Cross-linkers and MaxQuant. Mol. Cell. Proteomics 15, 2769–2778 (2016). 187. Chen, Z. A. & Rappsilber, J. Quantitative cross-linking/mass spectrometry to elucidate structural changes in proteins and their complexes. Nat. Protoc. 14, 171–201 (2019). 188. Walzthoeni, T. et al. XTract: Software for characterizing conformational changes of protein complexes by quantitative cross-linking mass spectrometry. Nat. Methods 12, 1185–1190 (2015). 189. Schmidt, C. & Robinson, C. V. A comparative cross-linking strategy to probe conformational changes in protein complexes. Nat. Protoc. 9, 2224–2236 (2014). 190. Leitner, A. et al. Expanding the Chemical Cross-Linking Toolbox by the Use of Multiple Proteases and Enrichment by Size Exclusion Chromatography. Mol. Cell. Proteomics 11, 1–12 (2012). 191. Klykov, O. et al. Efficient and robust proteome-wide approaches for cross-linking mass spectrometry. Nat. Protoc. 13, 2964–2990 (2018). 192. Schmidt, C. et al.
Surface Accessibility and Dynamics of Macromolecular Assemblies Probed by Covalent Labeling Mass Spectrometry and Integrative Modeling. Anal. Chem. 89, 1459–1468 (2017). 193. Schmidt, R. & Sinz, A. Improved single-step enrichment methods of cross-linked products for protein structure analysis and protein interaction mapping. Anal. Bioanal. Chem. 409, 2393–2400 (2017). 194. Petrotchenko, E. V. & Borchers, C. H. Crosslinking Combined with Mass Spectrometry for Structural Proteomics. Mass Spectrom. Rev. 29, 862–876 (2010). 195. Rey, M., Dupré, M., Lopez-Neira, I., Duchateau, M. & Chamot-Rooke, J. EXL-MS: An Enhanced Cross-Linking Mass Spectrometry Workflow to Study Protein Complexes. Anal. Chem. 90, 10707–10714 (2018). 196. Burke, A. M. et al. Synthesis of two new enrichable and MS-cleavable cross-linkers to define protein-protein interactions by mass spectrometry. Org. Biomol. Chem. 13, 5030– 5037 (2015). 197. Steigenberger, B., Pieters, R. J., Heck, A. J. R. & Scheltema, R. A. PhoX: An IMAC- Enrichable Cross-Linking Reagent. ACS Cent. Sci. 5, 1514–1522 (2019). 198. Sinz, A. Divide and conquer: cleavable cross-linkers to study protein conformation and protein–protein interactions. Analytical and Bioanalytical Chemistry 409, 33–44 (2017). 199. Sarpe, V. et al. High sensitivity crosslink detection coupled with integrative structure

modeling in the Mass Spec Studio. Mol. Cell. Proteomics 15, 3071–3080 (2016). 200. Yu, F., Li, N. & Yu, W. Exhaustively identifying cross-linked peptides with a linear computational complexity. J. Proteome Res. 16, 3942–3952 (2017). 201. Hoopmann, M. R. et al. Kojak: Efficient Analysis of Chemically Cross-Linked Protein Complexes. J. Proteome Res. 14, 2190–2198 (2015). 202. Lima, D. B. et al. SIM-XL: A powerful and user-friendly tool for peptide cross-linking analysis. J. Proteomics 129, 51–55 (2014). 203. Lu, L. et al. Identification of MS-Cleavable and Noncleavable Chemically Cross-Linked Peptides with MetaMorpheus. J. Proteome Res. 17, 2370–2376 (2018). 204. Leitner, A. et al. Probing native protein structures by chemical cross-linking, mass spectrometry, and bioinformatics. Mol. Cell. Proteomics 9, 1634–1649 (2010). 205. Götze, M. et al. StavroX-A software for analyzing crosslinked products in protein interaction studies. J. Am. Soc. Mass Spectrom. 23, 76–87 (2012). 206. Götze, M. et al. Automated Assignment of MS/MS Cleavable Cross-Links in Protein 3D-Structure Analysis. J. Am. Soc. Mass Spectrom. 26, 83–97 (2015). 207. Geer, L. Y. et al. Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964 (2004). 208. Wenger, C. D., Phanstiel, D. H., Lee, M. V., Bailey, D. J. & Coon, J. J. COMPASS: A suite of pre- and post-search proteomics software tools for OMSSA. Proteomics 11, 1064–1074 (2011). 209. Grimm, M., Zimniak, T., Kahraman, A. & Herzog, F. XVis: A web server for the schematic visualization and interpretation of crosslink-derived spatial restraints. Nucleic Acids Res. 43, W362–W369 (2015). 210. Kosinski, J. et al. Xlink analyzer: Software for analysis and visualization of cross-linking data in the context of three-dimensional structures. J. Struct. Biol. 189, 177–183 (2015). 211. Hulsen, T., de Vlieg, J. & Alkema, W. BioVenn - A web application for the comparison and visualization of biological lists using area-proportional Venn diagrams.
BMC Genomics 9, 1–6 (2008). 212. Müller, F., Fischer, L., Chen, Z. A., Auchynnikava, T. & Rappsilber, J. On the Reproducibility of Label-Free Quantitative Cross-Linking/Mass Spectrometry. J. Am. Soc. Mass Spectrom. 29, 405–412 (2018). 213. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513– 2526 (2014). 214. Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. Quantitative mass spectrometry in proteomics: A critical review. Anal. Bioanal. Chem. 389, 1017–1031 (2007). 215. Jette, N. & Lees-Miller, S. P. The DNA-dependent protein kinase: A multifunctional protein kinase with roles in DNA double strand break repair and mitosis. Prog. Biophys. Mol. Biol. 117, 194–205 (2014). 216. Baretic, D. et al. Structural insights into the critical DNA damage sensors DNA-PKcs, ATM and ATR. Prog. Biophys. Mol. Biol. 147, 4–16 (2019). 217. Saltzberg, D. et al. Modeling Biological Complexes Using Integrative Modeling Platform. in Bonomi M., Camilloni C. (eds) Biomolecular Simulations. Methods in Molecular Biology, Volume 2022 353–377 (2019).

218. Gil, V. A. & Guallar, V. PyRMSD: A Python package for efficient pairwise RMSD matrix calculation and handling. Bioinformatics 29, 2363–2364 (2013).
219. Viswanath, S., Chemmama, I. E., Cimermancic, P. & Sali, A. Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures. Biophys. J. 113, 2344–2353 (2017).
220. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
221. Fiser, A. & Sali, A. ModLoop: Automated modeling of loops in protein structures. Bioinformatics 19, 2500–2501 (2003).
222. Van Zundert, G. C. P. et al. The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. J. Mol. Biol. 428, 720–725 (2016).
223. Webb, B. & Sali, A. Comparative Protein Structure Modeling Using MODELLER. Curr. Protoc. Bioinforma. 54, 139–148 (2016).
224. Meek, K., Lees-Miller, S. P. & Modesti, M. N-terminal constraint activates the catalytic subunit of the DNA-dependent protein kinase in the absence of DNA or Ku. Nucleic Acids Res. 40, 2964–2973 (2012).
225. Song, Z. et al. Genome-wide identification of DNA-PKcs-associated RNAs by RIP-Seq. Signal Transduct. Target. Ther. 4, 21–23 (2019).
226. Saltzberg, D. J. et al. SSEThread: Integrative threading of the DNA-PKcs sequence based on data from chemical cross-linking and hydrogen deuterium exchange. Prog. Biophys. Mol. Biol. 147, 92–102 (2019).
227. Merkle, D. et al. The DNA-dependent protein kinase interacts with DNA to form a protein-DNA complex that is disrupted by phosphorylation. Biochemistry 41, 12706–12714 (2002).
228. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9, 671–675 (2012).
229. Russel, D. et al. Putting the pieces together: Integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol. 10, 1–5 (2012).
230. Bullock, J. M. A., Schwab, J., Thalassinos, K. & Topf, M. The importance of non-accessible crosslinks and solvent accessible surface distance in modeling proteins with restraints from crosslinking mass spectrometry. Mol. Cell. Proteomics 15, 2491–2500 (2016).
231. Wang, X. et al. Structure of the intact ATM/Tel1 kinase. Nat. Commun. 7, 1–8 (2016).
232. Lau, W. C. Y. et al. Structure of the human dimeric ATM kinase. Cell Cycle 15, 1117–1124 (2016).
233. Yates, L. A. et al. Cryo-EM Structure of Nucleotide-Bound Tel1ATM Unravels the Molecular Basis of Inhibition and Structural Rationale for Disease-Associated Mutations. Structure 28, 96–104 (2020).
234. Rao, Q. et al. Cryo-EM structure of human ATR-ATRIP complex. Cell Res. 28, 143–156 (2018).
235. Wang, J. et al. Artemis deficiency confers a DNA double-strand break repair defect and Artemis phosphorylation status is altered by DNA damage and cell cycle progression. DNA Repair (Amst). 4, 556–570 (2005).
236. Mohiuddin, I. S. & Kang, M. H. DNA-PK as an Emerging Therapeutic Target in Cancer. Front. Oncol. 9, 1–8 (2019).

237. Radhakrishnan, S. K., Jette, N. & Lees-Miller, S. P. Non-homologous end joining: Emerging themes and unanswered questions. DNA Repair (Amst). 17, 2–8 (2014).
238. Pospisilova, M., Seifrtova, M. & Rezacova, M. Small Molecule Inhibitors of DNA-PK for Tumor Sensitization to Anticancer Therapy. J. Physiol. Pharmacol. 68, 337–344 (2017).
239. Liu, Y., Efimova, E. V., Ramamurthy, A. & Kron, S. J. Repair-independent functions of DNA-PKcs protect irradiated cells from mitotic slippage and accelerated senescence. J. Cell Sci. 132, 1–12 (2019).
240. Ray, N., Cavin, X., Paul, J. C. & Maigret, B. Intersurf: Dynamic interface between proteins. J. Mol. Graph. Model. 23, 347–354 (2005).

Appendices

Appendix A: Copyright and Permissions

Found in: ucalgary_2020_hepburn_morgan_appendixA.pdf

Appendix B: Deuterium Uptake and Change for Peptides in the Differential HX-MS Analyses Presented

Tables found in: ucalgary_2020_hepburn_morgan_appendixB.xml

Appendix C: Crosslink Identifications Used in Modelling

List found in: ucalgary_2020_hepburn_morgan_appendixC.csv