A GENOME-WIDE SIRNA SCREEN REVEALS DIVERSE CELLULAR PROCESSES AND PATHWAYS THAT MEDIATE GENOMIC STABILITY

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF

CHEMICAL AND SYSTEMS BIOLOGY

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Renee Darlene Paulsen

May 2010

© 2010 by Renee Darlene Paulsen. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/zx436yn9136

Includes supplemental files:
1. Table S1. γH2AX Signal and Cell Cycle Distribution for Genome (NT dataset). The following list shows the percentage ... (Paulsen_ST1.xlsx)
2. Table S2. List of Genes with Significant γH2AX Values (NT dataset). The following list shows genes that had a γH2AX s... (Paulsen_ST2.xlsx)
3. Table S3. Genes that Caused Extensive Cell Death. The following is a list of genes that when knocked down lead to w... (Paulsen_ST3.xlsx)
4. Table S4. Categories of Enrichment Determined by DAVID Bioinformatic Database and Ingenuity Pathway Analysis. The g... (Paulsen_ST4.xlsx)
5. Table S5. List of Deconvoluted Genes. The following is a list of genes for which we individually tested four differ... (Paulsen_ST5.xlsx)
6. Table S6. Individual Components of Modules and Networks Enriched Amongst Screen Hits. The individual modules i... (Paulsen_ST6.xlsx)
7. Table S7: 53BP1 Staining Results and Table S8: Phospho-H3 Recovery Assay Results. (Paulsen_ST7_ST8.xlsx)
8. Table S9. Table of Genes Identified in This Screen and Other Screens. Hits from complementary siRNA screens were co... (Paulsen_ST9.xlsx)
9. Table S10: γH2AX Values From Retesting mRNA Processing Genes and the Effect of RNAseH Treatment, Table S11: mRNA Pro... (Paulsen_ST10_ST11_ST12.xlsx)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Karlene Cimprich, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Tobias Meyer

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Clifford Wang

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Joanna Wysocka

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

ABSTRACT

Genome instability has long been known to be a hallmark of cancer cells, but the cellular causes and consequences of such instability are still not fully understood. Mutations, translocations, DNA rearrangements, and chromosomal loss can all result in the loss of genomic integrity. To prevent the disruption of cellular homeostasis caused by accumulating DNA damage, cells contain pathways to sense and respond to DNA damage, including cell cycle checkpoints and numerous DNA repair processes, collectively known as the DNA damage response (DDR). Mutations in many DDR genes are linked to diseases including premature aging, neurodegeneration, and cancer. These signaling pathways are especially critical during DNA replication, when the DNA is unwound and vulnerable to processing. During replication, the cell relies on the S-phase checkpoint to sense DNA damage at replication forks and to coordinate a number of downstream pathways that maintain genomic stability. These processes include blocking further origin firing, facilitating DNA repair, preventing cell cycle progression, and stabilizing stalled replication forks.

Here, two genome-wide siRNA screens were used to identify additional genes involved in genome stabilization by monitoring phosphorylation of the histone variant H2AX, an early mark of DNA damage. The first screen measured H2AX phosphorylation induced by depletion of each gene alone, and the second used a low dose of the replication inhibitor aphidicolin to specifically identify genes needed to prevent DNA damage during S-phase, potentially because their loss compromises replication fork stabilization. While the results from the second screen are still being characterized, we discovered hundreds of genes whose down-regulation led to elevated levels of H2AX phosphorylation (γH2AX) in the absence of any external stress. From this gene set, we identified many gene networks that were significantly enriched among our screening hits, as well as several intriguing individual genes that were chosen for follow-up study.

From our screen, we found a widespread role for mRNA processing factors, predominantly mRNA splicing factors, in preventing γH2AX formation, with loss of nearly ninety mRNA processing genes inducing DNA damage. Additionally, we discovered that in some cases, loss of proper mRNA processing caused damage through aberrant RNA-DNA structure formation, along with defects in replication progression. Furthermore, we connected increased γH2AX levels to the neurological disorder Charcot-Marie-Tooth (CMT) disease, and we found a role for several CMT proteins in the DNA damage response, with their loss causing DNA damage sensitivity and DNA repair deficiencies. Finally, the genome-wide siRNA screen also led to the discovery of the histone methyltransferase PR-Set7/Set8 as a novel mediator of genomic integrity. We demonstrated that Set8 has multifaceted roles in facilitating replication progression and preventing DNA damage formation. Cumulatively, this thesis highlights that genome stability is maintained by a larger network of biological processes than previously appreciated, and understanding the full set of mechanisms that prevent DNA damage should help elucidate how cells avoid mutations, translocations, and, ultimately, oncogenic transformation.

ACKNOWLEDGMENTS

I am deeply grateful for all of the faculty, family, and friends who have helped me during the course of my PhD. First and foremost, thank you to my advisor, Karlene Cimprich, for guiding me throughout the process and serving as a good mentor and role model in both science and life. Thank you also to my committee members, Tobias Meyer, Joanna Wysocka, Cliff Wang, and Teresa Wang, for their helpful suggestions and useful criticism along the way. Many thanks also go to the Cimprich lab members, both past and present. Thanks to Sharon Barr and Ryan Bombargden for getting me up and running in the lab. Thanks to Jia-Ren Lin, Michelle Zeman, Claudia Choi, and Anna Guan for being excellent graduate student lab mates along the way. Also thanks to Deena Soni for all of the help with the siRNA screen, and to Robert Driscoll, Anja Duursma, Julie Sollier, Thomas Wechsler, and Andrew Kile for all of the good ideas and fresh enthusiasm you have brought to the lab. A special thanks also to M.C. Yee, who is simply an inspiration in so many ways. Thank you for all of your scientific help as well as your mentoring and family support. I would not have made it through graduate school without you, and I greatly appreciate all you have done for me and my family. A special thanks also to Chris Van, Debbie Chang, and Angie Hahn for all of the advice, commiseration sessions, support, food, and friendship. Graduate school wouldn’t have been nearly as much fun without all of you. Outside of the Cimprich lab, thank you to Roy Wollman, Annette Salmeen, Phil Vitorino, Lin Gan, David Solow-Cordero, and Jason Wu for all of your help with the genome-wide screen, analysis, and interpretation. You made it all possible and, thankfully, helped it all make sense. Also, a special thanks to Blueshift Biotechnologies, now part of MDS Analytical Technologies, for letting us invade with an idea and leave with beautiful scientific results.
Jayne Hesley, Steve Miller, and Evan Cromwell, we would still be screening the genome if it were not for you and your brilliant instrumentation. Thank you also to my friends in the CSB department, Emily Eggler, Andy Poon, Paul Rack, Mark Sellmyer, and the many others who have come and gone along the way.

Thank you also to my family for supporting me throughout this process. Thanks to my parents, Stanley and Nancy Baack, for instilling in me the values to succeed in life. You have always been a constant source of support. Thank you to my sister, Jennifer Baack, who has patiently proofread countless manuscripts and papers. Thank you also to my extended family, Wes and Pat Paulsen. I am so blessed to have two sets of supportive parents in my life, and I am very grateful to Pat for all of her help during this process.

Finally, thank you to my husband, my best friend, and the love of my life, TJ Paulsen. You have stood by my side, listened to my complaints, been a constant source of reason, stolen me from the bench when appropriate, and made me a better person for knowing you. You and Meagan have been the joy of my life, and I cannot tell you how thankful I am for your love, comfort, and support throughout this process. This would not have been possible without you. We made it through.

DEDICATION

I wish to dedicate this thesis to my daughter Meagan E. Paulsen.

May you achieve your own life’s ambitions, never lose your curiosity, love those around you, and retain joy along the way.

ABBREVIATIONS

911: Rad9-Rad1-Hus1
α: Alpha
A: Adenine
ac: Acetyl
ADP: Adenosine diphosphate
AID: Activation-induced cytidine deaminase
AP: Aphidicolin
Aph: Aphidicolin
ASF/SF2: Alternative splicing factor
AT: Ataxia telangiectasia
ATM: Ataxia telangiectasia mutated
ATP: Adenosine triphosphate
ATR: ATM and Rad3-related
ATRIP: ATR-interacting protein
bp: Base pair
BrdU: Bromodeoxyuridine
BSA: Bovine serum albumin
C: Cytosine
CAF1: Chromatin assembly factor 1
ChIP: Chromatin immunoprecipitation
CIAP: Calf intestinal alkaline phosphatase
CIN: Chromosomal instability
CMT: Charcot-Marie-Tooth disease
CPT: Camptothecin
CS: Cockayne's syndrome
cs: Catalytic subunit
δ: Delta
DAPI: 4',6-Diamidino-2-phenylindole
DAVID: Database for Annotation, Visualization and Integrated Discovery
DDR: DNA damage response
di: Two
DMEM: Dulbecco's modified Eagle's medium
DMSO: Dimethyl sulfoxide
DNA: Deoxyribonucleic acid
DNA-PK: DNA-dependent protein kinase
DSB: Double-strand DNA break
dsDNA: Double-stranded DNA

ε: Epsilon
EdU: 5-Ethynyl-2'-deoxyuridine
EM: Expectation maximization
ERCC: Excision repair cross-complementing
FA: Fanconi anemia
FACS: Fluorescence-activated cell sorting
FBS: Fetal bovine serum
FDR: False discovery rate
G: Guanine
GABA: Gamma-aminobutyric acid
GCR: Gross chromosomal rearrangement
GFP: Green fluorescent protein
GJB1: Gap junction protein beta 1
GL3: Luciferase
GM: Gaussian mixture
GO: Gene ontology
h: Hour
H2AX: Histone H2AX
H4: Histone H4
HR: Homologous recombination
HSPB8: Heat shock protein B8
HU: Hydroxyurea
IR: Ionizing radiation
K: Lysine
kb: Kilobase
LMNA: Lamin A
Luc: Luciferase
Mb: Megabase
MCM: Minichromosome maintenance
me: Methyl
MEF: Mouse embryonic fibroblast
min: Minute
MIN: Microsatellite instability
MMC: Mitomycin C
MMS: Methyl methanesulfonate
mono: One
MPZ: Myelin protein zero
NCBI: National Center for Biotechnology Information
NER: Nucleotide excision repair
NHEJ: Non-homologous end joining
nM: Nanomolar
NPC: Nuclear pore complex
NT: Nontreated
OMIM: Online Mendelian Inheritance in Man

Opti-MEM: Reduced serum modification of Eagle's minimum essential medium
ORC: Origin recognition complex
PANTHER: Protein Analysis Through Evolutionary Relationships
PBS: Phosphate-buffered saline
PCNA: Proliferating cell nuclear antigen
PCR: Polymerase chain reaction
PFA: Paraformaldehyde
PI: Propidium iodide
PIKK: Phosphatidylinositol 3-kinase-related kinase
PIP: PCNA-interacting protein motif
Pol: Polymerase
PP: Protein phosphatase
PSG: Penicillin-streptomycin-glutamine
RFC: Replication factor C
RNA: Ribonucleic acid
RNAi: RNA interference
ROI: Region of interest
RPA: Replication protein A
rpm: Rotations per minute
S: Serine
SCAN1: Spinocerebellar ataxia with axonal neuropathy
SD: Standard deviation
SE: Standard error
siRNA: Small interfering RNA
snRNP: Small nuclear ribonucleoprotein
SR: Serine/arginine-rich
SSB: Single-strand DNA break
ssDNA: Single-stranded DNA
T: Thymine
TAM: Transcription-associated mutation
TAR: Transcription-associated recombination
TBS: Tris-buffered saline
TCR: Transcription-coupled repair
TDP1: Tyrosyl-DNA phosphodiesterase 1
tri: Three
WNT: Wingless-type MMTV integration site family
XP: Xeroderma pigmentosum
YFP: Yellow fluorescent protein

TABLE OF CONTENTS

Abstract
Acknowledgments
Dedication
Abbreviations
List of tables
List of figures
1 Introduction to the causes and consequences of genome instability
1.1 Types of genomic instability
1.2 Genomic instability and the link to cancer
1.3 DNA replication as a source of genomic instability
1.3.1 Endogenous challenges for DNA replication
1.4 Cellular mechanisms for the prevention of genomic instability
1.4.1 Cell cycle checkpoints
1.4.2 The S-phase checkpoint's role during DNA replication
1.4.3 The S-phase checkpoint is required for cellular viability under conditions of replicative stress
1.4.4 Stabilization of stalled replication forks by the S-phase checkpoint
1.4.5 Possible targets of the S-phase checkpoint at DNA replication forks
1.5 Aims and contributions
2 siRNA screening for novel regulators of genome integrity
2.1 Introduction
2.1.1 Rationale for a genome-wide siRNA screen looking for proteins involved in preventing DNA damage
2.1.2 Functional genomic screens
2.1.3 A readout for DNA damage: Histone H2AX phosphorylation (γH2AX)
2.1.4 Synopsis
2.2 Results of the siRNA screens

2.2.1 Design and implementation of a genome-wide siRNA screen for novel inducers of H2AX phosphorylation
2.2.2 Screen for genes involved in maintaining genomic stability (NT dataset)
2.2.2.1 Bioinformatics of screening hits
2.2.2.2 siRNA screening hit validation
2.2.2.3 siRNA screening protein interaction network analysis
2.2.3 Results of the targeted screen for novel regulators of replication fork stability (AP dataset)
2.2.3.1 Bioinformatics of screening hits
2.2.4 Secondary screening
2.2.4.1 53BP1 foci formation
2.2.4.2 Phospho-H3 recovery assay
2.3 Discussion
3 The roles of co-transcriptional processes in maintaining genome stability
3.1 Introduction
3.1.1 Co-transcriptional processes: basic biological mechanisms
3.1.2 Transcription as a source of DNA damage
3.1.3 Links between the DNA damage response and mRNA processing
3.1.4 Synopsis
3.2 Results of mRNA processing gene follow-up studies
3.3.1 Loss of mRNA processing genes induces DNA damage
3.3.2 The H2AX signal upon mRNA processing gene knockdown is reduced by over-expression of RNAse H
3.3.3 Detection of R-loops in the absence of proper mRNA processing
3.3.4 Loss of mRNA processing induces cell cycle arrest, replication progression defects, and damage during S-phase
3.3.5 mRNA processing genes affect checkpoint and double-strand break repair processes
3.3 Discussion
4 Charcot-Marie-Tooth genes and the link to DNA damage
4.1 Introduction

4.1.1 Charcot-Marie-Tooth disease: clinical designations and involved genes
4.1.2 DNA damage response genes with neuronal phenotypes
4.1.3 Why are neurons specifically sensitive to DNA damage?
4.1.4 Synopsis
4.2 Results of Charcot-Marie-Tooth gene follow-up studies
4.2.1 Charcot-Marie-Tooth genes cause DNA damage and DNA damage sensitivity when lost
4.2.2 DNA repair defects are visible upon loss of Charcot-Marie-Tooth genes
4.2.3 Charcot-Marie-Tooth patient cell lines show a heightened checkpoint response
4.3 Discussion
5 The role of Set8 in maintaining genome stability
5.1 Introduction
5.1.1 Chromatin modification and the DNA damage response
5.1.2 The Set8 histone methyltransferase
5.1.3 Histone H4 K20 methylations
5.1.4 Additional targets of Set8
5.1.5 Synopsis
5.2 Results
5.2.1 Set8 is needed to maintain genomic stability
5.2.2 Set8 loss induces DNA damage during S phase
5.2.3 Set8 knockdown causes activation of the DNA damage checkpoint
5.2.4 53BP1 foci formation is inhibited by Set8 loss
5.2.5 Loss of 53BP1 foci formation upon Set8 knockdown does not inhibit Chk2 activation
5.2.6 Set8 knockdown reduces H4K20 methylation and overall chromatin compaction status
5.2.7 A prior mitosis is not required to induce DNA damage in the absence of Set8
5.2.8 The damage induced by Set8 knockdown can be relieved by codepletion of either Rad51 or Mus81

5.3 Discussion
6 Outlook and future directions
APPENDIX A: siRNA screening protocols and analysis methods
A.1 Robotic siRNA procedures
A.1.1 High-throughput reagents, buffers and equipment used
A.1.2 384-well robotic siRNA transfection
A.1.3 384-well γH2AX immunofluorescence staining and imaging
A.2 Data analysis and statistical methods
A.2.1 Per-plate data transformation
A.2.2 Calculation of signal strength per well
A.2.3 Assignment of P-values for each well
A.2.4 Mathematical derivation of the P-value
A.2.5 Correction for multiple testing and assignment of significance groups
A.3 Other screening protocols, analysis, and calculations
A.3.1 siRNA deconvolution
A.3.2 Z' factor calculation
A.3.3 Propidium iodide cell cycle analysis
A.3.4 Bioinformatic methods
APPENDIX B: General lab and cell culture protocols
B.1 Basic cell culture maintenance
B.1.1 Cell culture care
B.1.2 Cell lines used
B.2 Experimental cell culture protocols
B.2.1 Immunofluorescence
B.2.2 Transient transfection of DNA and/or siRNA
B.2.3 Retroviral transfection/stable cell line generation
B.2.4 G2/M checkpoint assay
B.2.5 EdU/p-H3 or EdU/γH2AX co-staining
B.2.6 Homologous recombination assay
B.2.7 DNA damage sensitivity assay
B.2.8 BrdU FACS cell cycle tracking

B.2.9 RNAse H H2AX assay
B.3 Other general protocols
B.3.1 Micrococcal nuclease chromatin digestion assay
B.3.2 Sodium bisulfite sequencing assay
B.3.3 Histone extraction protocol
APPENDIX C: Cloning, antibody, and siRNA information
C.1 General cloning protocols
C.2 Antibody information
C.3 Recombinant proteins
C.3.1 Human RNAse H1
C.3.2 Human PR-Set7/Set8
C.3.3 Human Znf574
C.4 Dharmacon siRNA information
APPENDIX D: Supplementary tables
Table S1: γH2AX signal and cell cycle distribution for the genomic NT siRNA screen
Table S2: List of genes with significant γH2AX values (NT screen)
Table S3: Genes that caused extensive cell death
Table S4: Categories of enrichment determined by DAVID bioinformatic database and Ingenuity pathway analysis
Table S5: List of deconvoluted genes and γH2AX values
Table S6: Individual components of gene modules and networks enriched amongst screen hits
Table S7: 53BP1 staining experimental results
Table S8: H3 recovery assay experimental results
Table S9: Literature cross comparison of screening results
Table S10: γH2AX values from retesting mRNA processing genes and the effect of RNAse H treatment
Table S11: mRNA processing γH2AX values +/- aphidicolin treatment
Table S12: Replication and DNA damage correlation for the mRNA processing genes

LIST OF TABLES

Table 2.1: Screening hits that scored with 4 of 4 siRNAs tested
Table 3.1: mRNA processing genes identified by the siRNA screen
Table 4.1: Charcot-Marie-Tooth gene classification and effects on γH2AX in the original screen and deconvolution analysis
Table 4.2: List of neurological phenotypes from mutations in genes involved in DNA repair and the DNA damage response
Table 4.3: Individual Charcot-Marie-Tooth gene and siRNA information
Table C1: Individually ordered siRNA duplexes
Table C2: siRNA duplex information for the mRNA processing library
Table S1: γH2AX signal and cell cycle distribution for the genomic NT siRNA screen
Table S2: List of genes with significant γH2AX values (NT screen)
Table S3: Genes that caused extensive cell death
Table S4: Categories of enrichment determined by DAVID bioinformatic database and Ingenuity pathway analysis
Table S5: List of deconvoluted genes and γH2AX values
Table S6: Individual components of gene modules and networks enriched amongst screen hits
Table S7: 53BP1 staining experimental results
Table S8: H3 recovery assay experimental results
Table S9: Literature cross comparison of screening results
Table S10: γH2AX values from retesting mRNA processing genes and the effect of RNAse H treatment
Table S11: mRNA processing γH2AX values +/- aphidicolin treatment
Table S12: Replication and DNA damage correlation for mRNA processing genes

LIST OF FIGURES

Figure 1.1: Oncogene-induced DNA damage model for cancer development
Figure 1.2: Model of stalled fork processing during DNA replication
Figure 1.3: Eukaryotic cell cycle and checkpoints
Figure 1.4: Model of replication checkpoint activation and function
Figure 1.5: Potential targets of ATR and Chk1 necessary for replication fork stabilization
Figure 2.1: H2AX phosphorylation as a readout of DNA damage
Figure 2.2: Rationale for the genome-wide siRNA screens
Figure 2.3: siRNA screen for genes suppressing H2AX phosphorylation
Figure 2.4: Functional classification of statistically significant gene set
Figure 2.5: Classification enrichment of statistically significant gene set
Figure 2.6: Screening validation
Figure 2.7: Network modeling of screen hits identifies DNA replication and DNA repair functional groups linked to genome maintenance
Figure 2.8: Network modeling of screen hits identifies mRNA processing and Charcot-Marie-Tooth functional groups linked to genome maintenance
Figure 2.9: Network modeling of screen hits identifies nuclear pore and pericentric binding functional groups linked to genome maintenance
Figure 2.10: Network modeling of screen hits identifies circadian rhythm, GABA signaling, and WNT signaling functional groups linked to genome maintenance
Figure 2.11: Interconnections between functional groups linked to genome maintenance
Figure 2.12: Screen for genes suppressing replication fork stability
Figure 2.13: Functional classification and enrichment analysis of the aphidicolin statistically significant gene set

Figure 3.1: Basic eukaryotic mRNA splicing cycle
Figure 3.2: Simplified depiction of RNA/DNA hybrid formation that could occur in the absence of mRNA processing factors
Figure 3.3: Possible mechanisms for RNA/DNA hybrid-induced DNA damage
Figure 3.4: Functional assays for mRNA processing genes affecting H2AX phosphorylation
Figure 3.5: Reduction of H2AX phosphorylation by over-expression of RNAse H
Figure 3.6: Sodium bisulfite method for detection of alternative DNA structures
Figure 3.7: Sodium bisulfite sequencing of β-actin after splicing gene knockdown
Figure 3.8: Effect of knocking down mRNA processing genes on the cell cycle
Figure 3.9: Effect of aphidicolin on the H2AX phosphorylation observed after mRNA processing gene knockdown
Figure 3.10: Replication dependence of the DNA damage induced by mRNA processing gene knockdown
Figure 3.11: Cell cycle and replication progression analysis after splicing gene knockdown
Figure 3.12: Loss of mRNA processing genes leads to checkpoint and DNA repair defects
Figure 4.1: Loss of Charcot-Marie-Tooth disease genes leads to increased H2AX phosphorylation
Figure 4.2: Loss of Charcot-Marie-Tooth disease genes leads to enhanced H2AX phosphorylation in the presence of aphidicolin
Figure 4.3: Loss of Charcot-Marie-Tooth disease genes leads to increased DNA damage sensitivity and DNA repair defects
Figure 4.4: Charcot-Marie-Tooth patient cell lines show a heightened checkpoint response
Figure 5.1: Nucleosome architecture
Figure 5.2: Set8 is needed to maintain genomic stability
Figure 5.3: Set8 loss induces DNA damage in S-phase and is required for efficient DNA replication

Figure 5.4: Set8 knockdown causes the activation of the DNA damage checkpoint
Figure 5.5: Set8 replication progression defects are partially relieved by checkpoint inhibition
Figure 5.6: Set8 knockdown inhibits 53BP1 foci formation
Figure 5.7: Set8 knockdown reduces histone H4K20 methylation and alters chromatin compaction status
Figure 5.8: A prior mitosis is not required for H2AX phosphorylation induction in the absence of Set8
Figure 5.9: The DNA damage induced by Set8 knockdown can be relieved by co-depletion of either Rad51 or Mus81

CHAPTER 1

Introduction to the causes and consequences of genome instability

Parts of this chapter have been adapted with permission from:

“The ATR Pathway: Fine-Tuning the Fork” Paulsen RD and Cimprich KA, DNA Repair, 2007; 6: 953-66.

1.1 Types of genome instability

Genomic instability leads to the mutations and chromosomal rearrangements found in cancer cells, and the underlying DNA damage can take a range of forms. Most cancers are known to exhibit chromosomal instability (CIN), an elevated rate of change in chromosome structure and number compared with normally proliferating cells1, 2. CIN leads to the rapid gain or loss of whole chromosomes and is caused by failures in either mitotic chromosome transmission or the mitotic spindle checkpoint3. Other forms of genomic instability include micro- and minisatellite instability (MIN), which is characterized by expansion or contraction in the number of oligonucleotide repeats in microsatellite regions and can be caused by replication slippage or by defects in mismatch repair or homologous recombination-mediated repair4. Increased mutation frequency at the level of individual DNA bases is also frequently observed in cancers, as are increases in gross chromosomal rearrangements (GCRs), including chromosomal translocations, duplications, inversions, and deletions5. These types of instability are commonly due to mutations in DNA replication and repair genes that are required to either prevent or respond to the formation of DNA double-strand breaks6. Overall, while the various types of genomic instability have begun to be defined, how and when instability arises, and which molecular mechanisms prevent it, remain open questions.

Figure 1.1. Oncogene-induced DNA damage model for cancer development. Genomic instability first arises from DNA damage caused by oncogene activation. This damage activates cellular checkpoints, which cause cells either to die by apoptosis or to stop growing through cellular senescence. However, once a secondary mutation occurs in a tumor suppressor or checkpoint gene, the precancerous cell is free to replicate without proper checkpoint and repair processes, allowing additional detrimental mutations to accumulate and drive cancer development.

1.2 Genome instability and the link to cancer

In order for a normal cell to develop into a cancer cell, it must acquire several functional capabilities that it does not inherently possess: self-sufficiency in growth signals, insensitivity to anti-growth signals, evasion of apoptosis, sustained angiogenesis, tissue invasion and metastasis, and unlimited replicative potential7. All of these capabilities can be acquired through mutations or genomic instability that alter key cellular mechanisms, for example by activating oncogenes or disrupting tumor suppressors.

The link between genomic instability and cancer development is clearly demonstrated by hereditary cancers. Hereditary cancers and some premature aging syndromes are associated with mutations in a variety of genes with roles in DNA repair, including BRCA1 (breast cancer associated 1), BRCA2 (breast cancer associated 2), TP53 (tumor protein p53), ATM (ataxia telangiectasia mutated), PALB2 (partner and localizer of BRCA2), BRIP1 (BRCA1 interacting partner), WRN (Werner syndrome helicase), NBS1 (Nijmegen breakage syndrome protein 1), CHEK2, RAD50, BLM (Bloom syndrome helicase), RECQL4 (RecQ protein-like 4), and the Fanconi anemia genes1. These observations suggest that in hereditary cancer, spontaneous genomic instability arising from an inability to accurately respond to or repair DNA damage drives tumor development by increasing the mutation rate. This concept is termed the “mutator hypothesis”8. However, while genome instability is a universal phenotype of cancer, hereditary cancers account for only a small percentage of all cancers1.

What, then, is the source of genomic instability in sporadic cancers? An obvious hypothesis would be that the genes most commonly mutated in cancer cells are those responsible for maintaining genomic stability. However, several recent sequencing studies that examined the frequency of mutations across cancer genomes found that, in sporadic cancers, very few genes are mutated and/or deleted at high frequency: only six genes were altered in more than 20% of the tumors analyzed, namely TP53, EGFR (epidermal growth factor receptor), RAS (small GTPase), p16INK4A (cyclin-dependent kinase 4 inhibitor), PTEN (phosphatase and tensin homolog), and NF1 (neurofibromatosis type 1)9-14. The “mutator hypothesis” would predict that mutations affecting DNA repair genes would be frequent and would occur early in cancer progression. However, the only DNA repair gene identified at high frequency was the tumor suppressor p53. The other genes all encoded oncoproteins or proteins that antagonize the growth-promoting activities of oncoproteins. These observations suggest that genome instability in sporadic cancers could arise from the loss of proliferation control that occurs upon oncogene activation (Fig. 1.1). Consistent with this idea, several studies have shown that oncogene activation induces DNA replication stress and can lead to DNA double-strand break formation in precancerous lesions15-18. Therefore, in sporadic cancer, genomic instability may be intimately tied to the cell's ability to accurately complete DNA replication under oncogene-induced replicative stress (Fig. 1.1). While other processes have also been shown to induce DNA damage, including improper maintenance19, cytokinesis defects3, DNA repair deficiencies4, and defects in mitochondrial function20, the following text highlights the challenges that DNA replication presents to the maintenance of genomic stability, because the link between replication-associated damage and the cancer phenotype is becoming increasingly clear.

1.3 DNA replication as a source of genomic instability

DNA replication is a particularly challenging time for genomic stability. Even under normal conditions, replication forks must overcome lesions or obstacles in the DNA that can stall oncoming forks or cause them to collapse into DNA double-strand breaks (DSBs). Replication-associated DNA breaks can be generated by several mechanisms. First, a break can occur if the replication fork encounters a single-strand DNA (ssDNA) nick, which interrupts synthesis of the nascent strand and leads to DSB formation21. Second, a lesion can block synthesis on one strand without impeding fork progression, allowing replication to restart downstream of the lesion and leaving behind an ssDNA gap on either the leading or lagging strand22, 23. Finally, replication forks can stall at DNA lesions. Without proper stabilization, these stalled forks can be processed by a number of mechanisms that lead to DNA breaks and potential rearrangements.


1.3.1 Endogenous challenges for DNA replication

Secondary DNA structure and fragile sites

In addition to lesions arising naturally within the cell from metabolic byproducts and damage induced by external agents, the genome contains several inherent impediments to oncoming replication forks. Secondary DNA structures, such as hairpins formed by repeated DNA sequences, can interfere with replication fork progression24. The genome also contains fragile sites, large chromosomal regions that replicate late in S-phase and are frequently associated with hotspots of translocations, gene amplifications, integration of exogenous DNA, and other rearrangements25, 26. Fragile sites are not known to share any conserved repeats, but they may be enriched in flexible AT-rich sequences that are more prone to form secondary structures26. Fragile sites are conserved from yeast to humans, and, interestingly, their breakage increases under replicative stress25.

Transcription

Transcription of a DNA sequence increases its frequency of recombination, a phenomenon referred to as transcription-associated recombination (TAR)27. In addition to stimulating rearrangements, transcription also induces mutations within the template gene, a process known as transcription-associated mutation (TAM)28. Although the mechanisms behind these phenomena are not entirely understood, several scenarios can be envisioned in which the intersection of transcription and replication causes genomic instability. In S. cerevisiae, TAR is detected in S-phase but not in G1, suggesting that collisions between the replication fork and the transcription machinery trigger chromosomal rearrangements29. Mutations in mRNA processing factors are also known to induce the accumulation of ssDNA on the non-template strand through the formation of RNA/DNA hybrids with the transcribed DNA30, 31. These ssDNA regions can serve as recombination hotspots or block oncoming replication forks, and in either case would challenge the preservation of genomic integrity.


Chromatin

The positioning of nucleosomes on DNA, as well as the tertiary structure of chromatin, serves as a barrier to all processes that require access to DNA, including replication. During each S-phase, nucleosomes must be dismantled ahead of the oncoming replication fork and reassembled behind it after duplication32. Disruption of these processes can result in DNA damage and replication defects. In human cells, loss of CAF1 (chromatin assembly factor 1), an essential factor that deposits newly synthesized histones onto DNA, causes spontaneous DNA damage and defects in S-phase progression33. Similarly, in yeast, loss of the histone chaperone Asf1 directly activates the DNA damage checkpoint, causes accumulation of cells in G2/M, and makes cells highly sensitive to DNA-damaging agents34, 35. Additionally, loss of many chromatin remodeling factors has been shown to cause defects in replication and/or DNA repair that could contribute to genomic instability36, 37.

1.3.2 The dangers of stalled replication forks

Stalled replication forks are a very common challenge during replication38, 39. When fork progression is halted, the offending lesion or barrier must be dealt with to allow completion of DNA replication (Fig. 1.2). In some cases, this may involve repair of the lesion or removal of the barrier. For some types of DNA damage, the lesion can instead be bypassed in an error-prone or error-free manner, allowing repair at a later time40, 41. If replication is to resume from the stalled fork, it is crucial that the fork remain in the proper configuration and that the replisome stay associated with it. Without proper regulation, replisome components can dissociate, causing replication fork collapse42-47. Stabilizing every stalled fork may not be essential, since forks from adjacent origins could replicate through the region at a later time. However, if two converging forks collapse, or if a fork stalls in a telomeric region where other origins may not be available, completion of DNA replication may be compromised.


Figure 1.2. Model of stalled fork processing during DNA replication. When a replication fork encounters a lesion, the polymerase stalls while the helicase continues to unwind the DNA. This functional uncoupling between helicase and polymerase activities generates a long stretch of ssDNA, which is bound by RPA. If the stalled fork is recognized by ATR, ATR promotes fork stabilization through direct phosphorylation of replication fork targets and through Chk1-dependent phosphorylation (right pathway). Stabilization requires continued association of replisome components with the stalled fork, inhibition of homologous recombination, and possibly inhibition of the MCM helicase. These events keep the replisome at the fork and help maintain a conformation capable of resuming replication. Replication can then resume at this site if the barrier is removed by repair or another process, or by lesion bypass. If the fork remains blocked, replication may be completed by a fork entering from an adjacent origin or by homologous recombination. If the stalled fork is not stabilized, it can collapse, losing DNA polymerases and accumulating additional ssDNA (left pathway). Fork recovery then requires reloading of DNA polymerases and removal or bypass of the barrier. If the DNA polymerases are not reloaded, ssDNA and other structures that form at the fork may be cleaved by nucleases, leading to DNA double-strand break formation. The resulting double-strand breaks can be used to restart replication through homologous recombination, but inappropriate and elevated levels of recombination can also lead to gene loss and chromosomal rearrangements.

Stalled replication forks are also hazardous because they may accumulate long stretches of ssDNA and may be more susceptible to fork reversal or other rearrangements. These alternative structures of the parental and nascent DNA strands are more readily targeted by some nucleases. Thus, failure to stabilize stalled forks can also result in DNA double-strand breaks, which can ultimately lead to chromosomal rearrangements and loss of genetic material46, 48-50. Although recombination mechanisms exist to restart replication forks from these breaks, the unscheduled and excessive formation of such breaks is potentially hazardous and can lead to illegitimate recombination and genomic instability51-53. Because of the threat posed by stalled replication forks, organisms have evolved pathways that actively stabilize and restart stalled, and possibly even collapsed, forks.

1.4 Cellular mechanisms for the prevention of genome instability

1.4.1 Cell cycle checkpoints

To cope with DNA damage both during and outside of DNA replication, cells have evolved elaborate mechanisms known as cell cycle checkpoints that sense and respond to DNA lesions (Fig. 1.3). Checkpoints operate during each phase of the cell cycle to ensure the faithful transmission of genetic information to the next generation of cells. In response to DNA damage, the cell cycle checkpoints are activated by a signaling pathway known as the DNA damage response51, 54-56. Central to the DNA damage response and checkpoint are the phosphatidylinositol 3-kinase-related kinases (PIKKs) ATM (ataxia telangiectasia mutated) and ATR (ATM and Rad3-related), which sense lesions caused by DNA damage and replication stress and respond by activating downstream effectors and other kinases. This pathway coordinates many cellular processes, including cell cycle regulation, DNA repair, transcription, cellular senescence, and apoptosis57, 58. Because the genome is under constant assault from endogenous and exogenous sources of stress, loss of the checkpoint or other components of the DNA damage response leads to increased basal DNA damage and a loss of genome stability45, 54.

Figure 1.3. Eukaryotic cell cycle and checkpoints. The eukaryotic cell cycle consists of four phases, each of which is regulated by a cell cycle checkpoint (red flag). During G1, the cell prepares for DNA duplication by increasing in size and protein content. During S-phase, the cell replicates its DNA. During G2, the cell undergoes additional growth as it prepares for division, which occurs during M phase, or mitosis, when the cell divides into two daughter cells with identical DNA content.


In addition to preserving genomic stability, the DNA damage response also serves as a barrier to cancer progression15, 17. As stated previously, cell proliferation driven by oncogene activation creates replication stress and increases the formation of double-strand breaks during S-phase. This replicative stress and DNA damage activate the DNA damage checkpoint, which halts cell proliferation by inducing senescence16, 18. Thus, precancerous cells may become particularly problematic only after acquiring a secondary mutation in a DNA damage checkpoint protein, such as p53, which allows them to re-enter the cell cycle and drive tumor progression while promoting the loss of genomic integrity (Fig. 1.1).

1.4.2 The S-phase checkpoint’s role during DNA replication

The genome is particularly vulnerable to damage and the resulting instability during DNA replication, when agents that block replication fork progression can lead to DNA double-strand breaks and a loss of genetic information. In response to replication fork stalling, cells activate an intra-S-phase checkpoint pathway, also known as the replication checkpoint, which engages a number of mechanisms that preserve genome stability59, 60. Central to this checkpoint signaling pathway is the DNA damage-sensing kinase ATR and its functional homologs, Mec1 in Saccharomyces cerevisiae (S. cerevisiae) and Rad3 in Schizosaccharomyces pombe (S. pombe). ATR or its homologs are recruited to stalled replication forks and, once activated, phosphorylate a number of proteins, including the downstream effector kinase Chk1. The phosphorylation and ensuing activation of Chk1 or its functional homologs, Rad53 (S. cerevisiae) and Cds1 (S. pombe), require the activity of several other proteins; in mammalian cells, these include Claspin, TopBP1, the Rad9-Hus1-Rad1 (9-1-1) complex, and the Rad17-Rfc2-5 complex. Once activated, the replication checkpoint kinases block cell cycle progression, downregulate late origin firing, stabilize stalled replication forks, and facilitate the restart of collapsed forks (Figure 1.4).


Figure 1.4. Model of replication checkpoint activation and function. When a polymerase encounters a lesion that prevents its progression, the MCM helicase and the polymerase on the opposite DNA strand continue to unwind and synthesize DNA. This functional uncoupling between the stalled polymerase and the helicase leads to accumulation of ssDNA on the stalled strand, which recruits the ATR-ATRIP complex. The accumulated ssDNA, as well as other DNA structures at the stalled fork, binds and recruits a number of other replication checkpoint proteins necessary for ATR activation and Chk1 phosphorylation. Activation of this pathway promotes genomic stability by downregulating further origin firing, facilitating replication fork restart, stabilizing stalled replication forks, and blocking cell cycle progression. In this figure, the lesion is shown on the leading strand; however, it is not known whether ATR can be activated by lesions on either strand or whether its activation depends on which strand carries the lesion.


1.4.3 The S-phase checkpoint is required for cellular viability under conditions of replicative stress

Genomic integrity is severely compromised by loss of the replication checkpoint kinases. Disruption of ATR results in early embryonic lethality in mice61, and cell death in murine embryonic fibroblasts (MEFs) and human cells, even in the absence of exogenous DNA damage62, 63. These cells accumulate DNA double-strand breaks during S-phase, an effect that is further enhanced by aphidicolin, an inhibitor of replicative polymerases61.

Like ATR, disruption of Chk1 in mice leads to early embryonic lethality64, 65. In addition, although Chk1-deficient DT40 cells are viable, they exhibit many phenotypes similar to those observed upon conditional disruption of ATR45, 66. Finally, loss of Chk1 activity leads to the accumulation of DNA double-strand breaks in both the presence and absence of replicative stress67, 68.

The profound effect on cellular viability observed upon loss of ATR or Chk1 raises the question of what the essential function of the replication checkpoint is. Studies using different Mec1 mutants have shown that the roles of Mec1 in regulating late origin firing and cell cycle arrest are genetically separable from its role in preventing fork collapse. More importantly, its functions in late origin firing and cell cycle arrest are of limited importance for maintaining viability when cells are exposed to the alkylating agent methyl methanesulfonate (MMS)69. These observations indicate that the essential function of this checkpoint cascade in response to replication stress is to stabilize stalled replication forks.

1.4.4 Stabilization of DNA replication forks by the replication checkpoint

Although the formation of breaks during S-phase certainly suggests that ATR is needed to stabilize stalled replication forks, more direct evidence that the ATR pathway prevents replication fork collapse comes from a number of studies in yeast. In S. cerevisiae, density transfer experiments have shown that replication forks break down in the absence of Mec1 and Rad53 following MMS treatment70. Similar conclusions were drawn from a two-dimensional gel study in which Rad53 mutants treated with hydroxyurea (HU) accumulated unusual DNA structures at replication forks and were unable to complete DNA replication71. Experiments in which Rad53 expression was controlled by an inducible system tied these defects to irreversible replication fork collapse rather than to a defect in fork recovery69. The DNA structures that accumulate in Rad53-deficient cells have also been observed by electron microscopy; among them are long stretches of ssDNA, hemireplicated molecules, and reversed replication forks, the so-called “chicken-foot” structures49.

Increasing evidence suggests that the ATR pathway may also stabilize the replisome at stalled forks in higher eukaryotes. Studies using caffeine, an inhibitor of ATR and the related kinase ATM, as well as 2-aminopurine, a protein kinase inhibitor that blocks checkpoint activation, showed that early replicons fail to complete DNA replication in the presence of aphidicolin and that PCNA (proliferating cell nuclear antigen) and RPA (replication protein A) redistribute from early- to late-replicating regions. MCM2, a subunit of the replicative helicase MCM2-7, was also released from chromatin under these conditions72. These observations are consistent with the idea that, in the absence of the replication checkpoint, components of the replisome are lost, causing replication fork collapse.

Chk1-deficient DT40 cells also exhibit a loss of PCNA staining at stalled forks and are unable to recover after release from a replication fork block45, 66. In addition, the Chk1 inhibitor UCN-01 prevented the resumption of DNA replication in early-replicating domains after release from replication arrest73. These observations are consistent with a role for Chk1 in replication fork stabilization and/or recovery, either by preventing the dissociation of replisome components from stalled forks or by reloading them. Chk1 was also found to be required for replication fork progression in unstressed cells, and DNA fiber labeling showed that Chk1-deficient cells have higher levels of nascent ssDNA74. These results may indicate that these cells accumulate abnormal replication fork structures, as observed upon loss of Rad53 in yeast49.

Although Chk1 plays a critical role in stabilizing stalled replication forks, it may not be needed for all of ATR’s functions in this process. Mec1 mutants are more sensitive to MMS than Rad53 mutants, and deletion of Mec1 resulted in an approximately 200-fold increase in gross chromosomal rearrangements, compared with an approximately 30-fold increase upon Rad53 deletion75. In addition, ChIP analyses have shown that Mec1 activity, but not Rad53 activity, is essential to maintain Pol α and Pol ε at stalled forks46, 76, 77, although one study observed a reduction upon Rad53 loss44. In contrast, the stability of the MCM helicase at stalled forks was compromised in Rad53 mutants46.

1.4.5 Possible targets of the S-phase checkpoint at DNA replication forks

A variety of genetic and biochemical data from both yeast and higher eukaryotes suggest that the ATR pathway regulates several processes that contribute to replication fork stability. Based on the phenotypes and structures that form in the absence of ATR signaling, and on known substrates of ATR, it has been proposed that the ATR pathway is required to stabilize the replisome at stalled forks, to inhibit the formation of reversed forks, to prevent cleavage of stalled and reversed forks, and to prevent the formation of excess ssDNA, all of which may suppress recombination46, 47, 49, 78. Although the details of how ATR regulates these processes are still emerging, its substrates include enzymes that carry out replication and that mediate recombination or fork cleavage. Altogether, the S-phase checkpoint is crucial for maintaining genomic stability, responding to replication fork stalling and DNA damage to prevent the collapse of stalled replication forks and the subsequent formation of double-strand breaks.

Polymerase Association

One function of the ATR pathway is to stabilize the association of replisome components, including Pol α and Pol ε, with stalled forks (Fig. 1.5A)46, 47. Whether the replisome itself is directly targeted, or whether this is an indirect effect of suppressing other processes, is unknown. Currently, there is no evidence that Pol ε is directly regulated by ATR or Chk1. However, Pol ε does interact with several components of the checkpoint pathway, including TopBP1/Cut5/Dpb11 and Rad1776, 79. This raises the possibility that ATR-dependent modification of one of these proteins could indirectly stabilize Pol ε at the fork. Interestingly, Rad17 is phosphorylated by ATR on two sites following DNA damage and is also found at sites of DNA replication79-81. In addition, Pol ε may preferentially interact with the phosphorylated form of Rad1779.

There is also no evidence that Pol α is a direct target of Mec1. However, in yeast, Rad53 was shown to regulate Pol α phosphorylation indirectly82. Although the functional relevance of this event is unclear, the activity and presence of Pol α at stalled replication forks appear to be necessary for their stabilization, as reduced levels of Pol α result in a large increase in chromosomal rearrangements83.

Recombination inhibition

Other observations suggest that the ATR pathway may also stabilize stalled replication forks by suppressing recombination during S-phase (Fig. 1.5C). In S. pombe, recombination and replication are temporally separated, and this separation is altered when the S-phase checkpoint is disrupted by deletion of Cds1. Cds1-deficient cells treated with HU formed foci containing the Rad52 homolog Rad22 prematurely and with greater frequency. In addition, Rhp51, the S. pombe homolog of Rad51, is not normally required for recovery from HU arrest. Importantly, however, after release from HU, deletion of Rhp51 suppresses the aberrant replication fork structures that form in the absence of Cds1 and promotes the progression of cells through S-phase84. These observations suggest that the replication checkpoint may delay recombination and that, in its absence, recombination intermediates may form inappropriately. Similarly, in S. cerevisiae, few Rad52 foci form in cells exposed to HU. However, in checkpoint-deficient mec1 or rad53 cells, Rad52 foci rapidly accumulated in response to HU treatment, suggesting that the checkpoint inhibits homologous recombination during replicative stress85. Consistent with the idea that alternative and potentially dangerous DNA structures form at stalled replication forks when the S-phase checkpoint is lost, in S. pombe the DNA damage checkpoint is activated upon HU treatment when Cds1 is absent86.


Figure 1.5. Potential targets of ATR and Chk1 necessary for replication fork stabilization. (A) ATR stabilizes the association of polymerases with stalled forks to facilitate replication progression. This may occur through phosphorylation of the polymerases themselves, RPA, and/or other proteins. At least in S. cerevisiae, this function appears to be independent of Chk1. (B) ATR modulates RecQ helicase activity, possibly via phosphorylation. Because RecQ helicases prevent the formation of aberrant replication fork structures and recombination intermediates, this regulation promotes a stable fork structure and suppresses homologous recombination. (C) In S. pombe, Cds1 inhibits homologous recombination and double-strand break formation by phosphorylating the endonuclease Mus81 and causing its dissociation from chromatin. Several other possible targets of the ATR pathway relevant to recombination are discussed in the text. (D) ATR and Chk1 may keep the replisome in a stable conformation by limiting MCM-mediated unwinding at stalled forks. This prevents extensive ssDNA accumulation, which can cause the fork to fold into alternative structures.


Mus81

ATR may have a number of targets directly relevant to the inhibition of homologous recombination at stalled replication forks84, 85. One recombination protein that appears to be negatively regulated in a checkpoint-dependent manner is Mus81, a structure-specific endonuclease that forms a complex with Eme1 to cleave structures that form at stalled replication forks87. In S. pombe, Mus81 is phosphorylated by the Cds1 kinase after DNA damage78, and this phosphorylation promotes its dissociation from chromatin. Thus, Cds1 phosphorylation of Mus81 may inhibit the initiation of recombination at stalled or reversed forks by preventing fork cleavage (Fig. 1.5C). In mammalian cells, Chk2, an effector of ATM and ATR60, has been shown to interact with Mus81, although the relevance of this interaction is not yet known88.

Exo1 processing of stalled replication forks

Another key target of the replication checkpoint at sites of stalled DNA replication is the Exo1 nuclease. In yeast, the fork collapse seen in rad53 cells exposed to MMS can be alleviated by deletion of Exo1, which rescues both DNA replication and cell viability89. Exo1 was also shown to be needed for fork collapse in rad53 cells treated with the replication fork stalling agent HU; however, in this case, deletion of Exo1 did not restore replication or cell viability89, 90. These results suggest that Rad53 has both Exo1-dependent and Exo1-independent roles at HU-stalled replication forks, whereas at forks challenged by DNA damage (MMS treatment), Rad53 functions mainly through Exo1. Although the role of Exo1 at stalled replication forks is still under investigation, an attractive hypothesis is that, upon replication fork stalling, the replication checkpoint regulates Exo1 to prevent it from processing stalled replication fork structures into potentially harmful double-strand breaks. Along these lines, Exo1 has been shown to be phosphorylated by Rad53 when cells are treated with MMS91.


RecQ helicases

The RecQ family of helicases is another group of proteins required to stabilize replication forks. S. cerevisiae has only one family member, Sgs1, whereas human cells contain five RecQ homologs, including BLM and WRN92. Sgs1 and BLM colocalize with replication forks during normal, unperturbed replication and in response to replicative stress47, 92, and WRN is recruited to replication foci after HU treatment92. RecQ family members exhibit 3′-to-5′ helicase activity and are thought to act at stalled replication forks to keep aberrant structures from forming. This activity would promote polymerase association and inhibit recombination by clearing structures that would otherwise be preferred substrates for nucleases that cleave Holliday junctions and similar structures at stalled forks92, 93.

Consistent with these ideas, loss of Sgs1 leads to increased gross chromosomal rearrangements, presumably as a result of elevated homologous and illegitimate recombination94. BLM- and WRN-deficient cells also exhibit elevated chromosomal instability, increased sister-chromatid exchange, and aberrant replication intermediates92, 95. Moreover, the stabilization of Pol ε and Pol α at stalled replication forks depends specifically on the helicase activity of Sgs196. In addition, combined deletion of Sgs1 and Srs2, another helicase known to prevent homologous recombination, is synthetically lethal, and this lethality can be rescued by deletion of Rad5197. Similarly, the decreased survival of WRN-deficient cells is suppressed by loss of Rad51-mediated homologous recombination, suggesting that loss of WRN leads to deleterious homologous recombination98.

In higher eukaryotes, the ATR and RecQ helicase pathways are intricately intertwined. One study indicates that ATR and Chk1 may regulate the phosphorylation and stabilization of BLM, as well as its localization to stalled replication forks99. In addition, ATR interacts with BLM and phosphorylates it in response to replication fork stalling100, 101. Studies have also shown that ATR colocalizes with and phosphorylates WRN after replication fork collapse; however, it is not clear whether WRN activity is regulated by ATR95. Together, these observations could indicate that ATR suppresses recombination by promoting the ability of BLM and WRN to resolve “chicken-foot” structures and other fork-blocking structures, ultimately preventing cleavage of the fork and recombination (Fig. 1.5B).

Other mechanisms of damage prevention during DNA replication

The proteins and processes described above are by no means an exhaustive list of the mechanisms cells employ to maintain genome stability during DNA replication. Several additional mechanisms, which may be checkpoint-regulated or checkpoint-independent, have been shown to affect DNA damage accumulation. These include limiting ssDNA accumulation49, either by inhibiting the replicative helicase (Fig. 1.5D) or by forming a stable replisome pausing complex containing the Timeless- and Tipin-related proteins102. In addition, numerous DNA repair mechanisms respond to DNA damage throughout the cell cycle, and mutations in many of them cause a predisposition to cancer1, 6. Clearly, cells go to extraordinary lengths to maintain genomic integrity, and although several mechanisms of DNA damage prevention have been discovered, we have yet to gain an overall picture of all the proteins and processes that may be involved.

1.5 Aims and contributions

Although many proteins have been shown to maintain genomic stability both during and outside of DNA replication, they do not fully explain the extensive genomic instability found in cancer cells, particularly in sporadic cancers. This suggests that there could be many more factors involved in preventing DNA damage that have yet to be discovered. Additionally, although a clear link between replication-induced DNA damage and cancer is evident, our understanding of how cells maintain stalled replication forks and replication progression is still very limited. Unbiased approaches to identify all genes involved in preventing DNA damage could provide great insight into how genomic instability arises in cancer cells.

Here I explore the range of proteins involved in maintaining genomic stability. In the second chapter, I use a pair of genome-wide siRNA screens to search for novel proteins that, when downregulated, induce DNA damage. Building on the results of this screening effort, the third and fourth chapters investigate how two functional groups enriched among the screening hits, mRNA processing genes and genes involved in the pathology of Charcot-Marie-Tooth disease, prevent the accumulation of DNA damage. Finally, the fifth chapter examines Set8, a protein whose loss induced high levels of DNA damage in our screening experiments. This thesis work demonstrates that diverse mechanisms with little previous connection to the DNA damage response are needed to maintain genome stability, and determining how these processes are affected during cancer progression will be of great interest for future study.


REFERENCES

1. Negrini S, Gorgoulis VG, Halazonetis TD. Genomic instability--an evolving hallmark of cancer. Nat Rev Mol Cell Biol; 11:220-8.

2. Lengauer C, Kinzler KW, Vogelstein B. Genetic instability in colorectal cancers. Nature 1997; 386:623-7.

3. Draviam VM, Xie S, Sorger PK. Chromosome segregation and genomic stability. Curr Opin Genet Dev 2004; 14:120-5.

4. Fishel R, Lescoe MK, Rao MR, Copeland NG, Jenkins NA, Garber J, Kane M, Kolodner R. The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 1993; 75:1027-38.

5. Tsai AG, Lieber MR. Mechanisms of chromosomal rearrangement in the . BMC genomics; 11 Suppl 1:S1.

6. Aguilera A, Gomez-Gonzalez B. Genome instability: a mechanistic view of its causes and consequences. Nature reviews 2008; 9:204-17.

7. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000; 100:57-70.

8. Loeb LA. A mutator phenotype in cancer. Cancer research 2001; 61:3230-9.

9. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JK, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE. The consensus coding sequences of human breast and colorectal cancers. Science 2006; 314:268-74.

10. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson PA, Kaminker JS, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson JK, Sukumar S, Polyak K, Park BH, Pethiyagoda CL, Pant PV, Ballinger DG, Sparks AB, Hartigan J, Smith DR, Suh E, Papadopoulos N, Buckhaults P, Markowitz SD, Parmigiani G, Kinzler KW, Velculescu VE, Vogelstein B. The genomic landscapes of human breast and colorectal cancers. Science 2007; 318:1108-13.

11. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Kamiyama H, Jimeno A, Hong SM, Fu B, Lin MT, Calhoun ES, Kamiyama M, Walter K, Nikolskaya T, Nikolsky Y, Hartigan J, Smith DR, Hidalgo M, Leach SD, Klein AP, Jaffee EM, Goggins M, Maitra A, Iacobuzio-Donahue C, Eshleman JR, Kern SE, Hruban RH, Karchin R, Papadopoulos N, Parmigiani G, Vogelstein B, Velculescu VE, Kinzler KW. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 2008; 321:1801-6.

12. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu IM, Gallia GL, Olivi A, McLendon R, Rasheed BA, Keir S, Nikolskaya T, Nikolsky Y, Busam DA, Tekleab H, Diaz LA, Jr., Hartigan J, Smith DR, Strausberg RL, Marie SK, Shinjo SM, Yan H, Riggins GJ, Bigner DD, Karchin R, Papadopoulos N, Parmigiani G, Vogelstein B, Velculescu VE, Kinzler KW. An integrated genomic analysis of human glioblastoma multiforme. Science 2008; 321:1807-12.

13. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, Metcalf GA, Ng B, Milosavljevic A, Gonzalez-Garay ML, Osborne JR, Meyer R, Shi X, Tang Y, Koboldt DC, Lin L, Abbott R, Miner TL, Pohl C, Fewell G, Haipek C, Schmidt H, Dunford-Shore BH, Kraja A, Crosby SD, Sawyer CS, Vickery T, Sander S, Robinson J, Winckler W, Baldwin J, Chirieac LR, Dutt A, Fennell T, Hanna M, Johnson BE, Onofrio RC, Thomas RK, Tonon G, Weir BA, Zhao X, Ziaugra L, Zody MC, Giordano T, Orringer MB, Roth JA, Spitz MR, Wistuba II, Ozenberger B, Good PJ, Chang AC, Beer DG, Watson MA, Ladanyi M, Broderick S, Yoshizawa A, Travis WD, Pao W, Province MA, Weinstock GM, Varmus HE, Gabriel SB, Lander ES, Gibbs RA, Meyerson M, Wilson RK. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008; 455:1069-75.

14. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008; 455:1061-8.

15. Gorgoulis VG, Vassiliou LV, Karakaidos P, Zacharatos P, Kotsinas A, Liloglou T, Venere M, Ditullio RA, Jr., Kastrinakis NG, Levy B, Kletsas D, Yoneta A, Herlyn M, Kittas C, Halazonetis TD. Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 2005; 434:907-13.

16. Bartkova J, Rezaei N, Liontos M, Karakaidos P, Kletsas D, Issaeva N, Vassiliou LV, Kolettas E, Niforou K, Zoumpourlis VC, Takaoka M, Nakagawa H, Tort F, Fugger K, Johansson F, Sehested M, Andersen CL, Dyrskjot L, Orntoft T, Lukas J, Kittas C, Helleday T, Halazonetis TD, Bartek J, Gorgoulis VG. Oncogene-induced senescence is part of the tumorigenesis barrier imposed by DNA damage checkpoints. Nature 2006; 444:633-7.

17. Bartkova J, Horejsi Z, Koed K, Kramer A, Tort F, Zieger K, Guldberg P, Sehested M, Nesland JM, Lukas C, Orntoft T, Lukas J, Bartek J. DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature 2005; 434:864-70.

18. Di Micco R, Fumagalli M, Cicalese A, Piccinin S, Gasparini P, Luise C, Schurra C, Garre M, Nuciforo PG, Bensimon A, Maestro R, Pelicci PG, d'Adda di Fagagna F. Oncogene-induced senescence is a DNA damage response triggered by DNA hyper-replication. Nature 2006; 444:638-42.

19. O'Sullivan RJ, Karlseder J. Telomeres: protecting chromosomes against genome instability. Nat Rev Mol Cell Biol 2010; 11:171-81.

20. Kim HS, Patel K, Muldoon-Jacobs K, Bisht KS, Aykin-Burns N, Pennington JD, van der Meer R, Nguyen P, Savage J, Owens KM, Vassilopoulos A, Ozden O, Park SH, Singh KK, Abdulkadir SA, Spitz DR, Deng CX, Gius D. SIRT3 is a mitochondria-localized tumor suppressor required for maintenance of mitochondrial integrity and metabolism during stress. Cancer Cell 2010; 17:41-52.

21. Cortes-Ledesma F, Aguilera A. Double-strand breaks arising by replication through a nick are repaired by cohesin-dependent sister-chromatid exchange. EMBO Rep 2006; 7:919-26.

22. Lopes M, Foiani M, Sogo JM. Multiple mechanisms control chromosome integrity after replication fork uncoupling and restart at irreparable UV lesions. Mol Cell 2006; 21:15-27.

23. Heller RC, Marians KJ. Replication fork reactivation downstream of a blocked nascent leading strand. Nature 2006; 439:557-62.

24. Lenzmeier BA, Freudenreich CH. Trinucleotide repeat instability: a hairpin curve at the crossroads of replication, recombination, and repair. Cytogenet Genome Res 2003; 100:7-24.

25. Glover TW, Arlt MF, Casper AM, Durkin SG. Mechanisms of common fragile site instability. Hum Mol Genet 2005; 14 Spec No. 2:R197-205.

26. Arlt MF, Durkin SG, Ragland RL, Glover TW. Common fragile sites as targets for chromosome rearrangements. DNA Repair (Amst) 2006; 5:1126-35.

27. Aguilera A. The connection between transcription and genomic instability. EMBO J 2002; 21:195-201.

28. Kim N, Abdulovic AL, Gealy R, Lippert MJ, Jinks-Robertson S. Transcription-associated mutagenesis in yeast is directly proportional to the level of gene expression and influenced by the direction of DNA replication. DNA Repair (Amst) 2007; 6:1285-96.

29. Prado F, Aguilera A. Impairment of replication fork progression mediates RNA polII transcription-associated recombination. EMBO J 2005; 24:1267-76.


30. Li X, Manley JL. Inactivation of the SR protein splicing factor ASF/SF2 results in genomic instability. Cell 2005; 122:365-78.

31. Huertas P, Aguilera A. Cotranscriptionally formed DNA:RNA hybrids mediate transcription elongation impairment and transcription-associated recombination. Mol Cell 2003; 12:711-21.

32. Margueron R, Reinberg D. Chromatin structure and the inheritance of epigenetic information. Nat Rev Genet 2010; 11:285-96.

33. Ye X, Franco AA, Santos H, Nelson DM, Kaufman PD, Adams PD. Defective S phase chromatin assembly causes DNA damage, activation of the S phase checkpoint, and S phase arrest. Mol Cell 2003; 11:341-51.

34. Ramey CJ, Howar S, Adkins M, Linger J, Spicer J, Tyler JK. Activation of the DNA damage checkpoint in yeast lacking the histone chaperone anti-silencing function 1. Mol Cell Biol 2004; 24:10313-27.

35. Tyler JK, Adams CR, Chen SR, Kobayashi R, Kamakaka RT, Kadonaga JT. The RCAF complex mediates chromatin assembly during DNA replication and repair. Nature 1999; 402:555-60.

36. van Attikum H, Fritsch O, Gasser SM. Distinct roles for SWR1 and INO80 chromatin remodeling complexes at chromosomal double-strand breaks. EMBO J 2007; 26:4113-25.

37. Papamichos-Chronakis M, Peterson CL. The Ino80 chromatin-remodeling enzyme regulates replisome function and stability. Nat Struct Mol Biol 2008; 15:338-45.

38. Tourriere H, Pasero P. Maintenance of fork integrity at damaged DNA and natural pause sites. DNA Repair (Amst) 2007; 6:900-13.

39. Paulsen RD, Cimprich KA. The ATR pathway: fine-tuning the fork. DNA Repair (Amst) 2007; 6:953-66.

40. McGowan CH. Running into problems: how cells cope with replicating damaged DNA. Mutat Res 2003; 532:75-84.

41. Friedberg EC. Suffering in silence: the tolerance of DNA damage. Nat Rev Mol Cell Biol 2005; 6:943-53.

42. Branzei D, Foiani M. The DNA damage response during DNA replication. Curr Opin Cell Biol 2005; 17:568-75.


43. Lambert S, Carr AM. Checkpoint responses to replication fork barriers. Biochimie 2005; 87:591-602.

44. Lucca C, Vanoli F, Cotta-Ramusino C, Pellicioli A, Liberi G, Haber J, Foiani M. Checkpoint-mediated control of replisome-fork association and signalling in response to replication pausing. Oncogene 2004; 23:1206-13.

45. Zachos G, Rainey MD, Gillespie DA. Chk1-dependent S-M checkpoint delay in vertebrate cells is linked to maintenance of viable replication structures. Mol Cell Biol 2005; 25:563-74.

46. Cobb JA, Schleker T, Rojas V, Bjergbaek L, Tercero JA, Gasser SM. Replisome instability, fork collapse, and gross chromosomal rearrangements arise synergistically from Mec1 kinase and RecQ helicase mutations. Genes Dev 2005; 19:3055-69.

47. Cobb JA, Bjergbaek L, Shimada K, Frei C, Gasser SM. DNA polymerase stabilization at stalled replication forks requires Mec1 and the RecQ helicase Sgs1. EMBO J 2003; 22:4325-36.

48. Carr AM. Checking that replication breakdown is not terminal. Science 2002; 297:557-8.

49. Sogo JM, Lopes M, Foiani M. Fork reversal and ssDNA accumulation at stalled replication forks owing to checkpoint defects. Science 2002; 297:599-602.

50. Cha RS, Kleckner N. ATR homolog Mec1 promotes fork progression, thus averting breaks in replication slow zones. Science 2002; 297:602-6.

51. Kolodner RD, Putnam CD, Myung K. Maintenance of genome stability in Saccharomyces cerevisiae. Science 2002; 297:552-7.

52. Rothstein R, Michel B, Gangloff S. Replication fork pausing and recombination or "gimme a break". Genes Dev 2000; 14:1-10.

53. Hyrien O. Mechanisms and consequences of replication fork arrest. Biochimie 2000; 82:5-17.

54. Cimprich KA, Cortez D. ATR: an essential regulator of genome integrity. Nat Rev Mol Cell Biol 2008; 9:616-27.

55. Branzei D, Foiani M. Interplay of replication checkpoints and repair proteins at stalled replication forks. DNA Repair (Amst) 2007; 6:994-1003.


56. Harper JW, Elledge SJ. The DNA damage response: ten years after. Mol Cell 2007; 28:739-45.

57. Zhou BB, Elledge SJ. The DNA damage response: putting checkpoints in perspective. Nature 2000; 408:433-9.

58. Melo J, Toczyski D. A unified view of the DNA-damage checkpoint. Curr Opin Cell Biol 2002; 14:237-45.

59. Osborn AJ, Elledge SJ, Zou L. Checking on the fork: the DNA-replication stress-response pathway. Trends Cell Biol 2002; 12:509-16.

60. Nyberg KA, Michelson RJ, Putnam CW, Weinert TA. Toward maintaining the genome: DNA damage and replication checkpoints. Annu Rev Genet 2002; 36:617-56.

61. Brown EJ, Baltimore D. ATR disruption leads to chromosomal fragmentation and early embryonic lethality. Genes Dev 2000; 14:397-402.

62. Cortez D, Guntuku S, Qin J, Elledge SJ. ATR and ATRIP: partners in checkpoint signaling. Science 2001; 294:1713-6.

63. Brown EJ, Baltimore D. Essential and dispensable roles of ATR in cell cycle arrest and genome maintenance. Genes Dev 2003; 17:615-28.

64. Liu Q, Guntuku S, Cui XS, Matsuoka S, Cortez D, Tamai K, Luo G, Carattini-Rivera S, DeMayo F, Bradley A, Donehower LA, Elledge SJ. Chk1 is an essential kinase that is regulated by Atr and required for the G(2)/M DNA damage checkpoint. Genes Dev 2000; 14:1448-59.

65. Takai H, Tominaga K, Motoyama N, Minamishima YA, Nagahama H, Tsukiyama T, Ikeda K, Nakayama K, Nakanishi M, Nakayama K. Aberrant cell cycle checkpoint function and early embryonic death in Chk1(-/-) mice. Genes Dev 2000; 14:1439-47.

66. Zachos G, Rainey MD, Gillespie DA. Chk1-deficient tumour cells are viable but exhibit multiple checkpoint and survival defects. EMBO J 2003; 22:713-23.

67. Nghiem P, Park PK, Kim Y, Vaziri C, Schreiber SL. ATR inhibition selectively sensitizes G1 checkpoint-deficient cells to lethal premature chromatin condensation. Proc Natl Acad Sci U S A 2001; 98:9092-7.

68. Durkin SG, Arlt MF, Howlett NG, Glover TW. Depletion of CHK1, but not CHK2, induces chromosomal instability and breaks at common fragile sites. Oncogene 2006; 25:4381-8.


69. Tercero JA, Longhese MP, Diffley JF. A central role for DNA replication forks in checkpoint activation and response. Mol Cell 2003; 11:1323-36.

70. Tercero JA, Diffley JF. Regulation of DNA replication fork progression through damaged DNA by the Mec1/Rad53 checkpoint. Nature 2001; 412:553-7.

71. Lopes M, Cotta-Ramusino C, Pellicioli A, Liberi G, Plevani P, Muzi-Falconi M, Newlon CS, Foiani M. The DNA replication checkpoint response stabilizes stalled replication forks. Nature 2001; 412:557-61.

72. Dimitrova DS, Gilbert DM. Temporally coordinated assembly and disassembly of replication factories in the absence of DNA synthesis. Nat Cell Biol 2000; 2:686-94.

73. Feijoo C, Hall-Jackson C, Wu R, Jenkins D, Leitch J, Gilbert DM, Smythe C. Activation of mammalian Chk1 during DNA replication arrest: a role for Chk1 in the intra-S phase checkpoint monitoring replication origin firing. J Cell Biol 2001; 154:913-23.

74. Petermann E, Maya-Mendoza A, Zachos G, Gillespie DA, Jackson DA, Caldecott KW. Chk1 requirement for high global rates of replication fork progression during normal vertebrate S phase. Mol Cell Biol 2006; 26:3319-26.

75. Myung K, Chen C, Kolodner RD. Multiple pathways cooperate in the suppression of genome instability in Saccharomyces cerevisiae. Nature 2001; 411:1073-6.

76. Masumoto H, Sugino A, Araki H. Dpb11 controls the association between DNA polymerases alpha and epsilon and the autonomously replicating sequence region of budding yeast. Mol Cell Biol 2000; 20:2809-17.

77. Aparicio OM, Stout AM, Bell SP. Differential assembly of Cdc45p and DNA polymerases at early and late origins of DNA replication. Proc Natl Acad Sci U S A 1999; 96:9130-5.

78. Kai M, Boddy MN, Russell P, Wang TS. Replication checkpoint kinase Cds1 regulates Mus81 to preserve genome integrity during replication stress. Genes Dev 2005; 19:919-32.

79. Post SM, Tomkinson AE, Lee EY. The human checkpoint Rad protein Rad17 is chromatin-associated throughout the cell cycle, localizes to DNA replication sites, and interacts with DNA polymerase epsilon. Nucleic Acids Res 2003; 31:5568-75.

80. Bao S, Tibbetts RS, Brumbaugh KM, Fang Y, Richardson DA, Ali A, Chen SM, Abraham RT, Wang XF. ATR/ATM-mediated phosphorylation of human Rad17 is required for genotoxic stress responses. Nature 2001; 411:969-74.


81. Post S, Weng YC, Cimprich K, Chen LB, Xu Y, Lee EY. Phosphorylation of serines 635 and 645 of human Rad17 is cell cycle regulated and is required for G(1)/S checkpoint activation in response to DNA damage. Proc Natl Acad Sci U S A 2001; 98:13102-7.

82. Pellicioli A, Lucca C, Liberi G, Marini F, Lopes M, Plevani P, Romano A, Di Fiore PP, Foiani M. Activation of Rad53 kinase in response to DNA damage and its effect in modulating phosphorylation of the lagging strand DNA polymerase. EMBO J 1999; 18:6561-72.

83. Lemoine FJ, Degtyareva NP, Lobachev K, Petes TD. Chromosomal translocations in yeast induced by low levels of DNA polymerase: a model for chromosome fragile sites. Cell 2005; 120:587-98.

84. Meister P, Taddei A, Vernis L, Poidevin M, Gasser SM, Baldacci G. Temporal separation of replication and recombination requires the intra-S checkpoint. J Cell Biol 2005; 168:537-44.

85. Lisby M, Barlow JH, Burgess RC, Rothstein R. Choreography of the DNA damage response: spatiotemporal relationships among checkpoint and repair proteins. Cell 2004; 118:699-713.

86. Lindsay HD, Griffiths DJ, Edwards RJ, Christensen PU, Murray JM, Osman F, Walworth N, Carr AM. S-phase-specific activation of Cds1 kinase defines a subpathway of the checkpoint response in Schizosaccharomyces pombe. Genes Dev 1998; 12:382-95.

87. West SC. Molecular views of recombination proteins and their control. Nat Rev Mol Cell Biol 2003; 4:435-45.

88. Chen XB, Melchionna R, Denis CM, Gaillard PH, Blasina A, Van de Weyer I, Boddy MN, Russell P, Vialard J, McGowan CH. Human Mus81-associated endonuclease cleaves Holliday junctions in vitro. Mol Cell 2001; 8:1117-27.

89. Segurado M, Diffley JF. Separate roles for the DNA damage checkpoint protein kinases in stabilizing DNA replication forks. Genes Dev 2008; 22:1816-27.

90. Cotta-Ramusino C, Fachinetti D, Lucca C, Doksani Y, Lopes M, Sogo J, Foiani M. Exo1 processes stalled replication forks and counteracts fork reversal in checkpoint-defective cells. Mol Cell 2005; 17:153-9.

91. Smolka MB, Albuquerque CP, Chen SH, Zhou H. Proteome-wide identification of in vivo targets of DNA damage checkpoint kinases. Proc Natl Acad Sci U S A 2007; 104:10364-9.


92. Khakhar RR, Cobb JA, Bjergbaek L, Hickson ID, Gasser SM. RecQ helicases: multiple roles in genome maintenance. Trends Cell Biol 2003; 13:493-501.

93. Bachrati CZ, Borts RH, Hickson ID. Mobile D-loops are a preferred substrate for the Bloom's syndrome helicase. Nucleic Acids Res 2006; 34:2269-79.

94. Myung K, Datta A, Chen C, Kolodner RD. SGS1, the Saccharomyces cerevisiae homologue of BLM and WRN, suppresses genome instability and homeologous recombination. Nat Genet 2001; 27:113-6.

95. Pichierri P, Franchitto A. Werner syndrome protein, the MRE11 complex and ATR: menage-a-trois in guarding genome stability during DNA replication? Bioessays 2004; 26:306-13.

96. Bjergbaek L, Cobb JA, Tsai-Pflugfelder M, Gasser SM. Mechanistically distinct roles for Sgs1p in checkpoint activation and replication fork maintenance. EMBO J 2005; 24:405-17.

97. Gangloff S, Soustelle C, Fabre F. Homologous recombination is responsible for cell death in the absence of the Sgs1 and Srs2 helicases. Nat Genet 2000; 25:192-4.

98. Saintigny Y, Makienko K, Swanson C, Emond MJ, Monnat RJ, Jr. Homologous recombination resolution defect in Werner syndrome. Mol Cell Biol 2002; 22:6971-8.

99. Sengupta S, Robles AI, Linke SP, Sinogeeva NI, Zhang R, Pedeux R, Ward IM, Celeste A, Nussenzweig A, Chen J, Halazonetis TD, Harris CC. Functional interaction between BLM helicase and 53BP1 in a Chk1-mediated pathway during S-phase arrest. J Cell Biol 2004; 166:801-13.

100. Li W, Kim SM, Lee J, Dunphy WG. Absence of BLM leads to accumulation of chromosomal DNA breaks during both unperturbed and disrupted S phases. J Cell Biol 2004; 165:801-12.

101. Davies SL, North PS, Dart A, Lakin ND, Hickson ID. Phosphorylation of the Bloom's syndrome helicase and its role in recovery from S-phase arrest. Mol Cell Biol 2004; 24:1279-91.

102. Katou Y, Kanoh Y, Bando M, Noguchi H, Tanaka H, Ashikari T, Sugimoto K, Shirahige K. S-phase checkpoint proteins Tof1 and Mrc1 form a stable replication-pausing complex. Nature 2003; 424:1078-83.


CHAPTER 2

siRNA Screens Discover Novel Regulators of Genomic Integrity and Replication Fork Stability

Parts of this chapter have been adapted with permission from:

“A Genome-wide siRNA Screen Reveals Diverse Cellular Processes and Pathways that Mediate Genome Stability” Paulsen RD, Soni DV, Wollman R, Hahn AH, Yee MC, Guan A, Hesley JA, Miller SC, Cromwell EF, Solow-Cordero DE, Meyer T, and Cimprich KA. Molecular Cell, 2009

Contributions

In this chapter, Renee Paulsen and Deena Soni performed the genome-wide siRNA screen and deconvolution experiments. Roy Wollman developed and implemented the statistical analysis method, and Renee Paulsen performed the bioinformatic analysis, the secondary screening experiments, and cell cycle analysis.

2.1 Introduction

2.1.1 Rationale for a genome-wide siRNA screen for proteins that prevent DNA damage

One of the hallmark phenotypes of cancer cells is genomic instability, but how this instability arises remains unknown in many cases1. Genome instability can take the form of a variety of genetic alterations that range in complexity from point mutations to the loss or gain of whole chromosomes or gross chromosomal rearrangements (GCRs). Some of these alterations are a direct result of DNA damage and a failure to repair that damage in an error-free manner. Indeed, translocations, deletions, inversions, and duplications are all forms of GCRs that arise from DNA double-strand breaks (DSBs)2. Genome instability can also arise from mutations in tumor suppressors or oncogenes that cause a loss of cell cycle control and thus increased cellular stress and DNA damage3. The ability of cells to maintain genome stability is critical for homeostasis, and defects in the maintenance of genome stability underlie a number of developmental disorders and human diseases, including cancer and premature aging4-7. A comprehensive analysis of which genes induce DNA damage when lost is therefore critical to understanding how the cell prevents DNA damage and, in turn, oncogenic transformation.

A particularly challenging period of the cell cycle for maintaining genome stability is S-phase, when DNA is unwound and vulnerable to processing8. Replication forks encounter many lesions during a typical S-phase, and each stalled fork must be dealt with properly to prevent DNA damage9, 10. A critical mechanism that cells have evolved for maintaining genome stability during S-phase is the stabilization of stalled replication forks, which allows DNA synthesis to resume properly once the offending lesion has been removed. The DNA damage checkpoint facilitates replication fork stabilization, induces cell cycle arrest, and coordinates different repair processes with each other and with cell cycle progression9-12. Key components of this checkpoint are two kinases, ATR (ATM and Rad3-related) and its downstream effector Chk1 13. Although much work has been done to characterize the events that lead to checkpoint-mediated cell cycle arrest, little is known about how checkpoint proteins, or independent pathways, mediate stabilization of the replication fork13-16.

2.1.2 Functional genomic screens

Advances in biology are strongly influenced by the technologies available to researchers. For years, systematic genome-wide screening has been possible in model organisms such as Escherichia coli, Saccharomyces cerevisiae, and Drosophila melanogaster, allowing researchers to assign functions to specific genes. However, only recently, with the advent of siRNA technology, have researchers been able to knock down the expression of individual genes in mammalian systems on a genome-wide scale17. This technology provides a powerful new method for analyzing cell signaling pathways, as well as a means of determining the functional significance of genes related to disease phenotypes such as cancer. siRNA screening libraries generally fall into two categories: DNA-based and RNA-based18. DNA-based libraries use vectors that encode double-stranded RNA molecules, typically short hairpin RNAs (shRNAs). These vectors can be transfected into cells or, alternatively, cloned into viral vectors to infect hard-to-transfect cell populations. RNA-based libraries instead use synthetic double-stranded RNA oligonucleotides that are transfected directly into the cells of interest. Both technologies have limitations, such as inefficient knockdown and off-target effects. Therefore, false negatives and false positives are expected in any siRNA screen, and multiple independent methods of confirming the observed phenotypes are required to be confident in the results19.

2.1.3 A readout for DNA damage: histone H2AX phosphorylation

To determine which proteins are needed to prevent DNA damage, a quantifiable readout of DNA damage is required. Phosphorylation of the histone variant H2AX serves as an early mark of DNA damage20, 21. H2AX is a member of the H2A histone family and accounts for approximately 10-15% of total H2A in mammalian cells20. However, it differs from the other H2A family members in having an extended C-terminal (carboxy-terminal) tail. On this tail, Ser-139 is phosphorylated in response to DNA damage by the phosphatidylinositol 3-kinase-related kinases (PIKKs) ATM (ataxia telangiectasia mutated), ATR, and DNA-PKcs (DNA-dependent protein kinase, catalytic subunit)22-24 (Fig. 2.1A). The phospho-H2AX signal (also known as γH2AX) spreads over a region of at least 50 kb on either side of a DSB in mammalian cells25 (Fig. 2.1B), and it has multiple functions, acting to amplify and maintain the checkpoint signal as well as to recruit downstream repair proteins20, 26. H2AX phosphorylation occurs primarily following DSB formation, although it may also occur during replication stress22. As such, γH2AX is an early marker of genome instability and of the tumorigenic events that occur early in the progression of many cancer types. Indeed, H2AX phosphorylation is frequently observed in premalignant lesions27, 28.

Figure 2.1: H2AX phosphorylation as a readout of DNA damage. (A) When DNA damage occurs, ATM, ATR, and DNA-PK can phosphorylate S139 on the C-terminal tail of histone H2AX. (B) This signal spreads throughout an extended region of DNA on either side of the break, greatly amplifying the signal observed for a single break and serving as a positive feedback mechanism.

2.1.4 Synopsis

Here, we performed two unbiased, genome-wide siRNA screens in human cells, using H2AX phosphorylation as a readout, to obtain a global understanding of the different molecular pathways that prevent genome instability. The first screen examines which genes cause DNA damage when down-regulated under unperturbed conditions, and the second looks for genes whose loss leads to DNA damage specifically during S-phase. Using an approach that integrates bioinformatics and functional analysis of our hits, we uncovered a variety of additional genes and molecular networks that maintain genome stability. Our findings suggest that diverse mechanisms with little previous connection to the DNA damage response are needed to maintain genome stability, including those involving mRNA processing proteins, nuclear pore proteins, pericentric chromatin binding proteins, and proteins implicated in the pathology of Charcot-Marie-Tooth disease.

2.2 Results of the genome-wide siRNA screens

2.2.1 Design and implementation of a genome-wide siRNA screen for novel inducers of H2AX phosphorylation

We designed a pair of genome-wide siRNA screens to look for genes necessary for maintaining genome stability in general (Fig. 2.2A) and for genes specifically necessary for replication fork stability (Fig. 2.2B). To address these two questions, we screened the human genome both in the absence (NT) and in the presence (AP) of a low dose (400 nM) of aphidicolin, an inhibitor of the replicative DNA polymerases (alpha, delta, and epsilon). This dose of aphidicolin slows but does not arrest DNA replication, and it does not cause H2AX phosphorylation when all proteins required for replication fork stabilization are functional. However, when ATR or Chk1 is deficient, this low dose of aphidicolin is known to induce fragile site breakage, suggesting an inability to stabilize stalled replication forks29, 30. Therefore, we reasoned that when a gene needed for fork stabilization was silenced, stalled forks would break down and/or be unable to recover, leading to DSBs and an increase in H2AX phosphorylation (Fig. 2.2B). Moreover, we expected that this increase would be enhanced by aphidicolin. In contrast, we expected that silencing genes that prevent chromosomal instability by other means (e.g., cell cycle control or management of oxidative stress) would increase γH2AX in the non-drug-treated screen (NT), and that this increase would not be enhanced by the drug.

During the screening process, it became apparent that aphidicolin increased the hit rate of the siRNA screen, and that many secondary assays would be necessary to distinguish genes with specific roles in replication fork stability from genes whose knockdown raises the γH2AX signal by enriching the S-phase population. We therefore focused on analyzing and validating the non-drug-treated screen (NT), which is discussed in the following sections. The aphidicolin-treated screen remains a work in progress, and its hits will be discussed only in a general context. Nevertheless, the results from the NT screen alone indicate that preservation of genome stability is mediated by a much larger network of biological processes than previously appreciated.

Figure 2.2: Rationale for the genome-wide siRNA screens. (A) Unbiased DNA damage screening hypothesis (NT): when a gene required for the maintenance of genomic stability is depleted by an siRNA, H2AX phosphorylation (γH2AX) will be observed. (B) Screening hypothesis for genes involved in replication fork stability (AP): when replication is stressed in wild-type cells with a low dose of aphidicolin, an inhibitor of the replicative polymerases, all components necessary to stabilize stalled replication forks are present and forks remain intact (top path). However, when a gene necessary to stabilize a stalled fork is down-regulated by an siRNA, the replication fork collapses into a DSB, resulting in an increased γH2AX signal (bottom path).


2.2.2 Screen for genes involved in maintaining genomic stability (NT dataset)

We carried out our siRNA screen in HeLa cells using the Thermo Fisher siGENOME library. The library targets ~21,000 genes and is arrayed in pools of four individual siRNA duplexes per gene. Cells were stained with antibodies to γH2AX to measure DNA damage and with propidium iodide (PI) to determine cell cycle distribution. Images were collected on a laser scanning fluorimeter, allowing analysis of cell number and quantification of γH2AX intensity and DNA content on a single-cell basis. The screen was performed in duplicate, with a non-targeting siRNA pool as the negative control and an siRNA pool targeting the replication checkpoint kinase Chk1 as the positive control (Fig. 2.3A, B). Pools were used at 25 nM total siRNA concentration to minimize off-target effects.
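As a quick check on what the 25 nM pool concentration means for each duplex, here is a minimal worked example. It assumes that the four duplexes in a pool are mixed in equal amounts; the text gives only the total concentration.

```latex
c_{\text{duplex}} = \frac{c_{\text{pool}}}{n_{\text{duplexes}}}
                  = \frac{25\ \text{nM}}{4}
                  = 6.25\ \text{nM per duplex}
```

Keeping each individual duplex at a fraction of the total is commonly cited as the reason pooling helps limit sequence-specific off-target effects, because those effects depend on the concentration of any single siRNA.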

The data were normalized to account for plate-to-plate and day-to-day variation in the screen. Because γH2AX intensity increases slightly with DNA content31, we normalized the raw γH2AX intensity (Fig. 2.3C) to correct for DNA content on a single-cell basis. First, γH2AX intensities were log-transformed to account for the long tail of the γH2AX distribution. Second, we fitted a regression line to the negative-control cells on each plate to estimate the expected γH2AX signal for each PI intensity. Finally, for each cell, we subtracted the γH2AX value predicted by this linear regression for that cell's PI intensity from the observed γH2AX value. This normalization yields a single value per cell (Fig. 2.3D, y-axis), referred to as the adjusted γH2AX intensity. The percentage of γH2AX-positive cells (γH2AX+) was then calculated for each siRNA pool using an intensity threshold determined from the eight replicates of the negative and positive control cells on each plate (Fig. 2.3D & Table S1). Further details are provided in Appendix A (screening methods). Duplicate measurements showed little variation, with an average correlation coefficient between replicates of 0.66 ± 0.11 (mean ± SD) (Fig. 2.3E). The overall Z' factor calculated for the screen was 0.68 (Fig. 2.3F), suggesting that our assay had a robust signal-to-noise ratio.
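The per-cell normalization described above can be sketched in Python with NumPy. This is a minimal illustration, not the screen's actual analysis code, and it rests on several assumptions. The log base (log10) is not specified in the text. PI intensity enters the regression untransformed. The γH2AX+ cutoff rule (a high percentile of negative-control cells) is a stand-in, because the text says only that the threshold came from the control wells. The function names and the synthetic data are also illustrative.

```python
import numpy as np


def adjust_gh2ax(gh2ax, pi, neg_gh2ax, neg_pi):
    """Per-cell adjusted gamma-H2AX intensity.

    A line is fit to log10(gamma-H2AX) versus PI intensity for the
    negative-control cells of one plate; each cell's adjusted value is its
    observed log10 intensity minus the value that line predicts for its PI.
    """
    slope, intercept = np.polyfit(np.asarray(neg_pi, dtype=float),
                                  np.log10(neg_gh2ax), deg=1)
    expected = slope * np.asarray(pi, dtype=float) + intercept
    return np.log10(gh2ax) - expected


def positive_cutoff(neg_adjusted, percentile=99.0):
    """Hypothetical cutoff rule: a high percentile of negative-control cells."""
    return float(np.percentile(neg_adjusted, percentile))


def percent_positive(adjusted, cutoff):
    """Percentage of cells whose adjusted intensity exceeds the cutoff."""
    return 100.0 * float(np.mean(np.asarray(adjusted) > cutoff))


# Synthetic plate: PI runs from 1.0 (G1) to 2.0 (G2/M), and gamma-H2AX
# rises slightly with DNA content, as in the negative controls.
rng = np.random.default_rng(0)
neg_pi = rng.uniform(1.0, 2.0, 2000)
neg_gh2ax = 10 ** (2.0 + 0.3 * neg_pi + rng.normal(0, 0.1, neg_pi.size))

# Positive-control-like wells: about half the cells carry a 10-fold boost.
pos_pi = rng.uniform(1.0, 2.0, 2000)
boost = np.where(rng.random(pos_pi.size) < 0.5, 1.0, 0.0)
pos_gh2ax = 10 ** (2.0 + 0.3 * pos_pi + boost + rng.normal(0, 0.1, pos_pi.size))

neg_adj = adjust_gh2ax(neg_gh2ax, neg_pi, neg_gh2ax, neg_pi)
pos_adj = adjust_gh2ax(pos_gh2ax, pos_pi, neg_gh2ax, neg_pi)
cutoff = positive_cutoff(neg_adj)
print(percent_positive(neg_adj, cutoff), percent_positive(pos_adj, cutoff))
```

The regression is fit to the negative-control cells alone, so the adjustment removes only the DNA-content dependence seen in undamaged cells. Any extra signal in a knockdown well remains as a positive offset.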


Figure 2.3. siRNA screen for genes suppressing H2AX phosphorylation. Images of siControl- (A) or siChk1- (B) transfected cells stained with anti-γH2AX antibodies and PI, acquired on the Isocyte™. (C) Scatter plot of the raw γH2AX and PI intensities for negative (siControl, blue) and positive (siChk1, red) controls. Each dot represents the γH2AX intensity of a single cell as a function of its PI intensity. (D) The same cells from panel C shown after normalization. The right portion shows histograms of each population. The dotted horizontal line marks the adjusted γH2AX intensity cutoff used to designate a cell as γH2AX positive. (E) Agreement between duplicates of the screen, shown by plotting the first replicate against the second for each siRNA tested. Individual colors indicate the day on which the siRNA pool was tested. Inset shows the histogram of correlation coefficients for all plates analyzed. (F) The Z' factor for each plate.
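For reference, the Z' factor plotted per plate is conventionally defined from the means (μ) and standard deviations (σ) of the positive (p) and negative (n) controls. The text does not say which quantity went into the calculation. We assume here that it is the percentage of γH2AX+ cells in the siChk1 and siControl wells of each plate.

```latex
Z' = 1 - \frac{3\,(\sigma_{p} + \sigma_{n})}{\lvert \mu_{p} - \mu_{n} \rvert}
```

Values between 0.5 and 1 are commonly taken to indicate an excellent assay window. This is consistent with the overall value of 0.68 reported for the screen.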


To assign significance to individual genes, we took into account both the proportion of γH2AX+ cells and the reproducibility between duplicates. The negative controls from the multiple plates analyzed on a given day allowed us to control for background γH2AX staining as well as for the variation observed within a single day. From these measurements, we calculated a p-value for each well using a statistical method we developed (Table S2). Because the large number of statistical tests performed in genome-wide siRNA screens can generate many false positives32, we also implemented a four-tier scheme that accounts for the degree of reproducibility between replicates. This scheme uses two p-value cutoffs: a false discovery rate (FDR)-corrected cutoff and the conventional cutoff of p = 0.05 (see Appendix A for further detail).

Genes in the most significant level (group 4, 581 genes) have p-values for both replicates below the FDR-corrected level (p < 0.0042). Genes in the next level (group 3, 206 genes) have a p-value significant at the FDR level for one replicate and below the conventional level of p = 0.05 for the other. Genes in the third level (group 2, 1451 genes) have p < 0.05 for only one replicate. Genes in these three levels also have a γH2AX signal within the top 25% of the genome. Because the stringent statistical procedure we used is likely to omit true biological hits, we also created a final group of genes with a strong but statistically non-significant signal. This group comprises all genes not included in the other groups whose average γH2AX signal falls within the top 5% of the genome (group 1, 164 genes). The γH2AX signals, cell viability, and cell cycle distributions for all genes tested are listed in Table S1, and the genes falling into the four significance groups are listed in Table S2.
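The tier rules can be sketched in Python as follows. This minimal sketch makes several assumptions. It uses the Benjamini-Hochberg step-up procedure to derive the FDR-corrected cutoff; the text does not name the FDR method, and Appendix A holds the details. It also expresses each gene's γH2AX signal as a genome-wide percentile. The helper names `bh_cutoff` and `significance_group` are hypothetical. The description above does not explicitly cover genes whose two replicates are both below 0.05 but neither below the FDR cutoff. The sketch places those genes in group 2, and that placement is also an assumption.

```python
import numpy as np

def bh_cutoff(pvals, q=0.05):
    """FDR-corrected p-value cutoff by the Benjamini-Hochberg step-up rule.

    Returns the largest sorted p-value p_(k) with p_(k) <= k * q / m,
    or 0.0 if no p-value qualifies.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    passing = p <= np.arange(1, m + 1) / m * q
    if not passing.any():
        return 0.0
    return float(p[np.nonzero(passing)[0].max()])

def significance_group(p1, p2, signal_pct, fdr_cut, alpha=0.05):
    """Four-tier group for one gene (4 = strongest; 0 = not assigned).

    p1, p2: p-values of the two replicates.
    signal_pct: gene's gammaH2AX signal as a genome-wide percentile (100 = highest).
    """
    lo, hi = sorted((p1, p2))
    top25 = signal_pct >= 75.0  # groups 2-4 also require a top-25% signal
    if top25 and hi <= fdr_cut:
        return 4  # both replicates pass the FDR cutoff
    if top25 and lo <= fdr_cut and hi < alpha:
        return 3  # one passes FDR, the other is nominally significant
    if top25 and lo < alpha:
        return 2  # nominal significance in one replicate (both-nominal case: assumption)
    if signal_pct >= 95.0:
        return 1  # strong signal (top 5%) without statistical support
    return 0
```

In this sketch, `bh_cutoff` would be run once over all wells to obtain a value analogous to the 0.0042 reported above, and that value would then be passed as `fdr_cut` for every gene.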

2.2.2.1 Bioinformatic analysis of screening hits

To survey the spectrum of biological functions among our candidate genes, we applied PANTHER (Protein Analysis Through Evolutionary Relationships)33 to the genes in our top significance group (group 4). siRNA pools that caused extensive cell death (<400 cells at 72 h, fewer than 50% of the cells originally plated) (Table S3) were excluded from this and all subsequent analyses. The predominant categories included nucleoside, nucleotide, and nucleic acid metabolism (as might be expected for effectors of genome stability), protein metabolism/modification, and signal transduction. Many genes had unclassified functions (Fig. 2.4A, B).

Figure 2.4. Functional classification of the statistically significant gene set. (A) Genes identified by the statistical methods described in the text were grouped by biological process using PANTHER (http://www.pantherdb.org). (B) Detailed protein categorization of the nucleoside, nucleotide, and nucleic acid metabolism category from (A).

To determine whether our strongest γH2AX effectors (group 4) were statistically enriched for genes involved in known biological processes, we functionally categorized our hits using the DAVID bioinformatics database (http://david.abcc.ncifcrf.gov/)34 and Ingenuity Pathway Analysis (Ingenuity Systems, www.ingenuity.com). Genes were categorized according to gene ontology (GO) terms (biological process, cellular component, molecular function), Protein Information Resource keywords, or the OMIM/Genetic Association disease datasets. As might be expected, genes involved in the cell cycle, cancer, and DNA replication and repair were enriched in our data set, providing confidence in our results. Surprisingly, genes involved in RNA post-transcriptional modification and splicing represented the most significantly enriched categories (Fig. 2.5, Table S4).
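The Figure 2.5 legend states that a right-tailed Fisher's exact test was used for these enrichments. That test reduces to an upper-tail hypergeometric probability. Given N background genes, K of which carry an annotation, it asks how likely a hit list of n genes is to contain k or more annotated genes by chance. The minimal Python sketch below computes this probability. The function name and any example counts are hypothetical. DAVID and Ingenuity may apply their own variants or multiple-testing corrections on top of this plain test.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """Right-tailed Fisher's exact (hypergeometric upper-tail) p-value.

    N: background genes (e.g., all genes screened)
    K: background genes carrying the annotation
    n: genes in the hit list
    k: hit-list genes carrying the annotation
    Returns P(X >= k) for X drawn from a hypergeometric(N, K, n) distribution.
    """
    # Count the ways to draw i annotated and n - i unannotated genes, for i >= k.
    # math.comb returns 0 when asked for more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    # Exact integer arithmetic; Python's int true division rounds correctly to float.
    return tail / comb(N, n)
```

For plotting, each category's score would be −log10 of this p-value, assuming base-10 logarithms. The significance line then sits at −log10(0.05), about 1.3.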


Figure 2.5. Classification enrichments of the statistically significant gene set. (A) Classification enrichment was determined using the DAVID bioinformatics database and Ingenuity Pathway Analysis with the right-tailed Fisher’s exact test. The significance threshold was set at −log(p) for p = 0.05.

2.2.2.2 siRNA screening hit validation

Next, we chose ~350 genes for validation with multiple individual siRNAs (deconvolution) (Table S5). Genes were chosen based on our significance analysis, relevant literature, and/or functional categorization. We also selected a few siRNA targets with borderline γH2AX signals that were of biological interest or that functioned in pathways, processes, or complexes found among the genes deemed significant. Several genes known to increase γH2AX upon knockdown, such as TopBP1 and ATR, had low signals. This suggests that the low concentration of pooled siRNA (25 nM) may have produced incomplete knockdown and therefore false negatives35. We therefore reasoned that further exploration of genes displaying lower levels of γH2AX could also reveal true hits.

For the chosen genes, the four siRNAs comprising the original screening pool were tested individually at 25 nM using the same platform as the primary screen. A cell was considered positive if its γH2AX intensity exceeded a linear intensity threshold set at approximately three times the mean γH2AX intensity observed in siControl-treated cells (Fig. 2.6A). After applying this threshold, we calculated the median percentage of γH2AX+ cells across all siControl wells tested (1.6%). siRNAs giving a value at least two standard deviations above this control value (≥3%) were considered positive. By this criterion, 94% of the genes retested scored positive with at least one siRNA, and 68% (231 genes) scored positive with two or more siRNAs (Fig. 2.6B). The majority of genes in the top two significance groups were confirmed with multiple siRNAs (Fig. 2.6C).
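The deconvolution calls described above can be sketched in Python. This sketch rests on several assumptions. Per-cell intensities are on the linear (not log) scale, as the text states. The spread of the controls is taken as the sample standard deviation of per-well siControl percentages. All function and argument names are hypothetical.

```python
import statistics

def percent_positive_linear(cell_intensities, control_mean, fold=3.0):
    """Percent of cells above a linear cutoff of `fold` x the mean siControl intensity."""
    cutoff = fold * control_mean
    n_pos = sum(1 for x in cell_intensities if x > cutoff)
    return 100.0 * n_pos / len(cell_intensities)

def sirna_positive_cutoff(control_well_pcts, n_sd=2.0):
    """Well-level cutoff: median siControl % positive plus n_sd sample SDs."""
    return (statistics.median(control_well_pcts)
            + n_sd * statistics.stdev(control_well_pcts))

def gene_calls(gene_to_sirna_pcts, cutoff):
    """For each gene, count how many of its individual siRNAs reach the cutoff."""
    return {gene: sum(pct >= cutoff for pct in pcts)
            for gene, pcts in gene_to_sirna_pcts.items()}
```

A gene would then count as validated by multiple siRNAs when its entry in the dictionary returned by `gene_calls` is 2 or more.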

Figure 2.6. Screening validation. (A) A representative well showing the γH2AX signal as a function of PI intensity for siControl- and siChk1-transfected cells. The percentage of γH2AX positive cells per well was calculated by applying a γH2AX intensity cutoff (horizontal line). (B) Table summarizing the effects of the four individual siRNAs tested during deconvolution. siRNAs were considered positive if the percentage of γH2AX+ cells was ≥2 SD above the value for siControl-transfected wells. (C) Table showing the retest rate for genes chosen from each primary screening significance group. (D) Bar graph showing the effect of targeting genes involved in DNA replication and checkpoint activation. Each bar represents an individual siRNA tested, and error bars indicate variation between duplicates.


Category / Symbol: Comments
Apoptosis
  BIRC5: Baculoviral IAP repeat-containing 5
  CASP8AP2: CASP8 associated protein 2
Cell Cycle
  CDCA5: Cell division cycle associated 5
  CENPE: Centromere protein E
  STK6: Aurora kinase A
Channel
  SLC28A2: Solute carrier family 28
  LRP1B: Low density lipoprotein-related protein 1B
Cytoskeletal-Related
  CKAP5: Cytoskeleton associated protein 5
  KRTAP12-3: Keratin associated protein 12-3
  KIF11: Kinesin family member 11
DNA Binding
  SON: SON DNA binding protein
  PABPC4: Poly(A) binding protein, cytoplasmic 4
DNA replication/repair
  ERCC6: Excision repair cross-complementation group 6
  XAB2: XPA binding protein 2
  DBF4/ASK: DBF4 homolog; activator of S phase kinase
  CHEK1/CHK1: Checkpoint kinase 1 homolog
  CLSPN: Claspin homolog
  MCM10: Minichromosome maintenance complex protein 10
  RPA1: Replication protein A1, 70kDa
  RPA2: Replication protein A2, 32kDa
  RRM1: Ribonucleotide reductase M1 polypeptide
  RRM2: Ribonucleotide reductase M2 polypeptide
  SETD8/SET8: SET domain containing 8
  TIMELESS: Timeless homolog
Other
  COPZ1: Coatomer protein complex, subunit zeta 1
  CSA2: Cytokine, down-regulator of HLA II
  ITGB1: Integrin, beta 1
  BRD8: Bromodomain containing 8
  NCAN/CSPG3: Neurocan
  FAM24A: Family with sequence similarity 24, member A
  FKBP6: FK506 binding protein 6
  PSMD3: Proteasome 26S non-ATPase subunit 3
Polymerase
  POLR2G: Polymerase (RNA) II (DNA-directed) polypeptide G
  POLR2I: Polymerase (RNA) II (DNA-directed) polypeptide I
RNA Splicing
  PRPF19/PRP19: PRP19/PSO4 pre-mRNA processing factor 19 homolog
  CRNKL1: Crooked neck pre-mRNA splicing factor-like 1
  HNRPC: Heterogeneous nuclear ribonucleoprotein C
  CWC22/KIAA1604: CWC22 spliceosome-associated protein homolog
  LSM2: LSM2 homolog, U6 small nuclear RNA-associated
  PRPF8: PRP8 pre-mRNA processing factor 8 homolog
  SART1: Squamous cell carcinoma antigen recognized by T cells
  SF3A1: Splicing factor 3a, subunit 1
  SF3A2: Splicing factor 3a, subunit 2
  SF3B2: Splicing factor 3b, subunit 2
  SF3B4: Splicing factor 3b, subunit 4
  SKIIP: SNW domain containing 1; SKI-interacting protein
  SNRPA1: Small nuclear ribonucleoprotein polypeptide A1
  SNRPB: Small nuclear ribonucleoprotein polypeptides B and B1
  SNRPD1: Small nuclear ribonucleoprotein D1 polypeptide
  SNRPD3: Small nuclear ribonucleoprotein D3 polypeptide
  WBP11: WW domain binding protein 11
  AQR: Aquarius homolog
  CDC40: Cell division cycle 40 homolog
  CDC5L: CDC5 cell division cycle 5-like
Transcription Factor
  ZNF157: Zinc finger protein 157

Table 2.1. Screening hits that scored with all four siRNAs tested.


Of the genes that scored positive when targeted by multiple siRNAs, those causing the largest increase in γH2AX+ cells were primarily proteins involved in DNA replication or checkpoint signaling (Fig. 2.6D). Both of these processes are known to have roles in preserving genomic stability, adding confidence to our validated data set4, 7. Other genes that scored with multiple siRNAs included proteins involved in a broad spectrum of functions including cell cycle control, DNA binding, ion flux, gene regulation, and RNA processing (Table 2.1 & Table S5).

2.2.2.3 siRNA screening protein interaction network analysis

To determine whether we had identified and validated groups of genes belonging to previously characterized pathways or complexes, we used the statistically significant functional categories identified from the group 4 genes of the original screen (Fig. 2.5, Table S4) to define networks of interacting proteins. Once a network of interacting proteins had been defined, we mapped our deconvoluted genes, as well as the genes in significance groups 3, 2, and 1, onto it. This identified several interaction modules, encompassing both expected pathways and processes not previously linked to genome maintenance (Table S6).
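One simple way to picture how interaction modules emerge from a hit list is to keep only the interactions in which both partners are hits, then take the connected components of the resulting graph. The Python sketch below does this with a breadth-first search. It is not the Ingenuity network-generation algorithm, which is proprietary and also scores networks. The edge list is supplied by the caller, and the function name is hypothetical.

```python
from collections import defaultdict, deque

def modules(edges, hits):
    """Connected components of an interaction graph restricted to hit genes.

    edges: iterable of (gene_a, gene_b) interaction pairs from any source.
    hits: set of genes to keep (e.g., group 4 genes plus deconvoluted genes).
    Hits with no interaction partner among the hits are not reported.
    """
    graph = defaultdict(set)
    for a, b in edges:
        if a in hits and b in hits:  # drop edges that leave the hit list
            graph[a].add(b)
            graph[b].add(a)
    seen, result = set(), []
    for start in sorted(hits):
        if start in seen or start not in graph:
            continue
        # Breadth-first search collects every hit reachable from `start`.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result.append(sorted(component))
    return result
```

As an example, consider a toy input chosen only for illustration rather than drawn from the screen's network. It contains the edges TIMELESS-TIPIN, CHEK1-CLSPN, and NUP107-SEH1L, and the hit set includes TIMELESS, TIPIN, CHEK1, and CLSPN but not SEH1L. The function returns the two modules `["CHEK1", "CLSPN"]` and `["TIMELESS", "TIPIN"]`.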

DNA replication, checkpoint, and repair modules

As expected, our screen identified genes involved in DNA replication and checkpoint activation (Fig. 2.7). These include several components of the replication machinery: the replication factor C (RFC) complex, the single-strand DNA binding protein replication protein A (RPA), the DNA polymerase alpha complex, and MCM10, a minichromosome maintenance protein. We also identified Timeless and Tipin, which form a complex needed to activate the replication checkpoint36, as well as Set8, a histone methyltransferase needed for DNA replication37, 38. Other checkpoint proteins identified that play a role in S phase progression include Chk1, Claspin, TopBP1, and Dbf4, a regulator of the Cdc7 kinase.


Figure 2.7. Network modeling of screen hits identifies DNA replication and DNA repair functional groups linked to genome maintenance. Networks of interacting proteins identified using DAVID Bioinformatics Database and Ingenuity Pathway Analysis. Color indicates strength of statistical significance (green) or strength of deconvolution results (red). If a statistically significant gene was deconvoluted, the deconvolution result is preferentially shown.

Many DNA repair proteins were also well represented among our hits, including components of the homologous recombination (HR) and nucleotide excision repair (NER) pathways (Fig. 2.7). The HR portion of the module contains many proteins from the BRCA/Fanconi anemia (FA) pathway, which is required for double-strand break and cross-link repair. These include the BRCA1-interacting protein BRIP1 (FANCJ), the BRCA2 (FANCD1)-interacting protein C11ORF30/EMSY, and the additional FA components FANCM, FANCI, FANCC, FANCE, and FANCA. The ATR checkpoint components TopBP1, Claspin, and Chk1, which regulate the FA/BRCA pathway, are also linked to this module. The NER portion of this module includes excision repair cross-complementation group 6 (ERCC6/CSB), XAB2 (a protein that binds ERCC6 and xeroderma pigmentosum group A), the interacting nucleases ERCC4 and ERCC1, and GTF2H1, a component of TFIIH. Altogether, the significant representation of replication, checkpoint, and repair genes among our validated hits provides confidence in our screening data.

mRNA processing module

Interestingly, the most significantly enriched interaction network contains proteins involved in mRNA processing. These hits act at different stages of mRNA processing, including RNA splicing, spliceosome assembly, mRNA surveillance, and mRNA export, with the majority having roles in RNA splicing (Fig. 2.8, Table 3.1). Recent studies have linked some mRNA processing genes to genome maintenance in both yeast and mammals4, 39. Based on these and a limited number of additional observations40-45, we expected to identify a few mRNA processing proteins in our screen. Strikingly, however, our studies revealed that mRNA processing is involved in preserving genomic integrity on a much broader scale.

Charcot-Marie-Tooth disease module

Genes involved in Charcot-Marie-Tooth disease (CMT) were also statistically enriched within our most stringent gene set (Fig. 2.8). Although the γH2AX signal was relatively low for this category, we confirmed many of these hits through deconvolution. CMT is a clinically and genetically heterogeneous set of relatively common disorders that cause demyelinating and axonal neuropathies46, 47. The genes we found include:
- peripheral myelin protein 22 (PMP22), whose mutation or altered gene dosage accounts for nearly 70% of all cases of hereditary neuropathy;
- gap junction protein beta 1 (GJB1), another gene commonly mutated in CMT patients;
- early growth response protein 2 (EGR2), a transcription factor that regulates the expression of myelin proteins including PMP22;
- SH3TC2, a protein of unknown function with a putative SH3 domain and tetratricopeptide repeats;
- myotubularin-related protein 2 (MTMR2), a phosphatidylinositol phosphatase, and its interacting protein MTMR13 (CMT4B2)46-48.

Although no connection between CMT and genome instability has been reported, defects in the DNA damage response have been linked to other neurodegenerative disorders6.


Figure 2.8. Network modeling of screen hits identifies mRNA processing and Charcot-Marie-Tooth functional groups linked to genome maintenance. Networks of interacting proteins identified using DAVID Bioinformatics Database and Ingenuity Pathway Analysis. Color indicates strength of statistical significance (green) or strength of deconvolution results (red). If a statistically significant gene was deconvoluted, the deconvolution result is preferentially shown.

Other screening modules

Our screen and significance analysis also identified several protein interaction networks with less defined links to H2AX phosphorylation that have yet to be extensively deconvoluted (Fig. 2.9). One of these networks contains pericentric binding proteins, including components of the kinetochore, centromere, and spindle assembly checkpoint. Defects in kinetochore or centromere formation could impair spindle assembly and the mitotic checkpoint. The increase in γH2AX could therefore be caused by premature mitosis and by chromosome breaks resulting from incomplete decatenation49, 50. While additional work will be needed to validate this gene set, it is noteworthy that a screen for genes causing Rad52 foci in yeast also identified several mitotic checkpoint genes51.

Figure 2.9. Network modeling of screen hits identifies nuclear pore and pericentric binding functional groups linked to genome maintenance. Networks of interacting proteins identified using DAVID Bioinformatics Database and Ingenuity Pathway Analysis. Color indicates strength of statistical significance (green) or strength of deconvolution results (red). If a statistically significant gene was deconvoluted, the deconvolution result is preferentially shown.

Components of the nuclear pore comprised another interesting group of genes significantly enriched among our hits. This link is of particular interest in light of studies connecting the nuclear pore complex (NPC) and other components of the nuclear periphery to DNA repair and DNA damage responses in several organisms52-55. Indeed, Nup107, a conserved NPC component found among our hits, appears to regulate repair of DSBs in yeast55. In yeast, recruitment of telomeres and persistent breaks to the nuclear pore and nuclear periphery may suppress potentially dangerous chromosomal rearrangements and promote certain types of repair56. Although some types of DSBs appear to have limited mobility within the nucleus of mammalian cells, specific types of DNA damage, such as deprotected telomeres, do exhibit increased mobility and may be able to undergo relocalization57. Thus, while further studies are clearly required, our data are consistent with the idea that the NPC may have a role in DNA damage processing in higher eukaryotes.

Other modules of genes we identified by deconvolution include a group of proteins involved in circadian rhythms, several genes involved in Wnt signaling, and several components of the GABA receptor (Fig. 2.10). The circadian cluster is of particular interest for two reasons. Previous work has linked circadian rhythms and circadian rhythm proteins to cancer and tumor progression, and proteins involved in circadian oscillations are directly connected to the DNA damage response36, 58.

Interestingly, the Nup107-Nup160 complex and other components of the NPC appear to link several of the modules we identified (Fig. 2.11). Nup107-Nup160, along with the interacting nucleoporins Seh1L and Elys/Mel-28, which were also identified in the screen, interacts with the kinetochore during mitosis59, 60. In addition, Elys interacts with the MCM2-7 helicase complex, and its loss sensitizes cells to replication stress54, 61. It is therefore tempting to speculate that some of the effects of NPC perturbations are due to problems with replication and with the resolution or repair of stalled or collapsed forks. This hypothesis is consistent with the model that the Nup84/Nup107-Slx5/8 complex resolves DNA damage at collapsed forks in yeast55. Finally, NPC components also interact with several mRNA processing proteins. Clearly, further work will be required to test the many hypotheses that arise from these hits. Nevertheless, the extensive nature of this network suggests that the preservation of genome stability might be coordinated by a larger network of biological processes than previously appreciated. So do the many interconnections between known mediators of genome stability and the additional modules we identified.


Figure 2.10. Network modeling of screen hits identifies circadian rhythm, GABA signaling, and WNT signaling functional groups linked to genome maintenance. Networks of interacting proteins identified using DAVID Bioinformatics Database and Ingenuity Pathway Analysis. Color indicates strength of statistical significance (green) or strength of deconvolution results (red). If a statistically significant gene was deconvoluted, the deconvolution result is preferentially shown.


Figure 2.11. Interconnections between functional groups linked to genome maintenance. The functional groups pericentric chromatin binding, nuclear pore, mRNA processing, DNA replication, and DNA repair were examined for direct protein-protein interactions using Ingenuity Pathway Analysis.


2.2.3 Results of the targeted screen for novel regulators of replication fork stability

Data from the aphidicolin-treated siRNA screen were analyzed in the same manner as the untreated screen, and hits were assigned to significance categories based on their FDR-corrected p-values. With aphidicolin treatment, the percentage of γH2AX positive cells in the Chk1 siRNA-treated population increased from ~50% without drug to ~85%. This wider signal range raised the average Z’ factor for the aphidicolin screen to 0.85 (Fig. 2.12A). Category 4 is the top significance category, in which both duplicate γH2AX measurements had significant FDR-corrected p-values. Upon aphidicolin treatment, the number of hits in category 4 rose more than sixfold (Fig. 2.12B). The numbers of hits in categories 2 and 1 dropped relative to the untreated screen because of the better signal-to-noise ratio within the controls (Fig. 2.12B). Overall, many of the genes identified in the presence of aphidicolin also gave significant γH2AX signals without drug treatment. Comparing signal strength between the two datasets will therefore likely be necessary to identify genes involved specifically in replication fork stability.

Figure 2.12. Screen for genes that maintain replication fork stability. (A) The Z’ factor for each aphidicolin plate tested. The average Z’ factor for the aphidicolin dataset was 0.85. (B) Significance group breakdown of the aphidicolin dataset.


2.2.3.1 Bioinformatic analysis of aphidicolin screening hits

To survey the spectrum of biological functions within the aphidicolin dataset, we again applied PANTHER (Protein Analysis Through Evolutionary Relationships)33 to the genes in our top significance group (group 4). Because this category contained a large number of genes, we limited the analysis to category 4 genes that also had a γH2AX signal within the top 90% of signals observed for the genome (Fig. 2.13A, B). As in the untreated screen, the largest group consisted of genes with unknown biological functions, followed by genes involved in nucleic acid metabolism. We next asked whether our strongest γH2AX effectors in the presence of aphidicolin (group 4, top 90%) were statistically enriched for genes involved in known biological processes. To do so, we again functionally categorized our hits using the DAVID bioinformatics database (http://david.abcc.ncifcrf.gov/)34. Genes were categorized according to GO terms (biological process, cellular component, molecular function), Protein Information Resource keywords, or the OMIM/Genetic Association disease datasets. As expected, genes involved in the cell cycle, cancer, and DNA replication and repair were again enriched, providing confidence in our results. Genes involved in RNA post-transcriptional modification remained enriched in the aphidicolin dataset. However, the most enriched group was now kinases, likely reflecting enhanced cell signaling in response to the cellular stress imposed by aphidicolin (Fig. 2.13C).


Figure 2.13. Functional classification of the aphidicolin statistically significant gene set. (A, B) The genes in significance category 4 that also had a γH2AX signal in the top 90% of the genome (1635 genes) were analyzed according to biological process (A) or molecular function (B) using the PANTHER classification database. (C) Classification enrichment was determined using the DAVID bioinformatics database and Ingenuity Pathway Analysis with the right-tailed Fisher’s exact test. The significance threshold was set at −log(p) for p = 0.05.


2.2.4 Secondary Screening

2.2.4.1 53BP1 Staining

We next asked whether the γH2AX signal we observed, both in the absence and presence of aphidicolin, reflected sites of DNA damage rather than phosphorylation due to activation of cell cycle checkpoints. To address this, we examined foci formation by 53BP1 (p53 binding protein 1). 53BP1 is a DNA damage response protein that is recruited rapidly and efficiently to sites of DNA double-strand breaks, where it forms damage-dependent nuclear foci62. Recruitment of 53BP1 to sites of DNA damage depends on a tandem tudor domain that recognizes methylated histone residues63-65. It also depends on additional elements that facilitate oligomerization and damage recognition66. γH2AX also facilitates damage recognition, although indirectly, as the phosphopeptide-binding BRCT (BRCA1 C-terminus) domains of 53BP1 are dispensable for its recruitment65, 67. Because 53BP1 is nuclear even without damage, we could not quantify a gain in nuclear signal in response to siRNA treatment. Instead, we developed a method to count 53BP1 foci after pre-extracting nuclei to remove non-chromatin-bound protein. Cancer cell lines normally contain one to two 53BP1 foci owing to endogenous DNA break formation, so we scored cells as 53BP1 positive if they contained more than five 53BP1 foci after gene knockdown. We tested 264 of the deconvoluted genes for their ability to induce 53BP1 foci. The relationship between γH2AX induction and 53BP1 foci formation was not linear, but several genes that strongly induced γH2AX also caused a large number of 53BP1 foci (Table S7). A lack of 53BP1 foci could indicate that the γH2AX signal is sustained by a type of damage other than double-strand breaks, since H2AX is also phosphorylated in response to replication stress and UV irradiation. Alternatively, since we assayed for damage at a relatively late time point, 53BP1 foci may not have persisted until the time of fixation.
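The scoring rule above (more than five foci per nucleus) can be sketched in Python on a per-nucleus binary spot mask. The sketch assumes that each nucleus has already been segmented and that its 53BP1 channel has been thresholded into a mask. The thresholding step, the 4-connectivity rule used to define a focus, and the function names are hypothetical choices. They are not the imaging software's actual algorithm.

```python
from collections import deque

def count_foci(mask):
    """Count 4-connected groups of truthy pixels in a 2D mask of one nucleus."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    foci = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            foci += 1  # new focus found; flood-fill all of its pixels
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return foci

def percent_53bp1_positive(foci_per_cell, max_background=5):
    """Percent of cells with more than `max_background` foci (five in the text)."""
    positive = sum(1 for n in foci_per_cell if n > max_background)
    return 100.0 * positive / len(foci_per_cell)
```

Setting the cutoff above the one to two foci expected from endogenous breaks is what keeps unperturbed cancer cells from scoring as positive.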


2.2.4.2 Phospho-H3 Recovery assay

We also designed a replication recovery assay to focus on genes with specific defects in progressing through replication. The assay measures how many cells proceed into mitosis within a defined period. Briefly, cells were transfected with siRNA for 48 hours and then treated with or without a high dose of aphidicolin (5 µM) for 6 hours to halt replication. The aphidicolin was then washed out, and the cells were allowed to recover in the presence of nocodazole to arrest them in mitosis. After 18 hours in nocodazole, the cells were fixed and stained for the mitotic marker phospho-histone H3 (Ser10).

We hypothesized that knocking down a gene with a specific role in replication fork stability would leave cell cycle progression normal in the absence of drug. After aphidicolin treatment, however, it would sharply reduce the number of cells reaching mitosis, owing to double-strand breaks formed from collapsed replication forks in S phase. We found many genes whose knockdown caused defects in progression into mitosis specifically after aphidicolin treatment (Table S8). However, because of the assay design, we cannot conclude that these siRNAs cause defects in S-phase progression rather than an arrest elsewhere in the cell cycle. The assay has since been redesigned to label the replicating fraction and eliminate this caveat. Even without the replication marker, however, we observed replication progression defects for many of the genes we expected, such as Timeless and the RPA subunits, as well as for several mRNA processing genes.
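The hypothesis above translates into a two-condition filter on the phospho-H3 (mitotic) percentages. The minimal Python sketch below assumes per-gene mitotic percentages with and without aphidicolin, together with matched siControl values. The cutoffs `normal_tol` and `drop`, and the function name, are hypothetical. As noted above, a gene flagged this way may still be arresting outside S phase.

```python
def recovery_flags(mitotic_pct, control, normal_tol=0.8, drop=0.5):
    """Genes with near-normal mitotic entry untreated but poor recovery after aphidicolin.

    mitotic_pct: {gene: (pct_pH3_no_drug, pct_pH3_after_aphidicolin)}
    control: (pct_no_drug, pct_after_aphidicolin) for siControl wells.
    normal_tol, drop: hypothetical cutoffs expressed as fractions of siControl.
    """
    c_untreated, c_aph = control
    flagged = []
    for gene, (untreated, aph) in mitotic_pct.items():
        normal_untreated = untreated >= normal_tol * c_untreated  # no defect without drug
        poor_recovery = aph <= drop * c_aph  # few cells reach mitosis after washout
        if normal_untreated and poor_recovery:
            flagged.append(gene)
    return sorted(flagged)
```

Expressing both cutoffs relative to siControl keeps the filter insensitive to day-to-day shifts in the overall mitotic index.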

2.3 Discussion

A proper response to DNA damage is critical for the maintenance of genome stability, and it serves as a key barrier to cancer. The screens described in this study identified several hundred genes that, when lost, induce the phosphorylation of H2AX, a robust and reliable marker of DNA damage. We expect these hits to include many genes with functions in the DNA damage response pathway, as many known checkpoint-regulated genes were found among them. This is the first DNA damage response screen of its kind reported in higher eukaryotes, and the results provide further insight into processes that prevent the formation and accumulation of DNA damage. Indeed, many of the genes and processes identified have not previously been linked to the formation of DNA damage. This suggests that the events contributing to genome instability may be more widespread than previously realized.

A number of the genes we identified exhibited a relatively high level of H2AX phosphorylation when knocked down, particularly those known to be involved in DNA replication and DNA damage responses. The genes involved in RNA splicing also caused a high level of H2AX phosphorylation. However, several hundred genes consistently led to low, but reproducible and significant levels of phosphorylation when targeted. While the high effectors are of obvious importance, those causing low levels of H2AX phosphorylation may also be of interest. Indeed, loss of function of these genes may be tolerated by the cell/organism and could drive genome instability and transformation, while those causing high levels of γH2AX seem more likely to cause cell death or senescence. In this respect, it is interesting that the level of genome instability linked to the CMT genes is relatively low.

For a majority of the genes identified in our screen, it seems likely the increase in γH2AX observed is due to increased spontaneous DNA damage. However, spontaneous or unrepaired DNA damage may not be the only reason for increased γH2AX. For example, H2AX phosphorylation could result from loss of the phosphatases that dephosphorylate γH2AX. In fact, we did identify subunits of the PP2A and PP4 phosphatase complexes that are involved in dephosphorylating γH2AX68-70. Some of the genes identified may also cause an increase in γH2AX via apoptosis; however, this category was largely eliminated by removing genes that caused overt and widespread cell death, as well as by setting nuclear area parameters to eliminate the identification of nuclear fragments.

We also anticipate that the screen conducted in the presence of aphidicolin will help narrow our hits to genes that maintain replication fork stability or that preserve genome stability during DNA replication by alternative mechanisms. Elucidating the pathways involved in these processes will, we hope, shed light on new functions of the DNA damage response. It may also connect the response to pathways not previously known to be regulated by checkpoint mechanisms.

Other screens assessing different aspects of the DNA damage response have been carried out in various organisms. For example, the formation of Rad52 foci was examined in Saccharomyces cerevisiae51, and several screens were carried out in Caenorhabditis elegans to identify genes affecting radiation sensitivity71, 72. Many of the genes identified in these screens were also found in our data set, suggesting that some of the properties measured by previous screens may be linked to increased γH2AX (Table S9). A proteomic analysis designed to identify the targets of the DNA damage protein kinases has also been carried out in mammalian cells73. Although this approach is orthogonal to ours, the genes and pathways it identified overlapped significantly with our dataset (110 genes, p = 1.5 × 10⁻²) (Table S9). Interestingly, the biological processes and pathways found were shared more extensively than the specific genes. This suggests that while individual hits may vary from screen to screen, the enriched pathways may provide greater biological insight. For example, mRNA processing genes were also enriched in this proteomic analysis. Nevertheless, most of the genes found in each study were not found in the others, and we identified many additional genes and pathways of diverse function not previously linked to the DNA damage response. This indicates that our knowledge of this process is still incomplete and that the screens are not yet saturating. It further suggests that a systems biology approach integrating many genomic datasets could ultimately prove useful in understanding the mechanisms underlying genomic stability.

Altogether, the results of our study indicate that the pathways and processes affecting genome stability are much broader than anticipated. Our data link the maintenance of genome stability to the kinetochore, the nuclear pore, mRNA processing and Charcot-Marie-Tooth disease. We expect these genes and pathways to have important roles in the DNA damage response, cancer, neurodegeneration, aging, and other human diseases, and the nature of these links will be of great interest for future study.


REFERENCES

1. Negrini S, Gorgoulis VG, Halazonetis TD. Genomic instability--an evolving hallmark of cancer. Nat Rev Mol Cell Biol; 11:220-8.

2. Tsai AG, Lieber MR. Mechanisms of chromosomal rearrangement in the human genome. BMC genomics; 11 Suppl 1:S1.

3. Halazonetis TD, Gorgoulis VG, Bartek J. An oncogene-induced DNA damage model for cancer development. Science 2008; 319:1352-5.

4. Aguilera A, Gomez-Gonzalez B. Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet 2008; 9:204-17.

5. McKinnon PJ, Caldecott KW. DNA strand break repair and human genetic disease. Annual review of genomics and human genetics 2007; 8:37-55.

6. Rass U, Ahel I, West SC. Defective DNA repair and neurodegenerative disease. Cell 2007; 130:991-1004.

7. Kolodner RD, Putnam CD, Myung K. Maintenance of genome stability in Saccharomyces cerevisiae. Science 2002; 297:552-7.

8. Segurado M, Tercero JA. The S-phase checkpoint: targeting the replication fork. Biology of the cell 2009; 101:617-27.

9. Tourriere H, Pasero P. Maintenance of fork integrity at damaged DNA and natural pause sites. DNA repair 2007; 6:900-13.

10. Paulsen RD, Cimprich KA. The ATR pathway: fine-tuning the fork. DNA repair 2007; 6:953-66.

11. Lambert S, Froget B, Carr AM. Arrested replication fork processing: interplay between checkpoints and recombination. DNA repair 2007; 6:1042-61.

12. Branzei D, Foiani M. Interplay of replication checkpoints and repair proteins at stalled replication forks. DNA repair 2007; 6:994-1003.

13. Zhou BB, Elledge SJ. The DNA damage response: putting checkpoints in perspective. Nature 2000; 408:433-9.

14. Nyberg KA, Michelson RJ, Putnam CW, Weinert TA. Toward maintaining the genome: DNA damage and replication checkpoints. Annu Rev Genet 2002; 36:617-56.


15. Sancar A, Lindsey-Boltz LA, Unsal-Kacmaz K, Linn S. Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu Rev Biochem 2004; 73:39-85.

16. Melo J, Toczyski D. A unified view of the DNA-damage checkpoint. Current opinion in cell biology 2002; 14:237-45.

17. Moffat J, Sabatini DM. Building mammalian signalling pathways with RNAi screens. Nat Rev Mol Cell Biol 2006; 7:177-87.

18. Quon K, Kassner PD. RNA interference screening for the discovery of oncology targets. Expert opinion on therapeutic targets 2009; 13:1027-35.

19. Boutros M, Ahringer J. The art and design of genetic screens: RNA interference. Nat Rev Genet 2008; 9:554-66.

20. Stucki M, Jackson SP. gammaH2AX and MDC1: anchoring the DNA-damage-response machinery to broken chromosomes. DNA repair 2006; 5:534-43.

21. Fernandez-Capetillo O, Lee A, Nussenzweig M, Nussenzweig A. H2AX: the histone guardian of the genome. DNA repair 2004; 3:959-67.

22. Ward IM, Chen J. Histone H2AX is phosphorylated in an ATR-dependent manner in response to replicational stress. The Journal of biological chemistry 2001; 276:47759-62.

23. Burma S, Chen BP, Murphy M, Kurimasa A, Chen DJ. ATM phosphorylates histone H2AX in response to DNA double-strand breaks. The Journal of biological chemistry 2001; 276:42462-7.

24. Stiff T, O'Driscoll M, Rief N, Iwabuchi K, Lobrich M, Jeggo PA. ATM and DNA-PK function redundantly to phosphorylate H2AX after exposure to ionizing radiation. Cancer research 2004; 64:2390-6.

25. Kim JA, Kruhlak M, Dotiwala F, Nussenzweig A, Haber JE. Heterochromatin is refractory to gamma-H2AX modification in yeast and mammals. The Journal of cell biology 2007; 178:209-18.

26. Paull TT, Rogakou EP, Yamazaki V, Kirchgessner CU, Gellert M, Bonner WM. A critical role for histone H2AX in recruitment of repair factors to nuclear foci after DNA damage. Curr Biol 2000; 10:886-95.

27. Gorgoulis VG, Vassiliou LV, Karakaidos P, Zacharatos P, Kotsinas A, Liloglou T, Venere M, Ditullio RA, Jr., Kastrinakis NG, Levy B, Kletsas D, Yoneta A, Herlyn M, Kittas C, Halazonetis TD. Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 2005; 434:907-13.


28. Bartkova J, Horejsi Z, Koed K, Kramer A, Tort F, Zieger K, Guldberg P, Sehested M, Nesland JM, Lukas C, Orntoft T, Lukas J, Bartek J. DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature 2005; 434:864-70.

29. Casper AM, Nghiem P, Arlt MF, Glover TW. ATR regulates fragile site stability. Cell 2002; 111:779-89.

30. Durkin SG, Arlt MF, Howlett NG, Glover TW. Depletion of CHK1, but not CHK2, induces chromosomal instability and breaks at common fragile sites. Oncogene 2006; 25:4381-8.

31. Mirzoeva OK, Petrini JH. DNA replication-dependent nuclear dynamics of the Mre11 complex. Mol Cancer Res 2003; 1:207-18.

32. Wollman R, Stuurman N. High throughput microscopy: from raw images to discoveries. J Cell Sci 2007; 120:3715-22.

33. Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A. PANTHER: a library of protein families and subfamilies indexed by function. Genome research 2003; 13:2129-41.

34. Dennis G, Jr., Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome biology 2003; 4:P3.

35. Cimprich KA, Cortez D. ATR: an essential regulator of genome integrity. Nat Rev Mol Cell Biol 2008.

36. Kondratov RV, Antoch MP. Circadian proteins in the regulation of cell cycle and genotoxic stress responses. Trends in cell biology 2007; 17:311-7.

37. Tardat M, Murr R, Herceg Z, Sardet C, Julien E. PR-Set7-dependent lysine methylation ensures genome replication and stability through S phase. The Journal of cell biology 2007; 179:1413-26.

38. Jorgensen S, Elvers I, Trelle MB, Menzel T, Eskildsen M, Jensen ON, Helleday T, Helin K, Sorensen CS. The histone methyltransferase SET8 is required for S-phase progression. The Journal of cell biology 2007; 179:1337-45.

39. Li X, Manley JL. Cotranscriptional processes and their influence on genome stability. Genes & development 2006; 20:1838-47.


40. Hossain MN, Fuji M, Miki K, Endoh M, Ayusawa D. Downregulation of hnRNP C1/C2 by siRNA sensitizes HeLa cells to various stresses. Molecular and cellular biochemistry 2007; 296:151-7.

41. Xiao R, Sun Y, Ding JH, Lin S, Rose DW, Rosenfeld MG, Fu XD, Li X. Splicing regulator SC35 is essential for genomic stability and cell proliferation during mammalian organogenesis. Molecular and cellular biology 2007; 27:5393-402.

42. Brumbaugh KM, Otterness DM, Geisen C, Oliveira V, Brognard J, Li X, Lejeune F, Tibbetts RS, Maquat LE, Abraham RT. The mRNA surveillance protein hSMG-1 functions in genotoxic stress response pathways in mammalian cells. Molecular cell 2004; 14:585-98.

43. Azzalin CM, Lingner J. The human RNA surveillance factor UPF1 is required for S phase progression and genome stability. Curr Biol 2006; 16:433-9.

44. Li X, Manley JL. Inactivation of the SR protein splicing factor ASF/SF2 results in genomic instability. Cell 2005; 122:365-78.

45. Moumen A, Masterson P, O'Connor MJ, Jackson SP. hnRNP K: an HDM2 target and transcriptional coactivator of p53 in response to DNA damage. Cell 2005; 123:1065-78.

46. Berger P, Young P, Suter U. Molecular cell biology of Charcot-Marie-Tooth disease. Neurogenetics 2002; 4:1-15.

47. Szigeti K, Lupski JR. Charcot-Marie-Tooth disease. Eur J Hum Genet 2009; 17:703-10.

48. Niemann A, Berger P, Suter U. Pathomechanisms of mutant proteins in Charcot-Marie-Tooth disease. Neuromolecular medicine 2006; 8:217-42.

49. Cleveland DW, Mao Y, Sullivan KF. Centromeres and kinetochores: from epigenetics to mitotic checkpoint signaling. Cell 2003; 112:407-21.

50. Damelin M, Bestor TH. The decatenation checkpoint. British journal of cancer 2007; 96:201-5.

51. Alvaro D, Lisby M, Rothstein R. Genome-Wide Analysis of Rad52 Foci Reveals Diverse Mechanisms Impacting Recombination. PLoS Genet 2007; 3:e228.

52. Palancade B, Liu X, Garcia-Rubio M, Aguilera A, Zhao X, Doye V. Nucleoporins prevent DNA damage accumulation by modulating Ulp1-dependent sumoylation processes. Molecular biology of the cell 2007; 18:2912-23.


53. Loeillet S, Palancade B, Cartron M, Thierry A, Richard GF, Dujon B, Doye V, Nicolas A. Genetic network interactions among replication, repair and nuclear pore deficiencies in yeast. DNA repair 2005; 4:459-68.

54. Davuluri G, Gong W, Yusuff S, Lorent K, Muthumani M, Dolan AC, Pack M. Mutation of the zebrafish nucleoporin elys sensitizes tissue progenitors to replication stress. PLoS Genet 2008; 4:e1000240.

55. Nagai S, Dubrana K, Tsai-Pflugfelder M, Davidson MB, Roberts TM, Brown GW, Varela E, Hediger F, Gasser SM, Krogan NJ. Functional targeting of DNA damage to a nuclear pore-associated SUMO-dependent ubiquitin ligase. Science 2008; 322:597-602.

56. Gartenberg MR. Life on the edge: telomeres and persistent DNA breaks converge at the nuclear periphery. Genes & development 2009; 23:1027-31.

57. Misteli T, Soutoglou E. The emerging role of nuclear architecture in DNA repair and genome maintenance. Nat Rev Mol Cell Biol 2009; 10:243-54.

58. Chen-Goodspeed M, Lee CC. Tumor suppression and circadian function. Journal of biological rhythms 2007; 22:291-8.

59. Loiodice I, Alves A, Rabut G, Van Overbeek M, Ellenberg J, Sibarita JB, Doye V. The entire Nup107-160 complex, including three new members, is targeted as one entity to kinetochores in mitosis. Molecular biology of the cell 2004; 15:3333-44.

60. Rasala BA, Orjalo AV, Shen Z, Briggs S, Forbes DJ. ELYS is a dual nucleoporin/kinetochore protein required for nuclear pore assembly and proper cell division. Proceedings of the National Academy of Sciences of the United States of America 2006; 103:17801-6.

61. Gillespie PJ, Khoudoli GA, Stewart G, Swedlow JR, Blow JJ. ELYS/MEL-28 chromatin association coordinates nuclear pore complex assembly and replication licensing. Curr Biol 2007; 17:1657-62.

62. Schultz LB, Chehab NH, Malikzay A, Halazonetis TD. p53 binding protein 1 (53BP1) is an early participant in the cellular response to DNA double-strand breaks. The Journal of cell biology 2000; 151:1381-90.

63. Sanders SL, Portoso M, Mata J, Bahler J, Allshire RC, Kouzarides T. Methylation of histone H4 lysine 20 controls recruitment of Crb2 to sites of DNA damage. Cell 2004; 119:603-14.

64. Botuyan MV, Lee J, Ward IM, Kim JE, Thompson JR, Chen J, Mer G. Structural basis for the methylation state-specific recognition of histone H4-K20 by 53BP1 and Crb2 in DNA repair. Cell 2006; 127:1361-73.


65. Huyen Y, Zgheib O, Ditullio RA, Jr., Gorgoulis VG, Zacharatos P, Petty TJ, Sheston EA, Mellert HS, Stavridi ES, Halazonetis TD. Methylated lysine 79 of histone H3 targets 53BP1 to DNA double-strand breaks. Nature 2004; 432:406-11.

66. Zgheib O, Pataky K, Brugger J, Halazonetis TD. An oligomerized 53BP1 tudor domain suffices for recognition of DNA double-strand breaks. Molecular and cellular biology 2009; 29:1050-8.

67. Iwabuchi K, Basu BP, Kysela B, Kurihara T, Shibata M, Guan D, Cao Y, Hamada T, Imamura K, Jeggo PA, Date T, Doherty AJ. Potential role for 53BP1 in DNA end-joining repair through direct interaction with DNA. The Journal of biological chemistry 2003; 278:36487-95.

68. Chowdhury D, Keogh MC, Ishii H, Peterson CL, Buratowski S, Lieberman J. gamma-H2AX dephosphorylation by protein phosphatase 2A facilitates DNA double-strand break repair. Molecular cell 2005; 20:801-9.

69. Chowdhury D, Xu X, Zhong X, Ahmed F, Zhong J, Liao J, Dykxhoorn DM, Weinstock DM, Pfeifer GP, Lieberman J. A PP4-phosphatase complex dephosphorylates gamma-H2AX generated during DNA replication. Molecular cell 2008; 31:33-46.

70. Nakada S, Chen GI, Gingras AC, Durocher D. PP4 is a gamma H2AX phosphatase required for recovery from the DNA damage checkpoint. EMBO Rep 2008; 9:1019-26.

71. van Haaften G, Romeijn R, Pothof J, Koole W, Mullenders LH, Pastink A, Plasterk RH, Tijsterman M. Identification of conserved pathways of DNA-damage response and radiation protection by genome-wide RNAi. Curr Biol 2006; 16:1344-50.

72. van Haaften G, Plasterk RH, Tijsterman M. Genomic instability and cancer: scanning the Caenorhabditis elegans genome for tumor suppressors. Oncogene 2004; 23:8366-75.

73. Matsuoka S, Ballif BA, Smogorzewska A, McDonald ER, 3rd, Hurov KE, Luo J, Bakalarski CE, Zhao Z, Solimini N, Lerenthal Y, Shiloh Y, Gygi SP, Elledge SJ. ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage. Science 2007; 316:1160-6.


CHAPTER 3

The roles of co-transcriptional processes in maintaining genome stability

Parts of this chapter have been adapted with permission from:

“A Genome-wide siRNA Screen Reveals Diverse Cellular Processes and Pathways that Mediate Genome Stability” Paulsen RD, Soni DV, Wollman R, Hahn AH, Yee MC, Guan A, Hesley JA, Miller SC, Cromwell EF, Solow-Cordero DE, Meyer T, and Cimprich KA. Molecular Cell, 2009

Contributions

In this chapter, Renee Paulsen performed the γH2AX assay, the 53BP1 assay, and the RNase H γH2AX reduction assay. Muh-Ching Yee cloned the RNase H gene used by Renee Paulsen to generate stable cell lines. The R-loop detection sodium bisulfite assay was performed by Renee Paulsen with sample preparation help from Muh-Ching Yee. The G2/M checkpoint assay and homologous recombination assay were performed by Deena Soni and Anna Guan. All other data presented were acquired and analyzed by Renee Paulsen.

3.1 Introduction

3.1.1 Co-transcriptional processes: basic biological mechanisms

The coupling between transcription and mRNA maturation is key to proper gene expression. One of the main processes that occurs co-transcriptionally in eukaryotes is pre-mRNA splicing. In humans, upwards of 80% of genes are estimated to be spliced1, and the number of participating proteins clearly demonstrates the complexity of this process. As of 1999, around 100 splicing factors had been identified2; subsequent mass spectrometry experiments have increased this number to nearly two hundred3-6.


Figure 3.1. Basic eukaryotic mRNA splicing cycle. Eukaryotic mRNA splicing occurs co-transcriptionally. Spliceosome assembly proceeds through a series of intermediate stages. In the E complex, the U1 snRNP binds the 5’ splice site and serves to recruit the U2 snRNP and other core splicing factors to form the A complex and facilitate lariat formation. The B complex is formed by release of the U1 snRNP and recruitment of members of the U4/U5/U6 tri-snRNP. It is thus poised to perform the first catalytic step, 5’ splice site cleavage. The C complex, or catalytic complex, performs 5’ splice site cleavage, followed by 3’ splice site cleavage, intron release and exon joining. After all introns have been removed, the mRNA is polyadenylated at its 3’ end and exported for subsequent translation.


Spliceosome assembly occurs on the nascent RNA as it is extruded from the RNA polymerase2. Each intron is recognized by a series of spliceosome components. These include the five U snRNPs (U1, U2, U4, U5, and U6), each composed of a small RNA bound by several proteins, as well as other splicing accessory factors that are less stably associated with the spliceosome2. Intron excision occurs in two catalytic steps: 5’ splice site cleavage and lariat formation, followed by 3’ splice site cleavage and exon ligation (Fig. 3.1). Several additional classes of proteins regulate or participate in mRNA splicing:

- the SR proteins, a family of serine/arginine-rich proteins involved in regulating and selecting splice sites in eukaryotic mRNA7;
- the SRPK protein kinases, which regulate the assembly of the SR proteins by phosphorylation8, 9;
- the hnRNPs, complexes of RNA and protein that bind the pre-mRNA and are thought to signal that it is not yet fully processed and ready for export to the cytoplasm10.

In addition to splicing, the pre-mRNA must undergo 5’ 7-methylguanosine capping and 3’ polyadenylation before it is transported out of the nucleus for translation by the ribosome and eventual degradation2.
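The following minimal sketch shows only the net outcome of the two catalytic steps. It removes introns from a pre-mRNA string, given their coordinates. It assumes each intron follows the major GU...AG boundary rule and ignores the branch point, lariat chemistry and spliceosome assembly entirely. The function name `splice` and the toy sequence are hypothetical. Some minor (U12-type) introns use AU...AC boundaries instead, so a real annotation tool could not rely on this check alone.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the mature sequence after removing the given introns.

    `introns` holds 0-based, half-open (start, end) coordinates. Each intron
    is checked only for the canonical GU...AG boundary dinucleotides.
    """
    seq = pre_mrna.upper()
    pieces = []
    cursor = 0
    for start, end in sorted(introns):
        if start < cursor or end > len(seq) or end - start < 4:
            raise ValueError(f"invalid intron coordinates ({start}, {end})")
        intron = seq[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        pieces.append(seq[cursor:start])  # keep the exon upstream of this intron
        cursor = end  # resume after the 3' splice site
    pieces.append(seq[cursor:])  # final exon
    return "".join(pieces)


# Toy pre-mRNA: exon AUGGCC, intron GUAAGUCUUUCAG, exon GCUUAA.
mature = splice("AUGGCCGUAAGUCUUUCAGGCUUAA", [(6, 19)])
print(mature)  # AUGGCCGCUUAA
```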

Because of the dynamic nature of the spliceosome and the number of proteins involved, deciphering how the identified “splicing factor” proteins participate in splicing remains a continual challenge. One ongoing problem is separating proteins with true splicing roles from contaminants that co-purify because they bind RNA or associate with splicing factors. Distinguishing sequence-specific pre-mRNA splicing factors from core components adds further complexity. Regardless, co-transcriptional mRNA processing involves a large set of highly regulated proteins that are integral to cell survival.

3.1.2 Transcription as a source of DNA damage

Maintaining genome integrity is critical for proper gene expression at two levels. Globally, chromosomal loss or duplication alters the expression of groups of genes. At the level of individual genes, mutations can alter the encoded protein sequence. However, as mentioned previously, the act of transcription itself makes DNA more prone to damage11, 12. Every DNA process that separates the DNA duplex may be viewed as a threat to the genome, but transcription stands out because of its frequency. DNA is replicated only once per cell cycle, whereas many genes are transcribed several times a minute, repeatedly exposing the template to damage. Interestingly, the spontaneous mutation rate in eukaryotic cells is directly proportional to the transcription level of the gene, and the direction of transcription affects the mutational spectra observed13.

Fork collision

A collision between a replication fork and the transcriptional apparatus is likely to be a traumatic event for the cell. Correspondingly, replication forks have been shown to pause much more frequently in highly transcribed genes14. In S. cerevisiae, transcription of a gene also leads to higher recombination frequencies during S phase, a phenomenon known as transcription-associated recombination (TAR)13, 15. These lines of evidence suggest that transcription could cause genomic instability during DNA replication. Because transcription can proceed up to twenty times more slowly than DNA replication16, collisions between the two are ultimately inevitable.

Cells appear to have evolved mechanisms to alleviate this conflict. At least in bacteria, there is a genome-wide bias towards co-orientation of replication and transcription17, which allows the replication fork to progress without displacing the RNA polymerase18. However, in genomic regions with oppositely oriented genes, the replication and transcription machineries collide head-on, the replication fork stalls, and the transcriptional machinery is disrupted19. In S. cerevisiae, head-on collisions between the replication fork and the transcription machinery increase the number of large DNA deletions at the transcribed locus13. Such collisions are particularly dangerous in highly transcribed regions, such as the ribosomal DNA (rDNA) locus. There, this obstruction can result in activation of the DNA damage response, loss of genomic integrity and cell death20, 21. Minimizing replication-transcription conflicts would therefore benefit the organism. This may explain why S. cerevisiae has evolved specific protein-DNA complexes at the 3’ end of the rDNA locus that serve as a replication fork barrier, preventing head-on collisions between replication forks and the highly transcribed rRNA genes22.
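Simple kinematics makes the inevitability argument concrete. The sketch below assumes constant speeds for a replication fork and an RNA polymerase on the same template, with the fork about twenty times faster. The specific speeds, the 10 kb starting gap and the function name are illustrative assumptions, not measured values. Co-directionally the fork must overtake the polymerase, so the two close at the difference of their speeds. Head-on, they close at the sum.

```python
def time_to_encounter(gap_nt: float, fork_speed: float, pol_speed: float,
                      head_on: bool) -> float:
    """Seconds until a fork meets an RNA polymerase `gap_nt` nucleotides away.

    Speeds are in nucleotides per second and assumed constant.
    """
    closing_speed = fork_speed + pol_speed if head_on else fork_speed - pol_speed
    if closing_speed <= 0:
        raise ValueError("the fork never reaches the polymerase")
    return gap_nt / closing_speed


# Assumed speeds, chosen only to reflect the ~20-fold difference cited above.
FORK_NT_PER_S = 1000.0
POL_NT_PER_S = 50.0

for head_on in (False, True):
    t = time_to_encounter(10_000, FORK_NT_PER_S, POL_NT_PER_S, head_on)
    label = "head-on" if head_on else "co-directional"
    print(f"{label}: encounter after {t:.1f} s")
```

Because the fork is faster, the closing speed is positive in both orientations, so an encounter always occurs. Orientation changes only how quickly it happens and, per the studies cited above, how damaging it is.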


R-loop formation

In addition to conflicts caused by fork collisions, alternative DNA structures formed at sites of transcription could challenge DNA replication and the preservation of genomic stability. Single-stranded DNA (ssDNA) is chemically less stable than double-stranded DNA (dsDNA)23. Local negative supercoiling behind the advancing RNA polymerase transiently exposes ssDNA at sites of transcription, so transcribed regions are more likely to sustain DNA damage24-26. This local negative supercoiling also makes it possible for the elongating RNA to remain associated with the DNA template. The result is an RNA/DNA hybrid, known as an R-loop, opposite a displaced ssDNA strand (Fig. 3.2A). R-loops have been observed in highly transcribed regions of E. coli topA mutants27, 28. It is unclear to what extent these structures form under native conditions. However, several studies have shown that when pre-mRNA processing, splicing, or export is disrupted, R-loops are found at sites of transcription and correlate with the genomic instability observed (Fig. 3.2B)29, 30.

R-loops have been shown to form in yeast upon mutation of the THO/TREX complex. THO is an evolutionarily conserved complex composed of four subunits: Tho2, Hpr1, Mft1, and Thp231. TREX additionally includes Sub2/UAP56 and Yra1/Aly32. Together, these complexes promote transcription elongation and mRNA export. Strikingly, mutation of any component of the THO/TREX complex leads to transcription-dependent hyper-recombination33, 34. In hpr1 mutants, R-loops formed co-transcriptionally. Overexpression of RNase H, an enzyme that specifically degrades the RNA strand of RNA/DNA hybrids, suppressed both the transcription elongation and the hyper-recombination phenotypes30.

RNA/DNA hybrids have also been observed in higher eukaryotes. One specialized case of R-loop formation occurs during class switch recombination in differentiating B cells. Here, R-loops form in the switch regions (S regions) of the immunoglobulin loci35, 36. They are recognized by activation-induced cytidine deaminase (AID), which selectively deaminates the ssDNA opposite the R-loop to facilitate DNA nick formation and subsequent recombination37. In this system, the R-loops were shown to extend over several kilobases of DNA36.

Figure 3.2. Simplified depiction of RNA/DNA hybrid formation that could occur in the absence of mRNA processing factors. (A) As the mRNA is elongated by RNA polymerase, splicing and processing factors, including the THO complex and ASF/SF2, are loaded onto the extruded RNA. Some of these factors, including ASF/SF2, also bind the hyperphosphorylated C-terminal tail of RNA polymerase. This keeps the RNA from rehybridizing to the DNA template in the negatively supercoiled region behind the polymerase. (B) When one or more mRNA splicing and/or processing proteins are depleted or mutated, the elongating RNA is free to rehybridize to the DNA template in the region of negative supercoiling, creating an RNA/DNA hybrid, or “R-loop”.

Disruption of proper mRNA processing has also been shown to induce R-loops in vertebrate cells. Specifically, inactivation of ASF/SF2 in either chicken DT40 or human HeLa cells induced R-loops that could be reversed by RNase H. It also caused DNA double-strand breaks, rearrangements, and cell cycle arrest29. In addition, inhibition of the 1 enzyme, which is known to phosphorylate ASF/SF2 and regulate its splicing function, increased chromosomal rearrangements, DNA damage, and replication progression defects38. Interestingly, the DNA damage was localized to highly transcribed regions and was reversed by RNase H overexpression38. This again suggests that the cell employs active mechanisms to prevent R-loop formation during transcription, and that loss of these mechanisms can result in genomic instability.

Possible mechanisms of R-loop induced DNA damage

How these R-loops induce the observed genomic instability remains under investigation and could ultimately involve numerous mechanisms. One possibility is that the ssDNA displaced opposite the R-loops makes transcribed regions more susceptible to DNA damage simply by increasing how often these regions are single-stranded (Fig. 3.3A). A second possibility is that R-loops impede replication fork progression, resulting in fork stalling and potentially fork collapse followed by genome rearrangement (Fig. 3.3B). This would resemble the genome instability caused by head-on collisions between the replication and transcription machinery. However, R-loop-induced DNA damage could occur independently of the direction of transcription. A third possibility is that secondary structures form in the ssDNA opposite the R-loops (Fig. 3.3C).

R-loops preferentially form when the non-template strand is guanine-rich36. Indeed, in B cells, guanine content is indispensable for R-loop formation within the immunoglobulin switch regions39. This raises the possibility that G-quadruplexes (G4 DNA) form on the non-template strand and stabilize the ssDNA. Such G4 structures have been observed microscopically during transcription of switch region-containing plasmids in E. coli40. Several nucleases have been identified that specifically cleave G4 DNA. This raises the interesting possibility that DNA secondary structure may drive R-loop formation by stabilizing the ssDNA. It may also drive the genomic instability these structures induce by recruiting targeted endonucleases.

Beyond G4 DNA, all R-loop structures may be targets for endogenous endonucleases because of their architecture. Each R-loop contains two duplex-single strand junctions, similar to those found during transcription-coupled DNA repair (TCR). The nucleases involved in TCR, XPF and XPG, could therefore cleave R-loops. This could either promote genomic stability by clearing R-loops before DNA replication, or promote genomic instability by creating DNA double-strand breaks. Interestingly, these nucleases have been shown to cleave the R-loops found within S regions41.
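Whether a non-template strand could form G4 DNA is often screened computationally with a regular-expression consensus. The consensus requires four runs of three or more guanines separated by short loops. The sketch below uses that widely cited heuristic with loops of 1 to 7 nt; the function name is hypothetical. It was not part of the analyses in this chapter, and sequence potential alone does not show that a G4 actually forms in cells.

```python
import re

# Heuristic G4 consensus (an assumption, not this study's method): four runs
# of >= 3 guanines separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")


def find_g4_motifs(non_template_strand: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of putative G4 motifs."""
    seq = non_template_strand.upper()
    return [match.span() for match in G4_PATTERN.finditer(seq)]


# A telomere-like G-rich stretch flanked by non-G bases.
print(find_g4_motifs("ATGGGTTAGGGTTAGGGTTAGGGCA"))  # [(2, 23)]
```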

Figure 3.3. Possible mechanisms for RNA/DNA hybrid-induced DNA damage. (A) R-loops make the genome more susceptible to DNA damage by increasing the amount of ssDNA at sites of transcription. This ssDNA can serve as a substrate for selective deamination, leading to mutation, or for endogenous endonucleases, creating DNA double-strand breaks. (B) R-loops may also act as barriers to oncoming replication forks, causing fork stalling and possibly breakage and rearrangements through recombination mechanisms. (C) The ssDNA opposite the R-loop may also fold into alternative structures such as G-quadruplexes (G4 DNA). These again serve as targets for endogenous endonucleases, causing double-strand breaks.


3.1.3 Links between the DNA damage response and mRNA processing

The DNA damage response has well-characterized roles in regulating transcription after damage exposure. However, cellular checkpoints may have additional roles during transcription to prevent the formation of DNA damage at transcribed sites. Interestingly, a recent proteomic study characterizing targets of the ATM/ATR kinases found several phosphorylation sites within proteins involved in mRNA splicing and/or processing42. In addition, Cdc5L, a protein with evolutionarily conserved roles in pre-mRNA splicing, has been shown to interact with ATR and to be required for cellular resistance to replication fork stalling agents43. Together, these findings suggest that mRNA processing may be regulated by the checkpoint to prevent genomic instability.

3.1.4 Synopsis

The mRNA processing cluster was the most significantly enriched group of genes to arise from our screen. mRNA processing could affect genome stability indirectly by altering protein levels. However, recent studies have pointed to a more direct mechanism of DNA damage induction through R-loop formation. Here we demonstrate that loss of mRNA processing factors induces DNA damage, as read out by γH2AX and 53BP1 foci formation. Interestingly, overexpression of RNase H reduced the γH2AX phenotype for a subset of the mRNA processing genes. This suggests that at least part of the DNA damage arises from R-loop formation. We also present preliminary evidence for R-loops at a highly transcribed locus after depletion of one mRNA processing gene. Additionally, we begin to address whether the genome instability that arises occurs during DNA replication or causes specific replication progression defects. Finally, we examine the effects of mRNA processing gene loss on the DNA damage response using checkpoint activation and repair efficiency assays. Together, these studies reveal the surprising breadth of involvement of mRNA processing genes in preventing genomic instability.


3.2 Results

3.2.1 mRNA processing genes induce DNA damage when silenced

In the original screen, conducted in the absence of exogenous DNA damage, several mRNA processing genes gave very high γH2AX values and high assigned significance when knocked down. To determine whether our hits were concentrated in any particular aspect of mRNA processing, we classified the genes according to their known mRNA processing and/or splicing roles (Table 3.1). We identified genes involved in nearly every aspect of mRNA processing, from the early splice site recognition steps to the final steps of mRNA surveillance and export. Because the functional classification did not point to a specific aspect of splicing, we confirmed the γH2AX phenotypes for a set of mRNA processing genes with a variety of functions (Fig. 3.4A). To test whether these genes were indeed causing DNA damage, we analyzed 53BP1 foci, a distinct marker of DNA damage, after knockdown of several of the factors that increased γH2AX. Consistent with the idea that increased γH2AX reflects increased DNA damage, knockdown of most splicing factors increased the percentage of cells with more than five 53BP1 foci (Fig. 3.4B).
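The classification behind Table 3.1 was based on each gene's known role. As a bookkeeping illustration only, the sketch below tallies a handful of rows copied from the table by category and reports the hit with the highest average %γH2AX in each. The data structure and function name are hypothetical, and this is not the procedure used in the screen analysis.

```python
from collections import defaultdict

# (gene, category, average %γH2AX) for a subset of rows copied from Table 3.1.
HITS = [
    ("LSM2", "Core snRNP", 13.86),
    ("SNRPD1/SMD1/HsT2456", "Core snRNP", 11.16),
    ("P14/SF3B14/CGI-110", "U2 snRNP", 32.19),
    ("SF3A1/PRP21/SAP114/SF3a120", "U2 snRNP", 20.25),
    ("SF3B4/SAP49/SF3b49", "U2 snRNP", 12.77),
    ("RBM8A/ZRNP1", "mRNA export", 12.82),
    ("CDC5L/CDC5", "Prp19 complex", 13.24),
    ("CRNKL1/CRN", "Prp19 complex", 27.10),
]


def summarize_by_category(hits):
    """Map each category to (number of hits, gene with the highest %γH2AX)."""
    grouped = defaultdict(list)
    for gene, category, gamma_h2ax in hits:
        grouped[category].append((gamma_h2ax, gene))
    # max() on (value, gene) tuples picks the row with the largest %γH2AX.
    return {category: (len(rows), max(rows)[1]) for category, rows in grouped.items()}


summary = summarize_by_category(HITS)
for category, (n_hits, top_gene) in summary.items():
    print(f"{category}: {n_hits} hit(s); strongest γH2AX: {top_gene}")
```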

Figure 3.4. Functional assays for mRNA processing genes affecting γH2AX. (A) Effect on γH2AX of siRNAs targeting genes involved in mRNA processing. (B) Percentage of cells exhibiting more than five 53BP1 foci after siRNA treatment. All graphs show mean ± SE for n = 3. Duplicate bars indicate the effects of two different siRNAs.
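Panel B reports the percentage of cells with more than five 53BP1 foci as mean ± SE over three replicates. A minimal sketch of that summary follows. It assumes per-cell foci counts are already available for each replicate, and that SE is the sample standard deviation divided by the square root of n. The foci counts and helper names are made up for illustration.

```python
import math
import statistics


def percent_cells_over_threshold(foci_counts, threshold=5):
    """Percent of scored cells with strictly more than `threshold` foci."""
    if not foci_counts:
        raise ValueError("no cells scored")
    positive = sum(1 for n in foci_counts if n > threshold)
    return 100.0 * positive / len(foci_counts)


def mean_and_se(values):
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical per-cell 53BP1 foci counts for three replicate wells (n = 3).
replicates = [[0, 6, 7, 2], [8, 1, 9, 10], [3, 3, 6, 0]]
percents = [percent_cells_over_threshold(r) for r in replicates]
mean, se = mean_and_se(percents)
print(f"{mean:.1f}% ± {se:.1f}% of cells with >5 foci")  # 50.0% ± 14.4% of cells with >5 foci
```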


Table 3.1 mRNA processing genes identified by the siRNA screen

Gene | Putative ATM/ATR substrate | Accession | Significance | Average %γH2AX | Deconvolution (x of 4)

Complex: Core snRNP
LSM2 | X | NM_021177 | 4 | 13.86 | 4
LSM4 |  | NM_012321 | 4 | 3.69 | --
LSM5 |  | NM_012322 | 2 | 1.45 | --
LSM6 |  | NM_007080 | 4 | 7.73 | --
LSM7 |  | NM_016199 | 2 | 2.07 | --
LSM8 |  | NM_016200 | 3 | 3.12 | --
SNRPB/COD |  | NM_003091 | 4 | 5.62 | 4
SNRPD1/SMD1/HsT2456 |  | NM_006938 | 4 | 11.16 | 4
SNRPD2/SMD2 |  | NM_004597 | 4 | 6.11 | --
SNRPD3/SMD3 |  | NM_004175 | 4 | 6.80 | 4
SNRPE/SME |  | NM_003094 | 4 | 3.96 | --

Complex: hnRNP proteins
HNRNPA3 |  | NM_194247 | 2 | 1.72 | --
HNRPC |  | NM_004500 | 4 | 11.58 | 4
HNRPK |  | NM_002140 | 4 | 1.51 | --
HNRPL |  | NM_001533 | 2 | 4.19 | 1
HNRPU/SAFA |  | NM_004501 | 2 | 2.90 | --

Complex: U1 snRNP
SNRP70/RPU1/U1-70 |  | NM_003089 | 2 | 1.82 | --

Complex: U2 snRNP
P14/SF3B14/CGI-110 |  | NM_016047 | 4 | 32.19 | --
PHF5A/SF3b14b |  | NM_032758 | 4 | 9.15 | --
SF3A1/PRP21/SAP114/SF3a120 | X | NM_005877 | 4 | 20.25 | 4
SF3A2/PRP11/SAP62/SF3a66 |  | NM_007165 | 0 | 1.99 | 4
SF3A3/PRP9/SAP61/SF3a60 |  | NM_006802 | 4 | 20.86 | 3
SF3B1/PRP10/SAP155/SF3b155 |  | NM_012433 | 4 | 5.86 | --
SF3B3/STAF130/KIAA0017/SAP130/SF3b130 |  | NM_012426 | 4 | 9.31 | 3
SF3B4/SAP49/SF3b49 |  | NM_005850 | 4 | 12.77 | 4
SF3B5 | X | NM_031287 | 4 | 10.12 | 2

Complex: U4/U5/U6 tri-snRNP
NHP2L1 |  | NM_005008 | 2 | 7.50 | --
PRPF3 | X | NM_004698 | 2 | 6.06 | --
PRPF31 |  | NM_015629 | 4 | 4.52 | --
PRPF4 |  | NM_004697 | 2 | 3.05 | --
RY1/SNRNP27 | X | NM_006857 | 2 | 3.14 | --
SART1/ARA1/SNRNP110 | X | NM_005146 | 4 | 16.87 | 4

Complex: U5 snRNP
C20ORF14/PRPF6 |  | NM_012469 | 4 | 5.20 | --
PRPF8/PRP8 | X | NM_006445 | 4 | 13.94 | 4
U5-116KD/EFTUD2/Snrp116 | X | NM_004247 | 3 | 3.84 | --

Complex: U11/U12
C16ORF33/SNRNP25 |  | NM_024571 | 4 | 4.21 | --
FLJ25070/RNPC3 |  | NM_017619 | 2 | 1.65 | --
LOC55954/ZMAT5 |  | NM_019103 | 2 | 1.41 | --

Complex: SR protein
PNN/SDK3 | X | NM_002687 | 2 | 1.51 | --
SFRS1/SF2p33/ASF/SF2 |  | NM_006924 | 2 | 1.47 | --
SFRS2/PR264 |  | NM_003016 | 2 | 3.36 | --
SFRS3/SRP20 |  | NM_003017 | 2 | 4.09 | --
SFRS8/SWAP | X | NM_152235 | 2 | 2.99 | --
SR-A1/SCAF1 |  | NM_021228 | 2 | 2.97 | --
SRP46/SFRS2B |  | NM_032102 | 4 | 4.68 | --
SRRM2/ATBF1/CWF21/KIAA0324/SRM300 | X | NM_016333 | 4 | 1.73 | 3

Complex: SR protein kinase
SRPK1/SFRSK |  | NM_003137 | 0 | 0.64 | 1
SRPK2/SFRSK2 |  | NM_182691 | 0 | 0.63 | 1

Complex: DEAD-box helicase motif
AQR |  | NM_014691 | 4 | 15.60 | 4
DDX19/DDX19B |  | NM_007242 | 4 | 10.14 | 3
DDX39 |  | NM_005804 | 2 | 2.09 | --
DDX41 |  | NM_016222 | 4 | 3.98 | --
DHX35 |  | NM_021931 | 0 | 2.04 | 3

Complex: mRNA export
RBM8A/ZRNP1 |  | NM_005105 | 4 | 12.82 | --

Complex: mRNA surveillance
SMG1/LIP/KIAA0421 | X | NM_014006 | 3 | 6.43 | 3

Complex: Non-snRNP spliceosomal assembly
TFIP11/NTR1 |  | NM_012143 | 2 | 1.51 | --
SKIIP/SNW1/Bx42/PRPF45 |  | NM_012245 | 4 | 14.74 | 4
U2AF2/U2AF65 |  | NM_007279 | 4 | 6.47 | 3
USP39 |  | NM_006590 | 2 | 5.87 | 3

Complex: Poly(A) binding protein
GRSF1 |  | NM_002092 | 2 | 4.90 | --

Complex: Prp19 complex
KIAA1160/ISY1 | X | NM_020701 | 4 | 13.95 | --
PRP19/PRPF19 |  | NM_014502 | 4 | 2.12 | 4
CDC5L/CDC5 |  | NM_001253 | 4 | 13.24 | 4
CRNKL1/CRN |  | NM_016652 | 4 | 27.10 | 4
PLRG1/PRL1 |  | NM_002669 | 2 | 3.62 | --

Complex: Splicing regulation
FLJ36754/SFRS12IP1 |  | NM_173829 | 4 | 7.48 | --
WBP11 |  | NM_016312 | 4 | 6.13 | 4

Complex: THO complex
PCF11 |  | NM_015885 | 2 | 1.61 | --
THOC1/HPR1 |  | NM_005131 | 2 | 2.05 | --

Complex: tRNA splicing
LOC283989/TSEN54 |  | NM_207346 | 4 | 4.03 | --

Complex: 5' cap binding
NCBP1 |  | NM_002486 | 2 | 4.41 | --

Complex: Miscellaneous
CDC40/PRP17 |  | NM_015891 | 4 | 14.84 | 4
CSTF2T | X | NM_015235 | 2 | 1.63 | --
ELAVL4/HUD |  | NM_021952 | 3 | 1.69 | --
FLJ20514/GEMIN8 |  | NM_017856 | 2 | 2.13 | --
FNBP3/prp40 |  | XM_371575 | 2 | 1.72 | --
HSPC148/CWC15 |  | NM_016403 | 2 | 2.09 | --
P29/SYF2 |  | NM_015484 | 4 | 3.76 | --
PIPPIN/CSDC2 |  | NM_014460 | 3 | 1.89 | --
PPP4R2 |  | NM_174907 | 0 | 1.10 | 2
PRPF45 |  | NM_006425 | 4 | 6.65 | --
RBM22/ZC3H16/ECM2 |  | NM_018047 | 4 | 5.69 | 3
XAB2 |  | NM_020196 | 4 | 17.16 | 4
YT521 |  | NM_133370 | 1 | 6.85 | --
MGC10871/RBM4B |  | NM_031492 | 4 | 6.14 | --
SF4/RBP |  | NM_172231 | 2 | 3.15 | --
ZNF265/ZRANB2 | X | NM_005455 | 2 | 2.78 | 2

Table 3.1. mRNA processing genes identified within the siRNA screen. The average γH2AX value and significance score are shown from the siRNA screen in the absence of replicative stress (NT dataset). The deconvolution score indicates how many of the 4 individual siRNA oligos tested gave a γH2AX signal at least 2 standard deviations higher than the negative control. Putative ATM or ATR substrates were identified by comparison to Matsuoka et al., 2007.
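The deconvolution column of Table 3.1 is a count over the four individual oligos against a control-derived cutoff (mean + 2 SD). A minimal Python sketch of that rule, assuming the control mean and SD come from replicate negative-control wells on the same plate and that "--" marks a gene that was not deconvoluted (both assumptions); the function name and values are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def deconvolution_score(
    oligo_signals: Optional[Sequence[float]],
    control_signals: Sequence[float],
    n_sd: float = 2.0,
) -> Optional[int]:
    """Number of individual oligos scoring >= n_sd SDs above the control mean.

    Returns None for a gene that was not deconvoluted ("--" in the table).
    """
    if oligo_signals is None:
        return None
    # Sample SD of the negative-control wells sets the hit cutoff.
    cutoff = statistics.mean(control_signals) + n_sd * statistics.stdev(control_signals)
    return sum(1 for s in oligo_signals if s >= cutoff)


# Hypothetical %γH2AX values: replicate negative-control wells and four oligos.
control = [1.0, 2.0, 3.0]          # mean 2.0, SD 1.0, so the cutoff is 4.0
oligos = [4.0, 3.9, 10.0, 1.0]
print(f"{deconvolution_score(oligos, control)} of {len(oligos)}")
```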

3.2.2 The γH2AX signal upon mRNA processing gene knockdown is reduced by over-expression of the enzyme RNase H

In S. cerevisiae, mutation of genes involved in mRNA processing causes defects in the packaging of nascent mRNAs 12, 44. As a result, the nascent mRNA hybridizes with the transcribed (template) strand, forming an RNA-DNA hybrid that creates an R-loop and elevates recombination. Furthermore, the genome instability arising from the disruption of cotranscriptional processes is suppressed when these RNA-DNA hybrids are removed, suggesting that R-loops formed in the absence of proper mRNA processing are a direct source of genome instability. Similar events have also been observed in mammalian cells upon loss of the splicing factor ASF/SF2 29. To determine whether the observed DNA damage might involve the co-transcriptional formation of R-loops, we created a cell line stably expressing RNase H and analyzed γH2AX before and after RNase H expression (Fig. 3.5A,B). This approach has been shown to prevent R-loop formation in yeast and mammalian cells, and it reverses the increase in DSB formation and the G2 arrest caused by knockdown of the splicing factor ASF/SF2 29, 44. We found that RNase H expression caused a slight increase in γH2AX in our control, suggesting that it may be somewhat toxic to cells in general. Despite this effect, H2AX phosphorylation was reduced in several of the knockdown samples (Fig. 3.5A,B, Table S10). These observations suggest that the cotranscriptional formation of R-loops may be a broad source of genome instability that is normally prevented by efficient mRNA processing.

Figure 3.5. Reduction of H2AX phosphorylation by over-expression of RNase H. (A) Representative images of the change in γH2AX signal following RNase H expression. In the merged panel, nuclear staining is pseudo-colored red and γH2AX green. (B) Quantitation of the effect shown in A for the indicated genes. Inset shows RNase H-HA protein expression.

3.2.3 Detection of R-loops in the absence of proper mRNA processing

While the reduction of γH2AX upon over-expression of RNase H strongly suggests that the DNA damage caused by mRNA processing gene knockdown is mediated by RNA-DNA hybrids, it does not directly prove R-loop formation at sites of transcription. We therefore used native sodium bisulfite DNA sequencing to look for R-loops at sites of transcription in the presence and absence of a gene needed for proper mRNA splicing (splicing gene X). Native bisulfite sequencing maps ssDNA regions at a chosen locus because bisulfite selectively converts cytosine to uracil only in ssDNA; after PCR amplification, these uracils are read as thymine. Therefore, R-loop formation, which leaves a long stretch of the non-template strand single-stranded, should produce an extended tract of C to T conversion in the cloned PCR products (Fig. 3.6).
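The readout logic of native bisulfite sequencing can be illustrated with an idealized in-silico model: only cytosines inside a single-stranded window are converted, so an R-loop on the non-template strand appears as a contiguous block of C to T changes when a clone is aligned back to the native sequence. The sketch assumes 100% conversion inside the window and none outside it, which real reactions do not achieve; the function names and the toy sequence are hypothetical.

```python
from typing import List, Optional


def bisulfite_read(native: str, ss_start: int = 0, ss_end: Optional[int] = None) -> str:
    """Idealized read of one strand after native bisulfite treatment and PCR.

    Cytosines inside the single-stranded window [ss_start, ss_end) are
    deaminated to uracil and read as T; cytosines in duplex DNA are protected.
    """
    native = native.upper()
    if ss_end is None:
        ss_end = len(native)
    return "".join(
        "T" if base == "C" and ss_start <= i < ss_end else base
        for i, base in enumerate(native)
    )


def converted_positions(native: str, read: str) -> List[int]:
    """Positions where a native C is read as T (filled circles in Fig. 3.7)."""
    return [i for i, (a, b) in enumerate(zip(native.upper(), read.upper()))
            if a == "C" and b == "T"]


# Toy non-template-strand sequence; pretend positions 4-8 are displaced by an R-loop.
native = "ACCGTCAGCT"
read = bisulfite_read(native, ss_start=4, ss_end=9)
print(read)                                 # ACCGTTAGTT
print(converted_positions(native, read))    # [5, 8]
```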

To test whether R-loop formation increased, we treated HeLa cells with either an siRNA against luciferase or an siRNA against the chosen splicing gene and looked for C to T conversion in the 3' untranslated region (UTR) of the β-actin gene (Fig. 3.7A). We hypothesized that this region would have a high probability of R-loop formation because it is highly transcribed and was found to form R-loops in the DT40 cell line after ASF/SF2 depletion 29. After 72 hours of knockdown, genomic DNA was harvested under RNase-free conditions to preserve any transcriptionally formed R-loops, purified, and treated with bisulfite; the region was then PCR amplified, and individual PCR products were cloned and sequenced. To select for templates with some degree of R-loop formation, we used a primer pair in which the forward primer was designed against the native DNA template and the reverse primer against the converted DNA template. Consequently, the converted bases in the reverse primer account for the short stretch of C to T conversions present at the 3' end of every clone sequenced.
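Designing the converted reverse primer amounts to replacing every C in the primer-binding site of the non-template (top) strand with T and taking the reverse complement; such a primer pairs with the converted strand but mismatches the native one. A minimal sketch, assuming the transcript runs along the top strand and that the whole primer site lies within the single-stranded region; the 5-nt site is a hypothetical toy, not the primer used in this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def converted_reverse_primer(top_strand_site: str) -> str:
    """Reverse primer that matches the top strand only after full C->T conversion.

    Assumes every C in the site was single-stranded and therefore read as T.
    """
    converted = top_strand_site.upper().replace("C", "T")
    return reverse_complement(converted)


# Hypothetical 5'->3' top-strand sequence at the reverse-primer site.
site = "ACGTC"
print(reverse_complement(site))          # native reverse primer: GACGT
print(converted_reverse_primer(site))    # converted reverse primer: AACAT
print(site.upper().count("C"))           # 2 converted positions in this toy primer
```

Each C in the site becomes a G-to-A change in the primer, which is why every clone amplified with it carries a short run of conversions at its 3' end.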

Interestingly, after sequencing at least 10 non-redundant clones from each condition, we found that HeLa cells depleted of the mRNA splicing gene showed longer tracts of converted sequence, suggesting a longer stretch of RNA invasion and hybrid formation at the β-actin locus (Fig. 3.7B,C). Because the extensively converted sequences obtained after splicing gene knockdown lose sequence identity with the native locus, we also recovered several truncated clones that likely arose from mispriming of the converted primer within the β-actin locus. While these data provide only preliminary evidence for increased R-loop formation at transcribed loci after splicing gene knockdown, they do directly show an increase in ssDNA on the non-template strand in this region. However, to conclude definitively that R-loops form, it must be demonstrated that the phenotype is reversed by RNase H digestion.

Figure adapted from 45

Figure 3.6. Sodium bisulfite method for detection of alternative DNA structures. (A) The chemistry of bisulfite-catalyzed conversion of cytosine to uracil. Briefly, any cytosine within a region of single-stranded DNA can react with bisulfite, which adds across the C5-C6 double bond to form cytosine sulfonate. The cytosine sulfonate is then hydrolytically deaminated to uracil sulfonate; this step is irreversible. Finally, uracil sulfonate is desulfonated at basic pH to produce uracil, which is read as thymine after PCR amplification. (B) Schematic representation of the outcomes of bisulfite treatment depending on DNA conformation. In the leftmost panel, the DNA is double stranded, so no C to T conversion occurs. In the middle panel, transient melting of the DNA produces very small openings that would cause spot conversions on both strands of the duplex at random locations. In the rightmost panel, RNA-DNA hybrid formation during transcription creates an extended region of single-stranded DNA on the non-template strand, leading to an extended region of C to T conversion specific to one strand of the duplex.


Figure 3.7. Sodium bisulfite sequencing of the β-actin locus after splicing gene knockdown. (A) Schematic of the 400 bp region targeted within the 3' UTR of the β-actin gene. Each circle represents a cytosine that can be converted to thymine upon bisulfite treatment; an open circle indicates no conversion, and a filled circle indicates a C to T conversion at that base. PCR amplification was performed with a native forward primer and a reverse primer containing 5 converted bases, complementary to a converted template sequence, to enrich for products with conversion. (B) Bisulfite-treated products of the β-actin locus from cells treated with siLuciferase; 10 individual clones were sequenced. (C) Bisulfite-treated products of the β-actin locus from cells treated with siRNA against a splicing gene; 13 individual clones were sequenced.
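The comparison between Fig. 3.7B and C (longer conversion tracts after splicing gene knockdown) can be quantified per clone as the longest run of consecutive converted cytosines, after discarding the C positions supplied by the converted reverse primer. Spot conversions from transient breathing (Fig. 3.6B, middle) give short runs, whereas an R-loop gives a long run. A minimal sketch, assuming each clone is encoded as a string over its C positions ("1" for a filled circle, "0" for an open one); the encoding, function names, and toy clones are illustrative only.

```python
import statistics
from typing import Sequence


def longest_converted_run(pattern: str, primer_sites: int = 0) -> int:
    """Longest run of consecutive converted cytosines in one clone.

    `pattern` lists the clone's C positions 5'->3': "1" = converted (filled
    circle), "0" = unconverted (open circle). The last `primer_sites` positions
    are covered by the converted reverse primer and are ignored.
    """
    if primer_sites:
        pattern = pattern[:-primer_sites]
    best = current = 0
    for state in pattern:
        current = current + 1 if state == "1" else 0
        best = max(best, current)
    return best


def median_run(clones: Sequence[str], primer_sites: int = 0) -> float:
    """Median longest run across the clones sequenced for one condition."""
    return statistics.median(longest_converted_run(c, primer_sites) for c in clones)


# Toy clones; the final 5 positions of each come from the converted primer.
control_clones = ["0010000011111", "0000010011111", "1000000011111"]
knockdown_clones = ["0111111011111", "0011111111111", "0001110011111"]
print(median_run(control_clones, 5), median_run(knockdown_clones, 5))   # 1 6
```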


3.2.4 Loss of mRNA processing genes induces cell cycle arrest, replication progression defects, and damage during S-phase

As part of the original siRNA screen, a cell cycle distribution was assigned to each siRNA using propidium iodide intensity as a measure of DNA content. Interestingly, knockdown of many of the splicing genes led to an increased percentage of cells in G2/M phase in the absence of replicative stress (Fig. 3.8A) and an increased percentage of cells arrested in S phase after aphidicolin addition, compared with the negative control (Fig. 3.8B). Aphidicolin treatment also increased the percentage of γH2AX-positive S-phase cells, providing preliminary evidence that depletion of genes involved in mRNA processing may cause damage during DNA replication.
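The screen assigned cell cycle phases from per-nucleus propidium iodide intensity. The actual gates are described in the Materials and Methods and are not reproduced here; the sketch below assumes simple fixed gates on DNA content relative to the G1 (2N) peak, with hypothetical cutoff values and names, to show how a shift toward S or G2/M shows up in the reported percentages.

```python
from typing import Dict, Sequence

# Hypothetical gates, expressed as DNA content relative to the G1 (2N) peak.
SUB_G1_MAX = 0.75
G1_MAX = 1.25
S_MAX = 1.75
G2M_MAX = 2.5


def assign_phase(pi_intensity: float, g1_peak: float) -> str:
    """Assign a cell cycle phase from integrated PI intensity."""
    ratio = pi_intensity / g1_peak
    if ratio < SUB_G1_MAX:
        return "sub-G1"
    if ratio < G1_MAX:
        return "G1"
    if ratio < S_MAX:
        return "S"
    if ratio <= G2M_MAX:
        return "G2/M"
    return ">4N"


def phase_percentages(intensities: Sequence[float], g1_peak: float) -> Dict[str, float]:
    """Percent of 2N-4N cells in G1, S and G2/M; sub-G1 and >4N events are excluded."""
    counts = {"G1": 0, "S": 0, "G2/M": 0}
    for x in intensities:
        phase = assign_phase(x, g1_peak)
        if phase in counts:
            counts[phase] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells fall within the 2N-4N gates")
    return {phase: 100.0 * n / total for phase, n in counts.items()}


# Toy integrated PI intensities (arbitrary units) with the G1 peak at 100.
print(phase_percentages([100, 105, 150, 200, 210, 60, 300], g1_peak=100))
# {'G1': 40.0, 'S': 20.0, 'G2/M': 40.0}
```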

Figure 3.8. Effect of knocking down mRNA splicing genes on the cell cycle. (A, B) Cell cycle analysis of mRNA splicing/processing genes after knockdown, performed as described in the Materials and Methods. Highlighted numbers indicate cell cycle variations compared with the control. (B) 48 hours after knockdown, HeLa cells were treated with 400 nM aphidicolin and incubated for 24 hours before fixation and staining. Analysis was carried out as in A. Highlighted numbers indicate an increase in S-phase cells compared with the control or high levels of γH2AX staining during S phase.


We had also observed in the primary screen that aphidicolin treatment increased the percentage of cells staining positive for γH2AX upon splicing gene knockdown. To confirm this observation, we re-ordered an siRNA library comprising approximately 40 genes involved in mRNA splicing and processing and retested their γH2AX phenotypes in the presence and absence of aphidicolin. Upon retesting, we again observed a high level of γH2AX in the absence of external stress, as well as a further increase in signal upon the addition of aphidicolin (Fig. 3.9 & Table S11).

Figure 3.9. Effect of aphidicolin on the H2AX phosphorylation observed after mRNA processing gene knockdown. HeLa cells were transfected with 25 nM of each individual duplex. 48 hours later, cells were mock-treated or treated with 400 nM aphidicolin and incubated for 24 hours. 72 hours after knockdown, cells were fixed and stained with an antibody against γH2AX and with propidium iodide for DNA content. Shown is the average % γH2AX value of the 4 individual duplexes for each gene ± SD of experimental duplicates (n = 1).
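Fig. 3.9 averages the four individual duplexes per gene under mock and aphidicolin conditions. Because aphidicolin also raises γH2AX in control cells, one way to express the knockdown-specific effect is the gain in excess of the control's gain; that framing is our assumption rather than an analysis described here. A minimal sketch with hypothetical names and values:

```python
import statistics
from typing import Sequence, Tuple


def aphidicolin_gain(mock: Sequence[float], aph: Sequence[float]) -> Tuple[float, float, float]:
    """Mean %γH2AX over a gene's individual duplexes, without and with
    aphidicolin, and the aphidicolin-induced gain (treated minus mock)."""
    mock_mean = statistics.mean(mock)
    aph_mean = statistics.mean(aph)
    return mock_mean, aph_mean, aph_mean - mock_mean


def net_gain(gene_mock: Sequence[float], gene_aph: Sequence[float],
             ctrl_mock: Sequence[float], ctrl_aph: Sequence[float]) -> float:
    """Gain for the gene in excess of the gain seen with the control siRNA."""
    return aphidicolin_gain(gene_mock, gene_aph)[2] - aphidicolin_gain(ctrl_mock, ctrl_aph)[2]


# Hypothetical %γH2AX values for the four duplexes of one gene and for the control.
gene_mock, gene_aph = [10.0, 12.0, 8.0, 10.0], [20.0, 22.0, 18.0, 24.0]
ctrl_mock, ctrl_aph = [2.0, 2.0, 2.0, 2.0], [6.0, 6.0, 6.0, 6.0]
print(aphidicolin_gain(gene_mock, gene_aph))                 # (10.0, 21.0, 11.0)
print(net_gain(gene_mock, gene_aph, ctrl_mock, ctrl_aph))    # 7.0
```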

The aphidicolin-induced increase in γH2AX suggested that loss of mRNA processing genes could be causing DNA damage specifically during DNA replication. Indeed, transcription-associated recombination (TAR) is dependent on replication 15, and R-loops could act as replication fork blocks. To test whether the γH2AX we observed upon splicing gene knockdown was specific to S phase, we again used the splicing library of 40 genes, depleted the proteins for 72 hours, pulse-labeled the cells for replication using the nucleotide analog EdU, and then fixed and stained the cells for replication (EdU) and DNA damage (γH2AX) (Fig. 3.10A). We found that knockdown of many of the splicing genes led to a large accumulation of S-phase cells, as read out by EdU staining, and this correlated with an increase in γH2AX staining within the replicating fraction (Table S12; Fig. 3.10B,C). However, knockdown of an equal number of mRNA processing genes instead led to a decrease in the replicating fraction at the end of the experiment and a corresponding decrease in the number of γH2AX-positive replicating cells (Table S12; Fig. 3.10B,C). Many genes also showed discrepancies between siRNAs targeting the same gene. A likely explanation for these discrepancies, both within and between genes, is that siRNAs differ in potency: many siRNAs could have caused problems during replication that ultimately led to checkpoint activation and S or G2 arrest, so that the affected cells were no longer incorporating nucleotides at the time point measured. This method therefore cannot confirm the hypothesis that loss of mRNA processing as a whole leads to defects specific to DNA replication; however, examining the correlation between replication and γH2AX on a gene-by-gene basis does reveal several genes whose knockdown appears to cause DNA damage almost exclusively in S phase (Table S12).

Figure 3.10. Replication dependence of the DNA damage induced by splicing gene knockdown. (A) Representative images of DNA damage occurring during replication. Cells were incubated with or without 1 µM mitomycin C for 12 hours before pulse-labeling with the nucleotide analog EdU. After labeling, cells were fixed and stained for replication (EdU) and DNA damage (γH2AX). (B) Percentage of replicating cells that stained positive for DNA damage after splicing gene knockdown. HeLa cells were treated with 25 nM siRNA for 72 hours and, at the end of the knockdown, pulse-labeled with EdU for 30 min. The number of EdU-positive cells also staining for γH2AX is shown. (C) Percentage of γH2AX-positive cells staining for replication. Cells were treated as in B. The number of γH2AX-positive cells that were also EdU-positive is shown. Error bars represent the standard deviation between two replicate experiments.
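Panels B and C of Fig. 3.10 are two conditional percentages computed from the same per-cell calls, differing only in the denominator (EdU-positive cells versus γH2AX-positive cells). The sketch assumes each nucleus has already been called positive or negative for each marker by upstream thresholding; names and toy calls are hypothetical. If a knockdown arrests cells before they incorporate EdU, both the EdU-positive count and the double-positive count fall, which is consistent with the discrepancies discussed above.

```python
import math
from typing import Sequence, Tuple


def costaining_percentages(edu_pos: Sequence[bool],
                           h2ax_pos: Sequence[bool]) -> Tuple[float, float]:
    """Return (% of EdU+ cells that are γH2AX+, % of γH2AX+ cells that are EdU+).

    Inputs are paired per-cell positivity calls for the same nuclei.
    """
    if len(edu_pos) != len(h2ax_pos):
        raise ValueError("per-cell calls must be paired")
    both = sum(1 for e, h in zip(edu_pos, h2ax_pos) if e and h)
    n_edu, n_h2ax = sum(edu_pos), sum(h2ax_pos)
    pct_damaged_replicating = 100.0 * both / n_edu if n_edu else math.nan
    pct_replicating_damaged = 100.0 * both / n_h2ax if n_h2ax else math.nan
    return pct_damaged_replicating, pct_replicating_damaged


# Toy calls for six nuclei.
edu = [True, True, True, False, False, True]
h2ax = [True, False, True, True, False, False]
print(tuple(round(x, 1) for x in costaining_percentages(edu, h2ax)))   # (50.0, 66.7)
```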

To further confirm that loss of proper mRNA processing leads to defects during replication, we examined the effect of depleting an individual splicing gene (splicing gene X) on the cell cycle and on replication progression. Loss of ASF/SF2 has been shown to cause a G2 arrest that is reduced by over-expression of RNase H 29. Correspondingly, upon depletion of splicing gene X with four individual siRNAs, we observed an increase in both the S and G2 fractions in HeLa and U2OS cells (Fig. 3.11A,B). This suggests that defects arising during replication could be activating cell cycle checkpoints. Defects in replication were confirmed in both cell lines by tracking the ability of replicating cells to proceed into mitosis. Briefly, replicating cells were labeled with EdU and then treated with nocodazole for 18 hours to collect mitotic cells. At the end of the experiment, cells were fixed and labeled for replication (EdU) and mitosis (p-H3), and the ratio was normalized to the negative control. Replication progression defects were observed for all siRNA duplexes against splicing gene X in both cell lines (Fig. 3.11C). While these observations cannot be generalized to all of the mRNA processing genes found to induce DNA damage, they strongly suggest that loss of at least a subset of mRNA processing genes leads to DNA damage and defects during replication. Whether the replication progression defects result from R-loops blocking replication forks or from other transcription-induced DNA damage will be of great interest for future study.
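The replication progression readout is a ratio normalized to siLuc. The exact ratio is not spelled out here; the sketch below assumes it is the fraction of EdU-labeled cells that were also p-H3 positive after the nocodazole trap, so that a normalized value below 1 indicates that fewer replicating cells reached mitosis than in the control. Names and counts are hypothetical.

```python
from typing import Tuple


def progression_index(counts: Tuple[int, int]) -> float:
    """Fraction of EdU-labeled cells that reached mitosis.

    `counts` is (n_edu_positive, n_edu_and_pH3_positive).
    """
    n_edu, n_edu_ph3 = counts
    if n_edu == 0:
        raise ValueError("no EdU-labeled cells")
    return n_edu_ph3 / n_edu


def normalized_progression(sample: Tuple[int, int], control: Tuple[int, int]) -> float:
    """Progression index relative to the siLuc control (1.0 = no defect)."""
    return progression_index(sample) / progression_index(control)


# Hypothetical counts after the EdU pulse and 18 h nocodazole trap.
siLuc = (200, 120)        # 60% of labeled cells reached mitosis
si_gene_x = (180, 54)     # 30% of labeled cells reached mitosis
print(round(normalized_progression(si_gene_x, siLuc), 3))   # 0.5
```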

Figure 3.11. Cell cycle and replication progression analysis after splicing gene knockdown. (A, B) Propidium iodide cell cycle analysis of U2OS (A) or HeLa cells (B) 72 hours after knockdown of splicing gene X. Each line represents one of the four individual duplexes tested for the same gene. (C) Replication progression analysis after knockdown of splicing gene X. HeLa or U2OS cells were treated with 25 nM siRNA for 48 hours before pulse-labeling with EdU for 30 min. After labeling, cells were incubated in the presence of nocodazole for 18 hours to arrest cells in mitosis. Cells were fixed and stained for EdU and p-H3 to measure the number of replicating cells that were able to progress into mitosis. The ratio for each individual gene was normalized to siLuc for comparison. Error bars indicate the SD between two duplicate measurements.


3.2.5 mRNA processing gene loss affects checkpoint activation and double strand break repair processes

To test the idea that some of these mRNA processing proteins might have additional roles in the DNA damage response, we analyzed activation of the G2/M checkpoint and homologous recombination upon knockdown of three genes: Cdc40/Prp17, a splicing factor linked to S-phase progression in yeast; Skiip/Snw1, a transcriptional coactivator that also affects splicing; and Aqr, a putative RNA helicase related to Dna2 46-48. Interestingly, knockdown of Cdc40 or Skiip caused a defect in checkpoint activation, while knockdown of any of the three genes decreased homologous recombination, similar to loss of Rad51 (Fig. 3.12A,B). This suggests that the overall process is complex and that these RNA processing genes may also have more direct roles in checkpoint and repair responses.

Figure 3.12. Loss of mRNA processing genes leads to checkpoint and DNA repair defects. (A) G2/M checkpoint assay after knockdown of the indicated genes and IR treatment. Fold increase of the mitotic index (% mitotic cells post-IR / % mitotic cells nontreated) in cells transfected with targeting siRNA relative to control is shown. (B) HR repair frequency at an induced double-strand break after knockdown of indicated gene. Samples were normalized to the HR frequency in the siControl-transfected cells. All graphs shown are mean ± SE for n=3. Duplicate bars indicate the effects of two different siRNAs.
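Both panels of Fig. 3.12 are ratios normalized to the control siRNA. Because a functional G2/M checkpoint depletes mitotic cells after IR, a normalized mitotic-index fold above 1 indicates failure to arrest, whereas a normalized HR value below 1 indicates reduced recombination. The HR reporter is not described in this section, so the sketch treats the HR input as a generic percentage; all names and numbers are hypothetical.

```python
def mitotic_index_fold(pct_mitotic_ir: float, pct_mitotic_untreated: float) -> float:
    """Fold change in mitotic index after IR (% mitotic post-IR / % mitotic untreated)."""
    if pct_mitotic_untreated <= 0:
        raise ValueError("untreated mitotic index must be positive")
    return pct_mitotic_ir / pct_mitotic_untreated


def relative_to_control(sample: float, control: float) -> float:
    """Normalize a knockdown readout to the control-siRNA value."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return sample / control


# Hypothetical mitotic indices (%): an intact checkpoint depletes mitoses after IR.
ctrl_fold = mitotic_index_fold(0.8, 4.0)     # about 0.2
kd_fold = mitotic_index_fold(2.4, 4.0)       # about 0.6
print(round(relative_to_control(kd_fold, ctrl_fold), 2))   # 3.0, i.e. checkpoint defect
# Hypothetical HR readouts (% reporter-positive cells).
print(round(relative_to_control(0.5, 2.0), 2))             # 0.25, i.e. reduced HR
```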


3.3 Discussion

The most significantly enriched interaction network from the siRNA screen in the absence of replicative stress contained proteins involved in mRNA processing. These hits function at different stages of mRNA processing, including RNA splicing, spliceosome assembly, mRNA surveillance, and mRNA export, with the majority having roles in RNA splicing. Recent studies have linked some mRNA processing genes to genome maintenance in both yeast and mammals 12, 44. Based on these and a limited number of additional observations 29, 49-53, we expected to identify a few mRNA processing proteins in our screen. Strikingly, however, our studies revealed that mRNA processing is involved in preserving genomic integrity on a much broader scale.

How loss of mRNA processing genes leads to DNA damage is still unclear. We have not ruled out the possibility that the DNA damage observed is partially due to apoptotic mechanisms. Although measures were put in place to exclude dying cells from our image analysis, knockdown of mRNA processing genes in many cases did decrease cell number by the end of the experiment. Proper mRNA processing is essential for cell survival; however, it is highly unlikely that all of the γH2AX phenotypes can be attributed to cell death. Consistent with this, many of the SR proteins were found to be non-essential in C. elegans 7, and several non-essential splicing genes were discovered in a genomic screen for DNA damage sensitivity 54.

Other mechanisms that could explain the DNA damage observed upon loss of proper mRNA processing include damage from collisions between replication forks and transcription complexes, as well as damage due to R-loop formation. One could argue that the latter is more likely, because we did not identify a large number of transcription genes whose knockdown induced DNA damage, even though an equal number of proteins are involved in that process. This suggests that disruption of co-transcriptional mechanisms, rather than of the transcriptional machinery itself, induces genomic instability. Several studies have now shown that the DNA damage caused by splicing gene loss is reversible by over-expression of RNase H 29, 30, 38, and we also present preliminary evidence of R-loop formation at a highly transcribed locus after loss of an mRNA processing gene, as well as reduction of the induced γH2AX signal after RNase H over-expression.

How R-loops might lead to genome instability is unclear. One possibility is that the displaced ssDNA is more susceptible to DNA damage, ultimately leading to DSB formation and recombination. Alternatively, disrupted mRNA splicing may create replication fork barriers, either via R-loop formation or by preventing timely removal of the transcription machinery during replication. These structures could cause fork arrest and collapse or could be subject to aberrant processing 44. Here we present initial observations that loss of mRNA processing genes causes DNA damage during S phase as well as defects in replication progression. However, not all genes caused a definitive S-phase arrest or a correlation between replication and DNA damage markers. More experiments are needed to determine whether R-loops act as replication fork barriers or, alternatively, serve as targets for endogenous endonucleases whose cleavage products then present challenges during S phase.

Additionally, not all of the effects of mRNA processing genes on H2AX phosphorylation were decreased by RNase H expression, and why some genes but not others affect R-loop formation is not yet apparent. Because R-loop formation is thought to be suppressed by co-transcriptional packaging of the mRNA 12, 44, our observations may suggest that the proteins with RNase H-reversible γH2AX phenotypes play crucial or early roles in this process. However, it is also quite likely that there are multiple additional mechanisms by which these genes affect genome stability. Consistent with this, three of the mRNA processing genes also showed roles in checkpoint activation and DNA damage repair. Clearly, co-transcriptional processes form a highly integrated group of reactions with many links to preserving genome integrity, and deciphering the roles of the individual proteins involved will be the subject of many years of study to come.


REFERENCES

1. Matlin AJ, Clark F, Smith CW. Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol 2005; 6:386-98.

2. Burge CB, Tuschl T, Sharp PA. Splicing of precursors to mRNAs by the spliceosomes. NY: Cold Spring Harbor Laboratory Press, 1999.

3. Jurica MS, Licklider LJ, Gygi SR, Grigorieff N, Moore MJ. Purification and characterization of native spliceosomes suitable for three-dimensional structural analysis. RNA 2002; 8:426-39.

4. Hartmuth K, Urlaub H, Vornlocher HP, Will CL, Gentzel M, Wilm M, Luhrmann R. Protein composition of human prespliceosomes isolated by a tobramycin affinity-selection method. Proceedings of the National Academy of Sciences of the United States of America 2002; 99:16719-24.

5. Makarov EM, Makarova OV, Urlaub H, Gentzel M, Will CL, Wilm M, Luhrmann R. Small nuclear ribonucleoprotein remodeling during catalytic activation of the spliceosome. Science 2002; 298:2205-8.

6. Zhou Z, Licklider LJ, Gygi SP, Reed R. Comprehensive proteomic analysis of the human spliceosome. Nature 2002; 419:182-5.

7. Graveley BR. Sorting out the complexity of SR protein functions. RNA 2000; 6:1197-211.

8. Gui JF, Lane WS, Fu XD. A serine kinase regulates intracellular localization of splicing factors in the cell cycle. Nature 1994; 369:678-82.

9. Kuroyanagi N, Onogi H, Wakabayashi T, Hagiwara M. Novel SR-protein-specific kinase, SRPK2, disassembles nuclear speckles. Biochemical and biophysical research communications 1998; 242:357-64.

10. He Y, Smith R. Nuclear functions of heterogeneous nuclear ribonucleoproteins A/B. Cell Mol Life Sci 2009; 66:1239-56.

11. Aguilera A. The connection between transcription and genomic instability. The EMBO journal 2002; 21:195-201.

12. Aguilera A, Gomez-Gonzalez B. Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet 2008; 9:204-17.


13. Kim N, Abdulovic AL, Gealy R, Lippert MJ, Jinks-Robertson S. Transcription-associated mutagenesis in yeast is directly proportional to the level of gene expression and influenced by the direction of DNA replication. DNA repair 2007; 6:1285-96.

14. Azvolinsky A, Giresi PG, Lieb JD, Zakian VA. Highly transcribed RNA polymerase II genes are impediments to replication fork progression in Saccharomyces cerevisiae. Molecular cell 2009; 34:722-34.

15. Prado F, Aguilera A. Impairment of replication fork progression mediates RNA polII transcription-associated recombination. The EMBO journal 2005; 24:1267-76.

16. Mirkin EV, Mirkin SM. Replication fork stalling at natural impediments. Microbiol Mol Biol Rev 2007; 71:13-35.

17. Rocha E. Is there a role for replication fork asymmetry in the distribution of genes in bacterial genomes? Trends in microbiology 2002; 10:393-5.

18. Liu B, Wong ML, Tinker RL, Geiduschek EP, Alberts BM. The DNA replication fork can pass RNA polymerase without displacing the nascent transcript. Nature 1993; 366:33-9.

19. Liu B, Alberts BM. Head-on collision between a DNA replication apparatus and RNA polymerase transcription complex. Science 1995; 267:1131-7.

20. Torres JZ, Bessler JB, Zakian VA. Local chromatin structure at the ribosomal DNA causes replication fork pausing and genome instability in the absence of the S. cerevisiae DNA helicase Rrm3p. Genes & development 2004; 18:498-503.

21. Srivatsan A, Tehranchi A, MacAlpine DM, Wang JD. Co-orientation of replication and transcription preserves genome integrity. PLoS Genet; 6:e1000810.

22. Brewer BJ, Fangman WL. A replication fork barrier at the 3' end of yeast ribosomal RNA genes. Cell 1988; 55:637-43.

23. Frederico LA, Kunkel TA, Shaw BR. A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry 1990; 29:2532-7.

24. Fix DF, Glickman BW. Asymmetric cytosine deamination revealed by spontaneous mutational specificity in an Ung- strain of Escherichia coli. Mol Gen Genet 1987; 209:78-82.

25. Skandalis A, Ford BN, Glickman BW. Strand bias in mutation involving 5-methylcytosine deamination in the human hprt gene. Mutat Res 1994; 314:21-6.

26. Beletskii A, Bhagwat AS. Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 1996; 93:13919-24.

27. Drolet M, Phoenix P, Menzel R, Masse E, Liu LF, Crouch RJ. Overexpression of RNase H partially complements the growth defect of an Escherichia coli delta topA mutant: R-loop formation is a major problem in the absence of DNA topoisomerase I. Proceedings of the National Academy of Sciences of the United States of America 1995; 92:3526-30.

28. Masse E, Drolet M. Escherichia coli DNA topoisomerase I inhibits R-loop formation by relaxing transcription-induced negative supercoiling. The Journal of biological chemistry 1999; 274:16659-64.

29. Li X, Manley JL. Inactivation of the SR protein splicing factor ASF/SF2 results in genomic instability. Cell 2005; 122:365-78.

30. Huertas P, Aguilera A. Cotranscriptionally formed DNA:RNA hybrids mediate transcription elongation impairment and transcription-associated recombination. Molecular cell 2003; 12:711-21.

31. Chavez S, Beilharz T, Rondon AG, Erdjument-Bromage H, Tempst P, Svejstrup JQ, Lithgow T, Aguilera A. A protein complex containing Tho2, Hpr1, Mft1 and a novel protein, Thp2, connects transcription elongation with mitotic recombination in Saccharomyces cerevisiae. The EMBO journal 2000; 19:5824-34.

32. Masuda S, Das R, Cheng H, Hurt E, Dorman N, Reed R. Recruitment of the human TREX complex to mRNA during splicing. Genes & development 2005; 19:1512-7.

33. Prado F, Piruat JI, Aguilera A. Recombination between DNA repeats in yeast hpr1delta cells is linked to transcription elongation. The EMBO journal 1997; 16:2826-35.

34. Chavez S, Aguilera A. The yeast HPR1 gene has a functional role in transcriptional elongation that uncovers a novel source of genome instability. Genes & development 1997; 11:3459-70.

35. Daniels GA, Lieber MR. RNA:DNA complex formation upon transcription of immunoglobulin switch regions: implications for the mechanism and regulation of class switch recombination. Nucleic Acids Res 1995; 23:5006-11.

36. Yu K, Chedin F, Hsieh CL, Wilson TE, Lieber MR. R-loops at immunoglobulin class switch regions in the chromosomes of stimulated B cells. Nature immunology 2003; 4:442-51.


37. Yu K, Roy D, Bayramyan M, Haworth IS, Lieber MR. Fine-structure analysis of activation-induced deaminase accessibility to class switch region R-loops. Molecular and cellular biology 2005; 25:1730-6.

38. Tuduri S, Crabbe L, Conti C, Tourriere H, Holtgreve-Grez H, Jauch A, Pantesco V, De Vos J, Thomas A, Theillet C, Pommier Y, Tazi J, Coquelle A, Pasero P. Topoisomerase I suppresses genomic instability by preventing interference between replication and transcription. Nature cell biology 2009; 11:1315-24.

39. Mizuta R, Mizuta M, Kitamura D. Guanine is indispensable for immunoglobulin switch region RNA-DNA hybrid formation. Journal of electron microscopy 2005; 54:403-8.

40. Duquette ML, Handa P, Vincent JA, Taylor AF, Maizels N. Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA. Genes & development 2004; 18:1618-29.

41. Tian M, Alt FW. Transcription-induced cleavage of immunoglobulin switch regions by nucleotide excision repair nucleases in vitro. The Journal of biological chemistry 2000; 275:24163-72.

42. Matsuoka S, Ballif BA, Smogorzewska A, McDonald ER, 3rd, Hurov KE, Luo J, Bakalarski CE, Zhao Z, Solimini N, Lerenthal Y, Shiloh Y, Gygi SP, Elledge SJ. ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage. Science 2007; 316:1160-6.

43. Zhang N, Kaur R, Akhter S, Legerski RJ. Cdc5L interacts with ATR and is required for the S-phase cell-cycle checkpoint. EMBO Rep 2009; 10:1029-35.

44. Li X, Manley JL. Cotranscriptional processes and their influence on genome stability. Genes & development 2006; 20:1838-47.

45. Raghavan SC, Tsai A, Hsieh CL, Lieber MR. Analysis of non-B DNA structure at chromosomal sites in the mammalian genome. Methods in enzymology 2006; 409:301-16.

46. Ben Yehuda S, Dix I, Russell CS, Levy S, Beggs JD, Kupiec M. Identification and functional analysis of hPRP17, the human homologue of the PRP17/CDC40 yeast gene involved in splicing and cell cycle control. RNA (New York, NY) 1998; 4:1304-12.

47. Sam M, Wurst W, Kluppel M, Jin O, Heng H, Bernstein A. Aquarius, a novel gene isolated by gene trapping with an RNA-dependent RNA polymerase motif. Dev Dyn 1998; 212:304-17.

48. Folk P, Puta F, Skruzny M. Transcriptional coregulator SNW/SKIP: the concealed tie of dissimilar pathways. Cell Mol Life Sci 2004; 61:629-40.


49. Hossain MN, Fuji M, Miki K, Endoh M, Ayusawa D. Downregulation of hnRNP C1/C2 by siRNA sensitizes HeLa cells to various stresses. Molecular and cellular biochemistry 2007; 296:151-7.

50. Xiao R, Sun Y, Ding JH, Lin S, Rose DW, Rosenfeld MG, Fu XD, Li X. Splicing regulator SC35 is essential for genomic stability and cell proliferation during mammalian organogenesis. Molecular and cellular biology 2007; 27:5393-402.

51. Brumbaugh KM, Otterness DM, Geisen C, Oliveira V, Brognard J, Li X, Lejeune F, Tibbetts RS, Maquat LE, Abraham RT. The mRNA surveillance protein hSMG-1 functions in genotoxic stress response pathways in mammalian cells. Molecular cell 2004; 14:585-98.

52. Azzalin CM, Lingner J. The human RNA surveillance factor UPF1 is required for S phase progression and genome stability. Curr Biol 2006; 16:433-9.

53. Moumen A, Masterson P, O'Connor MJ, Jackson SP. hnRNP K: an HDM2 target and transcriptional coactivator of p53 in response to DNA damage. Cell 2005; 123:1065-78.

54. van Haaften G, Romeijn R, Pothof J, Koole W, Mullenders LH, Pastink A, Plasterk RH, Tijsterman M. Identification of conserved pathways of DNA-damage response and radiation protection by genome-wide RNAi. Curr Biol 2006; 16:1344-50.


CHAPTER 4

Charcot-Marie-Tooth Disease genes and the link to DNA damage

Parts of this chapter have been adapted with permission from:

“A Genome-wide siRNA Screen Reveals Diverse Cellular Processes and Pathways that Mediate Genome Stability” Paulsen RD, Soni DV, Wollman R, Hahn AH, Yee MC, Guan A, Hesley JA, Miller SC, Cromwell EF, Solow-Cordero DE, Meyer T, and Cimprich KA. Molecular Cell, 2009

Contributions

In this chapter, Angela Hahn developed and implemented the homologous recombination assay and performed the bioinformatic analysis of the CMT genes. Muh-Ching Yee performed the Q-PCR analysis and helped with sample preparation for both the DNA damage sensitivity assay and the homologous recombination assay. Renee Paulsen performed the DNA damage sensitivity assay and the comparative western blots of CMT patient samples, and was ultimately responsible for the final analysis and presentation of all included data.

4.1 Introduction

Charcot-Marie-Tooth (CMT) disease comprises a clinically and genetically defined group of disorders caused by mutations in upward of twenty separate genes. Mutations or duplications of these genes all result in peripheral neuropathy, a loss of motor and/or sensory function in the limbs. The finding that CMT genes were enriched among the hits of our screen that caused inherent genome instability was surprising. However, when the clinical manifestations of CMT are compared with those resulting from loss of many proteins involved in the DNA damage response or DNA repair, several similarities in symptoms emerge.


4.1.1 Charcot-Marie-Tooth disease: clinical designations and involved genes

Charcot-Marie-Tooth disease is the most commonly inherited neuropathy, with an incidence of 1 in 2500 individuals1. The disease characteristically has a late onset, with initial symptoms arising in the second to third decade of life followed by progressive deterioration of neural function2. Although CMT is not considered fatal, mild to significant disability is common. CMT has traditionally been divided into two major subclasses, demyelinating (affecting Schwann cells) and axonal (affecting neurons), with several clinically and electrophysiologically defined subclasses (Table 4.1). The demyelinating subclasses (CMT1, CMTX, CMT3, and CMT4) comprise the majority of clinical cases and are defined by reduced nerve conduction velocity due to disruption of the myelin sheath, whereas the axonal subclass (CMT2) is defined by reduced compound muscle action potential amplitude3. However, individual cases can exhibit a combination of symptoms because myelinating cells and their axons strongly influence each other. The causative mutations have largely been mapped and include genes involved in a wide range of functions, including myelination, protein and vesicle transport, transcription, signal transduction, mitochondrial function, nuclear architecture, and DNA single-strand break repair2-4. Several of the affected proteins identified in our primary screen or subsequent deconvolution are described below.

PMP22 (peripheral myelin protein 22)

Mutations or altered gene dosage of the PMP22 gene account for approximately 70% of all cases of CMT1. The majority of cases carry a duplication of the PMP22 locus, but the clinical phenotype is also observed with dominant missense mutants. PMP22 encodes a glycosylated four-pass transmembrane protein that has been shown to be essential for myelin production5. The neuropathy caused by PMP22 underexpression or duplication appears to result from decreased trafficking of wild-type PMP22 to the membrane, together with a toxic gain of function caused by accumulation of wild-type PMP22 in the endoplasmic reticulum in the case of gene duplication6, 7. These results suggest that PMP22 expression must be tightly controlled. Indeed, the majority of PMP22 in Schwann cells is degraded, and its translocation to the membrane depends on axonal contact6, 8. Outside of its role in myelin production, PMP22 has also been shown to play a role in regulating cell proliferation, differentiation, apoptosis, and cell shape9. The PMP22 gene is also expressed in a variety of tissues during both development and adulthood, consistent with functional roles outside of myelination and myelin maintenance10, 11. One study additionally demonstrated downregulation of PMP22 in metastatic versus primary carcinomas12.

GJB1 (gap junction beta protein 1)

GJB1, which encodes a connexin protein, is the second most commonly mutated gene in CMT patients3. As a gap junction protein, GJB1 is a membrane-spanning protein that assembles into channels for the passage of ions and small molecules between cells. GJB1 mutations cause the X-linked CMTX subclass and are spread throughout the length of the protein4. The neuropathy is believed to result from decreased trafficking and the formation of nonfunctional channels in the cellular membrane and the myelin sheath. However, not all CMTX mutations decrease trafficking, suggesting that mechanisms other than loss of functional channels may be causative13. Interestingly, decreased GJB1 expression has been correlated with the formation and metastatic potential of gastric cancers14. Almost all malignant cells show aberrant expression or localization of connexins, and transfection of connexin genes into tumorigenic cells restores normal cell growth, supporting the idea that connexins form a family of tumor-suppressor genes15.

MPZ (myelin protein zero)

MPZ is a single-pass transmembrane protein that plays a major structural role in myelin, accounting for nearly 50% of the protein content of the myelin sheath16. The neuropathy can be caused by either overexpression or haploinsufficiency, leading to incomplete myelin compaction or demyelination2. While MPZ has been shown to interact with PMP22 to promote the structural integrity of myelin, yeast two-hybrid analysis has also shown that it interacts directly with PINX1, a protein essential for proper chromosome segregation during mitosis17, 18, again suggesting roles for MPZ outside of myelination.


EGR2 (early growth response 2)

EGR2 is a zinc finger transcription factor that plays a major role in the development of the peripheral nervous system by regulating the expression of PMP22, MPZ, GJB1, and enzymes required for lipid synthesis19, 20. Disease-causing mutations typically disrupt the DNA-binding region, thereby impairing Schwann cell development and leading to CMT subtypes 1 and 4. Like GJB1, EGR2 has been proposed to act as a tumor suppressor, as its expression is often decreased in human tumors and cancer cell lines21, 22, although its function outside of peripheral nervous system development remains to be determined.

MTMR2 (myotubularin related protein 2) and MTMR13

MTMR2 and MTMR13 are interacting phosphatases with activity toward phosphatidylinositol-3-phosphate and phosphatidylinositol-3,5-bisphosphate4. The substrate specificity of these phosphatases suggests they may function in regulating protein trafficking and/or protein degradation. Mutations in these genes cause Charcot-Marie-Tooth disease type 4B, a rare form that has an earlier onset than the other subtypes and usually leads to severe disability. The MTMR2 protein contains several interesting domains, including a GRAM domain, which is predicted to be an intracellular protein-binding or lipid-binding signaling domain23; a SET-interacting domain, which suggests binding to proteins involved in epigenetic modification24; and a PDZ domain, a common structural domain of signaling proteins. CMT-causing mutations are distributed throughout the gene2, so no single domain has been implicated in disease progression.

Other Charcot-Marie-Tooth genes

Other genes with mutations causing CMT have a variety of functions. Mutations in Kif1B, a cytoskeletal protein required for the intracellular movement of mitochondria and, in neurons, for the transport of synaptic vesicles, cause the axonal CMT2 subtype of the disease2. SH3TC2, the CMT gene with the highest γH2AX phenotype in the primary screen, causes the CMT4C subtype and encodes a protein of unknown function25. The protein contains two SH3 and ten tetratricopeptide-repeat motifs, which are thought to serve as adaptors that mediate the formation of protein complexes; however, its interacting proteins remain unknown. Mutations in HSPB8 are known to cause CMT type 2, although the relevant affected chaperone targets also remain unidentified4.

| Gene | CMT class | Protein | Form of CMT | γH2AX (primary screen) | Deconvolution (HeLa) | Deconvolution (U2OS) |
|---|---|---|---|---|---|---|
| KIAA1985/SH3TC2 | CMT4C | SH3 domain and tetratricopeptide repeats 2 | Demyelinating | 5.74 | 4/4 | -- |
| EGR2 | CMT1D and CMT4E | Early growth response protein 2 | Demyelinating | 5.40 | 2/4 | 2/4 |
| PMP22 | CMT1A and CMT1E | Peripheral myelin protein 22 | Demyelinating | 3.37 | 2/4 | 2/4 |
| GJB1 | CMTX | Connexin 32/Gap junction protein beta 1 | Demyelinating | 3.27 | 3/4 | 2/4 |
| SBF2/CMT4B2/MTMR13 | CMT4B2 | SET binding factor 2 | Demyelinating | 2.87 | 4/4 | 1/4 |
| NDRG1 | CMT4D | N-myc downstream-regulated gene 1 | Demyelinating | 2.86 | -- | -- |
| HSPB1 | CMT2F | Heat-shock protein B1 | Axonal | 2.11 | -- | -- |
| LMNA | CMT2B1 | Lamin A | Axonal | 1.45 | -- | -- |
| MPZ | CMT2J/I, CMT1B, and DI-CMTD | Myelin protein zero | Axonal | 1.43 | 2/4 | 3/4 |
| TDP1 | CMT2 | Tyrosyl-DNA phosphodiesterase 1 | Axonal | 1.37 | -- | -- |
| NEFL | CMT1F and CMT2E/F1 | Neurofilament protein, light polypeptide | Axonal | 1.17 | -- | -- |
| GARS | CMT2D | Glycyl-tRNA synthetase | Axonal | 0.86 | -- | -- |
| PRX | CMT4F | Periaxin | Demyelinating | 0.72 | -- | -- |
| HSPB8 | CMT2L | Heat-shock protein B8 | Axonal | 0.63 | 4/4 | -- |
| LITAF | CMT1C | Lipopolysaccharide-induced TNF factor | Demyelinating | 0.60 | -- | -- |
| RAB7 | CMT2B | Ras-related protein Rab-7 | Axonal | 0.54 | -- | -- |
| GDAP1 | CMT2K | Ganglioside-induced differentiation protein 1 | Axonal | 0.52 | -- | -- |
| DNM2 | DI-CMTB | Dynamin 2 | Intermediate | 0.42 | -- | -- |
| Kif1B | CMT2A | Kinesin beta | Axonal | | -- | -- |
| FGD4 | CMT4H | FYVE, RhoGEF and PH domain containing 4 | Demyelinating | 0.42 | -- | -- |
| MFN2 | CMT2A | Mitofusin 2 | Axonal | 0.40 | -- | -- |
| MTMR2 | CMT4B1 | Myotubularin-related protein 2 | Demyelinating | 0.29 | 3/4 | 2/4 |
| KIAA0274/FIG4 | CMT4J | FIG4 homolog | Demyelinating | 0.19 | -- | -- |
| YARS | DI-CMTC | Tyrosyl-tRNA synthetase | Intermediate | 0.18 | -- | -- |

Table 4.1. Charcot-Marie-Tooth gene classification and effects on γH2AX in the original screen and in the deconvolution analysis.

Interestingly, mutations in two proteins known to affect DNA repair processes, LMNA and TDP1, also cause peripheral neuropathies26, 27. LMNA (lamin A) is a major component of the nuclear lamina, the innermost layer of the nuclear envelope, and is thought to interact with a number of proteins to organize chromatin structure within the nucleus28. A single point mutation in LMNA causes Hutchinson-Gilford progeria syndrome29, a severe premature aging disease; other mutations are known to cause type 2A Charcot-Marie-Tooth disease3; and still others cause a variety of diseases including muscular dystrophies, cardiomyopathies, lipodystrophy, and dysplasia27, 28. Given that LMNA influences an array of diverse processes such as DNA replication, gene expression, nuclear transport, apoptosis, and intracellular signaling, this diversity of diseases is not surprising. In the context of the γH2AX phenotype of the CMT genes found in our screen, however, it is notable that mutations in LMNA are known to trigger a DNA damage response, cause sensitivity to various DNA-damaging agents, and induce cellular senescence in response to stress27.

TDP1 (tyrosyl-DNA phosphodiesterase 1) repairs DNA single-strand breaks created by covalently bound topoisomerase I-DNA complexes26. Mutation of TDP1 causes the late-onset peripheral neuropathy SCAN1, which has also been classified as an axonal (type 2) form of CMT disease30, 31. Although the only known role of TDP1 is in single-strand break repair, the phenotypes typical of mutations in DNA repair genes, including chromosomal instability and cancer susceptibility, are absent, suggesting a particular requirement for this enzyme in neurons.

Summary of Charcot-Marie-Tooth Disease

Clearly, the genes involved in the pathology of CMT disease function in a variety of different processes. However, mutation of any of these genes converges on the same clinical phenotype of peripheral neuropathy. While many of these genes have a direct role that would affect the nervous system, for several others it is unclear why neurodegeneration specifically would arise.

4.1.2 DNA damage response genes with neuronal phenotypes

Notably, while mutations in DNA repair genes are clinically associated with increased cancer risk, another common manifestation is neurological disease. Many proteins involved in the DNA damage response have profound effects on neuronal development and function30, 32, 33. For example, ataxia, axonal neuropathy, progressive neurodegeneration, and myelination defects are characteristic features of individuals bearing mutations in crucial DNA damage response genes (Table 4.2).

| Gene | Syndrome | Molecular role | Neurological phenotype |
|---|---|---|---|
| XPA, XPC, XPF | Xeroderma pigmentosum (XP) | Nucleotide excision repair (NER) | Neurodegeneration, microcephaly |
| CSA, CSB | Cockayne syndrome (CS) | Transcription-coupled repair (TCR) | Neurodegeneration, microcephaly, de/dysmyelination, and calcification |
| XPG | XP and CS | NER and TCR | Neurodegeneration, microcephaly, de/dysmyelination, and calcification |
| XPB, XPD | XP, CS, and trichothiodystrophy (TTD) | NER and TCR | Neurodegeneration, microcephaly, de/dysmyelination, and calcification |
| TFB5/TTD-1 | TTD | NER | Microcephaly |
| ATM | Ataxia telangiectasia (AT) | DNA damage response (DDR) | Ataxia, neurodegeneration |
| Mre11 | AT-like disorder (ATLD) | DDR | Ataxia, neurodegeneration |
| TDP1 | Spinocerebellar ataxia with axonal neuropathy (SCAN1) | Single-strand break repair (SSBR) | Ataxia, sensory loss |
| APTX | Ataxia with oculomotor apraxia 1 | SSBR | Ataxia, neurodegeneration, oculomotor apraxia |
| ATR | Seckel syndrome | DDR | Microcephaly, mental retardation |
| NBS1 | Nijmegen breakage syndrome (NBS) | DDR | Microcephaly, mental retardation |
| LIG4 | LIG4 syndrome | DDR | Microcephaly |
| MCPH1/BRIT1 | Primary microcephaly | DDR | Microcephaly, mental retardation |
| XLF | Immunodeficiency with microcephaly | DDR | Microcephaly |
| TREX1 | Aicardi-Goutières syndrome (AGS) | Nuclease | De/dysmyelination, calcification, microcephaly |

Table 4.2. List of neurological phenotypes resulting from mutations in genes involved in DNA repair and the DNA damage response.

The main clinical phenotype resulting from loss of the DNA damage response kinase ATM is ataxia34. While it has been suggested that ATM might have neuron-specific functions distinct from its role in the DNA damage response35-38, ATM loss has also been shown to increase the level of reactive oxygen species in post-mitotic neurons39, suggesting an overall increase in the accumulation of DNA damage. Twenty percent of all patients with xeroderma pigmentosum (XP) also have neurological defects, including microcephaly (reduced head size) and peripheral neuropathy32. The XP genes function in nucleotide excision repair (NER), again suggesting that the accumulation of DNA damage causes neurological problems. Taken together, among genes involved in DNA repair and the DNA damage response, a neurological defect upon mutation appears to be the rule rather than the exception.

4.1.3 Why are neurons specifically sensitive to DNA damage?

Why, then, would neurons be sensitive to the loss of DNA repair enzymes? Neurons are post-mitotic and therefore do not need to preserve an accurate genome to pass on to daughter cells. In addition, many of the classical functions of DNA damage response genes involve activating checkpoints that halt cell cycle progression, which again does not apply to neurons. However, because of their finite number, high metabolic and transcriptional activity, long lifespan, and reduced repair capacity, neurons are inherently susceptible to accumulating DNA damage, and strict mechanisms must be in place to maintain their genomic stability.


In a replicating cell, a variety of repair mechanisms cope with DNA damage, including mismatch repair, nucleotide excision repair, and non-homologous end joining40, all of which can operate outside of replication. Lesions that escape these pathways, however, are encountered by the replication fork and processed by additional mechanisms, such as homologous recombination, that are not available to neurons41. This limitation in DNA repair capacity could lead to an excess of DNA damage, activate checkpoint mechanisms, and eventually trigger apoptosis. Additionally, because neurons are highly transcriptionally active, any unrepaired DNA damage could stall the transcription machinery or lead to the production of mutant proteins, both of which can trigger neuronal apoptosis.

4.1.4 Synopsis

Here we show that downregulation of a subset of Charcot-Marie-Tooth genes induces H2AX phosphorylation in multiple cell types. In many cases, this induction is enhanced by aphidicolin. Although CMT genes have been described as having neuron-specific effects, we find that the subset we examined is expressed in HeLa cells. We also find that, as with many DNA repair genes, loss of some CMT genes sensitizes cells to DNA damage caused by either double-strand break-inducing agents or replication stress. In addition, we measured reduced DNA repair efficiency after loss of several CMT genes. Finally, we tested whether increased DNA damage, as read out by γH2AX, could be observed in a matched pair of CMT patient-derived cell lines that are either deficient or proficient for the GJB1 protein.

4.2 Results

4.2.1 Charcot-Marie-Tooth genes cause DNA damage when lost

To investigate the link between Charcot-Marie-Tooth (CMT) genes and the induction of H2AX phosphorylation, we selected several CMT genes for further study, including genes that displayed a high rate of γH2AX without external damage as well as genes whose γH2AX signal was inducible by aphidicolin. Since the main roles of most CMT genes have been characterized in neurons, we reconfirmed the effects on H2AX phosphorylation seen in our original screen in both HeLa and U2OS cells, using at least two siRNAs targeting each gene (Table 4.1, Fig. 4.1A). Because off-target effects are always a major concern in siRNA screening, we also verified knockdown efficiency by Q-PCR. All of the genes tested were detectable in HeLa cells, and most siRNAs tested reduced the transcript to <30% of the non-targeting control level 48 hours post-transfection (Table 4.3). Analysis of γH2AX after knockdown of several CMT genes also revealed that its localization was nuclear and focal (Fig. 4.1B), consistent with the idea that DNA damage is elevated when these genes are lost. Interestingly, in many cases the γH2AX signal was enhanced by aphidicolin, suggesting that these genes may have a specific role in S phase (Fig. 4.2).

Figure 4.1 Loss of Charcot-Marie-Tooth disease genes leads to increased H2AX phosphorylation. (A) Percentage of γH2AX+ cells after knockdown of the indicated genes. Duplicate bars indicate effects of individual siRNAs tested. (B) Images of γH2AX signal 72h after knockdown of the indicated gene. Aphidicolin (400 nM) was added for the final 24 hours. Within the merged panel, nuclear staining is represented in red and γH2AX in green after pseudocoloring the images. Scale bar indicates 50μm. All graphs shown are mean ± SE for n=3.
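Figure 4.1A reports the percentage of γH2AX-positive cells as mean ± SE over three replicates. As a minimal sketch of that kind of summary, the Python below assumes each nucleus is called positive when its γH2AX intensity exceeds a fixed cutoff. The cutoff, the intensity values, and the function names percent_positive and mean_se are hypothetical; this is not the image-analysis pipeline used in this work.

```python
import math
import statistics


def percent_positive(intensities, threshold):
    """Percentage of nuclei whose gamma-H2AX intensity exceeds `threshold`."""
    if not intensities:
        raise ValueError("no nuclei measured")
    positive = sum(1 for value in intensities if value > threshold)
    return 100.0 * positive / len(intensities)


def mean_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical per-nucleus intensities (arbitrary units) for n = 3 replicates.
replicates = [
    [120, 340, 95, 410, 88],
    [130, 90, 400, 380, 70],
    [100, 85, 92, 450, 110],
]
THRESHOLD = 300  # hypothetical cutoff for calling a nucleus gamma-H2AX-positive

percentages = [percent_positive(r, THRESHOLD) for r in replicates]
mean, se = mean_se(percentages)
print(f"{mean:.1f}% +/- {se:.1f}% gamma-H2AX+ (mean +/- SE, n={len(percentages)})")
```

The standard error is the sample standard deviation divided by the square root of the number of replicates, so with n = 3 it is considerably smaller than the spread between individual wells.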


| Dharmacon catalog number | Gene | Sequence | Average %γH2AX | SE %γH2AX | Knockdown verified by RT-PCR | HR and DNA damage sensitivity phenotypes |
|---|---|---|---|---|---|---|
| D-014684-01 | SBF2/CMT4B2 | CAACAUUGCCGCAGCAUUA | 11.41 | 2.34 | ++ | IR/Aph defect (shown) |
| D-014684-02 | SBF2/CMT4B2 | GAACAUCAGCGCCAGGUGA | 9.33 | 1.24 | + | |
| D-014684-03 | SBF2/CMT4B2 | GCUCUAAAGCCCAAUGUAA | 8.15 | 1.85 | ++ | Defect; IR/Aph defect |
| D-014684-17 | SBF2/CMT4B2 | CCAGAAAGUUCCACGGCCA | 5.83 | 1.43 | | |
| D-006527-01 | EGR2 | GAAACCAGACCUUCACUUA | 4.77 | 0.74 | ++ | |
| D-006527-04 | EGR2 | CCACGUCGGUGACCAUCUU | 8.46 | 0.45 | + | |
| D-017887-03 | GJB1 | AGAAUGAGAUCAACAAGCU | 37.91 | 13.50 | | |
| D-017887-01 | GJB1 | GAUGAGAAAUCUUCCUUCA | 5.21 | 2.51 | | |
| D-005006-01 | HSPB8 | GAACUCAGAUUUAGUGCAA | 5.97 | 0.37 | | |
| D-005006-02 | HSPB8 | GGACUUAACAUUUCACGUU | 4.86 | 0.70 | | |
| D-005006-03 | HSPB8 | GCUGGGAGCCUGUCAGUUU | 11.43 | 4.04 | | |
| D-005006-04 | HSPB8 | CUAAGAACUUCACAAAGAA | 5.28 | 0.41 | | |
| D-015424-02 | MPZ | UCAAAGAGCGCAUCCAGUG | 11.07 | 2.11 | ++ | Defect; Aph defect (shown) |
| D-015424-03 | MPZ | GACCCUCGCUGGAAGGAUG | 17.73 | 1.39 | ++ | Aph defect |
| D-008038-02 | MTMR2 | GAAACUGUGUGUAAGGAUA | 6.69 | 1.61 | ++ | |
| D-008038-03 | MTMR2 | GAGAAAGAAUGGCUAAGUU | 14.12 | 0.89 | ++ | Normal; IR defect |
| D-010616-02 | PMP22 | UUACAUCACUGGAAUCUUC | 8.12 | 1.62 | + | Normal |
| D-010616-19 | PMP22 | GUGUGAAGCUUUACGCGCA | 4.66 | 0.09 | ++ | Defect; Normal |
| D-024332-01 | KIAA1985/SH3TC2 | GAAGGCCUUGACGGGUUA | 8.96 | 1.18 | + | |
| D-024332-02 | KIAA1985/SH3TC2 | CAGCAAGGUUGGUCAGUAU | 27.95 | 2.57 | ++ | Normal; IR defect |
| D-024332-03 | KIAA1985/SH3TC2 | GGGUUAUGCUGACCACUUU | 5.89 | 0.68 | +++ | |
| D-024332-04 | KIAA1985/SH3TC2 | UAAAGGCUCCGCCCUGUUG | 18.71 | 2.94 | + | |
| D-001400-01 | Luciferase | | 3.27 | 0.62 | | |
| M-003255-04 | CHK1 | | 78.86 | 1.85 | | |

Knockdown key: +++, <10%; ++, <30%; +, <60% of control mRNA level.

Table 4.3. Individual CMT gene and siRNA information. CMT siRNAs were retested for γH2AX induction in HeLa cells, and gene knockdown was verified by Q-PCR. Individual siRNAs were tested for homologous recombination defects and DNA damage sensitivity; phenotypes are listed above. Where more than one siRNA was tested, both phenotypes are listed, with the siRNA shown in Figure 4.2 designated as (shown).
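The knockdown symbols in the Table 4.3 key bin the fraction of target mRNA remaining relative to the non-targeting control. The Python sketch below shows one way such a fraction could be computed and binned. It assumes the widely used 2^-ΔΔCt (Livak) method with a reference gene and roughly 100% amplification efficiency; the chapter does not state the quantification method or reference gene, and the Ct values and the function names remaining_fraction and knockdown_class are hypothetical.

```python
def remaining_fraction(ct_target_si, ct_ref_si, ct_target_ctrl, ct_ref_ctrl):
    """Fraction of target mRNA left after knockdown, relative to the
    non-targeting control, by the 2^-ddCt method (assumes ~100% PCR
    efficiency for both the target and the reference gene)."""
    delta_si = ct_target_si - ct_ref_si        # target normalized to reference, siRNA sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # same normalization, control sample
    return 2.0 ** -(delta_si - delta_ctrl)


def knockdown_class(fraction):
    """Bin a remaining-mRNA fraction using the Table 4.3 key."""
    if fraction < 0.10:
        return "+++"
    if fraction < 0.30:
        return "++"
    if fraction < 0.60:
        return "+"
    return None  # weaker than the smallest bin in the key


# Hypothetical Ct values: relative to the reference gene, the target crosses
# threshold 2 cycles later in the siRNA sample than in the control sample.
frac = remaining_fraction(26.0, 18.0, 24.0, 18.0)
print(frac, knockdown_class(frac))  # 0.25 ++
```

Each extra cycle of ΔΔCt halves the estimated transcript, so the "++" bin (<30%) corresponds to a ΔΔCt of a little under 2 cycles or more under these assumptions.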


Figure 4.2 Loss of Charcot-Marie-Tooth disease genes leads to enhanced H2AX phosphorylation in the presence of aphidicolin. Percentage of γH2AX+ cells after knockdown of the indicated genes. Replicate bars indicate effects of individual siRNAs tested. Graph shown represents the mean ± SD for experimental duplicates.

4.2.2 DNA repair defects are visible upon loss of Charcot-Marie-Tooth genes

To better understand how these CMT genes might affect the DNA damage response, we assessed the effect of their knockdown on cell survival after exposure to ionizing radiation (IR) or aphidicolin (Fig. 4.3A). DNA damage sensitivity is traditionally measured by colony survival, but we adapted the assay to use differentially labeled fluorescent cells to monitor cell growth with and without damage. Briefly, a YFP-expressing and a non-fluorescent cell population were transfected, one with the siRNA of interest and the other with a non-targeting siRNA. Forty-eight hours post-transfection, the two populations were mixed at a 50:50 ratio, replated, and exposed to various DNA-damaging agents. Three days after replating, the cells were harvested for FACS or imaged by immunofluorescence to determine the sensitivity of cells transfected with the siRNA of interest relative to the control. Inherent growth-rate defects were monitored and corrected for by measuring the ratio of the two populations in undamaged samples at the end of the experiment. Using this method, we found that knockdown of several CMT genes sensitized cells to the tested DNA-damaging agents, suggesting that CMT genes may be needed for DNA damage processing (Fig. 4.3A). To further test this idea, we measured homologous recombination efficiency after CMT gene knockdown by assaying repair of a chromosomal double-strand break induced by the I-SceI nuclease (Fig. 4.3B). Interestingly, we also observed homologous recombination defects for several of the genes tested: knockdown of three genes (MPZ, CMT4B2, and PMP22) decreased overall homologous recombination efficiency, and knockdown of one gene (SH3TC2) caused a slight increase in recombination. These findings strongly suggest that the increase in H2AX phosphorylation observed in the original screen results from an increase in DNA damage, and they indicate that CMT genes may play a role in the DNA damage response.

Figure 4.3 Loss of Charcot-Marie-Tooth disease genes leads to increased DNA damage sensitivity and DNA repair defects. (A) Sensitivity to aphidicolin (100 nM) or IR (2 Gy). Samples were corrected for the effect of the indicated siRNA on growth rate and then normalized to the siLuciferase-transfected sample. Details can be found in the methods appendix. (B) HR repair frequency at an induced double-strand break after knockdown of the indicated gene. Samples were normalized to the HR frequency in the siLuciferase-transfected cells. All graphs shown are mean ± SE for n=3.
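The growth-rate correction and siLuciferase normalization described for Figure 4.3 can be sketched as follows. This is an illustration under stated assumptions, not the analysis used here: it assumes YFP-positive cells carry the siRNA of interest and non-fluorescent cells carry the control siRNA, that sensitivity is scored as the ratio of the two populations in damaged wells divided by the same ratio in undamaged wells, and that the counts and the function names growth_corrected_ratio and normalize_to_reference are hypothetical. The last calculation applies the same reference normalization to a hypothetical HR frequency, as described for Figure 4.3B.

```python
def growth_corrected_ratio(test_damaged, ctrl_damaged, test_undamaged, ctrl_undamaged):
    """Test/control cell ratio after damage, divided by the same ratio without
    damage, so that an siRNA's effect on baseline growth cancels out."""
    damaged_ratio = test_damaged / ctrl_damaged
    undamaged_ratio = test_undamaged / ctrl_undamaged
    return damaged_ratio / undamaged_ratio


def normalize_to_reference(value, reference_value):
    """Express a sample relative to the siLuciferase sample (reference -> 1.0)."""
    return value / reference_value


# Hypothetical FACS counts (assumed: YFP+ = siRNA of interest, YFP- = control siRNA).
test = growth_corrected_ratio(3000, 4000, 5000, 4000)  # 0.75 / 1.25
luc = growth_corrected_ratio(4000, 4000, 4000, 4000)   # siLuciferase sample
relative_survival = normalize_to_reference(test, luc)

# Same normalization for HR: hypothetical % repair-positive cells, siRNA vs siLuciferase.
relative_hr = normalize_to_reference(2.0, 4.0)

print(f"relative survival: {relative_survival:.2f}")
print(f"relative HR frequency: {relative_hr:.2f}")
```

Under these assumptions, a normalized value below 1 would indicate that knockdown sensitizes cells to the damaging agent (or lowers HR frequency). Dividing by the undamaged ratio is what removes an siRNA's effect on baseline proliferation, which a raw damaged-well ratio would otherwise conflate with damage sensitivity.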


4.2.3 Charcot-Marie-Tooth patient cell lines show a heightened checkpoint response

Since the pathology of this disease has not previously been linked to DNA damage, we also asked whether increased H2AX phosphorylation could be observed in cells from CMT patients. To do so, we analyzed γH2AX in two fibroblast cell lines, one derived from a CMT patient with a mutation in GJB1 and another derived from an unaffected family member (Fig. 4.4A). Importantly, we observed increased basal levels of γH2AX in the patient line, as well as higher levels after damage. Furthermore, we observed elevated Chk1 phosphorylation following DNA damage in the patient line, suggesting that checkpoint signaling is increased. These differences in checkpoint activation were not due to differences in cell cycle distribution between the two cell lines (Fig. 4.4B). Taken together, these results suggest that the genomic instability resulting from downregulation of these genes may be a common feature of CMT.

Figure 4.4 Charcot-Marie-Tooth patient cell lines show heightened checkpoint activation. (A) p-Chk1 and γH2AX response in GJB1 patient cell lines. Samples were collected 24 h after drug addition. (B) BrdU plots indicate that there is no significant difference in the cell cycle distribution of the mock-treated cells.


4.3 Discussion

In summary, we found that downregulation of several CMT genes causes increased γH2AX staining that can be enhanced by aphidicolin. Loss of these genes also causes DNA damage sensitivity and DNA repair defects, consistent with these genes being involved in DNA damage processing. One major remaining question, however, is whether the DNA damage we observe is a cause or simply a consequence of CMT disease. Using a matched pair of cell lines, we observed increased γH2AX staining in fibroblasts from an affected individual. However, CMT is primarily a neuronal disease, and whether increased DNA damage can be observed in the neurons of affected individuals remains to be determined.

Interestingly, most of the CMT genes we found to induce γH2AX cause demyelinating forms of CMT, suggesting there may be connections between DNA damage and myelination defects. Indeed, the CMT genes we found to affect γH2AX include components of myelin, regulators of its production, and proteins involved in vesicle-mediated transport, a process that affects myelination4. Also of note, mutations in several DNA repair proteins cause myelination defects in addition to neurodegeneration, including the transcription-coupled repair proteins CSA and CSB as well as XPB, XPD, XPG, and ERCC1, which function in NER32. This may indicate that the cell type responsible for myelin production, the Schwann cell, is particularly sensitive to increased DNA damage, or that DNA damage regulates myelination through alternative mechanisms. Indeed, recent studies have shown that mutations in XPD inhibit thyroid hormone-dependent activation of gene expression, which deregulates myelin-related gene expression42. Additionally, mutations in TREX1 or RNase H2, two enzymes required for the degradation of certain endogenous nucleic acids, have been shown to activate the immune system, leading to myelination defects and calcification of the brain43, 44.

While further work will be required to understand the molecular links between the functions of these genes, DNA damage accumulation, and the pathogenesis of CMT, one potential mechanism may involve defects in mitotic division. As mentioned previously, MPZ is known to interact with PINX1, a protein required for chromosome segregation during mitosis17, 18. In addition, dynamin 2, the protein encoded by the CMT gene DNM2, is known to associate with microtubules and to be required for proper chromosome segregation and cytokinesis45-47. Alternatively, many CMT proteins could help maintain proper nuclear or cellular architecture, which is critical for DNA repair, replication, and other processes that prevent the accumulation of DNA damage48. More broadly, many CMT genes encode membrane-bound proteins with poorly defined functions, and they have been especially little studied outside the context of the neuron. Here, we show that loss of the subset of CMT genes we examined induced DNA damage in two cancer cell lines of non-neuronal origin. Therefore, examining how these CMT genes act to prevent DNA damage, both in neuronal tissues and beyond, will be an important direction for future study.


REFERENCES

1. Skre H. Genetic and clinical aspects of Charcot-Marie-Tooth's disease. Clinical genetics 1974; 6:98-118.

2. Berger P, Young P, Suter U. Molecular cell biology of Charcot-Marie-Tooth disease. Neurogenetics 2002; 4:1-15.

3. Szigeti K, Lupski JR. Charcot-Marie-Tooth disease. Eur J Hum Genet 2009; 17:703-10.

4. Niemann A, Berger P, Suter U. Pathomechanisms of mutant proteins in Charcot-Marie-Tooth disease. Neuromolecular medicine 2006; 8:217-42.

5. Taylor V, Zgraggen C, Naef R, Suter U. Membrane topology of peripheral myelin protein 22. Journal of neuroscience research 2000; 62:15-27.

6. Pareek S, Suter U, Snipes GJ, Welcher AA, Shooter EM, Murphy RA. Detection and processing of peripheral myelin protein PMP22 in cultured Schwann cells. The Journal of biological chemistry 1993; 268:10372-9.

7. Ryan MC, Notterpek L, Tobler AR, Liu N, Shooter EM. Role of the peripheral myelin protein 22 N-linked glycan in oligomer stability. Journal of neurochemistry 2000; 75:1465-74.

8. Pareek S, Notterpek L, Snipes GJ, Naef R, Sossin W, Laliberte J, Iacampo S, Suter U, Shooter EM, Murphy RA. Neurons promote the translocation of peripheral myelin protein 22 into myelin. J Neurosci 1997; 17:7754-62.

9. Jetten AM, Suter U. The peripheral myelin protein 22 and epithelial membrane protein family. Progress in nucleic acid research and molecular biology 2000; 64:97-129.

10. Taylor V, Welcher AA, Program AE, Suter U. Epithelial membrane protein-1, peripheral myelin protein 22, and lens membrane protein 20 define a novel gene family. The Journal of biological chemistry 1995; 270:28824-33.

11. Baechner D, Liehr T, Hameister H, Altenberger H, Grehl H, Suter U, Rautenstrauss B. Widespread expression of the peripheral myelin protein-22 gene (PMP22) in neural and non-neural tissues during murine development. Journal of neuroscience research 1995; 42:733-41.

12. Mimori K, Kataoka A, Yoshinaga K, Ohta M, Sagara Y, Yoshikawa Y, Ohno S, Barnard GF, Mori M. Identification of molecular markers for metastasis-related genes in primary breast cancer cells. Clinical & experimental metastasis 2005; 22:59-67.

13. Abrams CK, Oh S, Ri Y, Bargiello TA. Mutations in connexin 32: the molecular and biophysical bases for the X-linked form of Charcot-Marie-Tooth disease. Brain research 2000; 32:203-14.

14. Wu J, Zhou HF, Wang CH, Zhang B, Liu D, Wang W, Sui GJ. [Decreased expression of Cx32 and Cx43 and their function of gap junction intercellular communication in gastric cancer]. Zhonghua zhong liu za zhi [Chinese journal of oncology] 2007; 29:742-7.

15. Yamasaki H, Krutovskikh V, Mesnil M, Tanaka T, Zaidan-Dagli ML, Omori Y. Role of connexin (gap junction) genes in cell growth control and carcinogenesis. Comptes rendus de l'Academie des sciences 1999; 322:151-9.

16. Lemke G, Axel R. Isolation and sequence of a cDNA encoding the major structural protein of peripheral myelin. Cell 1985; 40:501-8.

17. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksoz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE. A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005; 122:957-68.

18. Yuan K, Li N, Jiang K, Zhu T, Huo Y, Wang C, Lu J, Shaw A, Thomas K, Zhang J, Mann D, Liao J, Jin C, Yao X. PinX1 is a novel microtubule-binding protein essential for accurate chromosome segregation. The Journal of biological chemistry 2009; 284:23072-82.

19. Mirsky R, Jessen KR. The neurobiology of Schwann cells. Brain pathology (Zurich, Switzerland) 1999; 9:293-311.

20. Topilko P, Schneider-Maunoury S, Levi G, Baron-Van Evercooren A, Chennoufi AB, Seitanidou T, Babinet C, Charnay P. Krox-20 controls myelination in the peripheral nervous system. Nature 1994; 371:796-9.

21. Unoki M, Nakamura Y. Growth-suppressive effects of BPOZ and EGR2, two genes involved in the PTEN signaling pathway. Oncogene 2001; 20:4457-65.

22. Unoki M, Nakamura Y. EGR2 induces apoptosis in various cancer cell lines by direct transactivation of BNIP3L and BAK. Oncogene 2003; 22:2172-85.

23. Doerks T, Strauss M, Brendel M, Bork P. GRAM, a novel domain in glucosyltransferases, myotubularins and other putative membrane-associated proteins. Trends in biochemical sciences 2000; 25:483-5.

24. Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML. Association of SET domain and myotubularin-related proteins modulates growth control. Nat Genet 1998; 18:331-7.

25. Senderek J, Bergmann C, Stendel C, Kirfel J, Verpoorten N, De Jonghe P, Timmerman V, Chrast R, Verheijen MH, Lemke G, Battaloglu E, Parman Y, Erdem S, Tan E, Topaloglu H, Hahn A, Muller-Felber W, Rizzuto N, Fabrizi GM, Stuhrmann M, Rudnik-Schoneborn S, Zuchner S, Michael Schroder J, Buchheim E, Straub V, Klepper J, Huehne K, Rautenstrauss B, Buttner R, Nelis E, Zerres K. Mutations in a gene encoding a novel SH3/TPR domain protein cause autosomal recessive Charcot-Marie-Tooth type 4C neuropathy. Am J Hum Genet 2003; 73:1106-19.

26. El-Khamisy SF, Caldecott KW. TDP1-dependent DNA single-strand break repair and neurodegeneration. Mutagenesis 2006; 21:219-24.

27. Liu B, Wang J, Chan KM, Tjia WM, Deng W, Guan X, Huang JD, Li KM, Chau PY, Chen DJ, Pei D, Pendas AM, Cadinanos J, Lopez-Otin C, Tse HF, Hutchison C, Chen J, Cao Y, Cheah KS, Tryggvason K, Zhou Z. Genomic instability in laminopathy-based premature aging. Nature medicine 2005; 11:780-5.

28. Gruenbaum Y, Margalit A, Goldman RD, Shumaker DK, Wilson KL. The nuclear lamina comes of age. Nat Rev Mol Cell Biol 2005; 6:21-31.

29. Pollex RL, Hegele RA. Hutchinson-Gilford progeria syndrome. Clinical genetics 2004; 66:375-81.

30. Rass U, Ahel I, West SC. Defective DNA repair and neurodegenerative disease. Cell 2007; 130:991-1004.

31. Takashima H. [Molecular genetics of inherited neuropathies]. Rinsho shinkeigaku = Clinical neurology 2006; 46:760-7.

32. Brooks PJ, Cheng TF, Cooper L. Do all of the neurologic diseases in patients with DNA repair gene mutations result from the accumulation of DNA damage? DNA repair 2008; 7:834-48.

33. McKinnon PJ, Caldecott KW. DNA strand break repair and human genetic disease. Annual review of genomics and human genetics 2007; 8:37-55.

34. Biton S, Barzilai A, Shiloh Y. The neurological phenotype of ataxia-telangiectasia: solving a persistent puzzle. DNA repair 2008; 7:1028-38.

35. Oka A, Takashima S. Expression of the ataxia-telangiectasia gene (ATM) product in human cerebellar neurons during development. Neuroscience letters 1998; 252:195-8.

36. Kuljis RO, Chen G, Lee EY, Aguila MC, Xu Y. ATM immunolocalization in mouse neuronal endosomes: implications for ataxia-telangiectasia. Brain Res 1999; 842:351-8.

37. Barlow C, Ribaut-Barassin C, Zwingman TA, Pope AJ, Brown KD, Owens JW, Larson D, Harrington EA, Haeberle AM, Mariani J, Eckhaus M, Herrup K, Bailly Y, Wynshaw-Boris A. ATM is a cytoplasmic protein in mouse brain required to prevent lysosomal accumulation. Proceedings of the National Academy of Sciences of the United States of America 2000; 97:871-6.

38. Li J, Han YR, Plummer MR, Herrup K. Cytoplasmic ATM in neurons modulates synaptic function. Curr Biol 2009; 19:2091-6.

39. Chen P, Peng C, Luff J, Spring K, Watters D, Bottle S, Furuya S, Lavin MF. Oxidative stress is responsible for deficient survival and dendritogenesis in purkinje neurons from ataxia-telangiectasia mutated mutant mice. J Neurosci 2003; 23:11453-60.

40. Sancar A, Lindsey-Boltz LA, Unsal-Kacmaz K, Linn S. Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu Rev Biochem 2004; 73:39-85.

41. Staropoli JF. Tumorigenesis and neurodegeneration: two sides of the same coin? Bioessays 2008; 30:719-27.

42. Compe E, Malerba M, Soler L, Marescaux J, Borrelli E, Egly JM. Neurological defects in trichothiodystrophy reveal a coactivator function of TFIIH. Nature neuroscience 2007; 10:1414-22.

43. Crow YJ, Hayward BE, Parmar R, Robins P, Leitch A, Ali M, Black DN, van Bokhoven H, Brunner HG, Hamel BC, Corry PC, Cowan FM, Frints SG, Klepper J, Livingston JH, Lynch SA, Massey RF, Meritet JF, Michaud JL, Ponsot G, Voit T, Lebon P, Bonthron DT, Jackson AP, Barnes DE, Lindahl T. Mutations in the gene encoding the 3'-5' DNA exonuclease TREX1 cause Aicardi-Goutieres syndrome at the AGS1 locus. Nat Genet 2006; 38:917-20.

44. Crow YJ, Leitch A, Hayward BE, Garner A, Parmar R, Griffith E, Ali M, Semple C, Aicardi J, Babul-Hirji R, Baumann C, Baxter P, Bertini E, Chandler KE, Chitayat D, Cau D, Dery C, Fazzi E, Goizet C, King MD, Klepper J, Lacombe D, Lanzi G, Lyall H, Martinez-Frias ML, Mathieu M, McKeown C, Monier A, Oade Y, Quarrell OW, Rittey CD, Rogers RC, Sanchis A, Stephenson JB, Tacke U, Till M, Tolmie JL, Tomlin P, Voit T, Weschke B, Woods CG, Lebon P, Bonthron DT, Ponting CP, Jackson AP. Mutations in genes encoding ribonuclease H2 subunits cause Aicardi-Goutieres syndrome and mimic congenital viral brain infection. Nat Genet 2006; 38:910-6.

45. Thompson HM, Skop AR, Euteneuer U, Meyer BJ, McNiven MA. The large GTPase dynamin associates with the spindle midzone and is required for cytokinesis. Curr Biol 2002; 12:2111-7.

46. Thompson HM, Cao H, Chen J, Euteneuer U, McNiven MA. Dynamin 2 binds gamma-tubulin and participates in centrosome cohesion. Nature cell biology 2004; 6:335-42.

47. Hamao K, Morita M, Hosoya H. New function of the proline rich domain in dynamin-2 to negatively regulate its interaction with microtubules in mammalian cells. Exp Cell Res 2009; 315:1336-45.

48. Lees-Miller SP. Dysfunction of lamin A triggers a DNA damage response and cellular senescence. DNA repair 2006; 5:286-9.

CHAPTER 5

The role of Set8 in maintaining genomic stability

Contributions

In this chapter, Renee Paulsen performed and analyzed all experiments shown.

5.1 Introduction

The integrity of the genome is particularly vulnerable during DNA replication, when cells rely on a number of coordinated processes to maintain genome stability1, 2. Faithful transmission of the genome requires that the DNA sequence be maintained. However, DNA replication does not occur on naked DNA; it must be carried out in the context of chromatin. Specifically, 146 base pairs of DNA are wrapped around the histone octamer of the nucleosome, which consists of two copies each of histones H3, H4, H2A, and H2B. Although this packaging system gives DNA a relatively uniform folding structure, each of the histones can undergo a variety of post-translational modifications that introduce variation into a wide range of cellular processes, including gene regulation, transcriptional profiles, and protein recruitment. For the maintenance of accurate cellular functions, the correct transmission of covalent epigenetic modifications on the DNA and histones, along with proper packaging of DNA into chromatin3, 4, is arguably as important as maintaining the underlying DNA sequence.

5.1.1 Chromatin modifying enzymes and the DNA damage response

The histones undergo an array of post-translational modifications, including phosphorylation, methylation, ubiquitination, ADP-ribosylation, and acetylation5. These covalent marks may directly affect chromatin structure, but they have also been shown to recruit other proteins that direct diverse cellular processes, including heterochromatin formation, gene regulation, and normal S-phase progression6-10. Given that nucleosome assembly is primarily linked to DNA replication, it is understandable that a number of proteins involved in histone deposition or modification, as well as the modified histones themselves, have been implicated in the DNA damage response.

Figure 5.1 Nucleosome architecture. The figure is based on PDB entry 1AOI.

In human cells, loss of CAF1 (chromatin assembly factor 1), an essential factor responsible for depositing newly synthesized histones onto DNA, causes spontaneous DNA damage and defects in S-phase progression11. Similarly, in yeast, loss of the histone chaperone Asf1 directly activates the DNA damage checkpoint, causes accumulation of cells in G2/M, and makes cells highly sensitive to DNA damaging agents12, 13. Interestingly, the hydroxyurea (HU) sensitivity of mutants in the effector checkpoint kinase Rad53 could be suppressed by overexpression of Asf1, suggesting that a key function of Rad53 after damage is controlling histone levels14, 15. Consistent with this idea, rad53 mutants are extremely sensitive to histone overexpression16.

Specific epigenetic marks have also been linked to the DNA damage response. Histone H3 lysine 56 acetylation (H3K56ac) is important for the cellular response to stalled or collapsed replication forks 17. H3K56ac is found specifically in replicating cells, and in yeast it was shown to also occur, and be maintained, adjacent to replication-coupled double-strand breaks18. Cells lacking Rtt109, the enzyme responsible for H3K56ac, showed increased sensitivity to the S-phase-specific DNA damaging agents HU, MMS (methyl methanesulfonate), and CPT (camptothecin), as well as an S-phase-specific increase in chromosome breaks7. H3K56ac has also been suggested to have a role in the DNA damage response in human cells, as the responsible enzymes, CBP and p300, have been shown to be required for the assembly of H3K56-acetylated histones at sites of double-strand breaks19. Altogether, defects in chromatin assembly present a significant barrier to replication progression and thus create ample opportunity for genomic instability to arise.

5.1.2 The Set8 histone methyltransferase

Set8 (also known as PR-Set7) is a SET domain-containing protein known to methylate lysine 20 on histone H4 (H4K20)20-24. The Set8 protein is relatively well conserved. In fission yeast, the Set8 homolog Set9 is responsible for mono-, di-, and tri-methylation of H4K20 25; however, in higher eukaryotes, the activity of Set8 is restricted to mono-methylation of H4K20 20. Set8 expression and histone methyltransferase activity have been shown to be cell cycle regulated, increasing throughout S-phase and peaking at mitosis26, and a role for Set8 in mitotic chromosome condensation has been suggested27. Mutation of the Set8 gene in Drosophila melanogaster leads to an S-phase arrest that can be relieved by simultaneous deletion of the ATR ortholog27. Interestingly, no DNA damage was detected upon loss of Set8 in Drosophila, and it was concluded that the arrest resulted from changes in chromatin structure.

5.1.3 Histone H4 K20 methylation

Histone H4 lysine 20 (K20) can be mono-, di-, or tri-methylated, and its methylation has been implicated in transcriptional activation, gene silencing, heterochromatin formation, mitosis, and DNA repair8, 22, 26, 28-30. This modification is unusual: whereas most histone modifications are found within the flexible tail regions of the histones, K20 methylation occurs at the interface between the histone and the DNA, which gives it the unique capability to directly impact chromatin structure (Fig. 5.1). Recent proteomic studies have shown that almost all newly synthesized histone H4 is progressively methylated at K20 beginning in G2 phase and proceeding through mitosis and G1 phase, with turnover of the methyl mark remaining undetectable in vivo31. Mono-methylated H4K20 serves as a substrate for the Suv4-20 enzyme, which is responsible for di- and tri-methylation, and the predominant form in vivo is di-methylated H4K20 32. While this histone modification is conserved, the levels found in mouse and human greatly exceed those observed in yeast, particularly for the di-methylated form33, suggesting that the methylation may have acquired additional relevance in more complex genomes. Interestingly, di-methylated H4K20 is specifically recognized by the DNA damage checkpoint and repair protein 53BP1 in mammals and by its fission yeast ortholog Crb2 25, 28; however, the mark is not induced by damage and is not specifically found at DNA damage sites34.

5.1.4 Additional targets of Set8

In addition to histone H4, Set8 has also been shown to methylate the tumor suppressor p53 on lysine 382 35. This modification suppresses p53 transcriptional activity after DNA damage, and correspondingly, depletion of Set8 by siRNA leads to increased p53 activity and subsequent p53-mediated apoptosis. Interestingly, this study also found that in U2OS cells, Set8 is down-regulated after damage, thus allowing for a heightened p53 response.

5.1.5 Synopsis

Here we study Set8, a hit from our genome-wide siRNA screen designed to identify genes whose loss causes genome instability, as assessed by the phosphorylation of H2AX (γH2AX). We show that upon Set8 knockdown, cells accumulate DNA damage in S-phase and activate both the Chk1 and Chk2 kinases. In addition, Set8 loss impairs efficient progression through S-phase and reduces total nucleotide incorporation. We also find that Set8 loss results in a decrease in both mono- and di-methylation of H4K20, global chromatin relaxation, and an inability to recruit 53BP1 to sites of DNA damage. However, the loss of H4K20 di-methylation was not required for the induction of DNA damage upon Set8 knockdown. We also find that the damage induced by Set8 depletion can be suppressed by co-depletion of Rad51 or the Mus81 nuclease, suggesting the damage could be due to aberrant recombination or template switching during DNA replication. Collectively, our data demonstrate that Set8 has key roles in promoting genomic stability during S-phase.

5.2 Results

5.2.1 Set8 is needed to maintain genomic stability

Surprisingly, one of the genes whose depletion caused the greatest increase in the percentage of γH2AX-positive cells in our genome-wide siRNA screen was the histone methyltransferase Set8 (Fig. 5.1A). The Set8 siRNA pool induced γH2AX in approximately 50% of cells in the primary screen, and the intensity of γH2AX staining was similar to that produced by depletion of the checkpoint kinase Chk1 36, which served as the positive control in the screen (Fig. 5.1B). To test the specificity of this result, we individually tested each of the four Set8-targeted siRNAs in the pool. All four siRNAs induced significant H2AX phosphorylation, suggesting that the phenotype we observed is a specific result of knocking down Set8 (Fig. 5.1C). Loss of Set8 also reduced the cell number to approximately half that of the control (Fig. 5.1D).

As noted above, Set8 is known to methylate p53 on lysine 382 35, raising the possibility that the effects we observed reflected the compromised p53 function of the HeLa cell line used in the screen. We therefore tested the effects of two Set8-targeted siRNAs in U2OS cells, which retain p53 function. Set8 knockdown consistently led to high levels of γH2AX in U2OS cells as well (Fig. 5.1E), suggesting that the role of Set8 in preventing H2AX phosphorylation does not depend on its methylation of p53. We therefore conclude that Set8 is needed to maintain genomic stability independently of its function in regulating the p53 DNA damage response.

Figure 5.1 Set8 is needed to maintain genomic stability (A) Set8 prevents phosphorylation of H2AX. HeLa cells were treated with the Dharmacon siGenome smart pools (25 nM), stained with antibodies to γH2AX and with PI 72 hours later, and analyzed by laser scanning cytometry. The average percentage of cells staining positive for γH2AX is plotted for each pool (one pool/gene) in the genome in ascending order. Cells were scored as γH2AX positive if their integrated γH2AX intensity exceeded that of a control siRNA-treated sample. (B) Images of cells fixed and stained as described in (A). Effects of the positive (siChk1) and negative (siControl) siRNAs used for the screen are shown for comparison. (C) Individual siRNAs from the Set8 siRNA pool in (A) were tested as described in (A). (D) Effects of Set8 siRNAs on cell number. Cell numbers from (C) were quantified using automated nuclear segmentation. (E) HeLa and U2OS cells were treated with two Set8-targeted siRNAs and processed as described in (A). All error bars indicate standard error.
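The scoring rule in (A) can be sketched as follows. This is a minimal illustration, not the screen's actual analysis pipeline: the caption does not say exactly how the control-derived cutoff was computed, so the nearest-rank 95th percentile of control-cell intensities used here, and all function names, are hypothetical choices. Replicate percentages are then summarized as a mean with standard error, matching the error bars described in the caption.

```python
import math
import statistics


def control_threshold(control_intensities, quantile=0.95):
    """Hypothetical cutoff: nearest-rank quantile of the integrated
    gammaH2AX intensities of control siRNA-treated cells (an assumption,
    not the rule used in the screen)."""
    ordered = sorted(control_intensities)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def percent_positive(intensities, threshold):
    """Percentage of cells whose integrated intensity exceeds the cutoff."""
    positive = sum(1 for value in intensities if value > threshold)
    return 100.0 * positive / len(intensities)


def mean_and_sem(values):
    """Mean and standard error of the mean across replicate wells."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


if __name__ == "__main__":
    control = [float(x) for x in range(1, 11)]
    cutoff = control_threshold(control)
    print(cutoff, percent_positive([5.0, 11.0, 12.0, 3.0], cutoff))
```

Deriving the cutoff from the control distribution, rather than a fixed intensity, keeps the score comparable across plates with different staining brightness; other choices (for example, control mean plus a multiple of its standard deviation) would be equally plausible readings of the caption.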

5.2.2 Set8 loss induces DNA damage during S-phase

The decrease in cell number observed upon knockdown of Set8 suggested that Set8 loss might arrest cell cycle progression. To investigate this possibility, we took advantage of the fact that cells in the screen were also stained with propidium iodide (PI) and analyzed the PI intensity of each cell. This allowed us to extract a cell cycle profile from our data and to determine γH2AX intensity as a function of PI intensity (Fig. 5.2A). We found that upon Set8 depletion, cells accumulated in S and G2. This arrest was confirmed by flow cytometric analysis, and no further slowing of S-phase was observed upon treatment with a low dose of the replicative polymerase inhibitor aphidicolin (Fig. 5.2B). We also found that γH2AX levels appeared to increase as cells entered S-phase (Fig. 5.2A). These results suggest that loss of Set8 causes DNA damage as cells move through S-phase, leading us to hypothesize that Set8 may be needed for efficient replication.
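The per-cell analysis described above, extracting a cell cycle profile from PI intensity and relating it to γH2AX signal, can be sketched as below. This is a hedged illustration only: the gates (DNA content between 1.2x and 1.8x the G1 peak counted as S-phase), the `g1_peak` input, and the function names are hypothetical, and in practice the gates would be drawn on the observed PI histogram. Each cell is represented as a (PI intensity, γH2AX intensity) pair.

```python
from statistics import mean

PHASES = ("G1", "S", "G2/M")


def assign_phase(pi_intensity, g1_peak, s_low=1.2, s_high=1.8):
    """Hypothetical gating on DNA content relative to the G1 peak."""
    ratio = pi_intensity / g1_peak
    if ratio < s_low:
        return "G1"
    if ratio < s_high:
        return "S"
    return "G2/M"


def phase_summary(cells, g1_peak):
    """Per phase: fraction of cells and mean gammaH2AX signal (None if empty)."""
    groups = {phase: [] for phase in PHASES}
    for pi_intensity, gh2ax in cells:
        groups[assign_phase(pi_intensity, g1_peak)].append(gh2ax)
    total = len(cells)
    return {
        phase: {
            "fraction": len(values) / total,
            "mean_gh2ax": mean(values) if values else None,
        }
        for phase, values in groups.items()
    }
```

Comparing the phase fractions and the S-phase mean between siControl and siSet8 samples would express, in numbers, the S/G2 accumulation and the S-phase rise in γH2AX described in the text.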

To test this possibility, we tracked the progression of Set8 siRNA-treated cells through S-phase by pulse-labeling the cells with BrdU and following the movement of this S-phase fraction through the cell cycle by flow cytometry. The microtubule inhibitor nocodazole was added after the BrdU pulse to prevent labeled cells from repopulating the G1 fraction. Upon Set8 knockdown, we observed a striking defect in replication progression: 24 hours after the pulse, ~17% of the cells remained in early S-phase after Set8 knockdown, compared to less than 1% after control knockdown (Fig. 5.2C). In addition, we noted a significant decrease in BrdU incorporation in the absence of Set8, reflected in a large population of S-phase cells that stained positively for BrdU but with much lower intensity than those in the siControl-treated sample (Fig. 5.2C, see arrow). These observations suggest that Set8 is needed for efficient S-phase progression.
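The early S-phase fraction read out in this pulse-chase experiment can be sketched as a simple two-parameter gate on BrdU and DNA content. This is a minimal sketch under stated assumptions: the BrdU cutoff, the early S-phase window (1.0x to 1.4x the G1 DNA content), and the function name are hypothetical, and the denominator here is all BrdU-labeled cells, whereas the boxed percentage in the figure may have been computed over a different population.

```python
def early_s_fraction(cells, g1_peak, brdu_cutoff, early_s=(1.0, 1.4)):
    """Fraction of BrdU-labeled cells whose DNA content (PI relative to the
    G1 peak) falls in a hypothetical early S-phase window.

    cells: iterable of (pi_intensity, brdu_intensity) pairs.
    """
    labeled = [pi / g1_peak for pi, brdu in cells if brdu > brdu_cutoff]
    if not labeled:
        return 0.0
    low, high = early_s
    in_gate = sum(1 for ratio in labeled if low <= ratio < high)
    return in_gate / len(labeled)
```

Because nocodazole traps labeled cells in G2/M, labeled cells still sitting in the early S window at the late time point are the ones that failed to complete replication, which is the population this gate counts.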

Figure 5.2 Set8 loss induces DNA damage in S-phase and is required for efficient DNA replication (A) Representation of γH2AX signal as a function of cell cycle phase. A cell cycle profile for both siControl- and siSet8-treated cells was constructed from the PI data obtained in the γH2AX screen and is represented as a histogram. The left two panels relate the γH2AX staining of each individual cell to its position within the cell cycle; each dot represents an individual cell, plotted as γH2AX intensity versus PI intensity, for siControl and siSet8, respectively. The right panel overlays the cell cycle profile (PI) of the siControl sample on the γH2AX/PI signal of the Set8 siRNA-treated sample. (B) Effect of siControl or siSet8 on the cell cycle as assessed by flow cytometry. (C) Effect of siControl or siSet8 on S-phase progression. Cells were pulse-labeled with BrdU 48 h after knockdown, then nocodazole was added, and samples were taken at the indicated times. Cells were fixed, stained with antibodies for BrdU and with PI, and then analyzed by flow cytometry. The fraction of BrdU-positive cells in early S-phase is boxed and quantified. Arrows indicate cells that are inefficiently incorporating nucleotides.

5.2.3 Set8 knockdown causes activation of the DNA damage checkpoint

To further investigate the S-phase arrest observed upon Set8 knockdown, we assessed the phosphorylation state of two checkpoint kinases, Chk1 and Chk2. Chk1 phosphorylation results from activation of the upstream kinase ATR, largely in response to ssDNA that forms at stalled replication forks, but also after the processing of a double-strand break37-40. In contrast, Chk2 phosphorylation occurs in response to DNA double-strand breaks and depends on activation of the upstream kinase ATM 41. After Set8 depletion, we noted a significant increase in the phosphorylation of both Chk1 and Chk2, suggesting that both ATM and ATR are activated in response to Set8 loss (Fig. 5.3A). However, Set8 loss did not prevent further activation of Chk1 or Chk2 by either ionizing radiation (Fig. 5.3A) or aphidicolin (data not shown). Consistent with activation of the ATR-dependent checkpoint, knockdown of Set8 also led to accumulation of nuclear foci containing replication protein A (RPA), a critical component of ATR signaling (Fig. 5.3B)39, 42.

Figure 5.3 Set8 knockdown causes the activation of the DNA damage checkpoint (A) HeLa cells were mock- or IR-treated (10Gy) 72 h after knockdown with control or Set8-targeted siRNAs and allowed to recover for 1 hour. Cells were harvested and analyzed by western blot for the indicated proteins. NT = non-treated. (B) Cells treated with control or Set8-targeted siRNAs for 72 h were pre-extracted, fixed and stained with antibodies to RPA2, p-Chk2 (Thr68), and DAPI, then imaged by epifluorescence microscopy.

These findings suggested that the large S-phase accumulation observed upon knockdown of Set8 could be due to activation of the checkpoint. Similarly, the decreased incorporation of BrdU could reflect the inhibition of origin firing that results from activation of the checkpoint43, 44. Alternatively, these defects could result from premature termination of replication forks, which would suggest a direct role for Set8 in DNA replication. To distinguish between these possibilities, we tracked S-phase progression and BrdU incorporation after inhibition of the checkpoint kinases. Mock-depleted or Set8-depleted cells were pretreated with the ATR and ATM inhibitor caffeine for 1 hour. Replicating cells were then pulse-labeled with BrdU and harvested after 30 minutes or after 24 hours to monitor BrdU incorporation and cell cycle progression, respectively. Caffeine treatment had no effect on BrdU incorporation in cells treated with Set8-targeted siRNA (Fig. 5.4A), suggesting that checkpoint activation was not responsible for this defect. However, the slowing of S-phase progression observed upon knockdown of Set8 was partially relieved by caffeine (Fig. 5.4B). This suggests that the checkpoint is partially responsible for the severe S-phase arrest after Set8 loss, but that Set8 knockdown also causes a checkpoint-independent defect in nucleotide incorporation and S-phase progression.

Figure 5.4 Set8 knockdown causes replication progression defects (A&B) Forty-eight hours post-knockdown cells were mock-treated or caffeine-treated (4 mM) then pulsed with BrdU for 30 minutes. Cells were either immediately fixed (A) or incubated in the presence of nocodazole with or without caffeine for an additional 24 hours (B) before harvesting for flow cytometric analysis. The boxed region and percentage shown represent the population of cells found in early S-phase.

These results therefore suggest that Set8 may have a direct role in DNA replication. It is possible that caffeine added after the checkpoint has already been activated does not fully inhibit ATR and ATM kinase activity, in which case the lack of recovery in nucleotide incorporation could reflect incomplete inhibition. However, the partial rescue of the cell cycle progression defect argues that caffeine achieves at least partial inhibition.

5.2.4 Set8 knockdown inhibits 53BP1 foci formation

We next examined the localization of 53BP1, a checkpoint mediator and marker of double-strand break formation45. 53BP1 is recruited to double-strand breaks via interactions with γH2AX and with di-methylated H4K20 (H4K20me2)28, 46, and it is required under certain conditions for checkpoint activation and Chk2 phosphorylation following double-strand break formation47, 48. We found that although cells treated with Set8 siRNA formed prominent foci containing γH2AX, few 53BP1 foci were observed (Fig. 5.5A). By comparison, treatment with ionizing radiation (10 Gy) produced levels of γH2AX foci similar to those seen after Set8 knockdown but a significantly higher number of 53BP1 foci. These observations suggest a defect in 53BP1 recruitment. Because this result differs from two other studies in which 53BP1 foci were found to form upon knockdown of Set8, we also carefully quantified the number of foci per cell after treatment with control or Set8-targeted siRNAs (Fig. 5.5B). Although we found an increase in the number of cells containing more than five 53BP1 foci upon knockdown of Set8, only ~5% of siSet8-treated cells fell into this category. Importantly, approximately 40% of Set8-depleted cells stained highly positive for γH2AX, strongly suggesting that 53BP1 recruitment is defective upon knockdown of Set8. To further confirm this defect, we damaged cells with IR (10 Gy) and monitored H2AX phosphorylation and 53BP1 foci formation in Set8 versus control siRNA-treated cells. Strikingly, the Set8 siRNA-treated cells still failed to recruit 53BP1 (Fig. 5.5C). Therefore, Set8 knockdown appears to cause a defect in 53BP1 chromatin recruitment after DNA damage.

Figure 5.5 Set8 knockdown inhibits 53BP1 foci formation (A) HeLa cells were fixed and stained with DAPI and γH2AX or 53BP1 72 hours post-siRNA treatment. A separate sample was treated with IR (10 Gy) and processed in the same manner after one hour. Representative images are shown. (B) 53BP1 foci were scored on a per-cell basis by an in-house cell profiler foci identification program. Approximately 200 HeLa cells were scored for each sample. Averages with standard error are shown. (C) Cells were exposed to IR (10 Gy) 72 h post-knockdown, and then fixed and stained as in (A) after 1 hour.
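The per-cell summary in (B) can be sketched as below, assuming foci have already been segmented and counted for each nucleus (the image analysis itself is not reproduced here). The cutoff of five foci follows the text; the function name and output keys are hypothetical.

```python
import math
from statistics import mean, stdev


def foci_summary(foci_per_cell, cutoff=5):
    """Summarize per-cell 53BP1 foci counts from an upstream segmentation
    step (assumed; not shown here)."""
    n = len(foci_per_cell)
    above = sum(1 for count in foci_per_cell if count > cutoff)
    sem = stdev(foci_per_cell) / math.sqrt(n) if n > 1 else 0.0
    return {
        "n_cells": n,
        "fraction_above_cutoff": above / n,
        "mean_foci": mean(foci_per_cell),
        "sem_foci": sem,
    }
```

Reporting the fraction of cells above a fixed cutoff alongside the mean separates a small population of heavily damaged cells from a uniform low-level increase, which is the distinction the text draws between the ~5% of 53BP1-focus-positive and ~40% of γH2AX-positive Set8-depleted cells.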

5.2.5 Loss of 53BP1 foci upon Set8 knockdown does not affect Chk2 activation

53BP1 is recruited to sites of DNA damage through an interaction with the di-methylated and, to a lesser extent, the mono-methylated form of H4K20 28. We therefore reasoned that Set8 siRNA treatment might be affecting the methylation of histone H4. Two recent studies concluded that Set8 knockdown reduces the mono-methylation of H4K20 but leaves its di-methylation unaltered49, 50. However, we found that Set8 depletion reduced not only the mono-methylation of H4K20 but also its di-methylation 72 hours after siRNA transfection (Fig. 5.6B). This effect was observed in both HeLa and U2OS cells, indicating that it is not specific to one cell type or dependent on p53 status (Fig. 5.6A and data not shown). This observation was corroborated by an additional study that also found that a reduction in Set8 caused loss of both the mono- and di-methylated forms of histone H4 51.

One function of 53BP1 is to facilitate recruitment of the effector kinase Chk2 to sites of DNA damage. In the absence of 53BP1, Chk2 phosphorylation does not occur after low doses of ionizing radiation. However, at higher doses, another pathway seems to facilitate Chk2 activation, since Chk2 is still phosphorylated in the absence of 53BP1 and recruitment of pChk2 to nuclear foci may occur as well47, 48. To test whether Set8 loss affects the recruitment of Chk2 to damaged DNA, we examined localization of the phosphorylated form of Chk2 by immunofluorescence. Despite the decrease in 53BP1 foci formation after Set8 knockdown, we still observed a significant increase in pChk2 foci (Fig. 5.3B). When taken together with the finding that Chk2 is strongly phosphorylated after Set8 knockdown, this observation suggests that Set8 loss causes a level of DNA damage that is sufficient to activate the checkpoint in the absence of 53BP1 localization.

5.2.6 Set8 loss induces a relaxation of bulk chromatin

Because knockdown of Set8 caused a decrease in H4K20 methylation, we reasoned that loss of this methylation could be inducing an overall change in chromatin structure. Unlike phosphorylation or acetylation, addition of a methyl group does not alter the electrostatic charge of the histone; however, H4K20 methylation is generally believed to be a repressive mark associated with chromatin compaction, as the bulk of tri-methylated H4K20 has been shown to reside in heterochromatic regions52. We therefore hypothesized that loss of Set8 could cause an overall relaxation of chromatin through loss of H4K20 methylation. To test this, we harvested genomic DNA from HeLa cells 72 hours after treatment with a control siRNA or a Set8-targeted siRNA and digested the chromatin with micrococcal nuclease for increasing times. Micrococcal nuclease cleaves DNA in the linker regions between nucleosomes and can thus serve as a measure of general compaction, since less compact DNA is more accessible to enzymatic digestion. We found that chromatin from Set8-depleted cells was more readily digested into mononucleosomes, indicating that chromatin with reduced H4K20 methylation was indeed more relaxed (Fig. 5.6C,D). These results were corroborated by a complementary study that also found chromatin to be less condensed upon loss of Set8 34.

Figure 5.6 Set8 knockdown reduces histone H4K20 methylation and alters chromatin compaction status. (A) U2OS cells treated with the indicated siRNAs were fixed 72 h after knockdown, stained with antibodies against H4K20me1 and with DAPI, and analyzed by epifluorescence microscopy. (B) Histones were acid-extracted from siRNA-treated samples after 72 h, and lysates were analyzed by western blotting with the indicated antibodies. Set8 knockdown was assessed by blotting the cytosolic fraction. (C) Genomic DNA from HeLa cells treated with control or Set8-targeted siRNAs was harvested, digested with micrococcal nuclease for the indicated times, and run on an agarose gel. Colored asterisks indicate lanes chosen for quantitation in (D). (D) The 5 min time point was quantitated by phosphoimager line scan and plotted in Excel.

5.2.7 A prior mitosis is not required to induce DNA damage in the absence of Set8

Set8 expression has been shown to increase during S and G2 phases, peaking in mitosis26. In addition, Set8 has been suggested to have roles in mitotic chromosome condensation27. Thus, the replication defect that occurs upon loss of Set8 could result from a function of Set8 in the previous round of DNA replication or mitosis that manifests only in the subsequent round of replication. Previous studies examined how cell cycle position affects the DNA damage induced by Set8 loss by transfecting the Set8-targeted siRNA before a thymidine block50. Under these conditions, cells could have moved through S or G2 phase during Set8 depletion and before synchronization, complicating interpretation of the results.

To determine whether progression through a previous S-phase or mitosis is needed for DNA damage to occur in the subsequent S-phase, we arrested the normal human fibroblast cell line Hs68 by contact inhibition for 1 week and then treated the cells with Set8 siRNA. After 48 hours, the cells were replated at subconfluent density, transfected with a second round of siRNA, and pulsed with BrdU immediately (time 0) or 20 or 40 hours later. Cells were then fixed and stained with antibodies against BrdU and γH2AX. Equal numbers of siControl- and siSet8-treated cells were BrdU-positive 20 hours after release from arrest, suggesting that there was no defect in S-phase entry (Fig. 5.7A). In addition, the cells stained positive for neither γH2AX nor BrdU immediately after release. However, a strong γH2AX signal was observed in a significant fraction of cells after 20 hours (Fig. 5.7B&C). More importantly, H2AX phosphorylation occurred only in cells that also stained positive for BrdU (Fig. 5.7C). Interestingly, H4K20 di-methylation was unchanged in these cells 20 hours after release from arrest but was significantly decreased 40 hours post-release (Fig. 5.7B and data not shown). In contrast, H4K20 mono-methylation did decrease after knockdown of Set8. Taken together, these observations suggest that Set8 has a direct role in promoting genome stability during S-phase. The damage observed is not an indirect effect of a Set8 requirement in the prior mitosis or S-phase. They also suggest that the effects of Set8 on S-phase are likely independent of its effects on H4K20 di-methylation, although they could still result from an effect of Set8 on H4K20 mono-methylation.

Figure 5.7: Prior mitosis is not required for γH2AX induction in the absence of Set8. (A) Hs68 cells were arrested by contact inhibition for 1 week and then transfected with 50 nM of the indicated siRNA. 48 h after knockdown, cells were trypsinized and replated to subconfluency, and samples were pulsed with BrdU either immediately or 20 hours later. DNA content (PI) and BrdU incorporation were analyzed by flow cytometry to monitor cell cycle progression. Numbers represent BrdU-positive cells. Time points correspond to the time elapsed since the cells were replated to subconfluency. (B&C) Twenty hours post-release, cells from (A) were pulse-labeled with BrdU, fixed, stained with DAPI and with γH2AX and BrdU antibodies, and analyzed by microscopy. Percent γH2AX-positive cells were calculated using MetaXpress cell identification and scoring software. Approximately 500 cells were analyzed for each sample. Error bars indicate standard error. (D) Cells from (A) were harvested for western blot analysis 20 hours post-release.

5.2.8 The damage induced by Set8 knockdown can be relieved by co-depletion with either Rad51 or Mus81

Altogether, our data on Set8 loss point to a direct defect in the progression of DNA replication rather than to mitotic defects that cause problems in the subsequent S-phase. During the course of our studies, three additional studies were published with complementary evidence for a direct role of Set8 during replication. Specifically, Set8 was shown to interact with proliferating cell nuclear antigen (PCNA), a critical component of the replication fork, through a PIP box in the N-terminal region of the protein49, 53. Set8 loss was also shown to decrease replication fork velocity and origin firing50 and to cause DNA double-strand breaks specific to S-phase49. In addition, these studies showed that the damage induced by loss of Set8 could be prevented by co-depletion of either Cdc45 or Rad51, or by high doses of aphidicolin that prevent S-phase entry49. The suppression of damage by aphidicolin treatment or Cdc45 loss simply confirms that Set8 loss causes DNA damage during replication. By contrast, the suppression of damage by Rad51 co-depletion gives more direct insight into how the damage might arise during replication.

Rad51 is the recombinase that forms a nucleoprotein filament on single-stranded DNA at the site of a DNA double-strand break, promoting strand invasion and the subsequent steps of homologous recombination. One interpretation of the loss of DNA damage upon co-depletion of Set8 and Rad51 would therefore be that Set8 acts downstream of Rad51 to facilitate the completion of DNA repair and the resumption of replication. Alternatively, Set8 depletion could cause an overall increase in chromatin relaxation and a slowing of fork velocity. This in turn could increase the incidence of template switching at the replication fork, a process that is also facilitated by Rad5154, 55. To distinguish between these two possibilities, we co-depleted Set8 with Mus81, the nuclease shown to induce breaks at stalled or blocked replication forks56. It was previously shown that in Mus81 knockout cells under replicative stress, Rad51 foci still formed during replication. However, double-strand breaks did not form without the Mus81 nuclease, and replication could not be completed. This suggests that Mus81 is needed to cleave the structures formed at stalled replication forks to promote replication progression57. Suppose loss of Set8 increased template switching at sites of DNA replication, rather than acting downstream to promote DNA repair. Then co-depletion of Set8 with either Rad51 or Mus81 should reduce γH2AX. Indeed, co-depletion of Set8 with Rad51, or with either of two different siRNAs against Mus81, significantly reduced the γH2AX signal after 72 hours of knockdown (Fig. 5.8A,B). This suggests that Set8 loss could reduce fork velocity, resulting in aberrant recombination processes, DNA damage, and an overall inability to progress through S-phase.

Figure 5.8: The DNA damage induced by Set8 knockdown can be relieved by co-depletion with either Rad51 or Mus81. (A) HeLa cells were treated for 72 hours with the indicated siRNAs at a final concentration of 25 nM for each siRNA. (B) HeLa cells were fixed and stained with DAPI and γH2AX 72 hours after treatment with the indicated siRNAs at a final concentration of 25 nM for each siRNA. The percent of γH2AX-positive cells was quantitated using the MetaXpress software cell scoring module. Error bars indicate SE of 3 separate transfections. Asterisks indicate a significant change between the indicated double knockdown and the single knockdown alone: *P < 0.05, **P < 0.01, ***P < 0.005.

5.3 Discussion

In this study we have shown that the histone methyltransferase Set8 is needed to suppress H2AX phosphorylation during S-phase, indicating that it is needed to prevent genome instability. Moreover, our data strongly suggest that this DNA damage is due to a direct role for Set8 in S-phase progression. The S-phase problems are not solely an indirect result of checkpoint activation or of a previous defect in mitosis. Although the ATM and ATR pathways are activated upon knockdown of Set8, and S-phase progression is slowed in a checkpoint-dependent manner, some cells fail to progress through S-phase despite inactivation of the checkpoint. In addition, there is a profound, checkpoint-independent defect in nucleotide incorporation when Set8 is lost, indicating that forks are prematurely terminating or moving more slowly. Consistent with this idea, three other studies made important connections between Set8 methyltransferase activity, DNA replication and genome stability during the completion of this work49, 50, 58. These studies showed that Set8 is found at replication forks, that this localization is mediated by an interaction with PCNA, and that Set8 is needed for replication fork progression. Collectively, this work demonstrates that Set8 has a critical S-phase function needed for genome stability.

What, then, is the target of Set8 that is needed to maintain genome stability? One of the two known targets of Set8, p53, does not appear to be relevant here, since many of our experiments were done in cells lacking p53. However, lysine 20 of histone H4 may be critical. We have observed that Set8 depletion decreases both the mono- and di-methylation of H4K20. Consistent with this effect, we also observed a defect in the recruitment of 53BP1 to sites of damage; 53BP1 recognizes di-methylated and, to a lesser extent, mono-methylated H4K20. Our data regarding H4K20 di-methylation and 53BP1 foci formation in response to Set8 knockdown differ from recently published results49, 50. However, one other study did note a defect in 53BP1 foci formation in response to Set8 siRNA treatment28. These discrepancies may be due to differing efficiencies of Set8 knockdown. Nevertheless, the data suggest that the defects in 53BP1 localization and H4K20 di-methylation are not critical to the loss of genome stability observed upon Set8 loss. Under some conditions we observed no effect on H4K20 di-methylation, yet H2AX phosphorylation was still strongly induced, and other studies made similar findings. In addition, loss of 53BP1 itself does not cause significant H2AX phosphorylation59.

Although our findings suggest that di-methylation of H4K20 is not required for genome stability, Set8-dependent mono-methylation of H4K20 may still be critical, as loss of this mark did correlate with increased H2AX phosphorylation. One possibility is that failure to methylate H4K20 leads to an overall change in chromatin structure that impedes progression of the replication fork and increases genome instability. Structural analysis shows that lysine 20 of histone H4 lies close to the histone-fold domain and is normally covered by DNA60. Therefore, unlike the majority of histone tail modifications, which do not contact this domain or DNA, H4K20 methylation has the potential to directly affect chromatin structure. Consistent with the idea that Set8 may modulate chromatin structure, we and others have observed a relaxation of chromatin in the absence of Set834. We cannot rule out the possibility that this change in chromatin structure is a result of the DNA damage induced upon Set8 loss. However, it is important to note that loss of Set8 in Drosophila leads to activation of the DNA damage checkpoints and to chromatin changes in the absence of detectable DNA damage27. Similar interactions are likely to be responsible for the effects of H3K56 acetylation on genome stability. This lysine residue also lies close to the histone fold, and its acetylation weakens the interactions between histones and DNA61, 62, affecting normal chromatin structure. Failure to acetylate K56 leads to defects in replication progression, increased sensitivity to DNA damage and increased genome instability6, 7. Interestingly, the strength of the checkpoint response has also been shown to be amplified in the context of more relaxed chromatin. This suggests that Set8-deficient cells might be primed for heightened checkpoint activation when they encounter cellular stress63.

It is also possible that Set8 has another substrate at the replication fork whose modification is essential for DNA replication. Indeed, the interaction between Set8 and PCNA may bring Set8 to the replication fork, giving it ready access to methylate replication-associated proteins. Targeted in vitro methylation studies showed no methylation of a panel of replication and checkpoint proteins by Set8. Even so, sites similar to the Set8 consensus methylation site (RHRK) are found in a variety of DNA replication, checkpoint and repair proteins, including Dbf4, ATM, and ASPM. Intriguingly, the H2AX phosphorylation induced by loss of Set8 can be suppressed by simultaneous knockdown of Rad51 or Mus81. Recombination or template switching may be needed to resolve naturally stalled replication forks during unperturbed DNA replication1. However, the effect of Set8 loss on γH2AX is strikingly large. It more closely resembles the effects of losing proteins involved in DNA replication than those of losing proteins involved in recombination. Altogether, Set8 may have multiple targets, including H4K20, through which it maintains genomic stability during S-phase. Nonetheless, the damage caused by loss of Set8 methylation can be alleviated by preventing the cleavage of stalled replication forks or by inhibiting recombination-type processes.
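As an illustration of how candidate sites resembling the Set8 consensus might be located, the sketch below scans a protein sequence for four-residue windows. Each window must end in lysine and may differ from RHRK at no more than one of the three preceding positions. This is a toy under stated assumptions, not the search performed in this work. The function name, the one-mismatch tolerance and the requirement that the final residue be lysine are illustrative choices. A motif hit says nothing about whether Set8 actually methylates the site in cells. The example sequence is the N-terminal 25 residues of histone H4, which contain the RHRK20 site.

```python
from typing import List, Tuple


def find_lysine_motif_sites(sequence: str, motif: str = "RHRK",
                            max_mismatches: int = 1) -> List[Tuple[int, str, int]]:
    """Return (1-based lysine position, matched window, mismatches) tuples.

    Only windows whose last residue is lysine (K) are considered, because that
    residue is the putative methylation target; mismatches are counted over
    the residues preceding it. The tolerance is an illustrative assumption.
    """
    if not motif.endswith("K"):
        raise ValueError("motif must end with the target lysine 'K'")
    seq = sequence.upper()
    width = len(motif)
    hits = []
    for end in range(width - 1, len(seq)):
        if seq[end] != "K":
            continue
        window = seq[end - width + 1:end + 1]
        mismatches = sum(a != b for a, b in zip(window[:-1], motif[:-1]))
        if mismatches <= max_mismatches:
            hits.append((end + 1, window, mismatches))
    return hits


# N-terminal 25 residues of histone H4 (initiator methionine removed).
h4_tail = "SGRGKGGKGLGKGGAKRHRKVLRDN"
print(find_lysine_motif_sites(h4_tail))  # [(20, 'RHRK', 0)]
```

The other lysines in the H4 tail (K5, K8, K12, K16) differ from the consensus at all three preceding positions, so only K20 is reported. Loosening `max_mismatches` would return more, and weaker, candidates.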

Whether the loss of Set8 and the resulting genomic instability predispose a cell to oncogenic transformation remains unclear. Loss of H4K20 methylation is a common phenotype of cancer cells64. Interestingly, a SNP in the 3’UTR of the Set8 gene that reduces Set8 expression has been associated with an earlier age of breast cancer onset65. However, complete deletion of the gene results in early embryonic lethality, demonstrating the essential role of Set8 in development51. Overall, the roles of Set8 in development, the maintenance of genome stability, and cancer progression will be of great interest for future study.

REFERENCES

1. Paulsen RD, Cimprich KA. The ATR pathway: fine-tuning the fork. DNA repair 2007; 6:953-66.

2. Aguilera A, Gomez-Gonzalez B. Genome instability: a mechanistic view of its causes and consequences. Nature reviews 2008; 9:204-17.

3. Groth A, Rocha W, Verreault A, Almouzni G. Chromatin challenges during DNA replication and repair. Cell 2007; 128:721-33.

4. Martin C, Zhang Y. Mechanisms of epigenetic inheritance. Current opinion in cell biology 2007; 19:266-72.

5. Kouzarides T. Chromatin modifications and their function. Cell 2007; 128:693-705.

6. Han J, Zhou H, Horazdovsky B, Zhang K, Xu RM, Zhang Z. Rtt109 acetylates histone H3 lysine 56 and functions in DNA replication. Science 2007; 315:653-5.

7. Driscoll R, Hudson A, Jackson SP. Yeast Rtt109 promotes genome stability by acetylating histone H3 on lysine 56. Science 2007; 315:649-52.

8. Trojer P, Li G, Sims RJ, 3rd, Vaquero A, Kalakonda N, Boccuni P, Lee D, Erdjument-Bromage H, Tempst P, Nimer SD, Wang YH, Reinberg D. L3MBTL1, a histone-methylation-dependent chromatin lock. Cell 2007; 129:915-28.

9. Trojer P, Reinberg D. Beyond histone methyl-lysine binding: How malignant brain tumor (MBT) protein L3MBTL1 impacts chromatin structure. Cell cycle 2008; 7.

10. Taverna SD, Li H, Ruthenburg AJ, Allis CD, Patel DJ. How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers. Nature structural & molecular biology 2007; 14:1025-40.

11. Ye X, Franco AA, Santos H, Nelson DM, Kaufman PD, Adams PD. Defective S phase chromatin assembly causes DNA damage, activation of the S phase checkpoint, and S phase arrest. Molecular cell 2003; 11:341-51.

12. Ramey CJ, Howar S, Adkins M, Linger J, Spicer J, Tyler JK. Activation of the DNA damage checkpoint in yeast lacking the histone chaperone anti-silencing function 1. Molecular and cellular biology 2004; 24:10313-27.

13. Tyler JK, Adams CR, Chen SR, Kobayashi R, Kamakaka RT, Kadonaga JT. The RCAF complex mediates chromatin assembly during DNA replication and repair. Nature 1999; 402:555-60.

14. Emili A, Schieltz DM, Yates JR, 3rd, Hartwell LH. Dynamic interaction of DNA damage checkpoint protein Rad53 with chromatin assembly factor Asf1. Molecular cell 2001; 7:13-20.

15. Hu F, Alcasabas AA, Elledge SJ. Asf1 links Rad53 to control of chromatin assembly. Genes & development 2001; 15:1061-6.

16. Gunjan A, Verreault A. A Rad53 kinase-dependent surveillance mechanism that regulates histone protein levels in S. cerevisiae. Cell 2003; 115:537-49.

17. Yang JH, Freudenreich CH. The Rtt109 histone acetyltransferase facilitates error-free replication to prevent CAG/CTG repeat contractions. DNA repair 2010; 9:414-20.

18. Masumoto H, Hawke D, Kobayashi R, Verreault A. A role for cell-cycle-regulated histone H3 lysine 56 acetylation in the DNA damage response. Nature 2005; 436:294-8.

19. Das C, Lucia MS, Hansen KC, Tyler JK. CBP/p300-mediated acetylation of histone H3 on lysine 56. Nature 2009; 459:113-7.

20. Xiao B, Jing C, Kelly G, Walker PA, Muskett FW, Frenkiel TA, Martin SR, Sarma K, Reinberg D, Gamblin SJ, Wilson JR. Specificity and mechanism of the histone methyltransferase Pr-Set7. Genes & development 2005; 19:1444-54.

21. Fang J, Feng Q, Ketel CS, Wang H, Cao R, Xia L, Erdjument-Bromage H, Tempst P, Simon JA, Zhang Y. Purification and functional characterization of SET8, a nucleosomal histone H4-lysine 20-specific methyltransferase. Curr Biol 2002; 12:1086-99.

22. Nishioka K, Rice JC, Sarma K, Erdjument-Bromage H, Werner J, Wang Y, Chuikov S, Valenzuela P, Tempst P, Steward R, Lis JT, Allis CD, Reinberg D. PR-Set7 is a nucleosome-specific methyltransferase that modifies lysine 20 of histone H4 and is associated with silent chromatin. Molecular cell 2002; 9:1201-13.

23. Yin Y, Liu C, Tsai SN, Zhou B, Ngai SM, Zhu G. SET8 recognizes the sequence RHRK20VLRDN within the N terminus of histone H4 and mono-methylates lysine 20. The Journal of biological chemistry 2005; 280:30025-31.

24. Couture JF, Collazo E, Brunzelle JS, Trievel RC. Structural and functional analysis of SET8, a histone H4 Lys-20 methyltransferase. Genes & development 2005; 19:1455-65.

25. Sanders SL, Portoso M, Mata J, Bahler J, Allshire RC, Kouzarides T. Methylation of histone H4 lysine 20 controls recruitment of Crb2 to sites of DNA damage. Cell 2004; 119:603-14.

26. Rice JC, Nishioka K, Sarma K, Steward R, Reinberg D, Allis CD. Mitotic-specific methylation of histone H4 Lys 20 follows increased PR-Set7 expression and its localization to mitotic chromosomes. Genes & development 2002; 16:2225-30.

27. Sakaguchi A, Steward R. Aberrant monomethylation of histone H4 lysine 20 activates the DNA damage checkpoint in Drosophila melanogaster. The Journal of cell biology 2007; 176:155-62.

28. Botuyan MV, Lee J, Ward IM, Kim JE, Thompson JR, Chen J, Mer G. Structural basis for the methylation state-specific recognition of histone H4-K20 by 53BP1 and Crb2 in DNA repair. Cell 2006; 127:1361-73.

29. Karachentsev D, Sarma K, Reinberg D, Steward R. PR-Set7-dependent methylation of histone H4 Lys 20 functions in repression of gene expression and is essential for mitosis. Genes & development 2005; 19:431-5.

30. Sautel CF, Cannella D, Bastien O, Kieffer S, Aldebert D, Garin J, Tardieux I, Belrhali H, Hakimi MA. SET8-mediated methylations of histone H4 lysine 20 mark silent heterochromatic domains in apicomplexan genomes. Molecular and cellular biology 2007; 27:5711-24.

31. Pesavento JJ, Yang H, Kelleher NL, Mizzen CA. Certain and progressive methylation of histone H4 at lysine 20 during the cell cycle. Molecular and cellular biology 2008; 28:468-86.

32. Yang H, Pesavento JJ, Starnes TW, Cryderman DE, Wallrath LL, Kelleher NL, Mizzen CA. Preferential dimethylation of histone H4 lysine 20 by Suv4-20. The Journal of biological chemistry 2008; 283:12085-92.

33. Garcia BA, Hake SB, Diaz RL, Kauer M, Morris SA, Recht J, Shabanowitz J, Mishra N, Strahl BD, Allis CD, Hunt DF. Organismal differences in post-translational modifications in histones H3 and H4. The Journal of biological chemistry 2007; 282:7641-55.

34. Houston SI, McManus KJ, Adams MM, Sims JK, Carpenter PB, Hendzel MJ, Rice JC. Catalytic function of the PR-Set7 histone H4 lysine 20 monomethyltransferase is essential for mitotic entry and genomic stability. The Journal of biological chemistry 2008; 283:19478-88.

35. Shi X, Kachirskaia I, Yamaguchi H, West LE, Wen H, Wang EW, Dutta S, Appella E, Gozani O. Modulation of p53 function by SET8-mediated methylation at lysine 382. Molecular cell 2007; 27:636-46.

36. Syljuasen RG, Sorensen CS, Hansen LT, Fugger K, Lundin C, Johansson F, Helleday T, Sehested M, Lukas J, Bartek J. Inhibition of human Chk1 causes increased initiation of DNA replication, phosphorylation of ATR targets, and DNA breakage. Molecular and cellular biology 2005; 25:3553-62.

37. Liu Q, Guntuku S, Cui XS, Matsuoka S, Cortez D, Tamai K, Luo G, Carattini-Rivera S, DeMayo F, Bradley A, Donehower LA, Elledge SJ. Chk1 is an essential kinase that is regulated by Atr and required for the G(2)/M DNA damage checkpoint. Genes & development 2000; 14:1448-59.

38. Myers JS, Cortez D. Rapid activation of ATR by ionizing radiation requires ATM and Mre11. The Journal of biological chemistry 2006; 281:9346-50.

39. Zou L. Single- and double-stranded DNA: building a trigger of ATR-mediated DNA damage response. Genes & development 2007; 21:879-85.

40. Jazayeri A, Falck J, Lukas C, Bartek J, Smith GC, Lukas J, Jackson SP. ATM- and cell cycle-dependent regulation of ATR in response to DNA double-strand breaks. Nature cell biology 2006; 8:37-45.

41. Matsuoka S, Huang M, Elledge SJ. Linkage of ATM to cell cycle regulation by the Chk2 protein kinase. Science 1998; 282:1893-7.

42. Zou L, Elledge SJ. Sensing DNA damage through ATRIP recognition of RPA-ssDNA complexes. Science 2003; 300:1542-8.

43. Santocanale C, Diffley JF. A Mec1- and Rad53-dependent checkpoint controls late-firing origins of DNA replication. Nature 1998; 395:615-8.

44. Ge XQ, Jackson DA, Blow JJ. Dormant origins licensed by excess Mcm2-7 are required for human cells to survive replicative stress. Genes & development 2007; 21:3331-41.

45. Schultz LB, Chehab NH, Malikzay A, Halazonetis TD. p53 binding protein 1 (53BP1) is an early participant in the cellular response to DNA double-strand breaks. The Journal of cell biology 2000; 151:1381-90.

46. Ward IM, Minn K, Jorda KG, Chen J. Accumulation of checkpoint protein 53BP1 at DNA breaks involves its binding to phosphorylated histone H2AX. The Journal of biological chemistry 2003; 278:19579-82.

47. Ward IM, Minn K, van Deursen J, Chen J. p53 Binding protein 53BP1 is required for DNA damage responses and tumor suppression in mice. Molecular and cellular biology 2003; 23:2556-63.

48. Wang B, Matsuoka S, Carpenter PB, Elledge SJ. 53BP1, a mediator of the DNA damage checkpoint. Science 2002; 298:1435-8.

49. Jorgensen S, Elvers I, Trelle MB, Menzel T, Eskildsen M, Jensen ON, Helleday T, Helin K, Sorensen CS. The histone methyltransferase SET8 is required for S-phase progression. The Journal of cell biology 2007; 179:1337-45.

50. Tardat M, Murr R, Herceg Z, Sardet C, Julien E. PR-Set7-dependent lysine methylation ensures genome replication and stability through S phase. The Journal of cell biology 2007; 179:1413-26.

51. Oda H, Okamoto I, Murphy N, Chu J, Price SM, Shen MM, Torres-Padilla ME, Heard E, Reinberg D. Monomethylation of histone H4-lysine 20 is involved in chromosome structure and stability and is essential for mouse development. Molecular and cellular biology 2009; 29:2278-95.

52. Schotta G, Lachner M, Sarma K, Ebert A, Sengupta R, Reuter G, Reinberg D, Jenuwein T. A silencing pathway to induce H3-K9 and H4-K20 trimethylation at constitutive heterochromatin. Genes & development 2004; 18:1251-62.

53. Huen MS, Sy SM, van Deursen JM, Chen J. Direct interaction between SET8 and proliferating cell nuclear antigen couples H4-K20 methylation with DNA replication. The Journal of biological chemistry 2008; 283:11073-7.

54. Li X, Heyer WD. Homologous recombination in DNA repair and DNA damage tolerance. Cell research 2008; 18:99-113.

55. Adar S, Izhar L, Hendel A, Geacintov N, Livneh Z. Repair of gaps opposite lesions by homologous recombination in mammalian cells. Nucleic Acids Res 2009; 37:5737-48.

56. Osman F, Whitby MC. Exploring the roles of Mus81-Eme1/Mms4 at perturbed replication forks. DNA repair 2007; 6:1004-17.

57. Hanada K, Budzowska M, Davies SL, van Drunen E, Onizawa H, Beverloo HB, Maas A, Essers J, Hickson ID, Kanaar R. The structure-specific endonuclease Mus81 contributes to replication restart by generating double-strand DNA breaks. Nature structural & molecular biology 2007; 14:1096-104.

58. Huen MS, Sy SM, van Deursen JM, Chen J. Direct interaction between SET8 and PCNA couples H4-K20 methylation with DNA replication. The Journal of biological chemistry 2008.

59. DiTullio RA, Jr., Mochan TA, Venere M, Bartkova J, Sehested M, Bartek J, Halazonetis TD. 53BP1 functions in an ATM-dependent checkpoint pathway that is constitutively activated in human cancer. Nature cell biology 2002; 4:998-1002.

60. Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 1997; 389:251-60.

61. Peterson CL. Genome integrity: a HAT needs a chaperone. Curr Biol 2007; 17:R324-6.

62. Ozdemir A, Masumoto H, Fitzjohn P, Verreault A, Logie C. Histone H3 lysine 56 acetylation: a new twist in the chromosome cycle. Cell cycle 2006; 5:2602-8.

63. Murga M, Jaco I, Fan Y, Soria R, Martinez-Pastor B, Cuadrado M, Yang SM, Blasco MA, Skoultchi AI, Fernandez-Capetillo O. Global chromatin compaction limits the strength of the DNA damage response. The Journal of cell biology 2007; 178:1101-8.

64. Van Den Broeck A, Brambilla E, Moro-Sibilot D, Lantuejoul S, Brambilla C, Eymin B, Khochbin S, Gazzeri S. Loss of histone H4K20 trimethylation occurs in preneoplasia and influences prognosis of non-small cell lung cancer. Clin Cancer Res 2008; 14:7237-45.

65. Song F, Zheng H, Liu B, Wei S, Dai H, Zhang L, Calin GA, Hao X, Wei Q, Zhang W, Chen K. An miR-502-binding site single-nucleotide polymorphism in the 3'-untranslated region of the SET8 gene is associated with early age of breast cancer onset. Clin Cancer Res 2009; 15:6292-300.

CHAPTER 6

Outlook and Future Directions

Limitations and future directions of the DNA damage siRNA screens

The end of any experiment should leave one with more questions than answers. This was certainly the case with our screens, and deciphering the novel roles of the identified genes in maintaining genomic integrity should be a fruitful and clinically relevant endeavor in the lab for years to come. However, several challenges and qualifications regarding our screening results should be addressed.

First, our dataset may contain genes that scored as inducing genomic instability because of siRNA off-target effects. To minimize this possibility, during our validation studies we retested the ability of the siRNAs to induce γH2AX. We did this by transfecting cells with each of the four individual siRNAs that made up the original pool. The general consensus in the siRNA screening field is that the more independent siRNAs produce the same phenotype, the more confidence can be placed in the result. Conversely, if only one of four siRNAs produces a phenotype, the phenotype is more likely due to an off-target effect than to knockdown of the gene of interest. While this is a valid approach, we have so far deconvoluted only a small fraction of the hits identified by both genomic screens. An equally accepted and technically more realistic approach for validating a larger portion of our hits would be to order or generate a different siRNA library against the same genes and retest their ability to induce γH2AX.
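The consensus logic described above can be written as a simple decision rule. The sketch below is a minimal illustration under stated assumptions, not the scoring criterion used for the deconvoluted genes in this work. It assumes that an individual siRNA "scores" when its percent γH2AX-positive cells reach a fixed multiple of the control value. The two-fold cutoff, the requirement that at least two of four siRNAs score, the function name and the gene names and values are all hypothetical.

```python
from typing import Dict, List


def classify_deconvolution(pct_positive: Dict[str, List[float]],
                           control_pct: float,
                           fold_cutoff: float = 2.0,
                           min_active: int = 2) -> Dict[str, str]:
    """Call each gene from the γH2AX scores of its individual siRNAs.

    pct_positive maps a gene to the percent γH2AX-positive cells obtained with
    each individual siRNA from the original pool. An siRNA counts as active
    when it reaches fold_cutoff times the control value (assumed rule).
    """
    calls = {}
    for gene, values in pct_positive.items():
        active = sum(value >= fold_cutoff * control_pct for value in values)
        if active >= min_active:
            calls[gene] = "confirmed"
        elif active == 1:
            calls[gene] = "single siRNA (possible off-target)"
        else:
            calls[gene] = "not confirmed"
    return calls


# Hypothetical deconvolution data for three genes, four siRNAs each.
example = {
    "GENE_A": [20.0, 15.0, 4.0, 12.0],
    "GENE_B": [30.0, 4.0, 5.0, 6.0],
    "GENE_C": [6.0, 7.0, 5.0, 8.0],
}
print(classify_deconvolution(example, control_pct=5.0))
```

With these made-up values, GENE_A is confirmed (three siRNAs reach 10%), GENE_B is flagged as a single-siRNA hit, and GENE_C is not confirmed. Both cutoffs are free parameters that would need to be set from the control distribution.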

Furthermore, the siRNA screens, siRNA validation, and most follow-up studies were all performed in the HeLa cell line. This line was ideal for screening because it is readily transfected and well suited to immunofluorescence staining. However, it is a cancer cell line and therefore not an accurate representation of a typical human cell. Many of the genes we identified could thus have produced a heightened γH2AX response because of the endogenous stress and increased replicative potential of cancer cells. Validating the siRNA screening hits by retesting their phenotype in a more “normal” cell line would provide a better picture of which siRNAs induce damage specifically in a cancer context and which induce damage more generally. Ultimately, differentiating between the two could be very useful for identifying novel cancer therapy targets.

A further challenge going forward will be to prove that loss of our identified hits does indeed cause genomic instability. γH2AX is generally accepted as an excellent marker of DNA damage and is thought to be found at sites of DNA double-strand breaks. However, it has also been observed under conditions of replication stress and in response to UV irradiation1, 2. Therefore, while γH2AX is a good marker of checkpoint activation, it does not necessarily correlate with the physical induction of breaks in the DNA. To address this, we have examined independent readouts for double-strand break formation on a limited basis, including 53BP1 and phospho-KAP1 (KRAB-associated protein 1, data not shown). Although these methods are applicable to a high-throughput format, they do not directly show DNA fragmentation or rearrangements. DNA breaks can be visualized by many methods, including pulsed-field gel electrophoresis, mitotic chromosome spreads, the TUNEL assay, or the comet assay. However, none of these methods is technically feasible in a high-throughput format, so this validation must be done on a gene-by-gene basis.

The remaining challenges of the aphidicolin dataset

Our original intention in designing two siRNA screens was to use the aphidicolin dataset to narrow down our hit list, as we were primarily interested in genes that cause γH2AX in the presence, but not the absence, of aphidicolin. In retrospect, this reasoning was rather naïve. Many genes involved in replication fork stability or in preventing DNA damage during S-phase would likely cause a γH2AX phenotype even without exogenous damage, simply because they are required during DNA replication. Indeed, many of the genes we expected to induce γH2AX, including genes involved in DNA replication and DNA repair, exhibited a high γH2AX signal in the absence of aphidicolin. This signal was then further increased in the presence of the drug.

Given the narrowly defined question, we expected the number of genes inducing γH2AX in the presence of aphidicolin to be relatively small. However, the number of significant hits from the screen in the presence of aphidicolin was nearly sevenfold higher than in its absence. Many of these effects are likely due to the cell cycle synchronization caused by aphidicolin. Because DNA damage is slightly higher during S-phase, any perturbation of cellular processes needed for proper replication could potentially produce a γH2AX phenotype. Deciphering which hits are truly involved in replication fork stability therefore remains an ongoing challenge. One method that may help would be to compare the γH2AX intensity within each cell after gene knockdown in both the presence and absence of aphidicolin. Our current analysis measures the percentage of cells that stain positively for γH2AX, using an intensity threshold to define a positive cell. However, this analysis ignores how much the γH2AX intensity changes between the absence and presence of aphidicolin. Fragile sites and replication fork collapse are known to be induced by aphidicolin3. Genes involved in replication fork stability would therefore be expected to show a greater change in γH2AX intensity upon aphidicolin treatment.
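The difference between the current threshold-based readout and the proposed intensity comparison can be illustrated with a short sketch. It assumes that per-cell γH2AX intensities are available for each knockdown with and without aphidicolin. The cutoff value, the use of the median, the log2 ratio as the summary of the shift, the function names and the example intensities are illustrative assumptions, not the screen's actual analysis parameters.

```python
import math
from statistics import median
from typing import Dict, Sequence


def percent_positive(intensities: Sequence[float], threshold: float) -> float:
    """Percentage of cells whose γH2AX intensity exceeds a fixed cutoff."""
    return 100.0 * sum(x > threshold for x in intensities) / len(intensities)


def aphidicolin_response(untreated: Sequence[float], treated: Sequence[float],
                         threshold: float) -> Dict[str, float]:
    """Compare threshold-based scoring with a per-cell intensity shift."""
    return {
        "pct_pos_untreated": percent_positive(untreated, threshold),
        "pct_pos_treated": percent_positive(treated, threshold),
        # log2 ratio of median per-cell intensity, +aphidicolin over -aphidicolin
        "log2_median_shift": math.log2(median(treated) / median(untreated)),
    }


# Two hypothetical knockdowns with identical untreated cells.
untreated = [50.0, 60.0, 150.0, 40.0]
kd_x = aphidicolin_response(untreated, [120.0, 130.0, 110.0, 90.0], threshold=100.0)
kd_y = aphidicolin_response(untreated, [400.0, 500.0, 450.0, 90.0], threshold=100.0)
print(kd_x)
print(kd_y)
```

Both hypothetical knockdowns move from 25% to 75% positive cells and are indistinguishable by the threshold readout. By contrast, the median intensity shift is roughly 1.1 versus 3.0 log2 units, which is the kind of difference the proposed analysis is meant to capture.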

Another method for deciphering which siRNA screening hits are involved in preventing replication fork collapse would be a refined replication recovery assay. The assay we currently use to measure the ability of cells to proceed into mitosis, in either the absence or presence of stalled replication forks, relies solely on a mitotic marker. Therefore, if a recovery defect is observed, we cannot determine whether the cells that failed to reach mitosis were having problems during replication or were arrested in some other phase of the cell cycle. Labeling with a replication marker such as EdU before tracking the cells into mitosis would distinguish siRNAs that cause replication recovery defects from those that cause cell cycle arrest.


Co-transcriptional processes

The identification of such a large number of mRNA processing genes involved in preventing genomic instability opens many new avenues at the interface between checkpoint activation and co-transcriptional processes. Identifying the source of the DNA damage caused by loss of mRNA processing genes is ongoing, but a likely candidate is the formation of RNA/DNA hybrid structures, or R-loops, which may cause damage both during and outside of replication. While demonstrating that these structures exist in cells has been technically difficult, studying how they affect replication fork progression and how they are processed by the cell may ultimately be important for our understanding of transcription-induced DNA damage. An in vitro system could prove very valuable for the initial characterization of R-loop formation.

The question also remains: why would the DNA damage checkpoint regulate co-transcriptional processes? Interestingly, the checkpoint has putative targets both in the core spliceosomal machinery and among several splicing regulatory or accessory factors⁴. One possibility is that the checkpoint regulates alternative splicing by altering the core machinery to selectively splice transcripts with particular splice sites. Alternatively, the checkpoint may regulate the mRNA processing machinery by phosphorylating spliceosomal regulatory factors to prevent DNA damage caused by transcriptional processes. Notably, several of the identified mRNA processing genes contain RNA helicase domains and thus could potentially remove R-loop structures and prevent the accumulation of DNA damage. Regulation of these activities could be a function of the DNA damage response both during unperturbed DNA replication and in response to replicative stress.

Charcot-Marie-Tooth disease

Identification of the Charcot-Marie-Tooth (CMT) genes as an enriched category among the siRNA screening hits highlights connections between DNA damage response genes and neurodegeneration. However, many questions remain regarding the clinical relevance and the mechanisms leading to the observed DNA damage. First and foremost, is the DNA damage we see a cause or simply a consequence of CMT disease? The CMT genes were originally grouped clinically because patients share a peripheral neuropathy phenotype; however, molecular characterization has shown that they encode a heterogeneous group of proteins with a variety of cellular functions⁵,⁶. Loss of these genes may therefore not lead to DNA damage through a common mechanism; rather, the damage may reflect the diverse roles of these genes in protecting the genome from instability. Published evidence suggests that some of these roles may include promoting proper nuclear structure or chromosome segregation during mitosis⁵,⁷⁻⁹. However, other, as yet undefined, mechanisms for inducing genomic instability are equally plausible. Interestingly, several neurodegenerative disorders, including Alzheimer’s, Parkinson’s, and Huntington’s disease, are known to increase oxidative stress within the cell and thus could also increase DNA damage, potentially contributing to neuronal stress and apoptosis¹⁰. Whether increased DNA damage is a common phenotype among neurological diseases remains to be tested. Beyond the CMT genes, our screens also identified the DMD (Duchenne muscular dystrophy) gene and many genes involved in the pathology of Huntington’s disease.

The role of Set8 in maintaining genomic stability

The discovery of the chromatin-modifying enzyme Set8 as a mediator of genomic stability during DNA replication was also surprising for several reasons. First, the extent of DNA damage induced by Set8 knockdown was striking: the largest inducers of γH2AX were mostly genes with known roles in DNA replication or checkpoint activation, such as RPA, CHK1, and the DNA polymerases. This observation first led us to investigate the role of Set8 during DNA replication. Second, the damage appeared to be specific to Set8 loss and was not observed upon loss of other chromatin-modifying enzymes. Many histone modifications exist in the cell, each with its own catalytic enzymes; if general regulation of chromatin structure were important for genomic stability, one would expect our screen to have identified many chromatin-modifying enzymes as hits, which was not the case. This observation raises the question of what is special about loss of Set8. One possibility is that H4K20 methylation is specifically required to prevent DNA damage, whether by promoting proper chromosome structure, facilitating efficient DNA replication, or promoting proper DNA repair through recruitment of the 53BP1 protein. While the latter seems unlikely, as loss of 53BP1 is not known to induce significant amounts of γH2AX¹¹, examining the effect of H4K20 methylation on DNA replication would be of great interest. Alternatively, Set8 may have other targets whose regulation is critical for genomic stability. Set8 has been shown to methylate p53 to regulate its function¹², and several other putative targets with similar methylation consensus sites have DNA repair or checkpoint roles; their regulation could also explain the extensive DNA damage observed upon Set8 loss.

Concluding remarks

The technological advances of the past decade have transformed the scope of biological questions we can address. With functional genomics, we now have the capability to “perturb and observe” every gene in the human genome and thus to decipher the sources of human disease. While this approach is beautifully simple in theory, many diseases, such as cancer, are complex, with multifaceted causes. However, if we simplify our questions by attempting to understand underlying disease phenotypes such as uncontrolled proliferation or genomic instability, we can begin to understand the potential causes of these complicated diseases.

Genomic instability is known to take many forms and to be a hallmark of cancer; however, its cellular causes and consequences remain to be fully understood. Here we have shown that the pathways and processes affecting genome stability are much broader than anticipated, and determining the relevance of the identified genes to cancer formation and progression will be a subject of great interest moving forward. Ultimately, the worth of a functional genomic screen is determined by the insight brought forth from its results. While we have begun to validate and test hypotheses regarding the roles of a portion of the genes identified by the screen, many others have yet to be considered. Deciphering the mechanisms and clinical relevance of the genes involved in maintaining genomic stability will keep researchers busy for years to come.


References

1. Ward IM, Chen J. Histone H2AX is phosphorylated in an ATR-dependent manner in response to replicational stress. The Journal of biological chemistry 2001; 276:47759-62.

2. Hanasoge S, Ljungman M. H2AX phosphorylation after UV irradiation is triggered by DNA repair intermediates and is mediated by the ATR kinase. Carcinogenesis 2007; 28:2298-304.

3. Arlt MF, Casper AM, Glover TW. Common fragile sites. Cytogenetic and genome research 2003; 100:92-100.

4. Matsuoka S, Ballif BA, Smogorzewska A, McDonald ER, 3rd, Hurov KE, Luo J, Bakalarski CE, Zhao Z, Solimini N, Lerenthal Y, Shiloh Y, Gygi SP, Elledge SJ. ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage. Science (New York, NY) 2007; 316:1160-6.

5. Berger P, Young P, Suter U. Molecular cell biology of Charcot-Marie-Tooth disease. Neurogenetics 2002; 4:1-15.

6. Niemann A, Berger P, Suter U. Pathomechanisms of mutant proteins in Charcot-Marie-Tooth disease. Neuromolecular medicine 2006; 8:217-42.

7. Pollex RL, Hegele RA. Hutchinson-Gilford progeria syndrome. Clinical genetics 2004; 66:375-81.

8. Hamao K, Morita M, Hosoya H. New function of the proline rich domain in dynamin-2 to negatively regulate its interaction with microtubules in mammalian cells. Exp Cell Res 2009; 315:1336-45.

9. Thompson HM, Skop AR, Euteneuer U, Meyer BJ, McNiven MA. The large GTPase dynamin associates with the spindle midzone and is required for cytokinesis. Curr Biol 2002; 12:2111-7.

10. Staropoli JF. Tumorigenesis and neurodegeneration: two sides of the same coin? Bioessays 2008; 30:719-27.

11. DiTullio RA, Jr., Mochan TA, Venere M, Bartkova J, Sehested M, Bartek J, Halazonetis TD. 53BP1 functions in an ATM-dependent checkpoint pathway that is constitutively activated in human cancer. Nature cell biology 2002; 4:998-1002.

12. Shi X, Kachirskaia I, Yamaguchi H, West LE, Wen H, Wang EW, Dutta S, Appella E, Gozani O. Modulation of p53 function by SET8-mediated methylation at lysine 382. Molecular cell 2007; 27:636-46.



APPENDIX A siRNA Screening Protocols and Analysis Methods

A.1 Robotic siRNA Procedures

A.1.1 High throughput Reagents, Buffers, and Equipment Used

Reagents:

Reagent | Vendor | Item Number | Lot # / Date Made | Misc.
Human siARRAY Genomic Smart Pools siRNA oligos | Dharmacon | G-005000-25 | Order# 74569; diluted 7/2006 | 10 μL of 2 μM
DMEM | Invitrogen | 11960-044 | 1398500 | +4.5 g D-Glucose, -L-Glut, -sodium pyruvate
OptiMEM | Invitrogen | 31985-070 | 1354865 | +HEPES, +2.4 g sodium bicarbonate, +L-Glut
Fetal Bovine Serum | Invitrogen | 10437-028 | 1320238 | origin (Mex)
200 mM L-Glutamine | Invitrogen | 25030-081 | 1384126 | 100X
Dharmafect #1 | Dharmacon | T-2001-03 | 070220T |
Aphidicolin | Sigma-Aldrich | A-0781 | 094K4104 |
Methanol | Fisher | A433P-4 | 062946W |
Bovine Serum Albumin (BSA), Fraction V | Equitech Bio INC | BAC62-1000 | BAC62-716 |
TBS | | | |
Anti-γH2AX Antibody (Rabbit) | Cell Signaling | 2577 | Lot 2 | use 1:500
Goat-Anti-Rabbit Alexa 488 | Invitrogen | A11008 | 52357A | use 1:1000
Propidium Iodide | Calbiochem | 537059 | B13972 |
RNAse A | Sigma-Aldrich | R4642 | |
Chk1 SmartPool siRNA | Dharmacon | M-003255-02-0020 | | Sequence Available
Nontargeting Control siRNA Pool #1 | Dharmacon | D-001206-13-20 | |


Equipment List:

Equipment Name | Vendor | Item #/Model Number | Lot/Serial Number | Misc.
PlateLoc | Velocity11 | 01867.001 | 1.00406 |
BenchCel 4X | Velocity11 | 08344.004 | 20.00158.0040 |
VPrep | Velocity11 | 02318.102 | 2.00158.1000 |
96 Channel LT Head-VPrep | Velocity11 | 04730.002 | 12.00102 |
VPrep Tips | Velocity11 | 06879.002 | | 60 μL-96LT
Clear-bottom 384-well plates | E&K Scientific | EK-30091 | | 384-well black-well Greiner polystyrene plates
Multidrop 384 | Titertek | 5840200 | 832003819 | For plating cells
WellMate Dispenser | Matrix | 201-10001 | 119542592 |
WellMate Stacker | Matrix | 201-20001 | 201-2-0107 |
WellMate Tubing-Small Bore | Matrix | 201-30002 | 116730014 | Generic: cells, antibodies, etc.
WellMate Tubing-Small Bore | Matrix | 201-30002 | 118340021 | Dharmafect (RNAse Free)
WellMate Tubing-Small Bore | Matrix | 201-30002 | 118340006 | DMEM (RNAse Free)
Centrifuge | Beckman | Allegra-6 | AL599317 |
Plate Washer Low Flow | Bio-Tek | ELx405UCW | 190953 |
BioStack | Bio-Tek | BioStack | 196344 |
ImageXpress 5000e | Axon (Molecular Devices) | 5000e | IX96020 |
Isocyte | Blueshift Biotech | | |

Buffers:

siRNA OptiMEM Buffer (300 mL for 40 96-well siRNA plates)
   300 mL OptiMEM (RNAse Free)
   300 mL Total Volume

Dharmafect Buffer (250 mL for 60 384-well assay plates); final concentration: 0.03 μL Dharmafect/well
   250 mL OptiMEM (RNAse Free)
   750 μL Dharmafect
   250 mL Total Volume

Cell Media (750 mL for 60 384-well assay plates); final concentration: 10% FBS, 2 mM L-Glutamine
   675 mL DMEM
   75 mL FBS
   7.5 mL L-Glutamine

6X Aphidicolin Buffer (150 mL for 30 384-well assay plates); final concentration: 400 nM Aphidicolin
   150 mL DMEM
   2 μL 30 mM Aphidicolin
   150 mL Total Volume

Blocking Buffer (750 mL for 60 384-well assay plates); final concentration: 2% BSA in 1X TBS
   750 mL 1X TBS
   15 g BSA
   750 mL Total Volume

γH2AX Antibody (250 mL for 60 384-well assay plates); final concentration: 2% BSA, 1:500 γH2AX antibody in 1X TBS
   250 mL 1X TBS
   5 g BSA
   0.5 mL γH2AX Antibody lot #2
   250 mL Total Volume

Secondary Antibody (250 mL for 60 384-well assay plates); final concentration: 2% BSA, 1:1000 Goat-anti-Rabbit antibody in 1X TBS
   250 mL 1X TBS
   5 g BSA
   0.25 mL Goat-anti-Rabbit Antibody
   250 mL Total Volume

3X Propidium Iodide/RNAse A Staining Solution (700 mL for 60 384-well assay plates); final concentration (1X): 10 μg/mL RNAse A, 0.1 μg/mL PI
   700 mL 1X TBS
   210 μL 1 mg/mL Propidium Iodide
   840 μL 25 mg/mL RNAse A
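The final concentrations listed for these buffers follow from C1V1 = C2V2. A minimal Python check of that arithmetic might look like the sketch below; it uses only the stock concentrations and volumes given in the recipes and, like the recipes, ignores the small volume of additions except for the cell media, where the three components are summed. The helper name `dilute` is hypothetical.

```python
def dilute(stock_conc, stock_volume, total_volume):
    """C1*V1 = C2*V2. Volumes must share a unit; result has the unit of stock_conc."""
    return stock_conc * stock_volume / total_volume

UL_PER_ML = 1000.0

# 6X aphidicolin buffer: 2 uL of 30 mM (= 30e6 nM) stock in 150 mL DMEM -> nM
aphidicolin_nM = dilute(30e6, 2.0, 150 * UL_PER_ML)

# Dharmafect buffer: 750 uL per 250 mL (uL per uL of buffer); 10 uL buffer per well
dharmafect_uL_per_well = dilute(1.0, 750.0, 250 * UL_PER_ML) * 10.0

# Cell media: 7.5 mL of 200 mM L-glutamine in 675 + 75 + 7.5 mL -> mM
glutamine_mM = dilute(200.0, 7.5, 675 + 75 + 7.5)

# Blocking buffer: 15 g BSA in 750 mL -> % (w/v)
bsa_percent = 15 / 750 * 100

# Antibody dilutions expressed as 1:X
primary_dilution = 250 / 0.5      # 0.5 mL gammaH2AX antibody in 250 mL
secondary_dilution = 250 / 0.25   # 0.25 mL goat-anti-rabbit in 250 mL

# 3X PI/RNAse A solution in 700 mL TBS, in ug/mL; 1X is one third of these values
pi_3x = dilute(1000.0, 210.0, 700 * UL_PER_ML)      # 1 mg/mL = 1000 ug/mL stock
rnase_3x = dilute(25000.0, 840.0, 700 * UL_PER_ML)  # 25 mg/mL = 25000 ug/mL stock

print(aphidicolin_nM, dharmafect_uL_per_well, round(glutamine_mM, 2), bsa_percent)
print(primary_dilution, secondary_dilution, pi_3x / 3, rnase_3x / 3)
```

The computed values agree with the stated final concentrations: 400 nM aphidicolin, 0.03 μL Dharmafect per well, about 2 mM (1.98 mM) L-glutamine, 2% BSA, 1:500 and 1:1000 antibody dilutions, and, once the 3X solution is diluted to 1X, 0.1 μg/mL PI and 10 μg/mL RNAse A.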

A.1.2 384 Well Robotic Transfection

Step 1. Library Preparation

One copy (0.25 nmol per well of desiccated siRNA pools targeting 21,122 genes in 267 96-well plates, 80 siRNA pools/plate) of the siARRAY whole human genome siRNA library from Thermo Fisher Scientific (formerly Dharmacon; Cat# G-005000-025) was diluted to 2 μM with 125 μL of 1X siRNA buffer (20 mM KCl, 6 mM HEPES pH 7.5, and 0.2 mM MgCl2) using a Velocity11 VPrep with a 96-tip disposable tip head. Plates were immediately heat-sealed using an integrated PlateLoc and BenchCel (Velocity11) and then frozen at -20 °C. After 24 hours, the plates were thawed, and 10 μL was transferred with the VPrep to single-use 96-well polypropylene U-bottom plates. The “daughter” plates were immediately sealed and stored at -20 °C until use.

Step 2. Adding siRNA DMEM Buffer to siRNA Plates.

In this step, the Matrix Wellmate will add 70 μL of siRNA DMEM Buffer to each well of one set of siRNA Daughter plates (0.02 nmol siRNA per well), resulting in 80 μL of 250 nM siRNA pools.

1. Wipe Benches with RNaseZAP using RNaseZAP wipes.

2. Thaw one set of siRNA Daughter plates (room temperature, 1 hour).

3. Spin Plates in Beckman Allegra 5 minutes, 600 RPM. Remove and save lids of siRNA plates. Remove foil covers. Use gloves.

4. Aliquot 1 μL of 20 μM Chk1 siRNA to wells B2 and F2 of all siRNA plates.

5. Aliquot 1 μL of 20 μM Nontargeting Pool #1 to wells C2 and G2 of all siRNA plates.

6. Matrix Wellmate setup: Use Small Bore Tubing set #118340006; wash with ethanol and ddH2O, and then prime with siRNA DMEM Buffer.

7. Program Wellmate to dispense 70 μL to all wells of 96-well plates.

8. Place siRNA Daughter plates into stacker.

9. Press Start to fill all the plates.

10. After the run is done, wash the tubing with ethanol and ddH2O and store it in the RNA drawer.
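The amounts in Steps 1 and 2 can be traced with a few lines of Python. This sketch (function and constant names hypothetical) uses the per-well figures given above: 0.25 nmol of desiccated pool resuspended in 125 μL, 10 μL moved to a daughter plate, and 70 μL of buffer added. It also confirms that 1 μL of 20 μM control siRNA delivers the same 0.02 nmol as a library well.

```python
NM_UL_PER_NMOL = 1e6  # 1 nmol dissolved in 1 uL is 1e6 nM

def conc_nM(nmol, volume_uL):
    """Concentration (nM) of `nmol` dissolved in `volume_uL` microlitres."""
    return nmol / volume_uL * NM_UL_PER_NMOL

def amount_nmol(concentration_nM, volume_uL):
    """Amount (nmol) contained in `volume_uL` of a `concentration_nM` solution."""
    return concentration_nM * volume_uL / NM_UL_PER_NMOL

master_nM = conc_nM(0.25, 125.0)                  # resuspended library plate
daughter_nmol = amount_nmol(master_nM, 10.0)      # moved to a daughter plate
working_nM = conc_nM(daughter_nmol, 10.0 + 70.0)  # after adding 70 uL buffer
control_nmol = amount_nmol(20_000.0, 1.0)         # 1 uL of 20 uM control siRNA

print(master_nM, daughter_nmol, working_nM, control_nmol)
```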

Step 3. Aliquoting to 384 Well Cell Plates.

The siRNA daughter plates now contain 80 μL of dissolved siRNA pools. This step will transfer 10 μL from each well to a quadrant in a 384-well cell plate (4 plates total).


1. Label clear-bottom black 384-well Greiner plates (EK-30091) with a barcode consisting of NT for “no treatment” or AP for aphidicolin treatment, followed by the date (year, month, day), a serial number (_01 to _20), and a replicate designation (A or B). Use the Excel file “cimprich.xls” on the barcode printer computer to generate the barcode list and the WASP software with the “cimprich.lab” template to print the barcodes. Apply the barcodes on the A1-P1 side. Example: NT070514_01A. Make a second copy of the barcode using cimprich2.lab and apply it to the P1-P24 side (this displays the barcode more prominently).

2. Set up the Velocity11 as follows: Stack 1: 96 LT 200 μL tips (1 box for each siRNA Daughter plate; label tip boxes with numbers 1 to 40 for each siRNA plate). Stack 2: 384-well cell plates.

3. VPrep setup: Shelf 2, siRNA Plate #1; Shelf 4, siRNA Plate #2; Shelf 6, siRNA Plate #3; Shelf 8, siRNA Plate #4.

4. Open “4assay4siRNA.bwl” protocol in siRNA Folder and press Start. Enter number of runs (equal to number of siRNA Daughter plates).

5. After the run is complete, collate sets by suffix number, cover the Master plates with their original polystyrene lids, and place all plates at -20 °C (using one rack per replicate set).
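Two bookkeeping details of Step 3 lend themselves to a short Python sketch. The barcode format follows the example given above (treatment code, YYMMDD date, two-digit serial, replicate letter). The quadrant mapping is an assumption, since the protocol does not state how the 96-channel head offsets each siRNA plate: it uses the common interleaved layout in which quadrant 0 starts at A1, quadrant 1 at A2, quadrant 2 at B1, and quadrant 3 at B2. All function names are hypothetical.

```python
from datetime import date

ROWS_96 = "ABCDEFGH"
ROWS_384 = "ABCDEFGHIJKLMNOP"

def plate_barcode(treatment: str, run_date: date, serial: int, replicate: str) -> str:
    """Barcode such as NT070514_01A: treatment, YYMMDD, _serial, replicate."""
    if treatment not in ("NT", "AP"):
        raise ValueError("treatment must be 'NT' or 'AP'")
    if not 1 <= serial <= 20:
        raise ValueError("serial must be between 1 and 20")
    if replicate not in ("A", "B"):
        raise ValueError("replicate must be 'A' or 'B'")
    return f"{treatment}{run_date:%y%m%d}_{serial:02d}{replicate}"

def to_384(well_96: str, quadrant: int) -> str:
    """Map a 96-well position to a 384-well position (assumed interleaved layout)."""
    if quadrant not in range(4):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    row = ROWS_96.index(well_96[0])
    col = int(well_96[1:]) - 1
    row_384 = 2 * row + quadrant // 2
    col_384 = 2 * col + quadrant % 2
    return f"{ROWS_384[row_384]}{col_384 + 1}"

# Control wells present in every 96-well daughter plate (Step 2, items 4 and 5)
CHK1_WELLS = ("B2", "F2")
NONTARGETING_WELLS = ("C2", "G2")

positive_384 = sorted(to_384(w, q) for w in CHK1_WELLS for q in range(4))
negative_384 = sorted(to_384(w, q) for w in NONTARGETING_WELLS for q in range(4))

print(plate_barcode("NT", date(2007, 5, 14), 1, "A"))
print(positive_384)
print(negative_384)
```

Under this assumed layout, each 384-well plate carries eight Chk1 and eight non-targeting control wells, consistent with the eight negative control replicates per plate mentioned in Section A.2.3.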

Step 4. Adding Dharmafect Buffer to 384 Well Cell Plates.

In this step, the Matrix Wellmate will add 10 μL of Dharmafect Buffer to each well of the 384-well cell plates.

1. Matrix Wellmate setup: Use Small Bore Tubing set #118340021; wash with ethanol, ddH2O, and OptiMEM, and then prime with Dharmafect Buffer.

2. Program Wellmate to dispense 10 μL to all wells of 384-well plates.

3. Place 384-well cell plates (with siRNA) into stacker.

4. Press Start to fill all the plates.

5. After the run is done, wash the tubing with ethanol and ddH2O and store it in the RNA drawer.


Step 5. Adding Cells to 384-well Plates.

In this step, the Matrix Wellmate will add 30 μL of HeLa cells to each well of the 384-well cell plates. This will be performed on Day 0 and must begin no earlier than 30 minutes and no later than 2 hours after the start of Step 3.

1. Use the Matrix Wellmate to plate the cells from a 25,000 cells/mL suspension. Use small bore tubing (generic # 116730014), washing first with ethanol, then ddH2O, and then priming with cells.

2. Program Wellmate to dispense 30 μL to all wells of 384-well plates.

3. After every sixth plate, pause the Wellmate and mix the cell suspension.

4. After adding cells to plates, let sit at room temperature on lab bench for 15 to 30 minutes to let cells settle and attach.

5. Place in a 37 °C CO2 incubator.

6. Press the Empty button on the Wellmate to empty the tubing. Then wash out with ddH2O and 70% Ethanol.
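Putting Steps 3 to 5 together, each well at transfection holds 10 μL of 250 nM siRNA, 10 μL of Dharmafect Buffer, and 30 μL of cells at 25,000 cells/mL, and Step 6 later adds another 10 μL. The short Python sketch below derives the resulting per-well quantities; these are computed values rather than figures stated in the protocol, and the variable names are illustrative.

```python
# Per-well volumes (uL) from Steps 3 to 6
SIRNA_UL, DHARMAFECT_UL, CELLS_UL, DAY2_ADDITION_UL = 10.0, 10.0, 30.0, 10.0
SIRNA_WORKING_NM = 250.0   # siRNA daughter plates after Step 2
CELLS_PER_ML = 25_000.0    # cell suspension used in Step 5

transfection_volume_uL = SIRNA_UL + DHARMAFECT_UL + CELLS_UL
sirna_final_nM = SIRNA_WORKING_NM * SIRNA_UL / transfection_volume_uL
cells_per_well = CELLS_PER_ML * CELLS_UL / 1000.0   # convert uL to mL
day2_volume_uL = transfection_volume_uL + DAY2_ADDITION_UL

print(transfection_volume_uL, sirna_final_nM, cells_per_well, day2_volume_uL)
```

On these figures, each well starts with about 750 cells in 50 μL containing 50 nM siRNA, and holds 60 μL after the Day 2 addition.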

Step 6. Adding Aphidicolin to 384 Well AP Plates and Media to 384 Well NT Plates.

In this step, the Matrix Wellmate will add 10 μL of cell media to each well of the 384-well NT plates and 10 μL of 2 μM aphidicolin to each well of the 384-well AP plates. This will be performed on Day 2.

1. Matrix Wellmate setup: Use small bore tubing (generic # 116730014), wash first with ethanol, then ddH2O, then prime with cell media.

2. Program Wellmate to dispense 10 μL to all wells of the NT 384-well plates.

3. Return NT plates to the 37 °C CO2 incubator.

4. Press the Empty button on the Wellmate to empty the tubing.

5. Prime the Wellmate with 6X Aphidicolin Buffer.

6. Program Wellmate to dispense 10 μL to all wells of the AP 384-well plates.

7. Return AP plates to the 37 °C CO2 incubator.


A.1.3 384 Well γH2AX Immunofluorescence Staining and Imaging

Step 1. Fixing and Staining of all Plates.

In this step, the 384-well plates will be fixed and stained to prepare for imaging. This will be performed on Day 3. For brevity, instrument pre- and post-wash steps are not included; please follow the instrument standard operating procedures.

1. Remove cell plates from automated incubator.

2. Discard the media into the sink (with bleach).

3. Add 30 μL of Ice-cold 90% Methanol, using Wellmate (generic tubing)

4. Place plates in a -20 °C freezer for 20 minutes.

5. Dump Methanol and air dry.

6. Add 30 μL Blocking Buffer using the Wellmate (generic tubing) and incubate for 30 minutes.

7. Dump Blocking Solution.

8. Add 10 μL γH2AX Antibody Solution using the Wellmate (generic tubing) and incubate overnight at 4 °C.

9. Wash plates in TBS using the ELX405UCW (with stacker), program Cimprich (#13, this program will wash the cells 3 times with Asp Height 55, speed 01, ~50 μL, Disp Height 120, volume 200, speed 01, final volume ~50 μL)

10. Dump final volume of TBS.

11. Add 15 μL Secondary Antibody Solution using the Wellmate (generic tubing) and incubate for 1 hour at room temperature.

12. Wash plates in TBS using the ELX405UCW (with stacker), program Cimprich Final (#14, this program will wash the cells 5 times with Asp Height 55, speed 01, ~50 μL, Disp Height 120, volume 200, speed 01, final volume ~50 μL)

13. Add 25 μL 3X PI/RNAse A solution using the Wellmate (generic tubing).


14. Seal 384 plates using the PlateLoc and BenchCel with the following protocol: For 384 well plates “Seal 384 PP.bwl”

15. Store plates to be imaged at 4 °C, wrapped in aluminum foil to protect them from light.

Step 2. Image the Plates on the IsoCyte™ (MDS Analytical Technology).

The IsoCyte™ (MDS Analytical Technology) laser scanning platform was equipped with a 20 mW 488 nm laser and set up for two-channel acquisition. γH2AX-Alexa-488 fluorescence was acquired using a 510-540 nm band-pass (green) filter, and PI fluorescence was acquired using a 600 nm long-pass (red) filter. Images were acquired at a nominal sampling of 10 × 10 μm². The entire 384-well plate was scanned, with a complete image of each well acquired and analyzed in a single 3-minute scan cycle. The IsoCyte™ was integrated with a Twister II Microplate Handler (Caliper) to allow automated imaging of up to 80 plates at one time.

Image analysis was done concurrently with data acquisition as follows: the fluorescence intensity images were flattened and background corrected using a rolling-ball algorithm with a characteristic length of 150 μm. A region of interest (ROI) of approximately 90% of the area of each well was defined. A threshold algorithm based upon the pixel intensity histogram in the ROI was applied to these processed images in the PI channel, and contiguous pixels above threshold were grouped into individual objects. Intensity and area filters were applied to select those objects consistent with individual cells and the total fluorescence intensity of each object, or cell, was calculated for both channels by integrating the pixel values associated with the cell and subtracting the average background intensity of the well.
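The object-finding logic described above can be approximated in a self-contained NumPy sketch. It is not the IsoCyte software: a per-image median stands in for the rolling-ball background, a fixed threshold replaces the histogram-based one, and the area limits are placeholder values. Objects are 4-connected groups of above-threshold pixels in the PI channel, and each object's integrated intensity in both channels is reported after subtracting the background per pixel. Function names and the synthetic image are hypothetical.

```python
from collections import deque

import numpy as np

def label_objects(mask):
    """Label 4-connected groups of True pixels with 1, 2, ... using breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    n_objects = 0
    n_rows, n_cols = mask.shape
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r, c] or labels[r, c]:
                continue
            n_objects += 1
            labels[r, c] = n_objects
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = n_objects
                        queue.append((ny, nx))
    return labels, n_objects

def measure_cells(pi_img, h2ax_img, threshold, min_area=2, max_area=50):
    """Segment on the PI channel and integrate both channels per object.

    Background is approximated by each image's median (a stand-in for the
    rolling-ball correction); the area limits are placeholder filters.
    """
    pi_bg, h2ax_bg = np.median(pi_img), np.median(h2ax_img)
    labels, n_objects = label_objects(pi_img > threshold)
    cells = []
    for obj in range(1, n_objects + 1):
        pixels = labels == obj
        area = int(pixels.sum())
        if not min_area <= area <= max_area:
            continue
        cells.append({
            "area": area,
            "pi": float(pi_img[pixels].sum() - pi_bg * area),
            "h2ax": float(h2ax_img[pixels].sum() - h2ax_bg * area),
        })
    return cells

# Synthetic 10 x 10 well: two "nuclei" and one single-pixel speck
pi_img = np.ones((10, 10))
h2ax_img = np.full((10, 10), 0.5)
pi_img[1:3, 1:3] = 5.0
h2ax_img[1:3, 1:3] = 2.0   # gammaH2AX-positive nucleus
pi_img[5:8, 5:8] = 4.0     # gammaH2AX-negative nucleus
pi_img[0, 9] = 6.0         # rejected by the area filter
cells = measure_cells(pi_img, h2ax_img, threshold=2.0)
print(cells)
```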


A.2 Data Analysis and Statistical Methods

This method of analysis was designed and implemented by Roy Wollman.

Here we describe the statistical procedure used to analyze the raw intensity data per cell and to determine which siRNA treatments showed γH2AX staining increases that were statistically significant. We first give an overview of the analysis procedure and then provide further details to demonstrate some of the analysis decisions using appropriate diagnostic plots.

First, we identified two sources of variability in the data: per-plate and per-day. The analysis described below takes both into account. Each plate had its own characteristics (e.g., the staining level in control wells), so we estimated several parameters on a per-plate basis to determine the strength of the response in each well. The experiments were performed in weekly runs, with 20 to 40 plates processed each day. We noticed that the strength of the γH2AX signal in the negative and positive controls varied between treatment days, so we also estimated a second set of parameters to describe the per-day variability of the data. As a result of this two-step procedure, the strength of the response, which is derived on a per-plate basis, does not translate directly into a p-value, because significance was derived from a probabilistic estimate that takes the per-day variability into account. This can produce unintuitive results in which the ranking by p-value does not agree with the ranking by signal strength.

The following steps describe the overall data analysis:

1. Per-plate data transformation
   a. Transform into log scale.
   b. Estimate a linear regression model for γH2AX as a function of PI.
   c. Correct the γH2AX signal for DNA content.
2. Calculation of signal strength per well
   a. Fit a Gaussian mixture (GM) model to the positive controls and identify the Gaussian corresponding to responsive cells.
   b. Use non-parametric kernel smoothing to identify the distribution of the negative controls.
   c. Using the two distributions identified above, define the optimal intensity threshold.
   d. Using the intensity threshold, calculate the number of responsive cells (k) and the total number of cells (n) per well.
3. Assignment of a p-value to each well
   a. Estimate the within-day and between-day variability of the controls.
   b. Derive a mathematical expression for the probability of obtaining k responsive cells out of n trials given the known between-day and within-day variability.
   c. Assign a p-value to each treatment.
4. Correction for multiple testing to assign significance categories.

A.2.1 Per Plate Data Transformation

The raw data included the measured γH2AX and propidium iodide (PI) intensity of each cell. For both PI and γH2AX, the intensity distribution had a heavy right tail, as can be seen in the histogram of all cells in a single plate (Fig. A.1).

Figure A.1. Histograms of measured raw cellular γH2AX and propidium iodide intensities. The left panel shows the γH2AX distribution of all cells within a representative plate. The right panel shows the propidium iodide (PI) distribution of all cells within the same plate. The dashed cyan and dashed magenta lines represent the PI distributions of the positive and negative controls, respectively. Intensity is measured in arbitrary units (AU). Approximately 6 × 10⁵ cells were scored.


Therefore, the first step was to log-transform the data to reduce the heavy right tail; this step was especially critical for the γH2AX staining. After transformation, the distributions are more balanced, and the PI distribution is clearly bimodal, as expected for a DNA content dye (Fig. A.2).

Figure A.2. Histograms of log transformed cellular γH2AX (left panel) and propidium iodide (right panel) intensities.

We also needed to account for the evident correlation between the γH2AX and PI stains. The following scatter plots show the negative control cells from all wells of a single plate before (left panel) and after (right panel) correction (Fig. A.3).

Figure A.3. Scatter plots of negative control cells before (left panel) and after (right panel) correction for DNA content. Frequency density is represented by the shown pseudocoloring ranging from dark blue (low density) to dark red (high density). Approximately 12,000 cells were scored.


In this plot, each point’s color represents the number of cells at that position. The plot shows, consistent with previous biological knowledge¹, that γH2AX staining intensity increases with DNA content. To correct for DNA content, we used the negative control cells from each plate: for each plate, we fit a simple linear regression model of γH2AX as a function of PI and then, for each cell in the plate, subtracted the value predicted by this model from the observed γH2AX intensity. For the rest of the analysis, we considered only a single measurement per cell, its adjusted log γH2AX (y-axis in the right panel of Fig. A.3). For simplicity, we still use the term γH2AX intensity, but from here on it refers to the adjusted, log-transformed γH2AX intensity.
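A minimal NumPy sketch of this per-plate correction follows. It assumes natural logarithms (the base is not stated) and that both channels are log-transformed before the regression, as the overview in A.2 suggests; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def correct_for_dna_content(raw_h2ax, raw_pi, is_negative_control):
    """Return the adjusted log gammaH2AX intensity of every cell on one plate.

    A straight line of log gammaH2AX on log PI is fitted to the negative
    control cells only, and its prediction is subtracted from every cell.
    """
    log_h2ax = np.log(np.asarray(raw_h2ax, dtype=float))
    log_pi = np.log(np.asarray(raw_pi, dtype=float))
    controls = np.asarray(is_negative_control, dtype=bool)
    slope, intercept = np.polyfit(log_pi[controls], log_h2ax[controls], deg=1)
    return log_h2ax - (slope * log_pi + intercept)

# Synthetic plate: log gammaH2AX rises linearly with log PI for every cell
log_pi_demo = np.linspace(0.0, 1.0, 50)
raw_pi_demo = np.exp(log_pi_demo)
raw_h2ax_demo = np.exp(2.0 * log_pi_demo + 1.0)
controls_demo = np.arange(50) < 25   # first 25 cells act as negative controls
adjusted = correct_for_dna_content(raw_h2ax_demo, raw_pi_demo, controls_demo)
print(np.abs(adjusted).max())
```

Because the synthetic γH2AX signal depends only on DNA content, the adjusted values are essentially zero for all cells, including those not used in the fit.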

A.2.2 Calculation of Signal Strength per Well

After data transformation, we further investigated the response of the siChk1 positive control wells. The following histogram, which is representative of all the plates, shows the distribution of the positive and negative controls:

Figure A.4. Histogram comparing the negative and positive control γH2AX staining distributions. The blue line indicates the negative control which was a non-targeting siRNA pool and the green line indicates the positive control which was a siRNA pool targeting the Chk1 kinase. The distributions represent all cells scored for a representative plate.


The negative control showed a single unimodal Gaussian, whereas the positive controls showed a strongly bimodal distribution, with one peak close to the negative control (~-0.2) and another strong peak at ~1.2. One interpretation of the data is that in each well some percentage of the population is affected by the treatment and shifts its intensity to a new distribution centered around 1.2. We therefore divided each well into two populations, non-responding and responding cells, and used the percentage of responding cells as the signal from each well. However, because estimating the two populations separately in each well is not robust enough for such a large dataset, we opted to estimate a single threshold value per plate based on the controls. The statistic for each well was then based on the percentage of cells above this threshold.

To determine the threshold for each plate, we estimated the probability distributions of the negative and positive controls and used these parametric distributions to determine the cutoff. To estimate the distribution of the negative control (non-responsive) cells, we fit a single Gaussian (normal) distribution. For the positive controls, we first used the Expectation Maximization (EM) algorithm to fit a Gaussian mixture (GM) distribution with two Gaussians to the positive control wells. We then took the mean and standard deviation of the higher component and used a single normal distribution with these parameters as an approximation of the distribution of the responsive cells. The cutoff was then determined from two Gaussians: the negative control Gaussian and the Gaussian of the responsive cells in the positive control, i.e., the right-hand Gaussian of the mixture. The cutoff for a γH2AX-positive cell was defined as the point at which the responsive-cell density becomes higher than the non-responsive density. The result of this procedure was a single threshold below which a cell was considered non-responsive (negative) and above which it was considered responsive (positive). The following plots show the positive (green) and negative (blue) control distributions with the per-plate threshold (red) and the average of all thresholds (cyan).


Figure A.5. Representative comparative histograms from randomly selected plates. The blue line indicates the negative control (a non-targeting siRNA pool) and the green line indicates the positive control (an siRNA pool targeting the Chk1 kinase). Each graph represents an individual plate, and the distributions include all control cells scored within that plate. Vertical lines indicate calculated thresholds: red shows that plate’s individual threshold for a γH2AX-positive cell, and cyan indicates the average threshold across all plates analyzed. Approximately 12,000 negative control cells and 6,500 positive control cells were scored for each graph shown.
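As an illustration of the thresholding logic (not the original implementation, whose code is not given here), the following NumPy sketch fits a single Gaussian to the negative controls, fits a two-component Gaussian mixture to the positive controls by EM, and returns the first point between the two means at which the responsive-cell density exceeds the negative control density. It then counts the responsive cells (k) and total cells (n) in a well. The quartile initialization, iteration count, grid resolution, function names, and the synthetic control values (chosen to resemble the peaks near -0.2 and 1.2 described above) are all assumptions.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_two_gaussians(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation maximization."""
    x = np.asarray(x, dtype=float)
    means = np.percentile(x, [25, 75])          # start at the quartiles
    sds = np.full(2, x.std())
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell
        dens = weights[:, None] * normal_pdf(x[None, :], means[:, None], sds[:, None])
        resp = dens / dens.sum(axis=0, keepdims=True)
        # M-step: re-estimate weights, means and standard deviations
        nk = resp.sum(axis=1)
        weights = nk / x.size
        means = (resp * x).sum(axis=1) / nk
        sds = np.sqrt((resp * (x - means[:, None]) ** 2).sum(axis=1) / nk)
        sds = np.maximum(sds, 1e-6)             # guard against a collapsing component
    return weights, means, sds

def plate_threshold(negative_values, positive_values):
    """First point where the responsive-cell density exceeds the negative control density."""
    neg_mean, neg_sd = np.mean(negative_values), np.std(negative_values)
    _, means, sds = fit_two_gaussians(positive_values)
    hi = int(np.argmax(means))                  # right-hand (responsive) component
    grid = np.linspace(neg_mean, means[hi], 10_001)
    above = normal_pdf(grid, means[hi], sds[hi]) > normal_pdf(grid, neg_mean, neg_sd)
    return float(grid[np.argmax(above)])        # index of the first True value

def count_responsive(values, threshold):
    """Return (k, n): cells above the threshold and total cells in a well."""
    values = np.asarray(values, dtype=float)
    return int((values > threshold).sum()), int(values.size)

rng = np.random.default_rng(0)
negatives = rng.normal(-0.2, 0.3, 12_000)
positives = np.concatenate([rng.normal(-0.2, 0.3, 3_250), rng.normal(1.2, 0.3, 3_250)])
threshold = plate_threshold(negatives, positives)
print(round(threshold, 2), count_responsive(positives, threshold))
```

With two populations of similar spread, the crossing point falls roughly midway between the two peaks.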

A.2.3 Assignment of P-value for Each Well

Given the signal strength of each well, we next turned to statistical significance, i.e., the assignment of P-values. By definition, a P-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. We therefore needed to specify the null hypothesis in detail and use it to determine the P-value for each well. The higher the variance under the null hypothesis, the more likely we were to obtain high-signal wells by chance alone, and therefore the higher (less significant) the P-values of these seemingly strong wells would be.

The following two plots analyze the variability of the negative controls across all plates in the screen. The left panel shows a histogram of the average signal strength of the negative control wells (8 replicates) in each plate. The histogram shows non-negligible variability, suggesting that differences in signal strength between plates must be considered. The right panel shows a box plot for each day. The red line is the median signal strength of all negative control wells run on that day (across multiple plates). This shows not only that there was plate-to-plate variability in the signal strength of the controls but, more importantly, that this variability changed between days; both needed to be accounted for in our analysis.

Figure A.6. Demonstration of the types of signal variability observed. The left panel shows the distribution of the average γH2AX signal strength (percent γH2AX-positive population) of the negative controls for each plate assayed. The right panel shows a box plot of the variation seen on each day, with each box representing the variation between plates within that day’s set. The red line indicates the median γH2AX signal strength (percent γH2AX-positive population) of all negative-control wells assayed that day. The lower and upper edges of each box mark the 25th and 75th percentiles of the same data set, the error bars indicate the minimum and maximum values, and crosses indicate outliers within that day’s analysis.

Since we treated each cell as having only two possible states, we can model each well with a binomial distribution and ask: what is the probability of finding K responsive cells out of N cells, given a probability q that any one cell belongs to the responsive population? Determining q gives a full parametric description of the null hypothesis and allows us to assign P-values. The central question is therefore: what is q? The above analysis of plate-to-plate and day-to-day variability shows that this is not a trivial question. In the next section, we describe an analytical expression that we developed to estimate the P-values without knowing q explicitly, assuming instead a parametric form for the distribution of q.

A.2.4 Mathematical Derivation of the Analytical Expression for the P-value

Let the probability that a cell in well $w_{i,j}$ shows positive γH2AX staining be $q_{i,j}$. We assume that each cell follows a simple Bernoulli distribution and that cells within a well are independent of one another, so that the probability of seeing $k$ positive cells in a well with $n_{i,j}$ cells is given by the binomial distribution:

$$P(\#\text{positive} = k) = \binom{n_{i,j}}{k}\, q_{i,j}^{k}\, (1-q_{i,j})^{n_{i,j}-k}$$

where $i$ indexes plates and $j$ indexes wells within plate $i$. To estimate the P-value, the probability of seeing more than $k_{i,j}$ positive cells in a well (where $k_{i,j}$ is the observed count), we can use the complementary probability of observing $k_{i,j}$ or fewer positive cells:

$$P\text{-value} = 1 - P(\#\text{positive} \le k_{i,j}) = 1 - \sum_{k=0}^{k_{i,j}} \binom{n_{i,j}}{k}\, q_{i,j}^{k}\, (1-q_{i,j})^{n_{i,j}-k}$$

Therefore, if we know $q$ for each well, we can estimate the P-value. However, as we showed earlier, $q_{i,j}$ is not constant and has non-negligible variability that must be taken into consideration. Below we show how this variability in $q$ was taken into account using a hierarchical statistical model.
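For a fixed $q$, the tail probability above is straightforward to compute. The following is a minimal Python sketch, not the code used for the screen; the function name `binomial_pvalue` and the example counts are illustrative. It follows the text in returning the probability of observing more than $k$ positive cells, and it works with log-probabilities (via `math.lgamma`) so that wells with many cells do not overflow.

```python
from math import exp, lgamma, log, log1p


def binomial_pvalue(k: int, n: int, q: float) -> float:
    """P(X > k) for X ~ Binomial(n, q): the chance of seeing more than k
    gamma-H2AX-positive cells among n scored cells when each cell is
    independently positive with probability q."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    log_n_factorial = lgamma(n + 1)

    def log_pmf(x: int) -> float:
        # log of C(n, x) * q**x * (1 - q)**(n - x), computed with lgamma
        return (log_n_factorial - lgamma(x + 1) - lgamma(n - x + 1)
                + x * log(q) + (n - x) * log1p(-q))

    # Summing the upper tail directly avoids the cancellation error of
    # computing 1 - P(X <= k) when the P-value is very small.
    return sum(exp(log_pmf(x)) for x in range(k + 1, n + 1))


# Example: 30 positive cells out of 200 when the null rate is q = 0.08.
print(binomial_pvalue(30, 200, 0.08))
```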

In our model, a cell can be classified into the responsive class either because it truly belongs to that class or because it was misclassified. We define $\lambda$ as the probability of belonging to the responsive class. Note that $\lambda$ differs from $q$: $q$ is the probability of being classified as responsive (that is, of lying above the intensity threshold). What, then, is the relationship between $q$ and $\lambda$? To answer this question, let us define the intensity distribution $I$ of a single well as a mixture of two distributions:


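$$I \sim \begin{cases} F_{\text{responsive}} & \text{with probability } \lambda \\ F_{\text{non-responsive}} & \text{with probability } 1-\lambda \end{cases}$$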

where $F_{\text{responsive}}$ follows a normal distribution with parameters $\mu_i$ and $\sigma_i$, and $F_{\text{non-responsive}}$ follows an arbitrary distribution. Under these assumptions, we can write the probability $q$ that a cell lies above the plate threshold $T_i$ in terms of the probability of belonging to the responsive class ($\lambda$) and the two conditional probabilities:

$$q = \lambda \cdot P(I > T_i \mid \text{responsive}) + (1-\lambda) \cdot P(I > T_i \mid \text{non-responsive})$$

Given the variability in signal strength between negative-control wells and the day-to-day variability demonstrated in Figure A.6, we assume that $\lambda$ follows a log-normal distribution with parameters $\nu_d$ and $\theta_d$, the mean and standard deviation of $\ln\lambda$, estimated from all negative-control wells on plates screened on day $d$. Because $q$ is a function of the random variable $\lambda$, and we have no prior knowledge of the specific value of $\lambda$ for a given well, we take the expectation over all values of $\lambda$:

$$\bar{q} = E[q(\lambda)] = \int_0^{\infty} q(\lambda)\, f(\lambda)\, d\lambda$$

where $f(\lambda)$ is the probability density of $\lambda$. Substituting the expression for $q$ and the responsive and non-responsive intensity distributions, we get:

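$$q_{i,j} = \int_0^{\infty} \left[\lambda \, P(I > T_i \mid \text{responsive}) + (1-\lambda)\, P(I > T_i \mid \text{non-responsive})\right] f(\lambda)\, d\lambda$$

$$= \int_0^{\infty} \left[\lambda \int_{T_i}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(I-\mu_i)^2}{2\sigma_i^2}}\, dI + (1-\lambda) \int_{T_i}^{\infty} F_{\text{non-responsive}}(I)\, dI\right] \frac{1}{\lambda\sqrt{2\pi}\,\theta_d}\, e^{-\frac{(\ln\lambda - \nu_d)^2}{2\theta_d^2}}\, d\lambda$$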

This integral can be evaluated numerically. The distribution parameters are estimated from several sources: $\mu_i$ and $\sigma_i$ are taken from the higher-intensity component of a Gaussian mixture model fit to the positive controls; $\nu_d$ and $\theta_d$ are based on the negative controls from all plates screened on day $d$; $T_i$ is the threshold for plate $i$; and $F_{\text{non-responsive}}$ is estimated from the same plate’s negative controls using a kernel smoothing procedure. Given the numerical solution of this integral, the P-value is then estimated from the binomial distribution as explained above.
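Once the distribution parameters are in hand, the expectation over $\lambda$ can be evaluated by simple quadrature. The sketch below is an illustration under stated assumptions, not the original analysis code: it substitutes $u = \ln\lambda$ (so the log-normal density becomes a normal density in $u$), uses the trapezoidal rule, and takes the non-responsive tail $\int_{T_i}^{\infty} F_{\text{non-responsive}}(I)\,dI$ as a precomputed number (`tail_nonresponsive`) rather than reimplementing the kernel smoothing step. All names are hypothetical. Because the integrand is linear in $\lambda$, the result can be checked against the closed form that uses $E[\lambda] = e^{\nu_d + \theta_d^2/2}$.

```python
import math


def normal_tail(threshold: float, mu: float, sigma: float) -> float:
    """P(I > threshold) for I ~ Normal(mu, sigma^2)."""
    return 0.5 * math.erfc((threshold - mu) / (sigma * math.sqrt(2.0)))


def expected_q(threshold, mu, sigma, tail_nonresponsive, nu, theta, n_steps=4000):
    """Average q over a log-normal lambda using the trapezoidal rule.

    tail_nonresponsive is an assumed, precomputed value of the integral of
    F_non-responsive above the threshold (kernel-smoothed in the text).
    """
    p_responsive = normal_tail(threshold, mu, sigma)
    # Integrate in u = ln(lambda): f(lambda) d(lambda) becomes a
    # Normal(nu, theta^2) density in u; +/- 10 SD covers the mass.
    lo, hi = nu - 10.0 * theta, nu + 10.0 * theta
    h = (hi - lo) / n_steps
    total = 0.0
    for s in range(n_steps + 1):
        u = lo + s * h
        lam = math.exp(u)
        q_lam = lam * p_responsive + (1.0 - lam) * tail_nonresponsive
        density = math.exp(-((u - nu) ** 2) / (2.0 * theta**2)) / (
            math.sqrt(2.0 * math.pi) * theta
        )
        weight = 0.5 if s in (0, n_steps) else 1.0  # trapezoid end points
        total += weight * q_lam * density
    return total * h
```

The resulting value would then serve as $q_{i,j}$ in the binomial tail for that well.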

A.2.5 Correction for Multiple Testing and Assignment of Significance Categories

We applied the procedure described above to 46,400 wells. When performing so many statistical tests, one must correct for multiple testing. We used a false discovery rate (FDR) correction to define two P-value cutoffs for each replicate: the traditional alpha of 0.05 and an FDR-corrected alpha. We then defined three levels of significance. Group 4 includes all genes for which both replicates had P-values below the FDR-corrected alpha (0.0042). In group 3, one replicate was below the FDR-corrected alpha and the other was below the traditional alpha (0.05). In group 2, one of the replicates was below the traditional alpha. To belong to any of these groups, a gene was also required to have a γH2AX signal within the top 25% of the genome. Because this statistical procedure is stringent, we are likely to miss true biological ‘hits’ owing to high variability and limited statistical power. We therefore added a final group, group 1, containing all genes not in groups 2-4 that had a signal strength in the upper 0.05 percentile.
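The text does not name the FDR procedure. Assuming the Benjamini-Hochberg step-up method, a per-replicate cutoff and the group assignment rules above could be sketched as follows; the function names, the boolean flags for the top-25% and top-tail signal criteria, and the use of 0 for "not a hit" are hypothetical conventions, not part of the original analysis.

```python
def bh_cutoff(pvalues, alpha=0.05):
    """Largest P-value declared significant by the Benjamini-Hochberg
    step-up procedure at FDR level alpha (0.0 if none pass)."""
    m = len(pvalues)
    cutoff = 0.0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * alpha / m:
            cutoff = p  # keep the largest P-value that passes its rank threshold
    return cutoff


def significance_group(p1, p2, alpha_fdr, in_top_quartile, in_top_tail, alpha=0.05):
    """Assign a gene to groups 1-4 from its two replicate P-values (0 = not a hit)."""
    low, high = sorted((p1, p2))
    if in_top_quartile:  # groups 2-4 also require a signal in the top 25%
        if high < alpha_fdr:
            return 4  # both replicates below the FDR-corrected alpha
        if low < alpha_fdr and high < alpha:
            return 3  # one below the FDR alpha, the other below 0.05
        if low < alpha:
            return 2  # one replicate below 0.05
    if in_top_tail:
        return 1  # strong signal without reaching groups 2-4
    return 0
```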

A.3 Other Screening Protocols, Analysis, and Calculations

A.3.1 siRNA Deconvolution

The individual siRNA oligos for the 350 siRNA pool hits were tested in the deconvolution assay. For this screen, each unique siRNA oligo in a 96-well plate was resuspended in 125 μL of 1X siRNA buffer (2 μM). Plates were heat-sealed using the PlateLoc (Velocity11) and frozen at -20°C. After 24 hours, the plates were thawed and 20 μL was transferred to 96-well polypropylene U-bottom plates with 88 siRNAs per plate. 10 μL from these daughter plates was transferred to a new daughter plate, which was then used for the screen in the same manner as described above. Image acquisition for these siRNAs was performed as above, and the resulting data (percent γH2AX-positive cells) for each of the four individual siRNAs were determined by setting an intensity cutoff approximately three times greater than the average γH2AX staining observed in the siControl-treated wells.
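A minimal sketch of the percent γH2AX-positive calculation, assuming that "average γH2AX staining" means the mean per-cell intensity pooled over the siControl wells; the function and argument names are illustrative.

```python
from statistics import mean


def percent_positive(cell_intensities, control_intensities, fold=3.0):
    """Percent of cells whose gamma-H2AX intensity exceeds `fold` times the
    mean intensity of the siControl-treated cells."""
    cutoff = fold * mean(control_intensities)
    positive = sum(1 for x in cell_intensities if x > cutoff)
    return 100.0 * positive / len(cell_intensities)
```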

A.3.2 Z’ Factor Calculation

The Z’ Factor was defined from the means and standard deviations of both the positive and negative controls on each plate according to Zhang, 1999.

$$Z' = 1 - \frac{3\,SD_{+} + 3\,SD_{-}}{\lvert \text{Mean}_{+} - \text{Mean}_{-} \rvert}$$
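The Z’ calculation for one plate can be sketched as follows. It assumes sample standard deviations (`statistics.stdev`), which the text does not specify; the example control values are made up.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' = 1 - (3*SD+ + 3*SD-) / |Mean+ - Mean-| for one plate's controls."""
    spread = 3 * stdev(positive) + 3 * stdev(negative)
    separation = abs(mean(positive) - mean(negative))
    return 1 - spread / separation


# Percent gamma-H2AX+ in positive (siChk1) and negative (non-targeting) wells
print(z_prime([40, 42, 44], [4, 6, 8]))
```

By the convention introduced by Zhang et al., a Z’ of 0.5 or above indicates an excellent assay.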

A.3.3 Propidium iodide Cell Cycle Analysis

To assign cell cycle stages, all cells within a given well were compiled and a histogram of per-cell PI intensities was plotted. To eliminate nuclear fragments and cellular doublets, a gate on nuclear area was applied. To assign cell cycle stages and the percent γH2AX+ per stage, the first four non-targeting control siRNA wells of each plate were compiled and a cumulative histogram was plotted. Based on separate experiments, we calculated the average cell cycle profile of siControl-treated HeLa cells to be 40% G1, 30% S, and 20% G2 when the cells below the 5th and above the 95th percentile were eliminated. We therefore set PI intensity cutoffs within the compiled control samples to fit this model. To account for the effect of cell number on PI intensity, the histogram for each siRNA-treated well was overlaid on the histogram from the non-targeting control, and PI intensities were adjusted to match the minimum and maximum values of the control after eliminating the cells below the 5th and above the 95th percentile. The same PI intensity cutoffs for each cell cycle stage were then applied to the siRNA-treated histogram. Analysis was done in duplicate, and averages with SD were compiled. The percent γH2AX-positive population (% γH2AX+) was calculated by applying an intensity cutoff approximately three times greater than the average γH2AX intensity of all siControl-treated wells. This value was then subdivided to calculate the % γH2AX+ population per cell cycle stage by categorizing each positive cell based on its corrected PI intensity and where that intensity fell relative to the cell cycle stage PI intensity cutoffs.
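The gating and per-stage assignment described above can be sketched as follows. This is an interpretation, not the original analysis code: it assumes nearest-rank percentiles, places the G1/S and S/G2 boundaries at the 45th and 75th percentiles of the control (so that 40%, 30%, and 20% of all control cells fall in G1, S, and G2 between the 5th and 95th percentiles), and reads "adjusted according to the min and max values of the control" as a linear rescaling of each sample’s 5th-95th percentile range onto the control’s. Function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def control_cutoffs(control_pi):
    """PI cutoffs (5th, G1/S, S/G2, 95th) from the compiled control wells."""
    return tuple(percentile(control_pi, p) for p in (5, 45, 75, 95))


def align_to_control(sample_pi, control_pi):
    """Linearly map the sample's 5th-95th percentile range onto the control's."""
    s_lo, s_hi = percentile(sample_pi, 5), percentile(sample_pi, 95)
    c_lo, c_hi = percentile(control_pi, 5), percentile(control_pi, 95)
    scale = (c_hi - c_lo) / (s_hi - s_lo)
    return [c_lo + (x - s_lo) * scale for x in sample_pi]


def stage(pi, cutoffs):
    """Return 'G1', 'S', 'G2', or None if outside the 5th-95th percentile gate."""
    lo, g1_s, s_g2, hi = cutoffs
    if pi < lo or pi > hi:
        return None
    if pi <= g1_s:
        return "G1"
    if pi <= s_g2:
        return "S"
    return "G2"


def percent_positive_by_stage(sample_pi, is_positive, control_pi):
    """Split the % gamma-H2AX+ population (of all cells) by cell cycle stage."""
    cutoffs = control_cutoffs(control_pi)
    aligned = align_to_control(sample_pi, control_pi)
    counts = {"G1": 0, "S": 0, "G2": 0}
    for pi, positive in zip(aligned, is_positive):
        s = stage(pi, cutoffs)
        if positive and s is not None:
            counts[s] += 1
    return {k: 100.0 * v / len(sample_pi) for k, v in counts.items()}
```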

A.3.4 Bioinformatic Analysis

Functional analysis:

Genes within the highest significance group of the original screen (i.e., group 4) were uploaded into the DAVID bioinformatics database (http://david.abcc.ncifcrf.gov/) and Ingenuity Pathway Analysis (Ingenuity Systems, www.ingenuity.com) to identify the biological functions and/or diseases most significantly represented in the data set. The genes were categorized according to GO terms (biological process, cellular component, molecular function), Protein Information Resource keywords, or the OMIM/Genetic Association disease datasets. Fisher’s exact test was used to calculate a P-value for the probability that each biological function and/or disease assigned to the data set is due to chance alone. Select enrichments are shown in Figure 2C, with the full list of significantly enriched functions in Table S4.
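As an illustration of the enrichment statistic, a one-sided Fisher’s exact test on a 2×2 table (in the category or not, screen hit or not) reduces to a hypergeometric upper tail. This is a generic sketch with hypothetical names; DAVID and IPA each use their own implementation and background gene sets, so their reported values may differ.

```python
from math import comb


def enrichment_pvalue(overlap, category_size, hits, genome_size):
    """One-sided Fisher's exact test (hypergeometric upper tail): probability
    of at least `overlap` category genes among `hits` genes drawn at random
    from a background of `genome_size` genes, `category_size` of which are
    annotated to the category."""
    top = min(hits, category_size)
    numerator = sum(
        comb(category_size, i) * comb(genome_size - category_size, hits - i)
        for i in range(overlap, top + 1)
    )
    # Exact integer arithmetic until the final division.
    return numerator / comb(genome_size, hits)
```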

Networks:

After a functional category had been identified, the genes within each category were uploaded into the IPA software to look for protein interaction networks. Each gene identifier was mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base. These genes, called focus genes, were overlaid onto a global molecular network developed from information contained in the Ingenuity Pathways Knowledge Base. Networks of these focus genes were then algorithmically generated based on their connectivity. Once a network had been defined, genes within the remaining significance groups (groups 1-3), as well as the genes validated with 1 or more siRNAs upon deconvolution, were mined to look for other interacting proteins which also fell into the original functional category. Networks were modified by hand to reflect additional literature defined direct protein interactions where appropriate.


Screening Results Comparison:

Genes that gave a significant γH2AX signal within the original screen (i.e. groups 2-4), plus genes that gave a γH2AX signal greater than two standard deviations from the control signal with one or more siRNAs within the deconvolution experiment were used for cross comparison with relevant published screening results. For published human studies, gene names were converted to the most recent identifier using either the NCBI Gene Database or the Biobase Proteome Platform and used for comparison. The significance of enrichment was calculated using a Chi-squared test. For screening results from other species, the functional orthologs (if applicable) were identified using the Biobase Proteome platform and used for comparison purposes. Significance of enrichment was not calculated for cross-species comparisons.
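For the cross-screen comparison, a Pearson chi-squared test on a 2×2 overlap table can be sketched as below. The text does not say whether a continuity correction was applied; this sketch assumes none, and the table layout and function name are illustrative. With one degree of freedom, the P-value follows from the complementary error function, so only the standard library is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    e.g. rows = hit in this screen or not, columns = hit in the published
    screen or not. Returns (statistic, P-value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```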

References

1. Mirzoeva OK, Petrini JH. DNA replication-dependent nuclear dynamics of the Mre11 complex. Mol Cancer Res 2003; 1:207-18.


APPENDIX B

General Lab and Cell culture Protocols

B.1 Basic Cell Culture Maintenance

B.1.1 Cell Culture Care

Preparation of Media for TC

Thaw 50 mL 100% FBS (Fetal Bovine Serum) aliquot, 5 mL PS (Penicillin, Streptomycin) aliquot and 5mL Glutamine aliquot in 37°C water bath.

Remove 50 mL of media into sterile 50 mL falcon tube.

To remaining 450 mL media add 50 mL FBS (final conc. 10%) and 10 mL PSG (final concentration 1%Pen/Strep, 1%Glutamine).

Store at 4°C.

Thawing cells

Pre-warm media in 37°C water bath.

Pipette 10 mL media into 10 cm plate and put in incubator for 10 min.

Get cells from liquid N2 stocks and wipe down with 70% ethanol.

Thaw frozen stock immediately using 1 mL of pre-warmed media.

Transfer cells to remaining media in 10 cm plate.

Replace media following day or when the cells have attached to the plate (at least 2 hours).


Counting cells

Aspirate media and rinse cells with 2 mL trypsin per 10 cm plate.

Incubate cells with 2 mL trypsin per 10 cm plate for 5 min. in incubator.

Make sure all cells are lifted off plate by tapping plate forcefully against hood.

Resuspend trypsinized cells with 5 mL media and transfer to a sterile 15 mL conical tube.

Pipette 10 µL into the “V” of the hemocytometer.

Count the cells in the four corners (with 5 squares each per corner). If the count is over 200, dilute 1:10 with media and recount.

Cells/mL = average number of cells per corner × 10^4 (multiply by the dilution factor if the sample was diluted).
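The counting arithmetic can be sketched as follows. Each large corner square of a standard hemocytometer holds 0.1 µL, which is where the factor of 10^4 comes from. Function names and the example counts are illustrative.

```python
from statistics import mean


def cells_per_ml(corner_counts, dilution_factor=1):
    """Cell density from hemocytometer corner counts:
    average count per corner x 10^4 x dilution factor."""
    return mean(corner_counts) * 1e4 * dilution_factor


def volume_for_cells_ml(cells_needed, density_per_ml):
    """Volume of cell suspension (mL) that contains `cells_needed` cells."""
    return cells_needed / density_per_ml


density = cells_per_ml([50, 48, 52, 50])       # 500,000 cells/mL
print(volume_for_cells_ml(3000, density) * 1000, "uL per 96-well well")
```

For example, at 500,000 cells/mL, seeding 3000 HeLa cells per well of a 96-well plate requires 6 µL of suspension.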

Passaging cells

Aspirate media and rinse cells with 2 mL trypsin per 10 cm plate.

Incubate cells with 2 mL trypsin per 10 cm plate for 5 min. in incubator.

Make sure all cells are lifted off plate by tapping plate forcefully against hood.

Dilute cells to the proper density, put the appropriate amount of media/cell mix into the desired size of plate and place into incubator.

Freezing cells

To make Freezing Media add 25 mL of 100% FBS (final conc. 50%) and 5 mL DMSO (final conc. 10%) to 20 mL regular media.

Pre-warm trypsin in 37°C water bath.

Aspirate media and rinse cells with 2 mL trypsin per 10 cm plate.

Incubate cells with 2 mL trypsin per 10 cm plate for 5 min. in incubator.

Make sure all cells are lifted off plate by tapping plate forcefully against hood.


Resuspend trypsinized cells with 5 mL media per 10 cm plate.

Transfer cells to 50 mL conical tubes and spin in clinical centrifuge at 2000 rpm for 5 min.

While cells are spinning, label cryotubes with initials, date, cell line, media, and selection conditions. Remove caps and place in blue cryotube rack. (1 cryotube per original 10 cm plate of 80% confluent cells).

After all cells are pelleted, aspirate the media and resuspend in 1 mL Freezing Media per original 10 cm plate of cells.

Immediately transfer 1 mL of resuspended cells to each cryotube and freeze overnight at -80°C in a Styrofoam container.

The next day, transfer the cryotubes to a liquid N2 dewar. The longer cells are left at -80°C, the less viable they will be when thawed.

Cell Culture Plates Media Volumes

Plate | Volume | Dish | Volume
96 well | 0.2 mL | 3.5 cm | 2.5 mL
24 well | 0.5 mL | 6 cm | 5 mL
12 well | 1 mL | 10 cm | 10 mL
6 well | 2 mL | 15 cm | 20 mL

B.1.2 Cell Lines Used

Cell line | Media | Passage conditions | Source
HeLa | DMEM 10% FBS/PSG | 1/5 every 2 days | ATCC
293T | DMEM 10% FBS/PSG | 1/10 every 2 days | Cimprich lab
U2OS | DMEM 10% FBS/PSG | 1/5 every 2 days | Cimprich lab
Hs68 | DMEM 10% FBS/PSG | 1/3 every 3 days | Meyer lab
HeLa-YFP | DMEM 10% FBS/PSG + hygromycin B 200 ug/mL | 1/5 every 2 days | Cimprich lab
HeLa-RNAseH1 | DMEM 10% FBS/PSG + blasticidin 1 ug/mL | 1/5 every 2 days | Cimprich lab
GM04595 | EMEM 15% FBS/PSG | 1/3 every 3 days | Coriell
GM04593 | EMEM 15% FBS/PSG | 1/3 every 3 days | Coriell


B.2 Experimental Cell Culture Protocols

B.2.1 Immunofluorescence

Cell plating

Cells should be plated so that they are non-confluent at the end of the experiment. This corresponds to ~ 800 HeLa/384 well; 3000 HeLa/96 well, or 100,000 HeLa/coverslip in a 6-well dish.

Cell Fixation

Cells can be fixed by a variety of methods. 4%PFA in PBS is the most common and least harsh condition used. It is a good starting point for most staining as it keeps cellular structure relatively intact. Ice cold 90% methanol is good for chromatin bound epitopes such as histones.

Cell permeabilization

Cells can be permeabilized either before or after fixation. Permeabilization before fixation is commonly called pre-extraction and is a good method to use to attempt to see foci of particular proteins. 0.5% NP-40 in PBS is a good starting point for pre-extraction, while 0.2% Triton-X PBS is the common method for permeabilization after fixation.

Primary antibody staining

Antibody staining needs to be optimized on a case-by-case basis, but a good general starting point is 1 hour at room temperature for non-phospho-specific antibodies, or overnight at 4°C for phospho-specific antibodies, with a starting titer of 1:500.

Secondary antibody staining

Secondary antibody staining needs to be performed in the dark and can be done concurrently with nuclear stains such as DAPI or Hoechst. A good starting titer is 1:1000, and a secondary-only control is critical when testing new antibodies for immunofluorescence.

Specific useful staining conditions

53BP1: 72 hours after siRNA knockdown, cells were pre-extracted with 0.25% NP-40 on ice for 10 min, fixed with 4% paraformaldehyde for 20 min, blocked in 2% BSA/PBS for 30 min, and incubated with primary anti-53BP1 antibody (1:500) for 1 hour at room temperature. γH2AX: cells were fixed with 4% paraformaldehyde for 20 min, permeabilized with 0.2% Triton-X/PBS, blocked in 2% BSA/PBS for 30 min, and incubated with primary γH2AX antibody (1:500) overnight at 4°C. In both cases, cells were then washed three times in 2% BSA/PBS, incubated with Alexa 488 secondary antibody (1:1000) for 1 hour, and washed three times with PBS. DAPI was added to stain nuclei.

B.2.2 Transient Transfection of DNA and/or siRNA

Dharmafect siRNA reverse transfection

Parameter | 6 well | 96 well | 384 well
Volume siRNA/well (for 25 nM) | 2.5 uL | 0.125 uL | 0.063 uL
Volume Dharmafect/well | 2 uL | 0.1 uL | 0.03 uL
Volume siRNA mix/well | 200 uL | 20 uL | 10 uL
Volume Dharmafect mix/well | 200 uL | 20 uL | 10 uL
Cell number per well | 130,000 | 3000 | 800

Dilute Dharmafect in OptiMEM, let incubate 5 min (10uL Dharmafect in 2mL for 96 well).

Dilute siRNA in OptiMEM, for a 50nM transfection (0.25uL 20uM siRNA/ well of 96 well or 3uL diced siRNA).

Add Dharmafect mix to siRNA mix, and pipette up and down 3X.

Incubate for ½ hour to 1 hour.

Put si/Dharm complex into plate (40uL per well).


Add cells to mix.

6 well = 130,000 HeLa/well in 2 mL

96 well = 3000 HeLa/well in 100 uL

384 well = 800 HeLa/well in 30 uL

Let the plate sit at room temperature for 15 min, then place it in the incubator.

Assay after 48 hours.
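The siRNA volumes follow from C1V1 = C2V2. A minimal sketch, assuming the final volume is the cell-suspension volume added per well (2 mL for a 6-well plate, 100 uL for a 96-well plate), which reproduces the 2.5 uL (25 nM, 6 well) and 0.25 uL (50 nM, 96 well) figures above for a 20 uM stock; the function name is hypothetical.

```python
def stock_volume_ul(final_nm, final_volume_ul, stock_um=20.0):
    """Volume of siRNA stock (uL) giving `final_nm` nM in `final_volume_ul` uL,
    from C1 * V1 = C2 * V2 with the stock converted from uM to nM."""
    return final_nm * final_volume_ul / (stock_um * 1000.0)


print(stock_volume_ul(25, 2000))  # 6-well, 25 nM
print(stock_volume_ul(50, 100))   # 96-well, 50 nM
```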

Lipofectamine DNA transfection

Plate cells the night before transfection (2 million 293T per 10 cm plate, or 200,000 293T per well of a 6-well plate). Do not use HeLa cells; Lipofectamine is toxic to them.

Use 5-10 ug DNA per 10 cm dish, or 1-2 ug per well of a 6-well plate.

Dilute Lipofectamine in OptiMEM (25 uL per 10 cm dish into 175 uL OptiMEM, or 3 uL per well of a 6-well plate into 100 uL OptiMEM).

Let rest 5 min.

Dilute DNA into 50uL OptiMEM.

Add lipofectamine mix to the DNA mix and pipette up and down once.

Wait 30 min.

Add mixture to cells in a spiral fashion.

Wait 24-48 hrs for protein expression.

Fugene DNA transfection

Plate cells the night before transfection (2 million 293T or 750,000 HeLa per 10 cm plate, or 200,000 293T or 100,000 HeLa per well of a 6-well plate).

Use 5-10 ug DNA per 10 cm dish, or 1-2 ug per well of a 6-well plate.


Dilute Fugene in OptiMEM (15 uL per 10 cm dish into 185 uL OptiMEM, or 3 uL per well of a 6-well plate into 100 uL OptiMEM).

Let rest 5 min.

Dilute DNA into 50uL OptiMEM.

Add the Fugene mix dropwise to the DNA mix.

Wait 15-30 min.

Add the mixture to the cells in a spiral fashion.

Wait 24-48 hrs for protein expression.

B.2.3 Retroviral Transfection/ Stable Cell Line Generation

Plate 2 million 293T cells the night before transfection.

The next day, transfect the cells with Lipofectamine 2000 at a ratio of about 2:1 (i.e., 25 uL Lipofectamine per 12 ug of DNA).

For the 12ug of DNA use 8ug of your pBMN protein expression vector and 4ug of the pCL ampho packaging vector.

Follow the standard lipofectamine transfection protocol.

24 hours after 293T transfection, plate the cells you want to infect at a density at which they will be replicating the following day (e.g., 500,000 HeLa per 10 cm dish).

48 hours post 293T transfection remove media from cells and replace with 10mL fresh media for a 72 hr harvest.

At 72 hours, collect the virus-containing media and pass it through a 0.45 μm syringe filter to remove any remaining 293T cells.

Aspirate media from the cells you wish to infect and add enough of the filtered 293T media to cover the plate (5 mL per 10 cm plate).


Add polybrene to 4ug/mL and rock plate back and forth to mix.

Incubate cells with virus for 4 hours, then remove and add fresh growth media.

After 48 hours, trypsinize cells and replate into selection containing media.

Select cells for at least 2 weeks before using for subsequent experiments.

B.2.4 G2/M Checkpoint Assay

Transfect cells with 25nM of a single siRNA for 48h.

Irradiate with 10 Gy of IR and allow cells to recover for 1 h.

Replace media with fresh media containing 100 ng/ml of nocodazole.

Eight hours later harvest cells by trypsinization and fix with 90% methanol for FACS analysis.

Label mitotic cells with anti-Mpm2 antibody (1:250) and DNA content with propidium iodide (1ug/mL).

Collect samples on a FACSCalibur and analyze using FlowJo.

B.2.5 EdU Staining for Co-staining with H3 or γH2AX

This assay has been optimized for 96 well imaging plates.

This assay uses Invitrogen’s Click-iT Cell Proliferation Assay Kit (#C10083).

Label the cells with 10 uM EdU (stored at -20°C) for ½ hour prior to staining.

All washes are 100 uL unless otherwise noted.

Fixation

Rinse cells in PBS.

Fix with 4% PFA for 15 min at RT, or with 90% MeOH for 20 min at -20°C.


You can stop here and store the cells at 4°C for up to 1 month.

Permeabilization

Rinse cells with PBS.

Permeabilize with 0.2% Triton X/PBS for 15 min at RT.

Rinse cells 2X with PBS.

EdU Staining

Dilute Component C (stored at -20°C) 1:10 in deionized water.

Add Component C 1:10 to Component B (stored at 4°C); this is the reaction mix.

Add 40 uL reaction mix per well of a 96-well plate; for coverslips, use 75 uL in a humidity chamber.

Let incubate at room temperature for ½ hour in the dark.

Rinse once in 40uL Rinse buffer (component D).

γH2AX or p-H3 Staining

Rinse cells with PBS.

Block with 2%BSA/TBS 1/2 hour at RT.

Dilute the primary antibody in 2% BSA/TBS: Cell Signaling γH2AX antibody (cat. #2577) at 1:500, or Abcam p-H3 antibody (cat. #ab5176) at 1:1000.

Add 50 uL antibody mix per well of a 96-well plate and incubate overnight at 4°C in the dark.

Wash 3X with TBS/2%BSA.

Incubate with goat anti-rabbit secondary antibody (1:800) for 1 hour at RT in the dark; Hoechst can be added at 1:10,000 directly into this mixture.

Wash 4X with TBS.


B.2.6 Homologous Recombination Assay

The following assay was adapted from (Pierce, A.J. 1999. Genes Dev) with the following modifications. We generated a pooled HeLa cell line stably transduced with a fragment of the YFP gene missing the last 35 bp that had an I-SceI target site at its 3’ end.

Reverse transfect ~ 150,000 of the 5’-YFP/I-SceI target site cells with the desired siRNA (25nM) using Dharmafect transfection reagent.

Twenty-four hours post-knockdown, retransfect with two plasmids using the Fugene transfection reagent: one encoding the I-SceI enzyme, and another containing both mCherry (a red fluorescent protein) and the 3’ fragment of YFP missing the first 97 bp of the YFP gene (3’ YFP/mCherry). Use 6 uL Fugene6 (Roche) + I-SceI enzyme plasmid (0.5 µg) + 3’ YFP/mCherry plasmid (0.5 µg) per well.

Before transfection trypsinize and replate cells into three separate wells of a 6 well plate to avoid overcrowding.

Successful homologous recombination events occurred in cells that had fluorescence in both the red (to verify successful transfection of the 3’ YFP/mCherry plasmid) and YFP (homologously recombined 5’ and 3’ YFP fragments) channels.

Harvest cells by trypsinization four days after plasmid transfection and replating, and analyze by flow cytometry on a BD FACSCalibur, acquiring ~50,000 events.

The percentage of homologous recombination events was calculated as follows: 1) cells were gated by forward- and side-scatter readings to eliminate dead or clumped cells; 2) live cells were gated for positive red fluorescence; 3) live, red cells were scored for yellow fluorescence.

Successful HR events occurred in live cells that were both red and yellow; percentages were calculated as (# of YFP+ red cells)/(# of red cells) and normalized to the siLuciferase control values.
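The HR percentage and its normalization can be sketched as below, assuming "normalized to the luciferase values" means dividing by the siLuciferase percentage and scaling to 100; names and counts are illustrative.

```python
def hr_percent(yfp_red_cells, red_cells):
    """Percent of transfected (red) live cells that are also YFP+."""
    return 100.0 * yfp_red_cells / red_cells


def normalized_to_luciferase(sample_pct, luciferase_pct):
    """HR percentage of an siRNA relative to the siLuciferase control (= 100)."""
    return 100.0 * sample_pct / luciferase_pct


luc = hr_percent(150, 10000)  # siLuciferase control
sample = hr_percent(75, 10000)  # siRNA of interest
print(normalized_to_luciferase(sample, luc))
```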


B.2.7 DNA Damage Sensitivity Assay

The following protocol was adapted from reference 1 with the following modifications. eYFP was cloned into the pBMN retroviral vector with hygromycin resistance. Stable YFP-expressing HeLa pools were selected and used for subsequent experiments.

Reverse transfect ~150,000 YFP expressing cells in one well of a 6 well tissue culture plate with the siRNA (25nM) of interest using Dharmafect transfection reagent.

Transfect an equal number of non-fluorescent HeLa cells with siLuciferase (25 nM).

After 48 hours, mix the two populations in a one to one ratio and replate at 20% density.

Four hours after replating, damage the cells (2 Gy IR or 100 nM aphidicolin) and allow them to grow for the following 5 days.

Change media every two days to ensure maximal cell growth.

At the end of the experiment, trypsinize cells and analyze by FACS for the ratio of YFP+ to YFP- cells. Correct the ratios for the growth-rate effect of each siRNA, using the damage-treated and untreated (NT) samples: [(%YFP siRNAX * (%YFP siRNALuc / %YFP siRNAX NT + Damage condition))/ %YFP siRNALuc]*100.

B.2.8 BrdU FACS Cell Cycle Tracking

Pulse-label cells with BrdU (20μM) for 30 min. Be sure to add BrdU to conditioned media otherwise the cell cycle profile will be altered.

Trypsinize cells and wash in PBS 2X to remove trypsin.

Fix in 70% ethanol by resuspending the cell pellet in 1 mL PBS and dropping it into a vortexing tube of 10 mL ice-cold 70% ethanol; let sit on ice for at least 4 hrs or overnight.

Count out 1 million cells to stain.


Spin down fixed cells (1500 rpm, 5min) and resuspend in 1mL 0.2% Triton X/PBS for 15 min on ice to permeabilize.

Spin and resuspend in 1mL 2M HCl for 15min to denature DNA.

Spin and neutralize with 100mM NaBorate pH 8.5 1mL.

Wash 1X in PBS and resuspend in 2% BSA/PBS for 30 min to block.

Spin and resuspend in 100uL primary BrdU antibody (1:100) in 2% BSA/PBS for 1.5 hours at room temperature in the rotator.

Wash three times in 2% BSA/PBS, and incubate in Alexa 488 secondary for 1 hour in 2%BSA/PBS rotating at room temperature.

Wash three times with PBS and resuspend in PI/RNAse A solution (final concentrations 10 ug/mL and 1 ug/mL, respectively) in FACS tubes.

FACS on BD FACSCalibur and analyze data with FlowJo.

B.2.9 RNAseH1 H2AX Assay

The RNAseH1 construct was obtained from Open Biosystems and subcloned into the blasticidin-resistant pBMN retroviral vector containing an HA tag and an FKBP L106P degron, allowing conditional protein expression via addition of the small molecule Shld, as described previously (Banaszynski, 2006). Stable RNAseH1-expressing HeLa pools were selected and used for subsequent experiments.

Reverse transfect siRNA into HeLa and HeLa that contain the RNAseH1 construct in separate wells.

After 24 hours of knockdown, add 1 µM Shld to the RNAseH1 cell line and the equivalent volume of EtOH to the other HeLa population.

At 72 hours post transfection, fix and stain cells for γH2AX.


In the absence of Shld minimal destabilization of the protein was observed, so the γH2AX change was compared to the parental HeLa cell line.

Percent change in γH2AX expression was calculated as follows: [(% γH2AX+ cells RNAseH HeLa - % γH2AX+ parental HeLa) / (% γH2AX+ parental HeLa)] *100.

B.3 Other General Protocols

B.3.1 Micrococcal Nuclease Chromatin Digestion Assay

Generate sample with genomic DNA of interest

Transfect three wells of a 6 well dish per sample.

48-72 hrs post knockdown, harvest cells by trypsinization.

Rinse 2X in PBS.

Lyse cells

Count 1 × 10^7 HeLa cells and spin down at 1500 rpm in the swinging-bucket centrifuge in the cold room.

Lyse in 1mL lysis buffer (20mM Hepes pH 7.9, 0.25M Sucrose, 3mM MgCl2, 0.5% NP-40) on ice for 5 min, and then dounce with 15 strokes of a tissue homogenizer.

Pellet 3000g 15 minutes in the cold room.

Resuspend in 1 mL Buffer B (20 mM Hepes pH 7.9, 3 mM MgCl2, 0.2 mM EGTA, 1 mM DTT), and pellet at 3000g for 10 min.

Resuspend in 250uL Micrococcal Nuclease Digestion buffer (15mM TrisCl pH 7.4, 60mM KCl, 15mM NaCl, 0.25M Sucrose, 1mM CaCl2, 0.5mM DTT), and add 5U micrococcal nuclease to the 250uL.


Take 50 uL time points at 1, 2, 5, 7.5, and 10 min, and stop each reaction by adding 0.2 M EDTA, 1% SDS, and 1 mg/mL proteinase K.

Treat with proteinase K for ½ hr at 37°C.

Run samples on a 2% agarose gel and visualize banding with ethidium bromide staining.

B.3.2 Sodium Bisulfite Sequencing Assay

Generate sample with genomic DNA of interest

Transfect two wells of a 6 well dish per sample.

48-72 hrs post knockdown, harvest cells by trypsinization.

Rinse 2X in PBS.

Lyse genomic DNA by non-denaturing methods

10mM Tris pH 8.0 1mM EDTA 0.5% SDS with 20ug/mL proteinase K.

Incubate at 37°C overnight.

I usually use 0.3 to 0.5mL buffer per sample.

Phenol Chloroform extract

Add an equal volume of phenol/chloroform/isoamyl alcohol to the DNA solution to be purified.

Vortex vigorously for 10 sec and microcentrifuge for 15 sec at RT.

Transfer the aqueous phase to a new tube, and re-extract the organic phase with a second volume of phenol/chloroform. Do not worry about taking some of the white precipitate at the interface.

Chloroform extract


Add an equal volume of chloroform to the pooled aqueous phases.

Vortex vigorously for 10 sec and microcentrifuge for 15 sec at RT.

Ethanol precipitate

Add 1/10 the volume of 3M NaAcetate pH 5.2 to the DNA solution.

Flick tube to mix.

Add 2 to 2.5 volumes of ice-cold 100% EtOH, mix by vortexing, and place at -80°C for ½ hour or longer.

Spin 10 min in a fixed-angle centrifuge and remove the supernatant.

Add 1mL 70% EtOH.

Invert the tube several times and centrifuge as before.

Remove the supernatant and air-dry the pellet, but not for too long.

Dissolve the pellet in an appropriate amount of TE pH 8.0 to a final concentration of ~1ug/uL (50uL is a good starting point).

Digest DNA with BamHI

Use 3uL per sample with 1:10 addition of NEB buffer 3.

Digest overnight at 37°C.

Ethanol precipitate as described above.

Sodium bisulfite treat DNA

Make solutions for bisulfite conversion fresh.

20 mM hydroquinone (0.11 g/50 mL RNAse-free H2O)

2.5 M NaHSO3 (2.6 g/10 mL H2O)

pH the NaHSO3 solution to 5.2 with ~100 uL 10 M NaOH
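The reagent masses above can be checked with mass = molarity × molecular weight × volume. This sketch uses standard molecular weights for hydroquinone (110.11 g/mol) and sodium bisulfite, NaHSO3 (104.06 g/mol); the helper name is hypothetical.

```python
def grams_needed(molarity_m, mw_g_per_mol, volume_ml):
    """Mass of solute (g) for `volume_ml` mL of a `molarity_m` M solution."""
    return molarity_m * mw_g_per_mol * volume_ml / 1000.0


HYDROQUINONE_MW = 110.11      # C6H6O2, g/mol
SODIUM_BISULFITE_MW = 104.06  # NaHSO3, g/mol

print(round(grams_needed(0.020, HYDROQUINONE_MW, 50), 3))    # 20 mM in 50 mL, prints 0.11
print(round(grams_needed(2.5, SODIUM_BISULFITE_MW, 10), 2))  # 2.5 M in 10 mL, prints 2.6
```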

Mix the Reagents for DNA conversion.


12.5 uL 20 mM hydroquinone

457.5 uL 2.5 M NaHSO3

30uL DNA (10ug total)

Incubate 14-16 hrs at 37°C in the dark.

Purify converted DNA with the Qiagen PCR purification kit

Add 2.5mL buffer PB to reaction and mix.

Bind to a QIAquick spin column, spinning repeatedly until all of the reaction mixture has been passed over the column.

Wash with 0.75mL buffer PE.

Elute with ~50uL EB.

Desulfonate the reaction

Make fresh 2M NaOH from the pellets.

Mix 30 uL fresh 2 M NaOH, 50 uL eluted DNA, and 120 uL H2O.

Incubate 15 min at 37°C.

Neutralize and Precipitate the DNA

Add 60uL 1M HCl to the reaction.

Add 30 ug carrier tRNA per reaction (3 uL if from Sigma).

Add 1/10th volume NaAcetate pH 5.2 (30uL).

Add 2.5 volume ice cold 100% EtOH (750uL).

Spin, wash, dry and resuspend as before.

Set up PCR reaction

5 uL Taq buffer

5 uL 2 mM dNTP

1.5 uL MgCl2

1 uL forward native primer

1 uL reverse converted primer

0.3 uL Taq

33 uL H2O

Input: 3 uL bisulfite-treated DNA, or 3 uL of genomic DNA diluted 1:5 in H2O.

Setup PCR machine

Step 1: 94°C 30 sec

Step 2: 92°C 30 sec

Step 3: 57°C 30 sec

Step 4: 72°C 30 sec

Cycle back to step 2, 50X

Step 5: 72°C 5 min

Run a 1.5% agarose gel to check for PCR product.

The correct product should run at 400bp.

Ligate fresh PCR product into the pGEM-T Easy vector

5uL 2X ligation buffer
1uL vector
3uL PCR product
1uL T4 ligase

Incubate in ice water overnight.

Transform, pick white colonies, grow up and sequence.

B.3.3 Histone Extraction Protocol

Harvest cells by trypsinization.

Wash cell pellet 2X in PBS.


Lyse in modified TGN buffer (50mM Tris pH 7.5, 150mM NaCl, 50mM β-glycerophosphate, 10% glycerol, 1% Tween 20, 0.2% NP-40, 1mM NaF, 1mM Na3VO4, 1mM DTT, and 10μg/mL each of leupeptin, pepstatin and aprotinin) for 20 min on ice.

Clear by centrifugation (10 min, 20,000g). The supernatant is the soluble protein fraction.

Resuspend the remaining pellet in 0.2N HCl overnight (or at least 4 hrs) on ice.

Spin down (20,000g, 10 min) and neutralize the supernatant with 1/10 volume 2M NaOH. This supernatant contains the histones and other strongly chromatin-bound proteins.
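The 1/10-volume neutralization works because HCl is monoprotic, so 0.2N HCl is 0.2M, and one-tenth its volume of 2M NaOH supplies the same number of moles. A small sketch of that check; the 200uL extract volume is hypothetical.

```python
def naoh_ul_to_neutralize(acid_ul: float, acid_normality: float = 0.2,
                          naoh_molarity: float = 2.0) -> float:
    """uL of NaOH delivering as many moles of OH- as the acid has H+."""
    return acid_ul * acid_normality / naoh_molarity


acid_ul = 200.0  # hypothetical volume of 0.2N HCl extract
naoh_ul = naoh_ul_to_neutralize(acid_ul)
print(f"{naoh_ul:.1f} uL 2M NaOH for {acid_ul:.0f} uL extract "
      f"({naoh_ul / acid_ul:.2f} of the acid volume)")
```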

References

1. Smogorzewska A, Matsuoka S, Vinciguerra P, McDonald ER, 3rd, Hurov KE, Luo J, Ballif BA, Gygi SP, Hofmann K, D'Andrea AD, Elledge SJ. Identification of the FANCI protein, a monoubiquitinated FANCD2 paralog required for DNA repair. Cell 2007; 129:289-301.


APPENDIX C

Cloning and siRNA Reagents

C.1 General Cloning Protocols

PCR Cloning

Assemble the following in a PCR tube:

Volume | Reagent
5 uL | 10x Taq HiFi Buffer
2 uL | 2mM dNTP
2 uL | 50mM MgSO4
1 uL | Template DNA (cDNA)
1 uL | Forward Primer (10uM)
1 uL | Reverse Primer (10uM)
0.3 uL | Platinum HiFi Taq DNA Polymerase
37.7 uL | water
50 uL | Total
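The non-water components in this recipe total 12.3uL, so the water needed to bring the reaction to exactly 50uL can be computed rather than read off the table. A minimal sketch:

```python
RECIPE_UL = {
    "10x Taq HiFi Buffer": 5.0,
    "2mM dNTP": 2.0,
    "50mM MgSO4": 2.0,
    "template cDNA": 1.0,
    "forward primer (10uM)": 1.0,
    "reverse primer (10uM)": 1.0,
    "Platinum HiFi Taq": 0.3,
}


def water_to_fill(recipe: dict, total_ul: float = 50.0) -> float:
    """Water (uL) that brings the listed components up to total_ul."""
    water = total_ul - sum(recipe.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    return round(water, 2)


print(water_to_fill(RECIPE_UL), "uL water")
```

Computing the water this way keeps the total at 50uL if, for example, more template is added.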

PCR conditions

Step | Temperature | Time
1 | 95°C | 30s
2 | 92°C | 30s
3 | 5°C below Tm | 30s
4 | 68°C | 1kb/min
5 | Return to step 2 (x29) |
6 | 68°C | 5min
7 | 23°C | 24hr
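Steps 3 and 4 depend on the primers and the product, so it helps to compute them before programming the machine. The sketch below takes a single primer Tm (using the lower of the two primer Tms is an assumption here, not stated in the protocol), rounds the extension time up to a whole second, and omits the final hold step.

```python
import math


def cycling_program(primer_tm_c: float, product_bp: int) -> list:
    """Return (step, temperature in C, seconds) tuples for the program above."""
    anneal_c = primer_tm_c - 5.0                     # step 3: 5 C below Tm
    extend_s = math.ceil(product_bp / 1000 * 60)     # step 4: 1 kb per minute
    return [
        ("1 initial denaturation", 95.0, 30),
        ("2 denaturation", 92.0, 30),
        ("3 annealing", anneal_c, 30),
        ("4 extension", 68.0, extend_s),
        # step 5: return to step 2 (x29), as written in the protocol
        ("6 final extension", 68.0, 300),
    ]


for step in cycling_program(primer_tm_c=62.0, product_bp=1500):
    print(step)
```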


Restriction Digest

Assemble the following in an Eppendorf tube:

Volume | Reagent
2 uL | 10X NEB buffer
2 uL | BSA (1mg/mL)
0.5 uL | Enzyme
5 uL | DNA (1 to 2ug)
10.5 uL | water
20 uL | Total

Digest at 37°C for 1 to 2 hours unless otherwise specified in the enzyme instructions.
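In the digest recipe, only the DNA and water volumes depend on the concentration of the DNA prep. The sketch below fills the 20uL reaction for a given DNA amount; the 0.4ug/uL prep in the example is hypothetical. With the fixed volumes, 2uL of 1mg/mL BSA gives 0.1mg/mL in the reaction.

```python
def digest_recipe(dna_ug: float, dna_ug_per_ul: float, total_ul: float = 20.0) -> dict:
    """Volumes (uL) for the 20 uL restriction digest above."""
    fixed = {"10X NEB buffer": 2.0, "BSA (1mg/mL)": 2.0, "enzyme": 0.5}
    dna_ul = dna_ug / dna_ug_per_ul
    water_ul = total_ul - dna_ul - sum(fixed.values())
    if water_ul < 0:
        raise ValueError("DNA too dilute for a 20 uL digest; concentrate it first")
    return {**fixed, "DNA": round(dna_ul, 2), "water": round(water_ul, 2)}


# Hypothetical prep at 0.4 ug/uL, digesting 2 ug.
print(digest_recipe(2.0, 0.4))
```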

Calf Intestinal Alkaline Phosphatase (CIAP) Treatment

Perform this treatment on digested vector to remove its 5' phosphates so that the vector cannot religate to itself during the ligation.

Dilute 1uL CIAP into 9uL 1x restriction digest buffer (same buffer as used in restriction digest).

Add 3uL of CIAP mixture to the 20uL restriction digest reaction.

Incubate for 15 min at 37°C.

Quench reaction with 1uL 0.5M EDTA pH 8.0.

Agarose Gel Purification

Verify bands on an ethidium bromide agarose gel.

Run 2-3 uL of the restriction digest on a 0.8-2.0% agarose gel in TAE + 0.01% EtBr.

Run the samples to be ligated on a crystal violet gel.

Add 3uL of 80% glycerol to the remaining 17-18uL of restriction digest and run the gel.

Visualize DNA bands on light box and cut out appropriate bands.

Purify with the Qiagen Gel Extraction kit according to the manufacturer’s instructions.


Ligation

Assemble the following in a PCR tube:

Volume | Reagent
2 uL | 10X ligation buffer
2 uL | 10mM ATP (make fresh)
10 uL | Purified insert (3 to 1 ratio to vector)
5 uL | Purified vector
1 uL | T4 DNA ligase
20 uL | Total

Incubate the reaction overnight, either at 15°C in the PCR machine or in ice water.
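A 3:1 insert-to-vector ratio is usually meant as a molar ratio, in which case the mass of insert depends on the relative lengths of insert and vector. Assuming that interpretation, the insert mass is ng vector × (insert bp / vector bp) × 3; the sizes in the example are hypothetical.

```python
def insert_ng(vector_ng: float, vector_bp: int, insert_bp: int,
              molar_ratio: float = 3.0) -> float:
    """Insert mass (ng) giving the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


# Hypothetical example: 50 ng of a 5,000 bp vector with a 1,000 bp insert.
print(f"{insert_ng(50.0, 5000, 1000):.1f} ng insert")
```

Dividing the result by the measured insert concentration gives the volume to add, which may differ from the 10uL in the table.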

Transformation

Thaw competent DH5α cells on ice.

Add 1-2 uL DNA or 10uL ligation mix to the competent cells and flick tube to mix.

Incubate for 20min on ice.

Heat shock at 37°C for 1 min.

Incubate for 5 min on ice.

Add 1mL LB media and allow cells to recover for 2 hours at 37°C with shaking.

Plate 1% and 10% of the cells onto LB agar with the appropriate selection and incubate overnight at 37°C.


C.2 Antibody information

Western IF Epitope Vendor Catalog # concentration concentration

Chk1 (N19) Santa cruz sc-8408 1:2000 -- p-Chk1 (S345) Cell signaling #2341 1:1000 -- Chk2 Santa cruz sc-7235 1:2000 -- p-Chk2 (S345) Cell signaling #2661 1:1000 1:500 Set8 Abcam ab3744 1:1000 -- p-H2AX (S139) Cell signaling #2577 1:1000 1:500 Rad51 Santa cruz sc-8349 1:1000 1:500 GAPDH Abcam ab9482 1:10000 -- 53BP1 Chemicon MAB3802 1:500 RPA (AB3) Oncogene NA19L 1:250 H4 Cell signaling #2592 1:1000 -- Mono-methyl Abcam 1:2000 1:500 ab16974 H4 K20 Di-methyl Abcam 1:2000 1:500 ab9052 H4K20 BrdU BD 347580 1:200 p-H3 (S10) Abcam ab2567 1:1000 Flag M2 Sigma F1804 1:5000 1:1000 Flag M5 Sigma F4042 1:5000 1:1000
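The working dilutions in the table above convert to stock volumes as (final volume) / N for a 1:N dilution. This sketch assumes 1:N means one part antibody in N parts total; the 10mL and 1mL working volumes are illustrative.

```python
def antibody_stock_ul(dilution: str, final_volume_ml: float) -> float:
    """uL of antibody stock for a working solution at, e.g., dilution="1:2000"."""
    part, total = (float(x) for x in dilution.split(":"))
    return final_volume_ml * 1000.0 * part / total


print(antibody_stock_ul("1:2000", 10.0), "uL Chk1 (N19) in 10 mL, western")
print(antibody_stock_ul("1:500", 1.0), "uL p-H2AX (S139) in 1 mL, IF")
```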

C.3 Recombinant Proteins

C.3.1 human RNAseH1

Full-length human RNAseH1 was cloned from IMAGE clone ID # 3537074 into the blasticidin-resistant pBMN retroviral vector containing an HA tag and the FKBP L106P degron, allowing conditional protein expression upon addition of the small molecule Shield-1 (Shld1), as described previously1. Stable RNAseH1-expressing HeLa pools were selected and used for subsequent experiments. HA-FKBP-RNASEH1 ran approximately halfway between the 37 kDa and 50 kDa markers by SDS-PAGE.


C.3.2 human Set8

Full-length human Set8 constructs were obtained from Or Gozani’s lab: untagged Set8 in pcDNA3.1, an untagged methylation-defective mutant also in pcDNA3.1, and an N-terminally Flag-tagged Set8 construct. Expression was tested in 293T and HeLa cells; Flag-tagged Set8 runs at ~50 kDa.

The relevant NCBI gene references are: Set8 mRNA transcript NM_020382.3; Set8 protein NP_065115.3

Set8 Protein Sequence MARGRKMSKPRAVEAAAAAAAVAATAPGPEMVERRGPGRPRTDGENVFTGQSKIYSYMSPNKCSGMRFPL QEENSVTHHEVKCQGKPLAGIYRKREEKRNAGNAVRSAMKSEEQKIKDARKGPLVPFPNQKSEAAEPPKT PPSSCDSTNAAIAKQALKKPIKGKQAPRKKAQGKTQQNRKLTDFYPVRRSSRKSKAELQSEERKRIDELI ESGKEEGMKIDLIDGKGRGVIATKQFSRGDFVVEYHGDLIEITDAKKREALYAQDPSTGCYMYYFQYLSK TYCVDATRETNRLGRLINHSKCGNCQTKLHDIDGVPHLILIASRDIAAGEELLYDYGDRSKASIEAHPWL KH

C.3.3 human Znf574

Full-length human Znf574 was cloned from IMAGE clone ID # 3502564 into the hygromycin-resistant pBMN vector using the following primers:

pBMN Znf574 forward: CGGGATCCGCCACCATGACTGAGGAATCAGAGG
pBMN Znf574 reverse: ATAGTTTAGCGGCCGCGAGTCAGCCACTGATCTG

Protein expression was never checked by western blotting.
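Read as sequences, the forward primer appears to carry a BamHI site (GGATCC) and a Kozak-like GCCACC before the ATG, and the reverse primer a NotI site (GCGGCCGC); the protocol does not name the enzymes, so this is an inference. The sketch below checks those features and translates the gene-specific ends, which should match the start (MTEESE) and end (QISG, then a stop) of the Znf574 protein sequence listed in this section.

```python
# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

BAMHI = "GGATCC"
NOTI = "GCGGCCGC"
FORWARD = "CGGGATCCGCCACCATGACTGAGGAATCAGAGG"
REVERSE = "ATAGTTTAGCGGCCGCGAGTCAGCCACTGATCTG"


def reverse_complement(seq: str) -> str:
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


start = FORWARD.find("GCCACCATG") + len("GCCACC")    # index of the ATG
n_term = translate(FORWARD[start:])
gene_specific_rev = REVERSE[REVERSE.find(NOTI) + len(NOTI):]
c_term = translate(reverse_complement(gene_specific_rev))

print("BamHI at", FORWARD.find(BAMHI), "| NotI at", REVERSE.find(NOTI))
print("N-terminus:", n_term)
print("C-terminus:", c_term)
```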

The relevant NCBI gene references are: Znf574 mRNA transcript NM_022752.5; Znf574 Protein: NP_073589.4


Znf574 Protein Sequence MTEESEETVLYIEHRYVCSECNQLYGSLEEVLMHQNSHVPQQHFELVGVADPGVTVATDTASGTGLYQTL VQESQYQCLECGQLLMSPSQLLEHQELHLKMMAPQEAVPAEPSPKAPPLSSSTIHYECVDCKALFASQEL WLNHRQTHLRATPTKAPAPVVLGSPVVLGPPVGQARVAVEHSYRKAEEGGEGATVPSAAATTTEVVTEVE LLLYKCSECSQLFQLPADFLEHQATHFPAPVPESQEPALQQEVQASSPAEVPVSQPDPLPASDHSYELRN GEAIGRDRRGRRARRNNSGEAGGAATQELFCSACDQLFLSPHQLQQHLRSHREGVFKCPLCSRVFPSPSS LDQHLGDHSSESHFLCVDCGLAFGTEALLLAHRRAHTPNPLHSCPCGKTFVNLTKFLYHRRTHGVGGVPL PTTPVPPEEPVIGFPEPAPAETGEPEAPEPPVSEETSAGPAAPGTYRCLLCSREFGKALQLTRHQRFVHR LERRHKCSICGKMFKKKSHVRNHLRTHTGERPFPCPDCSKPFNSPANLARHRLTHTGERPYRCGDCGKAF TQSSTLRQHRLVHAQHFPYRCQECGVRFHRPYRLLMHRYHHTGEYPYKCRECPRSFLLRRLLEVHQLVVH AGRQPHRCPSCGAAFPSSLRLREHRCAAAAAQAPRRFECGTCGKKVGSAARLQAHEAAHAAAGPGEVLAK EPPAPRAPRATRAPVASPAALGSTATASPAAPARRRGLECSECKKLFSTETSLQVHRRIHTGERPYPCPD CGKAFRQSTHLKDHRRLHTGERPFACEVCGKAFAISMRLAEHRRIHTGERPYSCPDCGKSYRSFSNLWKH RKTHQQQHQAAVRQQLAEAEAAVGLAVMETAVEALPLVEAIEIYPLAEAEGVQISG


C.4 Dharmacon siRNA information

Information for individually ordered siRNA duplexes

Dharmacon Dharmacon Duplex Pool Number Number Gene # Accession Sequence D-009847-01 APC2 NM_005883 TTATGAAGCTGTCCTTTGA D-009847-03 APC2 NM_005883 GGACATCACCAGCCTGTAC D-022214-01 AQR #1 NM_014691 GGUAGUACAUUGCCAGAUG D-022214-02 AQR #2 NM_014691 GCUUACAGCUGGCACGAUU D-022214-03 AQR #3 NM_014691 GAAGUCAAACGAUUGCAAA D-022214-04 AQR #4 NM_014691 GGAUGUCCGUCGCUUGGUA M-003202-05 D-003202-05 ATR NM_001184 GAACAACACUGCUGGUUUG D-003202-17 ATR NM_001184 GCAACUCGCCUAACAGAUA D-003202-31 ATR NM_001184 UCUCAGAAGUCAACCGAUU D-003202-32 ATR NM_001184 GAAUUGUGUUGCAGAGCUU D-003459-02 BIRC5 NM_001168 GCAAAGGAAACCAACAATA D-003459-07 BIRC5 NM_001168 CCACTGAGAACGAGCCAGA D-013213-04 CDC40 #1 NM_015891 UGAGUUAUGUGAUUUCAGG D-013213-01 CDC40 #2 NM_015891 GAGAAAUUGUGCAGGAAUA D-011237-04 CDC5L NM_001253 GUGCAAAGCCAGAUGGUAU D-021163-01 CDCA8 NM_018101 GAAGAGAACTCAGTCCATA D-021163-03 CDCA8 NM_018101 TGAACTGGCTTGACTACTT D-003239-08 CDK5 #1 NM_004935 GGAUUCCCGUCCGCUGUUA D-003239-07 CDK5 #2 NM_004935 CAACAUCCCUGGUGAACGU D-003252-06 CENPE NM_001813 CAACAAAGCTACTAAATCA D-003252-21 CENPE NM_001813 GAACTAAGAAGAAGCGTAT M-003255-04 D-003255-06 Chek1 NM_001274 GCAACAGUAUUUCGGUAUA D-003255-07 Chek1 NM_001274 GGACUUCUCUCCAGUAAAC D-003255-08 Chek1 NM_001274 AAAGAUAGAUGGUACAACA D-003255-09 Chek1 NM_001274 CCACAUGUCCUGAUCAUAU D-001206-13 control unknown pool 1 D-008212-03 CLOCK NM_004898 GGACAAATCTACTGTTCTG D-008212-04 CLOCK NM_004898 GAGTTTACATCTAGACATA D-014684-01 CMT4B2 NM_030962 CAACATTGCCGCAGCATTA D-014684-02 CMT4B2 NM_030962 GAACATCAGCGCCAGGTGA D-014684-03 CMT4B2 NM_030962 GCTCTAAAGCCCAATGTAA D-014684-17 CMT4B2 NM_030962 CCAGAAAGTTCCACGGCCA D-014151-03 CRY2 NM_021117 CCTCTTCGGTGCACTGGTT D-014151-04 CRY2 NM_021117 GGCTTAACATTGAACGAAT D-003479-04 CSNK1E NM_001894 GAGAGCAAGTTCTACAAGA D-003479-05 CSNK1E NM_001894 CCACCAAGCGCCAGAAGTA


Dharmacon Dharmacon Duplex Pool Number Number Gene # Accession Sequence D-011809-04 DMD #1 NM_000109 CAAGACAGUUGGGUGAAGU D-011809-02 DMD #2 NM_000109 GCAAGUGGCAAGUUCAACA D-004069-03 DVL2 NM_004422 TGTGAGAGCTACCTAGTCA D-004069-05 DVL2 NM_004422 CGCTAAACATGGAGAAGTA D-006527-01 Egr2 #1 NM_000399 GAAACCAGACCUUCACUUA D-006527-04 Egr2 #2 NM_000399 CCACGUCGGUGACCAUCUU D-004888-05 ERCC6 NM_000124 GAUCACAUCUUACUCCUAC D-004888-04 ERCC6 NM_000124 GAAGAGGGCUUUGCAGUUC M-019946- D-019946-01 ERCC4 NM_005236 UGACAAGGGUACUACAUGA 00-005 D-019946-02 ERCC4 NM_005236 GUAGGAUACUUGUGGUUGA D-019946-03 ERCC4 NM_005236 ACAAGACAAUCCGCCAUUA D-019946-04 ERCC4 NM_005236 AAGACGAGCUCACGAGUAU M-006626- D-006626-01 ERCC5 NM_000123 CAUGAAAUCUUGACUGAUA 01-005 D-006626-02 ERCC5 NM_000123 GAACGCACCUCGCAGCUGUA D-006626-03 ERCC5 NM_000123 GAAAGAAGAUGCUAAACGU D-006626-04 ERCC5 NM_000123 GAACGAACUUUGCCCAUAU D-009426-04 FKBP6 NM_003602 GAAACUGGCUAGCUGUUAC D-006170-01 GABRB3 NM_000814 GAACTGCACTCTGGAAATT D-006170-05 GABRB3 NM_000814 TACCTGATCTAACCGATGT D-006175-02 GABRG3 NM_033223 GAAAGGGCGTATTCACATA D-006175-03 GABRG3 NM_033223 GAAGATACCTGTGTCTATG D-006178-06 GABRR1 NM_002042 GTCCTGACATCACCAAATC D-006178-07 GABRR1 NM_002042 GGATAGATGACCATGATTT D-017887-03 GJB1 #1 NM_000166 AGAAUGAGAUCAACAAGCU D-017887-01 GJB1 #2 NM_000166 GAUGAGAAAUCUUCCUUCA D-001400-01 GL3 CTT ACG CTG AGT ACT TCG D-005006-01 HSPB8 NM_014365 GAACTCAGATTTAGTGCAA D-005006-02 HSPB8 NM_014365 GGACTTAACATTTCACGTT D-005006-03 HSPB8 NM_014365 GCTGGGAGCCTGTCAGTTT D-005006-04 HSPB8 NM_014365 CTAAGAACTTCACAAAGAA D-006823-01 Incenp NM_020238 CCACGATGCTGACTAAGAA D-006823-02 Incenp NM_020238 CAAGAAGACTGCCGAAGAG D-009317-03 KIF1B NM_015074 CAAACTGGTTCGTGAATTA D-009317-19 KIF1B NM_015074 GGGAGTTGCCATTCGGGAA D-008388-03 LYPLAL1 #1 NM_138794 CAAUUGAUGUCAUGUGUCA D-008388-02 LYPLAL1 #2 NM_138794 CCACGAAGUUUCAUAGUUU D-015424-02 MPZ NM_000530 UCAAAGAGCGCAUCCAGUG D-015424-03 MPZ NM_000530 GACCCUCGCUGGAAGGAUG D-008038-02 MtmR2 #1 NM_016156 GAAACUGUGUGUAAGGAUA 
D-008038-03 MtmR2 #2 NM_016156 GAGAAAGAAUGGCUAAGUU


Dharmacon Dharmacon Duplex Pool Number Number Gene # Accession Sequence D-016143-02 Mus81 NM_025128 GGGAGCACCUGAAUCCUAA D-016143-03 Mus81 NM_025128 CAGGAGCCAUCAAGAAUAA D-016143-04 Mus81 NM_025128 GGGUAUACCUGGUGGAAGA D-016143-17 Mus81 NM_025128 CAGCCCUGGUGGAUCGAUA D-004866-01 Nek8 #1 NM_178170 GGGCAGAGAGCGAAGUGUA D-004866-04 Nek8 #2 NM_178170 UGCCUUCACUGUAGCUAUU D-004866-02 Nek8 #3 NM_178170 ACAGGAAGCUCAGCGAGUU D-004866-03 Nek8 #4 NM_178170 GGUAUCGAUUCCUCCAUGA D-011350-01 PER1 NM_002616 GAACCAGGATACCTTCTCA D-011350-03 PER1 NM_002616 GGCCGAATCGTCTACATTT D-010616-02 PMP22 NM_000304 UUACAUCACUGGAAUCUUC D-010616-19 PMP22 NM_000304 unknown L-003530-00 Rad51 NM_002875 unknown D-031917-01 Set8 #1 NM_020382 ACCCGUGGCUGAAGCAUUA D-031917-02 Set8 #2 NM_020382 GCAACUAGAGAGACAAAUC D-031917-03 Set8 #3 NM_020382 GAUUGAAAGUGGGAAGGAA D-031917-04 Set8 #4 NM_020382 GCACGACAUCGACGGCGUA D-024332-01 KIAA1985 NM_024577 GAAGGCCTTGACGGGTTA D-024332-02 KIAA1985 NM_024577 CAGCAAGGTTGGTCAGTAT D-024332-03 KIAA1985 NM_024577 GGGTTATGCTGACCACTTT D-024332-04 KIAA1985 NM_024577 TAAAGGCTCCGCCCTGTTG D-012446-04 SKIIP #1 NM_012245 CUACAUUGCUGAUCGGAAG D-012446-04 SKIIP #2 NM_012245 AAACAGAGGGUUAUUCGGA D-012446-01 SKIIP #3 NM_012245 GGAAAUGCGUGCCCAAGUA D-004830-01 SRCAP #2 NM_006662 GGAAAUUGCUGCCCUCGUA D-004830-03 SRCAP #3 NM_006662 GAACAGGUCUCCAGCAGAU D-003982-02 SRPK1 #1 NM_003137 GAAGUCAGUUCGCAAUUCA D-003982-04 SRPK1 #2 NM_003137 GAACACAUAUCUGCAUGGU D-003982-05 SRPK1 #3 NM_003137 UCACGAAGCUGAAACCUUG D-003982-07 SRPK1 #4 NM_003137 GUUACAGGGUCUUGAUUAU D-004050-05 STK22C #1 NM_052841 GCAUGGGUGUGGUCCUGUA D-004050-06 STK22C #2 NM_052841 GAACAUCAUCCAGGUGUAU D-005038-09 STK22D #1 NM_032028 CAUCACAUGGCAAGGUCUA D-005038-07 STK22D #2 NM_032028 GGAUGACAGUGGUCGAAUG D-009409-01 WNT8B NM_003393 CCAAAGGCTTACCTGATTT D-009409-17 WNT8B NM_003393 CCATGAAACGCACGTGCAA D-007054-02 Znf574 NM_022752 ACGCAAAGCUCCACACUGA
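The duplex sequences in these tables are written partly in DNA letters (T) and partly in RNA letters (U), which makes entries harder to compare between tables. The sketch below normalizes a sequence to RNA letters, checks its length against the usual 19 nt, and reports GC content, a common siRNA quality parameter; the two example sequences are taken from the table above.

```python
def normalize_sirna(seq: str) -> str:
    """Remove spaces and write the sequence in RNA letters (T -> U)."""
    return seq.replace(" ", "").upper().replace("T", "U")


def gc_fraction(seq: str) -> float:
    s = normalize_sirna(seq)
    return (s.count("G") + s.count("C")) / len(s)


def check_duplex(seq: str, expected_len: int = 19) -> dict:
    s = normalize_sirna(seq)
    return {"sequence": s, "length_ok": len(s) == expected_len,
            "gc": round(gc_fraction(s), 2)}


for name, seq in {
    "D-009847-01 APC2": "TTATGAAGCTGTCCTTTGA",   # listed in DNA letters
    "D-003202-05 ATR": "GAACAACACUGCUGGUUUG",     # listed in RNA letters
}.items():
    print(name, check_duplex(seq))
```

Any entry whose length check fails is worth re-checking against the vendor record before reordering.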


Information for mRNA processing siRNA ordered in plate format

Dharmacon Duplex Gene Symbol Accession Number Sequence Number D-022214-01 AQR NM_014691 GGUAGUACAUUGCCAGAUG D-022214-02 AQR NM_014691 GCUUACAGCUGGCACGAUU D-022214-03 AQR NM_014691 GAAGUCAAACGAUUGCAAA D-022214-04 AQR NM_014691 GGAUGUCCGUCGCUUGGUA D-013213-01 CDC40 NM_015891 GAGAAAUUGUGCAGGAAUA D-013213-02 CDC40 NM_015891 GUUAAUCUACGGUCAACUA D-013213-03 CDC40 NM_015891 GCCUAUGACAGGUAUCUUA D-013213-04 CDC40 NM_015891 UGAGUUAUGUGAUUUCAGG D-011237-01 CDC5L NM_001253 CGAGACAAGUUAAACAUUA D-011237-02 CDC5L NM_001253 GAGAGGAGUUGAUUAUAAU D-011237-03 CDC5L NM_001253 GCUCUCAAGUGAAGCUUAU D-011237-04 CDC5L NM_001253 GUGCAAAGCCAGAUGGUAU D-019013-01 CRNKL1 NM_016652 GAUCAAGUAUGCCCGCUUU D-019013-02 CRNKL1 NM_016652 GAAAGGGUACGAGUGAUUU D-019013-03 CRNKL1 NM_016652 CAAUUAUGAUGCAUGGUUU D-019013-04 CRNKL1 NM_016652 CGAGCGUGCUUUAGAUGUA D-011869-01 HNRPC NM_001077443 CAACGGGACUAUUAUGAUA D-011869-02 HNRPC NM_001077443 CAGUAGAGAUGAAGAAUGA D-011869-03 HNRPC NM_001077443 AGAAGGAGCUGACCCAGAU D-011869-04 HNRPC NM_001077443 GAAACGUCAGCGUGUAUCA D-017813-01 LSM2 NM_021177 CAGAUGAGGUCGACACACA D-017813-02 LSM2 NM_021177 GACCUGAGCAUCUGUGGAA D-017813-03 LSM2 NM_021177 UGUCACAGACCCUGAGAAA D-017813-04 LSM2 NM_021177 GGUCGUGGAACUAAAGAAU D-021186-01 RBM22 NM_018047 GUAAAUGGCCGCAGACUGA D-021186-02 RBM22 NM_018047 CAACAAAGAGUACUAUACA D-021186-03 RBM22 NM_018047 UGCCAAACCUGCAGUAAAU D-021186-04 RBM22 NM_018047 CCAUAUAUCCGAAUGACCA D-016051-01 SF3A1 NM_001005409 GUAAGAAGAUCGGUGAGGA D-016051-02 SF3A1 NM_001005409 CAUCUUCGGUGUAGAGGAA D-016051-03 SF3A1 NM_001005409 CAGAUCGACUGGCAUGAUU D-016051-04 SF3A1 NM_001005409 GAACACAUGCGCAUUGGAC D-018282-01 SF3A2 NM_007165 GCAGAGAGAUCGACAAGGC D-018282-02 SF3A2 NM_007165 CACCUGGGCUCCUAUGAAU D-018282-03 SF3A2 NM_007165 CAACAAGGACCCGUACUUC D-018282-04 SF3A2 NM_007165 CCACGUCACCGCUUCAUGU D-019808-02 SF3A3 NM_006802 GCAAACCUAUUCCCUACUG D-019808-03 SF3A3 NM_006802 CGUCAUGGCUAAAGAGAUG D-019808-04 SF3A3 NM_006802 UGAUAAGGAUGGAUUACGA D-019808-17 SF3A3 NM_006802 GGACAGGAGAAGAGCGAGA 
D-026599-01 SF3B2 NM_006842 GAGAGAAAGUUCGGCCUAA D-026599-03 SF3B2 NM_006842 UGACAUCGACUACCAGAAA D-026599-04 SF3B2 NM_006842 GGACAAAGCCGCUCCACCU D-026599-18 SF3B2 NM_006842 AAGCGUAGGAACCGAAAGA


Dharmacon Duplex Gene Symbol Accession Number Sequence Number D-020085-01 SF3B3 NM_012426 GGACAUAGGGUAAUUGUAU D-020085-02 SF3B3 NM_012426 GAUAUCCGCUGUCCAAUUC D-020085-03 SF3B3 NM_012426 UAGCUGAUCUGGCCAAUGA D-020085-04 SF3B3 NM_012426 GCCAAGGACCUGAUACUAA D-017190-01 SF3B4 NM_005850 GAGAAGUUGCUUUAUGAUA D-017190-02 SF3B4 NM_005850 GAACAAAGCAUCAGCUCAC D-017190-03 SF3B4 NM_005850 UGAUCAAACUCUAUGGGAA D-017190-04 SF3B4 NM_005850 UAACCGUCCUAUCACCGUA D-012446-01 SKIIP NM_012245 GGAAAUGCGUGCCCAAGUA D-012446-02 SKIIP NM_012245 GAGAAGAGCUGGGAUCAAA D-012446-03 SKIIP NM_012245 AAACAGAGGGUUAUUCGGA D-012446-04 SKIIP NM_012245 CUACAUUGCUGAUCGGAAG D-005033-01 SMG1 NM_015092 GUGAAGAUGUUCCCUAUGA D-005033-02 SMG1 NM_015092 GAGGUUAGCUGCGGAAAGA D-005033-03 SMG1 NM_015092 GGUCAGACAUCCACCAGAA D-005033-04 SMG1 NM_015092 UAACUUGGCUCAGCUGUAU D-019577-01 SNRPA1 NM_003090 AAAUCUAGGUGCUACGUUA D-019577-02 SNRPA1 NM_003090 CAACAGAAUAUGCCGUAUA D-019577-03 SNRPA1 NM_003090 UAAGAAAUCCGGUAACCAA D-019577-04 SNRPA1 NM_003090 UCAAAUCGCUGACUUACCU D-017766-01 SNRPB NM_003091 CCAAAGAACUCCAAACAAG D-017766-02 SNRPB NM_003091 GGACCUCCUCCCAAAGAUA D-017766-03 SNRPB NM_003091 CAUAUUGAUUACAGGAUGA D-017766-04 SNRPB NM_003091 UAUGAGACCUCCUAUGGGU D-012353-01 SNRPD1 NM_006938 GGAAAUAACAUUCGGUAUU D-012353-02 SNRPD1 NM_006938 GCUGAGUAUUCGAGGAAAU D-012353-03 SNRPD1 NM_006938 GAACACAGGUCCAUGGAAC D-012353-04 SNRPD1 NM_006938 GCAUGAAUACACAUCUUAA D-019085-01 SNRPD3 NM_004175 GAAGAACGCACCCAUGUUA D-019085-02 SNRPD3 NM_004175 GAACACCGGUGAGGUAUAU D-019085-03 SNRPD3 NM_004175 CGAUUAAAGUACUGCAUGA D-019085-04 SNRPD3 NM_004175 AUACAUCCGUGGCAGCAAA D-003982-02 SRPK1 NM_003137 GAAGUCAGUUCGCAAUUCA D-003982-04 SRPK1 NM_003137 GAACACAUAUCUGCAUGGU D-003982-05 SRPK1 NM_003137 UCACGAAGCUGAAACCUUG D-003982-07 SRPK1 NM_003137 GUUACAGGGUCUUGAUUAU D-004839-05 SRPK2 NM_182691 GAACAUAGACCCUACGUGG D-004839-06 SRPK2 NM_182691 GCAGAACGGUUUCAGCCUC D-004839-07 SRPK2 NM_182691 UGAAGAGUAUCAUUCGACA D-004839-08 SRPK2 NM_182691 GACCCUACGUGGAUAGAAU 
D-012380-01 U2AF2 NM_001012478 GGAGUUCACCUCUGUGUUU D-012380-02 U2AF2 NM_001012478 GGAGCACGGUGGACUGAUU D-012380-04 U2AF2 NM_001012478 CAACGAGAAUAAACAAGAG D-012380-17 U2AF2 NM_001012478 CGGUAGGAACAUAGCGUGU D-006087-01 USP39 NM_006590 GAUCAUCGAUUCCUCAUUG D-006087-02 USP39 NM_006590 CAAGUUGCCUCCAUAUCUA D-006087-03 USP39 NM_006590 UCACUGAGAAGGAAUAUAA


Dharmacon Duplex Gene Symbol Accession Number Sequence Number D-006087-04 USP39 NM_006590 ACAUAAAGGCCAAUGAUUA D-013343-01 WBP11 NM_016312 GGACAUGGAUCAAGAUAAG D-013343-02 WBP11 NM_016312 CAAAUCAGAUGGAGAAAGU D-013343-03 WBP11 NM_016312 GAGAGUACGUCGGGAGAAU D-013343-04 WBP11 NM_016312 AGAGAUUACUCGAUUUGUG D-004914-01 XAB2 NM_020196 CAUGAUCGCUGCAAAGAUG D-004914-02 XAB2 NM_020196 ACGCAGCACUCUCGAAUUU D-004914-03 XAB2 NM_020196 GCAUCUCGCUGUUCAAGUG D-004914-04 XAB2 NM_020196 CCAAUUCUCUGUCAAAUGC D-011293-01 HNRPL NM_001005335 CAUCAUGCCUGGUCAGUCA D-011293-02 HNRPL NM_001005335 GAAGAUCGAAUACGCAAAG D-011293-03 HNRPL NM_001005335 CCACGGAUGUUCUUUACAC D-011293-04 HNRPL NM_001005335 GGAGUGAAGCGGCCAUCUU D-012252-01 PRPF8 NM_006445 UGAAGCAUCUCAUCUAUUA D-012252-02 PRPF8 NM_006445 GCAGAUGGAUUGCAGUAUA D-012252-04 PRPF8 NM_006445 GGAAGAAGCUAACUAAUGC D-012252-05 PRPF8 NM_006445 GAUAAGGGCUGGCGUGUCA D-013471-01 DDX19 NM_001014449 GGACGGGAAUCCUGACAAU D-013471-03 DDX19 NM_001014449 GCUCCAAGCUCAAGUUCAU D-013471-17 DDX19 NM_001014449 CUGCAGUGAUUGAGCGCUU D-013471-18 DDX19 NM_001014449 GAACAAGUGUCUGUCGUCA D-010394-01 DDX41 NM_016222 GCAAAGAGUCUGCCAAGGA D-010394-02 DDX41 NM_016222 UCAAAGCGCUGCUGCUAGA D-010394-03 DDX41 NM_016222 CAAGGGAGCUGCGGAGGAA D-010394-04 DDX41 NM_016222 CUAAGGGCAUUACGUAUGA D-018106-01 FLJ36754 NM_173829 GAUGUCAGCAGUACAAGUA D-018106-02 FLJ36754 NM_173829 GAAAAGGUCUUACUCAUCC D-018106-03 FLJ36754 NM_173829 AAGCAUUCUUCUACACCUA D-018106-04 FLJ36754 NM_173829 CUGAAUAAAUUGCAGGCAU D-013894-01 KIAA1160 NM_020701 GCUGGGAGGUCCUGAUUAU D-013894-02 KIAA1160 NM_020701 GUACUAUGGUUACCUAGAU D-013894-03 KIAA1160 NM_020701 GAAGUUCAUUGCUCACGUC D-013894-04 KIAA1160 NM_020701 GAGCCGAGUUAGUGGAAAA D-019754-01 LSM6 NM_007080 GUAAAUGGACAACUGAAGA D-019754-02 LSM6 NM_007080 UAAAUUCUGGAGUGGAUUA D-019754-03 LSM6 NM_007080 UGGCUUGCCUGGAUGGCUA D-019754-04 LSM6 NM_007080 GCAAAUCAUCGGACGACCA D-013078-01 NUP98 NM_139131 GGAAAGAAGUAGUUGUCUA D-013078-02 NUP98 NM_139131 GGCAAUAACAAGCUUACUA D-013078-03 NUP98 
NM_139131 ACAAAUACCACCUCUAAUC D-013078-04 NUP98 NM_139131 GCAUUUGGUUCUAGCAACA D-014987-01 PHF5A NM_032758 GCAAGUGUGUGAUUUGUGA D-014987-02 PHF5A NM_032758 GCAUAUGUGAUGAGUGUAA D-014987-17 PHF5A NM_032758 UGGUGUUGCCAUCGGAAGA D-014987-18 PHF5A NM_032758 GGAACUGACUGUGAAGCGA D-019593-01 PLRG1 NM_002669 GACAAUCGAUGUGUUGGUA D-019593-02 PLRG1 NM_002669 UGAGUAUGGUCCUGUGUUG


Dharmacon Duplex Gene Symbol Accession Number Sequence Number D-019593-03 PLRG1 NM_002669 UCUGAAAGUCGAUUACUAA D-019593-17 PLRG1 NM_002669 AUUAACACAUUGACGGUAA D-019836-01 PRPF3 NM_004698 GGCCGAAGCUCUAGGCAUU D-019836-02 PRPF3 NM_004698 GGACAAAGGCUCAACUGGA D-019836-03 PRPF3 NM_004698 UUGGAGAGAUGAAGUUUAA D-019836-04 PRPF3 NM_004698 GGGAGUAUAUCUUACCAAG D-020525-01 PRPF31 NM_015629 GCUCUUAGCUGAUCUCGAA D-020525-02 PRPF31 NM_015629 GGAAGCAGGCCAACCGUAU D-020525-17 PRPF31 NM_015629 GCGCCUGAAUACCGCGUCA D-020525-18 PRPF31 NM_015629 GCACCGCAUCUACGAGUAU D-017283-01 SART1 NM_005146 CCGAAUACCUCACGCCUGA D-017283-02 SART1 NM_005146 GCAAGAGCAUGAACGCGAA D-017283-03 SART1 NM_005146 GCUACAAACCCGACGUUAA D-017283-04 SART1 NM_005146 GAACCGAUCGUGAAUAGGG D-014706-01 SF3B5 NM_031287 GCUCCUACAUGGGCCACUU D-014706-02 SF3B5 NM_031287 UCAACUACUUCGCCAUUGC D-014706-03 SF3B5 NM_031287 CCAAGUACAUCGGCACGGG D-014706-04 SF3B5 NM_031287 GAAUGAGAGCAAAGCGCGA D-017191-01 SLU7 NM_006425 CGAAAGAGCAGUUCAGAUA D-017191-02 SLU7 NM_006425 GGAAGGAGAUUGUUAACUC D-017191-03 SLU7 NM_006425 GGAGCCAAAUUUACAGGUA D-017191-04 SLU7 NM_006425 GUGGAGUACUCAAGACAUG D-013617-01 SNRPD2 NM_004597 GAUAGGCACUGCAACAUGG D-013617-02 SNRPD2 NM_004597 UCAACAAGCCCAAGAGUGA D-013617-03 SNRPD2 NM_004597 UCAAGAACAAUACCCAAGU D-013617-04 SNRPD2 NM_004597 AUCAACUGCCGCAACAAUA D-019719-01 SNRPE NM_003094 CAAAAUAGAUCGCGGAUUC D-019719-02 SNRPE NM_003094 GCAAGUGAAUAUGCGGAUA D-019719-03 SNRPE NM_003094 UACAAAGUGUCUCCAACUA D-019719-04 SNRPE NM_003094 GAACCUUGUAUUAGAUGAU D-012323-01 SRP46 NM_032102 CACUACAGCUCAUCUGGUU D-012323-02 SRP46 NM_032102 GAAUCUCGCUACGGCGGAU D-012323-03 SRP46 NM_032102 CAAAUCGAGCUCUGCGCGA D-012323-04 SRP46 NM_032102 CGAUCUCGCUAUAGGGGUU D-019851-01 U5-116KD NM_004247 UGGCGGAUCUGAUGGAUAA D-019851-02 U5-116KD NM_004247 UGGAUGAGGUCAAUGGAUU D-019851-03 U5-116KD NM_004247 AUGAUGAGUUUGGGAAUUA D-019851-04 U5-116KD NM_004247 GACCAAGAUCUGUGCUAUA

References

1. Banaszynski LA, Chen LC, Maynard-Smith LA, Ooi AG, Wandless TJ. A rapid, reversible, and tunable method to regulate protein function in living cells using synthetic small molecules. Cell 2006; 126:995-1004.


APPENDIX D

Supplementary Tables

Due to size constraints, these files are available as supplemental material and can be accessed via the Stanford library website.

Table S1. γH2AX Signal and Cell Cycle Distribution for Genome (NT dataset). The following list shows the percentage of γH2AX positive cells, cell number and cell cycle distribution for all genes tested within the original screen. Genes are arranged in order of descending average γH2AX percentage. Where no cell cycle information is available, the PI intensity varied beyond that which could be corrected in our analysis program because of insufficient cell numbers or variation between duplicates. Average control values for all plates are located at the top of the list, before the genome data.

Table S2. List of Genes with Significant γH2AX Values (NT dataset). The following list shows genes that had a γH2AX signal significantly above the calculated noise of the screen. The significance column designates the category of assigned confidence. The confidence groupings ranged from 4 to 1, with the highest level of confidence in group 4. See Appendix A for details and further description of these confidence levels.

Table S3. Genes that Caused Extensive Cell Death. The following is a list of genes that, when knocked down, led to widespread cell death. This was defined based on the number of cells per well remaining in the original genome-wide screen 72 hours after siRNA transfection. For these genes, fewer than 400 cells were left per well after transfection of the corresponding siRNA, less than 50% of the number of cells originally transfected. The gene/protein product localization and function were categorized using Ingenuity Pathway Analysis (Ingenuity Systems, www.ingenuity.com). These genes were eliminated from subsequent enrichment analyses.
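The Table S3 criterion is a simple threshold on cells per well at 72 hours. A minimal sketch of the filter, using hypothetical gene names and counts:

```python
MIN_CELLS_PER_WELL = 400


def extensive_cell_death(cells_per_well: dict) -> list:
    """Genes whose knockdown left fewer than 400 cells per well at 72 h."""
    return sorted(g for g, n in cells_per_well.items() if n < MIN_CELLS_PER_WELL)


# Hypothetical counts, for illustration only.
counts = {"GENE_A": 1250, "GENE_B": 310, "GENE_C": 399, "GENE_D": 400}
flagged = extensive_cell_death(counts)
print(flagged)
```

Genes returned by such a filter would then be dropped before the enrichment analyses, as described above.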

Table S4. Categories of Enrichment Determined by DAVID Bioinformatic Database and Ingenuity Pathway Analysis. The genes in significance group 4 were analyzed for enrichments in GO terms (biological process, cellular component, molecular function), protein information resource keywords, or the OMIM/Genetic Association disease datasets. Significantly enriched categories (p value < 0.05) are shown, along with the corresponding genes. Highlighted categories are shown in Figure 2.5.

Table S5. List of Deconvoluted Genes. The following is a list of genes for which we individually tested four different siRNAs. Shown is the percentage of cells considered γH2AX positive and the number of viable cells/well for each siRNA tested. Duplicate measurements for γH2AX signal are represented by H2AX 01A and 01B, and duplicate measurements for cell number are represented by Cell# 01A and 01B. Averages of both values are also shown together with standard deviation. The average control values for all plates are located at the top of the list, before the deconvolution data.

Table S6. Individual Components of Gene Modules and Networks Enriched Amongst Screen Hits. The individual modules identified by DAVID Bioinformatic Database and linked by Ingenuity Pathway Analysis are shown in Chapter 2. Components of these modules are shown here, along with their gene ID numbers.

Table S7. 53BP1 Staining Results. The following genes were assessed for 53BP1 foci formation with four individual siRNAs. For a cell to be designated positive, it had to contain 5 or more 53BP1 foci. Foci were counted using a MetaXpress granularity script.
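The Table S7 scoring rule turns per-cell foci counts into a percentage of positive cells. A minimal sketch, assuming foci counts per cell have already been exported from the image analysis (the counts below are hypothetical):

```python
FOCI_CUTOFF = 5  # a cell is positive with 5 or more 53BP1 foci


def percent_53bp1_positive(foci_per_cell) -> float:
    """Percentage of scored cells with at least FOCI_CUTOFF foci."""
    counts = list(foci_per_cell)
    if not counts:
        raise ValueError("no cells scored")
    positive = sum(1 for n in counts if n >= FOCI_CUTOFF)
    return 100.0 * positive / len(counts)


# Hypothetical foci counts for six cells.
print(percent_53bp1_positive([0, 2, 5, 7, 4, 12]))
```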

Table S8. Phospho-H3 Recovery Assay Results. The following genes were assessed for their ability to progress into mitosis both in the presence and absence of aphidicolin with four individual siRNAs. The average %NT or %AP refers to the percentage of cells staining positive for the mitotic marker, the phosphorylation of histone H3 (Ser10). The ratio of recovery was calculated as (%AP/%NT Gene X)/(%AP/%NT siControl).
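The recovery ratio normalizes each knockdown's aphidicolin-to-untreated mitotic fraction to that of siControl. A sketch of the formula, assuming %AP and %NT denote the aphidicolin-treated and untreated conditions; the percentages in the example are hypothetical.

```python
def recovery_ratio(gene_ap: float, gene_nt: float,
                   ctrl_ap: float, ctrl_nt: float) -> float:
    """(%AP/%NT for the gene) / (%AP/%NT for siControl)."""
    return (gene_ap / gene_nt) / (ctrl_ap / ctrl_nt)


# Hypothetical % p-H3 (Ser10)-positive cells.
print(recovery_ratio(gene_ap=2.0, gene_nt=4.0, ctrl_ap=3.0, ctrl_nt=3.0))
```

On this definition, a ratio below 1 would indicate that fewer cells reached mitosis after the knockdown than after siControl.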

Table S9. Table of Genes Identified in This Screen and Other Screens. Hits from the screens shown below or their orthologs were compared to the genes found in our screen within our significance groups or which were confirmed by deconvolution with at least 1 siRNA.

Table S10. γH2AX Values From Retesting mRNA Processing Genes and the Effect of RNAseH Treatment. The average γH2AX value after gene knockdown is shown with the standard error calculated from three replicates. Genes highlighted in gray were shown to have consistent decreases in γH2AX after RNAseH treatment. A consistent decrease was defined as a decrease with at least three siRNAs, two of which reduced the γH2AX signal by more than 10%.
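The Table S10 rule can be written as a small classifier. This sketch assumes the "more than 10%" decrease is measured relative to each siRNA's own γH2AX value without RNAseH, and reads "two of which" as at least two; the values in the examples are hypothetical.

```python
def consistent_decrease(untreated, rnaseh, min_decreased: int = 3,
                        min_strong: int = 2, strong_cutoff: float = 0.10) -> bool:
    """Per-siRNA gamma-H2AX values without and with RNAseH, paired by position."""
    rel_drops = [(u - r) / u for u, r in zip(untreated, rnaseh) if u > 0]
    n_decreased = sum(d > 0 for d in rel_drops)
    n_strong = sum(d > strong_cutoff for d in rel_drops)
    return n_decreased >= min_decreased and n_strong >= min_strong


# Hypothetical gamma-H2AX values for four siRNAs against one gene.
print(consistent_decrease([20.0, 18.0, 15.0, 12.0], [15.0, 16.0, 14.0, 13.0]))
print(consistent_decrease([20.0, 18.0, 15.0, 12.0], [19.0, 17.0, 14.0, 13.0]))
```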

Table S11. mRNA Processing γH2AX Values +/- Aphidicolin Treatment. The average γH2AX value after gene knockdown is shown with the standard deviation calculated from two replicates.

Table S12. Replication and DNA Damage Correlation for mRNA Processing Genes. The average % EdU positive cells and % γH2AX positive cells are shown from two experimental replicates. The designations of “number of siRNAs that show an increase or decrease in S-phase” were determined by percentage cutoffs. Greater than 35% EdU positive was considered an S-phase increase and less than 20% EdU positive was considered to be an S-phase decrease.
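The Table S12 S-phase calls use two fixed EdU cutoffs. A minimal sketch that applies them per siRNA and tallies the calls for one gene; the percentages are hypothetical, and values exactly at 35% or 20% fall into the middle category because the cutoffs are strict.

```python
from collections import Counter


def s_phase_call(pct_edu_positive: float) -> str:
    """Classify one siRNA by its % EdU-positive cells."""
    if pct_edu_positive > 35.0:
        return "increase"
    if pct_edu_positive < 20.0:
        return "decrease"
    return "no change"


def summarize(per_sirna_pct_edu) -> Counter:
    return Counter(s_phase_call(p) for p in per_sirna_pct_edu)


# Hypothetical % EdU-positive values for four siRNAs against one gene.
print(summarize([41.0, 37.5, 28.0, 18.2]))
```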
