UNDERSTANDING HUMAN DNA POLYMERASE EPSILON FUNCTIONS: CANCER-

ASSOCIATED MUTATOR VARIANTS, PROOFREADING DEFECTS AND POST-

TRANSLATIONAL MODIFICATIONS

AN ABSTRACT SUBMITTED ON THE NINTH DAY OF MARCH 2015

TO THE DEPARTMENT OF BIOCHEMISTRY AND MOLECULAR BIOLOGY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

OF THE GRADUATE SCHOOL

OF TULANE UNIVERSITY

FOR THE DEGREE

OF

DOCTOR OF PHILOSOPHY

BY

ERIN ELIZABETH HENNINGER

APPROVED: Zachary F. Pursell, Ph.D., Advisor

Dr. Samuel J.

Dr. Victoria Belancio

ABSTRACT

DNA Polymerase Epsilon (Pol ε) is one of three main eukaryotic Pols responsible for nuclear DNA replication. The Pol ε holoenzyme is composed of four subunits, termed p261, p59, p17, and p12, with the largest subunit containing the DNA polymerase and 3ʹ to 5ʹ exonuclease (exo) proofreading activities. In addition to nuclear DNA replication, Pol ε participates in DNA repair, recombination, maintenance of epigenetic states and S-phase regulation, though the contribution of the smaller subunits to these processes is largely unknown. I set out to identify functions of the p12 subunit by determining its post-translational modifications and protein-protein interaction partners. This approach found that p12 is likely constitutively phosphorylated and that p12 ubiquitylation dynamics may be important during replication stress and fork stalling. p12 also putatively interacts with proteins involved in maintaining genome stability, including TOP1, HSP90, nucleolin and PRKDC.

A larger portion of my project involved studying the role of cancer-associated mutations within the exo domain of POLE1, the gene encoding the p261 subunit. Tumors harboring these POLE1 mutations are hypermutated, with mutation frequencies exceeding 100 mutations/Mb. However, unlike POLE1 wild type hypermutated tumors, the POLE1 mutant tumors are microsatellite stable (MSS). With our collaborators at the Baylor College of Medicine Human Genome Sequencing Center and the Memorial Sloan Kettering Cancer Center, we determined that C→A and C→T substitutions are highly elevated in these tumors relative to others, specifically at TCT and TCG motifs, respectively. I purified recombinant Pol ε and showed that several cancer mutant constructs, including S459F, P286H/R, L424V/I, and D275A/E277A, had elevated error rates for all 12 base pair substitutions and frameshifts, with the same propensity to make TCT→TAT mutations in vitro. To study the mechanism by which these specific mutations are made upon Pol ε exo inactivation in vivo, I constructed a knock-in cell culture model. In this model, I used a targeted knock-in approach to introduce the D275A/E277A double amino acid substitution at the genomic POLE1 locus using an engineered recombinant adeno-associated virus. Mutation rates and base pair substitution error rates were both increased in a mismatch repair null background upon Pol ε exo inactivation. TCT→TAT base pair substitutions had the highest increase in error rate in the engineered cell culture system, as was seen in the POLE tumors, demonstrating the utility of this system for studying the relationship between Pol ε-dependent mutagenesis and tumor formation.

The high rate of TCT→TAT mutagenesis has interesting consequences in tumors. The nucleotide preference of Pol ε variants leads to increases in recurrent nonsense mutations in key tumor suppressors such as TP53, ATM and PIK3R1. Moreover, strand-specific mutation patterns are seen during replication of these . Mapping of TCT→TAT hotspot mutations around known origins of replication provided the first direct evidence that Pol ε is the leading strand polymerase in human cells. The strand specificity of these mutations and their high abundance in human tumors allow for unique identification of eukaryotic origins of replication.

UNDERSTANDING HUMAN DNA POLYMERASE EPSILON FUNCTIONS: CANCER-

ASSOCIATED MUTATOR VARIANTS, PROOFREADING DEFECTS AND POST-

TRANSLATIONAL MODIFICATIONS

A DISSERTATION SUBMITTED ON THE NINTH DAY OF MARCH 2015

TO THE DEPARTMENT OF BIOCHEMISTRY AND MOLECULAR BIOLOGY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

OF THE GRADUATE SCHOOL

OF TULANE UNIVERSITY

FOR THE DEGREE

OF

DOCTOR OF PHILOSOPHY

BY

ERIN ELIZABETH HENNINGER

APPROVED: Zachary F. Pursell,

Dr. Victoria Belancio

ACKNOWLEDGEMENTS

I first and foremost want to thank my advisor, Zachary Pursell, as without you this dissertation would not have been possible. I am thankful to have had the opportunity to study biochemistry in your lab, and I know you were very committed to giving me the best education I could get.

To my committee members Dr. Victoria Belancio, Dr. Sam Landry, Dr. Arthur Lustig, and Dr. Astrid Engel: thank you all for being on my committee, and for the encouragements and acknowledgements you have given me during my stay at Tulane. I would also like to thank Dr. William Wimley and Dr. Erik Flemmington: you were great mentors to me.

Thanks to my Pursell lab mates: Anderson Ayuk Agbor: my “big brother” who was very reassuring; Yassi Goksenin: the helpful raccoon who found fun new lab things for us to tinker with; Kim LeCompte: who gave great lab, baking, and shopping advice; Karl Hodel: who admirably works very hard and shares funny cat videos.

To Dr. Paul Lambert and my labmates at Wisconsin: for helping me make the switch from chemistry to biology, being very patient with me and going out of their way to help me learn new things.

To Wisconsin Friends: Soyeong Park and Rup Chakravorty: Thanks for all of the fun times we had together. I enjoyed our sleepover/study sessions in the Ebling Library and our weekly “BRIS” dinners at Nam’s Noodles. To Nicole Woodards, Shu Yao, and Justin Shorb: my chemistry buddies. Thank you for all the times we could be silly together and laugh until our sides hurt. We have too many fun adventures to name, and I hope to continue them. To Jessica Gross: It was such a joy spending undergrad and part of grad school with you. I’m so proud of what you’ve accomplished for your PhD; To Kimberly and George Dahlman: You are both amazing and strong people. Thanks for your support through a difficult time in my life. To Jim and Cheryl Rot: I love you guys so much (mom and dad!). I could write this whole section just on the two of you. And I don’t know where I would be without you. Thanks for helping me grow as a person, showing me what unconditional love is, and taking me in as your own daughter.

Thanks to great friends from New Orleans, Philly, and AmeriCorps: Brian Ridley: I admire your hard work and dedication to everything you do; Colleen Purcell: you have a wonderful, bubbly personality and you truly set an example with your passion for helping others; Kate Jenkins: thanks for getting me acquainted with New Orleans and being an awesome roommate and person; Andrea Covington: my twin and dancing buddy; Donna Edwards: thanks for all the laughs; Preet Gill: my best college friend, we had so many fun times together. I would even take P chem again if I could do it with you; Lesley McCall: I’m so lucky to have made a lifelong friend like you, and I have always looked up to you ever since I can remember; Cecilia Burns: I have never met a more kind and caring person, with such an endless store of compassion and I admire you so much; Rebecca Bortolin: you are one of my favorite people and a great concert buddy. I really appreciate all of your advice and compassion; Mallory Cortez: you are also one of my favorite people, an amazing scientist, and gave the best advice about grad school. I will miss our fun times at the camp and 80s nights; Teddy Livingston: you are so smart, considerate, and selfless. Thanks for helping me through difficult decisions and for being a great friend; Amie Devlin: my soulmate friend, I love that we have the best times when we are together, even if we were to end up stranded in NYC for a night. Your friendship has meant the world to me; Meredith Sosulski: I’m so glad we became such good friends. I will miss hanging out at Finns with you; Nam Nguyen: thanks for helping me through and always listening. You are a wonderful person who deserves great things; Brian Deskin: you were my first and are my best friend from BMS, and you are like a brother to me. Will see you soon in Europe; Aaron Miller: I know in heaven you are “in my corner.” I will always remember you.


To the Nechvatal Clan, Grandma and Grandpa Henninger, and Uncle Brett: Thanks for always believing in me and loving me. I don’t know where I would be without any of you either, and it would take a long time to thank all of you, but thanks for teaching me how important family is. Uncle Brett, I know you supported me from heaven as well. I miss you.

To Mom, Dad and Jeff: I know you would do anything for me, and I know I made it through because of you. I can’t thank you enough, for everything. I definitely have the best family one could ask for. I could go on and on, but I know I wouldn’t be able to thank you enough. I love you dearly.

To my husband, Doug Stanley: I’m so glad I found you, and I know I would have even if I had stayed in Wisconsin. It was meant to be. I know it isn’t easy being with someone who has chronic illness, but it shows what an amazing and special person you must be. You truly are my best friend. Thank you beyond words.

To my dear friends and family above (and maybe a few others I’m sure I forgot to mention): you all have been very supportive through my PhD process and through the difficulties of living with Celiac’s disease. I will always miss and love all of you, near and far! I couldn’t have done this without you!


TABLE OF CONTENTS

ACKNOWLEDGEMENTS ...... ii

LIST OF TABLES ...... viii

LIST OF FIGURES ...... ix

Chapter 1: Introduction ……...... 1

1.1 DNA Polymerases: definition and enzymatic mechanism...... 2

1.2 Forms of Genome instability and repair pathways...... 4

1.3 Barriers to DNA replication...... 5

1.4 Eukaryotic DNA polymerases: X, Y and A families...... 6

1.5 DNA replication and B family polymerases...... 12

1.6 Polymerase alpha...... 14

1.7 Polymerase delta...... 15

1.8 Polymerase epsilon……………….……….………………………………………………….16

Chapter 2: Tumor-associated mutations in Pol ε lead to a mutator phenotype in vitro and in vivo, and reveal replication strand-specific mutation patterns and human origins of replication...... 24

INTRODUCTION ...... 25

METHODS…………………………………………………………...…………………………..30

RESULTS………………………………………………………………………………….……..36

2.1 Polymerase and excision activities of Pole-exo* variants...... 36

2.2 Reduction in excision for Pol ε cancer variants causes mutagenesis during in vitro DNA replication...... 37


2.3 The POLE tumor error spectrum and cell-free Pol ε synthesis show that cancer variants produce a unique hotspot mutation...... 38

2.4 Genome-wide mutation patterns show replication strand bias...... 40

2.5 Strand-specific mutation patterns at origins of replication support that Pol ε replicates the leading strand in human cells...... 41

DISCUSSION ...... 43

Chapter 3: The consequences of p261 exo inactivation on lesion bypass, cell viability, and point mutations in a human cell line model...... 72

INTRODUCTION ...... 73

METHODS……………………………………………………………………………………….77

RESULTS……………………………………………………………………………...…………83

3.1 Complete Pol ε exonuclease inactivation in an MMR-deficient background is likely incompatible with cell viability…………………………………………………………...... 83

3.2 Inactivation of Pol ε proofreading causes a mutator phenotype in human cells...... 85

3.3 Inactivation of Pol ε proofreading leads to large increase in base pair substitutions, particularly C:G→A:T transversions……………………………...... 86

3.4 Inactivation of Pol ε proofreading leads to increased abasic site bypass and insensitivity to DNA damaging agents……………………………………………………………...…………….88

DISCUSSION ...... 90

Chapter 4: Discovering functions of the p12 subunit through binding partners and post-translational modifications……………………………………………………………...... 108

INTRODUCTION ...... 109

METHODS ...... 115

RESULTS……………………………………………………………….………………………119


4.1 Phosphorylation of p12...... 119

4.2 Verification of p12 modifications by mass spectrometry...... 120

4.3 p12 is modified within the last 25 residues ...... 120

4.4 Replication stress affects p12 ubiquitylation...... 122

4.5 Determination of p12 binding partners...... 122

DISCUSSION...... 124

REFERENCES...... 144


LIST OF TABLES

Chapter 1

1.1 Repair pathways for mismatched or damaged bases………………………………….……..19

1.2 Human DNA Polymerase families……………………………………………………..…….20

Chapter 2

2.1 Excision by Pol ε variant constructs relative to WT activity…………………………………51

2.2 LacZ mutant frequencies for the Pol ε cancer mutant alleles……………………………..….52

Chapter 3

3.1 Karyotype Analysis of Pol ε exo- knock-in cells……………………………………………..93

3.2 HPRT1 mutation rates of Pol ε exo single knock-in clones………………………..………...94

3.3 HPRT1 mutations sequenced from 6-thioguanine resistant Pol ε wt/exo- and Pol ε wt/wt HCT116 cells……………………………………………………………………………………..95

3.4 Results of HCT116 WT sequences from previous studies………………………………...…96

3.5 Exo- Pol ε allele increases base pair substitutions, specifically C:G to A:T, in the HPRT1 spontaneous mutation assay………………………………………………………………………97

Chapter 4

4.1 Primers used to make p12 constructs……………………………………………….……….129

4.2 Summary of p12 construct modification and stability……………………………..………..131

4.3 p12 binding partners identified by MS………………………………………………….…..132


LIST OF FIGURES

Chapter 1

1.1 A model of eukaryotic DNA polymerases……………………………………………….…..21

1.2 The repair of DSBs is accomplished by two main pathways: HR and NHEJ……………..…22

1.3 Model Eukaryotic replication fork……………………………………………………….…..23

Chapter 2

2.1 Sequence and structural elements found altered in POLE mutated tumors………….………53

2.2 Conserved exonuclease domain elements from B family polymerases………………………55

2.3 Pol ε cancer variants are functional for polymerase activity but exonuclease activity is strongly reduced on ssDNA and primer/template containing a mismatch…………………….….57

2.4 Pol ε cancer variants increase replication errors in vitro………………………..……………60

2.5 Mutation spectra of POLE and POLD1 exonuclease domain-mutated tumors………………62

2.6 POLE cancer variants exhibit strand specific hotspot mutations……………………………..64

2.7 Propensity of Pole-exo* hotspot mutations to cluster on DNA strands………...……..……..66

2.8 The strand-specific mutation pattern of Pole-exo* tumor variants at known replication origins is consistent with Pol ε replicating the leading strand in human cells…………………..………..68

2.9 Pol ε cancer variants and their interaction with functional Pol and Exo Motifs……..……….70

Chapter 3

3.1 Generation of Exonuclease-deficient Pol ε human cell lines by gene targeting……………...98

3.2 Complete inactivation of polymerase error proofreading is incompatible with cell viability………………………………………………………………………………………….100

3.3 Pol ε exonuclease deficiency increases bypass of abasic lesions in vitro………………...…102

3.4 Pol ε exonuclease activity does not affect bypass of 8-GO in the template………………...104

3.5 Exo inactivation increases survival of drug-challenged cells……………………………….106


Chapter 4

4.1 p12 phosphorylation occurs in the absence of IR treatment and in the S25A mutant………134

4.2 p12 has multiple ubiquitylated forms…………………………………………………….…136

4.3 p12 truncation and mutation constructs……………………………………………………..138

4.4 Ubiquitin dynamics of p12 may be important for response to stalled forks after dT treatment…………………………………………………………………………………….…..140

4.5 Identification of p12 binding partners………………………………………………...……..142


Chapter 1: Introduction


1.1 DNA Polymerases: definition and enzymatic mechanism

A DNA polymerase is an enzyme that uses a ssDNA molecule as a template for synthesis of another DNA strand by inserting a complementary base across from a template base1. DNA Pols are necessary for all synthetic DNA processes, including replication, recombination, repair, and translesion synthesis1,2. In human cells, there are 16 DNA Pols that accomplish these functions. The abundance of Pols in cells highlights their importance in many roles. In many cases their accuracy and expression levels are very important to genome stability.

The polymerase reaction is a nucleotidyl transfer reaction. Polymerases lower the transition state energy of these nucleotidyl transfer reactions by aligning substrates, but do not themselves become reaction intermediates3. The nucleotidyl transfer involves the formation of a phosphodiester bond between the 3′-OH of a primer strand and the α-phosphate of an incoming dNTP3. Pols have common structural motifs to accomplish catalysis, including the finger, palm and thumb domains, while some families have unique accessory domains4. The finger, palm, and thumb domains are arrayed in a “right-handed” arrangement (Figure 1.1). The palm domain consists of 4-6 beta-sheets traversed by alpha helices4. The beta sheets contain two catalytic aspartate and glutamate residues4 which coordinate two divalent metal cations that are required for catalysis1. These are normally Mg²⁺ ions, but Mn²⁺ can be used in vitro and in some instances may be used in vivo. For example, Mn²⁺ is likely used by Pol lambda and PrimPol in vivo5,6. The fingers domain interacts with the incoming dNTP and paired template base, and the thumb domain interacts with duplex DNA1,4. The thumb domain is essentially comprised of α-helices, and the fingers domain is of highly variable structure4. Recent evidence supports a role for a third metal ion during catalysis to transiently stabilize the new phosphodiester bond7,8.


Correct dNTP incorporation is favored over incorrect incorporation and is thought to occur by an induced fit mechanism that is supported by structural studies3,4. These studies show that most polymerases move between open and closed conformations during the polymerase reaction. When bound to DNA in a binary complex, the enzyme is in the open state. Upon dNTP binding to form the ternary complex, the enzyme enters a closed conformation that is mainly accomplished by movement of the fingers domain. With the release of a pyrophosphate upon catalysis, the enzyme resumes the open conformation and the cycle resets. This closing of the fingers can discriminate between the right and wrong fit, and in the event of a mismatch the rate of catalysis slows to an extent that the incorrect dNTP can dissociate before chemistry occurs9. This phenomenon has been coined a kinetic checkpoint10. A discrimination step can also occur before the enzyme conformational change, in which some Pols can bind the correct nucleotide with higher affinity than the incorrect one4. A simpler explanation for the selection against nucleotide misincorporation is that differences in the free energy of transition states occur, with misincorporation having higher energy barriers at each step of the reaction compared to those of correct incorporation10. In vitro fidelity assays show that error rates of Pols range from 10⁻² to 10⁻⁵ per base pair11.
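The transition-state argument above can be made quantitative with a rough calculation. The sketch below is illustrative only and is not part of the original analysis: it assumes that the error rate approximates the ratio of incorrect to correct incorporation rates, that this ratio follows exp(-ΔΔG‡/RT), and a temperature of 37 °C. The function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K); 310.15 K (37 °C) is an assumed temperature.
R_KCAL = 1.987204e-3


def discrimination_energy(error_rate, temp_k=310.15):
    """Return the transition-state free-energy difference (kcal/mol)
    implied by a misincorporation frequency, assuming the frequency
    equals exp(-ddG / RT)."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must lie between 0 and 1")
    return -R_KCAL * temp_k * math.log(error_rate)


def error_rate_from_energy(ddg_kcal, temp_k=310.15):
    """Inverse of discrimination_energy under the same assumption."""
    return math.exp(-ddg_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    for rate in (1e-2, 1e-5):
        print(f"error rate {rate:g}: {discrimination_energy(rate):.2f} kcal/mol")
```

Under these assumptions, error rates of 10⁻² and 10⁻⁵ correspond to free-energy differences of roughly 2.8 and 7.1 kcal/mol, so modest changes in transition-state energy translate into large changes in fidelity.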

In the event of misincorporation, some Pols, including Pol ε, can excise the mismatch with a 3ʹ-5ʹ exonuclease (exo) activity. As shown for the Bacillus DNA polymerase I fragment (BF), incorporation of a mismatch induces distortion of the next nucleotide insertion site through different mechanisms, slowing the rate of catalysis for the pol active site for all 12 mismatches12. After this change, movement of the primer strand into the exo active site is favored. This change is largely mediated by the thumb domain13. Amino acid variants in the pol and exo domains of Pols can upset the balance between these activities, leading to increases or decreases in mismatch incorporation rates14.


1.2 Forms of Genome instability and repair pathways

Polymerases can guard against, or in some cases contribute to, forms of genomic instability, and are important in DNA repair pathways. There are three forms of genomic instability: point mutation instability (PIN), microsatellite instability (MSI or MIN), and chromosomal instability (CIN)15. CIN refers to changes in chromosome structure and number, as well as translocations15.

MSI refers to the expansion or contraction of small repeat sequences 1-6 nts in length called microsatellites16. These regions comprise 0.3% of the genome and are interspersed throughout, and instability within genes can contribute to human diseases like Huntington's disease, myotonic dystrophy, and fragile X syndrome16. PIN refers to base pair substitution errors, which can be made by the misinsertion of nucleotides during replication or repair processes, and can also result from damaged DNA bases. Damage to DNA bases comes from endogenous or exogenous insults such as oxidative metabolic intermediates, ionizing radiation, UV light, and chemicals. Mismatch repair (MMR), nucleotide excision repair (NER), base excision repair (BER), and ribonucleotide excision repair (RER) handle removal of damaging DNA lesions, mispairs, and loops or bulges to avoid genome instability17,18.
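As an illustration of the microsatellite definition above (tandem repeats of a 1-6 nt unit), the following sketch scans a DNA string for such tracts. It is a simplified example and not a method used in this work: the function name, the minimum of three repeats, and the restriction to unambiguous A/C/G/T bases are all assumptions made for the example.

```python
import re


def find_microsatellites(seq, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for tandem repeats whose unit is
    1..max_unit nt long. min_repeats is an arbitrary illustrative cutoff."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-nt unit followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter motif,
            # e.g. "TT" is already reported as a run of "T".
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # A (CAG)4 tract followed by a T6 mononucleotide run.
    print(find_microsatellites("ATCAGCAGCAGCAGTCTTTTTTGA"))
```

The reported unit depends on where the leftmost match begins, so the same tract may be labeled CAG, AGC, or GCA depending on the flanking sequence. Comparing tract lengths between tumor and matched normal sequence is the general idea behind calling a sample microsatellite stable or unstable.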

These repair pathways consist of proteins that participate in enzymatic removal of the damage in four primary steps: recognition of the damage or mismatch, incision and removal of the offending fragment, error-free gap-filling synthesis, and ligation of the remaining nick17 (see Table 1.1). Functional overlap between repair pathways occurs and provides high rates of damage correction19. Failure to correct or incomplete correction of these lesions can lead to single-strand breaks (SSBs)17. SSB repair (SSBR) occurs through pathways closely related to BER and RER, and in addition SSBs can be recognized by PARP1 (poly ADP-ribose polymerase 1)20.

The repair of double strand breaks (DSBs) is also accomplished by multiple pathways, and the pathway choice is partially dependent on the cell cycle17,21. These lesions are particularly deleterious because they can lead to loss of genetic information or translocations, which may be tumor promoting22. DSBs can form from reactions with endogenous or exogenous chemicals, or due to replication forks encountering stress or SSBs22. DSB repair can occur through error-free pathways such as homologous recombination (HR) and polymerase template switching, also known as break-induced replication (BIR)22,23. However, gene conversion, the unidirectional transfer of a donor sequence to an acceptor, can also result from these repair pathways through non-crossover and synthesis-dependent strand annealing (SDSA) and BIR17,24. Other error-prone pathways include non-homologous end joining (NHEJ), which results in small indels, and single strand annealing, which also results in deletions17,21. An alternative end-joining pathway (Alt-EJ) has been identified that uses microhomology-mediated joining and factors that participate in HR or SSBR17. Recently, the main Pol involved in Alt-EJ has been identified as Pol θ25. The abundance of DSB repair pathways shows the importance of precluding these deleterious DNA lesions. Common DSB repair pathways are highlighted in Figure 1.2. Actions of DNA Pols in all repair pathways are important for proper genome maintenance1.

1.3 Barriers to DNA replication

Even with intact repair pathways, the 10⁴-10⁵ spontaneous base lesions generated per cell per day can be overwhelming and some inevitably escape repair17,26. Lesions may be replicated in a mutagenic manner or pose barriers to replication. Common lesions include oxidation byproducts such as 8-oxo-guanine (8-GO) and 2-hydroxy-adenine, abasic sites, and UV-induced lesions. UV light induces intrastrand crosslinks between two adjacent pyrimidine bases, resulting in 6-4 photoproducts, thymine-thymine cyclobutane pyrimidine dimers, and other helix-distorting lesions. In addition, undamaged DNA can also hinder replication in the form of difficult-to-replicate sequences. These include centromeric and telomeric repeats that create polymerase slippage27, fragile sites that are prone to breakage, G-quadruplex (G4) sequences that form secondary structures, and dense heterochromatin structures28. Collisions with transcription machinery represent another form of replication barrier. These phenomena can cause fork stalling or pausing and dissolution of replication machinery. If unresolved, fork stalling can lead to replication fork collapse. This is synonymous with the inability to restart replication and potentially leads to DSBs29. Thus, during normal replication, the action of replicative Pols may lead to mutagenesis at difficult-to-replicate sequences.
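To make the notion of a G-quadruplex-forming sequence concrete, the sketch below flags candidate G4 motifs on either strand. It is an assumption-laden illustration, not an analysis from this work: it uses the commonly applied G3+N1-7G3+N1-7G3+N1-7G3+ pattern as a heuristic, and the function names are hypothetical. A match indicates sequence potential only, not that a structure forms in vivo.

```python
import re

# Heuristic motif: four runs of >= 3 guanines separated by loops of 1-7 nt.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_g4_motifs(seq):
    """Return (start, end, strand) for candidate G4 motifs.
    Coordinates are 0-based, end-exclusive, and given on the input strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), m.end(), "+") for m in G4_PATTERN.finditer(seq)]
    # A C-rich stretch on the input strand is a G-rich motif on the other.
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append((n - m.end(), n - m.start(), "-"))
    return sorted(hits)


if __name__ == "__main__":
    # Four human telomeric repeats (TTAGGG) contain one candidate motif.
    print(find_g4_motifs("TTAGGG" * 4))
```

Reporting the strand matters here because a motif on the template for one daughter strand may impede synthesis of that strand specifically.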

1.4 Eukaryotic DNA polymerases: X, Y and A families

Human Pols are divided into four families based on homology of primary amino acid sequence30 (Table 1.2). The X family polymerases include Pols beta, lambda, and mu (β, λ, µ), as well as terminal deoxynucleotidyl transferase (TdT). The X family Pols lack 3ʹ-5ʹ proofreading capability, and synthesize DNA with relatively low fidelity. They are therefore considered error-prone and mutagenic31. These Pols are mainly responsible for filling in DNA gaps during base excision repair (BER), or repair of DNA double strand breaks (DSBs) during non-homologous end-joining (NHEJ)2,32. Pols λ, µ, and TdT contain a BRCA1 C-terminal protein–protein interaction (BRCT) DNA binding domain that is important for NHEJ31. Pol λ fills in short gaps of DNA during NHEJ and BER, and has some overlapping functions with Pols β and µ5. Pol λ also may participate in translesion synthesis at abasic lesions, 8-GO, thymine glycol, and benzo[a]pyrene DNA adducts. Interestingly, Pol λ has the capability to extend from a misaligned primer/template substrate, which aids fill-in synthesis during NHEJ, where this misalignment is likely to occur. As a result, single nucleotide deletions have a much higher error rate than base pair substitutions during synthesis, which is unique to this enzyme5. However, Pol λ tends to make T•dG mispairs resulting in T to C transitions, and structural evidence shows this mispair is well accommodated by the active site5.
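Because the text refers to substitutions both by class (transition or transversion) and in single-strand or base-pair notation (for example T to C, or C:G→A:T), a small helper can make the conventions explicit. This is a minimal sketch with hypothetical function names, added for illustration only.

```python
PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_substitution(ref, alt):
    """Classify a single-base change as a transition (purine to purine or
    pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or
    the reverse)."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("ref and alt must be two different bases from ACGT")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def base_pair_notation(ref, alt):
    """Write a change on one strand together with its partner base,
    e.g. C to A becomes 'C:G→A:T'."""
    ref, alt = ref.upper(), alt.upper()
    return f"{ref}:{COMPLEMENT[ref]}→{alt}:{COMPLEMENT[alt]}"


if __name__ == "__main__":
    for ref, alt in (("T", "C"), ("C", "A"), ("T", "G")):
        print(base_pair_notation(ref, alt), classify_substitution(ref, alt))
```

The base-pair form is strand-neutral, which is why sequencing studies often report C:G→A:T rather than distinguishing C→A from G→T unless the replicated strand is known.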


Like other X-family Pols, Pol λ is comprised of a single subunit. X family Pols contain similar structural motifs important for function, including a dRP lyase domain unique to this family, and finger, palm and thumb domains. The dRP lyase domain is responsible for cleavage of the sugar phosphate backbone 3ʹ to an abasic site through a β-elimination reaction and Schiff base intermediate5,31. This motif is non-functional in TdT and Pol µ31. Pol β has additional AP lyase activity, which creates nicks in the sugar phosphate backbone at abasic sites. Pol β uses lyase and polymerase activities during BER, where it can perform fill-in and strand-displacement DNA synthesis on nicked, gapped, or recessed DNA substrates31. The error spectrum for Pol β contains a preponderance of single nucleotide deletions and transition mutations. In an in vitro lacZ forward mutation assay, Pol β has an overall error rate of 5 × 10⁻⁴, which is relatively high among polymerases33,34. Two amino acid variants found in Pol β from cancer patients, Ile260Met and Lys289Met, cause an increase in mutagenesis31. These residues are important for base selectivity in the polymerase active site. Pol β may also contribute to mutagenesis through lesion bypass, as it can bypass cyclobutane pyrimidine dimers (CPDs), (6-4) T-T photoproducts, 8-GO and cisplatin adducts in vitro, causing frameshift mutations31.

Pol µ, an X family Pol, can synthesize DNA with or without a template, in the latter case as a terminal transferase, adding a nucleotide to a free 3ʹ-OH group35. Pol µ’s most important role is thought to be in VDJ recombination, which generates antibody and T-cell receptor diversity; it is expressed to the greatest extent in lymphocytes, and deletion of the enzyme in mice leads to B-cell differentiation abnormalities31. Pol µ may also contribute to lesion bypass of 8-GO, AP sites, (6-4) T-T photoproducts, cisplatin and oxaliplatin-crosslinked guanine dimers, and 1,N⁶-ethenoadenine sites, but this has not been shown in vivo. As with other X-family Pols, Pol µ produces a high number of single nucleotide deletions31. These occur through a strand slippage mechanism called Streisinger slippage, where the lesion is looped out of the helix, allowing the primer to reanneal up- or downstream from the lesion during primer-template realignment. Pol λ also contributes to expansions at repeat sequences30. In addition, Pol λ can bridge DNA ends during NHEJ using template-dependent or template-independent synthesis, and is upregulated in response to IR, which induces DSBs30,31. It is also overexpressed in non-Hodgkin’s B-cell lymphomas, and may contribute to mutagenesis through the NHEJ pathway31. TdT also participates in VDJ recombination and completes synthesis entirely independent of a template, generating stretches of DNA with random sequence to increase immunoglobulin diversity32. TdT localizes to broken DNA ends, where it participates in NHEJ32.

Pols eta, iota and kappa (η, ι, and κ), along with Rev1, comprise the Y family of DNA polymerases. These polymerases promote cell survival by allowing various lesions to be bypassed, or “tolerated”, leaving them to be repaired at a later timepoint. These lesions would otherwise cause replication fork stalling and contribute to genome instability through fork collapse36,37. In addition, lesion bypass involves repriming of DNA synthesis downstream of lesions, which can be accomplished by the newly identified primase and polymerase PrimPol6,38,39. PrimPol is a member of the archaeo-eukaryotic primase (AEP) family. PrimPol can synthesize RNA or DNA primers de novo, without a free 3ʹ-OH, in a template-dependent manner6. This enzyme is important in the bypass of UV-generated DNA lesions, and also plays an important but unknown role in mitochondrial DNA (mtDNA) synthesis6,38,39. Fork restart may be accomplished by PrimPol’s binding to replication protein A (RPA), a ssDNA binding protein important in DNA replication, repair, recombination, and DNA damage signaling40. PrimPol also contributes to normal replication fork progression during unperturbed S-phase39.

The Y family of DNA Pols contains four translesion Pols (TLS Pols) that participate in translesion synthesis (TLS)37. TLS Pols have low processivity and loose active site geometries that allow for synthesis across from a distorted template, due to having smaller finger subdomains4. Y family Pols have classical thumb, fingers, and palm domains, as well as C-terminal polymerase associated domains (PAD, or little fingers domain) unique to this family4. This family of Pols is error prone even on undamaged templates, with error rates ranging from 10⁻² to 10⁻⁴ in the lacZ forward mutation assay4. After DNA damage, TLS Pols are recruited to stalled forks, which are marked by RPA-coated ssDNA, by ubiquitylated PCNA41. Each of the Y family Pols has unique and overlapping functions that are partly regulated by PCNA binding. The activity of these Pols is highly regulated due to their error-prone synthesis4.

Pol η is the main polymerase to bypass thymine dimers, which are the most common UV-induced lesion2. During this bypass, Pol η typically adds A’s across from the TT-CPD42. However, Pol η handles 6-4 T-T photoproducts less accurately and typically inserts two G’s across from the lesion, which is mutagenic2. Mutations in NER proteins, including in Pol η, lead to increased sensitivity to UV exposure in the disease xeroderma pigmentosum2. These patients have an increased risk of skin cancer.

In the absence of Pol η, Pol ι can replicate across from thymine dimers. However, this bypass is usually mutagenic, with incorporation of G’s opposite the lesion2. In fact, this enzyme typically incorporates a G even across from an undamaged T on the template strand 10-fold more efficiently than the correct A2,4. This is because Pol ι’s active site can accommodate Hoogsteen base pairing, in which one base is flipped into the syn conformation, as well as traditional Watson-Crick base pairing4. Interestingly, Pol ι can insert a nucleotide across from a DNA lesion, but cannot extend past it due to its extremely low fidelity and poor processivity4. The base misinserted opposite the lesion can then be extended by the B family polymerase Pol zeta (ζ, or REV3L), which is also a TLS Pol43. Pol choice can also be determined by active site geometry: the active site of Pol κ can only accommodate one template base, making Pols ι and η, which can accommodate two, more suitable to bypass pyrimidine dimers4. In addition to UV-induced lesions, Pols ι and η can bypass oxidative lesions, abasic lesions, and alkyl- and aryl-DNA adducts2.


Pol κ is the best suited to bypass the latter, bulkier lesions that are more helix distorting2. In fact, the presence of polycyclic aromatic hydrocarbons (PAHs) induces expression of Pol κ, as the promoter for the POLK gene, which encodes Pol κ, contains binding sites for the aryl hydrocarbon receptor2. Common lesions Pol κ can bypass include cisplatin adducts, benzo[a]pyrene diol epoxide (BPDE), N2-methyl-G, N2-ethyl-G, N2-isobutyl-G, and N2-acetylaminofluorene-G2,4. Like Pol ζ, Pol κ can also efficiently extend a primer terminus created by insertion opposite a lesion by another TLS Pol, hence its active site must accommodate distorted primer-template termini4. On an undamaged template, Pol κ produces many T to G errors44. The N-clasp domain, unique to Pol κ, is thought to be responsible for this decrease in fidelity. This domain is an extension of the thumb domain that surrounds the DNA mismatch near the primer terminus, and this action may help stabilize the structure for catalysis4.

Rev1 is necessary for cell survival after genomic insults from UV, MMS, DNA crosslinking agents, and DSBs37. Rev1 functions in the repair of interstrand crosslinks (ICLs) in conjunction with the Fanconi anemia (FA) pathway and contributes to bypass of abasic and UV-induced lesions2,37. It does not synthesize long polymers of DNA as other Pols do, but adds a dCTP to a primer terminus2. The active site of Rev1 contains an arginine residue that coordinates the dCTP during this reaction, and a unique “N-digit” motif that can flip a base out of the template position3,4. The primary role of Rev1 is thought to be aiding polymerase switching at lesions2. In this model, the replicative Pol stalls at a lesion and dissociates to allow for TLS Pol binding. One or more TLS Pols synthesize across from and extend past the lesion. Then, due to its very low processivity, the TLS Pol dissociates to allow for rebinding of the replicative Pol4. The C-terminal domain of Rev1 interacts with other Y family Pols, which may help regulate polymerase switching2.

Pol ζ, a B family polymerase, is also considered a TLS polymerase. It is also important, though not essential, for replication, possibly through its role in facilitating lesion bypass during fork progression45. In yeast, this enzyme is composed of Rev1, Rev3, and Rev7 subunits, and this complex participates in DNA damage tolerance in response to UV lesions46. Rev stands for “defective mutation reversion,” and these subunits were so named because they were identified as mutants that could not process single stranded gaps left by bypass of these lesions36. These same ssDNA gaps are seen in cells from patients with XP. Deletion of Pol ζ shows that ~50% of spontaneous and UV-induced mutagenesis is dependent upon this enzyme in S. cerevisiae47. Like other TLS Pols, Pol ζ is much less processive than other B family Pols, and lacks 3ʹ-5ʹ proofreading activity. It is particularly good at replicating through thymine glycol in an error free manner2. The mutation spectrum of Pol ζ uniquely shows an increase of tandem base pair substitutions and closely spaced clusters of mutations48. Rev3 is non-essential in yeast, but its loss is embryonic lethal in mice. This suggests that its functions cannot be carried out by other TLS Pols and that Pol ζ handles endogenous lesions, making it a necessity for replication fork progression.

Loss of Pol ζ also promotes CIN through replication-dependent DSB formation, and for this reason Pol ζ is necessary for cell proliferation49,50. When it is specifically knocked out in mouse epithelial cells, tumors develop from the rev3l-/- cells even without UV treatment. The tumors are characterized by CIN, supporting a role for Pol ζ as a tumor suppressor51. Recently, it was shown that Pol ζ shares subunits with the replicative Pol delta (Pol δ)52–54. Subunit sharing may be a widely used mechanism of Pol switching at lesions.

The remaining TLS Pols, Pol theta (θ) and Pol nu (ν), belong to the A family of Pols. This family also contains the mitochondrial replicative Pol, Pol gamma (γ). Pol θ is unique in that it can both add across from and extend past an abasic site or thymine glycol lesion2,55. It also uniquely contains a helicase-like domain that is likely non-functional56. The error signature of Pol θ contains mostly single nucleotide indels2. Like Pol ζ, Pol θ protects cells from CIN57. This is because it is able to extend from non-homologous primer-template structures in an alternative end-joining pathway (Alt-EJ), and functions in DSB repair and IgH class switch recombination25,57. In PolQ-/- mice, which thus lack Pol θ, frequent translocations are seen between the IgH and Myc loci57. Other functions of Pol θ include bypass of lesions and regulating the timing of replication55,58.

Pol nu (ν) is a poorly understood TLS Pol. It is thought to participate in bypass of thymine glycol and interstrand crosslinks. Pol ν has low intrinsic fidelity and strand displacement activity. Loss of the POLN locus is frequently seen in some human cancers2. Pol ν synthesis is error prone, with a propensity to make G•dT mispairs59. This error prone synthesis likely aids in immunoglobulin gene diversity but through a mechanism other than HR60.

Pol γ is the mitochondrial polymerase, but is encoded by the nuclear gene POLG. Pol γ is comprised of two subunits, α and β, in a 1:2 stoichiometry, and is the primary polymerase responsible for replicating the 16 kb circular mitochondrial genome61. The α subunit contains DNA polymerase, 3ʹ-5ʹ exo, and 5ʹ deoxyribose-5-phosphate lyase (dRP lyase) activities61.

Pol γ was the only known mitochondrial Pol until the discovery of PrimPol6. Before this discovery, Pol γ was thought to be responsible for all mitochondrial DNA replication and repair. Both of these functions safeguard genome stability, as mutations in either Pol γ subunit can lead to mtDNA mutagenesis. Approximately 150 Pol γ variants have been found that increase mtDNA genome instability in the form of point mutations, deletions, or decreases in mtDNA copy number62,63.

These changes decrease oxidative phosphorylation and lead to devastating mitochondrial disorders such as Alpers syndrome, early infantile hepatocerebral syndromes, progressive external ophthalmoplegia (PEO), Charcot-Marie-Tooth disease, and idiopathic parkinsonism62,63.

In addition, the proofreading function of Pol γ safeguards against aging although this mechanism is not well understood61.

1.5 DNA replication and B family polymerases


Nuclear genome replication is primarily carried out by three required B family polymerases: alpha, delta, and epsilon (α, δ and ε)64. Pols ε and δ both consist of four subunits in humans, and their largest, catalytic subunits contain the pol and exo active sites. Pols ε and δ are the only B family Pols with exo activity and are responsible for the majority of nuclear DNA synthesis64,65.

In order for the replicative Pols to initiate DNA synthesis, a number of factors cooperate to unwind DNA at replication origins (ORIs) once and only once per cell cycle. One such essential factor is the replicative helicase known as the CMG complex, a multi-subunit complex composed of Cdc45 and the Mcm2-7 and GINS subcomplexes. The CMG helicase assembles at ORIs in G1, and the Mcm2-7 ATPase activity is activated upon entry into S phase to unwind duplex DNA into ssDNA. DNA synthesis is unidirectional, proceeding only in the 5ʹ to 3ʹ direction with respect to the deoxyribose carbon numbering, by addition of a new dNTP to a 3ʹ-OH group13. Due to the antiparallel nature of double stranded DNA, one template strand can be replicated continuously (the “leading strand”), while the “lagging strand” is replicated in discrete, discontinuous fragments called Okazaki fragments. The CMG helicase unwinds DNA traveling from the ORI in both directions on the leading strand64. Unwound ssDNA is coated by RPA to avoid ssDNA breaks45 (see Figure 1.3).
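The directionality and antiparallel pairing described above can be illustrated with a short sketch. The Python below is illustrative only: the function name `nascent_strand` and the example sequence are assumptions introduced here, not part of this study. It returns the strand that would be synthesized from a given template, written 5ʹ to 3ʹ, which is the reverse complement of the template.

```python
# Illustrative sketch; the function name and example input are hypothetical.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nascent_strand(template_5to3: str) -> str:
    """Return the newly synthesized strand, written 5'->3'.

    A polymerase reads the template 3'->5' and extends the new strand
    5'->3', so the product is the reverse complement of the template.
    """
    template = template_5to3.upper()
    unknown = set(template) - set(_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Walk the template from its 3' end to its 5' end, pairing each base.
    return "".join(_COMPLEMENT[base] for base in reversed(template))


if __name__ == "__main__":
    # One human telomeric repeat unit, used here only as a convenient input.
    print(nascent_strand("TTAGGG"))
```

Because both template strands must be copied in the same chemical direction, only one of them can be copied continuously behind the helicase, which is the basis of the leading and lagging strand distinction.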

DNA synthesis is initiated by Pol α, which is a heterotetramer comprised of two primase and two polymerase subunits. Pol α is the least processive of the B family Pols, and is proofreading-deficient64. Pol δ may correct errors made by Pol α66. The primase subunits of Pol α initiate de novo synthesis of an RNA primer 10-20 nucleotides in length. The Pol α polymerase subunits then catalyze synthesis of 20-40 deoxyribonucleotides to start synthesis on both leading and lagging strands of the replication bubble45 (see Figure 1.3). Pols δ and ε then continue synthesis on the lagging and leading strands, respectively. This assignment was made in yeast studies with specific mutator variants, allowing sites of synthesis to be mapped based on the characteristic mutations made67–69. A reconstituted in vitro S. cerevisiae system recently showed that the asymmetry in polymerase recruitment to the replication fork is mediated by the CMG


helicase, which directs Pol ε to the leading strand70. CMG mediates very processive synthesis by Pol ε on the leading but not the lagging strand, with many dNTPs added before dissociation. In contrast, Pol δ synthesis on the leading strand in the presence of CMG is distributive, meaning the polymerase dissociates after the addition of each nucleotide. For lagging strand synthesis, PCNA and RPA recruit Pol δ and cause faster and more processive synthesis compared to Pol ε on this substrate70. Other proteins are recruited to assemble a replisome, including the RFC clamp loader and the PCNA clamp. The PCNA clamp is a homotrimeric protein that encircles DNA and binds many different proteins through their PIP motifs71. Among these, the flap endonuclease (Fen1) and DNA ligase 1 (Lig1) travel with PCNA at the fork to process Okazaki fragments71.

1.6 Polymerase alpha

Apart from replication, Pol α also functions in the response to DNA damage (see chapter 4 introduction), telomere maintenance, and control of chromatin assembly72. Pol α was the first Pol discovered in human cells, and was initially thought to be solely responsible for full genome replication72. Evidence shows that Pol α primes DNA synthesis at telomeres, unique structures at the ends of chromosomes that in humans are comprised of 5ʹ-(TTAGGG)n-3ʹ repeats with a single stranded 3ʹ overhang73. Telomere sequence and length vary between species. Telomeres are replicated partially through standard semi-conservative DNA replication73. However, the ends of chromosomes pose a problem to DNA replication called the “end replication problem”. Since replication can only proceed in the 5ʹ to 3ʹ direction, the 3ʹ end of chromosomes shortens with each round of replication after RNA primer degradation73. The enzyme telomerase can catalyze addition of telomeric repeats to the 3ʹ end, and Pol α primed synthesis is thought to complete replication on the 5ʹ end. Interaction with Cdc13, a ssDNA-binding protein specific to telomeres, results in stimulation of Pol α’s primase activity72,74. In addition, telomere length is dependent on Pol α73. In general, maintenance of


telomeres prevents loss of genes near the ends of the chromosome, and telomere-telomere fusions that would result in CIN73.

Pol α also participates in silencing of mating loci and centromeric and telomeric regions through control of higher order chromatin structures72. Swi6 is largely responsible for maintaining these silenced states between cell divisions and is an interaction partner of Pol α. Certain mutations in Pol α disrupt normal silencing at these regions72. Lastly, Pol α may contribute to genome instability at repetitive sequences such as centromeres and telomeres in vivo through MIN, since it is unable to faithfully replicate such sequences in cell free assays75.

1.7 Polymerase delta

Human Pol δ has four subunits (p125, p50, p66, and p12), while the S. cerevisiae enzyme has three (Pol3, Pol31, and Pol32)65. Pol δ is among the most accurate Pols, making only 1 mutation per 10⁷ base pairs during synthesis. Forward mutation assays show that WT Pol δ causes single nucleotide deletions in homonucleotide runs76. The structure of Pol δ’s exo active site is similar to standard Pol motifs, and is comprised of beta sheets supported by alpha helices. These alpha helices contain the catalytic residues Asp321, Glu323 and Asp407, which coordinate the two divalent Mg²⁺ ions necessary for catalysis77,78. The 3ʹ-5ʹ exo proofreading activity improves accuracy 10-100 fold76. Pol δ also contains a characteristic β-hairpin motif that is thought to aid in partitioning of the primer strand between the exo and pol active sites78. The C-terminal domain of the catalytic subunit contains a [4Fe-4S] cluster in the region that binds the smaller subunits. Pol ζ also binds the smaller subunits of Pol δ through its Fe-S cluster to mediate Pol switching53.

Aside from bulk DNA replication during S-phase, DNA polymerase δ and its subunits participate in licensing of DNA replication, telomere maintenance, and DNA repair through

MMR, BER, NER, lesion bypass, break-induced replication, and recombination23,76,77,79. Pol δ is required for the MMR pathway but it may be dispensable for other repair pathways76. Pols δ and ε can substitute for Pol β in cell-free repair of an abasic substrate and may play a role in a back-up


BER pathway during long-patch repair76,79. A role for Pol δ in HR is also implicated, as deletion of Pol32 or destabilization of the holoenzyme leads to inability to finish HR synthesis. Notably, destabilization of the holoenzyme can also lead to fork stalling and collapse76. Other evidence shows that Pol δ is able to replicate D-loops, which are intermediate substrates in HR, in vitro76.

Pol δ also participates in NER and bypass of some lesions. Although Pol δ stalls at most lesions, it can synthesize across from 8-GO, 6-methyl-deoxyguanine or abasic sites. However, processivity and fidelity are reduced, and another Pol may be needed to extend past the lesion. For NER, a reconstituted repair system showed that Pols δ, ε, and κ can provide repair synthesis in this pathway76. Lastly, like Pol α, Pol δ participates in partial replication of telomeres through semi-conservative replication. In human cancers that maintain telomere length through alternative lengthening of telomeres (ALT), BIR is a likely mechanism for telomere extension, which means Pol δ may play a role in this pathway23,76.

1.8 Polymerase epsilon

Pol ε consists of four subunits in humans (p261, p59, p17, and p12) and four in yeast (Pol2p, Dpb2p, Dpb3p, and Dpb4p). The pol and exo active sites reside in the largest subunit. The Pol ε holoenzyme carries out the most faithful DNA synthesis of all DNA Pols, making approximately one mistake per 10⁵ base pairs synthesized80. The structure of the yeast enzyme has recently been solved and shows that, in addition to the conserved fingers, thumb and palm structure, Pol ε contains unique motifs not found in other B family Pols81,82. These unique motifs (an extension in the palm domain and an extra alpha helix in the thumb domain) interact with duplex DNA within the first 10 nascent primer nucleotides and are therefore poised to affect processivity and accuracy82. Deletion of the palm domain extension lowers processivity in vitro82.

The exo domain of Pol ε lacks the beta-hairpin structure common to other B family Pols that aids in primer transitioning between active sites. However, Pol ε is able to partition the primer strand between active sites without dissociating from the primer-template substrate83.


Mutations in the Pol domain that abolish the catalytic, metal-coordinating aspartate residues are lethal in yeast. However, cells carrying a deletion of the entire catalytic domain are viable, though sick84,85. This suggests that another Pol, likely Pol δ, is able to carry out synthesis in the absence of functional Pol ε, but only inefficiently, and that the C-terminal half of p261 is essential84,85. This portion is likely required for holoenzyme stability, and binds the three smaller subunits86. Dpb2 is also essential, likely because it too is necessary for holoenzyme stability86. Interestingly, the C-terminus of p261 and the N-terminus of Dpb2 are both necessary for interaction with the CMG helicase, promoting its translocation during replication fork progression87,88. The C-terminus of Dpb2 also plays an important although poorly understood structural role, as mutants in the C-terminus do not support viability89. Pol2 and Dpb2 are recruited to sites of replication initiation prior to Pol α, where they form a complex containing GINS, Dpb11, and Sld286,90. This suggests that Pol ε participates in cell cycle regulation or replication initiation apart from simply being delivered to the fork via interactions with the replicative helicase. Also in support of these functions, Dpb2 is a target of S-phase kinases and a substrate of cullin-RING ligases91,92.

Like Pol δ, Pol ε has the capacity to participate in the BER, NER, RER, and BIR repair pathways, but its actual contributions in vivo are not certain for some pathways. Pols ε and δ participate in NER in vivo, with half of repair synthesis performed by Pol ε, and the other half performed by Pol δ and Pol κ93. The choice of Pol is likely determined by different clamp loaders that specifically recruit them to leading vs lagging strand lesions93. In cycling cells, replicative Pols can be co-immunoprecipitated with components of the BER and SSBR pathways, suggesting Pol ε participates in these repair pathways94. In yeast, Pol ε is required for longer synthesis products95, while Pols α and δ initiate and extend BIR synthesis23. In addition to replication and repair functions, yeast Pol ε contributes to maintenance of epigenetic states at rDNA, telomeres, and heterochromatin96–99. In humans, these possible functions of Pol ε and the involvement of different subunits are largely unstudied.


In contrast to the larger subunits, the smallest Pol ε subunits, Dpb3 and Dpb4, are non-essential100. These subunits function within the Pol ε holoenzyme during replication in yeast, but also have the ability to bind dsDNA as a heterodimer through basic lysine residues101. Dpb3 (p17) and Dpb4 (p12) contain histone fold motifs (HFMs), which allow dimerization between proteins containing them102, and have high sequence similarity to histones H2A and H2B, respectively101. Recent evidence shows that the small subunits are post-translationally modified after different stressors in human cells. For example, p17 is sumoylated during heat shock stress103 and p12 is a putative target of ATM/ATR kinases after IR treatment104. Roles for human p12 have not been identified, but p17 is known to function in the human ISW1/CHRAC complex. This complex maintains open chromatin states at subtelomeric regions and relaxes heterochromatic regions during DNA replication96,105. Participation of p17 in this complex likely has consequences in epigenetics and DNA damage signaling. In yeast, deletion of either Dpb4 or the catalytic subunit leads to loss of inherited epigenetic state at telomeres. The silenced or non-silenced states are determined by a balance of the opposing functions of Pol ε and ISW2/yCHRAC97.

Chromatin remodeling is also important in the DNA damage response (see chapter 4 intro), and a role for CHRAC subunits is implicated in NHEJ, G2/M checkpoint, and preventing UV-mediated replication fork collapse106. Thus p12 and p17 likely play important roles in epigenetics and DNA damage signaling.


Table 1.1 Repair pathways for mismatched or damaged bases.

Pathway | Lesion handled | Recognition complex | Mismatch or damage removal | Gap filling and ligation
MMR16 | Single base substitution, one nucleotide insertion/deletion loop (IDL) | Msh2/Msh6 (MutSα) | Mlh1/Pms2, ExoI | RPA, PCNA, LIG1, Pol δ, Pol ε
MMR | Large IDLs | Msh2/Msh3 (MutSβ) | Mlh1/Pms2, ExoI | RPA, PCNA, DNA ligase I, Pol δ, Pol ε
TC-NER (transcription coupled)17 | Helix distorting lesions (thymine dimers and other photoproducts, cisplatin adducts, cyclopurines) which block transcription | CSB, CSA | XPA, XPB, XPD, XPF (ERCC1), XPG | RPA, PCNA, Pol δ, Pol ε, XRCC1–LIG3α, FEN1–LIG1
GG-NER (global genome) | Helix distorting lesions (thymine dimers and other photoproducts, cisplatin adducts, cyclopurines), cell cycle independent | XPC-Rad23B | XPA, XPB, XPD, XPF (ERCC1), XPG | RPA, PCNA, Pol δ, Pol ε, XRCC1–LIG3α, FEN1–LIG1
BER17 | Oxidation, alkylation, or deamination products (8-GO, abasic sites) | DNA glycosylases (uracil-DNA glycosylase (UNG), N-methylpurine-DNA glycosylase (MPG), 8-oxoguanine DNA glycosylase (OGG1), mutY homolog (MUTYH), endonuclease III-like 1 (NTH1) and NEIL1) | APE1, PNKP, Pol β | Short patch: Pol β, Pol λ, XRCC1/LIG3. Long patch: Pol ε, Pol δ, PCNA, Fen1, Lig1
RER18 | Ribonucleotides | Top1, RNase H2 | Top1, RNase H2 | Fen1, LIG1, Pol δ, Pol ε
SSBR17 | SSBs | APE1, PNKP, tyrosyl-DNA phosphodiesterase 1 (TDP1) | APE1, PNKP, tyrosyl-DNA phosphodiesterase 1 (TDP1) | Pol β, XRCC1/LIG3α


Table 1.2 Human DNA Polymerase families. All 16 human DNA Pols are grouped into the indicated families based on primary sequence homology.

Family | Polymerase | Gene | Function
A | Pol Gamma (γ) | POLG | Mitochondrial DNA repair and replication
A | Pol Theta (θ) | POLQ | DSB repair, microhomology mediated NHEJ25
A | Pol Nu (ν) | POLN | TLS, immunoglobulin diversity
B | Pol Alpha (α) | POLA1 | DNA replication priming
B | Pol Delta (δ) | POLD1 | DNA replication, NER and MMR
B | Pol Epsilon (ε) | POLE | DNA replication, NER and MMR
B | Pol Zeta (ζ) | REV3L | TLS
X | Pol Beta (β) | POLB | BER, NHEJ
X | Pol Lambda (λ) | POLL | BER, NHEJ, VDJ recombination
X | Pol Mu (µ) | POLM | BER, NHEJ, VDJ recombination
X | TDT | DNTT | Immunoglobulin diversity
Y | Pol Eta (η) | POLH | TLS bypass at UV lesions
Y | Pol Iota (ι) | POLI | TLS, backup BER
Y | Pol Kappa (κ) | POLK | TLS, backup NER
Y | Rev1 | REV1 | TLS
AEP | PrimPol | CCDC111 | Mitochondrial DNA replication

Adapted from Lange et al, 20112. TDT: terminal deoxynucleotidyltransferase; TLS: translesion synthesis; NER: nucleotide excision repair; BER: base excision repair; MMR: mismatch repair; DSB: double strand break; NHEJ: non-homologous end-joining


Figure 1.1 A model of eukaryotic DNA polymerases. The figure shows the fingers, palm, and thumb domains common to DNA Pols. An accessory 3ʹ-5ʹ exonuclease domain is shown in red. This figure was taken with permission from Loeb and Monnat, 20081.


Figure 1.2 The repair of DSBs is accomplished by two main pathways: HR and NHEJ. These pathways are active in different stages of the cell cycle. In HR, the DSB is recognized by the MRN complex (Mre11-Rad50-Nbs1), and exonucleases including CtIP and possibly Exo1 perform 5ʹ-3ʹ end resection to form a 3ʹ ssDNA overhang. This overhang is coated by RPA, which recruits RAD52. A RAD51-BRCA2 complex then creates RAD51 filaments that function in the homology search for an intact DNA locus. The second DNA end invades the homologous region to form a double Holliday junction structure. In classical HR, resolution of the Holliday junctions results in non-crossover (cleavage at blue arrows). Crossover can also occur (cleavage at blue arrows on one side and red arrows on the other). Single strand annealing (SSA) occurs by a Rad51-independent mechanism, in which broken DNA ends anneal at regions of microhomology, leaving ssDNA flaps that are subsequently processed. In synthesis-dependent strand annealing (SDSA), a single Holliday junction structure allows new synthesis at the invading strand, which is displaced and anneals to the other DSB end. In the NHEJ pathway, the Ku (Ku70/Ku80) complex recognizes DSB ends and recruits DNA-PKcs. This kinase activates exonucleases including Artemis to generate shorter 3ʹ overhangs. DNA Pols perform fill-in synthesis to remove gaps, and end joining is carried out by XRCC4–LIG4 and XLF. This figure was adapted from Iyama and Wilson17, with permission.


Figure 1.3 Model Eukaryotic replication fork. The minimal proteins required for DNA replication are shown. This model is a three polymerase model, and Pol ε is pictured on the leading strand, while Pols δ and α are pictured on the lagging strand. Pol α primes synthesis on the lagging strand, and Pol δ extends the primers to form Okazaki fragments. ssDNA stretches between Okazaki fragments are coated by RPA. The CMG helicase unwinds DNA at the replication fork. PCNA, Fen1, and Lig1 process small gaps between Okazaki fragments. Taken with permission from Prindle and Loeb76.


Chapter 2: Tumor-associated mutations in Pol ε lead to a mutator phenotype in vitro and in vivo, and reveal replication strand-specific mutation patterns and human origins of replication


INTRODUCTION

Maintaining genomic stability in part requires a balance of accurate and inaccurate template strand copying by DNA polymerases2. There are 16 known human DNA polymerases that participate in such processes, contributing to faithful genome duplication, repair of DNA lesions, or damage tolerance by translesion synthesis2,6,38,39. These actions are important because there are many natural insults to genomic integrity, including difficult-to-replicate sequences (e.g. microsatellites, telomeric repeats, rDNA, fragile sites)107–110, single or double strand breaks111,112, spontaneous and induced DNA lesions (e.g. abasic sites, alkylations, oxidations, interstrand crosslinks (ICLs))2, and the high abundance of ribonucleotides18,113,114. Proper maintenance of genome stability is important for preventing tumor development15. Replicative polymerases conduct highly accurate synthesis that safeguards against microsatellite and point mutation instability. In eukaryotes, several mechanisms contribute to an observed spontaneous mutation rate of less than 10⁻⁹ mutations per base pair per cell division115: the high intrinsic polymerase selectivity against incorporation of the wrong nucleotide, the 3ʹ to 5ʹ exo proofreading activity, and the post-replication mismatch repair system115,116. However, some DNA lesions cause stalling of replicative polymerases, which could lead to fork collapse and subsequent chromosomal instability (CIN). The action of translesion polymerases, though most often leading to point mutations, can safeguard against this more detrimental type of DNA damage2.
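How the three fidelity mechanisms named above could combine can be sketched with simple arithmetic. The Python below is a minimal illustration that assumes the steps act sequentially and independently, so that their contributions multiply. That is a simplifying assumption made here, not a result of this study. The function name `overall_error_rate` and all input values are hypothetical. Only the 10-100 fold range for proofreading comes from the text (reported for Pol δ in section 1.7).

```python
# Illustrative sketch; the multiplicative model and all inputs are assumptions.

def overall_error_rate(selectivity_error: float,
                       proofreading_fold: float,
                       mmr_fold: float) -> float:
    """Combine sequential fidelity steps into one error rate per base pair.

    selectivity_error: errors per base pair from nucleotide selection alone.
    proofreading_fold: fold improvement from 3'-5' exo proofreading.
    mmr_fold: fold improvement from post-replication mismatch repair.
    """
    if selectivity_error <= 0:
        raise ValueError("selectivity_error must be positive")
    if proofreading_fold < 1 or mmr_fold < 1:
        raise ValueError("fold improvements must be >= 1")
    return selectivity_error / (proofreading_fold * mmr_fold)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    rate = overall_error_rate(selectivity_error=1e-5,
                              proofreading_fold=100,
                              mmr_fold=1000)
    print(f"{rate:.1e} errors per base pair")
```

Under this model, removing any one layer (setting its fold value to 1) raises the overall error rate by that factor, which is one way to picture why loss of proofreading or of mismatch repair produces a mutator phenotype.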

Tumors have often been described as having mutator phenotypes as they have forms of genomic instability not present in normal cells and sometimes thousands of mutations, of which only a few contribute to tumorigenesis15,117. Some sources for the large numbers of mutations found in human cancers have been uncovered but others remain puzzling. For example, in hereditary cancers, common susceptibility mutations exist in DNA repair genes such as BRCA1,

BRCA2, BLM, RAD52, MSH6 and others15. In contrast, sporadic cancers contain very few mutations in these caretaker genes (but may develop them in response to treatments) and can also contain a perplexingly high number of mutations15. Therefore, it is controversial how mutator


phenotypes arise or whether they exist in spontaneous tumors at all. Two hypotheses exist to explain high mutation rates in tumor cells: oncogene-induced replication stress and selection in tumors for mutations in DNA polymerases or DNA repair proteins15,117.

The best example of a mutator phenotype in cancer is inactivation of MMR genes, especially MLH1, which is often transcriptionally silenced by promoter methylation, leading to microsatellite instability (MSI)118. Because MMR gene products process small indels arising during DNA synthesis, silencing or mutational inactivation of certain MMR genes leads to expansion or contraction of microsatellite repeats119. MSI is found in 15% of colorectal cancers (CRCs),

20% of stomach cancers, 30% of endometrial cancers and in rare occurrences of pancreatic adenocarcinoma, glioblastoma multiforme, and renal cell chromophobe tumors120. Tumors exhibiting MSI are considered hypermutated, with 10 to 100 mutations per Mb observed in whole genome and whole exome studies121,122. In contrast, microsatellite stable (MSS) tumors have mutation frequencies of 1 to 10 mutations/Mb. Although MSI tumors are considered to have a mutator phenotype, they have reduced chromosome instability (CIN) compared to their MSS counterparts120.
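The mutation-frequency ranges quoted above can be expressed as a small classifier. The Python below is a sketch under stated assumptions: the function names are hypothetical, the 10 and 100 mutations/Mb cut-offs come from this chapter (the >100 mutations/Mb ultramutated class is described later in this chapter), and the handling of values that fall exactly on a boundary is an arbitrary choice made here.

```python
# Illustrative sketch; function names and boundary handling are assumptions.

def mutations_per_mb(n_mutations: int, sequenced_bases: float) -> float:
    """Convert a raw mutation count into mutations per megabase."""
    if sequenced_bases <= 0:
        raise ValueError("sequenced_bases must be positive")
    return n_mutations / (sequenced_bases / 1e6)


def mutation_class(freq_per_mb: float) -> str:
    """Bin a tumor by mutation frequency using the ranges given in the text."""
    if freq_per_mb < 0:
        raise ValueError("frequency cannot be negative")
    if freq_per_mb > 100:
        return "ultramutated"        # >100 mutations/Mb
    if freq_per_mb >= 10:
        return "hypermutated"        # 10 to 100 mutations/Mb
    return "non-hypermutated"        # below 10 mutations/Mb


if __name__ == "__main__":
    # Hypothetical tumor: 4,500 mutations called over a 30 Mb exome.
    freq = mutations_per_mb(4500, 30e6)
    print(freq, mutation_class(freq))
```

Mutation frequency alone does not establish the underlying mechanism, so a classification like this would normally be read alongside microsatellite status.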

Mutations in polymerases may be another source of a mutator phenotype14,123. Most such mutations, though not all, reside in the exo domains of replicative polymerases14. Exo inactivation in mice showed that the proofreading function of Pol ε and Pol δ acts as a tumor suppressor124. Until very recently, it was unknown whether mutations in replicative DNA Pols existed that could contribute to tumor-specific mutagenesis in humans. These were first identified in CRC tumors122,125, and with the help of whole genome and exome sequencing, have since been identified in many others, including stomach adenocarcinoma (STAD), glioblastoma (GBM), low grade glioma (LGG), breast carcinomas (BRCA), bladder carcinomas (BLCA), pancreatic adenocarcinoma (PAAD), head and neck cancer (HNSC), kidney chromophobe (KICH), lung adenocarcinoma (LUAD), squamous


cell lung cancer, and ovarian cancer121,126,127. Many of these tumors are associated with a newly defined ultra-mutated phenotype (>100 mutations/Mb)122,128–130.

This study will focus on mutations in POLE1, the gene encoding the Pol ε catalytic subunit, found in CRC and endometrial cancers. These tumors will be termed POLE tumors.

POLE1 mutations are found in about 3% and 7% of each type, respectively122,126,128–130. Pol ε exo variants (Pol ε-exo*) found recurrently in these tumors include P286R, V411L, S297F, A456P,

L424V/I, and S459F. These residues are all within the highly conserved exo domain122,131–133.

This class of tumors may develop through a new pathway of CRC tumorigenesis131. About 20% of CRCs are hereditary134, and are normally characterized as having abnormal activation of the

Wnt pathway (90% of hereditary CRCs)119,129, or loss of MMR (15% of all CRCs)118. The most common forms of inherited CRC are FAP (familial adenomatous polyposis) and HNPCC (hereditary nonpolyposis colorectal cancer, also called Lynch syndrome), which make up about

5% of hereditary cancers each118. In FAP, inactivation mutations in APC make up >90% of cases.

Loss of APC leads to abnormal stabilization of beta-catenin, which then activates c-Myc and

Cyclin-D growth pathways among others135. In HNPCC, mutations in Mlh1 and Msh2 account for

70% of the tumors. Another hereditary CRC class, termed MAP (MYH-associated polyposis syndrome), involves mutations in the BER gene MUTYH. These tumors are characterized by inability to repair oxidative damage, resulting in increased G:C to T:A transversions118.

Interestingly, POLE1 and POLD1 germline mutations were identified during a search for other hereditary CRC susceptibility genes128. This study by Palles et al. identified the POLE L424V and POLD1 S478N variants, and POLD1 R444Q has also been identified as a germline mutation in squamous cell lung cancers and endometrial cancers127,128,130.
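Base substitutions such as those discussed in this section are conventionally divided into transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). The Python below is a minimal sketch of that classification. The function names are hypothetical and are not taken from any analysis pipeline used in this study.

```python
# Illustrative sketch; function names are hypothetical.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _check(base: str) -> str:
    base = base.upper()
    if base not in _COMPLEMENT:
        raise ValueError(f"unexpected base: {base!r}")
    return base


def substitution_class(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single base change."""
    ref, alt = _check(ref), _check(alt)
    if ref == alt:
        raise ValueError("ref and alt are identical")
    same_ring_type = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_ring_type else "transversion"


def base_pair_change(ref: str, alt: str) -> str:
    """Write a substitution in base-pair notation, e.g. 'G:C to T:A'."""
    ref, alt = _check(ref), _check(alt)
    return f"{ref}:{_COMPLEMENT[ref]} to {alt}:{_COMPLEMENT[alt]}"


if __name__ == "__main__":
    for ref, alt in [("G", "T"), ("G", "A"), ("C", "A"), ("C", "T")]:
        print(base_pair_change(ref, alt), substitution_class(ref, alt))
```

Writing changes in base-pair notation makes it explicit that a substitution and its complement on the opposite strand (for example G to T and C to A) are the same event.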

In the case of sporadic CRC generation, activation of the Wnt pathway is also a very important mechanism, with 70–80% of sporadic CRCs having a mutation in APC118. These are usually followed by mutations in k-Ras or others that activate growth pathways118. The Cancer

Genome Atlas study identified a few pathways that are mainly responsible for CRC


tumorigenesis: Wnt, MAPK, PI3K, TGF-beta and p53 pathways122. Compared to other CRCs,

POLE tumors are characterized by having an intermediate activation of Wnt, and do not show inactivating frameshift mutations in APC119,131. Oddly enough, the POLE tumors that do have an

APC mutation have the R1114X variant, which is rare in other CRCs131. Mutations in other genes normally important in sporadic CRCs, k-Ras and TP53, were found in lower abundance in the

POLE/hypermutated tumors. Instead, recurring BRAF and TGFβ receptor mutations were seen122.

About half of POLE tumors showed detrimental mutations or silencing in MLH1, and instead had somatic mutations in other MMR genes, not normally seen in MSS or MSI non-POLE tumors122.

In addition, of the hypermutated tumors that were MSS, nearly all had a mutation in the Pol

ε exo domain, and these tumors tended to have the highest mutation frequencies found131.

Overall, hypermutated tumors had less copy number variation122, and the hypermutated tumors without MLH1 silencing showed more missense and nonsense mutations119.

In endometrial cancers, genes involved in pathogenesis are less well understood, but it is thought that one of the early important hits is loss of the PTEN phosphatase136,137. Other frequently mutated genes include other members of the PI3K/AKT pathway (PIK3CA, PIK3R1 and KRAS), FGFR2 (fibroblast growth factor receptor 2), ARID1A (of the SWI/SNF chromatin remodeling complex), TP53, PPP2R1A (protein phosphatase 2, regulatory subunit A), and

CTNNB1 (β-catenin)132,138. In fact, endometrial tumors have more mutations in the PI3K/AKT pathway than any other tumor type132. KRAS and CTNNB1 mutations in endometrial cancers are mutually exclusive, adding another unique feature to this tumor type132. As in colon cancer, the

Wnt pathway and MSI are also important mechanisms in endometrial cancer132. Susceptibility to endometrial cancer can also be inherited, and 2-14% of EC cases are seen in women under 40139.

As with CRCs, inherited endometrial cancer susceptibility is due to mutations in MMR genes in 2-3% of cases140. Lynch syndrome, characterized by mutations in MMR genes, confers a 40%–80% risk for colon cancer, a 25%–60% risk for EC, and a 10%–12% risk for ovarian cancer140.

The Cancer Genome Atlas consortium (TCGA) sequenced 373 endometrial tumors and found that 7% contained mutations in POLE1. Not surprisingly, the POLE tumors also had the highest mutation frequencies in endometrial tumors, and were deemed ultramutated, with up to

500 mutations per Mb132. Genes that were also significantly mutated in the POLE tumor subgroup are

PTEN, PIK3R1, PIK3CA, FBXW7, and KRAS132. Mutations in FBXW7 and bi-allelic mutations in

PTEN are seen more frequently in POLE ECs130. However, as with APC in CRC, an important gene in endometrial cancer progression, when seen in POLE tumors, was also enriched for other variants: in PIK3CA the classic mutation hotspots, E545 and H1047, were not seen in POLE tumors130,132,141. These POLE tumors were also largely MSS, had frequent nonsense mutations in

TP53, and had increases in C:G→A:T and A:T→C:G transversions132. Patients with the POLE1 mutations had increased long-term progression-free survival132,141.

Certain Pol variants are mutators that have a propensity to make specific base pair substitutions and therefore produce a mutation signature. In yeast, Pol δ and ε variants have been used to take advantage of this phenomenon and map where synthesis has occurred around origins of replication (ORIs), showing that Pol ε replicates the leading strand while Pol δ replicates the lagging strand68,69. For example, in yeast, the M644G mutation leads to many T→A but not A→T transversions68, and this mispair imbalance was critical to assigning leading vs lagging strand Pols. This assignment has been more difficult in human cells because there are at least 50,000 origins, whereas there are about 400 in yeast, and their regulation is much more complex142. In humans and yeast, not all origins are activated at the same time, and not all are used in a given cell cycle142. The origins used are not always consistent between different tissues, and may even differ between cell cycles within the same tissue, although some strong constitutive origins have been defined at some loci142. Unlike yeast origins, human origins do not occur at consensus sequences that could help predict their positions. The position of the pre-RC also does not delineate which origins are used, because not all licensed origins fire142.

This chapter will describe specific Pol ε variants found in human tumors that cause a specific

hotspot mutation, allowing the assignment of Pol ε as the leading strand Pol and defining a new way to map strong constitutive origins. In addition, the hotspot mutagenesis may contribute to a novel mechanism of tumor progression.

METHODS

2.1 Purification of Pol ε tumor variants.

An expression vector encoding amino acid residues 1–1189 of the exo-proficient catalytic subunit of human POLE (POLE-N140) was used as described143. Site-directed mutagenesis was used to introduce the Pole-exo* mutations encoding P286H, P286R, L424V, L424I, V411L,

F367S, S459F, and D275A/E277A in the pGEX/4T3 vector. Each construct contained an N-terminal cleavable GST tag. Expression vectors were co-transformed into Rosetta cells (EMD

Millipore) along with the pRK603 vector which expresses TEV protease, to allow for intracellular removal of the GST tag. For expression, an aliquot of an overnight culture of co-transformed

Rosetta cells was added to 6 L of LB media, to a final A600nm of 0.1. Cells were grown at 37°C with shaking at 220 rpm to an A600nm of 0.6, when IPTG was added to a final concentration of 4 mM. Expression was induced for 4 h at room temperature, then cells were harvested by centrifugation. Cells were resuspended in lysis buffer [300 mM Tris, 100 mM NaCl, 20 mM

K2SO4, 0.5 mM EDTA, pH 7.8, protease inhibitor tablets (Roche)] at 1.5 mL per 1 g of cell pellet.

Cells were lysed by several passages through a French press, and lysate was homogenized with a

20.5-gauge needle. Cell lysates were purified using a HisTrapFF column (GE Healthcare).

Columns were equilibrated with 10 column volumes (CV) of wash buffer [150 mM Tris, 200 mM

NaCl, 20 mM K2SO4, and 2 mM DTT, pH 7.8]. Bound protein was washed with 25 CV of wash buffer, then 10 CV of wash buffer containing 75 mM imidazole. The sample was eluted in 150 mM imidazole and dialyzed overnight into SEC buffer [150 mM Tris, 200 mM NaCl, 20 mM

K2SO4, and 2 mM DTT at pH 7.8]. The sample was concentrated to 100 µL in a spin concentrator and resolved by a Superdex 200 10/300 GL column (GE Healthcare). Peak fractions were monitored by UV and by Western blot. Relevant fractions were pooled, and glycerol was added to a final concentration of 10%. Samples were then quick frozen in liquid N2 and stored at -80°C.

2.2 Primer extension assays and excision reactions

For primer extension analysis, a 20-mer DNA oligo, 5ʹ-CCTCTTCGCTATTACGCCAG-3ʹ, was 5ʹ end-labeled with 32P by incubating with T4 polynucleotide kinase (Invitrogen) and γ-32P-ATP (Perkin Elmer). The primer was annealed to a complementary 45-mer DNA oligonucleotide, 5ʹ-TTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGG-3ʹ, by incubating at 85°C for 5 minutes in 1x SSC, then slow cooling the mixture to room temperature. Each DNA polymerization reaction, at a total volume of 30 µL, contained 50 mM Tris, pH 7.4, 8 mM MgCl2, 1 mM DTT, 10% glycerol, 25 mM each dNTP, 1 nM enzyme, and 100 nM DNA primer/template. Reactions were started by the addition of enzyme and carried out at 37°C. At the indicated timepoints, an aliquot of the reaction was removed and stopped by mixing in an equal volume of stop buffer [95% formamide, 5 mM EDTA, 0.1% xylene cyanol, and 0.1% bromophenol blue]. Exo reactions were performed in the same manner, with the exception that dNTPs were withheld. For the ssDNA substrate, the 20-mer primer above was used, and enzyme concentrations were 40 pM. Mismatch excision reactions were performed at 1 nM enzyme, and each primer was hybridized to the above template or to one of the sequence 5ʹ-TTGCAGCACATCCCCCTTTCGCCAGATGGCGTAATAGCGAAGAGG-3ʹ (primer sequences for each mismatch were: CdT: 5ʹ-CCTCTTCGCTATTACGCCAT-3ʹ; AdG: 5ʹ-CCTCTTCGCTATTACGCCAGCG-3ʹ; GdA: 5ʹ-CCTCTTCGCTATTACGCCAGA-3ʹ; TdG: 5ʹ-CCTCTTCGCTATTACGCCG-3ʹ). Products were resolved by 12% denaturing PAGE. Gels were dried and exposed to phosphorimager cassettes, and quantified using a phosphorimager (GE Healthcare) and ImageQuantTL software (GE Healthcare).
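The primer/template design described above can be checked computationally. The following is a minimal sketch, not part of the original workflow, that confirms the 20-mer primer is complementary to the 3ʹ end of the 45-mer template and reports the base pair formed at the primer 3ʹ terminus. The function names are illustrative, and pairing the CdT primer with the first template is an assumption.

```python
# Oligonucleotide sequences copied from the text (written 5' to 3').
PRIMER = "CCTCTTCGCTATTACGCCAG"
TEMPLATE = "TTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGG"
CDT_PRIMER = "CCTCTTCGCTATTACGCCAT"  # assumed here to anneal to TEMPLATE

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def anneals_to_3prime_end(primer, template):
    """True if the primer is fully complementary to the 3' end of the template."""
    return template.endswith(reverse_complement(primer))


def terminal_pair(primer, template):
    """Return (template base, primer base) at the primer 3' terminus.

    Assumes the primer 5' end is aligned with the template 3' end,
    so the primer 3' base sits opposite template[len(template) - len(primer)].
    """
    template_base = template[len(template) - len(primer)]
    return template_base, primer[-1]


if __name__ == "__main__":
    print("matched primer anneals:", anneals_to_3prime_end(PRIMER, TEMPLATE))
    print("template nucleotides left to copy:", len(TEMPLATE) - len(PRIMER))
    print("CdT terminal pair:", terminal_pair(CDT_PRIMER, TEMPLATE))
```

Under these assumptions the matched primer leaves a 25-nt single-stranded template region for extension, and the CdT primer differs from it only at the 3ʹ-terminal position.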

2.3 LacZ gap filling forward mutation assays

The in vitro lacZ forward mutation assay was performed essentially as described previously33. Double-stranded M13mp2 DNA containing a 407-nt ssDNA gap was used as a substrate. Each reaction contained 0.15 nM DNA, 50 mM Tris-Cl pH 7.4, 8 mM MgCl2, 2 mM

DTT, 100 mg/mL BSA, 10% glycerol, 250 µM dNTPs, and 1.5 nM purified Pol ε. Reactions were carried out at 37°C. Reactions were stopped by the addition of 0.5 M EDTA pH 8 to a final concentration of 25 mM, and checked for completion by resolution on a 0.8% agarose gel for 16 hours. Filled reactions were electroporated into MC1061 cells using a BioRad Genepulser, then plated with 0.5 mL of CSH50 liquid culture (in 2X YT: 16 g bactotryptone, 10 g bacto yeast extract, 10 g NaCl, pH 7.4), 2.5 mL of soft agar (0.8% bactoagar, 0.9% NaCl), 85 µL of 66 mg/mL X-Gal in DMF, and 20 µL of 100 mM IPTG onto minimal agar plates. Minimal plates were made from 1 L of autoclaved 1.6% agar. After sterilization and cooling, the following were added: 0.3 mL of 100 mM IPTG, 20 mL of 20% glucose, 5 mL of 1 mg/mL thiamine hydrochloride, and 20 mL of 50x VB salts [10 g MgSO4·7H2O, 100 g citric acid anhydrate, 500 g K2HPO4, and 175 g Na2(NH4)HPO4 per liter at pH 7.0 to 7.2]. Plaques were purified to determine status as true mutant or false positive. For purification of mutant plaques, agar plugs of matched mutant and wild type plaques were picked and co-incubated in 2.5 mL of 0.9% NaCl for 1 hour at room temperature. 1 µL of extracted phage solution was diluted into 2.5 mL of 0.9% NaCl, then 1 µL of the dilution was plated with the aforementioned amounts of X-Gal, IPTG, and CSH50 cells to give 50/50 mutant/WT plates.

For sequencing of mutant plaques, mutant plugs were picked from 50/50 plates into 2.5 mL of 0.9% NaCl. These were incubated at room temperature, then used to start overnight cultures with 1 mL 2xYT, 1 µL phage solution, and 5 µL of CSH50 liquid culture. Phage supernatants

were sent to Genewiz for sequencing. Mutation frequencies and error rates were calculated from sequencing data using the following equations:

1. \[ \text{mutation frequency} = \frac{\text{\# of mutant plaques}}{\text{total number of plaques}} \]

2. \[ \text{base pair substitution error rate}_{m \to n} = \frac{\text{\# of } m \to n \text{ changes}}{\text{total \# of mutants}} \times \text{mutant frequency} \times \frac{1}{0.6} \times \frac{1}{\text{\# of detectable sites}} \]

3. \[ \pm 1 \text{ frameshift error rate} = \frac{\text{\# of frameshifts } (+ \text{ or } -)}{\text{total \# of mutants}} \times \text{mutant frequency} \times \frac{1}{0.6} \times \frac{1}{199} \]

where 0.6 is a correction factor for the probability that the strand of interest is packaged into the phage and expressed in host cells, and 199 is the number of detectable sites in the regulatory and coding regions. The # of detectable changes for each base pair substitution is found in the table below33.

Template base   Total number   Mutation   Template.dNTP mispair   # of detectable sites
A               33             A to G     A.dC                    19
                               A to T     A.dA                    23
                               A to C     A.dG                    17
G               29             G to A     G.dT                    22
                               G to C     G.dG                    19
                               G to T     G.dA                    25
T               33             T to C     T.dG                    27
                               T to A     T.dT                    16
                               T to G     T.dC                    23
C               30             C to T     C.dA                    25
                               C to G     C.dC                    9
                               C to A     C.dT                    16
total           125                                               241
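The error rate calculation can be illustrated with a short script. This is a sketch under stated assumptions rather than the analysis code used in this work: it uses the conventional form of the calculation, in which the fraction of sequenced mutants in a class is multiplied by the mutant frequency and divided by 0.6 and by the number of detectable sites. The detectable site counts are taken from the table above; the function names and the plaque counts in the example are hypothetical.

```python
CORRECTION = 0.6        # strand expression correction factor given in the text
FRAMESHIFT_SITES = 199  # detectable frameshift sites given in the text

# Detectable sites per base pair substitution, copied from the table above.
DETECTABLE_SITES = {
    "A>G": 19, "A>T": 23, "A>C": 17,
    "G>A": 22, "G>C": 19, "G>T": 25,
    "T>C": 27, "T>A": 16, "T>G": 23,
    "C>T": 25, "C>G": 9, "C>A": 16,
}


def mutant_frequency(mutant_plaques, total_plaques):
    """Fraction of plaques scored as mutant."""
    return mutant_plaques / total_plaques


def error_rate(class_count, sequenced_mutants, mut_freq, sites):
    """Error rate for one mutation class, per detectable site."""
    fraction_of_mutants = class_count / sequenced_mutants
    return fraction_of_mutants * mut_freq / (CORRECTION * sites)


def substitution_error_rate(change, class_count, sequenced_mutants, mut_freq):
    """Error rate for a base pair substitution such as 'C>A'."""
    return error_rate(class_count, sequenced_mutants, mut_freq,
                      DETECTABLE_SITES[change])


def frameshift_error_rate(class_count, sequenced_mutants, mut_freq):
    """Error rate for +1 or -1 frameshifts."""
    return error_rate(class_count, sequenced_mutants, mut_freq, FRAMESHIFT_SITES)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    mf = mutant_frequency(50, 10_000)
    print("mutant frequency:", mf)
    print("C>A error rate:", substitution_error_rate("C>A", 12, 100, mf))
    print("frameshift error rate:", frameshift_error_rate(4, 100, mf))
```

Because each class is normalized by its own number of detectable sites, rates for different substitutions can be compared directly even though the lacZ target does not offer the same number of scorable positions for each change.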

2.4 Direct sequencing of DNA polymerization reactions

To generate the substrate for the direct sequencing of Pol ε variant synthesis products, three approximately equally spaced DNA primers were hybridized to single-strand M13mp2

DNA. The primers were located at positions 1324, 4012, and 6434 of the 7216-nt M13mp2 template. Primer sequences were 1324: 5ʹ-AGCAACGGCTACAGAGGCT-3ʹ; 4012: 5ʹ-TTTTTAACCTCCGGCTTAGG-3ʹ; 6434: 5ʹ-GATCGCACTCCAGCCAGC-3ʹ. Primers were added to the template on ice at 1.2-fold molar excess and hybridized by heating for 5 min at 80°C followed by slow cooling to room temperature. Reactions were assembled containing 40 nM

DNA, 50 mM Tris-Cl, pH 7.4, 8 mM MgCl2, 2 mM DTT, 100 mg/mL BSA, 10% glycerol, and

250 µM dNTPs. Reactions were initiated by the addition of 40 nM enzyme and incubated for 30 min at 37°C. DNA synthesis was measured in parallel in separate identical reactions in which α-32P-dATP (3000 Ci/mmol; Perkin Elmer) was added, followed by separation by denaturing

PAGE. Libraries were prepared from the reactions lacking radio-labeled dATP by the following method: Single-stranded product was removed from the synthesis reactions using mung bean nuclease to generate double-stranded blunt-end DNA molecules. The double-stranded product was then formed into next-generation sequencing libraries using a ThruPLEX-FD kit (Rubicon

Genomics). Libraries were individually barcoded and pooled. Pooled libraries were sequenced on a single MiSeq lane (Illumina) using a 2 x 150 paired-end protocol. Sequences obtained from the

POLE synthesis products from single-stranded M13mp2 DNA have been submitted to NCBI’s

BioSample database (http://www.ncbi.nlm.nih.gov/biosample) under accession number

SAMN03023853.

2.5 Tumor sequencing

Tumor sequencing was performed by our collaborators at Baylor College of Medicine.

497 tumor and normal pairs of colon and rectal cancer were sequenced. The study design and sequencing approaches were reported by The Cancer Genome Atlas Network122. POLE mutation data in other cancers was obtained from The Cancer Genome Atlas Data Portal

(https://tcgadata.nci.nih.gov/tcga/dataAccessMatrix.htm), Genome Data Analysis Center (GDAC)

(https://confluence.broadinstitute.org), and the cBio Portal for cancer genomics

(http://www.cbioportal.org). Other mutations affecting the POLE and POLD1-exo domains are reported in COSMIC (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/) but were not evaluated in the rest of this study because (1) they were not accompanied with overall mutation frequencies and nucleotide change frequencies, or (2) their validation status was not clear.

Cancers in TCGA that were found to contain Pole-exo* mutations are colorectal cancers (CRC), endometrial cancer (EEC), stomach adenocarcinoma (STAD), glioblastoma (GBM), low grade glioma (LGG), breast carcinomas (BRCA), bladder carcinomas (BLCA), pancreatic adenocarcinoma (PAAD), head and neck cancer (HNSC), kidney chromophobe (KICH), and lung adenocarcinoma (LUAD). All publicly available TCGA tumor data comply with U.S. law protecting patient confidentiality and other ethical standards.

2.6 Mutation context clustering and enrichment

Whole-genome data from TCGA patients TCGA-F5-6814, TCGA-AA-A00N, TCGA-EI-6917, TCGA-CA-6718, TCGA-AA-3555, and TCGA-A6-6141 were analyzed as follows. To test for clustering of mutations, windows 50 kb in width were generated across the genomic space and the ratio of TCT→TAT to TCT→TAT plus AGA→ATA was calculated. Windows were analyzed further if they contained 10 or more mutations. To generate a random mutation model, at each window a random sampling from either TCT→TAT or AGA→ATA was taken for each observed mutation within that window, and the ratio of TCT→TAT to total was reported. To test for correlation, for each TCT→TAT and AGA→ATA mutation, the distance (in genomic space) was calculated to the next (higher chromosomal coordinate) mutation subject to a minimum distance of 1 bp, 5 kb, 100 kb, or 1 Mb. These mutations were paired and classified as same context if they were identical and different context if they were inverse mutations. Distances and orientation configurations for the contexts (same orientation or reverse complement) were binned for analysis.
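The windowing step and the random model described above can be sketched as follows. This is an illustrative reimplementation, not the code used for the analysis. The input format (tuples of chromosome, position, and context label), the function names, and the equal probability given to the two contexts in the random model are assumptions.

```python
import random
from collections import defaultdict

WINDOW = 50_000      # window width in bp, as stated in the text
MIN_MUTATIONS = 10   # windows with fewer mutations are not analyzed


def window_ratios(mutations, window=WINDOW, min_mutations=MIN_MUTATIONS):
    """Ratio of TCT>TAT to (TCT>TAT + AGA>ATA) for each qualifying window.

    mutations: iterable of (chrom, pos, context), where context is
    'TCT>TAT' or 'AGA>ATA'. Returns {(chrom, window_index): ratio}.
    """
    counts = defaultdict(lambda: [0, 0])  # [TCT>TAT count, AGA>ATA count]
    for chrom, pos, context in mutations:
        key = (chrom, pos // window)
        if context == "TCT>TAT":
            counts[key][0] += 1
        elif context == "AGA>ATA":
            counts[key][1] += 1
    return {
        key: tct / (tct + aga)
        for key, (tct, aga) in counts.items()
        if tct + aga >= min_mutations
    }


def simulate_random_ratios(window_sizes, rng):
    """Null model: each observed mutation in a window is drawn as TCT>TAT
    or AGA>ATA with equal probability (an assumption of this sketch)."""
    ratios = []
    for n in window_sizes:
        tct = sum(1 for _ in range(n) if rng.random() < 0.5)
        ratios.append(tct / n)
    return ratios


if __name__ == "__main__":
    # Hypothetical mutations: one strongly biased window, one sparse window.
    demo = [("chr1", 100 + i, "TCT>TAT") for i in range(9)]
    demo.append(("chr1", 200, "AGA>ATA"))
    demo += [("chr1", 60_000 + i, "AGA>ATA") for i in range(5)]
    print(window_ratios(demo))
    print(simulate_random_ratios([10, 20], random.Random(0)))
```

Comparing the observed distribution of window ratios with the simulated one is what allows the tails of the distribution (ratios near 0 or 1) to be interpreted as strand-biased regions rather than sampling noise.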

2.7 Analysis of origins of replication

For annotation of mutation classes in the UCSC Genome Browser, mutation data for specific contexts were extracted from MAF files for each subject, from TCGA patients TCGA-F5-6814, TCGA-AA-A00N, TCGA-EI-6917, TCGA-CA-6718, TCGA-AA-3555, and TCGA-A6-6141. Mutations were uploaded to the browser and colored depending on strand orientation: TCT→TAT mutations are colored red, and AGA→ATA mutations are blue. To determine the significance of mutation clustering at origins, mutations to the left and right of the position of each origin were counted for the windows shown in Figure 2.8, and the numbers of TCT→TAT vs AGA→ATA on the left and right sides of the origin were compared by chi-square analysis with Yates' correction.
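The comparison of mutation counts on the two sides of an origin corresponds to a 2 x 2 contingency test. A minimal sketch of the chi-square statistic with Yates' continuity correction is given below, using only the Python standard library. The function name and the counts in the example are hypothetical and are not the values underlying Figure 2.8.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square test with Yates' correction for the 2 x 2 table

                    TCT>TAT   AGA>ATA
        left side      a         b
        right side     c         d

    Returns (chi2, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must contain at least one count")
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 13 vs 7 on the left, 5 vs 15 on the right.
    chi2, p = yates_chi_square(13, 7, 5, 15)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The closed-form p-value is specific to a single degree of freedom, which is all a 2 x 2 table requires; larger tables would need a general chi-square distribution function.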

RESULTS

2.1 Polymerase and excision activities of Pole-exo* variants

In order to identify the effect of tumor associated POLE mutations (Pole-exo* variants) on the function of human Pol ε, we used site-directed mutagenesis to introduce several CRC and endometrial cancer mutations into an expression construct encoding amino acids 1–1189.

This Pol ε construct was previously shown to retain full polymerase and exo function in vitro and contains all six conserved pol and exo domains143. The truncated construct of the catalytic subunit in yeast has essentially the same replication fidelity as the Pol ε holoenzyme during in vitro synthesis144. The first Pole-exo* variants to be analyzed were chosen based on proximity to catalytic active site residues, conservation among Pol ε homologs, and prevalence in different tumor samples in the TCGA data sets. These included P286R, P286H, V411L, F367S, L424V,

L424I, and S459F. These residues are highly conserved among Pol ε homologs, and cluster around a pocket containing the acidic catalytic residues (Figure 2.1). In Figure 2.2, the conservation among other replicative polymerases is shown. Notably, P286, F367, and L424 are also highly conserved here.

After construction and purification of the Pole-exo* variants, their pol and exo activities were tested in steady state primer extension/excision reactions. The amino acid substitutions in the Pol ε exo domain had only marginal effects on DNA polymerase activity (Figure 2.3A); however, each mutation dramatically reduced 3ʹ-5ʹ exo activity relative to the wild-type Pol ε

(Figure 2.3B). Interestingly, the effects on exo activity on ssDNA and different mismatches varied (Figure 2.3B–F, Table 2.1). The S459F substitution, which maps to the Exo III motif, reduced proofreading to a similar extent as the D275A/E277A amino acid substitutions, which essentially inactivate proofreading143. Substitution of P286, a conserved Exo I residue, with either H or R caused a similar reduction in excision. Interestingly, the F367S substitution, which is at a highly conserved residue immediately adjacent to the catalytic D368 in the Exo II motif, severely reduced but did not completely abolish exo activity. The V411L variant, which is recurrent in

CRC and endometrial tumors and maps to a conserved residue in between the Exo II and Exo III motifs, also only partially reduced exo activity.

2.2 Reduction in excision for Pol ε cancer variants causes mutagenesis during in vitro DNA replication

In yeast and mouse models, mutations of catalytic residues in replicative polymerases that suppress or inactivate proofreading result in an increased propensity to extend from a mismatch and a subsequent increase in error rates80,124,145,146. Hence, it was hypothesized that Pole-exo* variants were mutators. The effect of each mutant on replication fidelity was measured using a lacZ forward mutation assay33. The Pole-exo* variants all increased the mutation frequency over the wild type, exo-proficient Pol ε, again to varying extents (Table 2.2). The increase in lacZ mutant frequency roughly correlated with excision ability, with L424V/I and P286R/H variants having the lowest increases in mutant frequency. The S459F mutant had the highest increase, and was more detrimental to fidelity than the catalytic D275A/E277A mutations. The triple mutant of

S459F/D275A/E277A had the same mutant frequency as S459F, suggesting D275A/E277A still retains activity that is not detected in our steady state assay.

To better understand exo inactivation in Pol ε variants, individual lacZ mutant clones were sequenced, and error rates for individual base pair substitutions and frameshifts were calculated (Figure 2.4). Overall, the rate of base pair substitutions and frameshifts increases for each variant. Although error rates varied for each mutant and each mispair, only two errors were strongly elevated for every one of the Pole-exo* enzymes: C→A transversions and C→T transitions (Figure 2.4B), irrespective of the context of flanking bases. Several Pole-exo* variants do show elevated error rates for a subset of different mispairings, in particular, in T→C and

G→A transitions. These result from a G•dT mispair, a mispair that is most efficiently corrected by MMR147, and therefore not as likely to contribute to mutagenesis in vivo in cells containing functional MMR. The S459F Pole-exo* variant shows high error rates for two additional errors,

T→G and G→C transversions, which may indicate a more complex effect on the exonucleolytic mechanism for this mutant.

2.3 The POLE tumor error spectrum and cell-free Pol ε synthesis show that cancer variants produce a unique hotspot mutation

Thanks to collaboration with Eve Shinbrot and David Wheeler at the Baylor College of

Medicine Sequencing Center, we were able to compare the errors made during in vitro synthesis with those found in POLE tumors. Figure 2.5 compares the general error spectrum of

POLE tumors sequenced by the TCGA with that of POLD1 tumors or MSS and MSI CRCs lacking a B family Pol mutation. In most POLE tumors (40/50), C→A mutations accounted for over 20% of all base pair substitutions, regardless of immediate sequence context. In contrast, all

MSI or MSS tumors from these same cancers have C→A mutation frequencies <20% (Fig. 2.5, right panel). The frequencies of all other base substitution mutations and indels were similar to those observed in the MSS tumors, but with slight increases in T→G/A→C transversions for

POLE tumors. When immediate flanking sequence context was taken into account, the extraordinary mutation frequencies of POLE tumors were accounted for almost exclusively by increased numbers of TCT→TAT and TCG→TTG substitutions (Figure 2.6A). Together,

TCG→TTG and TCT→TAT mutations account for >50% of the mutations found in POLE1 mutated tumors, and these tumors have higher mutation frequencies overall in comparison with the

MSS/MSI group (Figure 2.5, bottom row). There were several POLE tumors which did not fit the ultra-mutator or high rate of TCG→TTG and TCT→TAT phenotype (Figure 2.5 right panel). Of these, most carry variants that reside in the C-terminal portion of the exo active site, and were therefore predicted to have little effect on exo activity, as seen for similarly located residues of the T4 Pol148. Indeed, when the A428T variant was purified and tested, the excision capability was identical to that of the exo-proficient construct (data not shown, see Shinbrot et al121).

One of the advantages of the lacZ forward mutation assay is that replication fidelity for individual errors can be measured in a large number of sequence contexts. Given this fact, it was rather surprising when we determined that in the lacZ sequence there are no TCT motifs in which a C→A transversion can be measured. This may partially explain the relatively moderate increase in C→A transversions when compared to other errors. Therefore, to better compare tumor hotspot mutations with in vitro synthesis, direct sequencing of DNA synthesis products was performed.

Direct sequencing allows measurement of all replication errors, regardless of whether that replication error results in a phenotypically detectable change in the assay. There is an abundance of TCT sequence motifs throughout the M13-lacZ DNA that fit this criterion. M13mp2 ssDNA was used as a template for synthesis, with three equally spaced primers. Sequencing these reaction products revealed that P286R showed a preference for TCT→TAT transversions over

TCT→TTT or →TGT errors (Figure 2.6B). Also, the rate of TCT→TAT was higher than

AGA→ATA, supporting that in tumors, the C:G→A:T error spectrum is due to C→A transversions (extension from C•dT mispair) and not G→T transversions (G•dA mispair).

2.4 Genome-wide mutation patterns show replication strand bias

An interesting consequence of a high rate of TCG→TTG and TCT→TAT hotspot mutations is that in certain sequence contexts, the changes lead to introduction of stop codons.

For example, a TCG followed by an A in the coding strand, “T CGA” (Arg codon), is converted to “T TGA” (stop) through this mutation. Although there is only one possible site where this can occur, this change is recurrently seen in POLE tumors in the TP53 gene, R213X, but not in MSI and MSS patient tumors. TP53 also contains five sites that could mutate to a nonsense codon through the second hotspot, TCT→TAT. However, the sequence context which would allow this occurs on the non-coding strand, the strand opposite to that required for the TCG context. TCT flanked by T or C on the non-coding strand gives “A GAA” and “A GAG” on the coding strand, leading to “A TAA” (stop) and “A TAG” (stop) after TCT→TAT mutagenesis. Surprisingly, this hotspot is five times more abundant in this gene, but did not lead to induction of recurring stop codons, whereas the single TCG hotspot did (Figure 2.6C, leftmost panel; χ2 p < 0.0001). This phenomenon shows that Pol ε exhibits strand specificity when replicating the TP53 gene, and this same pattern of strand specificity was also seen in PIK3R1 and ATM (Figure 2.6C). In each case, potential nonsense sites from both hotspots are available from both strands, but conversion to stop codons only occurs on one strand. This supports the idea that Pol ε is responsible for synthesis of one strand in human cells, as in yeast, where Pol ε is responsible for leading strand synthesis68.
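The way a hotspot context determines whether a stop codon is created can be made concrete with a small sketch. The function below scans a coding-strand sequence for a trinucleotide motif and reports the occurrences at which the hotspot substitution converts the affected codon into a stop codon. The function name and the short test sequences are hypothetical and are not taken from TP53. AGA→ATA is used as the coding-strand view of a TCT→TAT change on the non-coding strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nonsense_sites(coding_seq, motif, mutant, frame=0):
    """Start indices of `motif` in `coding_seq` where replacing it with
    `mutant` turns the affected codon into a stop codon.

    coding_seq: coding strand, 5' to 3', with codons starting at `frame`.
    motif, mutant: equal-length strings differing at exactly one position.
    """
    if len(motif) != len(mutant):
        raise ValueError("motif and mutant must have the same length")
    offsets = [i for i, (a, b) in enumerate(zip(motif, mutant)) if a != b]
    if len(offsets) != 1:
        raise ValueError("motif and mutant must differ at exactly one base")
    offset = offsets[0]

    sites = []
    start = coding_seq.find(motif)
    while start != -1:
        changed = start + offset  # index of the substituted base
        codon_start = changed - ((changed - frame) % 3)
        if codon_start >= frame:
            mutated = coding_seq[:start] + mutant + coding_seq[start + len(motif):]
            codon = mutated[codon_start:codon_start + 3]
            if codon in STOP_CODONS:
                sites.append(start)
        start = coding_seq.find(motif, start + 1)  # allow overlapping motifs
    return sites


if __name__ == "__main__":
    # Hypothetical in-frame sequences.
    print(nonsense_sites("ACTCGAAAG", "TCG", "TTG"))  # ACT CGA AAG
    print(nonsense_sites("CCAGAAGGA", "AGA", "ATA"))  # CCA GAA GGA
    print(nonsense_sites("TCGAAA", "TCG", "TTG"))     # TCG AAA
```

Only occurrences in which the substituted base falls at the first position of a CGA, GAA, or GAG codon yield a stop, which is why the number of potential nonsense sites is much smaller than the number of hotspot motifs in a gene.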

In order to confirm strand bias and determine what fraction of the genome exhibits the strand bias, the more abundant TCT→TAT hotspot and AGA→ATA opposite strand mutations were tallied in consecutive 50-kb windows across all chromosomes. The ratios of C→A to the total number of C→A plus G→T mutations in each window are shown as a frequency distribution in Figure 2.7A, and compared to the distribution of ratios obtained from drawing an equivalent number of C→A and G→T mutations at random. The POLE tumors exhibited increased accumulation of windows on the tails of the distribution (<0.2 and >0.8), and decreased accumulation in the center (0.5), which shows the pattern of C→A mutations is not random, and

the frequency distribution of C→A differed significantly in four of six tumors (Kolmogorov–Smirnov (KS) test P-value range 0.0019 to 2e-16). The other two tumors did not have a sufficient number of mutations to analyze. For all tumors, between 10% and 20% of the genome was found in 50 kb windows with <0.2 or >0.8 ratios.

In a second test, each TCT→TAT mutation was compared to the next TCT→TAT or

AGA→ATA mutation within a distance of 1 kb, 5 kb, 100 kb, and 1 Mb away, and classified as same-strand or opposite-strand respectively. The ratios of same-strand pairs to opposite-strand pairs were calculated for the different distance intervals (Figure 2.7B). This analysis showed that

POLE tumors had consistent enrichment of same-strand mutations at smaller distance intervals, but a random pattern was seen in the 1 Mb interval. This strand bias was not seen in a tumor containing wild-type POLE (Figure 2.7B, lower panel). These data indicate that Pole-exo* variants make mutations in a stranded and spatially dependent manner.
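The pairing test can be sketched as follows. This is an illustrative implementation of the procedure as described in the Methods, in which each mutation is paired with the next mutation at a higher coordinate that lies at least a minimum distance away. The function name, the input format, and the example coordinates are hypothetical.

```python
from bisect import bisect_left


def same_and_opposite_pairs(mutations, min_distance):
    """Count same-strand and opposite-strand mutation pairs on one chromosome.

    mutations: list of (position, context), where context is 'TCT>TAT'
    or 'AGA>ATA'. Each mutation is paired with the first downstream
    mutation at least `min_distance` bp away (min_distance >= 1).
    Returns (same, opposite).
    """
    ordered = sorted(mutations)
    positions = [pos for pos, _ in ordered]
    same = opposite = 0
    for pos, context in ordered:
        # Index of the first mutation at or beyond pos + min_distance.
        j = bisect_left(positions, pos + min_distance)
        if j == len(ordered):
            continue  # no downstream partner for this mutation
        if ordered[j][1] == context:
            same += 1
        else:
            opposite += 1
    return same, opposite


if __name__ == "__main__":
    # Hypothetical coordinates on a single chromosome.
    demo = [
        (100, "TCT>TAT"),
        (600, "TCT>TAT"),
        (1_200, "AGA>ATA"),
        (1_500_000, "AGA>ATA"),
    ]
    for distance in (1, 1_000):
        print(distance, same_and_opposite_pairs(demo, distance))
```

A ratio of same to opposite pairs well above 1 at short distances, falling toward 1 at the largest interval, is the pattern expected if neighboring mutations arise on the same replicated strand.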

2.5 Strand-specific mutation patterns at origins of replication support that Pol ε replicates the leading strand in human cells

Given the strand bias of Pol ε TCT→TAT mutations at replication origins, we hypothesized that Pol ε replicates the leading strand, as was established in yeast68,69. As shown in Figure 2.8A, the reference strand should show an enrichment of TCT→TAT mutations to the left of an origin of replication, and enrichment of AGA→ATA to the right of origins, since Pol ε would be expected to replicate away from the origin using opposite template strands. These mutation patterns were mapped onto chromosomes in the UCSC genome browser, using whole-genome sequencing (WGS) results from six different POLE tumors (TCT→TAT as red tick marks, and AGA→ATA as blue). On each chromosome, clustering of these two mutation types was seen across all six tumors, and in some cases transitions between clusters were common to all tumors, suggesting these various tumors may share common strong ORIs, which are also consistently used within each tumor from one cell division to the next. In

support of this, strong transitions are seen at multiple well characterized human ORIs: LAMB2149

(Figure 2.8B), cMyc150 (Figure 2.8D), TOP1151 (Figure 2.8C), DHFR152,153(data not shown),

PRKDC/MCM4154 (data not shown), and G6PD155 (data not shown). The promoter of LAMB2 is one of the best-characterized human origins of replication. As shown in Figure 2.8B, the mutation pattern shows a strong preference of TCT→TAT on the left reference/leading strand while

AGA→ATA mutations are concentrated to the right. We counted the fraction of expected mutations on each side, and found that TCT→TAT mutations represented 65% on the left, higher than expected at random, and 26% on the right (Figure 2.8B, χ2 P = 0.0016). The mutational pattern near the TOP1 ORI gave similar results, with 95% of mutations on the left being

TCT→TAT, and 5% on the right (P < 0.0001). In the case of the Myc ORI, multiple strong ORIs have been reported over a 50 kb window150, even though mapping mammalian origins is complicated because different origins are used in different cell lineages and results may be influenced by the methods. In addition, origins can be constitutive, flexible, or inactive and are influenced by replication timing events and replication stress142, all factors that make their location harder to define.

Despite these features, the pattern of hotspot mutations at the Myc locus in the POLE tumors matches positions of strong origins around Myc identified by Besnard et al. in IMR-90 cells (human fetal lung fibroblasts), which are depicted by grey bars in Figure 2.8D. These results show that many regions in the genome show a Pol ε strand bias, likely marking strong origins of replication in the tumor lineages. Importantly, evidence of strand bias occurs at many well characterized origins of replication. These results are consistent with Pol ε continuously replicating DNA in a strand specific manner (Figure 2.8A), and provide the first evidence that Pol

ε is the leading strand Pol in human cells.

DISCUSSION

POLE tumors: ultramutator and MSS phenotypes collide

In this chapter, the effects of Pol ε exo domain mutations found in CRC and endometrial cancers were examined. These variants are mutators in vitro that are associated with a hypermutated (10-100 mutations/Mb) or ultramutated (>100 mutations per Mb) state in tumors.

These tumors also have the highest mutation frequencies seen in human tumors to date, and such high MFs have not been seen in tumors lacking a Pole-exo* variant, except for a few cases where

POLD1 is mutated156. It has yet to be demonstrated that introduction of a Pol ε mutator variant into non-malignant human cells leads to a mutator phenotype. The closest in vivo model so far is budding yeast, where the equivalent P286R mutation, P301R, was made in both haploid and diploid cells. The P301R mutation leads to an increase in mutation rates of 150- and 29-fold in haploid and diploid cells, respectively, in a Canr reversion assay133. More evidence that these mutations are causal in tumorigenesis is that inherited germline mutations increase susceptibility to CRC128. However, there is no evidence that mutations outside of the POLE exo domain, or even mutations in the very C-terminus of this active site contribute to pathogenesis through high mutation frequency or inherited germline mutation. These POLE mutations, which are found in

3–4% of CRCs and ECs, are more likely to be passenger mutations, and are more likely driven by their MSI phenotype131. Importantly, the current study shows that at least the A428T variant, which lacks an ultramutated phenotype in vivo (and has mutation frequency between 0.5 and 10 mutations per Mb), does not have a defect in in vitro exo activity using steady state excision assays121. In fact, the corresponding residue in yeast is T.

An intriguing question regarding the mutation rates in POLE tumors is to what extent Pol ε variants contribute to high mutation rates in tumors with a seemingly fully functional MMR background. One possibility is that the somatic mutations in MSH6 found in many POLE tumors, though uncommon in other CRCs, may diminish MMR function. The effects of these variants on MMR function have not yet been tested. Another possibility is that even with fully functional MMR, the mutation burden overwhelms the repair pathway. The reduction of fidelity due to inactivation of MMR is 10-100 fold lower in Pol ε and Pol δ, and proofreading contributes an additional 10-fold correction116. Consistent with this, in our in vitro synthesis assay, we found that Pol ε variants lead to a 4-16 fold increase in the errors made during DNA synthesis compared to the exo-proficient enzyme.

In vivo, it is possible that errors made by Pol ε could be corrected by Pol δ or the second WT copy of POLE in addition to MMR, but it is unclear whether this is occurring in tumors. In the diploid yeast model, the mutation rate in Canr is lower than in haploid cells, suggesting the wild-type allele can partially correct for mutagenesis. POLE mutations rarely show loss of heterozygosity (LOH) in the exo domain, and this may be because the resulting error rates would be too high for cell survival122,130,131,133. As the number of mutations in a given cell increases, the probability of mutating and inactivating a gene essential for life increases. In yeast, the threshold for increase in mutation rate is 1000 fold above the WT rate157. In collaboration with the Hospital for Sick Children in Toronto, we have measured a ceiling for the absolute number of mutations in tumors containing one copy of a POLE mutation and inherited bi-allelic MMR deficiency (bMMRD)156. These tumors have acquired the inactivation of both safeguard mechanisms that correct replication errors: MMR and polymerase proofreading. When the tumor genomes were sequenced at different timepoints during their development, they acquired about 600 mutations per cell division, and reached no higher than 20,000 mutations per cell. Therefore if LOH occurs in the absence of MMR, the mutation rate may be too high to support viability. The mutation rates for tumors with fully or partially functional MMR and POLD1 or POLE mutations have not been measured. Pol δ EDMs are rare compared to Pol ε, and this may be because inactivation of Pol δ-exo leads to a higher rate of frameshifts, which would be more detrimental to cell survival143.
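As a rough illustration of the ceiling described above, the two figures quoted in the text (about 600 mutations per cell division and no more than about 20,000 mutations per cell) imply that the ceiling would be reached within a few dozen divisions. The sketch below assumes a constant accumulation rate and a starting burden of zero, neither of which is established in the text; the function and variable names are illustrative only.

```python
import math

# Figures quoted in the text for bMMRD tumors carrying a POLE mutation.
MUTATIONS_PER_DIVISION = 600   # approximate gain per cell division
MUTATION_CEILING = 20_000      # approximate maximum burden per cell


def divisions_to_ceiling(rate, ceiling, start=0):
    """Whole divisions needed to reach `ceiling` at a constant `rate`.

    Assumes linear accumulation from `start`; this is an illustrative
    simplification, not a model taken from the source.
    """
    return math.ceil((ceiling - start) / rate)


n_divisions = divisions_to_ceiling(MUTATIONS_PER_DIVISION, MUTATION_CEILING)
print(n_divisions)
```

Under these assumptions the ceiling is reached after roughly 34 divisions, which gives a sense of how quickly such tumors approach the observed limit.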

Exonuclease inactivation can occur at multiple locations, and at non-catalytic sites

We have mutated various Pol ε exo domain residues that are found altered in POLE tumors, and found that these alterations have varying degrees of effect on exo inactivation and mutagenesis in vitro. Different degrees of exo inactivation are also seen in pre-steady state kinetic assays by our collaborators in the Suo lab at Ohio State University, who measured a 44,000-fold reduction in exo activity for S459F, a 250-fold reduction for P286H, and a 10-fold reduction for F367S (Zauhranick and Suo, personal communication). This analysis supports the idea that each residue contributes differentially to exo function, but the mechanisms remain unknown.

How these residues may disrupt exo function can be inferred from the structure of Pol ε and the proximity of the residues to catalytic residues and metal ions. The structure has recently been determined for yeast Pol ε, but not yet for human Pol ε81,82. The exo site of Pol ε is thought to contain two metals that participate in catalysis, as in other B family Pols81,82. Metal B is directly coordinated by one aspartate: D275 in human Pol ε81. This is similar to RB69, with the corresponding D114 binding metal B13. Also in RB69, the binding of a second aspartate, D222 (D368 in Pol ε), to metal B is mediated through water. The position of metal A could not be resolved in Pol ε structures, but it is bound by D114, D116, and D327 in RB69 Pol (D462 in Pol ε), the latter through a water mediated interaction13. S459F, which is the cancer variant with the most disrupted exo activity yet measured, is adjacent to D462 in Pol ε on almost the same face of an alpha helix in the structure. This residue points in toward the pocket thought to contain metal A, thus it may participate in H-bonding with the metal and this interaction would be interrupted with the addition of a phenyl group. This phenyl group could also sterically clash with either D275 or E277, and therefore affect binding of either metal ion (see Figure 2.9A). S459F has less exo activity than the DIE/AIA mutant, for unclear reasons, but it is possible that in addition to affecting binding of both metals, the introduction of a large amino acid may distort the tertiary structure of the active site.

F367S may likewise affect exo function by interfering with metal coordination at Asp368, but more likely plays a different role. F367 and L424 are actually poised to interact with the melted primer strand as it traverses from the pol active site to the pocket containing the catalytic exo residues. In our assay, substitutions at these residues show the least disruption in exo activity. The F367S and L424V/I substitutions may lower the affinity of the melted primer strand for the exo active site, and cause Pol activity to be slightly more favored kinetically83. Substitutions at the corresponding residues in phi29 polymerase, F65 and V130 (in T4, L287), show reduced binding to ssDNA and reduction in exo activity. They also require a lower dNTP concentration to switch from exo to pol function158. Interestingly, the second most common Pol ε variant found in cancers, V411L, also shows a modest change in exo activity. In the yeast Pol ε structure, V411 is on a helix-loop-helix structure that points toward the polymerase active site and incoming dNTP. V411 is homologous to L123 in phi29 pol, which also binds ssDNA of the melted primer strand and is highly conserved158.

P286R, the most common variant, is in the conserved ExoI motif, adjacent to DIE, which is in a flexible loop region in the yeast structure81,82. The analogous P286R substitution in yeast (P301R) inactivates proofreading to a greater extent than the DIE to AIA substitutions, 50-fold lower, as shown by higher mutation rates in Canr, Lys+, and His+ reporter assays133. This loop is poised to interact with the thumb domain, with two helix-loop-helix motifs at residues 1064-1083 and 1111-1127 (Figure 2.9B). Hogg et al. identified that Pol ε has additional alpha helices in this region (residues 1116–1140) compared to Pol δ and other B-family Pols, which may be involved in exo processivity82. Mutation of leucine in this region of phi29 led to defects in exo activity, and the thumb region is therefore hypothesized to be involved in mediating transfer between active sites159. It will be important to determine the mechanism by which P286R is more mutagenic than DIE to AIA, and this residue may contribute to Pol ε functions outside of exo activity133. In conclusion, the different Pole-exo* cancer variants may affect exo activity by different mechanisms depending on their relative locations: disrupting metal ion binding or positioning of catalytic residues, reducing stability of the melted ssDNA primer in the exo active site, or affecting thumb and exo domain interactions and possibly the transition of the primer strand between active sites. These mechanisms may lead to similar effects on replication fidelity because they reduce the kinetics of the exo reaction, and cause extension from mismatches in the pol active site to be favored.

Mechanism of specific hotspot mutations

Pole-exo* variants have a high rate of TCT→TAT transversions unique to these tumors, which we found are also made by the P286R variant in vitro121. Many tumors have a relatively high rate of C:G→A:T and A:T→C:G errors, and this is thought to be due to oxidative damage in the high ROS environments of tumors26. Colon cancer cell lines have more A:T→C:G than other common cell lines that were sequenced, and also one of the highest C→T rates160. The POLE tumors show an increase in C:G→A:T compared to other CRCs and EECs without POLE1 mutations, and this increase is mostly due to the specific hotspot mutation. Therefore, this mutation signature is specific to Pol ε, and these errors are not made by Pol δ. As shown in Figure 2.5, the C to A skew is more prominent for some variants than others, with P286R>V411L. To address this we asked whether Pol ε variants had different activity on a C•dT mismatch (Figure 2.3C), and found that V411L can better excise this mismatch than P286R, which would lead to a lower C to A rate for V411L. However, overall, the variants tested were better able to correct this mismatch than T•dG, A•dG, or G•dA mismatches (see Table 2.1). Therefore increased excision ability does not correlate with lower error rate for that specific mispair, and it is possible that reduced mismatch excision coupled with another mechanism explains the high rate of C→A transversions.

One possible mechanism is the propensity of certain mismatches to form. Polymerases are more likely to make pyrimidine:pyrimidine mismatches due to active site constraints, compared to purine:purine, purine:pyrimidine, or pyrimidine:purine mismatches12,161. Once a mismatch is inserted, distortion of the active site, primer, and template can occur, and in some cases translocation of the primer/template can be hindered12. These all change the rate of catalysis since substrates are no longer aligned, so that melting of the primer strand is favored. In the BF Pol system, C•dT and T•dT mismatches cause the least distortion, with only the positioning of the primer end relative to the catalytic residues being distorted12. C•dT, G•dT, and G•dG mismatches can also be extended by this Pol12. If dynamics are similar with Pol ε, the mechanism of C→A could be that this is one of the mismatches most favorably made and extended kinetically among all mismatches. This may also explain why these mismatches are enriched at TCT, TCx, and xCT motifs. The dT primer end may be stabilized by temporary re-alignment within the active site as a T•dT mispair. Lastly, mismatch repair may also play a role in leading to C→A in vivo, as transversion errors are the most poorly corrected errors by mismatch repair116. An increase in C•dT mispairing would be seen in the event of exo inactivation in Pol variants, and this mispair may oversaturate the MMR system, even if it is fully functional.

Like C→A transversions, C→T transitions were also among the highest error rates in our in vitro lacZ forward mutation assay, but in tumors the number of these errors is not significantly increased except at TCG motifs. They can arise in the lacZ forward mutation assay due to C•dA mispair insertion, or insertion of A opposite deaminated cytosine, which becomes uracil26,33. Another possibility is that hydroxycytosine is contributing to C→T mutations162. We have not examined whether this occurs more at TCG motifs in vitro. However, in vivo, cytosine deaminases in the APOBEC family can act on regions of ssDNA and are sources of mutagenesis in some human cancers163,164. APOBECs acting in a sequence specific manner, in combination with increased mismatch synthesis by Pole-exo* variants, may be contributing to higher rates of TCG→TTG mutations in POLE tumors.

Unique pathway for tumor initiation in POLE tumors

Cancers develop in a stepwise manner by sequentially acquiring genetic and epigenetic alterations that allow for clonal expansion of cells containing changes advantageous to growth165. These are often mutations in proto-oncogenes that cause gain of function of the gene, or inactivating mutations in tumor suppressor genes. Each tumor has 2-8 mutations that promote or “drive” tumorigenesis, in about 140 commonly altered genes. These genes are part of 12 signaling pathways now known to be involved causally, which regulate cell fate, cell survival, or genome maintenance166. The mutations which inactivate or lead to gain of function for genes in these pathways are often single base pair substitutions, which represent 95% of mutations; small indels account for the remainder. Of the base-pair substitutions in tumors, 90.7% lead to missense mutations, 7.6% are nonsense changes, and 1.7% alter splice sites or untranslated regions adjacent to start or stop codons166. POLE tumors are unique in that 12% of AA changes are nonsense in EEC and CRC, compared with 5% in MSI and 7% in MSS tumors (see supplementary table 3 in ref. 121). This increase is due to the TCT→TAT and TCG→TTG mutation hotspots, which increase the frequency of stop codons in tumor suppressors, with E to X and R to X being common.

As mentioned, the genes commonly mutated in other CRC and EEC are different than in POLE tumors, or recurring mutations in these genes occur at unique residues in POLE tumors. Some tumor suppressor genes showed unique recurrent changes in POLE tumors at TCT→TAT and TCG→TTG, including MSH2 (E580X) and TP53 (R213X) (see Figure 2.6C)167. Still other recurring mutations are seen at possible secondary hotspots in oncogenes and tumor suppressors. Some examples include S→X variants of APC, R88Q in PIK3CA, and A146T in K-Ras. The latter two missense mutations occur in oncogenes, and could be explained by a C→T change in the non-coding strand at TCG or GCT motifs. GCT is actually the next most common motif where a C→T takes place in POLE tumors121,156. This suggests these unique mutations in the oncogenes may be caused by Pole-exo* variants, and importantly both mutations are oncogenic in the PI3K pathway168,169. The S→X mutations in APC may represent another secondary hotspot, as most of these occur at TCA codons. These could be mutated through C→A or C→G changes, the former change being enriched at C’s with a 5ʹ or 3ʹ flanking T156. Other consequences of possible minor hotspot mutagenesis in POLE tumors will be an exciting new area of research. The mutation rate of Pole-exo* tumors can be up to 600 mutations/cell division in bMMRD tumors156, and this raises the important questions of how many “hits” in a tumor are a result of Pole-exo* variants, and how quickly they arise. Pole-exo* mutations may be sufficient to initiate tumorigenesis, as APC mutations do, and also promote tumorigenesis through activation of Ras/PI3K and inactivation of p53. This would explain why typical APC mutations are mutually exclusive with POLE mutations, and in POLE tumors any APC mutations are likely a result of Pole-exo* variants.

Human origins of replication

Another interesting consequence of hotspot mutations is that they can provide clues as to the nature of Pol ε in vivo121. As shown in Figure 2.7, TCT→TAT mutations are found in clusters; that is, a TCT→TAT mutation is statistically more likely to be followed by another TCT→TAT mutation than by an AGA→ATA mutation. Thus the pattern of replication is strand specific, and such a pattern occurs across 10-20% of the genome. Well characterized human ORIs in the promoters of LMNB2, TOP1, MYC and DHFR show this mutational signature, and are all consistent with Pol ε as the leading strand polymerase. Pol ε specific mutations represent a method for mapping strong human origins, which has not been an easy task due to the plastic nature of human ORI firing142. Therefore it is astonishing that the accumulation of POLE hotspot mutations over the life-span of a tumor can be seen in such distinct patterns across multiple tumor samples, since these patterns would be expected to be obscured if origin usage changed often. These areas likely represent ORIs that fire consistently through subsequent rounds of replication.

Table 2.1 Excision by Pol ε variant constructs relative to WT activity. Numbers averaged over three timepoints, for 1 or 3 replicates (n=3 or n=9), are expressed as percent activity relative to wild type. Uncertainty (SEM) is shown in parentheses after each value; the last column, “n”, is the number of replicates used to calculate percent activity and SEM.

Substrate  S459F         P286H         P286R         L424V         L424I         F367S         V411L         n
ssDNA      5 (± 0.8)     9 (± 3.2)     12 (± 2.1)    39 (± 2.2)    17 (± 5.5)    42 (± 2.4)    34 (± 2.4)    9
C•dT       3 (± 0.5)     5 (± 1.3)     6 (± 0.8)     56 (± 1.9)    15 (± 0.6)    60 (± 2.1)    61 (± 4.6)    9
A•dG       0             3 (± 3.4)     1 (± 1.4)     2 (± 1.4)     4 (± 2.4)     2 (± 1.1)     2 (± 1.4)     3
G•dA       2 (± 1.9)     3 (± 3.5)     4 (± 1.3)     21 (± 1.2)    3 (± 1.0)     14 (± 0.1)    25 (± 1.1)    3
T•dG       5 (± 3.0)     10 (± 6.1)    11 (± 5.6)    41 (± 11.8)   40 (± 12.3)   21 (± 11.7)   56 (± 14.9)   3

Table 2.2 LacZ mutant frequencies for the Pol ε cancer mutant alleles. The lacZ forward mutation assay was carried out as described in the Methods with each mutant construct at the same time in parallel. The indicated numbers of lacZ mutations were sequenced and used to calculate individual error rates.

Construct            No.       No.       lacZ mutant   Fold increase   No. mutant   No.
                     plaques   mutant    frequency     over wild       plaques      mutations
                     counted   plaques   (×10⁻⁴)       type exo+       sequenced    sequenced
Wild type exo+       27,792    18        6.5           1x              18           18
P286H                3,482     35        100           15x             35           44
P286R                6,647     45        68            10x             45           46
L424I                7,690     37        48            7.4x            37           37
L424V                5,713     26        46            7.1x            26           26
S459F                4,566     44        96            15x             44           44
S459F/D275A/E277A    3,370     61        180           28x             61           61
P286H/D275A/E277A    2,059     37        180           28x             d.n.s.       d.n.s.
D275A/E277A          4,855     61        125           19x             d.n.s.       d.n.s.
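The mutant frequencies and fold increases in Table 2.2 follow directly from the plaque counts. The short sketch below recomputes them for the wild type and single-variant constructs, as a check on how the columns relate to one another. The dictionary layout and function names are illustrative, and the recomputed values can differ slightly from the tabulated ones because the table reports rounded numbers.

```python
# (plaques counted, mutant plaques) for selected constructs in Table 2.2
TABLE_2_2 = {
    "Wild type exo+": (27792, 18),
    "P286H": (3482, 35),
    "P286R": (6647, 45),
    "L424I": (7690, 37),
    "L424V": (5713, 26),
    "S459F": (4566, 44),
}


def mutant_frequency(plaques, mutants):
    """lacZ mutant frequency in units of 10^-4 (mutant plaques per plaque)."""
    return mutants / plaques * 1e4


def fold_over_reference(table, reference="Wild type exo+"):
    """Mutant frequency of each construct divided by that of the reference."""
    ref = mutant_frequency(*table[reference])
    return {name: mutant_frequency(*counts) / ref for name, counts in table.items()}


frequencies = {name: mutant_frequency(*counts) for name, counts in TABLE_2_2.items()}
folds = fold_over_reference(TABLE_2_2)

for name in TABLE_2_2:
    print(f"{name:15s} {frequencies[name]:7.1f} x10^-4  {folds[name]:5.1f}x")
```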

Figure 2.1 Sequence and structural elements found altered in Pol ε mutated tumors. A) Sequence alignment of Pol ε homologs. The exonuclease domains of Pol ε homologs were aligned using Clustal Omega 1.2.0. The completely conserved residues are highlighted in blue, and the catalytic residues are in bold. Positions of the three conserved exonuclease motifs are underlined, with the residues mutated in human tumors indicated above the sequences. B) Exonuclease domain mutations mapped to the structure of Saccharomyces cerevisiae Pol ε. The exonuclease domain of yeast Pol ε (4M8O82) is shown as a ribbon diagram with the remainder of the enzyme shown as a surface representation. DNA (green) and incoming dNTP (blue) are indicated. Conserved Pol ε residues are shown in blue, with the remaining residues shown in gray. Residues found mutated in human tumors (red), catalytic residues (yellow), and the catalytic residue found mutated in a tumor (orange) are all indicated.

Figure 2.2 Conserved exonuclease domain elements from B family polymerases. The exonuclease domain sequences of the indicated B family polymerases were aligned using Clustal Omega 1.2.0. Residues that are completely conserved among all polymerases (gold) and catalytic residues (bold) are highlighted. The three conserved exonuclease motifs (ExoI, ExoII, and ExoIII) are underlined. Residues found mutated in human Pol ε mutant tumors are indicated above the sequences. α-Helices (purple) and β-sheets (green) are denoted and were from the following structures: yeast Pol ε (4M8O82), yeast Pol δ (3IAY78), RB69 Pol (2P5G170), and phi29 Pol (1XHX171).

Figure 2.3 Pol ε cancer variants are functional for polymerase activity but exonuclease activity is strongly reduced on ssDNA and primer/template containing a mismatch. A) The relative polymerase and exonuclease activities of the Pol ε cancer-associated variants were compared to those of the wild-type POLE enzyme (WTexo+). Reactions were carried out at 37°C and initiated by the addition of the enzyme. Aliquots were removed at the indicated times and stopped by addition of 5 mM EDTA. Reactions were resolved on 12% denaturing PAGE gels. For polymerase reactions, activity was tested on 21mer hybridized to a 45mer in the presence of 25 µM dNTPs. Enzyme to substrate ratio (E:S) was 1:50. B) Exonuclease activities were tested on a 21mer ssDNA substrate, at E:S of 1:2500, in the absence of dNTPs. C) The relative exonuclease activities were tested on different primer template substrates containing a terminal mismatch: C•dT, D) A•dG, E) G•dA, and F) T•dG. The nucleotide poised for excision is listed to the right of each gel. For more reaction details, please see the Methods section.

Figure 2.4 Pol ε cancer variants increase replication errors in vitro. A) Mutant plaques from the lacZ gap filling assay were sequenced, and used to calculate error rates for frameshift and base pair substitutions as described33,143. B) Error rates for each of the 12 possible individual base pair substitutions were determined as above. WT exo- is a previously characterized exonuclease-inactivating mutant (D275A/E277A)143.

Figure 2.5 Mutation spectra of POLE and POLD1 exonuclease domain-mutated tumors. Each column in the figure represents a tumor from an individual patient. The relative base change frequency portion indicates the relative proportion of a given class of mutation (color code for the type of mutation is shown in the key above the histogram) among all sequenced mutations for each individual tumor. Tumors are grouped first by polymerase (Pol) and then by affected residue (Amino Acid). Based on criteria including polymerase mutation, tumor mutation frequency, microsatellite stability status and relative proportion of C to A transversions, tumors were classified into three distinct groups: Group A (dark blue bar above), Group B (light blue bar above) and wild-type polymerase in MSI and MSS (white bar above). Black horizontal dotted line demarks 20% C to A, the provisional threshold for classifying Group A. Black vertical line separates Group A from the Group B and wild-type polymerase categories. Cancer abbreviations are the following: (CRC) TCGA COAD/READ colorectal cancers; (EEC) endometrial cancer; (STAD) stomach adenocarcinoma; (GBM) glioblastoma; (LGG) low-grade glioma; (BRCA) breast carcinomas; (PAAD) pancreatic adenocarcinoma; (HNSC) head and neck cancer; (KICH) kidney chromophobe. The MSI status, type of cancer, and mutation frequency range are color coded as shown in the key below the histogram. Other abbreviations are as follows: (MSS) microsatellite stable; (MSI) microsatellite instable; (nt) not tested. The white asterisk in the cancer track demarks a pancreatic tumor that had both POLE-exoP286R and POLD1-exoR311H mutations.

Figure 2.6 Pol ε cancer variants exhibit strand specific hotspot mutations. A) Frequencies of two different base substitutions when occurring in a specific local sequence context in colorectal (CRC) cancers in the most common mutation classes are shown: TCT→TAT (upper); TCG→TTG (lower). Individual dots represent mutation frequency in individual tumors. Group assignments are represented by colors: Pole-exo* Group A (blue); POLD mutant (light green); MSI, wild type pol (green); MSS, wild type pol (yellow). B) Pol ε P286R mutants exhibit a preference for TCT→TAT mutations in a cell-free reaction. In order to overcome the sequence context limitations of the lacZ forward mutation assay, DNA synthesis products from reactions with purified Pol ε containing the P286R substitution were sequenced directly. This DNA substrate used the same M13mp2 single-stranded DNA template as in the forward mutation assay, but with different primers: three equally spaced oligonucleotides. Reactions were carried out under conditions identical to the forward assay. DNA synthesis products were then directly sequenced using the Illumina paired end protocol. C) Pole-exo* mutations lead to enrichment of nonsense mutations at tumor suppressor genes in endometrial and colorectal tumors. Comparison of potential nonsense sites that occur in Pole-exo* context (TCT/AGA→TAT/ATA and TCG/CGA→TTG/CAA) to actual nonsense mutations found in POLE tumors. Selected tumor suppressor genes are shown. The number of potential nonsense mutation sites are shown in light blue; the nonsense mutation sites that actually occurred in POLE tumors are in dark blue. P-values calculated in Chi-square analysis are: TP53 p = 4.33e-23; PIK3R1 p = 3.07e-30; ATM p = 5.33e-4; ATR p = 0.99 (the Pole-exo* mutations are evenly distributed among the potential sites).

Figure 2.7 Propensity of Pole-exo* hotspot mutations to cluster on DNA strands. A) Pole-exo* context mutations (TCT→TAT and AGA→ATA) were counted in 50-kb windows across six POLE tumors from the TCGA set: F5-6814, AA-A00N, EI-6917, CA-6718, AA-3555, and A6-6141. Normalized density of TCT→TAT mutations divided by total Pole-exo* context is shown in red, and randomized contexts are shown in blue. POLE tumors with high mutation rates show enrichment for extreme ratios of TCT→TAT mutations indicating clusters of same-strand mutation within the 50-kb windows. Four of six significantly deviate from random expectation by Kolmogorov-Smirnov (KS) P-value < 0.05. B) Pole-exo* context mutations were paired such that for each mutation, the partner was the next mutation at the indicated minimum genomic distance upstream. The Pole-exo* (CA-6718) tumor showed an enrichment for the same context (TCT→TAT, TCT→TAT or AGA→ATA, AGA→ATA) versus different context (TCT→TAT, AGA→ATA or AGA→ATA, TCT→TAT). This enrichment was not observed in the Pole-wt (AA-3666) tumor. The blue bar in the 1-bp to 5-kb range of the Pole-wt only had 10 observations across the entire genome and may be a statistical outlier due to the limited number of observations.
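The two analyses summarized in Figure 2.7 can be sketched as follows: the fraction of TCT→TAT calls among Pole-exo* context mutations in each 50-kb window (panel A), and the count of same-context versus different-context neighbor pairs (panel B). This is a minimal illustration and not the pipeline used for the figure. The input format (tuples of chromosome, position, and context label), the label strings, and the choice to pair each mutation with the next one in increasing coordinate order are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW = 50_000  # 50-kb windows, as in panel A
CONTEXT_INDEX = {"TCT>TAT": 0, "AGA>ATA": 1}  # hypothetical labels


def window_fractions(mutations, window=WINDOW):
    """Fraction of TCT>TAT among Pole-exo* context mutations per window.

    `mutations` is an iterable of (chrom, position, context) tuples.
    Returns {(chrom, window_index): fraction}.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, context in mutations:
        counts[(chrom, pos // window)][CONTEXT_INDEX[context]] += 1
    return {key: c[0] / (c[0] + c[1]) for key, c in counts.items()}


def neighbor_pairs(mutations, min_distance=1):
    """Count (same, different) context pairs.

    Each mutation is paired with the next mutation on the same chromosome
    that lies at least `min_distance` bp further along the coordinate.
    """
    same = different = 0
    ordered = sorted(mutations)
    for i, (chrom, pos, context) in enumerate(ordered):
        for chrom2, pos2, context2 in ordered[i + 1:]:
            if chrom2 != chrom:
                break
            if pos2 - pos >= min_distance:
                if context2 == context:
                    same += 1
                else:
                    different += 1
                break
    return same, different
```

Windows with fractions near 0 or 1 indicate runs of same-strand mutations, and an excess of same-context pairs over different-context pairs indicates clustering of the kind described in the caption.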

Figure 2.8 The strand-specific mutation pattern of POLE tumor variants at known replication origins is consistent with Pol ε replicating the leading strand in human cells. A) A model replication bubble at an origin of replication is shown. The grey bar indicates ORI location. Pol ε C to A mutations were mapped to the reference strand. TCT→TAT (red) or AGA→ATA (blue) mutations were modeled using the UCSC Genome Browser and the Hg19 build. B) Pol ε mutations at the LMNB2 ORI. The ORI position is chromosome 19:2367825 to 2368428, and is denoted by a vertical grey bar149. The TCT→TAT and AGA→ATA mutations were taken from six whole-genome sequences of POLE mutant tumors from TCGA patients (F5-6814, AA-A00N, EI-6917, CA-6718, AA-3555, A6-6141)121. TCGA patient ID is to the left of the track. C→A (red hash marks) predominate to the left of the ORI (17/26); C→A (16/60) are the minority to the right (χ² P = 0.00164). C) C→A hotspot mutations also demonstrate Pol ε is the leading strand polymerase at the TOP1 ORI. This origin was identified through ChIP of ORC complexes151. (χ² P < 0.0001) D) The strand specificity is shown at the c-Myc locus. The positions of origins, shown as gray bars, were defined in IMR-90 cells by Besnard et al.172. Two strong origins were also located between positions 128,790,835 to 128,840,835 on chromosome 8 by Lucas et al.150.
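The chi-square comparison quoted for the LMNB2 ORI in panel B can be illustrated with a 2x2 test on the counts given in the caption. The sketch below reads "17/26" as 17 C→A calls out of 26 mutations to the left of the ORI and "16/60" as 16 out of 60 to the right, and applies Yates' continuity correction. Both the reading of the counts and the use of the correction are assumptions, chosen because they give a P value close to the one reported. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates' continuity correction
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Left of ORI: 17 C->A, 9 other; right of ORI: 16 C->A, 44 other.
stat, p_value = chi_square_2x2(17, 26 - 17, 16, 60 - 16)
print(f"chi-square = {stat:.2f}, P = {p_value:.5f}")
```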

Figure 2.9 Pol ε cancer variants and their interaction with functional Pol and Exo Motifs. A) The proximity of S459 (yellow) is shown in relation to catalytic residues (forest green). The highlighted residues are labeled with the human residue number but modeled onto the yeast structure PDB ID: 4PTF81. The template strand, primer strand, and incoming dCTP shown here in the polymerase active site, are labeled. PyMOL was used to mutate S459 to F in silico. B) The interaction surface of the exo loop (periwinkle) containing P286 (hot pink) is shown with the Pol thumb domain (light pink). The template strand, primer strand, and incoming dCTP shown here in the polymerase active site, are labeled.

Chapter 3: The consequences of p261 exo inactivation on lesion bypass, cell viability, and point mutations in a human cell line model

INTRODUCTION

Since polymerase epsilon (Pol ε) replicates approximately half of the human genome, accurate DNA synthesis by Pol ε is very important. A small increase in error frequency can lead to a substantial increase in the number of genomic mutations, as the diploid human genome is around 6 billion base pairs. Mutations in Pols can contribute to both mutator and anti-mutator phenotypes, the former contributing to genome instability14,123. To date, Pol ε variants have been identified in two human diseases: FILS (facial dysmorphism, immunodeficiency, livedo, and short stature) syndrome and cancer167,173. In FILS syndrome, homozygous intronic mutations that lead to loss of exon 34 and a truncating frameshift in the catalytic subunit cause defects in G1/S phase transition in B and T lymphocytes, chondrocytes, and osteoblasts from the patients173. In addition to lower protein expression of POLE1 caused by the truncation, POLE2 is also downregulated173. Pol ε variants in cancers mostly alter residues within the exo domain (Pole-exo*). Inactivation of the nearby catalytic D275 and E277 (DIE/AIA) residues leads to increases in mutations during synthesis, a mutator phenotype80,124. Likewise, we have shown that Pole-exo* variants similarly reduce proofreading and increase mutagenesis121,156 (see Chapter 2 of this thesis). In addition, Pole-exo* variants contribute to a specific mutation pattern in tumors, as our lab and collaborators have shown121,156. Prior to the current work, this specific signature of TCT→TAT and TCG→TTG base pair substitutions was unknown and had not been identified as Pol ε-specific, although mutational signatures for exo and pol active site mutants had been characterized in vivo in mice and yeast, but not human cells80,124. Secondary characteristic mutations of Pol ε in tumors are C→A transversions with a 5ʹ or 3ʹ flanking T156. Interestingly, one recurring or hotspot mutation made by yeast Pol ε is made from a C•dT mispair with a 5ʹ flanking T, as measured by the URA3 reporter assay80. This error represented 37% of errors sequenced in the study80.

Measuring polymerase error signatures

The fidelity of DNA Pols can be measured in vivo by sequencing reporter genes, like URA3. This is generally done in the context of proofreading inactivation, polymerase active site variants, or absence of MMR174. In yeast, colorimetric assays can also be used to quantitate mutation rates, but assays that measure reversion rates of mutant metabolic alleles to wild type function by specific replication error are more common. Such assays include the His7-2 reversion assay, which measures +1 frameshifts at a specific site in the his7 gene that restore functional histidine biosynthesis175. Other common assays include the canavanine forward mutation assay, which selects for mutants in the arginine transporter/CAN1 gene that prevent uptake of the normally toxic compound canavanine176. The 5-FOA (5-Fluoroorotic Acid) forward mutation assay selects for mutations in URA3, a gene involved in uracil biosynthesis176. In vertebrate cells only two forward mutation assays have been used: 6-TG and ouabain resistance assays, which select for mutations in the HPRT1 and ATP1A1 loci, respectively177,178. In the 6-TG resistance assay, cells with pre-existing mutations in HPRT1 can first be removed from the population with HAT (hypoxanthine, aminopterin, and thymidine) selection. 6-TG then selects for mutations that inactivate HPRT1 gene product function. For the ouabain resistance assay, mutations in the Na+/K+ transporter that remove the ouabain binding site allow for cell survival. Detectable mutations are thus limited to a few residues on one surface of the protein. The ouabain and 6-TG resistance assays can be used to measure mutation rates through mutation fluctuation analysis. In this strategy, a small number of cells is expanded under non-selective conditions in many parallel cultures, and mutants are then selected by plating under drug selection176. Since an early mutation during expansion could produce a “jackpot” of mutants, the mutation rate is estimated from many parallel expansions, with the Ma–Sandri–Sarkar maximum likelihood method being the most accurate calculation176,179.
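The fluctuation analysis described above can be sketched in a few lines. The example below uses the standard recursion for the Luria-Delbrück distribution that underlies the Ma–Sandri–Sarkar maximum likelihood approach, and finds the expected number of mutations per culture, m, by a simple ternary search. It is a minimal illustration under stated assumptions rather than the procedure used in this work: the search bounds, the assumption of a single likelihood peak, the function names, and the conversion of m to a rate by dividing by the final number of cells per culture are all choices made for the example (other conventions for that last step exist).

```python
import math


def luria_delbruck_pmf(m, r_max):
    """P(r mutants) for r = 0..r_max, given m expected mutations per culture.

    Recursion: p_0 = exp(-m); p_r = (m / r) * sum_{i<r} p_i / (r - i + 1).
    """
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append(m / r * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def log_likelihood(m, mutant_counts):
    """Log-likelihood of the observed mutant counts across parallel cultures."""
    p = luria_delbruck_pmf(m, max(mutant_counts))
    return sum(math.log(p[r]) for r in mutant_counts)


def estimate_m(mutant_counts, low=1e-3, high=50.0, iterations=100):
    """Ternary search for the m that maximizes the likelihood.

    Assumes the likelihood has a single peak between `low` and `high`.
    """
    for _ in range(iterations):
        m1 = low + (high - low) / 3
        m2 = high - (high - low) / 3
        if log_likelihood(m1, mutant_counts) < log_likelihood(m2, mutant_counts):
            low = m1
        else:
            high = m2
    return (low + high) / 2


def mutation_rate(mutant_counts, final_cells_per_culture):
    """Mutations per cell per division, taken here as m divided by final cell number."""
    return estimate_m(mutant_counts) / final_cells_per_culture
```

For example, `mutation_rate(counts, 1e6)` would take a list of resistant-colony counts, one per parallel culture, and a hypothetical final population of one million cells per culture. A single culture with a very high count (a jackpot) shifts this estimate far less than it would shift a simple mean.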

Pol ε error signatures

The 6-TG resistance and lacZ assays have been used to determine error signatures of human Pol ε pol active site mutants. Alteration of a key catalytic residue, M630G (M644G in yeast), leads to a large increase in T:A→A:T transversions, specifically through T•dT rather than A•dA mispairs68,180. In vitro, C→A, T→C, G→A, and C→T mutations are made almost equally abundantly by the M630G variant, followed by T:A→A:T transversions180. T:A→A:T transversions dominated the HPRT1 mutation spectrum from cells expressing proofreading-deficient Pol ε-M630G. These were followed by G:C→A:T and A:T→G:C transitions180.

Mutation signatures of Pol ε variants lacking exo activity have also been measured using DIE/AIA mutants in vivo in both yeast and mice. This variant in yeast uniquely makes the pyrimidine-pyrimidine mispairs T·dTTP, T·dCTP, and C·dTTP, leading to T→A, T→G, and C→A base pair substitutions, respectively. G·dATP and C·dATP mispairs (G→T and C→T) are also made at relatively higher rates than for other Pols80. It is estimated that the enzyme corrects 92-99% of these during in vitro synthesis80. Exo inactivation led to a 6- to 13-fold increase in lacZ mutant frequency. G:C→A:T and G:C→T:A base pair substitutions were also the most commonly made in vivo, consistent with the high Pol ε error rate in vitro80. When exo activity and Msh6 are inactivated in yeast, a preponderance of G:C→A:T and G:C→T:A substitutions is seen in the error spectrum as well80.

The error signature of the DIE/AIA variant has also been studied using isolated mouse embryonic fibroblasts (MEFs) derived from knock-in mice engineered to express this variant124. The most common error seen in HPRT1 was A:T→C:G, followed by C:G→T:A, A:T→G:C, then C:G→A:T. Compared to the error spectrum in yeast, fewer T:A→A:T changes and more A:T→G:C changes were seen in MEFs. The reason for this discrepancy is unknown, but may be explained by differences in number and sequence context of detectable sites since reporter genes in different organisms were used. In human cells, the error spectrum of DIE/AIA exo inactivation has not been measured, and that is one of the goals of this current chapter.

Error signatures in human tumors

With current advances in high throughput sequencing methods, many new mutations have been identified in human tumors. Classically, these mutations have been classified into two categories, driver and passenger mutations, where driver mutations are responsible for tumor promotion and progression. Driver mutations offer a selective advantage to the tumor cell, and 5-8 of these mutations commonly contribute to tumorigenesis within one tumor166. Passenger mutations, in contrast, are thought to be bystanders that do not play a role166. Vogelstein defined a mutational landscape of cancers wherein the few driver genes most commonly mutated in cancers are represented as mountains. Hills represent driver genes that are mutated less frequently in cancers. The number of genes represented by hills is much higher than the number of genes represented by mountains181. Passenger mutations are even larger in number and are predicted to have no effect on tumorigenesis. High throughput sequencing efforts have revealed that these passenger mutations can nonetheless be informative in understanding tumor development, in that they highlight mutational signatures, or patterns of mutations, of different mutagenic processes during tumor progression182. A mutational signature can result from combinations of processes that change over time182. Each unique signature is a result of DNA damage, via either exogenous or endogenous exposure, and the DNA repair or replicative mechanisms that act at the damage site either normally or abnormally182. For example, a preponderance of C:G→T:A base pair substitutions occurs at dipyrimidine motifs, due to UV damage and repair of the lesions by NER. Traditionally, the six possible base pair changes (C:G→G:C, A:T→T:A, A:T→C:G, A:T→G:C, C:G→A:T, and C:G→T:A) have been used to analyze mutation signatures. However, next generation sequencing has recently allowed for useful interpretation of how mutations are affected by flanking nucleotides, leading to 96 possible base pair substitution patterns182. Twenty-one unique error signatures were identified in 7,000 tumor samples from 30 different classes of cancer, and each tumor could be defined by one or multiple signatures181. Novel signatures have been identified for which mechanisms remain unknown, and this highlights the importance of developing appropriate model systems for investigating how mutation signatures arise. Understanding these mechanisms and whether they are active could be key to developing cancer prevention strategies or treatments.
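The count of 96 substitution patterns follows from combining the six base pair changes with the four possible bases on each flank (6 x 4 x 4). The short Python sketch below enumerates them. It assumes the common convention of naming each change by its pyrimidine base, so that C:G→A:T is written C>A, and the bracketed label format is an illustrative choice.

```python
from itertools import product

BASES = "ACGT"
# The six base pair changes, named by the pyrimidine of the mutated pair.
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                 ("T", "A"), ("T", "C"), ("T", "G")]

def substitution_classes():
    """Return labels such as 'T[C>A]T' for every change in every flanking context."""
    return [f"{five}[{ref}>{alt}]{three}"
            for ref, alt in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]

classes = substitution_classes()
print(len(classes))  # 6 changes x 4 five-prime bases x 4 three-prime bases
```

In this notation the Pol ε-associated C→A change at a TCT motif corresponds to the single class T[C>A]T.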

Human cell culture model of Pol ε error signatures

Mutations in replicative polymerases have been assigned their own error signature, which is characterized by a high proportion of C→T and C→A base pair substitutions. However, the mechanisms by which these errors are made are unknown, and whether they are made solely by the polymerase or in combination with certain types of DNA damage is an open question. C→A and C→T changes are common in other error signatures, such as mutations due to oxidative damage and lesion bypass, or the actions of APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide) cytosine deaminases, which can edit stretches of ssDNA182. These may contribute to or confound the polymerase signature. In addition, Pol ε variants may contribute to multiple or secondary error signatures, since Pol ε also operates in repair pathways. To address mechanisms of the Pol ε error signature seen in tumors, I developed and characterized a model of Pol ε exo inactivation in human cells. In this model, it is possible to observe mutations that are made by Pol ε variants in different contexts, such as upon exposure to DNA damaging agents, and in the presence or absence of MMR. This is the first system to characterize heterozygous proofreading loss of Pol ε in a human cell culture model.

METHODS

3.1 Cell culture and generation of rAAV knock-in cell lines

The HCT116 human colorectal cancer cell line (a kind gift from Dr. Prescott Deininger) was grown in HyClone MEM/EBSS (Thermo Scientific) supplemented with 10% fetal bovine serum (Atlanta Biologicals), 1% sodium pyruvate (Invitrogen) and 1% MEM-NEAA (Invitrogen).

Generation of targeting constructs: In order to target the proofreading inactivating mutations to the POLE1 locus in vivo, we used rAAV with a synthetic exon promoter trap183,184. A 1,045 bp fragment containing POLE exons 7 and 8 along with intron 7 (termed HA1) was PCR amplified from HCT116 genomic DNA using primers designed to add unique NotI and SacI sites to the 5ʹ and 3ʹ ends, respectively. A 1,057 bp fragment containing exons 9, 10 and 11 along with introns 9 and 10 (termed HA2) was PCR amplified from HCT116 genomic DNA using primers designed to add unique EcoRI and NotI sites to the 5ʹ and 3ʹ ends, respectively. Both HA1 and HA2 were first cloned into pCR-TOPO and sequence verified. The catalytic exo DIE275-277 residues located in HA2 (exon 9) were changed to AIA (DIE/AIA mutant) using site directed mutagenesis and sequence verified. The Pol ε rAAV shuttle vector was assembled by four-way ligation using the restriction enzyme-digested gene-specific HA1 and HA2 fragments, along with the SEPT/loxP cassette digested with NotI-EcoRI and the ITR-containing pAAV shuttle vector digested with NotI. The exo-targeting vector was used to package high-titer recombinant adeno-associated virus into AAV2 serotype capsids.

Gene targeting and isolation of recombinant cell lines: Cells were grown in 6-well plates and transduced with rAAV when 75-80% confluent. At the time of transduction, cells were washed with 1x Hanks buffered saline solution (Invitrogen) before adding 1 ml of media containing 75 µl of a 1:100 dilution of rAAV lysate. Three hours after transduction, virus was replaced with fresh media, and cells were incubated at 37°C for 48 hours. After 48 hours, media was changed and cells were expanded onto 10 cm dishes in media containing 400 µg/ml geneticin. Colonies were selected for 14 days, then isolated and expanded. Genomic DNA was extracted from expanded clones using the DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer’s protocol.

Locus-specific integration was assessed by PCR using a primer that annealed outside the homology region and another that annealed within the neo cassette (see Figure 3.1).

Cre-mediated excision of the SEPT cassette: To remove the SEPT cassette from correctly targeted clones, cells were infected in 75-80% confluent wells of a 6-well plate with adenovirus that expresses the Cre recombinase (Vector Biolabs, Philadelphia, PA). Cells were plated at a limiting dilution in nonselective medium 24 hours after infection. Twelve days after infection, single cell colonies were plated in duplicate and geneticin was added to one set of wells at a final concentration of 400 µg/ml to test for sensitivity. During this time, genomic DNA was extracted as previously described and screened using primers that annealed across both homology arms. PCR products were digested with SacI to distinguish between the wild type and recombinant locus.

3.2 Karyotype analysis

Metaphase chromosome spreads were prepared from the indicated cultures during the exponential phase of growth (65–75% confluence). Colcemid (10 μL/mL) (Boehringer Mannheim GmbH) was added directly to the cultures and incubated for two hours at 37°C. Cells were trypsinized, washed with PBS, and incubated in 75 mM KCl at 37°C for 15 min. Cells were fixed with methanol/acetic acid (3:1), G-banded, and analyzed. Slides were analyzed under a light microscope at ×10 and ×100 magnifications. Images of the individual metaphase spreads were captured and karyotyped using an automated imaging system for cytogenetics (CytoVision; Leica Microsystems).

3.3 Southern Blot Analysis

Genomic DNA was harvested from the knock-in cell lines using the DNeasy Blood & Tissue Kit (Qiagen), and double digested with SacI and SalI. Hoechst fluorimetry was used to determine the concentration of DNA samples for accurate loading. 4 µg of each sample was run on a 0.8% agarose gel in TBE. DNA was transferred to Hybond N+ membrane (Amersham), hybridized with a probe to HA2 at 65°C overnight, and washed at 65°C. To make the probe, a 300 bp sequence was amplified from the HA2-pCR-TOPO clone using the primers 5ʹ-GCATCTGCCCCACTGTTAGT-3ʹ and 5ʹ-CTCCCTGTTGGTGATGAGGT-3ʹ. The PCR product was labeled using the Prime-It II Random Primer Labeling Kit (Agilent) and α-32P-dCTP (Perkin Elmer). The membrane was blocked in Denhardt’s pre-hybridization buffer [6x SSC, 0.5% SDS, 0.1% Ficoll 70, 0.1% Ficoll 400, 0.2% PVP, and 0.2%] at 65°C for 1 hour. The probe was added to hybridization buffer [6x SSC, 0.5% SDS, and 10% Dextran Sulfate] and incubated overnight at 65°C. To wash off excess probe, the blot was washed twice for 15 minutes in wash 1 [10x SSC, 0.5% SDS], twice for 15 minutes in wash 2 [1x SSC, 1% SDS], and twice for 30 minutes in wash 3 [0.1x SSC, 1% SDS]. The membrane was exposed to a PhosphorImager screen and scanned on a Typhoon Imager.

3.4 MLH1 correction in HCT116 cell lines

A pCMV-XL5 bacterial expression vector containing human MLH1 was obtained from the laboratory of Dr. Victoria Belancio at Tulane University. MLH1 was amplified from the vector using the primers 5ʹ-TCGACTCGAGTCCACCATGTCGTTCGTGGCAGG-3ʹ and 5ʹ-TCGAGGATCCGTTACTTAACACCTCTCAAAGAC-3ʹ, and Q5 polymerase (NEB). Next, Taq polymerase (Invitrogen) was used to add A overhangs to the insert, which was subsequently cloned into the pLenti-TOPOv6.3 vector using the standard protocol. Clones were verified by digest and sequencing. To produce MLH1-lentivirus, 293FT cells (Invitrogen) were seeded in 10 cm dishes at 1.5e6 cells per dish. The next day, cells were transfected with 3 µg vector DNA and 9 µg of ViraPower Packaging Mix (Invitrogen) using Lipofectamine 2000 (Invitrogen) according to standard protocol. After 48 hours, the supernatant containing virus particles was harvested and filtered using a 0.45 µm cellulose filter. This virus was stored in 1 ml aliquots at -80°C. The virus was titered by transduction of confluent HCT116 cells in 24-well plates, using 10-fold serial dilutions of virus. Cells were selected in 10 µg/ml blasticidin for one week, and colonies were stained with crystal violet [0.2% crystal violet, 2.4% acetic acid, 5% methanol]. For transduction of Pol ε exo single knock-in cells, cells were seeded below confluency in a 24-well plate, and virus was added at an MOI of 1 in serum-containing medium. After 48 hours of incubation with virus, cells were expanded and selected in 10 µg/ml blasticidin in 10 cm dishes. After 10-12 days, individual colonies were isolated and expanded. Mlh1 expression was assayed by western blotting, with anti-Mlh1 (BD Pharmingen, clone 168-15).

3.5 Mutation rate and mutant frequency measurements

For each cell line analyzed, cells were grown to confluence in two 6-well plates. Cells from one well were harvested and counted to estimate cell number in the remaining wells. For mutation rate measurement, 500 cells from each of the remaining eleven wells were seeded per dish in 3 x 100 mm dishes in media lacking 6-TG. These were used to measure plating efficiency. At the same time, 5e5 cells from each of the remaining eleven wells were plated in 5 x 100 mm dishes in media containing 6-TG. After 7 days, colonies on the plating efficiency dishes were stained with crystal violet and counted. After 12-14 days, the 6-TG resistant colonies were also stained with crystal violet and counted. Mutation rate was calculated using the Ma-Sandri-Sarkar Maximum Likelihood Estimator (MSS-MLE) method179,185.

For mutant frequency measurement, cells at 70-80% confluence were transfected with 3 µg plasmid DNA. After 24 hours, cells were trypsinized and counted. Briefly, 500 cells per clone were seeded in duplicate in 6-well plates in media lacking 6-TG and allowed to grow for 5-7 days to determine plating efficiency. The remaining wells were seeded with 5e4 cells in media containing 6-TG and allowed to grow for 12-14 days. After the indicated time, colonies were stained with crystal violet and counted. Mutant frequency was calculated by the following equation: (# 6-TG resistant colonies) / ([(plating efficiency scored colonies)/(plating efficiency cells seeded)] x (6-TG seeded cells)). Colonies were defined as ≥50 cells.
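The mutant frequency equation above can be written as a short Python function. This is only a restatement of the formula; the function and argument names are illustrative, and the numbers in the usage line are hypothetical rather than measured values.

```python
def mutant_frequency(resistant_colonies, pe_colonies, pe_cells_seeded, tg_cells_seeded):
    """Mutant frequency corrected for plating efficiency.

    resistant_colonies: 6-TG resistant colonies counted
    pe_colonies:        colonies scored on the plating efficiency wells
    pe_cells_seeded:    cells seeded on the plating efficiency wells
    tg_cells_seeded:    cells seeded in media containing 6-TG
    """
    plating_efficiency = pe_colonies / pe_cells_seeded
    viable_cells_screened = plating_efficiency * tg_cells_seeded
    return resistant_colonies / viable_cells_screened

# Hypothetical example: 12 resistant colonies, 250 of 500 cells forming colonies
# without selection, and 5e4 cells seeded under 6-TG selection.
print(mutant_frequency(12, 250, 500, 5e4))
```

Dividing by the plating efficiency-corrected cell number means the frequency is expressed per clonable cell rather than per seeded cell.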

3.6 HPRT1 sequencing

One colony was picked per plate from the mutation rate experiment for sequencing, so that mutant clones were independently derived. Total RNA was isolated using the Qiagen RNeasy kit (Qiagen) according to the manufacturer’s protocol. RT-PCR was performed with SuperScript III Reverse Transcriptase (Invitrogen) according to the manufacturer’s protocol using 1 µg of RNA as a template. Primer-specific cDNA was amplified for 32 cycles at an annealing temperature of 60°C using the following HPRT1 primers: 123 (fwd) CTTCCTCCTCCTGAGCAGTC and 1041 (rev) GCCCAAAGGGAACTGATAGTC. From the HPRT1 sequencing of 6-TG resistant colonies, some clones were found to have exon 7 deletions. Exon deletions in HPRT1 have been shown to be caused by splice site mutations186. We therefore amplified exon 7 and its flanking region from genomic DNA prepared from the appropriate clone using the following primers: Forward: TTGTTTTCTTACATAATTCATTATCATACC; Reverse: TTACTTTGTTCTGGTCCCTACAGAG.

3.7 Clonogenicity assay

Cells were plated at 300 cells per well in 6-well plates in media containing increasing concentrations of H2O2, with six replicates per H2O2 concentration. Cells were incubated for 8-10 days, then washed with PBS, and colonies were stained with crystal violet. Percent survival was calculated as the ratio of colony counts in drug-treated versus untreated wells.

3.8 In vitro lesion bypass assays

Recombinant enzyme used in the assays was purified as described in the Chapter 2 methods121. Primer extension assays were carried out in 50 mM Tris pH 7.5, 8 mM MgCl2, 0.1 M DTT, 250 μM dNTPs, 100 nM 32P 5ʹ-end-labeled primer, and 1 or 2 nM enzyme at 37°C for timepoints of 2, 4, and 8 minutes. Reactions were started by addition of enzyme. Reactions were stopped by diluting 1:2 in reaction stop buffer (95% formamide, 5 mM EDTA, 0.1% xylene cyanol, 0.1% bromophenol blue). Reaction products were heated to 95°C, separated on 12% denaturing PAGE, and dried. Synthesis products were quantitated using a Typhoon imaging system. Primer and template sequences were 5ʹ-CCTCTTCGCTATTACGCC-3ʹ and 5ʹ-TTGCAGCACATCCCCCTTTCGCCAGCXTGGCGTAATAGCGAAGAGG-3ʹ, where X denotes the position of the abasic site. Calculation of insertion, bypass, and extension probabilities was performed as described187.

RESULTS

3.1 Complete Pol ε exonuclease inactivation in an MMR-deficient background is likely incompatible with cell viability

Pol ε exo domain mutations have been identified in ultramutated human colorectal cancer, which suggests alteration of Pol ε proofreading function may be responsible for this phenotype167. However, the role of MMR in correcting Pol ε variant replication errors is unclear, since in a mouse model of Pol ε proofreading deficiency with fully functional MMR, inactivation of one POLE1 proofreading allele does not increase cancer mortality124. In human patients, typically only one copy of POLE1 proofreading is inactivated and the tumors are MSS. To address whether mutation of one POLE1 allele leads to mutagenesis in MMR+ cells, we set out to develop a cell culture model of human Pol ε proofreading deficiency using a recombinant adeno-associated virus (rAAV) knock-in approach (Figure 3.1)183,184. This approach uses a synthetic exon promoter trap in which a cassette containing the targeted genomic change and the neomycin resistance marker, which lacks its own promoter, is inserted into the endogenous POLE1 locus. Clones are initially selected for neomycin resistance, which now relies on the promoter of the target gene for expression184. The colorectal cancer cell line HCT116 was used as it lacks Mlh1 and is MMR deficient188,189. This cell line has a stable diploid karyotype, with two wild type POLE1 alleles. With the rAAV approach, the neomycin resistance cassette is flanked by loxP sites, allowing for excision of the drug resistance marker and subsequent re-targeting of the second POLE1 allele.

To generate the knock-in virus, homology arms of 1.1 and 1.2 kb (HA1 and HA2, Fig. 3.1A) were cloned from HCT116 genomic DNA and then ligated into the rAAV viral construction vector183. Site-directed mutagenesis was used to introduce the DIE/AIA mutation143. Single knock-in clones, which we refer to as Pol ε wt/exo- or single knock-in cell lines, were made by my former lab mate Kim LeCompte. Integration of the rAAV construct was verified by PCR amplification of genomic DNA (Figure 3.1B, top). After excision of the resistance cassette, the knock-in of one allele was verified by amplification around a SacI site that was introduced in the vicinity of exon 9, which contains the DIE to AIA change (Figure 3.1A, B bottom). PCR product that is resistant to SacI digest represents the unaltered POLE1 allele. These PCR products were also sequenced to verify the genomic alteration (Figure 3.1C).

We also attempted to make homozygous Pol ε exo-/exo- cell lines by re-targeting the remaining wild type POLE1 allele in three different heterozygous Pol ε wt/exo- cell lines, using the same rAAV strategy. The first of these cell lines, generated by Kim LeCompte, showed retention of a SacI-resistant allele (Figure 3.2C). Sequencing exon 9 from POLE1 in these putative double knock-in cells revealed an increase in the ratio of AIA to DIE genotype (Fig. 3.2D) as compared to the Pol ε wt/exo- cells (Fig. 3.1C). This suggests the second targeting did not reintegrate into the first targeted allele. However, one wild type allele still remained. The second integration was also confirmed by Southern blotting (Figure 3.2A, B). Before excision of the second targeting cassette, three unique alleles are seen, corresponding to WT, integrated, and successfully knocked-in alleles. Analysis of band density using ImageJ software190 showed the WT and knocked-in alleles were present in a 1:3 ratio (Figure 3.2B, lane 3). Karyotype analysis showed this cell line was near tetraploid, and 100% of the cells analyzed were aneuploid (see Table 3.1).

The aneuploid state of the re-targeted knock-in clone could demonstrate that inactivation of Pol ε proofreading contributes to chromosome mis-segregation. Alternatively, if full inactivation of exo function in the absence of functional MMR is incompatible with cell viability, the process of re-targeting the second allele could select for rare aneuploid cells among the population that retain at least one wild type POLE1 allele. As shown in Table 3.1, 0.0035% of the parental cell line is aneuploid. To help distinguish between these mechanisms, I attempted to make additional double knock-in clones. The retention of the wild type allele was again verified by SacI digest of the exon 9 PCR product (data not shown). This was shown for three independent double knock-in clones. I am currently in the process of verifying the clones by Southern blot. These results suggest that cell lines fully deficient in both MMR and Pol ε proofreading are not viable, consistent with what is seen in the mouse model of Pol ε exo inactivation124,191.

3.2 Inactivation of Pol ε proofreading causes a mutator phenotype in human cells

To test the effect of inactivation of proofreading on genome instability, the mutation rate at the hypoxanthine-guanine phosphoribosyltransferase (HPRT1) locus was measured178. This is a forward mutation assay that allows sequencing of the HPRT1 gene from 6-thioguanine resistant (6-TGr) clones. The measurements were repeated in clones derived from independent single knock-in events. As shown in Table 3.2, loss of one exo-proficient allele led to a 4- to 7-fold increase in mutation rate for three independent clones. The mutation rate calculated for the HCT116 WT cells, 46 x 10^-7, is consistent with previously measured rates in this cell line178,180,189. Notably, this increase occurred in the presence of a wild type POLE1 allele, suggesting that alteration of one allele can have at least a partially dominant effect on the mutator phenotype. The mutation rates of double knock-in cells could not be measured due to the aneuploid state and presence of multiple HPRT1 loci. Similar knockout of exo function in the yeast pol2-4 (DIE/AIA) mutant increased His7-2 reversion and URA3 forward mutation rates 10-fold; however, this was in an MMR-proficient haploid strain192. In a diploid MMR-proficient strain, heterozygous inactivation of Pol ε proofreading increased His7-2 reversion 3.8-fold193. Similarly, in mouse embryonic fibroblasts (MEFs), single knock-in of the exo- allele led to a 4-fold increase in HPRT1 reporter mutagenesis, though in the presence of MMR194. To determine whether Pol ε wt/exo- cells are mutators even with functional MMR, I am in the process of correcting MMR deficiency using retroviral overexpression of Mlh1 in two independent heterozygous clones and HCT116 wild type cells. HCT116 cells are actually deficient in multiple MMR proteins in addition to Mlh1, and Mlh1 restoration will hence allow correction of Msh2/6 substrates but not Msh2/3 substrates195. This approach is being used because generation of knock-in cell lines in MMR-proficient cell lines is very inefficient, as MMR suppresses homeologous recombination, the process used during viral transduction196.

3.3 Inactivation of Pol ε proofreading leads to a large increase in base pair substitutions, particularly C:G→A:T transversions

To determine the type and frequency of errors made in cells upon Pol ε exo inactivation, HPRT1 mutations were sequenced from independent heterozygous isolates, as well as from matched HCT116 wild type cells (Table 3.3). The mutations sequenced from the 20 Pol ε wt/wt clones were similar to the spontaneous HPRT1 mutation spectra reported by four previous studies178,186,197,198. For example, two hotspot mutations, insertion of G at position 207-212 and C→T transition at position 508, were common in both wild type mutation spectra. To validate our data set, we compared error rates calculated for base pair substitutions and frameshift errors from this study (20 mutations) to the combined data from the four previous studies, all using HCT116 cells (144 mutations, Table 3.4, Table 3.5 column 2). The error rates calculated from our current data set are in close agreement with the large combined historical dataset. These results show that base pair substitutions were made at higher rates than frameshift errors (Table 3.5), with C→T errors being the most common. This error can be made through legitimate C•dA and G•dT misincorporations, or through insertion opposite a spontaneously deaminated cytosine. C or methyl C residues can spontaneously deaminate to uracil or thymine, respectively, and if not repaired correctly, an A will be added across from uracil in the subsequent round of replication26.

In the wt/exo- cells, the error rate for base pair substitutions increased 11-fold over wild type, and their share of total errors rose from 61.5% to 96% (p=0.0004, Fisher’s exact test). In addition, the proportion of frameshift errors decreased from 36.5% to 4% (p=0.0005, Fisher’s exact test), suggesting that the Pol ε DIE/AIA variant mainly contributes to mutagenesis through base pair substitutions. However, the overall error rates for both types of errors increased, as predicted by in vitro data121 (Table 3.5). In agreement with our in vitro studies, the majority of base pair substitutions were C:G→T:A mutations, which showed a 6-fold increase from the error rate in WT cells. The second and third highest error rates were for C:G→A:T and T:A→A:T changes. These were the only mutations whose proportion increased significantly between data sets (p=0.0134 and p=0.0315, Fisher’s exact test). The C:G→A:T base pair substitution was predicted from in vitro and tumor data (see Chapter 2, Figure 2.4B)121,156. A:T→T:A errors are also highly increased with pol or exo mutants in vitro143,180.
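The proportion comparisons above rely on Fisher's exact test applied to 2 x 2 tables of mutation counts. A minimal Python sketch of the two-sided test is given below for reference. It sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided p-value. The counts in the usage line are hypothetical placeholders, not the counts from Table 3.3.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cell lines (e.g. wt/wt, wt/exo-) and columns could be
    mutation classes (e.g. the class of interest, all other mutations).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Hypothetical counts: 2 of 20 wild type mutations versus 7 of 25 heterozygous
# mutations falling in the class of interest.
print(fisher_exact_two_sided(2, 18, 7, 18))
```

An exact test is appropriate here because the number of sequenced mutations per class is small, so approximations based on large counts would be unreliable.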

Since C→A mutations are made at TCT hotspots by exo-compromised Pol ε variants121,156, we analyzed the context of C→A mutations sequenced in HPRT1 from heterozygous cells. Of the seven C:G→A:T mutations sequenced, 4 occurred in the TCT context, 2 occurred in CCT, and 1 occurred in a TCC motif. The C→A at TCT occurred at two different positions, on opposite strands. Therefore all mutations were made in the context of a 3ʹ flanking T, a 5ʹ flanking T, or both. Similarly, in the error spectrum of HCT116 WT cells, all G→T or C→A mutations were in the context of 1 or 2 flanking Ts178,186,197,198. Thirteen C→A mutations were seen out of 164 total, and 2 of these errors were at the TCT hotspot. Again, these C→A mutations were seen on both strands of the HPRT1 gene. This is consistent with the Pol ε-specific mutation pattern seen in human tumors at this locus, which our collaborator Eve Shinbrot had loaded into the UCSC Genome Browser121. The error rates for C:G→A:T substitutions at TCT hotspots and at non-TCT motifs were calculated for HCT116 WT and single knock-in cells (see Table 3.5). Interestingly, the C:G→A:T error rate at TCT motifs in Pol ε wt/exo- cells increased at least 16-fold over HCT116 WT cells, compared to a 6-fold increase at non-TCT motifs. When the same error rates were calculated including previous HCT116 WT sequencing data, the TCT→TAT specific error rate increased 66-fold.
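Because a C→A change on one strand is read as G→T on the other, assigning a mutation to a motif such as TCT requires reporting the trinucleotide from the strand that carries the pyrimidine. The Python sketch below shows one way this classification could be done. The function names and the short test sequences are illustrative assumptions and are not taken from the HPRT1 sequence.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def pyrimidine_context(seq, pos):
    """Trinucleotide centred on seq[pos] (0-based), reported with C or T in the centre.

    If the mutated base is a purine, the context is taken from the opposite strand.
    """
    if pos < 1 or pos > len(seq) - 2:
        raise ValueError("position needs a flanking base on each side")
    tri = seq[pos - 1:pos + 2].upper()
    return tri if tri[1] in "CT" else reverse_complement(tri)

def has_flanking_t(context):
    """True if the 5' base, the 3' base, or both are T."""
    return context[0] == "T" or context[2] == "T"

# A C on the sequenced strand and a G whose complement sits in the same motif.
print(pyrimidine_context("ATCTG", 2), pyrimidine_context("AAGAC", 2))
```

With this normalization, a G→T change recorded in an AGA context and a C→A change recorded in a TCT context are counted as the same event class, which is how mutations on opposite strands can be assigned to a single hotspot motif.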

3.4 Inactivation of Pol ε proofreading leads to increased abasic site bypass and insensitivity to DNA damaging agents

In tumors, base pair substitution patterns may be influenced by the tumor microenvironment, which is oxidative199. C→A and A→C transversions can be caused by bypass of the most common oxidative lesion, 8-oxo-guanine (8-GO), or by incorporation of 8-oxo-dGTP opposite template A115,200. While normal undamaged DNA bases adopt the anti conformation with respect to the glycosidic bond, the 8-GO lesion is more stable in the rotated syn conformation. When in the syn conformation it can readily base pair with an A115. In addition, in some cases exo-deficient enzymes have been shown to readily incorporate oxidized dGTP201. In cancer as well as normal cells, spontaneous abasic sites are another very prominent lesion. These lesions can also be induced by oxidative stress and are intermediates in the repair of many lesions (e.g. after the glycosylase step during BER). Many polymerases add an A across from abasic sites, according to the A-rule202,203. It follows that the spontaneous depurination of G, followed by polymerase insertion of an A, would then lead to C:G→A:T transversions. Yeast Pol ε has been shown to bypass abasic lesions204. Based on these observations, we hypothesized that Pol ε cancer mutants may more readily bypass these two common lesions in vitro. To test this, we designed 45mer templates which contain 8-oxo-G or abasic sites, and looked for the ability of the enzyme to insert nucleotides across from or extend past the lesions.

Both wild type (exo+) and DIE/AIA (exo-) Pol ε variants were able to fully extend the undamaged substrate (Figure 3.3A, left and 3.4A, left). However, when the template contained an abasic site, synthesis of full length product was strongly reduced for the exo+ enzyme (Figure 3.3A, right). Notably, the exo+ enzyme showed a reduced ability to insert a nucleotide opposite the abasic site, as measured by the low insertion efficiency opposite the lesion (Fig. 3.3B). Inactivation of 3ʹ-5ʹ exo activity in the DIE/AIA, P286H, and S459F variants enhanced the ability of Pol ε to insert a nucleotide opposite the non-coding abasic site and increased the probability of bypass past this lesion (Fig. 3.3B). Overall bypass efficiency increased by 2- to 4-fold for exo-deficient enzymes (Fig. 3.3B). The effects on insertion were magnified when the enzyme began from a “standing start”, with the abasic site occupying the first template position (Figure 3.3C). In contrast, exo activity had no effect on Pol ε insertion opposite or bypass past a template 8-GO under the same reaction conditions used to measure the abasic lesion (Fig. 3.4A, B). Since abasic sites arise spontaneously in significant numbers, the increase in base pair substitutions seen in Pol ε wt/exo- cells under normal growth conditions could be partially explained by an increased ability of the DIE/AIA variant to bypass these lesions specifically.

As H2O2 exposure increases the occurrence of abasic sites, particularly at sites of DNA replication205, we measured the effects of increasing the number of abasic sites and other oxidative lesions in cells by chronic exposure to varying concentrations of H2O2. Inactivation of a single proofreading allele enhanced resistance to cell death at increasing doses of H2O2 (Fig. 3.5A). At the lowest dose, cells expressing Pol ε DIE/AIA were essentially unaffected by H2O2 treatment. This is likely not due to differences in cell growth, since essentially no difference was seen between wild type and heterozygous clones (Figure 3.5C). In addition, tolerance to 5-fluorouracil (5-FU) was tested (Figure 3.5B). This drug can be incorporated into DNA during synthesis and metabolized to many different lesions, including transient abasic sites in vivo206. As with H2O2 challenge, the single knock-in clone had enhanced colony formation ability relative to HCT116 wild type cells at various concentrations of 5-FU. Taken together, these data suggest that the enhanced tolerance of oxidative lesions, in particular bypass of abasic sites, confers a cytoprotective effect upon cells with inactivated Pol ε proofreading.


DISCUSSION

The effects of Pol ε proofreading deficiency on replication errors in cells

To identify errors made by replicative polymerases rather than by spontaneous DNA damage, Pol variants that show reduced proofreading and/or nucleotide selectivity are commonly used174. In this chapter, the first human cell model of Pol ε with inactivated exonuclease activity was generated and characterized. This model helped us to explore to what extent exo-deficient Pol ε is responsible for the error signature seen in tumors. The pattern of Pol ε-dependent replication errors made in the HPRT1 reporter gene in Pol ε wt/exo- cells matched the replication errors seen when proofreading-deficient Pol ε was measured in vitro as well as in tumors containing Pol ε exo mutants. A high rate of base pair substitutions is seen, specifically for C:G→A:T and T:A→A:T changes. As shown in Chapter 2, a high rate of C→A transversions is consistent with the Pol ε-specific mutation pattern in human tumors121,156. These base pair substitutions were also among the highest errors made by a polymerase active site Pol ε mutant that also lacked exo activity80,180. However, our data are also in slight contrast to those from Pol ε exo-deficient mouse cells, which showed the highest increase in A:T→C:G followed by C:G→A:T mutations as well as a lack of A:T→T:A transversions194. Another error expected based on data from mice is T:A→C:G changes, but these errors were not seen in high abundance in the human cells. Therefore, additional mechanisms may be occurring in vivo which alter the base pair error spectrum that is expected from in vitro data, which specifically select for C:G→A:T mutations. In addition, discrepancies between our data and mouse data may be explained by differences in MMR, as the mouse data were collected in an MMR-proficient background. T→C transition errors result from T•dG mispairs which are less well corrected than transversion errors116. The Pol ε specific errors may represent one piece of a mutational signature in an established tumor, to which many processes may contribute and at different timepoints in the development of the tumor182.


This model cell line also allowed us to begin to study the contributions of an exo-deficient POLE1 allele to lesion bypass. Pol ε-dependent mutagenesis may be due in part to the loss of functions other than correcting misinserted nucleotides during unchallenged replication. The increase in C:G→A:T and A:T→C:G transversions seen in Pol ε wt/exo- cells is similar to the mutation spectra from cells exposed to oxidizing agents and from cells with reduced ability to efficiently repair oxidative damage to DNA115,200. However, we saw no difference in 8-GO bypass between wild type and DIE/AIA variants in vitro (Figure 3.4). In addition, upon treatment with H2O2, the HPRT1 mutant frequency increased in Pol ε wt/wt cells to the same extent as in Pol ε wt/exo- cells, suggesting the increased survival is not due to bypass of 8-GO (data not shown, and see Figure 3.4). H2O2 also increases transient abasic sites, especially at sites of active DNA replication, and exo-deficient Pol ε may confer increased ability to bypass these lesions205. The increased ability of the Pol ε-DIE/AIA to bypass abasic sites, one of the most common spontaneous DNA lesions found in vivo, coupled with the observation that proofreading inactivation confers protection from oxidative damage-induced cell death, suggests the possibility that exo- Pol ε contributes to cell survival and potentially mutagenesis through increased lesion tolerance. This is in contrast to what is seen with human Pol δ, where a similar two amino acid replacement that inactivated proofreading actually decreased abasic site bypass207. Bypass of lesions by Pol ε may add to mutational signatures in tumors, and this can be studied with our cell culture model of exo inactivation in the future.

Pol ε mutagenesis in the context of MMR

In addition to the proofreading activity of polymerases themselves, replication errors are also corrected by the MMR pathway208. Upon exo inactivation of one copy of POLE1 in MMR-deficient cells, an average 5-fold increase in mutation rate at the HPRT1 reporter was observed. This moderate effect is similar to the 8-fold increase seen in Pol ε wt/exo- mismatch repair-deficient diploid yeast193. As in yeast, it is likely that this effect is relatively modest due to the simultaneous presence of both the proofreading-proficient and -deficient enzymes, with the exo-proficient Pol being able to correct errors made by the exo-deficient enzyme. In support of this hypothesis, a 520-fold increase in mutation rate is observed in the pol2-4 pms1 double mutant in haploid yeast192. We were unable to measure full loss of Pol ε exo activity in the absence of MMR, due to inability to generate clones lacking a wild type POLE1 allele (Figure 3.2). This is consistent with a study that showed full inactivation of Pol ε proofreading and MMR was embryonic lethal in mice, and suggests human cells are likewise inviable under these conditions, due to such high error rates increasing the probability of inactivating an essential gene191,194,209.

The Pol ε hotspot mutation was seen in tumors that were completely deficient for MMR and in tumors whose MMR status was not measured biochemically, but which lacked microsatellite instability, a hallmark of MMR inactivation121,156. While MSS, several of the latter tumors had somatic base pair substitutions in Msh6. This could mean that tumors were slightly deficient in correcting base pair substitutions but proficient in correcting frameshift errors195.

However, the functional significance of these mutations remains to be characterized. Deficiency in MMR may lead to high error rates that cause self-extinction within the tumor. In contrast, although the absolute number of errors made in the tumors is high, fully proficient MMR may substantially decrease the error rate, and this could explain why these tumors are slow growing and low grade. Mutation rates in the similar Pol ε wt/exo- mismatch repair null background were not measured in mice. Instead, the mutation rate in the heterozygous Pol ε wt/exo- mutant was measured in mismatch repair proficient cells and no significant change was seen relative to the Pol ε wt/wt cells194. To help answer the question of whether the Pol ε wt/exo- state is mutagenic in the presence of functional MMR in human cells, I am in the process of correcting Mlh1 expression in Pol ε wt/exo- cell lines using a retroviral expression system. This correction will also help determine whether MMR influences Pol ε specific error signatures.


Table 3.1 Karyotype Analysis of Pol ε exo- knock-in cells

Genotype                      Number Aneuploid   Number Diploid   % Aneuploid
Pol ε wt/wt                   1                  284              0.0035
Pol ε exo- single knock-in    2                  127              0.016
Pol ε exo- double knock-in    123                0                100


Table 3.2 HPRT1 mutation rates of Pol ε exo single knock-in clones. Mutation rates and 95% confidence intervals were measured by fluctuation analysis as described in the Methods using the Ma-Sandri-Sarkar Maximum Likelihood Estimator.

Cell line   Pol ε     Mutation rate (per 10^7)   95% CI      Fold increase
HCT116      wt/wt     46                         (22-77)     1x
Clone 1     wt/exo-   200                        (127-286)   4x
Clone 2     wt/exo-   163                        (97-241)    4x
Clone 3     wt/exo-   347                        (240-467)   7x
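The Ma-Sandri-Sarkar estimator named in the caption of Table 3.2 can be illustrated with a minimal sketch. It uses the standard recursion for the Luria-Delbrück distribution and a simple golden-section search for the expected number of mutations per culture, m. The colony counts, the final cell number and the convention of dividing m by the final cell number are assumptions made for illustration only; they are not the data or the exact procedure used in this study, and confidence intervals are not computed here.

```python
import math

def ld_pmf(m, r_max):
    """Luria-Delbruck probabilities p_0..p_r_max for m expected mutations per culture."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        s = sum(p[i] / (r - i + 1) for i in range(r))
        p.append(m / r * s)
    return p

def log_likelihood(m, counts):
    """Log-likelihood of the observed mutant counts for a given m."""
    p = ld_pmf(m, max(counts))
    return sum(math.log(p[r]) for r in counts)

def mss_mle(counts, lo=1e-3, hi=50.0, tol=1e-6):
    """Golden-section search for the m that maximizes the likelihood.

    Assumes the likelihood has a single peak between lo and hi.
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d
        else:
            a = c
    return (a + b) / 2

# Hypothetical numbers of 6-thioguanine resistant colonies in parallel cultures
counts = [0, 1, 0, 3, 2, 0, 12, 1, 0, 4]
cells_per_culture = 1e6  # hypothetical final cell number per culture

m_hat = mss_mle(counts)
rate = m_hat / cells_per_culture  # one common convention; an assumption here
print(f"m = {m_hat:.2f}, mutation rate estimate = {rate:.2e}")
```

The recursion is evaluated only up to the largest observed count, which keeps the sketch short; cultures with very large mutant counts would need a longer recursion and more care with numerical underflow.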


Table 3.3 HPRT1 mutations sequenced from 6-thioguanine resistant Pol ε wt/exo- and Pol ε wt/wt HCT116 cells. For each cell line, HPRT1 cDNA was made by RT-PCR, amplified and sequenced from independent 6-thioguanine resistant clones. Verified errors are indicated by type on the coding strand and position relative to the +1 start site. Insertion (ins) or deletion (Δ) of the indicated base(s) is denoted.

Pol ε wt/wt

Error    Position   Sequence context
g to t   134        AGG
g to a   209        GGG
c to a   235        GCT
g to a   400        GGA
c to t   508        ACG
c to t   508        ACG
c to t   508        ACG
c to t   508        ACG
c to t   508        ACG
g to a   539        GGA
g to a   539        GGA
g to a   539        GGA
insG     207-212
insG     207-212
insG     207-212
insT     315-316
delA     496-449
Δ17bp    610-626
Δ17bp    610-626
ΔT       812-813

Pol ε wt/exo-

Error    Position   Sequence context
g to a   47         GGT
g to a   143        CGT
t to a   146        CTT
c to t   149        GCT
c to t   149        GCT
c to t   151        TCG
t to g   170        ATG
c to a   222        TCT
c to a   222        TCT
c to a   222        TCT
a to g   349        AAT
c to t   416        ACT
c to t   430        GCA
c to t   508        ACG
c to t   508        ACG
c to t   508        ACG
g to t   529        AGA
c to a   550        TCC
g to a   574        TGC
c to a   577        CCT
c to a   577        CCT
insT     758-763
g to a   intronic   AGC
t to a   intronic   GTA
t to a   intronic   GTA


Table 3.4 Results of HCT116 WT sequences from previous studies. Error rates are calculated with the following formula: (observed number of events/total number sequenced) x (mutation rate). The mutation rate used to calculate error was the one measured in this study, 47 x 10^-7 (this rate matched the average rate from three previous studies, 46 x 10^-7)178,180,189.

Error                     Number of errors observed   Error rate (x 10^-5)
Frameshift                54                          18
  -1                      42                          14
  +1                      12                          4
Base Pair Substitution    90                          30
  C:G→T:A                 52                          17
  A:T→C:G                 0                           ≤ 0.3
  C:G→A:T                 15                          5
  A:T→G:C                 18                          6
  A:T→T:A                 3                           1
  C:G→G:C                 2                           1
Number sequenced          144
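The formula in the caption of Table 3.4 can be written as a short sketch. The counts, the total of 144 sequenced clones and the mutation rate of 47 (in the units of the table) are taken from the table; the treatment of a class with no observed events as an upper bound equal to the rate for a single event is an assumption inferred from the "≤ 0.3" entry, and the function name is illustrative.

```python
def error_rate(events, total_sequenced, mutation_rate):
    """(observed events / total sequenced) x mutation rate.

    Returns (rate, is_upper_bound). With zero observed events the rate
    for a single event is reported as an upper bound (an assumption).
    """
    if events == 0:
        return 1 / total_sequenced * mutation_rate, True
    return events / total_sequenced * mutation_rate, False

MUTATION_RATE = 47  # in the units used by the table
TOTAL = 144         # number of clones sequenced

observed = [("Frameshift", 54), ("-1", 42), ("+1", 12),
            ("C:G→T:A", 52), ("A:T→C:G", 0)]
for label, n in observed:
    rate, is_bound = error_rate(n, TOTAL, MUTATION_RATE)
    print(f"{label:12s} {'≤' if is_bound else ' '}{rate:.1f}")
```

The same function applies to Table 3.5 when the per-genotype mutation rate and number of sequenced clones are substituted.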


Table 3.5 Exo- Pol ε allele increases base pair substitutions, specifically C:G to A:T, in HPRT1 spontaneous mutation assay. Error rates for frameshifts and base pair substitutions were calculated for both the Pol ε wt/wt and Pol ε wt/exo- strains by using the following equation: (observed number of events/total number sequenced) x (mutation rate).

Error type                Pol ε wt/wt in vivo     Pol ε wt/exo- in vivo    Fold Increase (Pol ε wt/exo- error
                          Error Rate (x 10^-5)    Error Rate (x 10^-5)     rate/Pol ε wt/wt error rate)
Frameshift                10.8                    19.0                     2
  -1                      3.6                     ≤9.5                     3
  +1                      7.2                     9.5                      1
Base Pair Substitution    21.6                    237.0                    11
  C:G→T:A                 18.0                    113.8                    6
  A:T→C:G                 ≤ 1.8                   9.5                      ≥ 5
  C:G→A:T                 3.6                     66.4                     18
    At TCT motif          ≤ 2.3                   37.9                     ≥16.1
    At non-TCT motif      4.7                     28.4                     6.4
  A:T→G:C                 ≤ 1.8                   9.5                      ≥ 5
  A:T→T:A                 ≤ 1.8                   28.4                     ≥ 16
  C:G→G:C                 ≤ 1.8                   ≤9.5                     ≥ 5
Number sequenced          20                      25


Figure 3.1 Generation of exonuclease-deficient Pol ε human cell lines by gene targeting. (A) Gene targeting scheme to change the sequence coding for the exo active site amino acid residues DIE275-277 to AIA at the endogenous human Pol ε p261 locus (POLE1). Two regions (HA1 and HA2) of the POLE1 locus containing exons 7 and 8 and exons 9-11 (black boxes), respectively, were amplified from HCT116 cells and used as homology arms in rAAV construction. The rAAV created for gene targeting used a promoterless neomycin-resistance marker containing a splice acceptor site and introduced a novel SacI cleavage site into the POLE1 locus183. LoxP sites (triangles) flanked the cassette, allowing for Cre-mediated cassette excision and subsequent targeting of the second wild type allele. PCR primers are denoted by arrows. (B) The indicated primer pairs (shown on the scheme in A) were used to amplify genomic DNA from geneticin resistant (integration) and geneticin sensitive (excision) clones to verify construct integration at the genomic POLE1 locus and subsequent excision. (C) PCR was used to amplify exon 9 and verify gene targeting using genomic DNA from HCT116 cells (Pol ε wt/wt) and Pol ε wt/exo- cells. The positions of base changes are denoted by asterisks.


Figure 3.2 Complete inactivation of polymerase error proofreading is incompatible with cell viability. A) The remaining wild type, proofreading-proficient Pol ε allele in the heterozygous Pol ε wt/exo- cells was targeted for exo- allele replacement using the same rAAV strategy described above. The scheme for verification of correct gene targeting using Southern blotting is shown. For a probe targeting HA2, SacI digestion yields three unique sizes for three unique alleles as shown. B) Southern blot of one representative double knock-in clone. Genomic DNA was digested with SacI and resolved on a 1% agarose gel in TBE. The DNA was transferred to Hybond N+ membrane (Amersham), and blotted with a probe against HA2. The sizes of the 1 kb ladder are listed to the left of the blot. The expected sizes of genomic, integrated, and knocked-in alleles are shown. C) The retention of a WT allele was also verified by SacI digest of a PCR amplification, as in Figure 3.1B. The arrow highlights the unaltered genomic POLE1 allele. D) Genomic DNA was prepared from correctly targeted, Cre-treated geneticin sensitive cells and a region containing exon 9 was amplified by PCR and sequenced to verify gene targeting in a re-targeted Pol ε wt/(n)exo- clone.


Figure 3.3 Pol ε exonuclease deficiency increases bypass of abasic lesions in vitro. A) Primer extension assays were carried out with an 18mer 32P 5ʹ end-labeled primer annealed to a 45mer undamaged template or template containing an abasic site (position indicated by X). The template sequence immediately downstream of the lesion is shown. Reactions contained 100 nM DNA substrate and were initiated by the addition of enzyme (2 nM P286H, 1 nM for other enzymes) and incubated at 37°C for 2, 4 or 8 minutes. B) Insertion probabilities, bypass probabilities, and bypass efficiencies were determined from quantifying gel bands in (A), as described187. Briefly, insertion probability was calculated as ([≥N+2]/[≥N+1])*100, bypass probability as ([≥N+3]/[≥N+1])*100, and bypass efficiency as (bypass probability on damaged template)/(bypass probability on undamaged template)*100, where N is the position of the unreacted primer. (Error bars: SEM, N=9). C) Primer extension assays were carried out as in A, but with a 19-mer primer.
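The three quantities defined in the legend of Figure 3.3B can be sketched as follows, assuming band intensities are stored per product length relative to the unreacted primer (1 for N+1, 2 for N+2, and so on). The intensity values and function names are hypothetical and serve only to show the arithmetic.

```python
def sum_from(bands, k):
    """Total intensity of products extended by at least k nucleotides (bands >= N+k)."""
    return sum(v for offset, v in bands.items() if offset >= k)

def insertion_probability(bands):
    """([>=N+2] / [>=N+1]) * 100"""
    return sum_from(bands, 2) / sum_from(bands, 1) * 100

def bypass_probability(bands):
    """([>=N+3] / [>=N+1]) * 100"""
    return sum_from(bands, 3) / sum_from(bands, 1) * 100

def bypass_efficiency(damaged, undamaged):
    """(bypass probability on damaged / bypass probability on undamaged) * 100"""
    return bypass_probability(damaged) / bypass_probability(undamaged) * 100

# Hypothetical band intensities keyed by extension past the primer
undamaged = {1: 10.0, 2: 10.0, 3: 30.0, 4: 50.0}
abasic = {1: 60.0, 2: 25.0, 3: 10.0, 4: 5.0}

print(insertion_probability(abasic))         # insertion probability on the damaged template
print(bypass_probability(abasic))            # bypass probability on the damaged template
print(bypass_efficiency(abasic, undamaged))  # relative to the undamaged template
```

Normalizing to the undamaged template in the last step separates the effect of the lesion from any general difference in how far each enzyme extends the primer.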


Figure 3.4 Pol ε exonuclease activity does not affect bypass of 8-GO in the template. A) Primer extension assays were carried out with a 5ʹ-32P end-labeled 18mer primer annealed to an undamaged template 45mer or a 45mer template containing an 8-GO site (position indicated by G*). The template sequence immediately downstream of the lesion is shown. Reactions contained 100 nM DNA substrate and were initiated by the addition of enzyme (2 nM P286H, 1 nM for other enzymes) and incubated at 37°C for 2, 4 or 8 minutes. B) Insertion probabilities, bypass probabilities, and bypass efficiencies were determined from quantifying gel bands in A). (Error bars: SEM, N=9).


Figure 3.5 Exo inactivation increases survival of drug-challenged cells. A) Fraction survival of HCT116 Pol ε wt/wt, Pol ε wt/exo- and Pol ε wt/(n)exo- cell lines was measured by colony formation assays with chronic exposure to H2O2. Colony counts for each concentration of H2O2 were normalized to those from untreated cells. Statistical significance was determined using a Student’s t test. (* p<0.05, ** p<0.01, *** p<0.0001; Error bars: SEM, N=6) B) Fraction survival as in A), with chronic exposure to 5-fluorouridine (5-FU). C) The growth rates of heterozygous clones were measured by seeding 2.5e5 cells per well in 6-well dishes in triplicate. At 24, 48, and 72 hour timepoints, cells were harvested by trypsinization, resuspended, and counted. Total cell counts, live and dead, were used. Error bars represent SEM, with N=3.
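The normalization described in panel A can be sketched in a few lines. The colony counts below are hypothetical, and dividing each treated replicate by the mean of the untreated replicates is one reasonable reading of the legend rather than a confirmed description of the analysis.

```python
import statistics

def fraction_survival(treated_counts, untreated_counts):
    """Mean fraction survival and SEM for one drug concentration.

    Each treated replicate is divided by the mean untreated colony count
    (an assumed normalization scheme).
    """
    baseline = statistics.mean(untreated_counts)
    fractions = [c / baseline for c in treated_counts]
    mean = statistics.mean(fractions)
    sem = statistics.stdev(fractions) / len(fractions) ** 0.5
    return mean, sem

# Hypothetical colony counts from replicate wells
untreated = [210, 198, 205, 190, 202, 195]
treated = [120, 131, 118, 125, 140, 116]

mean, sem = fraction_survival(treated, untreated)
print(f"fraction survival = {mean:.2f} ± {sem:.2f} (SEM)")
```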


Chapter 4: Discovering functions of the p12 subunit through binding partners and post-translational modifications


INTRODUCTION

Several lines of evidence suggest that at least part of Pol ε’s role in maintaining genomic stability may be mediated through its smaller, non-enzymatic accessory subunits. The p12 subunit of Pol ε is similar in structure to histone H2B, and can bind dsDNA independently of the holoenzyme101,102. Therefore p12 may be post-translationally modified in a manner similar to histones. This chapter will focus on the putative role of p12 in the DNA damage response (DDR), and the attempt to elucidate p12 functions through identification of p12 binding partners and post-translational modifications (PTMs). Results from past studies in yeast show that p12 is not essential for viability210; however, deletion of p12 leads to an increase in spontaneous mutagenesis through a mechanism that does not involve the MMR pathway100. This is a strong indicator that p12 function aids in guarding genome stability, although the mechanism remains unknown. Further evidence in support of a role for p12 in genome stability came from its identification as a phosphorylation target of ATM, one of the key kinases in the DDR which is involved in checkpoint signalling104.

Cell cycle checkpoints are mechanisms used by cells to ensure that damaged DNA is not replicated or passed on to daughter cells, and also ensure that DNA is fully replicated before cells enter mitosis. These include the G1, S-phase and G2/M checkpoints. The G1 checkpoint blocks entry into S-phase in the event of a DSB and requires functional p53 and p12211. In the S-phase checkpoint, replication stress or DSBs signal to halt the cell cycle. This checkpoint also regulates origin firing and guards against re-licensing of origins and DNA re-replication to ensure DNA is fully replicated211. The G2/M checkpoint also responds to DSBs and functions to prevent chromosome mis-segregation211. These checkpoints can be subsequently activated after the DDR212. Conditions which lead to activation of replication checkpoints and the DDR include fork stalling, replication stress and replication fork collapse leading to DSBs28,213. Replication stress in turn is caused by DNA re-replication (re-licensing and firing of replication origins during a single S-phase), insufficient origin firing, and various DNA lesions including double-strand breaks28.


Replication stress is difficult to define, and the amount of stress needed to activate a checkpoint within a single cell is unknown213. Broadly, however, replication stress is anything that causes pausing or halting of replication complexes. Upon stalling, aberrant structures can form at the replication fork, including large stretches of ssDNA, which activate the replication stress response28.

The checkpoint response is carried out by three kinases of the PIKK family (phosphatidylinositol 3-kinase-like kinase): ATM (ataxia-telangiectasia mutated), ATR (ATM- and Rad3-related), and DNA-PK (DNA-dependent protein kinase)213,214. DNA-PK participates in repair of DSBs through the NHEJ pathway, and does not participate in DDR signaling unless cells are deficient for ATM215. In contrast, global DDR activation is mediated through ATM and ATR, which start signaling cascades through phosphorylation of hundreds of substrates, most of which are phosphorylated on S/TQ consensus sequences104,213. ATM mainly responds to DSBs, while ATR responds to replication stress216. However, these kinases have overlapping but non-redundant functions, and both unique and shared substrates214,216. ATR is essential for cell viability, whereas ATM is not. However, bi-allelic ATM mutations can lead to the disorder ataxia-telangiectasia, where affected patients are tumor-prone and characterized by neurodegeneration and immunodeficiency216. Hence, activities of these kinases help safeguard genome stability.

The structures and conditions which lead to definite checkpoint activation are just beginning to be understood28. For DNA-PK and ATM activation, the broken ends of a DSB are first recognized by Ku70/Ku80 or Mre11/Rad50/Nbs1 (MRN) complexes respectively217. The MRN protein complex functions at DSB ends as well as telomeres and functions in recognition, signaling, and regulation of DSB repair218. Each subunit functions as a dimer and differential subunit assembly may regulate binding of different subunits for different telomere and DSB processing218. For the purpose of this chapter, this introduction will focus on ATM activation. ATM can be activated through the MRN complex or through acetylation of ATM by phospho-KAT5, a protein acetyl-transferase, at H3K9me3 marks219. Once activated, ATM initiates a signaling cascade by phosphorylation of key targets, including the histone variant H2AX. When H2AX is phosphorylated at Ser139, the now termed γH2AX recruits proteins like MDC1 (mediator of DNA damage checkpoint protein 1) and RNF8–RNF168 to the damage site. These recruited factors are important for tethering of ATM at the break site and ubiquitylation of histones respectively220,221. The transient tethering of ATM at the break site leads to local spreading of the γH2AX signal222. Subsequently, the signal leads to recruitment of more proteins involved in DSB repair and response, chromatin remodeling (to provide access to the damage site), repression of local transcription, and mobility of the DSB break ends to facilitate homology search220,221,223,224. This happens through two branches: a diffusible pathway that globally signals the checkpoint through Chk2 (checkpoint kinase 2), and a local chromatin based pathway that leads to repair of the DSB225.

In the case of ATR activation, the signal is thought to be large stretches of RPA-coated ssDNA, which could result from the leading strand polymerase stalling and becoming uncoupled from the helicase28,214. ATR and its activating protein ATRIP (ATR Interacting Protein) form a constitutive complex213 that recognizes the RPA-coated ssDNA signal226. In addition, Rad17-RFC (Replication factor C clamp loader) and the 9-1-1 (Rad9-Hus1-Rad1) checkpoint clamp are recruited to RPA-coated ssDNA227. Rad17 is a clamp loader which loads the 9-1-1 checkpoint clamp onto primer template junctions, requiring the help of TopBP1 and Pol alpha228,229. ATR is next activated through a single autophosphorylation at T1989230. ATR/ATRIP complexes can phosphorylate each other in trans on this residue230. Full activation of ATR to a hyperphosphorylated state requires Rad17, TopBP1, and 9-1-1 through mechanisms that are not well known230. The primer template junctions on which 9-1-1 is loaded are thought to be synthesized by Pol alpha231, and an increase in the amount of these structures leads to an increase in the magnitude of the checkpoint response232. Once loaded onto these structures, the 9-1-1 clamp mediates interaction between ATR/ATRIP and TopBP1233. Finally, an activation domain within TopBP1 leads to phosphorylation of Chk1 (Checkpoint Kinase 1) by ATR234. Thus, both ATR/ATRIP and 9-1-1 clamp loading are necessary for downstream phosphorylation of ATR targets228.

Chk1 and Chk2 are the major effector kinases of ATR and ATM respectively, and there is evidence that cross-talk can occur between ATR-Chk1 and ATM-Chk2 pathways213. For example, when DSBs are induced by IR, ATR can be activated in S or G2 phases and phosphorylate Chk1 downstream of ATM and Mre11 function213. Mre11 is an endo/exonuclease responsible for the DNA end resection that generates stretches of RPA-coated ssDNA upon DSB formation235. The activation of Chk1 or Chk2 leads to stalling of the cell cycle, restart of collapsed forks, promotion of DNA repair, and stopping the firing of dormant origins28,212,214,217. There is still much to be determined in how these functions are carried out, due in part to the fact that substrates of Chk1 and Chk2 partially overlap and that Chk1 deletion is not compatible with cell viability236. During S or G2/M phases, the checkpoint kinases accomplish cell cycle block by inhibiting Cdc25 phosphatases from activating CDKs (cyclin dependent kinases)212,217. Other key substrates of checkpoint kinases include p53 and Mdm4 (affecting apoptosis), BRCA2 and RAD51 (affecting DSB repair), histone H3 (affecting transcription and DNA replication), and E2F/Rb (affecting cell-cycle specific transcription)236. Importantly, in yeast, checkpoint kinases play a role in stabilizing stalled forks, as in their absence, increased fork collapse leads to increased genome instability236.

In the signaling pathways discussed above, PTMs are very important. These PTMs include phosphorylation, acetylation, methylation, polyADP-ribosylation, sumoylation, and ubiquitylation217,220,221,225,237–239. In the DDR, PTMs modulate protein interactions and enzymatic activities for effective DNA damage repair239,240. Discovery of new modifications on signaling proteins is ongoing, suggesting that perhaps many remain to be discovered239. The first techniques used to identify PTMs included mobility shift on Western blot or the use of modification-specific antibodies241. However, these approaches required prior knowledge of a modification. Other issues with modification-specific antibodies are that they are time-consuming to make, may recognize multiple substrates, and may only allow for identification of one modified species at a time. Verifying the site of modification is also time-consuming and requires mutagenesis. In addition, secondary sites may accept the modification in the absence of primary sites, leading to confusing results. Finally, mutating one or more residues in the protein could disrupt structure242.

Currently, the most widely used technique for studying PTMs is mass spectrometry (MS). MS is a technique that can identify the mass and abundance of ions in a substance based on their mass-to-charge ratio and time of flight in an electric or magnetic field. This technique has several advantages over other techniques, especially when the ability to simultaneously identify more than one PTM is required. MS is an unbiased technique for identifying PTMs since ΔM, the difference between observed and calculated theoretical mass, is used to identify a modified peptide241. This approach can easily identify novel modifications. New modifications to histones have recently been discovered that add to the complexity of the histone code, including crotonylation and formylation243. Another reason MS is ideal for detecting modifications is that there is essentially no limit to the number and type of modifications that can be found on a single peptide. Newer advances in the technique make it possible to ask how modifications change during different cellular processes. One such technique is SILAC (stable isotope labeling by amino acids in cell culture), which can easily detect isotopic shifts in MS spectra and quantitatively assess differences between labeled and unlabeled samples243.
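The ΔM idea can be illustrated with a minimal sketch that computes a theoretical monoisotopic peptide mass from standard residue masses and compares an observed mass against a short list of common modification mass shifts. The peptide is the p12 tryptic peptide reported in the ubiquitylation screen discussed in this chapter; the observed mass, the tolerance and the small modification table are hypothetical choices made for illustration and do not describe the search software used in any cited study.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# A few common modification mass shifts (Da); not an exhaustive list
MODS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "GlyGly remnant of ubiquitin": 114.04293,
}

def peptide_mass(seq):
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER

def match_delta(observed_mass, seq, tol=0.01):
    """Return ΔM (observed minus theoretical) and the modifications it matches."""
    delta = observed_mass - peptide_mass(seq)
    hits = [name for name, shift in MODS.items() if abs(delta - shift) <= tol]
    return delta, hits

peptide = "ALVKADPDVTLAGQEAIFILAR"
observed = peptide_mass(peptide) + 114.04293  # hypothetical observed mass

delta, hits = match_delta(observed, peptide)
print(f"ΔM = {delta:.4f} Da, candidate modifications: {hits}")
```

A single ΔM does not localize the modification to a residue, and a mass shift can also reflect a sequence variant; fragment ion spectra are needed to resolve both.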

Although MS is very powerful, the technique does have limitations when assessing PTMs: a ΔM value may represent a variable peptide sequence rather than a modification, modifications may be labile during sample preparation or modifications may react with other peptides during analysis. For example, the transfer of a methyl group to another residue within a peptide can occur243. Detection of ubiquitin or UBL (Ubiquitin-like molecule) conjugation by MS is more difficult for several reasons: the large number of UBLs, their high sequence variety244, and variable mass shifts due to variable tryptic digestion of UBLs242,244. In addition, ubiquitylations do not occur in conserved consensus motifs as phosphorylations do242, and some ubiquitylations occur through highly labile ester or thioester bonds on serine (S), threonine (T), and cysteine (C) moieties245. Large scale identification of ubiquitylated sites is difficult due to heterogeneity in UBL chain length and structure caused by promiscuous conjugation to multiple lysines (Ks) within themselves246. In one study, the authors expressed recombinant ubiquitin engineered to remove Ks. Using immunoprecipitation followed by MS, they identified 1392 ubiquitylation sites in 293 cells246. Very interestingly, this screen identified K51 of p12 (peptide ALVKADPDVTLAGQEAIFILAR)246.

Ubiquitylation of proteins occurs in a three step process. First, a thioester bond is formed between an E1 activating enzyme and the C-terminal glycine of ubiquitin in an ATP-dependent manner. Second, ubiquitin is transferred to an E2-conjugating enzyme. Finally, an E3 ligase facilitates transfer of ubiquitin from the E2-Ub complex to an acceptor substrate. Rarely, residues other than K can be modified by ubiquitin including S, T, C, and the N-terminus itself242,247–249. In one study, a K-less protein still showed ubiquitin-modified forms that were sensitive to mild alkaline lysis (1 M hydroxylamine, pH 9.0, or 0.1 M NaOH) and were therefore hypothesized to be ester bonds247. In another study, S and T residues were determined to be the primary acceptors of ubiquitin moieties. When the S and T residues were mutated, Ks were modified instead248. This suggests that the ubiquitylation of both K and non-K residues may be achieved by the same conjugating enzymes. This has been shown in cases where the E3 ligase was identified245.

In this chapter, we wanted to assess the functional significance(s) of Pol ε subunit post-translational modification. However, Pol ε PTMs were only reported as targets in high-throughput proteomics assays. Therefore we first set out to independently verify these reports. For example, Matsuoka et al. reported that p12-S25 was phosphorylated by ATM/ATR in response to ionizing radiation104. During this quest, we discovered further higher molecular weight modifications of p12, which we began to characterize.


METHODS

4.1 Cloning of p12 constructs

FLAG-p12-pIRES2eGFP cloning: Human p12 was amplified from a p12 construct in the pDEST-565 vector previously made by Zachary F. Pursell using primers “EcoRI-FLAGp12 Fwd” and “BamHI-p12 Rev” (see Table 4.1) which encoded EcoRI and BamHI cut sites respectively.

The forward primer encoded a FLAG epitope tag of sequence DYKDDDDK. The insert was amplified from the pDEST template by PCR. Amplified insert and empty vector were digested with EcoRI and BamHI according to standard NEB protocol. Digested insert and vector were gel purified using Qiagen gel extraction kit, and ligated with T4 ligase (Invitrogen) according to standard protocol. Other constructs were made similarly, using primers depicted in Table 4.1.

FLAG-p12 S25A and S25E mutants were made by site directed mutagenesis using primers listed in Table 4.1.

4.2 Expression of p12 constructs.

FLAG-p12 constructs were transiently expressed in HCT116 human colon cancer cells by transfection with Lipofectamine 2000 (Invitrogen). Briefly, 3 µg DNA and 10 µL Lipofectamine 2000 were separately diluted in the appropriate amount of serum-free medium (250 µL for 6-well plate format). These mixtures were incubated 5 minutes at room temperature, then combined and incubated for an additional 20 minutes at room temperature. The transfection mixture was then added drop-wise to wells containing cells that were 80-90% confluent and fresh serum-containing medium. The transfected cells were incubated 24-72 hours before analysis.

4.3 Lysis and Western Blotting.

Lysis: Cells were transiently transfected with p12 constructs according to Method 4.2. At 24 or 48 hours post-transfection, cells were washed with PBS and harvested using a rubber cell scraper. Cells were pelleted by centrifugation at 200xg for 5 minutes and resuspended in lysis buffer [50 mM Tris, 150 mM NaCl, 1% Triton-X, pH 7.4; Roche protease inhibitor tablets added prior to lysis]. For lysis, cells were mixed in lysis buffer and incubated on ice for 10 minutes. Lysates were clarified by centrifugation at 16,000xg for 10 minutes, and supernatants were transferred to new tubes. When phosphorylation was being examined, Sigma phosphatase inhibitor was added to the lysis buffer.

Western blotting: Cell-free extracts were prepared with 5x LDS sample buffer (Invitrogen) and 10x dithiothreitol (DTT, 1 M) to 20 µL final volume. These samples were mixed, boiled for 5 minutes, then chilled on ice. Samples were resolved on precast 10% Bis-Tris polyacrylamide gels (Invitrogen) in MES running buffer (Invitrogen) at 200 volts for 35-45 minutes. Proteins were transferred to PVDF membrane using the Invitrogen X-cell transfer system for 1.5 hours at 30 volts. After transfer, blots were stained with Ponceau stain [0.1% Ponceau S, 5% acetic acid] and cut to the appropriate size. Blots were then blocked for 1 hour at room temperature or overnight at 4°C in 5% milk/Tris-buffered saline with Tween-20 (TBST) [TBST: 50 mM Tris, 150 mM NaCl, 0.05% Tween 20, pH 7.4]. Primary and secondary antibodies were diluted in 1% milk/TBST. Primary antibody was incubated for 1 hour at room temperature or overnight at 4°C at 1:1000-1:10,000, and secondary antibody was incubated for 1-2 hours at room temperature at a 1:10,000 dilution. Between antibody incubations, blots were washed with TBST for 3 x 5 minutes. For detection, blots were incubated for 2 minutes in Invitrogen ECL reagent and imaged with a Fujifilm-LAS imager.
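The sample preparation arithmetic above (5x LDS buffer and 10x DTT brought to a 20 µL final volume) can be sketched as a short calculation. This is only an illustrative sketch: the function name is hypothetical, and it assumes the remaining volume is filled entirely by cell-free extract (or extract plus water), which the text does not state explicitly.

```python
def sample_mix(final_ul=20.0, buffer_x=5.0, dtt_x=10.0):
    """Volumes (in µL) for one gel sample.

    Assumption: whatever volume is not LDS buffer or DTT is extract
    (optionally topped up with water); the text does not specify this.
    """
    lds_ul = final_ul / buffer_x   # 5x LDS buffer diluted to 1x
    dtt_ul = final_ul / dtt_x      # 10x DTT diluted to 1x
    extract_ul = final_ul - lds_ul - dtt_ul
    return {"lds_ul": lds_ul, "dtt_ul": dtt_ul, "extract_ul": extract_ul}


mix = sample_mix()
print(mix)

# Final DTT concentration (mM) when the 10x stock is 1 M (1000 mM).
final_dtt_mm = 1000.0 * mix["dtt_ul"] / 20.0
print(final_dtt_mm)
```

Under these assumptions a 20 µL sample takes 4 µL of LDS buffer and 2 µL of DTT stock, leaving up to 14 µL for extract.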

4.4 Immunoprecipitation of p12 constructs.

Transfected cells were lysed as described in Method 4.3. The same lysis buffer was used to prepare anti-FLAG beads. Bead slurry was pipetted with a cut pipet tip to avoid shearing of beads. The appropriate amount of beads was washed with 3 x 1 ml of lysis buffer. Beads were pelleted for 3 minutes at 5000 x g at 4°C between washes. After washing, beads were aliquoted into 1.5 ml Eppendorf or 15 ml conical tubes, and 200 µL to 10 ml of lysate was added. The samples were incubated on a nutator overnight at 4°C. The next day, beads were spun down and the supernatant was removed. The beads were washed 3 times with 1 ml wash buffer [50 mM Tris, 150 mM NaCl, and 0.05% Triton X-100 at pH 7.4]. Immunoprecipitated protein was then eluted from the beads by incubation with FLAG peptide at 50-200 µg/ml in a volume of 20 to 50 µL.

4.5 IR, HU, and dT treatments.

HCT116 cells were seeded in 6-well plates at 4e5-1e6 cells per well. Sixteen hours later, cells were transfected as described in Method 4.2. At 24 hours post-transfection, the medium was changed to medium containing dialyzed FBS, and deoxythymidine (dT) or hydroxyurea (HU) was added to wells at the indicated concentrations. Dialyzed FBS was used because it does not contain dNTPs, and dT and HU cause replication stress through distortion of dNTP pools. Cells were treated for 24 hours before harvest and analysis by western blot. For IR treatment, cells were exposed to gamma radiation from a ¹³⁷Cs source for the indicated times.

Cytotoxicity assays: For p12 cytotoxicity, cells were transfected and treated as described. At 24 hours post-treatment, cells were trypsinized and reseeded at 500 cells per well in 6-well plates in triplicate. The plates were incubated for 8 days to allow for colony formation. The plates were then washed with PBS and stained with crystal violet [0.2% crystal violet, 2.4% acetic acid, 5% methanol].
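The colony formation readout is later normalized to untreated cells and compared between constructs with a Student's t-test. A minimal sketch of that calculation follows. The colony counts are hypothetical placeholders, not data from this study, and the function names are illustrative only. The critical value 2.776 is the standard two-tailed t threshold for p = 0.05 with 4 degrees of freedom (two groups of three wells).

```python
from math import sqrt
from statistics import mean, stdev


def percent_survival(treated, untreated):
    """Colony counts of treated wells as a percentage of the untreated mean."""
    base = mean(untreated)
    return [100.0 * count / base for count in treated]


def sem(values):
    """Standard error of the mean."""
    return stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic using a pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical triplicate colony counts (NOT data from this study).
vector_untreated, vector_ir = [200, 210, 190], [150, 160, 140]
p12_untreated, p12_ir = [200, 205, 195], [100, 110, 90]

vec = percent_survival(vector_ir, vector_untreated)
p12 = percent_survival(p12_ir, p12_untreated)
t_stat = student_t(vec, p12)

T_CRIT_DF4 = 2.776  # two-tailed, p = 0.05, 4 degrees of freedom
print(f"vector: {mean(vec):.1f}% +/- {sem(vec):.1f} (SEM)")
print(f"p12:    {mean(p12):.1f}% +/- {sem(p12):.1f} (SEM)")
print(f"t = {t_stat:.2f}, significant at p < 0.05: {abs(t_stat) > T_CRIT_DF4}")
```

Normalizing each construct to its own untreated wells keeps differences in plating efficiency between constructs from being mistaken for IR sensitivity.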

4.6 Flow Cytometry.

Cells were transfected and treated with HU and dT as in Method 4.5. After treatment, cells were harvested using a cell scraper and each sample was washed in 5 ml of PBS. Cells were then resuspended in 500 µL PBS and thoroughly mixed with a 200 µL pipet tip to remove clumps. Next, 500 µL of cells in PBS was added to 4.5 ml of 70% ethanol that had been pre-chilled on ice in 15 ml conical tubes. Cells were stained with propidium iodide as follows: roughly 2e6 cells were collected for each sample by trypsinization and washed in 1 ml PBS. Cells were then resuspended to a single-cell suspension in 500 µL PBS. These suspensions were added to 4.5 ml of 70% ethanol pre-chilled on ice, and incubated on ice for two hours or more. Next, the cells were pelleted at 200 x g for 5 minutes. Cell pellets were washed in 5 ml of PBS for one minute and re-spun. Cells were then resuspended in 1 ml PI staining solution [0.1% Triton X-100, 0.2 mg/ml RNase A, and 20 µg/ml propidium iodide in PBS, pH 7.4] for 30 minutes or more before analysis. Samples were analyzed at the Cell Cycle Analysis Core at Tulane University on a BD LSRII Analyzer. Data were analyzed using FACSDiva software.

4.7 Identification of p12 binding partners and modified forms by mass spectrometry.

Expression and Immunoprecipitation: FLAG-p12 and empty pIRES2-eGFP were expressed and immunoprecipitated as described in Methods 4.2, 4.3 and 4.4. Briefly, lysates were combined from transiently transfected HCT116 cells from ten 10 cm dishes. Each plate was lysed in 1 ml of lysis buffer, and 5 ml of combined lysate for each construct was incubated with 80 µL bed volume of beads. After incubation overnight at 4°C and washing, sample was eluted from the beads with 100 µL FLAG peptide at 200 µg/ml for 2 hours at 4°C on a nutator.

The eluted samples were then run on a 10% Bis-Tris gel and stained with Coomassie stain. The stained gel was visualized on a light box to identify bands unique to the FLAG-p12 sample compared to the empty vector control. Bands unique to the FLAG-p12 sample were cut from the gel and analyzed by MS2 at the LSU-HSC Mass Spectrometry Core Facility. Protein was extracted from the gel and digested with trypsin overnight. Samples were then analyzed with a Thermo Finnigan Deca XP Max Nanospray LC/MS-MS Mass Spectrometer. The Mascot server was used to identify proteins from peptide peaks.

RESULTS

4.1 Phosphorylation of p12

The first goal in this study was to verify phosphorylation of p12. p12 had previously been identified in a screen for targets of the ATM and ATR kinases after treatment with IR. In that initial study, 293T cells were exposed to 10 grays of IR, and peptides were IP’d with anti-S/TQ antibodies after trypsin digest. There, p12 peptides were picked up by an anti-Rad17 S645 antibody and by an antibody raised to recognize the sequence SGpSQP¹⁰⁴. In our experiment, HCT116 cells transfected with p12 constructs were treated with 10 grays of IR. As shown in Figure 4.1A, phosphorylation of p12 was seen in FLAG-p12, FLAG-p12 1-92, and FLAG-p12-S25A lysates treated with phosphatase inhibitors, but not without. This is shown by a slight mobility shift; however, full separation of the bands was not seen. p12 migrates at 20 kDa, likely due to its high content of acidic residues, which makes it difficult to resolve a shift due to phosphorylation. Unexpectedly, the slight mobility shift seen in irradiated cells was also seen in FLAG-p12 expressing cells that had not been treated with IR. A few phenomena could explain these results. First, a secondary phosphorylation site may accept the modification in the absence of S25²⁴². Second, different residues in p12 may be the direct targets of other kinases. p12 contains 1 tyrosine (Y), 4 S, and 6 T residues. The S25A construct is modified, further raising the possibility that residues other than S25 are phosphorylated on p12. Interestingly, when attempting to observe p12 phosphorylation, other unknown modifications were observed with a molecular weight shift that was larger than expected for a phosphorylation event. This will be discussed in the next section. If p12 is an ATM phosphorylation target involved in DSB signaling, it is hypothesized to sensitize cells to IR. To address this, FLAG-p12 was cloned into pIRES2-eGFP, and an S25A mutant, which cannot be phosphorylated at that position, was made using site-directed mutagenesis. These constructs were overexpressed in HCT116 cells, which were then treated with increasing doses of IR to a maximum of 10 Gy. The results of the cytotoxicity assay are shown in Figure 4.1B. Each treatment group was normalized to untreated cells. The percentage of colony formation relative to untreated cells was significantly different between FLAG-p12 and pIRES2-eGFP empty vector after irradiation (p < 0.05 in each case, Student’s t-test). However, for FLAG-p12-S25A, there was also slight sensitization during IR treatment. This suggests that if p12 affects cell survival after IR treatment, it is through a mechanism not involving S25 phosphorylation, or that other phosphorylation sites can substitute for S25 function.

4.2 Verification of p12 modifications by mass spectrometry

The Western blot for FLAG-p12 expression showed curious higher molecular weight species that cross-reacted with anti-FLAG antibody only when FLAG-p12 was expressed (Figure 4.2A, compare input lanes for empty vector and FLAG-p12). These mobility shifts (+17, +22, +27, and +32 kDa) cannot be explained by phosphorylation alone (see Figure 4.1A). They include bands around 40, 45, 50, and 55 kDa that are not seen in lysates from an empty vector control (Figure 4.2A, C, D). These modifications were also seen for FLAG-p12 expressed in 293 and HeLa cells (data not shown). To further determine whether these bands were p12 specific, lysates from HCT116 cells expressing FLAG-p12 were IP’d using anti-FLAG beads. As shown in Figure 4.1A, these modified forms IP’d with anti-FLAG beads and were not seen in empty vector transfected cells. These bands were analyzed by LC-MS/MS and identified as p12. However, a majority of peaks in the LC-MS/MS spectra could not be assigned, which suggests that p12 could be heavily modified. I have an ongoing collaboration with the Zhao lab at the University of Texas to use the PTMap algorithm for further analysis of p12 modifications²⁵⁰.

4.3 p12 is modified within the last 25 residues.

The modifications showed a shift comparable in size to that expected for SUMOylated or ubiquitylated proteins²⁵¹,²⁵², so it was first suspected that K residues are modified. p12 contains K residues at 47, 51, 80, 90, and 92. Truncated forms of FLAG-p12 were cloned into pIRES2-eGFP in order to identify which Ks are modified. First, three constructs were made that encompassed either the N-terminus plus the central HFM domain, the central HFM domain plus the C-terminus, or the central HFM domain alone, lacking N- and C-terminal tails (see Figure 4.3). The N-terminus/HFM constructs did not express in HCT116 cells, although the C-terminal construct expressed well. Therefore it could not be initially determined whether a modification resided in the N-terminus of the protein, and it is likely that the N-terminal portion is needed for proper folding and stability of p12 (see Table 4.2). The HFM/C-terminus construct, encompassing residues 52-117, was modified, as shown in Figure 4.2B. To further narrow down the modification site, p12C78 and p12C81 were constructed, and as shown in Figure 4.2B both constructs are modified. To confirm this modification site, Ks 80, 90, and 92 were mutated to make the “K3R” construct, but perplexingly, this construct was also modified. In addition, a construct with all five Ks mutated to arginines (Rs) was modified (data not shown). This suggests that residue(s) other than K are modified on p12. In addition, the p12 1-92 construct, containing all 5 Ks, lacked the modification (Figure 4.1A). One explanation for these results is that the modification site resides in the last 25 residues (TLQRRDLDNAIEAVDEFAFLEGTLD). Another explanation is that these residues are responsible for recruiting a SUMO or ubiquitin ligase that modifies other portions of p12. In at least one study, only mutation of all possible acceptor sites abolished ubiquitylation²⁴⁵. Single T93A and T115A mutations did not abolish modification (data not shown), but a construct lacking all possible sites of ubiquitylation was not made.

To identify ubiquitylation of p12 by Western blot, FLAG-p12 was expressed in HCT116 cells and immunoprecipitated with anti-FLAG beads. The eluates were blotted with the FK2 antibody, which recognizes mono- and poly-ubiquitin moieties. Ubiquitin is an 8.5 kDa protein, and modified p12 is found at approximate sizes of 40, 45, 50, and 55 kDa (Figure 4.2D). If modified by ubiquitin, the 45 kDa form of p12 may represent a di- or tri-ubiquitin chain. An FK2 blot of empty vector or FLAG-p12 transfected cells showed that two bands were specific to the FLAG-p12 IP elution (Figure 4.2C). These bands had approximate sizes of 50 and 55 kDa. These results support the idea that p12 may be poly-ubiquitylated in short chains.
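The reasoning about chain length can be made concrete with a rough calculation: divide each band's shift above unmodified p12 by the size of ubiquitin. The sketch below uses only the figures quoted in the text (8.5 kDa ubiquitin, p12 migrating at about 20 kDa, bands at 40 to 55 kDa). It assumes that gel migration scales linearly with added mass, which is only approximately true for branched, ubiquitylated proteins, so the results are estimates at best. The function name is illustrative.

```python
UBIQUITIN_KDA = 8.5       # ubiquitin size stated in the text
P12_APPARENT_KDA = 20.0   # apparent size of unmodified p12 stated in the text


def ubiquitin_equivalents(band_kda, base_kda=P12_APPARENT_KDA, ub_kda=UBIQUITIN_KDA):
    """Approximate number of ubiquitin moieties accounting for a band shift.

    Assumption: SDS-PAGE migration is linear in added mass.
    """
    return (band_kda - base_kda) / ub_kda


for band in (40, 45, 50, 55):
    n = ubiquitin_equivalents(band)
    print(f"{band} kDa band: +{band - P12_APPARENT_KDA:.0f} kDa, about {n:.1f} ubiquitin units")
```

On this estimate the 45 kDa band corresponds to roughly three ubiquitin units and the 55 kDa band to roughly four, which is in line with short chains.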

4.4 Replication stress affects p12 ubiquitylation

In an effort to identify p12 phosphorylation during replication stress or in response to double strand breaks, p12 transfected HCT116 cells were treated with hydroxyurea (HU), deoxythymidine (dT) and ionizing radiation (IR). Treatment with HU causes replication fork stalling because it leads to depletion of dNTP pools²⁵³. dT inhibits synthesis of dC to create an imbalance in dNTP pools, which also leads to replication fork stalling²⁵⁴. Cells were collected for analysis by anti-FLAG Western blot and by flow cytometry with propidium iodide staining. Although HU treatment at 0.5 and 1 mM successfully blocked cells in S-phase (increase from 8% to 22% and 17%, respectively), no change in p12 modification was seen (Figure 4.4). dT treatment, however, induced a change in the relative amounts of the 45 kDa and 55 kDa forms of p12 compared to untreated cells. In addition, at 2 mM dT treatment a p12 specific band appears below the form that migrates at 20 kDa. This lower molecular weight species also faintly appears after treatment with 10 mM dT, 0.5 mM HU, and 1 mM HU.

4.5 Determination of p12 binding partners

In order to help determine the function of p12, we set out to identify its binding partners using comparative immunoprecipitation and protein identification by peptide sequencing. FLAG-p12 was expressed in HCT116 cells and immunoprecipitated using anti-FLAG resin. Cells transfected with empty vector were used as a control. Bands judged to be specific to immunoprecipitation from cells expressing FLAG-p12 were cut out and peptide sequenced (Figure 4.5). Notably, the largest band (band #1) contained the p261 catalytic subunit, demonstrating that some of the overexpressed p12 is incorporated into the holoenzyme. This also suggests that other binding partners found in the screen are partners of p12 with functional consequence. The proteins with the top MASCOT search scores are listed in Table 4.3. Notable interaction partners of p12 included TOP1, HSP90, nucleolin, and DNA-dependent protein kinase catalytic subunit (PRKDC). Not surprisingly, these proteins are all involved in maintaining genomic stability²⁵⁵⁻²⁶⁰. Other weakly identified genome repair factors include PRKDC (DNA-PK catalytic subunit), XRCC5 (KU80), and XRCC6 (KU70), which function in recognition and repair of DSBs.

Another group of binding partners identified for p12 includes a family of heat shock proteins and heterogeneous nuclear ribonucleoproteins. The latter family of proteins binds newly transcribed mRNAs to mark them as incompletely processed, and may have been IP’d non-specifically due to their ubiquitous nature. However, nucleolin and HSP90 were the binding partners with the highest MASCOT scores. Both of these proteins function in stabilization of certain mRNAs during mitosis²⁶¹. The role of p12 in conjunction with mRNA stability is currently unclear, as no prior evidence suggests it has a function here. Nucleolin and HSP90 also respond to heat shock stress and ionizing radiation²⁵⁵. During this response, nucleolin binds RPA and may inhibit initiation of DNA replication, and may therefore bind to p12 or Pol ε to accomplish this²⁵⁵,²⁶². In support of this, our screen identified the replication licensing factors MCM3, MCM5, and MCM7 as p12 binding partners. These p12 binding partners have not yet been verified by other means. The possible functions of p12 in IR and heat shock response, and in replication initiation, will be important to investigate. These functions may be regulated by the modifications of p12 described above.

DISCUSSION

p12 is a protein about which very little is known. In fact, there are no papers published on human p12 function. This study only began to scratch the surface to uncover roles of p12, based on analysis of p12 post-translational modifications and binding partners. The results from this study suggest possible roles for p12 in response to replication stress and ionizing radiation.

Phosphorylation of p12

Phosphorylation of p12 was first identified by Matsuoka et al. using a proteomics approach¹⁰⁴. There have been a few other MS-based searches for targets of ATM/ATR²³⁸,²⁶³. Both searches relied on purifying phospho-peptides using TiO2 and ERLIC (Electrostatic Repulsion-Hydrophilic Interaction Chromatography) resins. These studies did not identify p12 phosphorylation. However, TiO2 resin preferentially recovers singly phosphorylated peptides, possibly because peptides with multiple negative charges are difficult to elute from the resin. The p12 peptide fragment containing S25 would have a net charge of -4, and would thus be difficult to identify using this method. The study that discovered p12 phosphorylation used an array of 60 phospho-specific antibodies to IP phospho-peptides after SILAC labeling and treatment of cells with IR. In that study, p12 was IP’d using an antibody that detects phospho-Rad17 (S645) and an antibody raised against the sequence SGpSQP¹⁰⁴. The phosphorylation of p12 was not fully verified in the present study; however, an enrichment of p12 modification consistent with a phosphorylation size shift was observed with the use of phosphatase inhibitors. The data are inconclusive as to whether S25 phosphorylation is enriched after IR treatment, as phospho-p12-S25A was seen by Western blot, even in untreated cells. Multiple sites of p12 may be phosphorylated, and these may mask a change in phospho-S25. More detailed analysis is needed to verify S25 phosphorylation as a result of IR treatment. Interestingly, MS analysis showed that multiple peptides of p12 may be phosphorylated, including on residues S9, T11, and S25; however, due to the large number of unassigned peaks, the results were difficult to interpret. Many p12 peptides were not identified in the LC-MS samples, which means they may be heavily modified with multiple phosphorylations and difficult to assign by conventional methods. This raises the questions of which kinases may be responsible for phosphorylation of different p12 residues, and how these modifications contribute to p12 function.

p12 may be a direct phosphorylation target of ATM, and this should be determined in the future. Alternatively, p12 may be phosphorylated by another kinase during ATM activation. In one study, 40% of phosphorylations after DSBs induced by IR were ATM-independent²⁶³. One possibility is DNA-PK, which was identified as a putative binding partner of p12 in this study. This kinase also phosphorylates proteins at an S/TQ consensus motif²⁶⁴. Another possibility is CK2 (casein kinase 2). This kinase is responsible for continuously phosphorylating MDC1, which allows for spreading of phospho-H2AX through tethering of ATM²³⁸. The consensus sequence of CK2 is S/TXXD/E²⁶⁵, and T11 of p12 fits this motif (sequence TPRE). CK2 is a constitutively active kinase, and its substrates Smc1 and XRCC1 are constitutively phosphorylated²⁶⁶. In the case of Smc1, the constitutively phosphorylated site allows for phosphorylation of a second adjacent site during DDR; abrogation of the CK2 phosphorylation site leads to a checkpoint defect due to lack of phosphorylation at the second site²⁶⁶. A similar mechanism may occur with p12, in which constitutive phosphorylation of p12-T11 by CK2 allows for proper phosphorylation of S25 during ATM activation. In addition, the p12 subunit of Pol δ is a verified phosphorylation target of CK2⁷⁷.

Another kinase that phosphorylates a similar consensus site and that is also involved in checkpoint signaling is protein kinase C (PKC). This kinase is necessary for phosphorylation of Rad53 (Chk2 in humans) in yeast, and is a phospho-target of Tel1 (ATM in humans)²⁶⁷. In humans, one PKC isoform, PKC-gamma, is also necessary for Chk2 activation²⁶⁷. The consensus site for PKC is S/T-X-R/K²⁶⁸,²⁶⁹, which is also found in p12 at T11. Another way PKC may be related to p12 function is through control of the cell cycle. It is currently unknown whether p12 is involved here; however, the p12 subunit of Pol δ (POLD4) is phosphorylated and degraded during licensing of DNA replication²⁷⁰,²⁷¹. PKC activity also helps Pol η accumulate at UV stalled forks to participate in lesion bypass²⁷².
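The consensus motifs discussed above (S/T-Q for ATM, ATR and DNA-PK; S/T-X-X-D/E for CK2; S/T-X-R/K for PKC) lend themselves to a simple pattern scan. The sketch below is illustrative only. It scans three fragments quoted in this chapter rather than the full p12 sequence: the T11 context "TPRE", the C-terminal 25 residues (93-117), and "SGSQP", which is the unphosphorylated form of the antibody epitope SGpSQP and is used here only as a stand-in for an S/T-Q site, so its numbering is arbitrary. The simplified patterns and the function name are assumptions, not a validated kinase prediction method.

```python
import re

# Simplified consensus motifs as described in the text. The lookahead keeps
# each match one residue wide so that overlapping sites are all reported.
MOTIFS = {
    "ATM/ATR/DNA-PK (S/T-Q)": re.compile(r"[ST](?=Q)"),
    "CK2 (S/T-X-X-D/E)": re.compile(r"[ST](?=..[DE])"),
    "PKC (S/T-X-R/K)": re.compile(r"[ST](?=.[RK])"),
}


def scan(seq, first_residue=1):
    """Return (motif name, residue label) for each candidate site in seq."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, f"{match.group()}{first_residue + match.start()}"))
    return hits


fragments = [
    ("T11 context", "TPRE", 11),
    ("C-terminal 25 residues", "TLQRRDLDNAIEAVDEFAFLEGTLD", 93),
    ("antibody epitope (arbitrary numbering)", "SGSQP", 1),
]
for label, seq, start in fragments:
    print(label, scan(seq, start))
```

In this sketch T11 matches both the CK2 and PKC patterns, as stated in the text, while T93 and T115 in the C-terminal tail match none of the three patterns. The absence of a hit reflects only these simplified patterns and does not rule out modification at those residues.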

p12 may also be phosphorylated by ATR. ATR and ATM share the S/TQ consensus motif, which is found at p12-S25. Activation of p12 by ATR or Chk1 may serve the purpose of replisome stabilization at stalled forks, a process for which these kinases are necessary²⁷³.

Stabilization of the replisome needs to be tightly regulated, so that forks can be restarted in a timely fashion to avoid fork collapse. However, forks should not restart until damage is fully repaired, as this could lead to DSBs. Checkpoint regulation of mitosis, gene expression, and late origin firing actually contributes little to cell viability, whereas stabilization of the replisome to prevent fork collapse is very important to cell viability²⁷⁴. It is largely thought that fork protection is accomplished by the same proteins that comprise the replication pausing complex (RPC; e.g. Tipin, Tim1 and Claspin)²⁷³. These proteins are necessary for fork progression in normal S-phase, and also participate in checkpoint signaling²⁷³. One of these components, Claspin/Mrc1, binds the catalytic subunit of Pol ε in yeast. Phosphorylation of Mrc1 leads to Mrc1 association with the C-terminus of Pol2, abrogating N-terminal interaction²⁷⁵. This interaction may facilitate tethering of the leading strand polymerase to the MCM helicase to avoid deleterious fork unwinding²⁷³. The phosphorylation of p12 in response to checkpoint activation may serve to disengage p12 from the holoenzyme, allowing binding of Claspin to the catalytic subunit at stalled forks. Alternatively, the phosphorylation may enhance tethering of p12 at the fork, or block p12 from ubiquitin-mediated degradation. Degradation of replisome components has also been shown to occur in yeast in the absence of the fork protection component Swi1 (Timeless in humans)²⁷⁶.

Ubiquitylation of p12

p12 was identified as a ubiquitylated target prior to this study²⁴⁶. Preliminary western blot evidence from this work showed that p12 may be mono-, di-, and poly-ubiquitylated (see Figure 4.2C). However, mutation of all Ks within p12 did not abrogate the modification, even with K-less epitope tags. The consequences of p12 ubiquitylation may be similar to those of p12 phosphorylation: it may regulate Pol ε holoenzyme assembly/disassembly, protein stability, DDR signaling, or replication progression. The results from this work, although very preliminary, show that ubiquitylation dynamics of p12 change during dT-mediated fork stalling. HU treatment, which blocks DNA synthesis earlier in S-phase, did not produce an appreciable change in p12 modification. This suggests that p12 may have different functions in late vs early S-phase and that its association with the Pol ε holoenzyme may be temporally regulated. In early S-phase in budding yeast, Pol ε localizes to sites of replication initiation and is part of the pre-replication complex⁹⁰. In this stage, Dpb2/p59 is present, but the association of Dpb4/p12 was not determined⁹⁰.

Pol ε may occur in two or more distinct complexes, with and without p12, to mediate different functions of Pol ε during the cell cycle. There is recent evidence that this occurs with Pol δ, which has at least two distinct complexes: Pol δ4 and Pol δ3, the latter of which forms after dissociation and ubiquitin-mediated degradation of its p12 subunit²⁷⁷⁻²⁷⁹. Incorporation of Pol ε’s p12 into the holoenzyme during normal S-phase may likewise be temporally regulated by ubiquitylation. This would be an interesting area for future study of p12. Free p12 may be permitted to participate in stalled fork and DDR signaling, or may be degraded after release from the holoenzyme. Our results in Figure 4.1B are consistent with the former mechanism, where an increase in p12 expression sensitizes cells to IR.

As with phosphorylation, p12 association with the holoenzyme may affect the interaction between Pol ε and Claspin during tethering of the replisome at stalled forks. Alternatively, p12 may participate in fork stalling only in late S-phase. Activation of ATR might lead to different DNA repair pathways in early vs late S-phase, with transcription coupled repair occurring in early S-phase, and delay of mitosis by checkpoint signaling in late S-phase²⁸⁰.

The ubiquitin ligases responsible for p12 modification are unknown, and the modified sites are unknown. Multiple E3 ligases are recruited to DSB sites, including RNF20-RNF40, Rad18, BRCA1, RNF8 and RNF168²²⁰,²⁴⁰,²⁸¹. RNF8 is a candidate E3 ligase of p12, as it modifies histones H2A and H2B, and p12 is similar in sequence and predicted structure to H2B²⁸². Mono-ubiquitylation of these histones is necessary for timely recruitment of HR and NHEJ repair factors during DSB repair²²⁰,²⁸¹. Likewise, p12 may be mono-ubiquitylated on one or more sites to participate in DDR signaling and recruitment of proteins to sites of DSBs. p12 is likely regulated by multiple ubiquitin ligases, as is the case for POLD4²⁷⁷. As shown in Table 4.3, a few putative p12 binding partners identified in this study are involved in ubiquitin metabolism, including the TRIM25 E3 ubiquitin ligase (also called EFP) and USP9X, a protein de-ubiquitylase. TRIM25 is also an ISG15 E3 ligase that modifies PCNA with the ubiquitin-like molecule ISG15 to promote termination of lesion bypass²⁸³.

Clearly, there is much more work to be done in uncovering the functions of p12 and the PTMs involved in these processes. In addition to phosphorylation and ubiquitylation, other modifications may be involved. Understanding these may aid in understanding how p12 contributes to genome stability.

Table 4.1 Primers used to make p12 constructs

| Primer Name | Primer Sequence (5ʹ-3ʹ) | Construct(s) |
|---|---|---|
| EcoRI-FLAGp12 Fwd | CGGAATTCCACAATGGACTACAAAGACGATGACGACAAGGCGGCGGCGGCGGCGGCAGGA | FLAG-p12 |
| BamHI-p12 Rev | CGGGATCCTCAATCTAAAGTACCTTCCAG | FLAG-p12 |
| p12 S25A Fwd | GCTGGGGAGGCAGCGGCCGCGCAGCCCCAGGCCCCAACG | FLAG-p12 S25A |
| p12 S25A Rev | CGTTGGGGCCTGGGGCTGCGCGGCCGCTGCCTCCCCAGC | FLAG-p12 S25A |
| p12 S25E Fwd | GCTGGGGAGGCAGCGGCCGAACAGCCCCAGGCCCCAACG | FLAG-p12 S25E |
| p12 S25E Rev | CGTTGGGGCCTGGGGCTGTTCGGCCGCTGCCTCCCCAGC | FLAG-p12 S25E |
| BamHI-p12N rev | CGATGGATCCTCAGTCAGGTCTCCAC | FLAG p12-N |
| p12C Fwd | GATCCTCGAGCCACCATGGACTACAAAGACGATGACGACAAGGATCCCGACGTGACG | FLAG-p12C |
| p12C rev | CGATGAATTCTCAGTCAATCTAAAGTACCTTC | FLAG-p12C |
| p12 HFMmin Fwd | GATCGAATTCCCACCATGGACTACAAAGACGATGACGACAAGCTGGCGCGAGTGAAGGCC | FLAG-p12-HFM |
| p12 HFMmin Rev | CGATGGATCCTCAGTCATATTGCATTATCCAAGTC | FLAG-p12-HFM |
| p12C78Fwd | GATCCTCGAGCCACCATGGACTACAAAGACGATGACGACATTGCAAAAGATGCCTAC | FLAG-p12C78 |
| p12C81Fwd | GATCCTCGAGCCACCATGGACTACAAAGACGATGACGACGATGCCTACTGTTGCGCTC | FLAG-p12C81 |
| p12K90-92R Fwd | GTTGCGCTCAGCAGGGACGAAGGCGAACCCTTCAGAGGAGAGAC | FLAG-p12K90R, K92R |
| p12K90-92R Rev | GTCTCTCCTCTGAAGGGTTCGCCTTCGTCCCTGCTGAGCGCAAC | FLAG-p12K90R, K92R |
| p12K80R Fwd | GTTTGTGGAGACCATTGCACGAGATGCCTACTGTTGCGC | FLAG-p12K80R |
| p12K80R Rev | GCGCAACAGTAGGCATCTCGTGCAATGGTCTCCACAAAC | FLAG-p12K80R |
| His-p12 EcoRI Fwd | CGGAATTCCACCATGCACCATCATCACCATCATGCGGCGGCGGCGGCGGCAGGA | His-p12 |
| p12 1-92 BamHI Rev | GCTAGGATCCCTACTTATTTCCTTTTTCCCTGCTG | FLAG-p12 1-92 |
| HAp12EcoRI Fwd | TCGGAATTCCCACCATGTACCCATACGATGTTCCAGATTACGCTGCGGCGGCGGCGGCGGCAGG | HA p12 |
| p12K47-51R Fwd | GCCTCTGGCGCGAGTGAGAGCCTTGGTGAGGGCAGATCCCGACGTGACG | FLAG-p12 K47R, K51R |
| p12K47-51R Rev | CGTCACGTCGGGATCTGCCCTCACCAAGGCTCTCACTCGCGCCAGAGGC | FLAG-p12 K47R, K51R |
| p12 T115A Fwd | GCTTTTCTGGAAGGTGCGTTAGATTGAGGATCCGCC | FLAG-p12 T115A |
| p12 T115A Rev | GGCGGATCCTCAATCTAACGCACCTTCCAGAAAAGC | FLAG-p12 T115A |
| p12 T93A Fwd | GCAGGGAAAAAGGAAAGCCCTTCAGAGGAGAGACTTGG | FLAG-p12 T93A |
| p12 T93A Rev | CCAAGTCTCTCCTCTGAAGGGCTTTCCTTTTTCCCTGC | FLAG-p12 T93A |
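The site-directed mutagenesis primer pairs in Table 4.1 can be sanity-checked computationally. The sketch below checks that the S25A reverse primer is the reverse complement of the forward primer, and translates the S25A and S25E forward primers to show the codon that differs. The primer sequences are taken from Table 4.1 with their wrapped lines joined. The reading frame (frame 0 of the forward primer) is an assumption chosen so that the changed bases fall within a single codon; the helper names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate complete codons of seq starting at the given frame offset."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


# Primer sequences from Table 4.1 (wrapped lines joined).
S25A_FWD = "GCTGGGGAGGCAGCGGCCGCGCAGCCCCAGGCCCCAACG"
S25A_REV = "CGTTGGGGCCTGGGGCTGCGCGGCCGCTGCCTCCCCAGC"
S25E_FWD = "GCTGGGGAGGCAGCGGCCGAACAGCCCCAGGCCCCAACG"

print(reverse_complement(S25A_FWD) == S25A_REV)

# Positions (0-based) where the two forward primers differ.
diff = [i for i, (a, b) in enumerate(zip(S25A_FWD, S25E_FWD)) if a != b]
print(diff)

# Assumed reading frame: frame 0 of the forward primer.
print(translate(S25A_FWD))
print(translate(S25E_FWD))
```

Under the assumed frame, the two forward primers differ only in the seventh codon (GCG versus GAA), giving alanine and glutamate respectively at the position preceding the Q-P residues of the S/TQ site.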

Table 4.2: Summary of p12 construct modification and stability. NT: not tested

| p12 Construct | Amino Acid Residues | Contains modifications M1 and M2? | Stability |
|---|---|---|---|
| FLAG-p12 full length | 1-117 | Y | +++ |
| FLAG-p12N | 1-77 | NT | - |
| FLAG-p12C | 52-117 | Y | +++ |
| FLAG-p12 HFM min | 43-103 | NT | - |
| FLAG-p12 1-92 | 1-92 | Y | +++ |
| FLAG p12 C78 | 78-117 | N | + |
| FLAG p12 C81 | 81-117 | N | + |
| HA-p12 | 1-117 | Y | +++ |
| His-p12 | 1-117 | Y | +++ |
| FLAG-p12 K80R | 1-117 | Y | +++ |
| FLAG-p12 K90,92R | 1-117 | Y | +++ |
| FLAG-p12 K80,90,92R | 1-117 | Y | +++ |
| FLAG-p12 “5K5R” | 1-117 | Y | +++ |
| p12 T115A | 1-117 | Y | +++ |
| p12 T93A | 1-117 | Y | +++ |
| p12-S25A, T93A | 1-117 | Y | +++ |
| p12 S25A | 1-117 | Y | +++ |
| p12-S25E | 1-117 | Y | +++ |

Table 4.3 p12 binding partners identified by MS

| Band | Protein | Description | MW (Da) | Peptides | Mascot score |
|---|---|---|---|---|---|
| 1 | DPOE1 | Human DNA polymerase epsilon catalytic subunit | 261515 | 51 | 1635 |
| 1 | GCN1L | Translational activator GCN1 | 292755 | 25 | 526 |
| 1 | FAS | Fatty acid synthase | 273423 | 20 | 448 |
| 1 | PRKDC | DNA-dependent protein kinase catalytic subunit | 469084 | 5 | 129 |
| 2 | NUCL | Nucleolin | 76613 | 62 | 1727 |
| 2 | EF2 | Elongation factor 2 | 95337 | 7 | 249 |
| 2 | HSP90B | Heat shock protein HSP-90 beta | 83263 | 8 | 243 |
| 2 | HSP90A | Heat shock protein HSP-90 alpha | 84659 | 7 | 231 |
| 2 | IMB1 | Importin subunit beta-1 | 97169 | 3 | 168 |
| 2 | TOP1 | DNA topoisomerase 1 | 90725 | 5 | 142 |
| 2 | ENPL | Endoplasmin | 92468 | 4 | 137 |
| 2 | COPG | Coatomer subunit gamma | 97717 | 3 | 124 |
| 2 | DDX21 | Nucleolar RNA helicase 2 | 87343 | 5 | 102 |
| 2 | MCM3 | DNA replication licensing factor MCM3 | 90980 | 3 | 90 |
| 3 | K6PP | 6-phosphofructokinase type c | 85595 | 35 | 1186 |
| 3 | HSP90B | Heat shock protein HSP 90 beta | 83263 | 30 | 849 |
| 3 | NUCL | Nucleolin | 76613 | 15 | 469 |
| 3 | HNRPM | Heterogeneous nuclear ribonucleoprotein M | 77515 | 12 | 426 |
| 3 | K6PL | 6-phosphofructokinase | 85018 | 9 | 319 |
| 3 | ILF3 | Interleukin enhancer-binding factor 3 | 95337 | 6 | 313 |
| 3 | K6PF | 6-phosphofructokinase | 85182 | 11 | 285 |
| 3 | HSP7C | Heat shock cognate 71 kDa | 70897 | 2 | 160 |
| 3 | XRCC5 | X-ray repair cross-complementing protein 5 | 82704 | 6 | 132 |
| 3 | SSRP1 | FACT complex subunit SSRP1 | 81074 | 3 | 132 |
| 3 | MCM7 | DNA replication licensing factor MCM7 | 81307 | 5 | 118 |
| 4 | HSP7C | Heat shock cognate 71 kDa | 70897 | 84 | 2849 |
| 4 | GRP78 | 78 kDa glucose-regulated protein | 72332 | 45 | 1883 |
| 4 | HSP71 | Heat shock 70 kDa protein 1A/1B | 70051 | 45 | 1660 |
| 4 | GRP75 | Stress-70 protein, mitochondrial | 73680 | 24 | 1033 |
| 4 | HSP76 | Heat shock 70 kDa protein 6 | 71027 | 23 | 815 |
| 4 | CKAP4 | Cytoskeleton-associated protein 4 | 66022 | 17 | 702 |
| 4 | HNRPM | Heterogeneous nuclear ribonucleoprotein M | 77515 | 17 | 612 |
| 4 | NUCL | Nucleolin | 76613 | 9 | 358 |
| 4 | DDX5 | Probable ATP-dependent RNA helicase DDX5 | 69147 | 7 | 194 |
| 4 | XRCC6 | X-ray repair cross-complementing protein 6 | 69842 | 6 | 160 |
| 4 | HNRPR | Heterogeneous nuclear ribonucleoprotein R | 70942 | 3 | 135 |
| 4 | HNRPQ | Heterogeneous nuclear ribonucleoprotein Q | 69602 | 2 | 121 |
| 4 | CH60 | 60 kDa heat shock protein | 61054 | 2 | 119 |
| 4 | PABP1 | Polyadenylate-binding protein 1 | 70670 | 4 | 110 |

Figure 4.1 p12 phosphorylation occurs in the absence of IR treatment and in the S25A mutant. A) FLAG-p12 was transiently expressed in HCT116 cells as follows: 1 x 10⁶ cells were plated per well in a 6-well plate and transfected with 3 µg of DNA for the indicated constructs with Lipofectamine 2000. After 24 hours, cells were irradiated with 10 grays of IR. One hour later, cells were harvested and lysed with or without phosphatase inhibitors. FLAG-p12 was immunoprecipitated from lysates using anti-FLAG beads, and eluted protein samples were resolved on a 4-12% Bis-Tris gel. The anti-FLAG western blot is shown. The upper and lower panels show p12 specific bands at two different sizes. B) Cells were transfected as above and treated with various doses of IR as shown. After treatment, cells were trypsinized and counted. 300 cells were plated per well in triplicate for each sample in 6-well plates. Cells were incubated for 10 days at 37°C to measure colony formation. The colony counts in each sample were normalized to the untreated sample to give relative percent colony formation. Statistical significance was determined using a Student’s t-test (* p<0.05; error bars: SEM; N=3).

Figure 4.2 p12 has multiple ubiquitylated forms. A) Flag p12 and pIRES2-eGFP vectors were transfected in HCT116 cells using Lipofectamine 2000 and 3 µg of each construct, then cells were harvested after 24 hours expression. Cells were lysed in lysis buffer for 10 minutes at 4°C [50 mM Tris, 150 mM NaCl, 1% Triton-X 100, pH 7.4 ] and supernatants were clarified by spinning for 10 minutes at 13,000 x g at 4°C. The FLAG-p12 construct was IP’d using anti- FLAG beads, and an IP of empty vector transfected cells was included as a negative control. The beads were washed with wash buffer [50 mM Tris, 150 mM NaCl, and 0.05% Triton-x 100 at pH 7.4], three times with 1 ml buffer per wash. Then, proteins were eluted from anti-FLAG beads using FLAG peptide at 150 µg/ml, resolved on a 10% Bis-Tris gel, and blotted with mouse anti- FLAG. B) Flag-p12 truncation constructs are depicted in Figure 4.3. The transfections and immunoprecipitations were conducted as in part A. C) FLAG-p12 and empty vector were transfected and immunoprecipitated as in A, and western blotting was performed with the Fk2 antibody. D) FLAG-p12 S25A was constructed from FLAGp12 using site directed mutagenesis. Each construct was transfected and immunoprecipitated as described in A.

Figure 4.3 p12 truncation and mutation constructs. The p12 constructs used in this study are depicted here. Each truncated construct is drawn to scale with respect to the full-length construct. The relative positions of lysine residues are shown (residues 47, 51, 80, 90, and 92). In the K3R and K5R constructs, the lysines mutated to arginines are shown as R. HFM stands for histone fold motif.

Figure 4.4 Ubiquitin dynamics of p12 may be important for response to stalled forks after dT treatment. A) HCT116 cells were seeded in 6-well plates at 1 × 10⁶ cells per well. Sixteen hours later, cells were transfected with 3 µg of DNA using Lipofectamine 2000. Twenty-four hours post-transfection, the medium was replaced with medium containing dialyzed FBS, and deoxythymidine (dT) or hydroxyurea (HU) was added to the wells at the specified concentrations. Cells were treated for 24 hours before harvest and analysis by western blot. B) HCT116 cells were transfected as in A and exposed to varying doses of IR 24 hours post-transfection. Cells were harvested and analyzed by western blot 24 hours later.

Figure 4.5 Identification of p12 binding partners. HCT116 cells were seeded in 10 cm dishes and transfected with empty pIRES2-eGFP or FLAG-p12. Lysates were prepared from 5 combined plates for each construct, then incubated overnight with pre-washed anti-FLAG beads. After washing and elution with FLAG peptide, the immunoprecipitated proteins were resolved on a 10% Bis-Tris gel and stained with Coomassie SimplySafe stain. The bands that were excised and sequenced by MS are indicated to the right of the gel and numbered 1-4, and the positions of the molecular weight markers are shown. Results of MS sequencing are listed in Table 4.3.

REFERENCES

1. Loeb, L. A. & Monnat, R. J. DNA polymerases and human disease. Nat. Rev. Genet. 9,

594–604 (2008).

2. Lange, S. S., Takata, K. & Wood, R. D. DNA polymerases and cancer. Nat. Rev. Cancer

11, 96–110 (2011).

3. Yang, W. An overview of Y-Family DNA polymerases and a case study of human DNA

polymerase η. Biochemistry 53, 2793–2803 (2014).

4. Washington, M. T., Carlson, K. D., Freudenthal, B. D. & Pryor, J. M. Variations on a

theme: eukaryotic Y-family DNA polymerases. Biochim. Biophys. Acta 1804, 1113–1123

(2010).

5. Bebenek, K., Pedersen, L. C. & Kunkel, T. A. Structure-function studies of DNA

polymerase λ. Biochemistry 53, 2781–2792 (2014).

6. García-Gómez, S. et al. PrimPol, an archaic primase/polymerase operating in human cells.

Mol. Cell 52, 541–553 (2013).

7. Nakamura, T., Zhao, Y., Yamagata, Y., Hua, Y. & Yang, W. Watching DNA polymerase η

make a phosphodiester bond. Nature 487, 196–201 (2012).

8. Freudenthal, B. D., Beard, W. A., Shock, D. D. & Wilson, S. H. Observing a DNA

polymerase choose right from wrong. Cell 154, 157–168 (2013).

9. Wong, I., Patel, S. S. & Johnson, K. A. An induced-fit kinetic mechanism for DNA

replication fidelity: direct measurement by single-turnover kinetics. Biochemistry

30, 526–537 (1991).

10. Joyce, C. M. & Benkovic, S. J. DNA polymerase fidelity: kinetics, structure, and

checkpoints. Biochemistry 43, 14317–14324 (2004).

11. Arana, M. E. & Kunkel, T. A. Mutator phenotypes due to DNA replication infidelity.

Semin. Cancer Biol. 20, 304–311 (2010).

12. Johnson, S. J. & Beese, L. S. Structures of mismatch replication errors observed in a DNA

polymerase. Cell 116, 803–816 (2004).

13. Shamoo, Y. & Steitz, T. A. Building a replisome from interacting pieces: sliding clamp

complexed to a peptide from DNA polymerase and a polymerase editing complex. Cell 99,

155–166 (1999).

14. Reha-Krantz, L. J. DNA polymerase proofreading: Multiple roles maintain genome

stability. Biochim. Biophys. Acta 1804, 1049–1063 (2010).

15. Negrini, S., Gorgoulis, V. G. & Halazonetis, T. D. Genomic instability--an evolving

hallmark of cancer. Nat. Rev. Mol. Cell Biol. 11, 220–228 (2010).

16. Li, G.-M. Mechanisms and functions of DNA mismatch repair. Cell Res. 18, 85–98 (2008).

17. Iyama, T. & Wilson, D. M., 3rd. DNA repair mechanisms in dividing and non-dividing

cells. DNA Repair 12, 620–636 (2013).

18. Caldecott, K. W. Molecular biology. Ribose--an internal threat to DNA. Science 343, 260–

261 (2014).

19. Friedberg, E. C. DNA damage and repair. Nature 421, 436–440 (2003).

20. Caldecott, K. W. DNA single-strand break repair. Exp. Cell Res. 329, 2–8 (2014).

21. Mimitou, E. P. & Symington, L. S. DNA end resection: many nucleases make light work.

DNA Repair 8, 983–995 (2009).

22. Khanna, K. K. & Jackson, S. P. DNA double-strand breaks: signaling, repair and the cancer

connection. Nat. Genet. 27, 247–254 (2001).

23. Lydeard, J. R., Jain, S., Yamaguchi, M. & Haber, J. E. Break-induced replication and

telomerase-independent telomere maintenance require Pol32. Nature 448, 820–823 (2007).

24. Chen, J.-M., Cooper, D. N., Chuzhanova, N., Férec, C. & Patrinos, G. P. Gene conversion:

mechanisms, evolution and human disease. Nat. Rev. Genet. 8, 762–775 (2007).

25. Kent, T., Chandramouly, G., McDevitt, S. M., Ozdemir, A. Y. & Pomerantz, R. T.

Mechanism of microhomology-mediated end-joining promoted by human DNA polymerase

θ. Nat. Struct. Mol. Biol. (2015). doi:10.1038/nsmb.2961

26. Barnes, D. E. & Lindahl, T. Repair and genetic consequences of endogenous DNA base

damage in mammalian cells. Annu. Rev. Genet. 38, 445–476 (2004).

27. Dieringer, D. & Schlötterer, C. Two distinct modes of microsatellite mutation processes:

evidence from the complete genomic sequences of nine species. Genome Res. 13, 2242–

2251 (2003).

28. Zeman, M. K. & Cimprich, K. A. Causes and consequences of replication stress. Nat. Cell

Biol. 16, 2–9 (2014).

29. Petermann, E. & Helleday, T. Pathways of mammalian replication fork restart. Nat. Rev.

Mol. Cell Biol. 11, 683–687 (2010).

30. Aza, A., Martin, M. J., Juarez, R., Blanco, L. & Terrados, G. DNA expansions generated by

human Polμ on iterative sequences. Nucleic Acids Res. 41, 253–263 (2013).

31. Yamtich, J. & Sweasy, J. B. DNA polymerase family X: function, structure, and cellular

roles. Biochim. Biophys. Acta 1804, 1136–1150 (2010).

32. Moon, A. F. et al. The X family portrait: structural insights into biological functions of X

family polymerases. DNA Repair 6, 1709–1725 (2007).

33. Bebenek, K. & Kunkel, T. A. Analyzing fidelity of DNA polymerases. Methods Enzymol.

262, 217–232 (1995).

34. Kunkel, T. A. The mutational specificity of DNA polymerase-beta during in vitro DNA

synthesis. Production of frameshift, base substitution, and deletion mutations. J. Biol.

Chem. 260, 5787–5796 (1985).

35. Juárez, R., Ruiz, J. F., Nick McElhinny, S. A., Ramsden, D. & Blanco, L. A specific loop in

human DNA polymerase mu allows switching between creative and DNA-instructed

synthesis. Nucleic Acids Res. 34, 4572–4582 (2006).

36. Sharma, S., Helchowski, C. M. & Canman, C. E. The roles of DNA polymerase ζ and the Y

family DNA polymerases in promoting or preventing genome instability. Mutat. Res.

(2012). doi:10.1016/j.mrfmmm.2012.11.002

37. Ghosal, G. & Chen, J. DNA damage tolerance: a double-edged sword guarding the genome.

Transl. Cancer Res. 2, 107–129 (2013).

38. Bianchi, J. et al. PrimPol Bypasses UV Photoproducts during Eukaryotic Chromosomal

DNA Replication. Mol. Cell 52, 566–573 (2013).

39. Mourón, S. et al. Repriming of DNA synthesis at stalled replication forks by human

PrimPol. Nat. Struct. Mol. Biol. 20, 1383–1389 (2013).

40. Wan, L. et al. hPrimpol1/CCDC111 is a human DNA primase-polymerase required for the

maintenance of genome integrity. EMBO Rep. 14, 1104–1112 (2013).

41. Prakash, S., Johnson, R. E. & Prakash, L. Eukaryotic translesion synthesis DNA

polymerases: specificity of structure and function. Annu. Rev. Biochem. 74, 317–353

(2005).

42. Johnson, R. E., Washington, M. T., Prakash, S. & Prakash, L. Fidelity of human DNA

polymerase eta. J. Biol. Chem. 275, 7447–7450 (2000).

43. Johnson, R. E., Washington, M. T., Haracska, L., Prakash, S. & Prakash, L. Eukaryotic

polymerases iota and zeta act sequentially to bypass DNA lesions. Nature 406, 1015–1019

(2000).

44. Ohashi, E., Takeishi, Y., Ueda, S. & Tsurimoto, T. Interaction between Rad9-Hus1-Rad1

and TopBP1 activates ATR-ATRIP and promotes TopBP1 recruitment to sites of UV-

damage. DNA Repair 21, 1–11 (2014).

45. Pavlov, Y. I. & Shcherbakova, P. V. DNA polymerases at the eukaryotic fork-20 years

later. Mutat. Res. 685, 45–53 (2010).

46. Nelson, J. R., Lawrence, C. W. & Hinkle, D. C. Thymine-thymine dimer bypass by yeast

DNA polymerase zeta. Science 272, 1646–1649 (1996).

47. Gan, G. N., Wittschieben, J. P., Wittschieben, B. Ø. & Wood, R. D. DNA polymerase zeta

(pol zeta) in higher eukaryotes. Cell Res. 18, 174–183 (2008).

48. Stone, J. E., Lujan, S. A., Kunkel, T. A. & Kunkel, T. A. DNA polymerase zeta generates

clustered mutations during bypass of endogenous DNA lesions in Saccharomyces

cerevisiae. Environ. Mol. Mutagen. 53, 777–786 (2012).

49. Wittschieben, J. P., Reshmi, S. C., Gollin, S. M. & Wood, R. D. Loss of DNA polymerase

zeta causes chromosomal instability in mammalian cells. Cancer Res. 66, 134–142 (2006).

50. Lange, S. S., Wittschieben, J. P. & Wood, R. D. DNA polymerase zeta is required for

proliferation of normal mammalian cells. Nucleic Acids Res. 40, 4473–4482 (2012).

51. Wood, R. D. & Lange, S. S. Breakthrough for a DNA break-preventer. Proc. Natl. Acad.

Sci. U. S. A. 111, 2864–2865 (2014).

52. Lee, Y.-S., Gregory, M. T. & Yang, W. Human Pol ζ purified with accessory subunits is

active in translesion DNA synthesis and complements Pol η in cisplatin bypass. Proc. Natl.

Acad. Sci. U. S. A. 111, 2954–2959 (2014).

53. Baranovskiy, A. G. et al. DNA polymerase δ and ζ switch by sharing accessory subunits of

DNA polymerase δ. J. Biol. Chem. 287, 17281–17287 (2012).

54. Johnson, R. E., Prakash, L. & Prakash, S. Pol31 and Pol32 subunits of yeast DNA

polymerase δ are also essential subunits of DNA polymerase ζ. Proc. Natl. Acad. Sci. U. S.

A. 109, 12455–12460 (2012).

55. Yoon, J.-H., Roy Choudhury, J., Park, J., Prakash, S. & Prakash, L. A role for DNA

polymerase θ in promoting replication through oxidative DNA lesion, thymine glycol, in

human cells. J. Biol. Chem. 289, 13177–13185 (2014).

56. Yousefzadeh, M. J. & Wood, R. D. DNA polymerase POLQ and cellular defense against

DNA damage. DNA Repair 12, 1–9 (2013).

57. Yousefzadeh, M. J. et al. Mechanism of suppression of chromosomal instability by DNA

polymerase POLQ. PLoS Genet. 10, e1004654 (2014).

58. Fernandez-Vidal, A. et al. A role for DNA polymerase θ in the timing of DNA replication.

Nat. Commun. 5, 4285 (2014).

59. Arana, M. E., Potapova, O., Kunkel, T. A. & Joyce, C. M. Kinetic analysis of the unique

error signature of human DNA polymerase ν. Biochemistry 50, 10126–10135

(2011).

60. Kohzaki, M. et al. DNA polymerases nu and theta are required for efficient

immunoglobulin V gene diversification in chicken. J. Cell Biol. 189, 1117–1127 (2010).

61. McKinney, E. A. & Oliveira, M. T. Replicating animal mitochondrial DNA. Genet. Mol.

Biol. 36, 308–315 (2013).

62. Stumpf, J. D., Saneto, R. P. & Copeland, W. C. Clinical and molecular features of POLG-

related mitochondrial disease. Cold Spring Harb. Perspect. Biol. 5, a011395 (2013).

63. Chan, S. S. L. & Copeland, W. C. DNA polymerase gamma and mitochondrial disease:

understanding the consequence of POLG mutations. Biochim. Biophys. Acta 1787, 312–319

(2009).

64. Kunkel, T. A. Balancing eukaryotic replication asymmetry with replication fidelity. Curr.

Opin. Chem. Biol. 15, 620–626 (2011).

65. Doublié, S. & Zahn, K. E. Structural insights into eukaryotic DNA replication. Front.

Microbiol. 5, 444 (2014).

66. Pavlov, Y. I. et al. Evidence that errors made by DNA polymerase alpha are corrected by

DNA polymerase delta. Curr. Biol. CB 16, 202–207 (2006).

67. Nick McElhinny, S. A., Gordenin, D. A., Stith, C. M., Burgers, P. M. J. & Kunkel, T. A.

Division of labor at the eukaryotic replication fork. Mol. Cell 30, 137–144 (2008).

68. Pursell, Z. F., Isoz, I., Lundström, E.-B., Johansson, E. & Kunkel, T. A. Yeast DNA

polymerase epsilon participates in leading-strand DNA replication. Science 317, 127–130

(2007).

69. Miyabe, I., Kunkel, T. A. & Carr, A. M. The major roles of DNA polymerases epsilon and

delta at the eukaryotic replication fork are evolutionarily conserved. PLoS Genet. 7,

e1002407 (2011).

70. Georgescu, R. E. et al. Mechanism of asymmetric polymerase assembly at the eukaryotic

replication fork. Nat. Struct. Mol. Biol. 21, 664–670 (2014).

71. Dovrat, D., Stodola, J. L., Burgers, P. M. J. & Aharoni, A. Sequential switching of binding

partners on PCNA during in vitro Okazaki fragment maturation. Proc. Natl. Acad. Sci. U. S.

A. 111, 14118–14123 (2014).

72. Muzi-Falconi, M., Giannattasio, M., Foiani, M. & Plevani, P. The DNA polymerase alpha-

primase complex: multiple functions and interactions. ScientificWorldJournal 3, 21–33

(2003).

73. Wellinger, R. J. & Zakian, V. A. Everything you ever wanted to know about

Saccharomyces cerevisiae telomeres: beginning to end. Genetics 191, 1073–1105 (2012).

74. Lue, N. F., Chan, J., Wright, W. E. & Hurwitz, J. The CDC13-STN1-TEN1 complex

stimulates Pol α activity by promoting RNA priming and primase-to-polymerase switch.

Nat. Commun. 5, 5762 (2014).

75. Hile, S. E. & Eckert, K. A. Positive correlation between DNA polymerase alpha-primase

pausing and mutagenesis within polypyrimidine/polypurine microsatellite sequences. J.

Mol. Biol. 335, 745–759 (2004).

76. Prindle, M. J. & Loeb, L. A. DNA polymerase delta in DNA replication and genome

maintenance. Environ. Mol. Mutagen. 53, 666–682 (2012).

77. Tahirov, T. H. Structure and function of eukaryotic DNA polymerase δ. Subcell. Biochem.

62, 217–236 (2012).

78. Swan, M. K., Johnson, R. E., Prakash, L., Prakash, S. & Aggarwal, A. K. Structural basis of

high-fidelity DNA synthesis by yeast DNA polymerase delta. Nat. Struct. Mol. Biol. 16,

979–986 (2009).

79. Stucki, M. et al. Mammalian base excision repair by DNA polymerases delta and epsilon.

Oncogene 17, 835–843 (1998).

80. Shcherbakova, P. V. et al. Unique error signature of the four-subunit yeast DNA

polymerase epsilon. J. Biol. Chem. 278, 43770–43780 (2003).

81. Jain, R. et al. Crystal Structure of Yeast DNA Polymerase ε Catalytic Domain. PloS One 9,

e94835 (2014).

82. Hogg, M. et al. Structural basis for processive DNA synthesis by yeast DNA polymerase ε.

Nat. Struct. Mol. Biol. 21, 49–55 (2014).

83. Ganai, R. A., Bylund, G. O. & Johansson, E. Switching between polymerase and

exonuclease sites in DNA polymerase ε. Nucleic Acids Res. (2014).

doi:10.1093/nar/gku1353

84. Dua, R., Levy, D. L. & Campbell, J. L. Analysis of the essential functions of the C-terminal

protein/protein interaction domain of Saccharomyces cerevisiae pol epsilon and its

unexpected ability to support growth in the absence of the DNA polymerase domain. J.

Biol. Chem. 274, 22283–22288 (1999).

85. Kesti, T., Flick, K., Keränen, S., Syväoja, J. E. & Wittenberg, C. DNA polymerase epsilon

catalytic domains are dispensable for DNA replication, DNA repair, and cell viability. Mol.

Cell 3, 679–685 (1999).

86. Bermudez, V. P., Farina, A., Raghavan, V., Tappin, I. & Hurwitz, J. Studies on human

DNA polymerase epsilon and GINS complex and their role in DNA replication. J. Biol.

Chem. 286, 28963–28977 (2011).

87. Handa, T., Kanke, M., Takahashi, T. S., Nakagawa, T. & Masukata, H. DNA

polymerization-independent functions of DNA polymerase epsilon in assembly and

progression of the replisome in fission yeast. Mol. Biol. Cell 23, 3240–3253 (2012).

88. Sengupta, S., van Deursen, F., de Piccoli, G. & Labib, K. Dpb2 Integrates the Leading-

Strand DNA Polymerase into the Eukaryotic Replisome. Curr. Biol. CB (2013).

doi:10.1016/j.cub.2013.02.011

89. Isoz, I., Persson, U., Volkov, K. & Johansson, E. The C-terminus of Dpb2 is required for

interaction with Pol2 and for cell viability. Nucleic Acids Res. 40, 11545–11553 (2012).

90. Muramatsu, S., Hirai, K., Tak, Y.-S., Kamimura, Y. & Araki, H. CDK-dependent complex

formation between replication proteins Dpb11, Sld2, Pol ε, and GINS in budding

yeast. Genes Dev. 24, 602–612 (2010).

91. Emanuele, M. J. et al. Global identification of modular cullin-RING ligase substrates. Cell

147, 459–474 (2011).

92. Kesti, T., McDonald, W. H., Yates, J. R. & Wittenberg, C. Cell cycle-dependent

phosphorylation of the DNA polymerase epsilon subunit, Dpb2, by the Cdc28 cyclin-

dependent protein kinase. J. Biol. Chem. 279, 14245–14255 (2004).

93. Ogi, T. et al. Three DNA polymerases, recruited by different mechanisms, carry out NER

repair synthesis in human cells. Mol. Cell 37, 714–727 (2010).

94. Parlanti, E., Locatelli, G., Maga, G. & Dogliotti, E. Human base excision repair complex is

physically associated to DNA replication and cell cycle regulatory proteins. Nucleic Acids

Res. 35, 1569–1577 (2007).

95. Hicks, W. M., Kim, M. & Haber, J. E. Increased mutagenesis and unique mutation signature

associated with mitotic gene conversion. Science 329, 82–85 (2010).

96. Collins, N. et al. An ACF1-ISWI chromatin-remodeling complex is required for DNA

replication through heterochromatin. Nat. Genet. 32, 627–632 (2002).

97. Iida, T. & Araki, H. Noncompetitive counteractions of DNA polymerase epsilon and

ISW2/yCHRAC for epigenetic inheritance of telomere position effect in Saccharomyces

cerevisiae. Mol. Cell. Biol. 24, 217–227 (2004).

98. Smith, J. S., Caputo, E. & Boeke, J. D. A genetic screen for ribosomal DNA silencing

defects identifies multiple DNA replication and chromatin-modulating factors. Mol. Cell.

Biol. 19, 3184–3197 (1999).

99. Henninger, E. E. & Pursell, Z. F. DNA polymerase ε and its roles in genome stability.

IUBMB Life 66, 339–351 (2014).

100. Aksenova, A. et al. Mismatch repair-independent increase in spontaneous mutagenesis in

yeast lacking non-essential subunits of DNA polymerase ε. PLoS Genet. 6, e1001209

(2010).

101. Tsubota, T. et al. Double-stranded DNA binding, an unusual property of DNA polymerase

epsilon, promotes epigenetic silencing in Saccharomyces cerevisiae. J. Biol. Chem. 281,

32898–32908 (2006).

102. Li, Y., Pursell, Z. F. & Linn, S. Identification and cloning of two histone fold motif-

containing subunits of HeLa DNA polymerase epsilon. J. Biol. Chem. 275, 23247–23252

(2000).

103. Golebiowski, F. et al. System-wide changes to SUMO modifications in response to heat

shock. Sci. Signal. 2, ra24 (2009).

104. Matsuoka, S. et al. ATM and ATR substrate analysis reveals extensive protein networks

responsive to DNA damage. Science 316, 1160–1166 (2007).

105. Poot, R. A. et al. HuCHRAC, a human ISWI chromatin remodelling complex contains

hACF1 and two novel histone-fold proteins. EMBO J. 19, 3377–3387 (2000).

106. Sánchez-Molina, S. et al. Role for hACF1 in the G2/M damage checkpoint. Nucleic Acids

Res. 39, 8445–8456 (2011).

107. Aksenova, A. Y. et al. Genome rearrangements caused by interstitial telomeric sequences in

yeast. Proc. Natl. Acad. Sci. U. S. A. 110, 19866–19871 (2013).

108. Liu, G., Chen, X. & Leffak, M. Oligodeoxynucleotide binding to (CTG) · (CAG)

microsatellite repeats inhibits replication fork stalling, hairpin formation, and genome

instability. Mol. Cell. Biol. 33, 571–581 (2013).

109. Franchitto, A. & Pichierri, P. Understanding the molecular basis of common fragile sites

instability: role of the proteins involved in the recovery of stalled replication forks. Cell

Cycle Georget. Tex 10, 4039–4046 (2011).

110. Benguría, A., Hernández, P., Krimer, D. B. & Schvartzman, J. B. Sir2p suppresses

recombination of replication forks stalled at the replication fork barrier of ribosomal DNA

in Saccharomyces cerevisiae. Nucleic Acids Res. 31, 893–898 (2003).

111. Caldecott, K. W. Mammalian single-strand break repair: mechanisms and links with

chromatin. DNA Repair 6, 443–453 (2007).

112. Vilenchik, M. M. & Knudson, A. G. Endogenous DNA double-strand breaks: production,

fidelity of repair, and induction of cancer. Proc. Natl. Acad. Sci. U. S. A. 100, 12871–12876

(2003).

113. Clausen, A. R., Zhang, S., Burgers, P. M., Lee, M. Y. & Kunkel, T. A. Ribonucleotide

incorporation, proofreading and bypass by human DNA polymerase δ. DNA Repair 12,

121–127 (2013).

114. Göksenin, A. Y. et al. Human DNA Polymerase ε Is Able to Efficiently Extend

from Multiple Consecutive Ribonucleotides. J. Biol. Chem. 287, 42675–42684 (2012).

115. McCulloch, S. D. & Kunkel, T. A. The fidelity of DNA synthesis by eukaryotic replicative

and translesion synthesis polymerases. Cell Res. 18, 148–161 (2008).

116. Lujan, S. A. et al. Mismatch repair balances leading and lagging strand DNA replication

fidelity. PLoS Genet. 8, e1003016 (2012).

117. Loeb, L. A. Human cancers express mutator phenotypes: origin, consequences and

targeting. Nat. Rev. Cancer 11, 450–457 (2011).

118. Fearon, E. R. Molecular genetics of colorectal cancer. Annu. Rev. Pathol. 6, 479–507

(2011).

119. Donehower, L. A. et al. MLH1-silenced and non-silenced subgroups of hypermutated

colorectal carcinomas have distinct mutational landscapes. J. Pathol. 229, 99–110 (2013).

120. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nat.

Genet. 45, 1127–1133 (2013).

121. Shinbrot, E. et al. Exonuclease mutations in DNA polymerase epsilon reveal replication

strand specific mutation patterns and human origins of replication. Genome Res. 24, 1740–

1750 (2014).

122. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487,

330–337 (2012).

123. Zhong, X., Pedersen, L. C. & Kunkel, T. A. Characterization of a replicative DNA

polymerase mutant with reduced fidelity and increased translesion synthesis capacity.

Nucleic Acids Res. 36, 3892–3904 (2008).

124. Albertson, T. M. et al. DNA polymerase epsilon and delta proofreading suppress discrete

mutator and cancer phenotypes in mice. Proc. Natl. Acad. Sci. U. S. A. 106, 17101–17104

(2009).

125. Yoshida, R. et al. Concurrent genetic alterations in DNA polymerase proofreading and

mismatch repair in human colorectal cancer. Eur. J. Hum. Genet. EJHG 19, 320–325

(2011).

126. Zou, Y. et al. Frequent POLE1 p.S297F mutation in Chinese patients with ovarian

endometrioid carcinoma. Mutat. Res. Fundam. Mol. Mech. Mutagen. 761, 49–52 (2014).

127. Hammerman, P. S. et al. Comprehensive genomic characterization of squamous cell lung

cancers. Nature 489, 519–525 (2012).

128. Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and

POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet. (2012).

doi:10.1038/ng.2503

129. Seshagiri, S. et al. Recurrent R-spondin fusions in colon cancer. Nature 488, 660–664

(2012).

130. Church, D. N. et al. DNA polymerase ε and δ exonuclease domain mutations in

endometrial cancer. Hum. Mol. Genet. (2013). doi:10.1093/hmg/ddt131

131. Briggs, S. & Tomlinson, I. Germline and somatic polymerase ε and δ mutations define a

new class of hypermutated colorectal and endometrial cancers. J. Pathol. (2013).

doi:10.1002/path.4185

132. Kandoth, C. et al. Integrated genomic characterization of endometrial carcinoma. Nature

497, 67–73 (2013).

133. Kane, D. P. & Shcherbakova, P. V. A common cancer-associated DNA polymerase

ε mutation causes an exceptionally strong mutator phenotype, indicating fidelity

defects distinct from loss of proofreading. Cancer Res. (2014).

doi:10.1158/0008-5472.CAN-13-2892

134. Valle, L. et al. New insights into POLE and POLD1 germline mutations in familial

colorectal cancer and polyposis. Hum. Mol. Genet. (2014). doi:10.1093/hmg/ddu058

135. Kinzler, K. W. & Vogelstein, B. Lessons from hereditary colorectal cancer. Cell 87, 159–

170 (1996).

136. Levine, R. L. et al. PTEN mutations and microsatellite instability in complex atypical

hyperplasia, a precursor lesion to uterine endometrioid carcinoma. Cancer Res. 58, 3254–

3258 (1998).

137. Le Gallo, M. et al. Exome sequencing of serous endometrial tumors identifies recurrent

somatic mutations in chromatin-remodeling and ubiquitin ligase complex genes. Nat. Genet.

44, 1310–1315 (2012).

138. Byron, S. A. et al. FGFR2 point mutations in 466 endometrioid endometrial tumors:

relationship with MSI, KRAS, PIK3CA, CTNNB1 mutations and clinicopathological

features. PloS One 7, e30801 (2012).

139. Garg, K. & Soslow, R. A. Endometrial carcinoma in women aged 40 years and younger.

Arch. Pathol. Lab. Med. 138, 335–342 (2014).

140. Tafe, L. J., Riggs, E. R. & Tsongalis, G. J. Lynch syndrome presenting as endometrial

cancer. Clin. Chem. 60, 111–121 (2014).

141. Meng, B. et al. POLE exonuclease domain mutation predicts long progression-free survival

in grade 3 endometrioid carcinoma of the endometrium. Gynecol. Oncol. 134, 15–19

(2014).

142. Méchali, M. Eukaryotic DNA replication origins: many choices for appropriate answers.

Nat. Rev. Mol. Cell Biol. 11, 728–738 (2010).

143. Korona, D. A., Lecompte, K. G. & Pursell, Z. F. The high fidelity and unique error

signature of human DNA polymerase epsilon. Nucleic Acids Res. 39, 1763–1773 (2011).

144. Pursell, Z. F., Isoz, I., Lundström, E.-B., Johansson, E. & Kunkel, T. A. Regulation of B

family DNA polymerase fidelity by a conserved active site residue: characterization of

M644W, M644L and M644F mutants of yeast DNA polymerase epsilon. Nucleic Acids Res.

35, 3076–3086 (2007).

145. Fortune, J. M. et al. Saccharomyces cerevisiae DNA polymerase delta: high fidelity for base

substitutions but lower fidelity for single- and multi-base deletions. J. Biol. Chem. 280,

29980–29987 (2005).

146. Schmitt, M. W., Matsumoto, Y. & Loeb, L. A. High fidelity and lesion bypass capability of

human DNA polymerase delta. Biochimie 91, 1163–1172 (2009).

147. Kunkel, T. A. & Erie, D. A. DNA mismatch repair. Annu. Rev. Biochem. 74, 681–710

(2005).

148. Reha-Krantz, L. J. Amino acid changes coded by bacteriophage T4 DNA polymerase

mutator mutants. Relating structure to function. J. Mol. Biol. 202, 711–724 (1988).

149. Abdurashidova, G. et al. Start sites of bidirectional DNA synthesis at the human lamin B2

origin. Science 287, 2023–2026 (2000).

150. Lucas, I. et al. High-throughput mapping of origins of replication in human cells. EMBO

Rep. 8, 770–777 (2007).

151. Keller, C., Ladenburger, E.-M., Kremer, M. & Knippers, R. The origin recognition complex

marks a replication origin in the human TOP1 gene promoter. J. Biol. Chem. 277, 31430–

31440 (2002).

152. Altman, A. L. & Fanning, E. The Chinese hamster dihydrofolate reductase replication

origin beta is active at multiple ectopic chromosomal locations and requires specific DNA

sequence elements for activity. Mol. Cell. Biol. 21, 1098–1110 (2001).

153. Mesner, L. D., Li, X., Dijkwel, P. A. & Hamlin, J. L. The dihydrofolate reductase origin of

replication does not contain any nonredundant genetic elements required for origin activity.

Mol. Cell. Biol. 23, 804–814 (2003).

154. Schaarschmidt, D., Ladenburger, E.-M., Keller, C. & Knippers, R. Human Mcm proteins at

a replication origin during the G1 to S phase transition. Nucleic Acids Res. 30, 4176–4185

(2002).

155. Cadoret, J.-C. et al. Genome-wide studies highlight indirect links between human

replication origins and gene regulation. Proc. Natl. Acad. Sci. U. S. A. 105, 15837–15842

(2008).

156. Shlien, A. et al. Combined hereditary and somatic mutations of replication error repair

genes result in rapid onset of ultra-hypermutated cancers. Nat. Genet. (2015).

doi:10.1038/ng.3202

157. Herr, A. J. et al. Mutator suppression and escape from replication error-induced extinction

in yeast. PLoS Genet. 7, e1002282 (2011).

158. De Vega, M., Lázaro, J. M., Salas, M. & Blanco, L. Mutational analysis of phi29 DNA

polymerase residues acting as ssDNA ligands for 3’-5’ exonucleolysis. J. Mol. Biol. 279,

807–822 (1998).

159. Pérez-Arnaiz, P., Lázaro, J. M., Salas, M. & de Vega, M. Involvement of phi29 DNA

polymerase thumb subdomain in the proper coordination of synthesis and degradation

during DNA replication. Nucleic Acids Res. 34, 3107–3115 (2006).

160. Abaan, O. D. et al. The Exomes of the NCI-60 Panel: A Genomic Resource for Cancer

Biology and Systems Pharmacology. Cancer Res. 73, 4372–4382 (2013).

161. Xia, S., Wang, J. & Konigsberg, W. H. DNA mismatch synthesis complexes provide

insights into base selectivity of a B family DNA polymerase. J. Am. Chem. Soc. 135, 193–

202 (2013).

162. Zahn, K. E., Averill, A., Wallace, S. S. & Doublié, S. The miscoding potential of 5-

hydroxycytosine arises due to template instability in the replicative polymerase active site.

Biochemistry 50, 10350–10358 (2011).

163. Roberts, S. A. & Gordenin, D. A. Hypermutation in human cancer genomes: footprints and

mechanisms. Nat. Rev. Cancer 14, 786–800 (2014).

164. Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from

damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).

165. Cahill, D. P., Kinzler, K. W., Vogelstein, B. & Lengauer, C. Genetic instability and

darwinian selection in tumours. Trends Cell Biol. 9, M57–60 (1999).

166. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

167. Heitzer, E. & Tomlinson, I. Replicative DNA polymerase mutations in cancer. Curr. Opin.

Genet. Dev. 24, 107–113 (2014).

168. Tyner, J. W. et al. High-throughput sequencing screen reveals novel, transforming RAS

mutations in myeloid leukemia patients. Blood 113, 1749–1755 (2009).

169. Oda, K. et al. PIK3CA cooperates with other phosphatidylinositol 3’-kinase pathway

mutations to effect oncogenic transformation. Cancer Res. 68, 8127–8136 (2008).

170. Zahn, K. E., Belrhali, H., Wallace, S. S. & Doublié, S. Caught bending the A-rule: crystal

structures of translesion DNA synthesis with a non-natural nucleotide. Biochemistry

46, 10551–10561 (2007).

171. Kamtekar, S. et al. Insights into strand displacement and processivity from the crystal

structure of the protein-primed DNA polymerase of bacteriophage phi29. Mol. Cell 16,

609–618 (2004).

172. Besnard, E. et al. Unraveling cell type-specific and reprogrammable human replication

origin signatures associated with G-quadruplex consensus motifs. Nat. Struct. Mol. Biol. 19,

837–844 (2012).

173. Schmid, J. P. et al. Polymerase ε1 mutation in a human syndrome with facial

dysmorphism, immunodeficiency, livedo, and short stature (‘FILS syndrome’). J. Exp. Med.

(2012). doi:10.1084/jem.20121303

174. Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome

variation and composition. Genome Res. (2014). doi:10.1101/gr.178335.114

175. Erdeniz, N., Dudley, S., Gealy, R., Jinks-Robertson, S. & Liskay, R. M. Novel PMS1

alleles preferentially affect the repair of primer strand loops during DNA replication. Mol.

Cell. Biol. 25, 9221–9231 (2005).

176. Lang, G. I. & Murray, A. W. Estimating the per-base-pair mutation rate in the yeast

Saccharomyces cerevisiae. Genetics 178, 67–82 (2008).

177. Croyle, M. L., Woo, A. L. & Lingrel, J. B. Extensive random mutagenesis analysis of the

Na+/K+-ATPase alpha subunit identifies known and previously unidentified amino acid

residues that alter ouabain sensitivity--implications for ouabain binding. Eur. J. Biochem.

FEBS 248, 488–495 (1997).

178. Malkhosyan, S., McCarty, A., Sawai, H. & Perucho, M. Differences in the spectrum of

spontaneous mutations in the hprt gene between tumor cells of the microsatellite mutator

phenotype. Mutat. Res. 316, 249–259 (1996).

179. Hall, B. M., Ma, C.-X., Liang, P. & Singh, K. K. Fluctuation analysis CalculatOR: a web

tool for the determination of mutation rate using Luria-Delbruck fluctuation analysis.

Bioinforma. Oxf. Engl. 25, 1564–1565 (2009).

180. Agbor, A. A., Göksenin, A. Y., Lecompte, K. G., Hans, S. H. & Pursell, Z. F. Human Pol ε-

dependent replication errors and the influence of mismatch repair on their correction. DNA

Repair (2013). doi:10.1016/j.dnarep.2013.08.012

181. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500,

415–421 (2013).

182. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in

human cancers. Nat. Rev. Genet. 15, 585–598 (2014).

183. Rago, C., Vogelstein, B. & Bunz, F. Genetic knockouts and knockins in human somatic

cells. Nat. Protoc. 2, 2734–2746 (2007).

184. Topaloglu, O., Hurley, P. J., Yildirim, O., Civin, C. I. & Bunz, F. Improved methods for the

generation of human gene knockout and knockin cell lines. Nucleic Acids Res. 33, e158

(2005).

185. Foster, P. L. Methods for determining spontaneous mutation rates. Methods Enzymol. 409,

195–213 (2006).

186. Bhattacharyya, N. P. et al. Molecular analysis of mutations in mutator colorectal carcinoma

cell lines. Hum. Mol. Genet. 4, 2057–2064 (1995).

187. Kokoska, R. J., McCulloch, S. D. & Kunkel, T. A. The efficiency and specificity of

apurinic/apyrimidinic site bypass by human DNA polymerase eta and Sulfolobus

solfataricus Dpo4. J. Biol. Chem. 278, 50537–50545 (2003).

188. Parsons, R. et al. Hypermutability and mismatch repair deficiency in RER+ tumor cells.

Cell 75, 1227–1236 (1993).

189. Branch, P., Hampson, R. & Karran, P. DNA mismatch binding defects, DNA damage

tolerance, and mutator phenotypes in human colorectal carcinoma cell lines. Cancer Res.

55, 2304–2309 (1995).

190. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of

image analysis. Nat. Methods 9, 671–675 (2012).

191. Treuting, P. M., Albertson, T. M. & Preston, B. D. Case series: acute tumor lysis syndrome

in mutator mice with disseminated lymphoblastic lymphoma. Toxicol. Pathol. 38, 476–485

(2010).

192. Morrison, A. & Sugino, A. The 3′→5′ exonucleases of both DNA polymerases delta and

epsilon participate in correcting errors of DNA replication in Saccharomyces cerevisiae.

Mol. Gen. Genet. MGG 242, 289–296 (1994).

193. Pavlov, Y. I., Maki, S., Maki, H. & Kunkel, T. A. Evidence for interplay among yeast

replicative DNA polymerases alpha, delta and epsilon from studies of exonuclease and

polymerase active site mutations. BMC Biol. 2, 11 (2004).

194. Albertson, T. M. et al. DNA polymerase epsilon and delta proofreading suppress discrete

mutator and cancer phenotypes in mice. Proc. Natl. Acad. Sci. U. S. A. 106, 17101–17104

(2009).

195. Iyer, L. M., Koonin, E. V., Leipe, D. D. & Aravind, L. Origin and evolution of the archaeo-

eukaryotic primase superfamily and related palm-domain proteins: structural insights and

new members. Nucleic Acids Res. 33, 3875–3896 (2005).

196. Tham, K.-C. et al. Mismatch repair inhibits homeologous recombination via coordinated

directional unwinding of trapped DNA structures. Mol. Cell 51, 326–337 (2013).

197. Glaab, W. E., Tindall, K. R. & Skopek, T. R. Specificity of mutations induced by methyl

methanesulfonate in mismatch repair-deficient human cancer cell lines. Mutat. Res. 427,

67–78 (1999).

198. Ohzeki, S., Tachibana, A., Tatsumi, K. & Kato, T. Spectra of spontaneous mutations at the

hprt locus in colorectal carcinoma cell lines defective in mismatch repair. Carcinogenesis

18, 1127–1133 (1997).

199. Gasche, C., Chang, C. L., Rhees, J., Goel, A. & Boland, C. R. Oxidative stress increases

frameshift mutations in human colorectal cancer cells. Cancer Res. 61, 7444–7448 (2001).

200. Macpherson, P. et al. 8-oxoguanine incorporation into DNA repeats in vitro and mismatch

recognition by MutSalpha. Nucleic Acids Res. 33, 5094–5105 (2005).

201. Song, S. et al. DNA precursor asymmetries in mammalian tissue mitochondria and possible

contribution to mutagenesis through reduced replication fidelity. Proc. Natl. Acad. Sci. U. S.

A. 102, 4990–4995 (2005).

202. Loeb, L. A. & Preston, B. D. Mutagenesis by apurinic/apyrimidinic sites. Annu. Rev. Genet.

20, 201–230 (1986).

203. Shibutani, S., Takeshita, M. & Grollman, A. P. Translesional synthesis on DNA templates

containing a single abasic site. A mechanistic study of the ‘A rule’. J. Biol. Chem. 272,

13916–13922 (1997).

204. Sabouri, N. & Johansson, E. Translesion synthesis of abasic sites by yeast DNA polymerase

epsilon. J. Biol. Chem. 284, 31555–31563 (2009).

205. Chastain, P. D., 2nd et al. Abasic sites preferentially form at regions undergoing DNA

replication. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 24, 3674–3680 (2010).

206. Pettersen, H. S. et al. UNG-initiated base excision repair is the major repair route for 5-

fluorouracil in DNA, but 5-fluorouracil cytotoxicity depends mainly on RNA incorporation.

Nucleic Acids Res. 39, 8430–8444 (2011).

207. Fazlieva, R. et al. Proofreading exonuclease activity of human DNA polymerase delta and

its effects on lesion-bypass DNA synthesis. Nucleic Acids Res. 37, 2854–2866 (2009).

208. Tran, H. T., Gordenin, D. A. & Resnick, M. A. The 3′→5′ exonucleases of DNA

polymerases delta and epsilon and the 5′→3′ exonuclease Exo1 have major roles in

postreplication mutation avoidance in Saccharomyces cerevisiae. Mol. Cell. Biol. 19, 2000–

2007 (1999).

209. Williams, L. N., Herr, A. J. & Preston, B. D. Emergence of DNA Polymerase ε

Antimutators That Escape Error-Induced Extinction in Yeast. Genetics 193, 751–770

(2013).

210. Ohya, T. et al. The DNA polymerase domain of pol ε is required for rapid, efficient,

and highly accurate chromosomal DNA replication, telomere length maintenance, and

normal cell senescence in Saccharomyces cerevisiae. J. Biol. Chem. 277, 28099–28108

(2002).

211. Shaltiel, I. A., Krenning, L., Bruinsma, W. & Medema, R. H. The same, only different -

DNA damage checkpoints and their reversal throughout the cell cycle. J. Cell Sci. 128,

607–620 (2015).

212. Chen, Y. & Poon, R. Y. C. The multiple checkpoint functions of CHK1 and CHK2 in

maintenance of genome stability. Front. Biosci. J. Virtual Libr. 13, 5016–5029 (2008).

213. López-Contreras, A. J. & Fernandez-Capetillo, O. The ATR barrier to replication-born

DNA damage. DNA Repair 9, 1249–1255 (2010).

214. Cimprich, K. A. & Cortez, D. ATR: an essential regulator of genome integrity. Nat. Rev.

Mol. Cell Biol. 9, 616–627 (2008).

215. Callén, E. et al. Essential role for DNA-PKcs in DNA double-strand break repair and

apoptosis in ATM-deficient lymphocytes. Mol. Cell 34, 285–297 (2009).

216. Sirbu, B. M. & Cortez, D. DNA damage response: three levels of DNA repair regulation.

Cold Spring Harb. Perspect. Biol. 5, a012724 (2013).

217. Bologna, S. & Ferrari, S. It takes two to tango: Ubiquitin and SUMO in the DNA damage

response. Front. Genet. 4, 106 (2013).

218. Williams, G. J., Lees-Miller, S. P. & Tainer, J. A. Mre11-Rad50-Nbs1 conformations and

the control of sensing, signaling, and effector responses at DNA double-strand breaks. DNA

Repair 9, 1299–1306 (2010).

219. Kaidi, A. & Jackson, S. P. KAT5 tyrosine phosphorylation couples chromatin sensing to

ATM signalling. Nature 498, 70–74 (2013).

220. Lukas, J., Lukas, C. & Bartek, J. More than just a focus: The chromatin response to DNA

damage and its role in genome integrity maintenance. Nat. Cell Biol. 13, 1161–1169 (2011).

221. Panier, S. & Boulton, S. J. Double-strand break repair: 53BP1 comes into focus. Nat. Rev.

Mol. Cell Biol. 15, 7–18 (2014).

222. Savic, V. et al. Formation of dynamic gamma-H2AX domains along broken DNA strands is

distinctly regulated by ATM and MDC1 and dependent upon H2AX densities in chromatin.

Mol. Cell 34, 298–310 (2009).

223. Price, B. D. & D’Andrea, A. D. Chromatin remodeling at DNA double-strand breaks. Cell

152, 1344–1354 (2013).

224. Dion, V. & Gasser, S. M. Chromatin movement in the maintenance of genome stability.

Cell 152, 1355–1364 (2013).

225. Panier, S. & Durocher, D. Push back to respond better: regulatory inhibition of the DNA

double-strand break response. Nat. Rev. Mol. Cell Biol. 14, 661–672 (2013).

226. Cortez, D., Guntuku, S., Qin, J. & Elledge, S. J. ATR and ATRIP: partners in checkpoint

signaling. Science 294, 1713–1716 (2001).

227. Yang, X. H. & Zou, L. Recruitment of ATR-ATRIP, Rad17, and 9-1-1 complexes to DNA

damage. Methods Enzymol. 409, 118–131 (2006).

228. Yan, S. & Michael, W. M. TopBP1 and DNA polymerase alpha-mediated recruitment of

the 9-1-1 complex to stalled replication forks: implications for a replication restart-based

mechanism for ATR checkpoint activation. Cell Cycle Georget. Tex 8, 2877–2884 (2009).

229. Majka, J., Binz, S. K., Wold, M. S. & Burgers, P. M. J. Replication protein A directs

loading of the DNA damage checkpoint clamp to 5′-DNA junctions. J. Biol. Chem. 281,

27855–27861 (2006).

230. Liu, S. et al. ATR autophosphorylation as a molecular switch for checkpoint activation.

Mol. Cell 43, 192–202 (2011).

231. Michael, W. M., Ott, R., Fanning, E. & Newport, J. Activation of the DNA replication

checkpoint through RNA synthesis by primase. Science 289, 2133–2137 (2000).

232. Van, C., Yan, S., Michael, W. M., Waga, S. & Cimprich, K. A. Continued primer synthesis

at stalled replication forks contributes to checkpoint activation. J. Cell Biol. 189, 233–246

(2010).

233. Lee, J., Kumagai, A. & Dunphy, W. G. The Rad9-Hus1-Rad1 checkpoint clamp regulates

interaction of TopBP1 with ATR. J. Biol. Chem. 282, 28036–28044 (2007).

234. Delacroix, S., Wagner, J. M., Kobayashi, M., Yamamoto, K. & Karnitz, L. M. The Rad9-

Hus1-Rad1 (9-1-1) clamp activates checkpoint signaling via TopBP1. Genes Dev. 21,

1472–1477 (2007).

235. Jazayeri, A. et al. ATM- and cell cycle-dependent regulation of ATR in response to DNA

double-strand breaks. Nat. Cell Biol. 8, 37–45 (2006).

236. Stracker, T. H., Usui, T. & Petrini, J. H. J. Taking the time to make important decisions: the

checkpoint effector kinases Chk1 and Chk2 and the DNA damage response. DNA Repair 8,

1047–1054 (2009).

237. Bennetzen, M. V. et al. Acetylation dynamics of human nuclear proteins during the ionizing

radiation-induced DNA damage response. Cell Cycle Georget. Tex 12, 1688–1695 (2013).

238. Bennetzen, M. V. et al. Site-specific phosphorylation dynamics of the nuclear proteome

during the DNA damage response. Mol. Cell. Proteomics MCP 9, 1314–1323 (2010).

239. Polo, S. E. & Jackson, S. P. Dynamics of DNA damage response proteins at DNA breaks: a

focus on protein modifications. Genes Dev. 25, 409–433 (2011).

240. Jackson, S. P. & Durocher, D. Regulation of DNA damage responses by ubiquitin and

SUMO. Mol. Cell 49, 795–807 (2013).

241. Soldi, M., Bremang, M. & Bonaldi, T. Biochemical systems approaches for the analysis of

histone modification readout. Biochim. Biophys. Acta 1839, 657–668 (2014).

242. Xu, G. & Jaffrey, S. R. Proteomic identification of protein ubiquitination events.

Biotechnol. Genet. Eng. Rev. 29, 73–109 (2013).

243. Arnaudo, A. M. & Garcia, B. A. Proteomic characterization of novel histone post-

translational modifications. Epigenetics Chromatin 6, 24 (2013).

244. Jeram, S. M., Srikumar, T., Pedrioli, P. G. A. & Raught, B. Using mass spectrometry to

identify ubiquitin and ubiquitin-like protein conjugation sites. Proteomics 9, 922–934

(2009).

245. Wang, X., Herr, R. A. & Hansen, T. H. Ubiquitination of substrates by esterification.

Traffic Cph. Den. 13, 19–24 (2012).

246. Oshikawa, K., Matsumoto, M., Oyamada, K. & Nakayama, K. I. Proteome-wide

identification of ubiquitylation sites by conjugation of engineered lysine-less ubiquitin. J.

Proteome Res. 11, 796–807 (2012).

247. Wang, X. et al. Ubiquitination of serine, threonine, or lysine residues on the cytoplasmic

tail can induce ERAD of MHC-I by viral E3 ligase mK3. J. Cell Biol. 177, 613–624 (2007).

248. Shimizu, Y., Okuda-Shimizu, Y. & Hendershot, L. M. Ubiquitylation of an ERAD substrate

occurs on multiple types of amino acids. Mol. Cell 40, 917–926 (2010).

249. Cadwell, K. & Coscoy, L. Ubiquitination on nonlysine residues by a viral E3 ubiquitin

ligase. Science 309, 127–130 (2005).

250. Chen, Y., Chen, W., Cobb, M. H. & Zhao, Y. PTMap--a sequence alignment software for

unrestricted, accurate, and full-spectrum identification of post-translational modification

sites. Proc. Natl. Acad. Sci. U. S. A. 106, 761–766 (2009).

251. Wang, Q.-E. et al. DNA repair factor XPC is modified by SUMO-1 and ubiquitin following

UV irradiation. Nucleic Acids Res. 33, 4023–4034 (2005).

252. Dou, H., Huang, C., Singh, M., Carpenter, P. B. & Yeh, E. T. H. Regulation of DNA repair

through deSUMOylation and SUMOylation of replication protein A complex. Mol. Cell 39,

333–345 (2010).

253. Bianchi, V., Pontis, E. & Reichard, P. Changes of deoxyribonucleoside triphosphate pools

induced by hydroxyurea and their relation to DNA synthesis. J. Biol. Chem. 261, 16037–

16042 (1986).

254. Saleh-Gohari, N. et al. Spontaneous homologous recombination is induced by collapsed

replication forks that are caused by endogenous DNA single-strand breaks. Mol. Cell. Biol.

25, 7158–7169 (2005).

255. Daniely, Y., Dimitrova, D. D. & Borowiec, J. A. Stress-dependent nucleolin mobilization

mediated by p53-nucleolin complex formation. Mol. Cell. Biol. 22, 6014–6022 (2002).

256. Mittelman, D., Sykoudis, K., Hersh, M., Lin, Y. & Wilson, J. H. Hsp90 modulates CAG

repeat instability in human cells. Cell Stress Chaperones 15, 753–759 (2010).

257. Kaplan, K. B. & Li, R. A prescription for ‘stress’--the role of Hsp90 in genome stability and

cellular adaptation. Trends Cell Biol. 22, 576–583 (2012).

258. Williams, J. S. et al. Topoisomerase 1-mediated removal of ribonucleotides from nascent

leading-strand DNA. Mol. Cell 49, 1010–1015 (2013).

259. Stiff, T. et al. ATM and DNA-PK function redundantly to phosphorylate H2AX after

exposure to ionizing radiation. Cancer Res. 64, 2390–2396 (2004).

260. Williams, E. S. et al. Telomere dysfunction and DNA-PKcs deficiency: characterization and

consequence. Cancer Res. 69, 2100–2107 (2009).

261. Wang, S.-A. et al. Heat shock protein 90 stabilizes nucleolin to increase mRNA stability in

mitosis. J. Biol. Chem. 286, 43816–43829 (2011).

262. Daniely, Y. & Borowiec, J. A. Formation of a complex between nucleolin and replication

protein A after cell stress prevents initiation of DNA replication. J. Cell Biol. 149, 799–810

(2000).

263. Bensimon, A. et al. ATM-dependent and -independent dynamics of the nuclear

phosphoproteome after DNA damage. Sci. Signal. 3, rs3 (2010).

264. Collis, S. J., DeWeese, T. L., Jeggo, P. A. & Parker, A. R. The life and death of DNA-PK.

Oncogene 24, 949–961 (2005).

265. Meggio, F. & Pinna, L. A. One-thousand-and-one substrates of protein kinase CK2? FASEB

J. Off. Publ. Fed. Am. Soc. Exp. Biol. 17, 349–368 (2003).

266. Luo, H. et al. Regulation of intra-S phase checkpoint by ionizing radiation (IR)-dependent

and IR-independent phosphorylation of SMC3. J. Biol. Chem. 283, 19176–19183 (2008).

267. Soriano-Carot, M., Quilis, I., Bañó, M. C. & Igual, J. C. Protein kinase C controls activation

of the DNA integrity checkpoint. Nucleic Acids Res. 42, 7084–7095 (2014).

268. Chessa, T. A. M. et al. Phosphorylation of threonine 154 in p40phox is an important

physiological signal for activation of the neutrophil NADPH oxidase. Blood 116, 6027–

6036 (2010).

269. Kang, J.-H., Toita, R., Kim, C. W. & Katayama, Y. Protein kinase C (PKC) isozyme-

specific substrates and their design. Biotechnol. Adv. 30, 1662–1672 (2012).

270. Zhao, H. et al. Expression of the p12 subunit of human DNA polymerase δ (Pol δ), CDK

inhibitor p21(WAF1), Cdt1, cyclin A, PCNA and Ki-67 in relation to DNA replication in

individual cells. Cell Cycle Georget. Tex 13, 3529–3540 (2014).

271. Zhang, S. et al. A novel function of CRL4(Cdt2): regulation of the subunit structure of

DNA polymerase δ in response to DNA damage and during the S phase. J. Biol. Chem. 288,

29550–29561 (2013).

272. Chen, Y.-W. et al. Human DNA polymerase eta activity and translocation is regulated by

phosphorylation. Proc. Natl. Acad. Sci. U. S. A. 105, 16578–16583 (2008).

273. Errico, A. & Costanzo, V. Mechanisms of replication fork protection: a safeguard for

genome stability. Crit. Rev. Biochem. Mol. Biol. 47, 222–235 (2012).

274. Tercero, J. A., Longhese, M. P. & Diffley, J. F. X. A central role for DNA replication forks

in checkpoint activation and response. Mol. Cell 11, 1323–1336 (2003).

275. Lou, H. et al. Mrc1 and DNA polymerase epsilon function together in linking DNA

replication and the S phase checkpoint. Mol. Cell 32, 106–117 (2008).

276. Roseaulin, L. C. et al. Coordinated Degradation of Replisome Components Ensures

Genome Stability upon Replication Stress in the Absence of the Replication Fork Protection

Complex. PLoS Genet. 9, e1003213 (2013).

277. Lee, M. Y. W. T. et al. The tail that wags the dog: p12, the smallest subunit of DNA

polymerase δ, is degraded by ubiquitin ligases in response to DNA damage and during cell

cycle progression. Cell Cycle Georget. Tex 13, 23–31 (2014).

278. Lin, S. H. S. et al. Dynamics of enzymatic interactions during short flap human Okazaki

fragment processing by two forms of human DNA polymerase δ. DNA Repair 12, 922–935

(2013).

279. Terai, K., Shibata, E., Abbas, T. & Dutta, A. Degradation of p12 subunit by CRL4Cdt2 E3

ligase inhibits fork progression after DNA damage. J. Biol. Chem. 288, 30509–30514

(2013).

280. Fuss, J. & Linn, S. Human DNA polymerase epsilon colocalizes with proliferating cell

nuclear antigen and DNA replication late, but not early, in S phase. J. Biol. Chem. 277,

8658–8666 (2002).

281. Moyal, L. et al. Requirement of ATM-dependent monoubiquitylation of histone H2B for

timely repair of DNA double-strand breaks. Mol. Cell 41, 529–542 (2011).

282. Wu, J. et al. Histone ubiquitination associates with BRCA1-dependent DNA damage

response. Mol. Cell. Biol. 29, 849–860 (2009).

283. Park, J. M. et al. Modification of PCNA by ISG15 plays a crucial role in termination of

error-prone translesion DNA synthesis. Mol. Cell 54, 626–638 (2014).
