A Dissertation

Entitled

Analysis of the Interactions between the 5' to 3' Exonuclease

and the Single-Stranded DNA-Binding Protein from

Bacteriophage T4 and Related Phages

By Laurence S. Boutemy

Submitted as partial fulfillment of the requirements for

the Doctor of Philosophy in Chemistry

______Advisor: Timothy C. Mueser, Ph.D.

______College of Graduate Studies

The University of Toledo

August 2008

Copyright © 2008

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

Analysis of the Interactions between the 5' to 3' Exonuclease

and the Single-Stranded DNA-Binding Protein from

Bacteriophage T4 and Related Phages

Laurence S. Boutemy

Submitted as partial fulfillment of the requirements for

the Doctor of Philosophy in Chemistry

The University of Toledo

August 2008

DNA replication and repair is one of the most important cellular processes, since preserving the integrity of the DNA genome is essential to all forms of life.

Many proteins are involved in the DNA replication process, and their interactions ensure that the DNA is duplicated and repaired in a coordinated and efficient manner. Bacteriophage T4 is a very good model for studying DNA replication, since it encodes all the proteins required at the replication fork, and these proteins have been extensively characterized. However, how these proteins interact and coordinate the replication process is still largely unknown. One of the interactions that appears to govern the rate and efficiency of lagging strand synthesis occurs between the 5’ to 3’ exonuclease RNase H and the single-stranded DNA-binding 32 protein. The interaction between these two proteins is the focus of this work.

RNase H and the 32 protein, as well as a number of mutants and truncations, were cloned, expressed and purified. These proteins were then used to form different variants of the RNase H + 32 protein complex, which were characterized through biophysical and structural studies. A crystal structure was obtained for the RNase H + 32-B truncation. This structure, along with the results obtained from the biophysical experiments, provides valuable information on how these two proteins interact to coordinate the lagging strand DNA replication.

Finally, the study of the interaction between RNase H and the 32 protein from bacteriophage Rb69, a phage related to bacteriophage T4, was also initiated.

ACKNOWLEDGEMENTS

First of all I would like to thank my advisor, Dr. Mueser, for his help and guidance throughout these past five years. Thank you so much for teaching me the ways of scientific research, when the scientific knowledge I had when I arrived in Toledo was mostly academic. What I learned in the Mueser lab will be, I am sure, invaluable for the rest of my research career. I also want to thank Dr. B. Leif Hanson, who was like a second advisor to me, for his help and assistance on everything from data collection to career advice. Many thanks to Dr. Funk, Dr. Viola and Dr. Von Grafenstein, my committee members, for their helpful suggestions, and to Dr. Huang and Dr. Slama as well, who were part of my committee at some point. I would like to thank Dr. Charlie Jones and Dr. Nancy Nossal from the National Institutes of Health in Bethesda, MD, for their precious collaboration on this project. Thank you also to the UT Department of Chemistry and its staff, especially in the Instrumentation Center and the Chemistry Stockroom.

I am very grateful for all the encouragement and support received from my family and friends back in France while I was in Toledo. Thank you so much for your love and your patience! I also want to thank my labmates, past and present, for their help and friendship, and for all the good times we have had over these five years in the lab. Finally, many thanks go to all the friends I have made in the Toledo area and who made my stay in Ohio lots of fun and a memory I will forever treasure. I will miss you guys, thank you!

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ...... v

TABLE OF CONTENTS ...... vi

LIST OF TABLES...... xiii

LIST OF FIGURES ...... xvii

LIST OF ABBREVIATIONS ...... xxiv

CHAPTER 1 - Background ...... 1

1.1. Bacteriophage T4 DNA Replication and Repair...... 1

1.1.1. Bacteriophage T4 is a Model for DNA Replication ...... 1

1.1.2. Bacteriophage T4 Single-Stranded DNA-Binding 32 Protein ...... 6

1.1.3. Bacteriophage T4 RNase H ...... 9

1.1.4. Known Interactions between Nucleases and Single-Stranded DNA-Binding Proteins ...... 15

1.1.5. Project Goals ...... 15

1.2. Escherichia coli DNA-binding Protein from Starved Cells ...... 17

CHAPTER 2 - Methodology...... 20

2.1. Molecular Cloning ...... 20

2.1.1. Polymerase Chain Reaction (PCR)...... 20

2.1.2. Insertion of the PCR Product into an Entry or Expression Vector .... 22

2.1.3. Site-Directed Mutagenesis ...... 24

2.1.4. Transformation into Competent E. coli Cells ...... 27

2.1.5. Agarose Gel Electrophoresis ...... 28

2.1.6. Overview of the Molecular Cloning Process...... 28

2.2. Protein Expression...... 30

2.2.1. Small Scale Expression Studies ...... 30

2.2.2. Large Scale Protein Expression...... 32

2.2.3. SDS-PAGE Gel Electrophoresis ...... 33

2.3. Cell Lysis and Protein Solubility...... 33

2.4. Protein Purification ...... 34

2.5. Protein Preparation ...... 35

2.5.1. Dialysis...... 35

2.5.2. Protein Concentration ...... 36

2.5.3. Solubility Screen ...... 36

2.6. Protein Crystallization ...... 38

2.7. X-Ray Diffraction Data Collection ...... 40

2.7.1. Crystal Cryoprotection and Freezing...... 40

2.7.2. Data Collection...... 42

2.8. Data Processing and Structure Determination...... 43

2.8.1. Data Processing...... 43

2.8.2. Phasing...... 43

2.8.3. Model Building ...... 44

2.8.4. Structure Refinement and Validation ...... 45

2.8.5. Summary of the Data Processing and Model Building Process ...... 46

2.9. Non-Denaturing Gel Electrophoresis ...... 46

2.10. Scattering Studies...... 48

2.10.1. Dynamic Light Scattering (DLS)...... 48

2.10.2. Small-Angle X-Ray Scattering (SAXS)...... 49

2.11. Isothermal Titration Calorimetry (ITC)...... 52

2.12. Fluorescence Anisotropy Titration...... 54

2.13. DNA Purification and Annealing ...... 55

2.13.1. DNA Oligomer Purification ...... 55

2.13.2. DNA Substrate Annealing ...... 56

2.13.3. TBE Gel Electrophoresis...... 56

CHAPTER 3 - Bacteriophage T4 32 Protein and Its Truncations ...... 58

3.1. Introduction ...... 58

3.2. Bacteriophage T4 32 Protein...... 60

3.2.1. Introduction ...... 60

3.2.2. Cell Lysis...... 60

3.2.3. Protein Purification...... 61

3.3. Bacteriophage T4 32 Core Protein...... 62

3.4. Bacteriophage T4 32-A Protein ...... 65

3.5. Bacteriophage T4 32-B Protein ...... 65

3.5.1. Introduction ...... 65

3.5.2. Molecular Cloning ...... 66

3.5.3. Protein Expression...... 78

3.5.4. Cell Lysis...... 80

3.5.5. Protein Purification...... 81

3.5.6. Solubility Screen ...... 88

3.5.7. Dialysis and Concentration ...... 89

3.5.8. Crystal Screening and Optimization...... 89

3.5.9. Data Collection...... 92

3.5.10. Data Processing...... 94

3.5.11. Dynamic Light Scattering ...... 96

3.5.12. Small Angle X-Ray Scattering...... 97

3.6. Bacteriophage T4 32-B Mutants...... 106

3.6.1. Introduction ...... 106

3.6.2. Molecular Cloning ...... 107

3.6.3. Protein Expression and Solubility...... 112

3.6.4. Protein Purification...... 115

3.6.5. Cleaving of the His-Tag...... 118

3.7. Conclusion...... 123

CHAPTER 4 - Bacteriophage T4 RNase H ...... 125

4.1. Introduction ...... 125

4.2. Bacteriophage T4 Native and D132N Mutant RNase H...... 126

4.2.1. Protein Expression...... 126

4.2.2. Cell Lysis...... 127

4.2.3. Protein Purification...... 128

4.2.4. Dialysis and Concentration ...... 129

4.2.5. Scattering Studies...... 133

4.3. Bacteriophage T4 D132N ∆N RNase H...... 137

4.3.1. Protein Expression and Cell Lysis...... 137

4.3.2. Protein Purification...... 139

4.3.4. Solubility Screen ...... 142

4.3.5. Dialysis and Concentration ...... 143

4.4. Conclusion...... 143

CHAPTER 5 - Bacteriophage T4 RNase H + 32 Protein + DNA Interactions ...... 144

5.1. Introduction ...... 144

5.2. Preliminary Complex Determination...... 144

5.2.1. Protein-Protein Interactions...... 145

5.2.2. Protein-Protein-DNA Interactions...... 147

5.2.3. Summary of the T4 RNase H + 32 Protein Complexes...... 153

5.3. D132N RNase H + 32-B Protein Interaction...... 154

5.3.1. Complex Preparation ...... 154

5.3.2. Structural Studies...... 154

5.3.3. Scattering Studies...... 185

5.3.4. Size Exclusion Chromatography ...... 195

5.3.5. Isothermal Titration Calorimetry ...... 199

5.3.6. Fluorescence Anisotropy Titration...... 202

5.3.7. Protein-Protein-DNA Crystallization ...... 208

5.3.8. 32-B Mutants Studies...... 211

5.3.9. Conclusion ...... 216

5.4. D132N ∆N RNase H + 32-B Protein Interaction ...... 218

5.4.1. Complex Preparation ...... 218

5.4.2. Protein-Protein Crystallization...... 218

5.4.3. Protein-Protein-DNA Complex ...... 223

5.4.4. 32-B Mutants Studies...... 223

5.5. D132N ∆N RNase H + 32 Protein Interaction...... 227

5.5.1. Complex Preparation ...... 227

5.5.2. Protein-Protein Crystallization...... 227

5.5.3. Protein-Protein-DNA Crystallization ...... 228

5.5.4. Fluorescence Anisotropy...... 229

5.6. D132N ∆N RNase H + 32 Core Protein Interaction...... 231

5.6.1. Complex Preparation ...... 231

5.6.2. Protein-Protein Crystallization...... 231

5.6.3. Fluorescence Anisotropy...... 233

5.7. Conclusion...... 234

CHAPTER 6 - Bacteriophage Rb69...... 236

6.1. Introduction ...... 236

6.2. Bacteriophage Rb69 Native RNase H ...... 238

6.2.1. Introduction ...... 238

6.2.2. Initial Cloning and Expression...... 239

6.2.3. Molecular Cloning ...... 244

6.2.4. Protein Expression and Solubility...... 252

6.2.5. Protein Purification...... 256

6.3. Bacteriophage Rb69 D132N RNase H...... 271

6.3.1. Introduction ...... 271

6.3.2. Molecular Cloning ...... 273

6.3.3. Protein Expression and Solubility...... 276

6.3.4. Protein Purification...... 278

6.3.5. Cleaving of the His-Tag...... 283

6.4. Bacteriophage Rb69 32-B and Future Work ...... 285

CHAPTER 7 - Escherichia coli DNA-Binding Protein from Starved Cells ...... 286

7.1. Introduction ...... 286

7.2. Previous Work ...... 286

7.2.1. Expression and Purification...... 287

7.2.2. Characterization...... 287

7.2.3. X-Ray Diffraction Studies...... 289

7.2.4. Discussion and Future Work ...... 290

7.3. Project Follow-up ...... 290

7.3.1. Further Characterization of (SJT Ape FEN-1) Dps...... 292

7.3.2. X-Ray Diffraction Studies...... 296

7.4. Conclusion...... 303

BIBLIOGRAPHY ...... 305

APPENDICES ...... 310

LIST OF TABLES

Table 2.1 – PCR Reaction Setup...... 22

Table 2.2 – Vector Insertion Reaction Setup ...... 23

Table 2.3 – Site-Directed Mutagenesis PCR Reaction ...... 26

Table 2.4 – Cloning and Expression Hosts...... 27

Table 2.5 – Antibiotics Required by the Vectors / Cell Lines ...... 31

Table 2.6 – Solubility Screen Solutions ...... 37

Table 2.7 – Crystal Screens ...... 39

Table 2.8 – Cryoprotectants ...... 41

Table 3.1 – 32 Protein and Truncations Characteristics ...... 60

Table 3.2 – HPLC Buffers for T4 32 Protein Purification ...... 61

Table 3.3 – T4 32-B PCR Reactions ...... 70

Table 3.4 – T4 32-B Insertion in pET101 Reaction...... 71

Table 3.5 – T4 32-B Insertion in pENTR-D Reaction...... 72

Table 3.6 – T4 32-B Insertion in pDEST-C1 Reaction ...... 73

Table 3.7 – HPLC Buffers for T4 32-B Purification Scheme 1 ...... 81

Table 3.8 – HPLC Buffers for T4 32-B Purification Scheme 2 ...... 85

Table 3.9 – T4 32-B Crystal Screens...... 90

Table 3.10 – 32-B Data Processing Summary ...... 95

Table 3.11 – 32-B Protein Dynamic Light Scattering Results ...... 97

Table 3.12 – 32 Protein and Truncations Characteristics ...... 106

Table 3.13 – Site-Directed Mutagenesis PCR Reactions for the 32-B Mutants ...... 109

Table 3.14 – Lysis and HPLC Buffers for the T4 32-B Mutants Purification ..... 115

Table 3.15 – TEV Protease Reaction Setup...... 119

Table 4.1 – RNase H characteristics ...... 126

Table 4.2 – HPLC buffers for T4 RNase H purification ...... 129

Table 4.3 – D132N RNase H Dynamic Light Scattering Results ...... 134

Table 5.1 – D132N RNase H and 32 Truncations Calculated pIs...... 145

Table 5.2 – RNase H + 32 Protein ± DNA Complexes ...... 153

Table 5.3 – RNase H + 32-B Crystal Screens ...... 155

Table 5.4 – D132N RNase H + 32-B Crystals Cryoprotection ...... 159

Table 5.5 – Crystallographic Data for the D132N RNase H + 32-B Dataset 1.. 162

Table 5.6 – Crystallographic Data for the D132N RNase H + 32-B Dataset 2.. 164

Table 5.7 - D132N RNase H + 32-B Crystal Data Collection and Processing .. 175

Table 5.8 - D132N RNase H + 32-B Molecular Replacement Results...... 176

Table 5.9 – Final Refinement and Validation Summary...... 178

Table 5.10 – Dynamic Light Scattering Results for the D132N RNase H + 32-B Complex at 4 °C ...... 187

Table 5.11 – D132N and 32-B Molecular Weights...... 195

Table 5.12 – Protein Concentrations Used in the ITC Experiment ...... 200

Table 5.13 – Thermodynamic Parameters of the D132N RNase H + 32-B Complex Formation ...... 201

Table 5.14 – Summary of the Dissociation Constants for the 32 Truncations + D132N RNase H + Fork DNA Complex...... 206

Table 5.15 – D132N RNase H + 32-B + DNA Crystal Screens ...... 209

Table 5.16 – Summary of the Dissociation Constants for the 32-B Mutants + D132N RNase H + Fork DNA Complex ...... 215

Table 5.17 – D132N ∆N RNase H + 32-B Crystal Screens ...... 220

Table 5.18 – D132N ∆N RNase H + 32 Protein Crystal Screens...... 227

Table 5.19 – D132N ∆N RNase H + 32 Protein + DNA Crystal Screens ...... 228

Table 5.20 – Summary of the Dissociation Constants from the FA Titrations... 230

Table 5.21 – D132N ∆N RNase H + 32 Core Crystal Screens ...... 231

Table 6.1 – Rb69 RNase H Characteristics...... 239

Table 6.2 – Truncated Rb69 RNase H vs. T4 RNase H Characteristics...... 240

Table 6.3 – Rb69 RNase H PCR Reaction...... 245

Table 6.4 – Rb69 RNase H Insertion in pET101 Reaction ...... 246

Table 6.5 – Rb69 RNase H Insertion in pENTR-D Reaction ...... 247

Table 6.6 – Rb69 RNase H Insertion in pDEST-C1 Reaction...... 249

Table 6.7 – Lysis and HPLC buffers for Rb69 RNase H purification scheme 1 ...... 257

Table 6.8 – Lysis and HPLC buffers for Rb69 RNase H purification scheme 2 ...... 261

Table 6.9 – Lysis and HPLC buffers for Rb69 RNase H purification scheme 3 ...... 266

Table 6.10 – Rb69 D132N RNase H characteristics...... 272

Table 6.11 – Site-Directed Mutagenesis PCR Reaction for D132N RNase H .. 274

Table 6.12 – Lysis and HPLC buffers for Rb69 D132N RNase H purification .. 278

Table 6.13 – TEV Protease Reaction Setup...... 283

Table 7.1 – Dps Characteristics...... 288

Table 7.2 – DLS Results for (SJT Ape FEN-1) Dps...... 294

Table 7.3 – (BKC Ape FEN-1) Dps Data Processing Summary ...... 298

Table 7.4 – (BKC Ape FEN-1) Dps MolRep Molecular Replacement Summary ...... 299

Table 7.5 – (BKC Ape FEN-1) Dps Phaser Molecular Replacement Summary ...... 300

LIST OF FIGURES

Figure 1.1 – Bacteriophage T4 DNA Replication Fork...... 2

Figure 1.2 – Interaction between T4 RNase H and the 32 Protein ...... 5

Figure 1.3 – T4 32 Protein Domains and Truncations ...... 7

Figure 1.4 – T4 32 Protein Core Domain Crystal Structure ...... 8

Figure 1.5 – Exonuclease vs. Endonuclease Activity ...... 10

Figure 1.6 – T4 RNase H Crystal Structure ...... 11

Figure 1.7 – T4 RNase H Active Site...... 12

Figure 1.8 – T4 RNase H Native vs. Metal Free Crystal Structures...... 13

Figure 1.9 – T4 RNase H + Fork DNA Crystal Structure ...... 14

Figure 1.10 – Ribbon structure of Dps...... 18

Figure 2.1 – Primer Design Scheme...... 21

Figure 2.2 – Site-Directed Mutagenesis Scheme ...... 25

Figure 2.3 – Summary of the Molecular Cloning Process...... 29

Figure 2.4 – X-Ray Diffraction Data Analysis Scheme ...... 47

Figure 2.5 – SAXS Data Analysis Scheme...... 52

Figure 2.6 – DNA Substrate Used in the Fluorescence Anisotropy Titrations .... 54

Figure 3.1 – 32 Protein Domains and 32 Truncations ...... 59

Figure 3.2 – T4 32 Protein Purification ...... 63

Figure 3.3 – T4 32-B Protein Amino-Acid Sequence ...... 66

Figure 3.4 – T4 32-B Initial Expression Plasmid Miniprep ...... 67

Figure 3.5 – T4 32-B PCR Primers...... 69

Figure 3.6 – Agarose Gels for T4 32-B Cloning...... 74

Figure 3.7 – T4 32-B Protein Expression and Solubility ...... 77

Figure 3.8 – T4 32-B Protein Expression...... 79

Figure 3.9 – T4 32-B Cell Lysis ...... 80

Figure 3.10 – T4 32-B Purification Scheme 1...... 82

Figure 3.11 – T4 32-B Purification Scheme 2...... 86

Figure 3.12 – T4 32-B Solubility Screen ...... 88

Figure 3.13 – T4 32-B Crystals after Screening...... 91

Figure 3.14 – T4 32-B Crystals after Optimization...... 92

Figure 3.15 – 32-B Crystal X-Ray Diffraction Images ...... 94

Figure 3.16 – 32-B Protein Dynamic Light Scattering Results...... 97

Figure 3.17 – 32-B SAXS Data Collection ...... 99

Figure 3.18 – 32-B GNOM Plots...... 101

Figure 3.19 – 32-B 3D Molecular Envelope...... 103

Figure 3.20 – Modeling of the A Domain of 32 Protein (Chadd) ...... 105

Figure 3.21 – T4 32-B Mutants Site-Directed Mutagenesis Primers...... 107

Figure 3.22 – Agarose Gels for the T4 32-B Mutants Cloning ...... 111

Figure 3.23 – T4 32-B Mutants Expression and Solubility ...... 113

Figure 3.24 – T4 I60D 32-B Purification ...... 116

Figure 3.25 – TEV Protease Cleavage Site...... 118

Figure 3.26 – 32-B Mutants TEV Protease Reactions ...... 121

Figure 3.27 – Cysteine Residues in the 32-B Protein ...... 122

Figure 3.28 – I151D 32-B Cross Linking...... 123

Figure 4.1 – SDS-PAGE of T4 native and D132N RNase H expression ...... 127

Figure 4.2 – SDS-PAGE of T4 D132N RNase H cell lysis...... 128

Figure 4.3 – T4 D132N RNase H purification ...... 130

Figure 4.4 – D132N RNase H Dynamic Light Scattering Results ...... 133

Figure 4.5 – D132N RNase H SAXS Data Collection ...... 134

Figure 4.6 – D132N RNase H GNOM Plots...... 135

Figure 4.7 – D132N RNase H 3D SAXS Molecular Envelopes...... 136

Figure 4.8 – SDS-PAGE of T4 D132N ∆N RNase Expression and Lysis ...... 138

Figure 4.9 – T4 D132N ∆N RNase H Purification ...... 140

Figure 4.10 – T4 D132N ∆N RNase H Solubility Screen Results...... 142

Figure 5.1 – D132N RNase H + 32 Truncations Native Gel ...... 146

Figure 5.2 – D132N ∆N RNase H + 32 Truncations Native Gel...... 147

Figure 5.3 – DNA Substrates...... 148

Figure 5.4 – RNase H + 32 Truncations + DNA Substrates Gel Shift Assays .. 150

Figure 5.5 – Native RNase H + 32-B Initial Crystal Hits...... 156

Figure 5.6 – D132N RNase H + 32-B Crystal Hits after Screening...... 157

Figure 5.7 – D132N RNase H + 32-B Crystals after Optimization ...... 158

Figure 5.8 – D132N RNase H + 32-B Crystal Data Collection 1...... 161

Figure 5.9 – D132N RNase H + 32-B Crystal Data Collection 2...... 163

Figure 5.10 – SDS-PAGE Gel of the D132N RNase H + 32-B Crystals ...... 166

Figure 5.11 – Intact Mass Spectrum of the D132N RNase H + 32-B Crystals.. 167

Figure 5.12 – RNase H + 32-B Crystals MALDI-TOF Results ...... 168

Figure 5.13 - D132N RNase H + 32-B Crystal Used in Data Collection 3...... 174

Figure 5.14 - D132N RNase H + 32-B Crystal Data Collection 3 Images...... 174

Figure 5.15 – D132N RNase H + 32-B Model Building and Refinement ...... 177

Figure 5.16 – Final D132N RNase H + 32-B Model...... 179

Figure 5.17 – Domain Movement Observed upon Binding ...... 181

Figure 5.18 – Electrostatic Surfaces...... 182

Figure 5.19 – Superposition of a Fork DNA Substrate...... 183

Figure 5.20 – 32-B Mutated Residues ...... 185

Figure 5.21 – Dynamic Light Scattering Results for the D132N RNase H + 32-B Complex at 4 °C ...... 186

Figure 5.22 – D132N RNase H + 32-B SAXS Data Collection ...... 188

Figure 5.23 – D132N RNase H + 32-B SAXS Data Processing (GNOM)...... 189

Figure 5.24 – D132N RNase H + 32-B SAXS 3D Molecular Envelopes...... 190

Figure 5.25 – Best SASREF fit for the D132N RNase H + 32-B calculated vs. experimental data ...... 192

Figure 5.26 – Best SASREF Model for the D132N RNase H + 32-B Complex ...... 192

Figure 5.27 – CRYSOL fit for the D132N RNase H + 32-B theoretical vs. experimental data...... 193

Figure 5.28 – D132N RNase H + 32-B Complex Gel Filtration Assay ...... 197

Figure 5.29 – D132N RNase H + 32-B Complex Gel Filtration SDS-PAGE Gel ...... 198

Figure 5.30 – D132N RNase H + 32-B Isothermal Titration...... 201

Figure 5.31 – Fluorescence Anisotropy Titration Fork DNA Substrate ...... 203

Figure 5.32 – Fluorescence Anisotropy Titration of the 32-B + D132N RNase H + Fork DNA Complex ...... 205

Figure 5.33 – Fluorescence Anisotropy Titrations of the 32 Truncations + D132N RNase H + Fork DNA Complex...... 207

Figure 5.34 – DNA Substrates Used in the Ternary Complex Screens ...... 208

Figure 5.35 – D132N RNase H + 32-B + Fork DNA Crystals after Screening .. 211

Figure 5.36 – Location of the 32-B Mutated Residues at the Interface between D132N RNase H and 32-B ...... 212

Figure 5.37 – D132N RNase H + 32-B Mutants Native Gels...... 213

Figure 5.38 – Fluorescence Anisotropy Titration of the 32-B Mutants + D132N RNase H + Fork DNA Complex ...... 214

Figure 5.39 – D132N ∆N RNase H + 32-B Initial Crystals ...... 219

Figure 5.40 – D132N ∆N RNase H + 32-B Crystal Hits after Screening...... 220

Figure 5.41 – D132N ∆N RNase H + 32-B Crystal Optimization ...... 222

Figure 5.42 – D132N ∆N RNase H + 32-B Crystal Hits after Optimization ...... 222

Figure 5.43 – D132N ∆N RNase H + 32-B Mutants Native Gels ...... 224

Figure 5.44 – Fluorescence Anisotropy Titrations of the D132N ∆N RNase H + 32-B Mutants + Fork DNA Complex ...... 226

Figure 5.45 – D132N ∆N RNase H + 32 Protein Crystals after Screening...... 228

Figure 5.46 – Fluorescence Anisotropy Titrations of the 32 Truncations + D132N ∆N RNase H + Fork DNA Complex ...... 230

Figure 5.47 – D132N ∆N RNase H + 32 Core Crystals after Screening ...... 232

Figure 5.48 – D132N ∆N RNase H + 32 Core Crystals after Optimization ...... 232

Figure 6.1 – Bacteriophage Rb69 Genomic Map...... 237

Figure 6.2 – Sequence Alignment of T4 RNase H and Rb69 RNase H...... 238

Figure 6.3 – Sequence Alignment of T4 Native and Rb69 Truncated RNase H ...... 240

Figure 6.4 – pET 101 Insert of the Rb69 rnh Gene...... 241

Figure 6.5 – Rb69 RNase H Expression...... 241

Figure 6.6 – Rb69 RNase H Cell Lysis ...... 242

Figure 6.7 – Nucleotide and Amino-Acid C-terminus Sequence Alignment of T4 and Rb69 RNase H ...... 243

Figure 6.8 – Rb69 Full Length RNase H PCR Primers...... 244

Figure 6.9 – Agarose Gels for Rb69 RNase H Cloning ...... 250

Figure 6.10 – Rb69 RNase H (pET101) Expression...... 253

Figure 6.11 – Rb69 RNase H (pDEST-C1) Expression and Cell Lysis...... 255

Figure 6.12 – Rb69 RNase H purification scheme 1...... 258

Figure 6.13 – Rb69 Full Length RNase H purification scheme 2...... 263

Figure 6.14 – Rb69 Full Length RNase H purification scheme 3...... 268

Figure 6.15 – Sequence Alignment of T4 RNase H and Rb69 RNase H...... 272

Figure 6.16 – Site-Directed Mutagenesis PCR primers for Rb69 D132N RNase H Cloning ...... 273

Figure 6.17 – Agarose Gels for Rb69 D132N RNase H Cloning ...... 275

Figure 6.18 – Rb69 D132N RNase H Expression and Cell Lysis ...... 277

Figure 6.19 – Rb69 D132N RNase H purification...... 280

Figure 6.20 – TEV Protease Reaction Results ...... 284

Figure 7.1 – Amino-Acid Sequence of E. coli Dps...... 288

Figure 7.2 – Crystals of (BKC Tzi FEN-1) Dps and (BKC Ape FEN-1) Dps...... 289

Figure 7.3 – SDS-PAGE Gel of the Truncated Dps Samples ...... 291

Figure 7.4 – DLS Results for (SJT Ape FEN-1) Dps...... 293

Figure 7.5 – MALDI-TOF Mass Spectrometry Results ...... 294

Figure 7.6 – (BKC Ape FEN-1) Dps Crystals...... 296

Figure 7.7 – X-Ray Diffraction Images of the (BKC Ape FEN-1) Dps Crystals . 297

Figure 7.8 – MolRep vs. Phaser Solutions ...... 301

Figure 7.9 – Final Dps Model from the (BKC Ape FEN-1) Dps Crystals...... 302

Figure 7.10 – Final Dps Model from the (BKC Ape FEN-1) Dps Crystals...... 303

LIST OF ABBREVIATIONS

AEBSF...... 4-(2-aminoethyl)-benzenesulfonylfluoride

Afu ...... Archaeoglobus fulgidus

Ape ...... Aeropyrum pernix

APS ...... Advanced Photon Source

Bis-Tris HCl ...... 2,2-Bis(hydroxymethyl)-2,2’,2’’-nitrilotriethanol Hydrochloride

BME...... β-mercaptoethanol

bp ...... base pair

CAPS...... 3-(cyclohexylamino)-1-propanesulfonic acid

CC ...... Correlation Coefficient

CCD...... Charge-Coupled Device

CHES ...... Cyclohexyl-2-aminoethanesulfonic acid

COOT ...... Crystallographic Object-Oriented Toolkit

DLS ...... Dynamic Light Scattering

DNA...... Deoxyribonucleic Acid

dNTP ...... deoxyribonucleotide triphosphate

Dps ...... DNA binding protein from starved cells

dsDNA ...... double-stranded DNA

DTT ...... dithiothreitol

ε...... Extinction Coefficient

E. coli...... Escherichia coli

EDTA...... Ethylenediaminetetraacetic Acid

FA...... Fluorescence Anisotropy

FEN-1 ...... Flap Endonuclease 1

GOI...... Gene of Interest

HEPES ...... 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid

HPLC...... High Performance Liquid Chromatography

HSV-1...... Herpes Simplex Virus type 1

IDT...... Integrated DNA Technologies

IPTG ...... Isopropyl β-D-1-thiogalactopyranoside

ITC...... Isothermal Titration Calorimetry

kb...... kilobase

Kd...... Dissociation Constant

kDa ...... kilo Dalton

KOD...... Thermococcus kodakaraensis

LB ...... Luria-Bertani

LDS ...... Lithium dodecylsulfate

MES...... 4-morpholineethanesulfonic acid monohydrate

MPD ...... 2-Methyl-2,4-Pentanediol

MS ...... Mass Spectrometry

NIH ...... National Institutes of Health

OB-fold ...... Oligosaccharide/Oligonucleotide-Binding Fold

PCR...... Polymerase Chain Reaction

PEG...... PolyEthylene Glycol

PEI...... Polyethylene Imine

Pfu ...... Pyrococcus furiosus

pI ...... Isoelectric Point

PIPES...... Piperazinebis(ethanesulfonic) acid

Pol III ...... Polymerase III holoenzyme

Rh ...... Hydrodynamic Radius

RNA...... Ribonucleic Acid

RP-A...... Replication Protein A

SAXS...... Small-Angle X-Ray Scattering

SDS-PAGE...... Sodium Dodecylsulfate–Polyacrylamide Gel Electrophoresis

SEC ...... Size Exclusion Chromatography

SOC...... Super Optimal Catabolite Repression Broth

SSB ...... Single-Stranded DNA-Binding Protein

ssDNA ...... single-stranded DNA

TAE ...... Tris-Acetate-EDTA buffer

TAPS ...... N-Tris(hydroxymethyl)methyl-3-aminopropanesulfonic acid

TBE ...... Tris-Borate-EDTA buffer

TCEP...... Tris-2-carboxyethylphosphine

TE...... Tris-EDTA buffer

TEV ...... Tobacco Etch Virus

TOPO ...... Topoisomerase I

Tris HCl ...... Tris(hydroxymethyl)aminomethane Hydrochloride

Tzi...... Thermococcus zilligii

UV ...... Ultra Violet

WT...... Wild Type

CHAPTER 1 - Background

1.1. Bacteriophage T4 DNA Replication and Repair

1.1.1. Bacteriophage T4 is a Model for DNA Replication

Bacteriophage T4 is a virus that infects E. coli bacteria. It undergoes a lytic life cycle: after infecting the bacterial cell, it replicates its DNA genome using phage-encoded DNA replication proteins. Once the new DNA has been replicated and new phage particles have been assembled, the particles are released when the host E. coli cell bursts (Karam, 1994).

DNA replication takes place in the 5’ to 3’ direction. Because of that directionality, the synthesis of the leading strand happens in a continuous fashion, while the synthesis of the lagging strand is discontinuous: fragments of DNA, called Okazaki fragments, are synthesized separately and then joined together to form the new lagging DNA strand (Kornberg, 1992).

A model of the bacteriophage T4 DNA replication fork along with all the proteins involved in the process is presented in Figure 1.1. The T4 DNA replication proteins are named after their gene number, with the exception of RNase H.

Figure 1.1 – Bacteriophage T4 DNA Replication Fork

The parent DNA strands are shown in black, the newly synthesized daughter strands in red and the primers in green. The direction of DNA replication, from the 5’ end to the 3’ end, is indicated by the arrows. RNase H and the single-stranded DNA-binding 32 proteins are in red, as the work presented here focuses on the interaction between these two proteins.

The DNA duplex is unwound by the 5’ to 3’ DNA helicase (gene 41 protein), in purple. The hexameric helicase is loaded on the DNA by the helicase loading protein (gene 59 protein), in red. The leading strand synthesis involves only a limited number of proteins. The DNA polymerase (gene 43 protein), in gray, synthesizes the new DNA strand in the 5’ to 3’ direction. It also has a 3’ to 5’ exonuclease activity. The processivity of the DNA polymerase is increased by its interaction with the trimeric clamp protein (gene 45 protein), in green, which is itself loaded on the DNA substrate by the clamp loader proteins (gene 44 and 62 proteins), in orange. The single-stranded DNA is protected from nucleases and re-annealing by the single-stranded DNA-binding proteins (gene 32 proteins), in blue.

The synthesis of the lagging strand is somewhat more complicated. First, as mentioned previously, it is discontinuous and occurs through the synthesis of Okazaki fragments, which then need to be linked together to form a continuous strand of DNA. Moreover, the DNA polymerase cannot initiate new chains and therefore requires RNA primers to start the synthesis of each Okazaki fragment. The short RNA primers (5 nucleotides) are synthesized by the primase (gene 61 protein), in bright yellow. The DNA polymerase can then proceed from the primer and synthesize the Okazaki fragment. As on the leading strand, the single-stranded DNA of the lagging strand is protected by the 32 proteins. The RNA primers are removed by the 5’ to 3’ nuclease RNase H, in pink. Finally, after removal of the primers and subsequent repair of the gaps, the nicks between the Okazaki fragments are sealed by the DNA ligase (gene 30 protein), in light yellow. Another protein, not shown on the diagram, is the Dda helicase. Its role is to remove any other protein that might be blocking DNA replication. The coordination of DNA replication, for both the leading and the lagging strand, thus depends on the interactions between all of these proteins (Nossal, 1992; Nossal, 1994).

The bacteriophage T4 replication proteins are similar to the DNA replication proteins of higher organisms (Nossal, 1994), but the T4 system is much simpler, as each activity at the replication fork is performed by a separate protein. For instance, the E. coli Pol III holoenzyme is a complex made of ten types of subunits that has both polymerase and exonuclease activities, as well as the DNA-binding enhancing activity that is associated with the clamp and clamp loader proteins in T4 (Nossal, 1994). The relative simplicity of the bacteriophage T4 replication fork, compared to the eukaryotic or prokaryotic DNA replication systems, makes it a very good model for studying DNA replication. It is especially useful for studying the protein-protein interactions at the replication fork, and how these interactions are involved in coordinating the synthesis of the leading and lagging strands.

One of the important protein-protein interactions at the T4 DNA replication fork is the one between RNase H and the single-stranded DNA-binding protein, or 32 protein. The two proteins interact at the RNA primer location on the Okazaki fragment, as shown in Figure 1.2. The RNase H / 32 protein complex is especially important, as it appears to be a key player in the regulation of lagging strand synthesis. Indeed, RNase H alone can only remove a short oligonucleotide (one to four nucleotides) before falling off the replication fork. However, upon 32 protein binding its processivity is dramatically increased: the 32-assisted RNase H can then go through multiple rounds of DNA cleavage and is able to remove up to 50 nucleotides each time it binds to the DNA duplex (Bhagwat et al., 1997).

The two proteins will be introduced separately in the following sections.

Figure 1.2 – Interaction between T4 RNase H and the 32 Protein

The proteins and DNA are color-coded similarly to the ones in Figure 1.1.

(1) The polymerase / clamp complex synthesizes the Okazaki fragment on the left, and displaces the 32 proteins as it moves along the DNA strand.

(2) RNase H binds where the RNA primer is located.

(3) RNase H cleaves the RNA primer along with some of the adjacent DNA, and then falls off the DNA replication fork. The DNA polymerase displaces the remaining 32 proteins and fills the gap left by RNase H.

(4) The DNA ligase comes in and seals the nick between the two Okazaki fragments.

1.1.2. Bacteriophage T4 Single-Stranded DNA-Binding 32 Protein

The single-stranded DNA-binding 32 protein from bacteriophage T4 is the phage equivalent of SSB in E. coli or RP-A in humans (Nossal, 1994). It was first isolated by Bruce Alberts in 1970 (Alberts, 1970). It binds cooperatively to the single-stranded DNA during replication, in order to prevent it from re-annealing and to protect it from nucleases. It is also known as the “helix destabilizing protein” or “DNA melting protein” for its ability to lower the melting temperature of double-stranded DNA helices (Waidner et al., 2001). It also appears that the ssDNA decorated with 32 proteins is not extended but rather forms a compact structure (Chastain, 2003).

The 32 protein plays a very important role at the replication fork, especially in lagging strand DNA synthesis. It stimulates the assembly of the polymerase-clamp complex on the lagging strand (Nossal, 1992). When the helicase is loaded by the helicase-loading protein (59 protein), which is much more efficient than loading without the 59 protein, the 32 protein is required for leading strand synthesis (Jones et al., 2004). As mentioned before, the processivity of RNase H is greatly increased upon binding to the 32 protein (Bhagwat et al., 1997). The 32 protein is therefore central to the DNA replication fork, as it affects the efficiency of both leading and lagging strand synthesis. These roles can, however, only be fulfilled through interactions with other proteins at the replication fork.

The 32 protein is a metalloprotein that contains one Zn2+ ion per molecule of 32 (Giedroc et al., 1986). That Zn (II) atom has a structural role, which was shown by proteolysis studies (Giedroc et al., 1987) and proton NMR studies (Pan et al., 1989).

The domains of the 32 protein were identified by limited proteolysis experiments (Karpel, 1990). They are described below in Figure 1.3.

Figure 1.3 – T4 32 Protein Domains and Truncations

B Domain: residues 1 to 16; Core Domain: residues 17 to 253; A Domain: residues 254 to 301

• 32 Protein: amino acids 1 to 301
• 32 Core Protein: amino acids 17 to 253
• 32-B (32 minus B) Protein: amino acids 17 to 301
• 32-A (32 minus A) Protein: amino acids 1 to 253

The N-terminal or B (for basic) domain is responsible for the cooperativity of the 32 protein binding to DNA (Giedroc et al., 1991; Casas-Finet et al., 1992).

The C-terminus or A (acidic) domain plays a large role in the interaction of the 32 protein with other proteins at the replication fork, and has for instance been shown to bind to the T4 DNA polymerase (43 protein) (Hurley, 1993). It is not known, however, if the A domain is also involved in the RNase H interaction.

Finally, the core domain is responsible for ssDNA binding, but this DNA-protein interaction is enhanced in the presence of the A domain (Waidner et al., 2001).

Similarly, the C-terminus of the E. coli SSB protein was found to affect the binding to DNA (Williams, 1983) and this domain was disordered in the crystal structure, even in the presence of ssDNA (Savvides, 2004).

The crystal structure of the core domain of bacteriophage T4 32 protein was solved in 1995 by Yousif Shamoo (Shamoo et al., 1995). The model of the 32 core domain, shown below in Figure 1.4, shows that the protein has an overall OB-fold (oligosaccharide / oligonucleotide-binding fold).

Figure 1.4 – T4 32 Protein Core Domain Crystal Structure

The 32 core domain is colored as Jones’ rainbow, with the N-terminus in blue and the C-terminus in red. The Zn2+ ion is shown in gray. The figure was generated using PyMOL (DeLano and Lam, 2005).

The structure shows three main regions in the 32 core protein: subdomain I, subdomain II and a connecting region. Subdomain I binds the Zn2+ ion, while subdomain II and the connecting region form the DNA-binding cleft. This cleft is lined with positively charged residues on one side and hydrophobic residues on the other, allowing the single-stranded DNA to bind through electrostatic interactions with the phosphate backbone and hydrophobic interactions with the bases, respectively. Moreover, the crystals used to solve the structure did contain single-stranded DNA, for which only very weak electron density was observed, indicating that the binding of ssDNA is not sequence-dependent, and that the ssDNA can slide freely along that cleft (Shamoo et al., 1995).

1.1.3. Bacteriophage T4 RNase H

Bacteriophage T4 RNase H is a member of the FEN-1 family of replication and repair nucleases (Liu et al., 2004). It was first isolated in 1990 by Nancy Nossal (Hollingsworth and Nossal, 1991). RNase H is a 5’ to 3’ nuclease, with both exonuclease and endonuclease activities. It removes the five-nucleotide-long RNA primers and about thirty nucleotides of the adjacent DNA before the lagging strand Okazaki fragments are completed and ligated. RNase H acts as a 5’ to 3’ exonuclease on DNA/DNA or DNA/RNA duplexes, and as an endonuclease on fork and flap substrates. This endonuclease activity is necessary in case the RNA primer from one Okazaki fragment is displaced by the polymerase/clamp complex while it extends the next fragment. It was shown, however, that the 5’ to 3’ exonuclease activity is the most prominent at the T4 DNA replication fork (Bhagwat and Nossal, 2001). In other members of the FEN-1 family, such as the Aeropyrum pernix (Ape) FEN-1, the endonuclease activity is mostly responsible for the RNA primer processing. A comparison of the two mechanisms is presented in Figure 1.5.

Figure 1.5 – Exonuclease vs. Endonuclease Activity

(1) Exonuclease activity of RNase H: it removes the RNA primer and adjacent DNA before the polymerase/clamp complex reaches that particular Okazaki fragment.

(2) Endonuclease activity of FEN-1: the primer is first displaced by the polymerase/clamp complex, and FEN-1 can then act on the flap DNA that is created and remove the RNA primer and adjacent DNA. The cut is made one nucleotide short of the junction between dsDNA and ssDNA.

RNase H is known to interact with two other proteins at the replication fork: the 32 protein (single-stranded DNA-binding protein), which was described in the previous section, and the 45 protein (clamp protein). The 32 protein increases the processivity of the nuclease: upon 32 binding, RNase H can go through multiple cuts and remove an average of 30 nucleotides from each Okazaki fragment, while it can only remove a maximum of four nucleotides by itself (Bhagwat et al., 1997). The 32 protein also strongly inhibits the flap endonuclease activity of RNase H when it binds to the single strand of the flap substrate (Bhagwat et al., 1997). The processing of nicked or gapped substrates is stimulated by the binding of the 45 clamp protein (Gangisetty et al., 2005). It was shown that the 32 protein interaction occurs through the C-terminus of RNase H, and the interaction with the 45 protein through the N-terminus (Gangisetty et al., 2005).

The crystal structure of T4 RNase H was solved in 1996 by Timothy Mueser (Mueser et al., 1996). It is shown below in Figure 1.6.

Figure 1.6 – T4 RNase H Crystal Structure

The RNase H protein is colored as Jones’ rainbow, with the N-terminus in blue and the C-terminus in red. The two Mg2+ ions in the active site are shown in gray. The figure was generated using PyMOL (DeLano and Lam, 2005).

The structure was solved in the presence of magnesium, which is required for the nuclease activity of the protein. RNase H is composed of a small subdomain and a large subdomain containing the N- and C-termini. The active site is located in the groove in between. The bridge region (residues 89 to 97) connecting the two subdomains is disordered in the crystal structure, and so are the first eleven residues at the N-terminus. A closer look at the active site is given in Figure 1.7.

Figure 1.7 – T4 RNase H Active Site

[Figure labels: residues D19, Q22, D71, E130, D132, D155, D157 and K199; metal ions Mg1 and Mg2]

The magnesium ions are shown in black and the water molecules directly coordinated to the metal in yellow. The residues shown in orange are negatively charged (Asp and Glu residues), the one in green is neutral (Gln) and the one in blue positively charged (Lys). They are all involved in Mg2+ binding to some extent.

The two magnesium ions have an octahedral coordination sphere. The first one, Mg1, is directly coordinated to the D132 residue and five water molecules. Mg2 has an inner coordination sphere made only of water molecules. These water molecules are bound through a network of hydrogen bonds made with the carboxyl groups of aspartate and glutamate residues clustered in the active site. Many of these residues had previously been identified as important for catalysis, since they are conserved throughout the FEN-1 family of nucleases, and their roles have been deciphered by site-directed mutagenesis studies (Bhagwat et al., 1997). Other residues such as Q22 and K199 orient the carboxyl groups properly for the hydrogen bond interactions. Mg1 is important for the catalytic nuclease reaction, while Mg2 is a structural ion and plays a smaller role in catalysis (Mueser et al., 1996).

A number of aspartate mutants, including the D132N mutant, were designed as inactive versions of RNase H that could be used for protein-DNA studies (Bhagwat et al., 1997). The crystal structure of the D132N RNase H without magnesium bound has been solved (Tomanicek) and is presented below in Figure 1.8, along with the native RNase H crystal structure.

Figure 1.8 – T4 RNase H Native vs. Metal Free Crystal Structures

The native RNase H is shown in gray and the metal-free D132N mutant in blue. The figure was generated using PyMOL (DeLano and Lam, 2005).

The two structures are very similar, but the N-terminus and the bridge region are ordered in the metal-free structure. This is an interesting result, as it is usually expected that metalloproteins are more ordered in the presence of metal ions, which is clearly not the case for RNase H. Other experimental evidence further confirmed this result (Mueser, personal communication). This might be an indication that the magnesium ions are not bound to the protein when it is inactive, but rather come along with the DNA substrate and activate the protein upon substrate binding. This hypothesis is still under ongoing investigation.

Yet another crystal structure of T4 RNase H has been solved, this time complexed with a fork DNA substrate (Devos et al., 2007). This is shown in Figure 1.9.

Figure 1.9 – T4 RNase H + Fork DNA Crystal Structure

The figure was generated using PyMOL (DeLano and Lam, 2005)

1.1.4. Known Interactions between Nucleases and Single-Stranded DNA-Binding Proteins

Interactions between single-stranded DNA binding proteins and nucleases have been characterized in a number of different organisms.

The E. coli DNA replication system is one of the most studied, and the tetrameric E. coli SSB has been shown to interact with several E. coli nucleases. SSB was found to stimulate the activity of Polymerase II and the 3’ to 5’ Exonuclease I (Molineux and Gefter, 1975), and to bind to either one even in the absence of DNA. The Exonuclease I – SSB interaction was further characterized later on, as it appears that SSB also enhances the dRpase (DNA deoxyribophosphodiesterase) activity of the protein (Sandigursky, 1993), and that the two proteins interact through the SSB C-terminus (Genschel et al., 2000). E. coli SSB also interacts with and enhances the activity of the 5’ to 3’ RecJ exonuclease (Han et al., 2006).

In HSV-1 (Herpes Simplex Virus type 1), the 5’ to 3’ exonuclease alkaline nuclease, required for , performs strand exchange in association with the single-stranded DNA-binding protein ICP8 (Reuven et al., 2003).

1.1.5. Project Goals

One of the focuses of the Mueser lab is to study structure-specific recognition of fork and flap DNA substrates at the DNA replication fork. Bacteriophage T4 is a very good model system for that purpose, as it encodes all the proteins needed for DNA replication as separate entities, therefore making the system easier to study. The structures of most of these proteins have already been solved, so the next step is to look at protein-protein and protein-DNA complexes, in order to shed more light on how the different proteins involved at the replication fork come together to run and coordinate the DNA replication process.

One of the main interactions that occur at the replication fork is the one between the 5’ to 3’ exonuclease RNase H and the single-stranded DNA-binding 32 protein. This interaction appears to be one of the key players in keeping the lagging strand replication organized and efficient. As shown above, the structures of RNase H and the DNA-binding 32 core domain have been solved. The way RNase H binds to its fork DNA substrate is also known, thanks to the RNase H + DNA co-crystal structure. With this information in hand, the objective is to further characterize the interaction between RNase H and the 32 protein via structural and biophysical studies. The results from these studies, together with the RNase H + DNA interaction studies already available, should provide critical information on the organization of the lagging strand replication of bacteriophage T4. As seen in the previous section, the interactions between single-stranded DNA-binding proteins and nucleases are found in a variety of life forms, so the information obtained from the T4 system should be applicable to the DNA replication systems of higher organisms.

The work done on the RNase H – 32 protein interaction is described in Chapters 3 through 6 of this dissertation.

1.2. Escherichia coli DNA-binding Protein from Starved Cells

Bacteria are constantly facing various changes in their environment, and conditions such as low pH, oxidative stress or nutrient limitations can impair their chances of survival. Therefore, they have developed a number of mechanisms to enable them to survive under these stressful conditions. For instance, when E. coli enters the stationary phase, it starts expressing a non-specific DNA-binding protein, Dps (DNA-binding protein from starved cells), also called PexB (Grant et al., 1998). Dps has been shown to protect DNA from oxidative damage (Martinez and Kolter, 1997; Ilari et al., 2002), and to induce compaction of the genomic DNA during the transition from the exponential to the stationary growth phase, which would also provide protection to the DNA (Azam and Ishihama, 1999).

The crystal structure of Dps was solved by Grant and coworkers (1998). The Dps monomer folds as a four-helix bundle, which then associates as a dodecamer to form a sphere-like structure measuring 90 Å in diameter, with a 45 Å hollow core. The ribbon structures of the monomer and dodecamer are shown in Figure 1.10.

Dps appears to be a structural analogue of ferritin, an iron storage protein, which also forms a four-helix bundle that associates as a 24-mer into a hollow sphere. The ferritin 24-mer is 120 Å in diameter; its central core is 80 Å in diameter and can contain up to 4000 iron atoms. It has a ferroxidase site that allows it to oxidize Fe2+ ions into Fe3+ ions that can then be incorporated into the ferrihydrite core. Because of the structural similarities, it has been proposed that Dps protects DNA from oxidative damage by storing Fe2+ ions, therefore preventing them from generating hydroxyl free radicals through the Fenton reaction that could then create single- and double-strand breaks. It has since been shown that Dps also contains a ferroxidase center and can store up to 400 iron atoms inside its hollow core (Ilari et al., 2002). This provides an original way of protecting DNA from free radicals by preventing their formation, as opposed to oxidative repair proteins such as catalase or superoxide dismutase, which remove active oxygen species after they appear.

Figure 1.10 – Ribbon structure of Dps

a – Dps monomer; the sodium ion is shown in black, the N-terminus is in blue and the C-terminus in red. b – Dodecameric structure of Dps (12 monomers). PDB: 1dps (Grant et al., 1998). The figures were generated using PyMOL (DeLano and Lam, 2005).

The crystal structure does not provide any insight into the mechanism of DNA binding by Dps, as the surface of the sphere is negatively charged. However, the lysine-rich N-terminus of the protein was disordered in the crystals and is not present in the model. Once it is modeled in, it can be seen that within the two-dimensional hexagonal crystal lattice, each N-terminus is brought together with two neighboring N-termini to line the solvent channels of the crystal, which are then positively charged with nine lysine residues and can accommodate a DNA helix. As a result, the overall structure of Dps bound to DNA would be several hexagonal sheet-like structures, with DNA threaded through the solvent channels (Grant et al., 1998). Indeed, N-terminal deletion studies showed that the N-terminus plays an important role in self-aggregation and DNA binding activities, but not in preventing oxidative damage (Ceci et al., 2004). Finally, Azam and coworkers showed that Dps binds DNA ranging from 40 to 64 base pairs with a dissociation constant Kd of 172 - 178 nM (Azam and Ishihama, 1999).

The work done on the E. coli Dps protein is described in Chapter 7 of this dissertation.

CHAPTER 2 - Methodology

2.1. Molecular Cloning

2.1.1. Polymerase Chain Reaction (PCR)

The primers for the PCR reaction are designed according to the expression vector that is to be used later on. For the different proteins that were cloned for this project, the pET101 (Invitrogen) and the pDEST-C1 (Horanyi et al., 2006) expression vectors were used. The PCR product can be inserted directly in the pET101 vector, but in the case of pDEST-C1 it has to be inserted in the pENTR-D (Invitrogen) entry vector first. Maps of all these vectors are available in Figure 2.2. Both pET101 and pENTR-D vectors use directional TOPO-assisted cloning, and therefore require a CACC overhang on the 5’-end of the gene that is to be inserted. This CACC overhang has to be added to the forward primer. Another consideration is that the pDEST-C1 vector inserts an N-terminal His-Tag. To allow cleavage of that tag after purification, it is recommended to insert a TEV protease cleavage site between the CACC overhang and the gene-coding sequence. A schematic overview of how to design the forward and reverse primers is provided in Figure 2.1. The annealing part of the primers should be extended in such a way that the melting temperature is around 55 °C.

Figure 2.1 – Primer Design Scheme

Forward Primer – pDEST-C1 insertion

GOI      5’- ATG NNN… -3’
primer   5’- C ACC GAG AAC CTC TAC TTC CAA GGA ATG NNN… -3’
         5’- C ACC GAG AAC CTC TAC TTC CAA GGA ATG NNN… -3’

Forward Primer – pET101 insertion

GOI      5’- ATG NNN… -3’
primer   5’- C ACC ATG NNN… -3’
         5’- C ACC ATG CTG NNN… -3’

Reverse Primer (Inverse Complement)

GOI      5’- …XXX TAA -3’
primer   3’- …NNN ATT -5’
         5’- TTA NNN… -3’

The CACC overhang is colored in red, the TEV protease cleavage site in blue. The ATG start codon is highlighted in green and the TAA (can also be TGA) stop codon in red.
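The primer layout of Figure 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the example open reading frame and the use of the simple Wallace rule (2 °C per A/T, 4 °C per G/C) to estimate the melting temperature are not taken from the protocol, which does not specify how the melting temperature is computed. The CACC overhang and the TEV site sequence are those shown in the figure.

```python
# Hypothetical helpers illustrating the primer layout of Figure 2.1.
CACC = "CACC"                       # overhang needed for directional TOPO cloning
TEV_SITE = "GAGAACCTCTACTTCCAAGGA"  # TEV protease site as written in Figure 2.1
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough melting temperature in deg C (Wallace rule; an assumption here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def annealing_region(template, target_tm=55, min_len=12):
    """Extend from the 5' end of template until the estimated Tm reaches target_tm."""
    for length in range(min_len, len(template) + 1):
        if wallace_tm(template[:length]) >= target_tm:
            return template[:length]
    return template  # template too short to reach the target Tm


def design_primers(orf, vector, target_tm=55):
    """Return (forward, reverse) primers, both written 5' to 3'."""
    orf = orf.upper()
    if vector == "pDEST-C1":
        prefix = CACC + TEV_SITE  # TEV site sits between CACC and the start codon
    elif vector == "pET101":
        prefix = CACC
    else:
        raise ValueError("unknown vector: " + vector)
    forward = prefix + annealing_region(orf, target_tm)
    reverse = annealing_region(reverse_complement(orf), target_tm)
    return forward, reverse


# Made-up open reading frame, used only to exercise the functions.
EXAMPLE_ORF = "ATGAAAGCTGGTCTGCGTGAACTGCCGGTTGACCGTAAATAA"

if __name__ == "__main__":
    for vector in ("pDEST-C1", "pET101"):
        fwd, rev = design_primers(EXAMPLE_ORF, vector)
        print(vector, fwd, rev)
```

The reverse primer is simply the start of the reverse complement of the gene, so a gene ending in TAA gives a primer starting with TTA, as in the figure. A nearest-neighbor melting temperature calculation would be more accurate than the Wallace rule for primers of this length.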

The primers are ordered from Integrated DNA Technologies (IDT). Upon reception, they are redissolved in 1X TE buffer (20 mM Tris HCl pH 8.0 + 1 mM EDTA) and a final primer solution at 10 µM is prepared. The PCR reaction is then set up as described in Table 2.1. The polymerase that is usually used in this step is the ProofStart DNA Polymerase (Qiagen), but in some cases, such as for a longer gene that requires improved fidelity, the PfuUltra DNA Polymerase (Stratagene) can be used as well. The annealing temperature is calculated as the melting temperature + 2 °C.

Table 2.1 – PCR Reaction Setup

PCR Reaction                               PCR Program
Polymerase Buffer      1X                  Activation         95 °C
Primer solution        2 µM                Denaturation       95 °C
dNTPs                  10 mM               Annealing          Annealing temp.
MgSO4                  2.5 mM              Extension          68 to 70 °C
Polymerase (1 U/µL)    1 µL                N cycles
Template               1 µL                Final Extension    Extension temp., 5 to 20 minutes
Autoclaved water       Final vol. 50 µL

The concentrations indicated are final concentration in the PCR reaction. Autoclaved MQ water is added so that the final volume of the reaction is 50 µL. The temperatures and times used in the PCR program depend on the polymerase used and the length of the gene being amplified.

After the PCR reaction is done, the PCR product is run on a 1 % agarose gel (see Section 2.1.5), along with the appropriate DNA ladder (100 bp or 1 kb linear DNA ladder at this stage). If the product has the correct size, it is then gel purified using the MinElute kit (Qiagen).

2.1.2. Insertion of the PCR Product into an Entry or Expression Vector

After the PCR product has been purified, it is inserted into the vector of choice using the Gateway® technology. The entry (pENTR-D) and expression (pET101 and pDEST-C1) vectors used for this particular project are presented in Appendix 1.

The PCR product is first inserted in the vector using directional TOPO-assisted cloning. Topoisomerase I is already covalently linked to the commercially available vector. In the case of pDEST-C1, the gene is swapped from the entry vector to the expression vector by a recombination reaction, using the LR Clonase enzyme mix.

The way the different reactions are set up is described in the following tables. The reactions are then incubated at room temperature for 30 minutes to 2 hours. After the LR Clonase reaction is performed, the LR Clonase enzymes have to be digested. This is done by adding 1 µL of Proteinase K to the reaction and incubating at 37 °C for 10 minutes.

Table 2.2 – Vector Insertion Reaction Setup

pENTR-D Insertion Reaction
Purified PCR product            2 µL
Salt Solution                   1 µL
pENTR-D vector                  1 µL
Autoclaved water                2 µL

pET101 Insertion Reaction
Purified PCR product            2 µL
Salt Solution                   1 µL
pET101 vector                   1 µL
Autoclaved water                2 µL

LR Clonase Reaction
Gel Purified GOI in pENTR-D     1 µL
pDEST-C1                        1 µL
LR Clonase Mix                  2 µL
TE buffer (1X)                  6 µL

The 1X TE buffer is made of 20 mM Tris HCl pH 8.0 + 1 mM EDTA.

After the insertion reaction is done, the plasmid is transformed into competent E. coli cells (see Section 2.1.4) and amplified before being run on an agarose gel to check the success of the reaction.

2.1.3. Site-Directed Mutagenesis

The site-directed mutagenesis protocol described here is similar to the QuikChange® (Stratagene) protocol.

Contrary to regular gene amplification by PCR, both primers here are centered on the codon that is to be mutated. The mutation is introduced by modifying the nucleotides of interest directly in the primers. These primers will therefore be able to anneal completely with the gene, except at the site of the mutated nucleotides. A scheme of the site-directed mutagenesis process is given in Figure 2.2. The modified primers are annealed on the template plasmid, and the mutated plasmid is amplified by PCR reaction. Finally, the original template is recognized as methylated DNA and digested using an appropriate restriction enzyme.

Figure 2.2 – Site-Directed Mutagenesis Scheme

[Scheme panel labels: template plasmid; annealing of the mutant primers and DNA replication; mutated and template plasmids; mutated plasmid]

The gene of interest that needs to be mutated is shown in red, the mutated gene in green and the rest of the vector in black. (1) The mutated primers are annealed on the template plasmid. The mutated nucleotides do not anneal with the template gene. (2) The PCR amplification reaction is performed, after which both the initial plasmid and the mutated one are present in solution. (3) The template plasmid, methylated as it was isolated from bacteria, is digested using the DpnI restriction enzyme. Only the mutated plasmid is left in solution, and it can now be transformed into competent cells.

The primers are designed so that the annealing temperature (calculated according to the QuikChange manual) is close to 75 °C. They are also ordered from Integrated DNA Technologies, and dissolved in TE buffer upon reception.

The polymerase used in the site-directed mutagenesis PCR reaction is the high fidelity Hot Start KOD DNA Polymerase (Novagen), since it is more efficient in replicating large pieces of DNA like plasmids than the ProofStart DNA Polymerase used previously. The PCR reaction is set up as described in Table 2.3.
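As an illustration of the primer temperature calculation mentioned above, the sketch below uses the formula commonly quoted from the QuikChange manual, Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch, where N is the primer length in bases. The formula itself is an assumption here, since the text does not reproduce it, and the function name and example primer are hypothetical; the manual should be consulted for the exact expression and its conditions of use.

```python
def quikchange_tm(primer, n_mismatches):
    """Estimate the primer Tm in deg C with the commonly quoted QuikChange formula.

    Assumed formula: Tm = 81.5 + 0.41 * (%GC) - 675 / N - (%mismatch),
    with %GC and %mismatch expressed as whole-number percentages.
    """
    primer = primer.upper()
    n = len(primer)
    percent_gc = 100.0 * (primer.count("G") + primer.count("C")) / n
    percent_mismatch = 100.0 * n_mismatches / n
    return 81.5 + 0.41 * percent_gc - 675.0 / n - percent_mismatch


if __name__ == "__main__":
    # Made-up 30-mer with 50 % GC content and a single mismatched base.
    example_primer = "AG" * 15
    print(round(quikchange_tm(example_primer, 1), 1))
```

With this formula, longer primers and a higher GC content raise the estimate, while each additional mismatched base lowers it, which is why mutagenic primers are usually longer than ordinary amplification primers.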

Table 2.3 – Site-Directed Mutagenesis PCR Reaction

PCR Reaction                              PCR Program
KOD buffer (10X)             5 µL         Activation         95 °C, 2 minutes
Forward primer (2.5 µM)      6 µL         Denaturation       95 °C, 20 seconds
Reverse primer (2.5 µM)      6 µL         Annealing          60 °C, 10 seconds
dNTPs (2 mM)                 5 µL         Extension          70 °C, 2 minutes
MgSO4 (25 mM)                5 µL         20 cycles
KOD Polymerase (1 U/µL)      1 µL         Final extension    70 °C, 20 minutes
Template                     1 µL
Autoclaved water             21 µL
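The volumes in Table 2.3 can be checked with a short calculation: they should add up to a 50 µL reaction, and each stock is diluted by its volume fraction. The sketch below uses only the numbers from the table; the variable and function names are illustrative.

```python
# Stock concentration and volume (µL) for each component of Table 2.3.
COMPONENTS = {
    "KOD buffer (X)": (10.0, 5.0),
    "Forward primer (µM)": (2.5, 6.0),
    "Reverse primer (µM)": (2.5, 6.0),
    "dNTPs (mM)": (2.0, 5.0),
    "MgSO4 (mM)": (25.0, 5.0),
}
# Components listed by volume only.
OTHER_VOLUMES_UL = {"KOD Polymerase": 1.0, "Template": 1.0, "Autoclaved water": 21.0}


def total_volume_ul():
    """Sum of all volumes pipetted into the reaction."""
    return sum(vol for _, vol in COMPONENTS.values()) + sum(OTHER_VOLUMES_UL.values())


def final_concentrations():
    """Final concentration of each component: stock * volume / total volume."""
    total = total_volume_ul()
    return {name: stock * vol / total for name, (stock, vol) in COMPONENTS.items()}


if __name__ == "__main__":
    print("Total volume (µL):", total_volume_ul())
    for name, conc in final_concentrations().items():
        print(name, round(conc, 3))
```

Under these numbers the reaction contains 1X buffer, 0.3 µM of each primer, 0.2 mM dNTPs and 2.5 mM MgSO4 in a total of 50 µL.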

After the reaction, the PCR product is incubated with the DpnI restriction enzyme for one hour at 37 °C. The mutated plasmid is then ready for transformation.

2.1.4. Transformation into Competent E. coli Cells

There are two types of cell lines: the cloning host, used to amplify plasmids, and the expression host, for protein production. Each plasmid has to be transformed first into a competent cloning host. Once it has been assessed that the gene is inserted correctly in the vector, the expression plasmid can then be transformed into a competent expression host. The different cell lines used in this project are described in Table 2.4.

Table 2.4 – Cloning and Expression Hosts

Cloning Hosts     Expression Hosts
TOP10             BL21 (DE3) Star
XL10              BL21 (DE3) Gold
DH5α              BL21 (DE3) pLysS
Omnimax           BL21 (DE3) RILP
                  RosettaBlue™ (DE3)
                  T7 Express lacIq

The plasmid of interest (2 µL for cloning hosts, 0.5 to 0.8 µL for expression hosts) is added to 25 to 50 µL of competent cells. The mixture is incubated on ice for 30 minutes, heat-shocked at 42 °C for 30 seconds and incubated on ice again for a few minutes. SOC medium (250 µL) is then added to the cells, which are incubated at 37 °C for one hour. Finally, the cells can be plated on LB Agar (35 g/L) + antibiotics plates, or grown directly in liquid LB medium (25 g/L) + antibiotics. A list of the antibiotics required by the vectors and/or cell lines is given in Table 2.5. The plates or cell cultures are incubated at 37 °C until growth is achieved.

When the cells have been plated and colonies obtained, some of these colonies are picked and grown overnight in LB + antibiotics media. The cells are then harvested and the plasmid isolated using the MiniPrep kit from Qiagen. The plasmid can then be run on an agarose gel to check for insertion of the gene.

2.1.5. Agarose Gel Electrophoresis

The 1 % agarose gel is prepared by boiling Agarose MB (Midsci) in 20 mM Tris-Acetate pH 8.0, 1 mM EDTA (TAE) buffer, which is then poured in the gel box, and the comb is placed on top of the gel. It is left on the bench at room temperature to cool down, then at 4 °C overnight. The samples are prepared by mixing 2 µL of the DNA with 1 µL of Gel Loading Solution (0.25 % Bromophenol Blue + 40 % glycerol). Once the samples are pipetted inside the wells, the gel is run at 90 V for 90 to 120 minutes. It is then stained in 0.01 % SYBR Gold nucleic acid gel stain (Invitrogen) for 30 minutes, and visualized using a UV transilluminator and a 515 nm emission filter.

2.1.6. Overview of the Molecular Cloning Process

A schematic summary of the molecular cloning process is given below in Figure 2.3.

Figure 2.3 – Summary of the Molecular Cloning Process

[Flowchart with three routes. (1) Gene amplification by PCR, insertion in pENTR-D, transformation in cloning host, LR Clonase reaction (insertion in pDEST-C1), transformation in cloning host, transformation in expression host, small scale expression studies, gene sequencing. (2) Gene amplification by PCR, insertion in pET101, transformation in cloning host, transformation in expression host, small scale expression studies, gene sequencing. (3) Site-directed mutagenesis PCR reaction, transformation in cloning host, transformation in expression host, small scale expression studies, gene sequencing. Legend entries (asterisk markers): agarose gel, SDS-PAGE gel.]

2.2. Protein Expression

2.2.1. Small Scale Expression Studies

Once an expression plasmid has been obtained and successfully transformed into an expression host, small scale expression studies are carried out to confirm that the protein is indeed expressed.

After transformation, 100 µL of the transformed cells are added to 10 mL of Luria Broth culture media at 25 g/L and 1 mM of the required antibiotics (see Table 2.5), and grown overnight at 37 °C and 180 rpm in a New Brunswick Scientific Innova 4000 Incubator Shaker. The next morning, 500 µL of the overnight cell culture are taken and used to inoculate a fresh solution of 10 mL of LB at 25 g/L and 1 mM antibiotics. After the optical density at 600 nm (OD600) reaches 0.4 to 0.6, 1 mL of the culture is taken, mixed with 1 mL of 50 % glycerol, and flash frozen on dry ice. This glycerol stock of cells is stored at -80 °C and can be used later on to inoculate cell cultures for this particular protein preparation. At OD600 = 0.6, 1 mM of IPTG is added to induce protein expression. A 1 mL sample for SDS-PAGE gel electrophoresis is also taken out to check that no leaky expression of the protein of interest is occurring before IPTG induction (0 hour sample). After three hours, another 1 mL SDS-PAGE sample is taken to check for protein expression (3 hour sample).

Table 2.5 – Antibiotics Required by the Vectors / Cell Lines

Vector                   Antibiotic
pET 21a                  Ampicillin
pENTR-D                  Kanamycin
pET101                   Ampicillin
pDEST-C1                 Streptomycin
32 proteins plasmids     Ampicillin

Cell Line                Antibiotic
TOP10                    Streptomycin
XL10                     /
DH5α                     /
Omnimax                  Tetracycline
BL21 (DE3) Star          /
BL21 (DE3) Gold          /
BL21 (DE3) pLysS         Chloramphenicol
BL21 (DE3) RILP          Chloramphenicol
RosettaBlue™ (DE3)       Chloramphenicol and Tetracycline
T7 Express lacIq         Tetracycline

The antibiotics required by each vector are shown in the first table, and the ones required by each cell line in the second table. The first four cell lines (TOP10 to Omnimax) are cloning hosts, the other ones are expression hosts.

Antibiotic          Stock Concentration     Final Concentration in Cell Culture
Ampicillin          25 mg/mL                25 µg/mL
Kanamycin           30 mg/mL                30 µg/mL
Streptomycin        50 mg/mL                200 µg/mL
Tetracycline        5 mg/mL                 5 µg/mL
Chloramphenicol     34 mg/mL                17 - 34 µg/mL
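The stock and final concentrations in Table 2.5 translate into pipetting volumes through a simple dilution calculation. The sketch below is illustrative only: the function name is hypothetical, and for chloramphenicol the upper end of the 17 - 34 µg/mL range is used as an arbitrary choice.

```python
# (stock in mg/mL, final in µg/mL), taken from Table 2.5.
ANTIBIOTICS = {
    "Ampicillin": (25.0, 25.0),
    "Kanamycin": (30.0, 30.0),
    "Streptomycin": (50.0, 200.0),
    "Tetracycline": (5.0, 5.0),
    "Chloramphenicol": (34.0, 34.0),  # upper end of the 17 - 34 µg/mL range
}


def stock_volume_ul(antibiotic, culture_ml):
    """Volume of stock (µL) needed to reach the final concentration in culture_ml."""
    stock_mg_per_ml, final_ug_per_ml = ANTIBIOTICS[antibiotic]
    stock_ug_per_ul = stock_mg_per_ml  # 1 mg/mL is the same as 1 µg/µL
    total_ug = final_ug_per_ml * culture_ml
    return total_ug / stock_ug_per_ul


if __name__ == "__main__":
    for name in ANTIBIOTICS:
        print(name, stock_volume_ul(name, 10.0), "µL per 10 mL culture")
```

Most stocks in the table are 1000X, so 10 µL of stock per 10 mL of culture is enough; streptomycin is the exception at 250X, which corresponds to 40 µL per 10 mL.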

Once the SDS-PAGE gel electrophoresis confirms that the protein is expressed, large scale protein expression can be done, as described in the following section. On the other hand, if no protein expression can be seen, it might be necessary to transform the expression plasmid into a different expression host, or clone the gene in a different expression vector. If expression has been confirmed, the gene is also sent for sequencing to the Plant-Microbe Genomics Facility at Ohio State University.

2.2.2. Large Scale Protein Expression

After confirmation that the protein of interest is expressed in the chosen expression host, large scale expression can take place. Indeed, protein crystallization trials require large amounts of protein and large scale preparations are necessary.

Three Erlenmeyer flasks containing 100 mL of 25 g/L LB medium and 1 mM of the antibiotics needed are inoculated with a small amount of frozen cells taken from a glycerol stock stored at -80 °C. The cells are then grown overnight at 37 °C and a shaking speed of 180 rpm. The next day, each of the six flasks containing 1 L of 25 g/L LB medium and 1 mM antibiotics is inoculated with 50 mL of the overnight cell culture. The 6 L of cell culture are then left to grow at 37 °C and 180 rpm until the optical density at 600 nm (OD600) reaches around 0.6. At that point, protein expression is induced by adding 1 mM IPTG or 1 mM nalidixic acid, depending on the protein. Also, 1 mL of the cell culture can be taken to make a glycerol stock of the expression host. After induction, the cells are left in the shaker for another three hours. As in the small scale expression studies, 0 hour and 3 hour samples are taken to check by SDS-PAGE that the protein is expressed. The cells are then harvested by centrifugation at 5,000 rcf for 15 minutes using a Beckman Coulter™ TJ-25 centrifuge, after which the supernatant is discarded. Finally, the cell pellet is stored at -20 °C.

2.2.3. SDS-PAGE Gel Electrophoresis

Protein expression samples previously mentioned have to be first centrifuged on a bench-top centrifuge (5417C from Eppendorf), at 10,000 rcf for a few minutes. The supernatant is discarded, and the pellet resuspended in 50

µL of BugBuster™ Protein Extraction Reagent (Novagen). Another 50 µL of 2X

NuPage™ LDS Sample Buffer (Invitrogen) are then added. If samples are to be prepared from a protein solution, the 2X NuPage™ LDS Sample Buffer is added to the protein solution in a 1:1 volume ratio. The samples are boiled for 10 minutes, centrifuged at 10,000 rcf for a few minutes before being loaded on a

NuPage™ 4-12 % Bis-Tris gel. The gel is run in 1X NuPage™ MES SDS

Running Buffer, in a XCell SureLock™ gel box, at 200 V for 35 minutes.

Once the gel has finished running, it is stained in Coomassie® G250 stain

(Bio-Rad) overnight, then destained in a 30 % methanol + 10 % acetic acid

destaining solution. Finally, the gel is dried in a 30 % methanol + 5 % glycerol

drying solution.

2.3. Cell Lysis and Protein Solubility

The cell pellet stored at -20°C is thawed out in lysis buffer. The lysis buffer

composition depends on the protein, and 10 mL are used for every gram of cells.

Small amounts of lyophilized hen egg white lysozyme (Sigma) and AEBSF serine protease inhibitor (USB) are then added to the solution, which is stirred at room temperature for 20 minutes. The cells are kept on ice and lysed by sonication with a Branson™ Sonifier 250. The soluble portion, or lysate, is then separated from the insoluble cell debris pellet by centrifugation at 4 °C, at 18,000 rpm for 30 minutes. Samples of the lysate and pellet are run on an SDS-PAGE gel to check for solubility of the protein of interest.

If the protein is insoluble and found in the pellet, solubility studies can be done. The salt content of the lysis buffer can be varied from no salt to 1 M salt, or a different pH or buffer can be used. BugBuster™ can also be used to extract the protein. The protein can be expressed at a lower temperature (17 °C) to slow down the rate of protein folding. Yet another option is to use a different expression host.

2.4. Protein Purification (Huber, 2000)

Protein purification was done using a BioLogic DuoFlow™ HPLC system

(Bio-Rad) controlled by the BioLogic DuoFlow™ software version 5.0. The HPLC

system is placed inside a 4 °C cabinet. The purification process is monitored by

following the absorbance at 260 nm and 280 nm, as well as the conductivity. The

different columns used for this work and the chemistry of each resin are

presented in Appendix 2. The purification protocols vary from one protein to

another. The isoelectric point is first calculated using the ExPASy tool (Gasteiger

et al., 2003). If the pI is acidic, anion-exchange chromatography is used, and if the pI is basic, cation-exchange chromatography is used. Size exclusion chromatography is used as a polishing step to further clean a sample that has already been purified. Metal affinity, hydrophobic interaction and hydroxyapatite chromatography are used in special cases.
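The choice of the first ion-exchange step from the calculated pI can be written as a small decision rule. The sketch below is only an illustration: the function name and the default working pH of 7.0, used as the boundary between "acidic" and "basic", are assumptions and are not taken from the protocols of this work.

```python
def choose_ion_exchange(pi, buffer_ph=7.0):
    """Suggest an ion-exchange mode from the isoelectric point (pI).

    buffer_ph is the working pH of the purification buffer (assumed 7.0 here).
    """
    if pi < buffer_ph:
        # Below its working pH the protein is net negative: it binds an anion exchanger
        return "anion-exchange"
    if pi > buffer_ph:
        # Above its working pH the protein is net positive: it binds a cation exchanger
        return "cation-exchange"
    # pI equal to the buffer pH: no net charge, another buffer pH should be chosen
    return "undetermined"


# Hypothetical pI values, for illustration only
print(choose_ion_exchange(5.2))
print(choose_ion_exchange(9.0))
```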

Two buffers are needed for most purification steps, a low salt buffer (buffer A) and a high salt buffer used for elution (buffer B). The resin has to be washed and equilibrated in the corresponding buffer A before the protein sample can be loaded. If the protein sample happens to be a lysate, it also has to be filtered before loading. After the protein has been loaded, impurities are washed away with buffer A. The elution run is then started by slowly increasing the amount of buffer B. Fractions are collected and the progress of the run is monitored by UV absorbance. The fractions suspected to contain the protein are then run on an SDS-PAGE gel. Once it is known which fractions contain the protein of interest, these fractions are pooled to be run on the next column, or dialyzed and concentrated if the protein is pure. Any leftover impurity on the resin is washed away with buffer B or a cleaning buffer, and the resin is then stored in 20 % ethanol until the next use.

2.5. Protein Preparation

2.5.1. Dialysis

The protein is dialyzed in Slide-A-Lyzer® dialysis cassettes (Pierce) that can accommodate up to 12 mL of protein solution. When larger volumes have to be dialyzed, SnakeSkin® Pleated Dialysis Tubing (Pierce) is used. Two pore sizes for the dialysis membrane are available, 3,500 or 10,000 MWCO. The protein is left to dialyze, at slow stirring speed, at 4 °C or room temperature for a few hours, and the dialysis buffer is changed twice.

2.5.2. Protein Concentration

Small volumes of protein are concentrated in Microcon™ concentrators (Millipore), with a molecular weight cutoff of 3,500 Da (YM-3, yellow) or 10,000 Da (YM-10, green). For larger volumes, Amicon™ Ultra-4 or Ultra-15 (4 mL or 15 mL) centrifugal filter devices (Millipore) are used. The protein sample is spun at 3,000 rpm until the desired concentration is reached.

Protein concentration is calculated by measuring the absorbance at 280 nm with an Agilent 8453 UV-Visible Spectrophotometer (Agilent Technologies).

The protein sample may have to be diluted with buffer in order to get an absorbance reading between 0.1 and 1, which corresponds to the linear range of measurement. The concentration is calculated by multiplying the absorbance measured by the dilution factor and then dividing by the extinction coefficient of the protein, which was calculated with the ExPASy ProtParam tool (Gill and von

Hippel, 1989; Gasteiger et al., 2003).
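The concentration calculation described above can be sketched in a few lines. This is a minimal illustration, assuming a 1 cm path length; the function name, the extinction coefficient and the molecular weight used in the example are hypothetical and do not correspond to any protein of this work.

```python
def protein_concentration(a280, dilution_factor, extinction_coeff, path_cm=1.0):
    """Return the molar concentration of the undiluted protein sample.

    a280             -- absorbance at 280 nm measured on the diluted sample
    dilution_factor  -- e.g. 10 for a 1:10 dilution
    extinction_coeff -- molar extinction coefficient in M^-1 cm^-1
    path_cm          -- cuvette path length in cm (assumed to be 1 cm)
    """
    if not 0.1 <= a280 <= 1.0:
        # Readings outside 0.1 to 1 fall outside the linear range of measurement
        raise ValueError("A280 outside the linear range; adjust the dilution")
    return a280 * dilution_factor / (extinction_coeff * path_cm)


# Hypothetical values, for illustration only
molar = protein_concentration(a280=0.5, dilution_factor=10, extinction_coeff=25000)
mg_per_ml = molar * 35000  # hypothetical molecular weight of 35,000 g/mol
print(f"{molar * 1e6:.0f} µM, {mg_per_ml:.1f} mg/mL")
```

Multiplying the molar concentration by the molecular weight in g/mol gives the concentration in g/L, which is numerically equal to mg/mL.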

Finally, the protein sample is filtered with an Ultrafree®-MC centrifugal unit,

0.45 µm pore size (Millipore).

2.5.3. Solubility Screen

The solubility screen is used to find an optimum buffer condition, in order to enhance the protein solubility and therefore increase the chances of obtaining diffraction quality crystals (Collins et al., 2004; Izaac et al., 2006). The screen tests six different chloride salts, six sodium salts and four buffers, as shown in Table 2.6, at either 100 mM or 1 M concentration.

Table 2.6 – Solubility Screen Solutions

Cations    Anions           Buffers
NH4Cl      Na Formate       Na MES pH 5.6
NaCl       Na Acetate       Na PIPES pH 6.5
KCl        Na Cacodylate    Na HEPES pH 7.5
LiCl       Na Sulfate       Na TAPS pH 8.5
MgCl2      Na Phosphate
CaCl2      Na Citrate

The protein is first precipitated by dialysis against deionized water or addition of polyethylene glycol (PEG). The precipitated solution is aliquoted into 16 Eppendorf tubes and spun down at 20,000 rcf for 5 minutes. The supernatants are removed and kept as a control. Next, 25 µL of each salt or buffer listed in Table 2.6 is added to the tubes, the precipitate is resuspended and the solutions are incubated at room temperature for 20 minutes, before being centrifuged again. The protein concentration in the supernatant, corresponding to the amount of protein that went back into solution, is measured using the Bio-Rad protein assay, based on the Bradford assay (Bradford, 1976). For this assay, 995 µL of the 1X Bio-Rad protein assay reagent are mixed with 5 µL of each supernatant, and absorbance at 595 nm is measured with the UV-visible spectrophotometer.


The salt and buffer conditions corresponding to the highest solubility values are then chosen as the optimum buffer conditions.

2.6. Protein Crystallization

Preliminary crystal screening is done using the Honeybee 963 crystallization robot from Genomic Solutions, in the Ohio Crystallography

Consortium, located in the Instrumentation Center. Corning™ trays (Hampton

Research) are used when screening only one protein, and three-well Greiner™

trays (Hampton Research) can be used with up to three different protein samples per tray. These trays use the sitting drop vapor diffusion technique. A list of the

screens available in the lab is shown in Table 2.7. Each screen contains 96 different crystallization conditions. The Honeybee robot transfers 100 µL of each condition into the wells of the Corning™ or Greiner™ tray, then transfers 0.5, 1 or

2 µL, depending on the program that is used, inside the sitting drop depression.

The same volume of protein is dispensed in each sitting drop depression. Finally, the tray is manually sealed with clear tape, and stored at room temperature in a cabinet or at 4 °C in a cold room. The results can be manually recorded with a

Nikon SMZ1500 microscope and pictures of the crystals taken with a Nikon

CoolPix™ 990 digital camera, or automatically using the DCA Rhombix Imager

(Kendro) located in the Ohio Crystallography Consortium as well. Typically,

results are recorded after one day, three days, one week and every week after

that if some wells are still clear. To make sure the crystals growing are protein crystals and not salt crystals, 0.5 µL of Izit™ Crystal Dye is added to the drop and left to react for 30 minutes. The blue Izit™ dye only fits in the solvent channels of protein crystals and turns them blue, while the solvent channels in salt crystals are too small and the salt crystals remain colorless.

Table 2.7 – Crystal Screens

Screen                     Type                                                                    Origin
Crystal Screen I / II ™    Sparse Matrix                                                           Hampton Research
Wizard I / II ™            Random Sparse Matrix                                                    Emerald BioSystems
Cryo I / II ™              Sparse Matrix with Cryoprotectants                                      Emerald BioSystems
Index ™                    Combination of Grid Screen, Sparse Matrix, and Incomplete Factorial     Hampton Research
Natrix ™                   Sparse Matrix for Nucleic Acids                                         Hampton Research
Salt Rx ™                  High Salt Sparse Matrix                                                 Hampton Research
MembFac ™                  Sparse Matrix for Membrane Proteins                                     Hampton Research
PEG Ion Screen             PEG vs. Salt Matrix                                                     In-House
Additive Screen            PEG vs. Additive Matrix                                                 In-House

Once crystal hits are obtained from the crystal screening, these hits have

to be optimized to grow diffraction quality crystals, which are very rarely obtained

directly from the crystal screens. Twenty-four-well expansion trays are set up in

VDX™ (Hampton Research) or Nextal™ (Qiagen) plates, using the hanging drop

vapor diffusion technique this time. For each crystal hit condition, the amount of precipitating agent, the amount of salt, the pH and other parameters can be varied, and this is done using the A/B gradient technique (Senger and Mueser, 2005). Only two solutions have to be prepared, solution A corresponding to the lower end of the gradient, and solution B corresponding to the higher end. Pipetting maps are available for

4x6, 2x12 and 1x24 setups. The A and B crystallization solutions are poured inside each well according to the pipetting map with an EDP 10 mL programmable pipette (Rainin). The tray is then placed on a stirring device for a few minutes. It should be noted that in the case of VDX™ trays, the edges of the well also have to be greased. Next, 1 µL or 2 µL of the protein are pipetted onto a

22 mm siliconized glass cover slide for the VDX™ tray, or on the screw-in crystallization support provided with the Nextal™ tray, and 1 or 2 µL of the well solution is added to the protein drop. Finally, the glass cover slides are sealed upside down on top of the wells using tweezers, and the Nextal™ crystallization supports simply screwed on each well. Sitting drop vapor diffusion can also be used. In this case, the protein drop is placed in a polypropylene Micro-Bridge®

(Hampton Research), which is then placed in the well of a VDX™ tray containing the crystallization solution. The wells are covered with clear tape. The advantage of the sitting drop method is that it allows for a larger protein drop, which might be needed to obtain large enough crystals. The trays are also stored at either room temperature or 4 °C, and results are recorded in the same way as for the crystal screens after a few days.
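The principle of an A/B gradient can be illustrated with a short calculation of the volumes of solutions A and B to mix in each well. The sketch below is not the published pipetting map: it assumes a 1x24 setup, evenly spaced (linear) steps from pure A to pure B, and a hypothetical well volume of 1,000 µL.

```python
def ab_gradient(n_wells=24, well_volume_ul=1000.0):
    """Return (well number, volume of A, volume of B) for a linear A-to-B gradient.

    Both the linear spacing and the default well volume are assumptions
    made for this illustration.
    """
    if n_wells < 2:
        raise ValueError("a gradient needs at least two wells")
    rows = []
    for i in range(n_wells):
        fraction_b = i / (n_wells - 1)          # 0 in the first well, 1 in the last
        vol_b = round(well_volume_ul * fraction_b, 1)
        vol_a = round(well_volume_ul - vol_b, 1)
        rows.append((i + 1, vol_a, vol_b))
    return rows


for well, vol_a, vol_b in ab_gradient():
    print(f"well {well:2d}: {vol_a:7.1f} µL A + {vol_b:7.1f} µL B")
```

The first well then contains only solution A (lower end of the gradient) and the last well only solution B (higher end).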

2.7. X-Ray Diffraction Data Collection

2.7.1. Crystal Cryoprotection and Freezing

Once crystals are obtained, they have to be cryoprotected and flash

frozen prior to exposure to X-Rays for data collection (Rodgers, 1994).


A list of cryoprotectants is available in Table 2.8. Crystals grown in polyethylene glycol or other polyalcohols are cryoprotected with organic cryoprotectants, while crystals grown in high salt conditions are typically cryoprotected with high salt cryoprotectants. The crystals are removed from the drop using a nylon CryoLoop™ (Hampton Research), and rinsed for a few minutes in the substitute mother liquor, which is composed of the crystallization condition as well as the dialysis condition of the protein before it was crystallized.

The crystals are then placed in a drop containing the substitute mother liquor and the amount of cryoprotectant required to get an amorphous glass upon freezing (see Table 2.8), and quickly picked up from the drop and plunged in liquid nitrogen.

Table 2.8 – Cryoprotectants

Organic Cryoprotectants                               High Salt Cryoprotectants
Name                              Concentration       Name                Concentration
PEG 400                           25 – 35 % (v/v)     Sodium Chloride     3.0 M (5.0 M)
Ethylene Glycol                   11 – 30 % (v/v)     Sodium Nitrate      5.0 M (7.0 M)
2-Methyl-2,4-Pentanediol (MPD)    20 – 30 % (v/v)     Sodium Malonate     2.0 M (3.5 M)
Glycerol                          13 – 30 % (v/v)     Sodium Formate      4.0 M (7.0 M)
Glucose                           25 % (w/v)          Lithium Sulfate     2.0 M (2.5 M)
Xylitol                           22 % (w/v)          Lithium Nitrate     4.0 M (8.0 M)
                                                      Lithium Chloride    2.5 M (10.0 M)

The list of organic cryoprotectants is available in (Rodgers, 1994). The first molarity given for the high salt cryoprotectants is the minimum molarity required to obtain an amorphous glass freeze; the molarity in parentheses is the maximum solubility. It should be noted that NaCl and NaNO3 gave lower quality freezes than the other salts.


It is possible that the crystal cannot tolerate the cryoprotection solution, even for a few seconds, and will start degrading, cracking or dissolving. In that case, another cryoprotectant should be used. Another option is to incrementally increase the amount of cryoprotectant, from none to the concentration needed, and soak the crystal at each step for a few minutes.

Also, an alternative to flash freezing the crystal by plunging it into liquid nitrogen is to flash freeze the crystal directly in the nitrogen stream at 100 K while mounting the crystal on the diffractometer goniometer. The crystal can also be flash frozen in a helium stream between 15 and 20 K. The helium freeze is thought to reduce the lattice disruption that occurs while freezing.

2.7.2. Data Collection

X-Ray diffraction data collection can be done on the in-house FR-E Rigaku Diffractometer at the Ohio Crystallography Consortium, in the Instrumentation Center. However, when a more intense X-Ray source is needed, the in-house diffractometer is only used to screen crystals and data are collected at the Advanced Photon Source (APS) synchrotron, at Argonne National Laboratory. Two beamlines were used for the present work, the Bio-CARS 14-BM-C beamline and the Ser-CAT 22-ID-D beamline.

2.8. Data Processing and Structure Determination

2.8.1. Data Processing

Once the dataset has been collected, it has to be processed and a

reflection file written before phasing can be done. Processing consists of several steps: indexing of the data to a particular space group, integration of all the

reflections and finally data reduction and merging of the equivalent reflections,

and scaling of the different resolution bins. The resolution is usually cut at the

scaling stage so that the Rmerge value is below 15 %.

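$$R_{merge} = 100 \times \left[ \sum_{hkl} \sum_{i=1}^{n} \left| F^{2}_{hkl,i} - \langle F^{2}_{hkl} \rangle \right| \right] \Big/ \sum_{hkl} \sum_{i=1}^{n} F^{2}_{hkl,i}$$    Equation 2.1

where $F^{2}_{hkl,i}$ is the intensity of the i-th measurement of each hkl reflection and $\langle F^{2}_{hkl} \rangle$ is the mean value of the n measurements of equivalent reflections.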
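As an illustration of Equation 2.1, Rmerge can be computed from a list of repeated intensity measurements per reflection. The sketch below is not the implementation used by HKL2000, d*TREK or SCALA; the function name, the data layout and the intensities are hypothetical.

```python
def r_merge(observations):
    """Compute Rmerge (in %) from repeated intensity measurements.

    observations -- dict mapping each hkl index (h, k, l) to the list of
                    intensities measured for its equivalent reflections
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        # Sum of absolute deviations from the mean for this reflection
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        # Sum of all measured intensities for this reflection
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


# Hypothetical intensities, for illustration only
data = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 50.0, 56.0],
}
print(f"Rmerge = {r_merge(data):.2f} %")
```

With these made-up values the deviations sum to 18 and the intensities to 366, giving an Rmerge of about 4.9 %.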

For this particular project, HKL2000 (Otwinowski and Minor, 1997), d*TREK

(Pflugrath, 1999) or MOSFLM (Leslie, 1992) were used for data processing. In

the case of MOSFLM processing, the scaling step is done using the program

SCALA in the CCP4 program suite (Bailey, 1994).

2.8.2. Phasing

For this particular project, molecular replacement phasing can be used

since the structures of the separate RNase H and 32 core domain proteins have

already been solved. All the molecular replacement programs used here,

AMoRe, MolRep and Phaser (McCoy, 2007), are part of the CCP4 program suite

(Bailey, 1994). They all carry out rotation and translation searches as well as

rigid body refinement. AMoRe and MolRep output an R value and a correlation coefficient (CC), in order to assess the quality of the solution. CC is defined as:

$$CC = \sum \Delta F_{obs}\, \Delta F_{calc} \Big/ \left[ \sum (\Delta F_{obs})^{2} \sum (\Delta F_{calc})^{2} \right]^{1/2}$$    Equation 2.2

The R value should be low and the correlation coefficient high. Phaser, a maximum-likelihood phasing program, outputs log-likelihood gains (LLG) as well as translation (TFZ) and rotation (RFZ) Z scores. The TFZ value should be higher than 8, and the RFZ value higher than 5 for an acceptable solution. All programs output a coordinate file with the solution after an initial rigid body refinement.

2.8.3. Model Building

The coordinate file from molecular replacement and the reflection file

obtained after the initial rigid body refinement are fed into the model building

program COOT (Emsley, 2004). The electron density is calculated from the

reflection file as follows:

$$\rho(x,y,z) = \frac{1}{V} \sum_{hkl} F_{hkl} \cos\left[ 2\pi (hx + ky + lz - \varphi_{hkl}) \right]$$    Equation 2.3

In COOT, the model can be adjusted to fit the electron density map better using the different building and refinement tools available.

2.8.4. Structure Refinement and Validation

After a round of model building, the coordinate file from COOT is refined

against the reflection file using the refinement program REFMAC (Bailey, 1994;

Murshudov, 1997), in the restrained refinement mode. In this mode, the bonds

and angles from the model are refined against the REFMAC dictionary of allowed

bond lengths, angles and atomic sizes. How much freedom is allowed compared

to the dictionary is input in the weight term: the higher the weight, the less

REFMAC will try to follow the dictionary and the more it will try to fit atoms in the

electron density. If the weight term is too high and the electron density map quality low, bonds can be broken, resulting in a model that does not make chemical sense.

REFMAC outputs an R and Rfree value after the refinement process is

completed. The R value is the error between the model and the data at that stage, and the Rfree the same error on the amount of data (usually 5 %) that was

left out of the refinement process. The Rfree value should be around 5 % higher

than the R value.

The refined model is then fed into COOT once again, the model adjusted

after which another cycle of refinement is done. This process should be repeated

until an R value of around 20 % is reached. For low resolution models, a slightly

higher R value is acceptable. The water molecules are added last, using the ARP/wARP tool in REFMAC.

The final model is then validated using PROCHECK (Laskowski, 1993) in

the CCP4 program suite.


2.8.5. Summary of the Data Processing and Model Building Process

A summary of all the steps needed from data collection to model building

is given in Figure 2.4.

2.9. Non-Denaturing Gel Electrophoresis

Non-denaturing or native gel electrophoresis is used to study the

interactions between proteins in their native state. The pH of the running buffer

has to be chosen according to the pI of the proteins that are to be run. Indeed, if the pH of the solution is too close to the pI, the protein will not move out of the well.

For these studies, a pH of 6.5 was chosen for the running buffer.

A 0.6 % agarose gel is prepared by boiling Agarose MB (Midsci) in 40 mM

Bis-Tris Acetate pH 6.5, 1 mM EDTA (TAE) buffer. The agarose is poured onto a

GelBond® film (Amersham Biosciences), and the comb is placed in the middle of

the gel. The agarose is left to solidify at room temperature first, then at 4°C

overnight. The protein samples are usually prepared at 50 µM or 100 µM. The protein is mixed with the TAE running buffer so that the sample contains 5 µL at the chosen concentration. Then, 1 µL of Gel Loading Solution and 4 µL of 50 %

glycerol are added. The gel is run at 50 V for 2 to 5 hours. At the end of the run, it

is transferred to a staining box, and is first fixed with a 0.05 % SDS solution for

30 minutes to denature the proteins, before being stained with SYPRO orange

(Staining solution: 20 µL of SYPRO orange protein gel stain (Invitrogen) in a

7.5 % acetic acid + 10 % methanol solution). The gel is visualized under UV light.

Figure 2.4 – X-Ray Diffraction Data Analysis Scheme

Data Collection
• Program available at the beam line

Auto-Indexing / Integration
• HKL2000
• D*TREK
• MOSFLM

Data Reduction and Merging
• HKL2000
• D*TREK
• SCALA

Molecular Replacement Phasing
• MolRep
• AMoRe
• PHASER (Maximum likelihood)

Initial Rigid Body Refinement
• Molecular Replacement Programs
• REFMAC

Model Building
• COOT

Refinement
• REFMAC

Structure Validation
• PROCHECK

2.10. Scattering Studies

2.10.1. Dynamic Light Scattering (DLS)

DLS requires samples at low concentration, with a minimal amount of

aggregates floating in solution. Therefore, the DLS samples are prepared around

1 mg/mL, filtered using a 0.1 µm pore size Ultrafree®-MC filter from Millipore, and

finally centrifuged for 20 minutes at 18,000 rcf. The measurements were taken using the DynaPro-Titan DLS instrument from Wyatt Technology in Dr. Viola’s

lab, and analyzed using the Dynamics version 6.7.3 software. The DLS cuvette

has to be extremely clean, with water background counts below 20. The protein sample is then injected in the cuvette and the counts recorded. The DLS instrument has a temperature control, allowing the user to make measurements

at 4 °C as well. The final graph is reported in terms of % mass versus

hydrodynamic radius Rh. The hydrodynamic radius is calculated using the

Stokes-Einstein equation:

$$R_h = \frac{kT}{6\pi\eta D}$$    Equation 2.4

where k is the Boltzmann constant, T the temperature in K, η the solvent

viscosity and D the diffusion constant. More background information on light

scattering is available in (Tanford).
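A worked example of the Stokes-Einstein relation (Equation 2.4) is sketched below. It is an illustration rather than the calculation performed by the Dynamics software: the function name is an assumption, as are the default temperature of 20 °C and the viscosity of water at that temperature, and the diffusion constant in the example is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def hydrodynamic_radius_nm(diffusion_m2_s, temperature_k=293.15, viscosity_pa_s=1.002e-3):
    """Return the hydrodynamic radius in nm from the Stokes-Einstein equation.

    diffusion_m2_s -- translational diffusion constant D in m^2/s
    temperature_k  -- temperature in K (assumed 20 °C)
    viscosity_pa_s -- solvent viscosity in Pa.s (assumed: water at 20 °C)
    """
    radius_m = BOLTZMANN * temperature_k / (6 * math.pi * viscosity_pa_s * diffusion_m2_s)
    return radius_m * 1e9


# Hypothetical diffusion constant, for illustration only
print(f"Rh = {hydrodynamic_radius_nm(1.0e-10):.2f} nm")
```

A diffusion constant of 1 x 10^-10 m²/s in water at 20 °C corresponds to a hydrodynamic radius of roughly 2.1 nm; halving D doubles the radius.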

The program also calculates the molecular weight corresponding to each

peak. The polydispersity is a good estimate of the homogeneity of the protein

sample: a sample that has a polydispersity of 15 % or lower is considered

homogeneous.


2.10.2. Small-Angle X-Ray Scattering (SAXS) (Koch et al., 2003)

SAXS data can only be collected on a homogeneous protein sample, so usually DLS experiments are performed first to check the status of the protein in solution. A typical sample for SAXS has a concentration of 3 to 5 mg/mL.

The SAXS data collection in this study was carried out at the Advanced Photon Source at Argonne National Laboratory near Chicago, at the ChemMat-CARS 15-ID beamline.

The program used for data collection is 15-ID SAXS/WAXS v.3.294. A number of images have to be collected for the buffer, used as a blank, and the protein, for different exposure times. The best exposure time is then chosen and the corresponding images averaged. The data is plotted as the intensity versus the momentum transfer q, described by the following equation:

$$\text{Momentum Transfer } q = \frac{4\pi \sin\theta}{\lambda}$$    Equation 2.5

The momentum transfer q is related to the resolution:

$$\text{Resolution (Å)} = \frac{2\pi}{q}$$    Equation 2.6
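The two relations above can be checked numerically with the short sketch below. It assumes that θ is half of the scattering angle and that the wavelength is given in Å, so that q is in Å⁻¹; the function names and the numerical values are hypothetical.

```python
import math


def momentum_transfer(theta_deg, wavelength_angstrom):
    """Return q in 1/Å, with theta taken as half the scattering angle (assumption)."""
    return 4 * math.pi * math.sin(math.radians(theta_deg)) / wavelength_angstrom


def resolution_angstrom(q):
    """Return the real-space resolution in Å corresponding to a momentum transfer q."""
    return 2 * math.pi / q


# Hypothetical values, for illustration only
q = momentum_transfer(theta_deg=0.5, wavelength_angstrom=1.0)
print(f"q = {q:.4f} 1/Å, resolution = {resolution_angstrom(q):.1f} Å")
```

Combining the two equations gives a resolution of λ / (2 sin θ), which is Bragg's law; small angles therefore correspond to low resolution (large distances).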

Once the data has been collected, it is processed using a number of programs that belong to the ATSAS suite of SAXS programs (Petoukhov, 2007).

First, a regularization and reduction program such as GNOM or PRIMUS is used, to evaluate the size and shape of the particles in solution. PRIMUS only considers low q data, while GNOM takes into account higher resolution data as well. These programs use the momentum transfer plot to calculate a size distribution function p(r), which is then used to calculate the extrapolated intensity

at q = 0, I(0), and the radius of gyration of the particle in solution, Rg. All these terms are defined as follows:

$$p(r) = \frac{1}{2\pi^{2}} \int_{0}^{\infty} q\,r\,I(q) \sin(qr)\,dq$$    Equation 2.5

$$I(0) = 4\pi \int_{0}^{D_{max}} p(r)\,dr$$    Equation 2.6

$$R_g^{2} = \frac{\int r^{2} p(r)\,dr}{2 \int p(r)\,dr}$$    Equation 2.7

The overall shape of the size distribution plot also gives an estimate of the overall shape of the particle: a Gaussian function corresponds to a spherical particle, while a function that tails off at higher radius corresponds to an elongated particle (Koch et al., 2003).
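The definitions of I(0) and Rg given above can be illustrated by integrating a p(r) function numerically. The sketch below is not the algorithm used by GNOM; it uses a simple trapezoidal rule and, as a test case, the known analytical p(r) of a homogeneous sphere, for which Rg² = 3R²/5. The function names and the sphere radius of 10 Å are assumptions made for this example.

```python
import math


def trapezoid(xs, ys):
    """Integrate y(x) with the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2 for i in range(len(xs) - 1))


def i0_and_rg(r, p):
    """Return I(0) and Rg from sampled values of the distribution function p(r)."""
    area = trapezoid(r, p)
    second_moment = trapezoid(r, [ri ** 2 * pv for ri, pv in zip(r, p)])
    i0 = 4 * math.pi * area
    rg = math.sqrt(second_moment / (2 * area))
    return i0, rg


def sphere_pr(r, radius):
    """Unnormalized p(r) of a homogeneous sphere (zero beyond Dmax = 2 * radius)."""
    if r < 0 or r > 2 * radius:
        return 0.0
    x = r / radius
    return r ** 2 * (1 - 0.75 * x + x ** 3 / 16)


radius = 10.0  # hypothetical sphere radius in Å
r_grid = [i * 0.01 for i in range(2001)]  # 0 to Dmax = 20 Å
p_grid = [sphere_pr(r, radius) for r in r_grid]
i0, rg = i0_and_rg(r_grid, p_grid)
print(f"Rg = {rg:.3f} Å (expected {math.sqrt(0.6) * radius:.3f} Å for a sphere)")
```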

The reduced data can then be used in different ways:

A 3D molecular envelope can be calculated with ab initio programs such as DAMMIN or GASBOR. DAMMIN only uses low resolution data, so the resolution has to be cut off while running GNOM. DAMMIN can be run in several modes, the best one being the “keep” mode, as it outputs all the possible models.

These models can then be averaged with the program DAMAVER.

If a crystal structure is available and only part of the protein of interest is unknown, the missing domain can be built and added to the known structure with the programs CHADD or CREDO.

When trying to dock two proteins with known structures, a rigid body modeling program called SASREF is available. It fits the atomic models together so that the final shape of the complex can account for the scattering data that is observed.

Finally, CRYSOL outputs calculated scattering data from a coordinate file.

This is a good way of comparing scattering data, obtained from a solution based

experiment, to X-Ray diffraction data, obtained from a crystal.

Other programs are available in the ATSAS suite of programs but were not used in this particular study.

A summary of the different steps of data processing and analysis is presented in Figure 2.5.

Figure 2.5 – SAXS Data Analysis Scheme

Data Collection
• 15-ID SAXS / WAXS program

Data Reduction and Regularization
• PRIMUS (low q range, Guinier plot)
• GNOM (entire q range)

Ab Initio Modeling
• DAMMIN (low q range, low resolution)
• GASBOR (entire q range, higher res.)

Building of Missing Fragments
• CREDO (chain of dummy atoms)
• CHADD (same as CREDO, starts at the terminal residue)

Model Averaging
• DAMAVER

Rigid Body Modeling
• SASREF

Computation of Scattering Data from an Atomic Model
• CRYSOL

2.11. Isothermal Titration Calorimetry (ITC) (Pierce et al., 1999)

ITC is used to evaluate the thermodynamic parameters of binding of two species in solution, in this case the binding of one protein to another. The two proteins have to be dialyzed in exactly the same buffer. As far as concentrations go, the protein that is used as a titrant (in the syringe) has to be roughly 20 times more concentrated than the one that is being titrated (in the cell). A total of four runs have to be performed: a buffer into buffer run to get the heat of injection, a buffer into titrated protein run and a titrant protein into buffer run to get the heats of dilution, and finally the protein into protein run. The heats of injection and dilution are then subtracted from the heats obtained in the final run to obtain the actual heat of binding.

The VP-ITC Instrument from MicroCal® in Dr. Funk’s lab was used in this particular study. Forty injections of 10 s were made, of 5 µL each except the first one which was 1 µL, with a 5 minute pause in between injections to allow the instrument to adjust the temperature of the cell. The protein mixture in the cell was stirred with the injection syringe at 270 rpm. The temperature of the cell was maintained at 20 °C, but this value can be modified to suit the experiment.

The data was analyzed using the program Origin 7.0 for ITC. After data processing, including removing the first couple of data points and subtracting the heats of injection and dilution, the titration curve obtained is fit using either a one-site binding or two-site binding model. The heat of the reaction is obtained from the integration of the peaks and the thermodynamic parameters of the reaction calculated using the following equations:

∆G = ∆H − T∆S Equation 2.8

∆G = −RT ln K Equation 2.9

The ITC experiment gives access to the enthalpy ∆H, the entropy ∆S, the association constant K and the stoichiometry N of the reaction.
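Equations 2.8 and 2.9 can be combined to obtain ∆G and ∆S from a fitted association constant and enthalpy. The sketch below only illustrates this arithmetic and is not the fitting procedure of Origin; the function name is an assumption and the values of K and ∆H in the example are hypothetical. The default temperature corresponds to the 20 °C cell temperature mentioned above.

```python
import math

R_GAS = 8.314462618  # gas constant in J/(mol K)


def binding_thermodynamics(k_assoc, delta_h, temperature_k=293.15):
    """Return (delta_g, delta_s) from the association constant and the enthalpy.

    k_assoc -- association constant K in M^-1
    delta_h -- binding enthalpy in J/mol
    Returns delta_g in J/mol and delta_s in J/(mol K).
    """
    delta_g = -R_GAS * temperature_k * math.log(k_assoc)   # Equation 2.9
    delta_s = (delta_h - delta_g) / temperature_k           # Equation 2.8 rearranged
    return delta_g, delta_s


# Hypothetical fit results, for illustration only
dg, ds = binding_thermodynamics(k_assoc=1.0e6, delta_h=-40000.0)
print(f"dG = {dg / 1000:.1f} kJ/mol, dS = {ds:.1f} J/(mol K)")
```

An association constant of 10^6 M⁻¹ at 20 °C corresponds to a ∆G of about -33.7 kJ/mol, so a ∆H of -40 kJ/mol would imply an unfavorable entropy term.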


2.12. Fluorescence Anisotropy Titration

Fluorescence Anisotropy titrations are another way to obtain dissociation

constants of binding of one species to another. The titrated molecule, usually

DNA, has to be labeled with a fluorophore. In this case HEX-Fluorescein was

chosen. The excitation wavelength of HEX is 535 nm, and the emission wavelength 556 nm. The fluorescent tag was attached to the fork DNA substrate

presented below in Figure 2.6. To measure the binding of one protein to another, the first protein has to be bound to the DNA substrate before the other protein is titrated in.

Figure 2.6 – DNA Substrate Used in the Fluorescence Anisotropy Titrations

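[Figure: schematic of the fork DNA substrate, showing the 5’ and 3’ ends of each strand, segment lengths of 5, 15, 15 and 15 nucleotides, and the position of the HEX label (*).]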

The relative concentrations of the proteins and DNA substrate depend on

the estimated dissociation constant: if the Kd is estimated to be in the micromolar

range, then all the samples need to be prepared at a micromolar concentration.

The concentrations therefore vary from one titration to another. The titrant protein is added in such a way that 50 % binding is achieved after three or four additions, and the final concentration of the titrant in the cell is about ten times that of the titrated protein/DNA sample. This is to ensure that 90 % or more of the titrated sample is bound at the end of the titration, resulting in a better estimate of the dissociation constant.
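The design rule above can be checked with a simple calculation of the fraction of the titrated species that is bound at a given titrant concentration. The sketch below assumes a simple 1:1 binding equilibrium, which is a simplification and not the model fitted in DynaFit; the function name and the concentrations are hypothetical.

```python
import math


def fraction_bound(titrated_total, titrant_total, kd):
    """Return the fraction of the titrated species in complex for 1:1 binding.

    All three arguments must be expressed in the same concentration unit.
    The complex concentration is the physical root of
    C^2 - (P + L + Kd) C + P L = 0.
    """
    b = titrated_total + titrant_total + kd
    complex_conc = (b - math.sqrt(b * b - 4 * titrated_total * titrant_total)) / 2
    return complex_conc / titrated_total


# Hypothetical case: sample prepared at the Kd (1 µM), titrant brought to 10 µM
print(f"{100 * fraction_bound(1.0, 10.0, 1.0):.0f} % bound")
```

With the titrated sample prepared at a concentration equal to the Kd and the titrant brought to ten times that concentration, about 90 % of the titrated species is in complex.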

A fluorometer from Photon Technology International was used for this

work, and it is found in Dr. Dignam’s lab on the Health Science Campus of the

University of Toledo. The data was collected using the program Felix, and

analyzed using DynaFit version 3.28.058 (Kuzmic, 1996).

2.13. DNA Purification and Annealing

2.13.1. DNA Oligomer Purification

The lyophilized DNA oligomers are purchased from Integrated DNA

Technologies (IDT). Typically, 1 to 2 µmoles are purchased for each oligomer.

Upon reception, the DNA is redissolved in 2 mL of autoclaved water. Each

oligomer is then purified by anion-exchange chromatography on a

BioCAD/SPRINT Perfusion Chromatography System, with a Poros HQ column.

The buffers are composed of 10 mM ammonium hydroxide and 200 mM (buffer

A) or 3 M (buffer B) ammonium acetate. The Poros HQ column is equilibrated

with buffer A, the DNA sample is then loaded onto the column, eluted with a

linear salt gradient and collected with a fraction collector. The absorbance is monitored at 296 nm. The fractions containing the pure DNA oligomer are kept, and after addition of 1 µL of 100X Tris HCl, EDTA buffer for each milliliter of DNA

solution, they are concentrated down overnight at medium heat in a Speed-Vac

concentrating system with a refrigerated condensation trap. The next day, each

DNA pellet is resuspended in 50 µL of autoclaved water. Samples are taken from

the different fractions to check the purity of the oligomer on a 20 % Tris, boric acid, EDTA (TBE) polyacrylamide gel (Invitrogen) (see Section 2.13.3). Once it has been established that the DNA purification was successful, the fractions are pooled together. Absorbance at 260 nm is measured with the UV-Visible Spectrophotometer and the concentration of each oligomer is calculated.
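The concentration calculation from the absorbance at 260 nm follows the Beer-Lambert law. The sketch below is only an illustration: the extinction coefficient, dilution factor and absorbance reading are hypothetical values, since the actual coefficients depend on each oligomer sequence and are not listed here.

```python
def oligo_concentration_uM(a260, extinction_per_M_cm, dilution_factor=1.0, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), corrected for the dilution.

    Returns the concentration of the undiluted stock in micromolar.
    """
    if extinction_per_M_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a260 * dilution_factor / (extinction_per_M_cm * path_cm)
    return molar * 1e6


# Hypothetical reading: A260 = 0.45 for a 100-fold dilution of the stock,
# with an assumed extinction coefficient of 150,000 per M per cm.
print(round(oligo_concentration_uM(0.45, 150_000, dilution_factor=100), 1), "uM")
```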

2.13.2. DNA Substrate Annealing

The oligomers needed for the substrate are mixed in a 1:1 molar ratio in an Eppendorf tube. Sodium chloride is also added to the mixture at a final concentration of 100 mM, for stringency purposes. The tube is placed to float in a beaker containing cold water, and the water is then heated to just below the boiling point. At that time, the beaker is placed in a closed Styrofoam box at 4 °C overnight, to allow the DNA solution to cool down slowly. Finally, a 20 % TBE gel is run to check that the DNA substrates annealed properly.
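The volumes needed for an equimolar annealing mixture can be worked out as sketched below. The stock concentrations, the amount of each oligomer, the NaCl stock and the final volume are all hypothetical; only the 1:1 molar ratio and the 100 mM final NaCl concentration come from the protocol.

```python
def annealing_mix(stock_uM, nmol_each, nacl_stock_mM, final_volume_uL, nacl_final_mM=100.0):
    """Volumes (in uL) for mixing oligomers in equimolar amounts with NaCl.

    stock_uM: list of oligomer stock concentrations in micromolar.
    nmol_each: amount of every oligomer to add, in nanomoles.
    """
    # nmol / (umol/L) = mL, hence the factor 1000 to get microliters.
    oligo_uL = [nmol_each * 1000.0 / conc for conc in stock_uM]
    nacl_uL = nacl_final_mM * final_volume_uL / nacl_stock_mM
    water_uL = final_volume_uL - sum(oligo_uL) - nacl_uL
    if water_uL < 0:
        raise ValueError("final volume is too small for the requested amounts")
    return {"oligos_uL": oligo_uL, "nacl_uL": nacl_uL, "water_uL": water_uL}


# Hypothetical example: two oligomers at 500 and 400 uM, 10 nmol of each,
# 1 M NaCl stock, 100 uL final volume.
print(annealing_mix([500.0, 400.0], 10.0, 1000.0, 100.0))
```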

2.13.3. TBE Gel Electrophoresis

The 20 % TBE polyacrylamide gels are purchased from Invitrogen, and the 5X TBE running buffer (445 mM Tris, 445 mM Boric Acid and 10 mM EDTA) from USB. The samples are prepared by mixing 3 µL of the DNA with 15 µL of

TBE sample buffer (1X TBE running buffer + 15 % glycerol). On the outside lanes, a mixture of 2 µL of Gel Loading Solution and 23 µL of TBE sample buffer

is run to track how far the gel has run, since the DNA samples are colorless. The gel is run at 180 V for 75 minutes, at 4 °C. The bands are visualized by UV shadowing: the gel is placed on a fluorescent silica gel coated sheet, and UV light (254 nm) is used to reveal the bands.

CHAPTER 3 - Bacteriophage T4 32 Protein

and Its Truncations

3.1. Introduction

The 32 protein from bacteriophage T4 is a single-stranded DNA-binding protein,

homologous to the SSB protein in other organisms. Binding of 32 protein

prevents re-annealing of the single strands of DNA during replication as well as

formation of secondary structures, and protects the single strands from nuclease activity. Additional background information on the 32 protein is available in

Section 1.1.2.

The three domains of 32 protein were identified by proteolytic cleavage

(Karpel, 1990). The N-terminus, also known as the B domain (for basic domain),

contains amino-acids 1 to 22 according to Karpel and coworkers. The C-terminus

or A (acidic) domain contains amino-acids 254 to 301. The intermediate domain

is also known as the core domain or 32 core protein. The core is thought to

interact with the single-stranded DNA, while the B domain is responsible for

cooperative binding of 32 proteins to one another, and the A domain for binding

to other proteins present at the replication fork. Several truncations of 32 protein

are available: the 32 core protein, the 32-A (32 minus A) protein missing the A

domain or C-terminus, and the 32-B (32 minus B) protein missing the B domain or N-terminus. It should be noted that the 32-B protein available in the Mueser lab is only missing the first 16 amino-acids instead of 22 (see Section 3.5.2). A schematic view of the different domains and truncations of 32 protein available in the lab is shown in Figure 3.1.

Figure 3.1 – 32 Protein Domains and 32 Truncations

B Domain: amino-acids 1 to 16; Core Domain: amino-acids 17 to 253; A Domain: amino-acids 254 to 301

• 32 Protein : amino-acid 1 to 301
• 32 Core Protein : amino-acid 17 to 253
• 32-B Protein : amino-acid 17 to 301
• 32-A Protein : amino-acid 1 to 253

The 32 protein domains presented above were described by Karpel and coworkers (Waidner et al., 2001). However, the B domain they described contains the first 22 amino-acids of the protein, while the 32-B protein used in the Mueser lab was found to be missing only the first 16 amino-acids (see Section 3.5.2 for more details).

The protein characteristics for each 32 truncation were calculated using the ExPASy website (Gill and von Hippel, 1989; Gasteiger et al., 2003) and are summarized in Table 3.1.

Table 3.1 – 32 Protein and Truncations Characteristics

Characteristic     32         32 Core    32-B       32-A
Amino-acids        301        218        286        253
Molecular Weight   33.5 kDa   24.9 kDa   31.8 kDa   28.4 kDa
pI                 5.82       5.25       4.65       6.76
ε                  1.16       1.56       1.24       1.37

3.2. Bacteriophage T4 32 Protein

3.2.1. Introduction

The 32 protein construct was a gift from Dr. Nancy Nossal (NIH). The protein was expressed from the pAS6-2 plasmid transformed in the E. coli N8430 cell line. Cells containing the overexpressed recombinant 32 protein were available in the Mueser lab; they were lysed by sonication before the protein was purified.

3.2.2. Cell Lysis

The cells were lysed according to the cell lysis protocol described in

Section 2.3. The lysis buffer was composed of 25 mM bis-Tris HCl pH 6.5, 50 mM NaCl, 1 mM EDTA and 1 mM DTT. A volume of 100 mL was used for every 10 grams of cells. The cell lysis samples were not run on an SDS-PAGE gel, but a sample of the supernatant that was then purified can be seen in Figure 3.2.a, lane 2. A large band corresponding to 32 protein is present, meaning the protein was soluble after cell lysis.

3.2.3. Protein Purification

The supernatant containing 32 protein obtained after cell lysis was purified

using ion-exchange and hydrophobic interaction chromatography. Since the

calculated pI of the 32 protein is around 5.8, anion-exchange chromatography

was used. The buffers used for the different purification steps are detailed in

Table 3.2.

Table 3.2 – HPLC Buffers for T4 32 Protein Purification

               Ion Exchange                 Hydrophobic
               (Q Sepharose, POROS HQ)      (POROS PE)

Buffers        25 mM Tris HCl pH 7.5        25 mM bis-Tris HCl pH 6.5
               2 % glycerol                 50 mM NaCl
               50-500 mM NaCl               2 % glycerol
                                            600 mM (NH4)2SO4

Conductivity   Buffer A: ~ 6 mS/cm          ~ 86 mS/cm
               Buffer B: ~ 43 mS/cm
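The choice of anion-exchange chromatography follows from comparing the calculated pI with the pH of the ion-exchange buffer: above its pI, a protein carries a net negative charge and is expected to bind an anion exchanger. The sketch below applies this rule of thumb to the pI values of Table 3.1 at pH 7.5. It is a simplification, since the actual binding also depends on the charge distribution and on the salt concentration, and the function name is hypothetical.

```python
def exchanger_for(pi, buffer_ph):
    """Rule-of-thumb resin choice from the calculated pI and the buffer pH."""
    if buffer_ph > pi:
        return "anion"   # net negative protein, binds a positively charged resin
    if buffer_ph < pi:
        return "cation"  # net positive protein, binds a negatively charged resin
    return "none"        # no net charge expected at the pI


# Calculated pI values from Table 3.1, ion-exchange buffer at pH 7.5 (Table 3.2).
calculated_pi = {"32": 5.82, "32 Core": 5.25, "32-B": 4.65, "32-A": 6.76}
for name, pi in calculated_pi.items():
    print(f"{name}: {exchanger_for(pi, 7.5)} exchange")
```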

The lysate was first loaded on the low-resolution anion-exchange Q

Sepharose. An example of a Q Sepharose run with the matching SDS-PAGE gel is presented in Figure 3.2.a. The elution from the Q Sepharose was then run on a hydrophobic column, the POROS PE, to remove any endogenous nuclease that might be present in solution. For that step, the conductivity of the protein sample had to be raised by addition of 3 M ammonium sulfate to match that of the PE buffer. The protein was eluted in the void fraction and no salt gradient was required. The PE elution then had to be either dialyzed against or diluted with Q buffer A in order to lower the conductivity back to around 6 mS/cm. After that, the protein sample was run on the high-resolution anion-exchange column, the POROS HQ. A chromatogram and SDS-PAGE gel for that last step are shown in Figure 3.2.b. The 32 protein was pure after the POROS HQ run, and could be concentrated before being stored at -80 °C for further use.
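As a rough guide for the ammonium sulfate addition, the volume of 3 M stock needed to reach the 600 mM of the PE buffer can be estimated as sketched below. This is an approximation under stated assumptions: the sample is taken to contain no ammonium sulfate initially, and the target is expressed as a concentration, whereas in practice the addition was adjusted by matching conductivities. The function name and sample volume are hypothetical.

```python
def stock_volume_to_add(sample_volume, stock_conc, target_conc):
    """Volume of stock to add so that the mixture reaches target_conc.

    From target = stock * V_add / (V_sample + V_add), assuming the sample
    initially contains none of the added salt. The result has the same
    unit as sample_volume.
    """
    if not 0 <= target_conc < stock_conc:
        raise ValueError("target must be non-negative and below the stock concentration")
    return sample_volume * target_conc / (stock_conc - target_conc)


# Hypothetical 40 mL Q Sepharose pool, 3 M stock, 0.6 M target.
print(round(stock_volume_to_add(40.0, 3.0, 0.6), 2), "mL of 3 M ammonium sulfate")
```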

3.3. Bacteriophage T4 32 Core Protein

The 32 core protein was donated by Dr. Yousif Shamoo (Rice University).

The gene was inserted in the pKC30 vector (Yoakum, 1983; Rao, 1984) and the protein expressed in the AR120 E. coli cell line (Waidner et al., 2001). The protein expression and the cell lysis were performed by Jennifer M. Dlwgosh, according to the full length 32 protein expression and cell lysis protocols.

Concerning the purification of 32 core protein, the buffers and protocol are the same as the ones used for 32 protein (see Section 3.2.3). The Q Sepharose and POROS PE runs are similar to the ones from the 32 protein purification. On the other hand, 32 core protein does not seem to bind as strongly to the POROS HQ as 32 protein does, even though its calculated pI is lower. Nonetheless, the

32 core protein purification did not present any particular challenge and the pure protein was obtained in large quantities, about 90 milligrams from 10 grams of cells.

Figure 3.2 – T4 32 Protein Purification

Figure 3.2.a – Q Sepharose

Lanes:
1- Molecular Weight Marker
2- Q Sepharose – Load
3- Q Sepharose – F. 23
4- Q Sepharose – F. 35
5- Q Sepharose – F. 57
6- Q Sepharose – F. 72
7- Q Sepharose – Rinse Fraction
8- Q Sepharose – Flow Through

Marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa.

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. 32 protein can be found in a fairly pure state already in fractions 48 to 78, which were pooled to be run on the POROS PE column. Fractions 31 to 47 as well as the rinse fraction also contain 32 protein but are still contaminated. These fractions were rerun on the Q Sepharose, and the pure 32 protein eluted after that second Q Sepharose run was added to the first elution.

Figure 3.2.b – POROS HQ

Lanes:
1- Molecular Weight Marker
2- POROS HQ – F. 18
3- POROS HQ – F. 23
4- POROS HQ – F. 29
5- POROS HQ – F. 33
6- POROS HQ – F. 39
7- POROS HQ – F. 43

Marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa.

32 protein is present in most fractions, but it was pure only in fractions 27 to 36. These fractions were pooled and concentrated.

3.4. Bacteriophage T4 32-A Protein

The 32-A protein was a gift from Dr. Richard Karpel (UMBC). The

truncated 32 gene was inserted in the pKC30 vector (Yoakum, 1983; Rao, 1984)

and the recombinant protein was expressed from the pEKF1 plasmid in the AR120

E. coli cell line (Waidner et al., 2001). Protein expression and cell lysis were performed according to the 32 protein protocols. The 32-A protein was then purified over the Q Sepharose and Poros HQ anion-exchange columns. This work was done by Jennifer M. Dlwgosh.

3.5. Bacteriophage T4 32-B Protein

3.5.1. Introduction

The majority of the work done on the 32 truncations was accomplished

with the 32-B protein. Since the B domain responsible for cooperative binding is

missing but the A domain that is thought to be involved in interactions with other

DNA replication proteins is still present, the 32-B protein can still be used in

protein-protein complexes, but does not tend to aggregate and form filaments like

the 32 full length protein does.

The X-Ray structure of the 32 core was solved (Shamoo et al., 1995) but the structure of the missing A and B domains is still unknown. In parallel to the protein-protein complex studies (see Chapter 5), structural and biophysical experiments were also carried out on the 32-B protein alone, in order to get more insight on the role and structure of the A domain.

3.5.2. Molecular Cloning

The 32-B construct and glycerol stocks were donated by Dr. Richard

Karpel (UMBC). The protein was expressed from the pEKF2 plasmid transformed

in AR120 E. coli cells (Waidner et al., 2001).

Another student in the lab, Jennifer M. Dlwgosh, was also studying 32-B

protein, and therefore had her own preparations of 32-B. However, the different

batches of protein ran differently on SDS-PAGE gels, so the plasmid encoding

for the 32-B protein was sent for sequencing to make sure the protein that was

used in the present study was really missing the first 21 amino-acids. The

sequencing results are presented in Appendix 3.

It turns out that the 32-B protein that has been used in the lab is only missing the first 16 amino-acids, as compared to the 32*II protein described by

Karpel and coworkers (Waidner et al., 2001), which is missing the first 22 amino-acids. The “new” sequence for 32-B is shown in Figure 3.3. That sequence was used to calculate the protein characteristics that were presented in Table 3.1.

Figure 3.3 – T4 32-B Protein Amino-Acid Sequence

1 MFKRKSTAEL AAQMAKLNGN KGFSSEDKGE WKLKLDNAGN GQAVIRFLPS KNDEQAPFAI
61 LVNHGFKKNG KWYIETCSST HGDYDSCPVC QYISKNDLYN TDNKEYSLVK RKTSYWANIL
121 VVKDPAAPEN EGKVFKYRFG KKIWDKINAM IAVDVEMGET PVDVTCPWEG ANFVLKVKQV
181 SGFSNYDESK FLNQSAIPNI DDESFQKELF EQMVDLSEMT SKDKFKSFEE LNTKFGQVMG
241 TAVMGGAAAT AAKKADKVAD DLDAFNVDDF NTKTEDDFMS SSSGSSSSAD DTDLDDLLND
301 L

The 32-B protein sequence is in black; the 16 missing amino-acids at the N-terminus are shaded in gray. An additional methionine from the start codon is also present at the N-terminus.

All the work with 32-B protein described in this dissertation was carried out

with this particular N-terminal truncation. Also, the amino-acid numbering was

kept as it is for the full length 32 protein. For example, with the I151D 32-B

mutant described below in Section 3.6, the number 151 refers to isoleucine 151 in the 32 protein sequence, not to amino-acid 151 in the 32-B protein sequence if the numbering were to start at the initial methionine.
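The relation between the two numbering schemes can be made explicit with a short sketch. It uses only the first 20 residues of the sequence in Figure 3.3 and assumes that the initial methionine of the 32-B construct would be counted as position 1; the function name is hypothetical.

```python
# First 20 residues of full length 32 protein, copied from Figure 3.3.
FULL_LENGTH_START = "MFKRKSTAELAAQMAKLNGN"
N_MISSING = 16  # residues removed from the N-terminus in this 32-B construct


def construct_position(full_length_number, n_missing=N_MISSING):
    """Position within the 32-B construct, counting its initial Met as 1."""
    if full_length_number <= n_missing:
        raise ValueError("this residue is not present in the truncation")
    return full_length_number - n_missing + 1


# The construct starts with Met followed by residue 17 of the full length protein.
construct_start = "M" + FULL_LENGTH_START[N_MISSING:]
print(construct_start)          # MLNGN
print(construct_position(17))   # 2
print(construct_position(151))  # 136
```

Isoleucine 151 of the full length protein would therefore be residue 136 if the construct were numbered from its own methionine, which is why the full length numbering is retained throughout.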

The following Section 3.6 describes the two 32-B protein mutants that were made and the work that was done with them. These mutants were cloned using site-directed mutagenesis, with the native 32-B protein expression plasmid as a template for the PCR reaction. This pEKF2 plasmid, obtained from Dr.

Richard Karpel (UMBC), is very large: 15-20 kb, as can be seen in Figure 3.4.

Figure 3.4 – T4 32-B Initial Expression Plasmid Miniprep

Lanes:
1- Initial pEKF2 32-B expression plasmid
2- Supercoiled DNA ladder
3- 1 kb DNA ladder

Size labels shown on the gel: 10 kb, 5 kb, 2 kb and 1 kb.

The plasmid containing the 32-B gene is supercoiled. Since it runs past the supercoiled DNA ladder, its size can only be roughly estimated from looking at the gel. It is probably between 15 and 20 kb.

Because the template plasmid was so large, the PCR reactions for site-directed mutagenesis were all unsuccessful; more details are provided in Section 3.6. Therefore, the gene encoding 32-B had to be recloned in a different expression vector. This work is described below.

Two different expression vectors were chosen. One is the commercially

available pET101 (Invitrogen) that has the advantage of allowing direct insertion of the PCR product. However, protein expression is rarely obtained from this

vector. Another vector available is the pDEST-C1. It first requires an insertion of

the PCR product in an entry vector, the pENTR-D vector, the gene of interest

being then inserted in the pDEST-C1 expression vector through a recombination

reaction. Both pET101 and pENTR-D use TOPO-assisted directional cloning

while inserting the PCR product, which is why a CACC overhang is necessarily

present in the forward primer. The pDEST-C1 also adds an N-terminal His-Tag to

the protein expressed for purification purposes. That tag may interfere during

crystallization studies and it is preferable to add a TEV protease cleavage site in

the pDEST-C1 forward primer so that the His-Tag can be cleaved off after the

protein has been purified. The different primers are presented in Figure 3.5.

Despite the low success rate of expression from the pET101 vector, the pET101

cloning does not require a lot of time, which is why it was done in parallel with the

pDEST-C1 cloning.

Figure 3.5 – T4 32-B PCR Primers

Forward Primer – pDEST-C1 insertion

32-B: 5’- ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT… -3’
primer: 5’- C ACC GAG AAC CTC TAC TTC CAA GGA CTG AAT GGC AAT AAA GGT TTT TCT TCT -3’

27 bp from 32-b; 52 bp total; 33% GC content; Tm = 54°C

Forward Primer – pET101 insertion

32-B: 5’- ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT… -3’
primer: 5’- C ACC ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT -3’

30 bp from 32-b; 34 bp total; 33% GC content; Tm = 54°C

Reverse Primer (Inverse Complement)

32-B: 5’- …CTG GAT GAC CTT TTG AAT GAC CTT TAA -3’
primer: 3’- GAC CTA CTG GAA AAC TTA CTG GAA ATT -5’
primer (written 5’ to 3’): 5’- TTA AAG GTC ATT CAA AAG GTC ATC CAG -3’

27 bp total; 37% GC content; Tm = 55°C

Both forward primers were aligned with the nucleotide sequence of the 32-b gene. The start codon is highlighted in green. In red is the CACC overhang necessary for TOPO-assisted directional cloning, and the sequence coding for the TEV protease cleavage site is shown in blue in the pDEST-C1 primer. The reverse primer is shown as the inverse complement of 32-b, where the stop codon is highlighted in red.
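The primer design in Figure 3.5 can be checked with a few lines of code. The sketch below verifies that the reverse primer is the inverse complement of the 3' end of the 32-b gene and recomputes the GC content of the reverse primer and of the gene-specific part of the pET101 forward primer. The helper names are hypothetical, and no melting temperature is computed because the formula used for the values in the figure is not stated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Sequences copied from Figure 3.5, with the spaces removed.
gene_3prime_end = "CTGGATGACCTTTTGAATGACCTTTAA"
reverse_primer = "TTAAAGGTCATTCAAAAGGTCATCCAG"
gene_specific_pet101 = "ATGCTGAATGGCAATAAAGGTTTTTCTTCT"

print(reverse_complement(gene_3prime_end) == reverse_primer)
print(round(gc_percent(reverse_primer)))        # 37
print(round(gc_percent(gene_specific_pet101)))  # 33
```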

The primers were ordered from Integrated DNA Technologies®. They were received as a lyophilized pellet and were resuspended in 1X TE buffer (20 mM

Tris HCl pH 8.0 + 1 mM EDTA) to make a 250 µM solution of each primer. A primer solution was then made for each set of primers (the pET101 insertion and the pDEST-C1 insertion) by mixing 2 µL of the forward primer, 2 µL of the reverse primer and 46 µL of EB buffer, giving a primer mix solution with each primer at 10 µM for each reaction.
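The 10 µM figure follows from a simple dilution calculation, sketched here with the volumes given above (the function name is hypothetical):

```python
def diluted_concentration(stock_conc, stock_volume, total_volume):
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock_conc * stock_volume / total_volume


total_uL = 2 + 2 + 46  # forward primer + reverse primer + EB buffer
each_primer_uM = diluted_concentration(250.0, 2.0, total_uL)
print(each_primer_uM, "uM of each primer in", total_uL, "uL")
```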

The PCR reactions for both the pET101 insertion and the pDEST-C1 insertion were run at the same time, according to the reaction setup described in

Table 3.3. The KOD polymerase was used in the reactions, as it has a higher fidelity and processivity than the regularly used Proof Start polymerase.

Table 3.3 – T4 32-B PCR Reactions

PCR Reaction

KOD Buffer (10X)                                   5 µL
Primer solution (10 µM)                            1.5 µL
dNTPs (2 mM each)                                  5 µL
25 mM MgSO4                                        3 µL
KOD Polymerase (1 U/µL)                            1 µL
Initial pEKF2 32-B expression plasmid (10 ng/µL)   1 µL
Autoclaved water                                   33.5 µL

PCR Program

Activation        95 °C, 2 minutes
Denaturation      95 °C, 20 seconds
Annealing         54 °C, 10 seconds
Extension         70 °C, 15 seconds
30 cycles
Final Extension   70 °C, 5 minutes
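The reaction setup of Table 3.3 can be summarized by its total volume and the resulting final concentrations. The sketch below only restates the volumes and stock concentrations from the table; the dictionary keys and the helper function are hypothetical names.

```python
# Volumes in microliters, taken from Table 3.3.
reaction_uL = {
    "KOD buffer (10X)": 5.0,
    "primer solution (10 uM)": 1.5,
    "dNTPs (2 mM each)": 5.0,
    "MgSO4 (25 mM)": 3.0,
    "KOD polymerase (1 U/uL)": 1.0,
    "template plasmid (10 ng/uL)": 1.0,
    "autoclaved water": 33.5,
}
total_uL = sum(reaction_uL.values())


def final_conc(stock_conc, volume_uL):
    """Final concentration after dilution into the complete reaction."""
    return stock_conc * volume_uL / total_uL


print("total volume:", total_uL, "uL")
print("primers:", final_conc(10.0, reaction_uL["primer solution (10 uM)"]), "uM each")
print("dNTPs:", final_conc(2.0, reaction_uL["dNTPs (2 mM each)"]), "mM each")
print("MgSO4:", final_conc(25.0, reaction_uL["MgSO4 (25 mM)"]), "mM")
print("buffer:", final_conc(10.0, reaction_uL["KOD buffer (10X)"]), "X")
```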

After the reaction, the two PCR products were run on a 1% agarose gel

that is shown in Figure 3.6.a. Both reactions were successful and were gel

purified using the MinElute kit from Qiagen. The amplified 32-b gene from lane 2

was then inserted in the pET101 vector, according to the reaction setup shown in

Table 3.4. The mixture was incubated at 25 °C for 30 minutes.

Table 3.4 – T4 32-B Insertion in pET101 Reaction

pET101 Insertion Reaction

Purified PCR product   2 µL
Salt Solution          1 µL
pET101 vector          1 µL
Autoclaved water       2 µL

The salt solution is provided with the pET101 cloning kit, and is composed of 1.2 M NaCl + 0.06 M MgCl2. The pET101 vector is already linearized and covalently linked to Topoisomerase I.

After the reaction, 2 µL of the product were transformed into 50 µL of DH5α competent cells, and 50 µL and 100 µL of the cells were plated on two separate LB + Carbenicillin plates. The pET101 vector carries the resistance gene to

Ampicillin, and Carbenicillin is an analog of Ampicillin that is less susceptible to

hydrolysis. Colonies were picked from both plates, grown overnight in a LB +

Carbenicillin media, and the cells were collected the next day for plasmid

isolation using the Miniprep kit (Qiagen). Glycerol stocks were also made by

mixing 1 mL of the cell culture at OD600 = 0.6 with 1 mL of 50% glycerol, then

flash freezing the mixture on dry ice. The products of the pET101 insertion

reaction isolated after the Miniprep reactions were run on a 1% agarose gel,

presented in Figure 3.6.b. All the plasmids shown on the gel have the correct size.

The plasmids isolated from colonies 1 and 3 were chosen for transformation into

competent BL21 (DE3) Star cells (0.8 µL / 25 µL of cells). After transformation, the cells were not plated but were grown directly overnight at 37 °C in LB + Carbenicillin media (the BL21 (DE3) Star cell line does not carry any additional antibiotic resistance). These cultures were then used the next day to inoculate fresh LB + Carbenicillin media, and after glycerol stocks were taken at OD600 = 0.6, protein expression was induced by addition of 1 mM IPTG to the culture. The cells were left at 37 °C for another three hours. Samples of protein expression were taken after 0h and 3h, and were run on an SDS-PAGE gel, shown in Figure 3.7.a. No protein was expressed around 31.5 kDa.

It was then decided to clone the 32-b gene in the pDEST-C1 vector. The

PCR product from Figure 3.6.a, lane 3 was inserted in the pENTR-D vector first,

as detailed in the reaction setup in Table 3.5. The reaction was incubated at

room temperature for 30 minutes.

Table 3.5 – T4 32-B Insertion in pENTR-D Reaction

pENTR-D Insertion Reaction

Purified PCR product   2 µL
Salt Solution          1 µL
pENTR-D vector         1 µL
Autoclaved water       2 µL

The salt solution is provided with the pENTR-D cloning kit, and is composed of 1.2 M NaCl + 0.06 M MgCl2. The pENTR-D vector is already linearized and covalently linked to Topoisomerase I.

A 2 µL sample was then transformed in 50 µL of competent DH5α cells.

The cells were plated on LB + Kanamycin plates (50 µL and 100 µL per plate

respectively), since the pENTR-D vector carries the Kanamycin resistance gene.

Colonies were picked from both plates, grown overnight in LB + Kanamycin

media and the cells were collected to isolate the pENTR-D plasmid. The different

plasmids isolated from the colonies were run on a 1% agarose gel, which is

shown in Figure 3.6.c. Only one plasmid, from colony 1, has the expected size for a pENTR-D + 32-b insert. That plasmid was then gel purified before running the LR Clonase recombination reaction, as described in Table 3.6.

Table 3.6 – T4 32-B Insertion in pDEST-C1 Reaction

LR Clonase Reaction

Gel Purified 32-b in pENTR-D 1 µL

pDEST-C1 1 µL

LR Clonase Mix 2 µL

TE buffer (1X) 6 µL

The 1X TE buffer is made of 20 mM Tris HCl pH 8.0 + 1 mM EDTA

The reaction was incubated at room temperature for 2 hours, and

terminated by addition of 1 µL of Proteinase K at 2 µg/µL, and then incubation at

37 °C for 10 minutes. The reaction product was transformed in DH5α competent

cells (2 µL plasmid / 50 µL of cells), then 50 µL and 100 µL of the cells were

plated on LB + Streptomycin plates. Colonies were picked from the plates and

grown overnight for plasmid isolation. These plasmids were run on a 1% agarose

gel displayed in Figure 3.6.d. All plasmids have the correct size, with the exception of the one from colony 7 being contaminated with the pENTR-D insert.

The plasmid from colony 3 showed a stronger band so it was transformed in

competent T7 express cells.

Figure 3.6 – Agarose Gels for T4 32-B Cloning

Figure 3.6.a – T4 32-B PCR Reactions

Lanes:
1- 100 bp DNA ladder
2- T4 32-b PCR product (pET101 insertion)
3- T4 32-b PCR product (pDEST-C1 insertion)
4- 1 kb DNA ladder

Ladder bands labeled on the gel: 5 kb, 1 kb and 0.5 kb.

The 32-b gene is 865 bp long. The pET101 PCR product is also 865 bp, and the pDEST-C1 PCR product is 883 bp long because of the additional TEV protease cleavage site. Both PCR products have the correct size.

Figure 3.6.b – T4 32-B Insertion in pET101

Lanes:
1- Supercoiled DNA ladder
2- Blank
3- T4 32-b insert in pET101 – colony 1
4- T4 32-b insert in pET101 – colony 2
5- T4 32-b insert in pET101 – colony 3
6- T4 32-b insert in pET101 – colony 4

Ladder bands labeled on the gel: 5 kb and 2 kb.

The correct sizes are as follows:
• 32-b 865 bp
• pET101 5753 bp
• Total 6618 bp

The pET101 + 32-b constructs run between 6 and 7 kb, which indicates the correct insertion.

Figure 3.6.c –T4 32-B Insertion in pENTR-D

Lanes:
1- Supercoiled DNA ladder
2- Blank
3- T4 32-b insert in pENTR-D – colony 1
4- T4 32-b insert in pENTR-D – colony 2
5- T4 32-b insert in pENTR-D – colony 3
6- T4 32-b insert in pENTR-D – colony 4
7- T4 32-b insert in pENTR-D – colony 5

Ladder bands labeled on the gel: 5 kb and 2 kb.

The correct sizes are as follows:
• 32-b 883 bp
• pENTR-D 2580 bp
• Total 3463 bp

Only colony 1 has the right insert as the corresponding plasmid is the only one running between 3 and 4 kb.

Figure 3.6.d –T4 32-b Insertion in pDEST-C1

Lanes:
1- T4 32-b insert in pDEST-C1 – colony 1
2- T4 32-b insert in pDEST-C1 – colony 2
3- T4 32-b insert in pDEST-C1 – colony 3
4- T4 32-b insert in pDEST-C1 – colony 4
5- T4 32-b insert in pDEST-C1 – colony 5
6- T4 32-b insert in pDEST-C1 – colony 6
7- T4 32-b insert in pDEST-C1 – colony 7
8- Supercoiled DNA ladder

Ladder bands labeled on the gel: 5 kb and 2 kb.

The correct sizes are as follows:
• 32-b 883 bp
• pDEST-C1 +5334 bp
• ccdB -1600 bp
• Total 4617 bp

All plasmids seem to have the correct size, but the one isolated from colony 7 is contaminated with the pENTR-D + 32-B plasmid.
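The expected sizes quoted in Figure 3.6 can be cross-checked with a short calculation. The breakdown of the PCR product lengths below (CACC overhang, start codon or TEV site, 285 codons, stop codon) is an interpretation of Figure 3.5 and is stated as an assumption; the vector sizes are those given in the figure legends.

```python
FULL_LENGTH_RESIDUES = 301
MISSING_RESIDUES = 16
CACC_OVERHANG = 4  # added for TOPO-assisted directional cloning
START_CODON = 3    # ATG, present in the pET101 forward primer only
TEV_SITE = 21      # GAG AAC CTC TAC TTC CAA GGA, pDEST-C1 forward primer only
STOP_CODON = 3

residues_32b = FULL_LENGTH_RESIDUES - MISSING_RESIDUES  # 285 residues after the Met
coding_bp = 3 * residues_32b

pet101_product = CACC_OVERHANG + START_CODON + coding_bp + STOP_CODON
pdest_product = CACC_OVERHANG + TEV_SITE + coding_bp + STOP_CODON

# Vector sizes as listed in Figure 3.6 (bp).
pet101_construct = pet101_product + 5753
pentr_construct = pdest_product + 2580
pdest_construct = pdest_product + 5334 - 1600  # ccdB cassette is replaced by the insert

print(pet101_product, pdest_product)
print(pet101_construct, pentr_construct, pdest_construct)
```

Under this reading, the computed lengths agree with the 865 bp and 883 bp products and with the totals of 6618, 3463 and 4617 bp listed in the figure.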

After transformation, the cells were grown directly overnight in LB + Streptomycin (pDEST-C1 vector) + Tetracycline (T7 express cell line). Protein expression was induced at OD600 = 0.6 with 1 mM IPTG. The results from the protein expression in pDEST-C1 are shown in Figure 3.7.b. A protein is overexpressed around 37 kDa, corresponding to the expected molecular weight of 32-B with an N-terminal His-Tag.

The last step was to make sure the expressed 32-B from the pDEST-C1 plasmid was soluble. A 1 L culture was prepared and protein expression induced under the same conditions as described above. The cells were lysed and sonicated in the 32-B lysis buffer, which is composed of 40 mM Tris HCl pH 8.0,

100 mM NaCl, 10 mM MgCl2, 2 mM CaCl2 and 1 mM EDTA. Samples were taken from the pellet and supernatant after cell lysis and run on an SDS-PAGE gel, presented in Figure 3.7.c. Even though the amount of 32-B in the pellet is quite

significant, large amounts of the protein of interest are present in the

supernatant as well. The 32-B protein expressed with an N-terminal His-Tag from

the pDEST-C1 vector is therefore soluble, and the plasmid can be used in

site-directed mutagenesis experiments.

Figure 3.7 – T4 32-B Protein Expression and Solubility

Figure 3.7.a – T4 32-B Protein Expression from the pET101 Vector

Lanes:
1- T4 32-B expression (plasmid from colony 1) – 0h sample
2- T4 32-B expression (plasmid from colony 1) – 3h sample
3- T4 32-B expression (plasmid from colony 3) – 0h sample
4- T4 32-B expression (plasmid from colony 3) – 3h sample
5- Molecular Weight Marker

Marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa.

A protein is expressed around 40 kDa, which is too large to be T4 32-B, which has a molecular weight of 31.3 kDa.

Figure 3.7.b – T4 32-B Protein Expression from the pDEST-C1 Vector

Lanes:
1- Molecular Weight Marker
2- T4 32-B expression – 0h sample
3- T4 32-B expression – 3h sample

Marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa.

A 37-40 kDa protein, corresponding to the expected molecular weight of 32-B + His-Tag, is expressed.

Figure 3.7.c – T4 32-B Cell Lysis

Lanes:
1- Molecular Weight Marker
2- T4 32-B expression – 3h sample
3- T4 32-B cell lysis – pellet
4- T4 32-B cell lysis – supernatant
5- Molecular Weight Marker

Marker bands: 66.3, 55.4, 36.5, 31.0 and 21.5 kDa.

After protein expression was assessed, the pDEST-C1 + 32-b plasmid that

was transformed in the T7 express cell line was sent for DNA sequencing. The

results are shown in Appendix 3. The 32-b gene was cloned correctly in the

pDEST-C1 vector.

The molecular cloning of 32-B protein in the pDEST-C1 vector was done solely for the site-directed mutagenesis described in Section 3.6. All the remaining work on 32-B detailed below was done with protein expressed from

the pEKF2 plasmid.

3.5.3. Protein Expression

The glycerol stock for T4 32-B expression was obtained from Dr. Richard

Karpel (UMBC). The 32-b gene was cloned in the pEKF2 plasmid, derived from

the pKC30 vector (Waidner et al., 2001). The plasmid was then transformed in

the AR120 E. coli cell line.

The cells obtained from the glycerol stock were first grown overnight at

37 °C in 300 mL of LB + Ampicillin media. Fresh media (6 L) was then inoculated

the next day with the overnight culture and the cells grown until they reached

OD600 = 0.6. Protein expression was induced at that stage by adding 1 mM of

nalidixic acid: the pEKF2 plasmid has a PL (phage lambda) promoter, therefore

expression cannot be induced by addition of IPTG, as it is with the pET vectors that have a T7 promoter. Instead, nalidixic acid provokes an SOS response by inhibiting the cell DNA gyrase and creating DNA damage, which removes the repressor bound to the PL promoter and induces expression of the protein of interest (Little

and Mount, 1982; Shatzman and Rosenberg, 1987). After induction, the cells

were left to grow at 37 °C for another three hours, then centrifuged. Samples

were taken at 0h and 3h of protein expression, and run on an SDS-PAGE gel

shown below in Figure 3.8. T4 32-B is largely overexpressed around 32 kDa.

Figure 3.8 – T4 32-B Protein Expression

Lanes:
1- T4 32-B expression – 0h sample
2- T4 32-B expression – 3h sample
3- Molecular Weight Marker

Marker bands: 66.3, 55.4, 36.5, 31.0 and 21.5 kDa.

A protein is overexpressed around 32 kDa, corresponding to the molecular weight of 32-B.

3.5.4. Cell Lysis

The cells obtained as described in the previous section were then lysed in

the following buffer: 40 mM Tris HCl pH 8.0, 100 mM NaCl, 10 mM MgCl2, 2 mM

CaCl2 and 1 mM EDTA. The cells were thawed in that buffer in the presence of

lysozyme and AEBSF, a protease inhibitor. They were then lysed by

sonication, after which a pellet and supernatant sample were taken. The results

from the cell lysis for 32-B are shown in Figure 3.9. The protein is present in both

the pellet and the lysate but that might be due to the high concentration of 32-B

in the cells after expression. Enough protein was present in the supernatant,

which was then purified as is described in the next section.

Figure 3.9 – T4 32-B Cell Lysis

Lanes:
1- Molecular Weight Marker
2- T4 32-B cell lysis – pellet
3- T4 32-B cell lysis – supernatant

Marker bands: 66.3, 55.4, 36.5, 31.0 and 21.5 kDa.

After cell lysis, the protein is present in both the pellet and the lysate, but that might be due to the fact that a lot of protein was expressed and not enough lysis buffer was used to extract it all.

3.5.5. Protein Purification

T4 32-B protein was initially purified following the protocol of Waidner et al. (2001). As the calculated pI of the protein is around 4.6 (Gill and

von Hippel, 1989), anion exchange chromatography was used. The lysate was

first loaded on the low-resolution anion-exchange column Q Sepharose. The next step was the POROS PE, a hydrophobic column that is used to remove nucleases from the sample. Then, the high-resolution anion-exchange POROS

HQ was used, and finally 32-B protein was further purified using size exclusion

chromatography, as it was not pure enough after the POROS HQ elution. The

buffers needed for each step are presented in Table 3.7.

Table 3.7 – HPLC Buffers for T4 32-B Purification Scheme 1

Ion Exchange Hydrophobic Size Exclusion

(Q Sepharose, POROS HQ) (POROS PE) (Superdex 75)

25 mM Tris HCl pH 7.5 25 mM bis-Tris HCl pH 6.5 25 mM bis-Tris HCl pH 6.5 50 mM NaCl 150 mM NH4Cl Buffers 10 % glycerol 600 mM (NH4)2SO4 2 mM EDTA 50-500 mM NaCl 1 mM EDTA 2 mM BME 2 mM BME

Buffer A: ~ 4 mS/cm Conductivity ~ 90 mS/cm ~ 19 mS/cm Buffer B: ~ 33 mS/cm

Chromatograms and SDS-PAGE gel for this purification scheme are

shown in Figure 3.10. The final yield for this purification was 13 mg of protein / L

of cell culture.

Figure 3.10 – T4 32-B Purification Scheme 1

Figure 3.10.a – Q Sepharose

Lanes:
1- Q Sepharose – Load
2- Q Sepharose – Flow Through
3- Q Sepharose – F. 67
4- Q Sepharose – F. 76
5- Molecular Weight Marker

Marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa.

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. 32-B is present in fractions 64 to 76, which were pooled to be run on the POROS PE.

Figure 3.10.b – POROS HQ

Lanes:
1- Molecular Weight Marker
2- POROS PE – Load
3- POROS PE – Flow Through
4- POROS PE – Void Fraction
5- POROS HQ – F. 21
6- POROS HQ – F. 23
7- POROS HQ – F. 26
8- POROS HQ – F. 32
9- POROS HQ – F. 36
10- POROS HQ – F. 42
11- POROS HQ – F. 50

Marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa.

After the POROS HQ elution, 32-B protein is present throughout the run, indicating that the protein has some solubility problems. Fractions 20 to 40 were pooled and run on the Superdex 75.


Figure 3.10.c – Superdex 75

Lanes: 1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa); 2- Superdex 75 – F.11.

After the Superdex 75 run, 32-B is pure.


Even though 32-B was pure after the size exclusion column, the broad shape of the peak obtained with the POROS HQ suggested that the protein had solubility issues. A solubility screen was therefore performed on pure 32-B obtained after the Superdex 75 run; that work is described in Section 3.5.6. 32-B was found to be more soluble at pH 7.5 and in Na Citrate. A new set of buffers was devised, shown in Table 3.8. The purification scheme was kept the same, but the buffers were changed.

Table 3.8 – HPLC Buffers for T4 32-B Purification Scheme 2

Ion Exchange Hydrophobic

(Q Sepharose, POROS HQ) (POROS PE)

25 mM Tris HCl pH 7.5 25 mM Tris HCl pH 7.5 50 mM NaCl Buffers 25 mM Na Citrate pH 7.5 600 mM (NH4)2SO4 0-1 M NaCl 1 mM EDTA 2 mM β-mercaptoethanol

Conductivity: Ion Exchange Buffer A ~ 16 mS/cm, Buffer B ~ 96 mS/cm; Hydrophobic ~ 90 mS/cm

Again, the chromatograms and SDS-PAGE gels from purification scheme 2 are shown below in Figure 3.11. It should be noted that 32-B did not bind tightly to the POROS HQ resin with the new buffers: it was eluted off the column with buffer A and was mostly found in the rinse fraction, and then in the early fractions of the salt gradient elution. The protein was pure enough after the POROS HQ run and did not need to be run on the size exclusion column. The final yield for this scheme was 60 mg of pure protein / L of cell culture. Compared to the 13 mg/L of the initial purification, this is a roughly 4.6-fold improvement in yield, consistent with the improved solubility of the protein in the new buffers.


Figure 3.11 – T4 32-B Purification Scheme 2

Figure 3.11.a – Q Sepharose

Lanes: 1- Q Sepharose – F. 20; 2- Q Sepharose – F. 46; 3- Q Sepharose – F. 50; 4- Q Sepharose – F. 56; 5- Q Sepharose – F. 72; 6- Q Sepharose – Flow Through; 7- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa).

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. 32-B was found in fractions 44 to 58, which were pooled to be run on the POROS PE.


Figure 3.11.b – POROS HQ

Lanes: 1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa); 2- POROS PE – Load; 3- POROS PE – Flow Through; 4- POROS PE – Void Fraction; 5- POROS HQ – F.4; 6- POROS HQ – F.6; 7- POROS HQ – F.13; 8- POROS HQ – F.27; 9- POROS HQ – Flow Through.

32-B was found in fractions 4 to 21, and was pure so these fractions were pooled and concentrated.


3.5.6. Solubility Screen

32-B obtained from the Superdex 75 run was precipitated by dialysis against distilled water. The precipitate was then aliquoted into Eppendorf tubes, and the different solutions from the solubility screen, each at 100 mM, were added to the tubes. Each mixture was thoroughly mixed, left to incubate at room temperature for 20 minutes, and then centrifuged. The absorbance at 595 nm of each supernatant mixed with Bradford reagent was measured, and the values are plotted in Figure 3.12. 32-B was most soluble in HEPES pH 7.5 and Na Citrate. HEPES was however replaced by Tris HCl pH 7.5 in the HPLC buffers for cost reasons.

Figure 3.12 – T4 32-B Solubility Screen

Bar chart of the absorbance at 595 nm (axis from 0 to 0.7) of each supernatant. Solutions tested: H2O, TAPS pH 8.5, HEPES pH 7.5, PIPES pH 6.5, MES pH 5.6, Na Citrate, Na Phosphate, Na Sulfate, Na Cacodylate, Na Acetate, Na Formate, CaCl2, MgCl2, LiCl, KCl, NaCl and NH4Cl.


3.5.7. Dialysis and Concentration

32-B after purification was dialyzed in the T4 RNase H dialysis buffer,

since the two proteins were studied as a complex. This buffer is composed of

25 mM bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM

β-mercaptoethanol. 32-B was then concentrated before being stored at -80 °C.

The maximum concentration that was reached for the protein was around

100 mg/mL.

3.5.8. Crystal Screening and Optimization

The crystal structure of the core domain of 32 protein was solved in 1995

by Shamoo and coworkers (Shamoo et al., 1995) and it is described in Section

1.1.2. On the other hand, the structure of the two missing domains is still

unknown, and the attempts to get diffraction quality crystals of the full length 32

protein were all unsuccessful. Since 32-B protein was available in large

quantities, it was screened for crystallization in an effort to obtain the missing

structure of the A domain.

32-B was screened against different commercial screens, using either

Greiner or Corning trays. The trays were poured and the drops set up using the

Honeybee crystallization robot. The screens that were set up are summarized in

Table 3.9.


Table 3.9 – T4 32-B Crystal Screens

Crystal Screen Concentration Temperature Drop (µL)

Crystal Screen I and II 10 mg/mL Room Temp. 0.5 + 0.5

Index 10 mg/mL Room Temp. 0.5 + 0.5

PEG Ion Screen 24 mg/mL Room Temp. 0.5 + 0.5

Natrix 24 mg/mL Room Temp. 0.5 + 0.5

Wizard I and II 24 mg/mL Room Temp. 0.5 + 0.5

Salt RX 24 mg/mL Room Temp. 0.5 + 0.5

Additive Screen 10 mg/mL 4 °C 0.5 + 0.5

PEG Ion Screen 10 mg/mL 4 °C 0.5 + 0.5

Natrix 10 mg/mL 4 °C 0.5 + 0.5

Wizard I and II 10 mg/mL 4 °C 0.5 + 0.5

A large number of hits were obtained, but only around 20 of them were

confirmed to be protein crystals. They were mostly micro-crystals.

Several of these micro-crystal conditions were expanded on, but most of them did not yield any improvement. One condition from Crystal Screen II, however, gave reproducible crystals upon expansion. It is presented in Figure 3.13. The crystals were stained using the Izit dye and turned blue, indicating that they were protein crystals.


Figure 3.13 – T4 32-B Crystals after Screening

Crystal Screen II – condition 41

1.0 M Lithium Sulfate, 0.1 M Tris HCl pH 8.5, 0.01 M NiCl2·6H2O

Since condition 41 from Crystal Screen II was the best one obtained and could be reproduced, it was chosen for further optimization experiments. Several attempts were made to improve the crystal quality: the Lithium Sulfate concentration was increased up to 2 M, and Lithium Sulfate was replaced with Ammonium Sulfate, Ammonium or Sodium Nitrate, Ammonium or Sodium Formate, and Ammonium or Sodium Malonate. The 32-B concentration was also varied between 20 and 35 mg/mL.

The best crystals were grown in 2 M or higher of Ammonium or Lithium Sulfate, 100 mM Tris HCl pH 8.5 and 10 mM NiCl2. The crystals grown in Ammonium Sulfate, however, had a tendency to melt faster than the ones grown in Lithium Sulfate. Some pictures of crystals obtained in these different conditions are shown in Figure 3.14.


Figure 3.14 – T4 32-B Crystals after Optimization

1: 2.0 M Ammonium Sulfate, 100 mM Tris HCl pH 8.5, 10 mM NiCl2·6H2O; 32-B at 22 mg/mL; Room Temperature; 2 µL + 2 µL drop

2: 2.2 M Ammonium Sulfate, 100 mM Tris HCl pH 8.5, 10 mM NiCl2·6H2O; 32-B at 22 mg/mL; Room Temperature; 2 µL + 2 µL drop

3: 1.8 M Lithium Sulfate, 100 mM Tris HCl pH 8.5, 10 mM NiCl2·6H2O; 32-B at 33 mg/mL; Room Temperature; 1 µL + 2 µL + 2 µL drop

4: 1.8 M Lithium Sulfate, 100 mM Tris HCl pH 8.5, 10 mM NiCl2·6H2O; 32-B at 33 mg/mL; Room Temperature; 2 µL + 2 µL drop

3.5.9. Data Collection

The crystals grown in Lithium Sulfate, shown in Figure 3.14, pictures 3 and 4, had sharper edges, which could indicate better crystal packing. They were therefore chosen for X-Ray diffraction studies.

The crystals first had to be flash-frozen in liquid nitrogen. Lithium sulfate at 2 M or higher concentration is a cryoprotectant, and since the crystals were grown in 1.8 M Lithium Sulfate, the salt concentration only had to be increased to cryoprotect them. A substitute mother liquor was made containing 2 M Lithium Sulfate, 100 mM Tris HCl pH 8.5 and 10 mM NiCl2, together with 25 mM bis-Tris HCl pH 6.5, 150 mM NH4Cl and 2 mM EDTA, the components of the dialysis buffer the protein was in before being crystallized. The crystals were soaked briefly in the substitute mother liquor before being plunged into liquid nitrogen.

The different crystals were then screened for X-Ray diffraction on the Rigaku FR-E high brilliance X-Ray diffractometer in the Ohio Crystallography Consortium, located in the Instrumentation Center. Out of the dozen crystals that were prepared that way, most only diffracted to 7 Å resolution or worse. One crystal, however, from the drop shown in Figure 3.14, picture 4, diffracted to 5 Å, and the preliminary data could be indexed more easily than with the other crystals. A 5 Å resolution dataset is usually not good enough to solve a protein structure, as there is not enough data to get the phase information. The crystal was therefore taken to the Advanced Photon Source (APS) synchrotron at Argonne National Laboratory, near Chicago. Synchrotron beams are more intense than in-house beams, and better resolution data can be obtained with the same crystal. A dataset was collected by Dr. B. Leif Hanson to 4 Å resolution, at an X-Ray wavelength of 0.90 Å and a crystal to detector distance of 300 mm. Some diffraction images are shown in Figure 3.15.


Figure 3.15 – 32-B Crystal X-Ray Diffraction Images

a b c

a – Image 1 (0 to 1°) b – Image 45 (44 to 45°) c – Image 90 (89 to 90°)

The crystals grown in the ammonium sulfate condition were also flash

frozen in the 2 M Lithium Sulfate substitute mother liquor, but they did not diffract once in the X-Ray beam.

3.5.10. Data Processing

The dataset was processed using HKL2000 (Minor et al., 2002).

The initial indexing and integration were done with the space group P2,

but the Molecular Replacement attempts with AMoRe and MolRep, using the

1GPC 32 core pdb (Shamoo et al., 1995) as a search model, were all

unsuccessful. The space group P21 was tried next, without any success either.

Other space groups (P1, P4, P222, P422, C222 and their Laue subgroups) were

also used, but phasing with Molecular Replacement was unsuccessful in all

cases. A summary of the data reduction and phasing for the different space

groups is presented in Table 3.10.


Table 3.10 – 32-B Data Processing Summary

Space group | P1 | P2 | P4 | P222 | P422 | C222
Resolution | 30 to 4.5 Å | 50 to 4.2 Å | 30 to 4.5 Å | 30 to 4.5 Å | 20 to 4.5 Å | 20 to 4.5 Å
Unit Cell Dimensions | 81.15 Å, 90.004° | 81.12 Å, 90° | 81.17 Å, 90° | 81.18 Å, 90° | 81.18 Å, 90° | 114.75 Å, 90°
 | 159.35 Å, 90.009° | 159.29 Å, 90.06° | 159.37 Å, 90° | 159.34 Å, 90° | 159.34 Å, 90° | 114.83 Å, 90°
 | 81.15 Å, 90.042° | 81.17 Å, 90° | 81.17 Å, 90° | 81.13 Å, 90° | 81.13 Å, 90° | 159.35 Å, 90°
Rmerge * | 3.8% | 6.1% | 6.4% | 6.3% | 6.8% | 5.4%
Mosaicity (°) | 0.67 | 0.71 | 0.66 | / | / | /
Reflections | 15842 (26996) | 14992 (55076) | 6060 (35591) | 6517 (35609) | 3441 (35593) | 6395 (35616)
 | ~ 0.5 / atom | ~ 1 / atom | ~ 1 / atom | ~ 1 / atom | ~ 1.5 / atom | ~ 1 / atom
Completeness | 65.6% | 99.1% | 99.2% | 99.0% | 99.2% | 99.3%
# Molecules per A.S.U. | 12 | 6 | 3 | 3 | 1 or 2 | 3
Molecular Replacement | Rfactor ~ 63% | Rfactor ~ 60% | Rfactor ~ 63% | Rfactor ~ 63% | Rfactor ~ 60% | Rfactor ~ 63%
 | CC ~ 42% | CC ~ 35% | CC ~ 45% | CC ~ 45% | CC ~ 40% | CC ~ 45%

For all space groups, the Rmerge values are below 10%, which is good. However, the Molecular Replacement statistics all indicate that no solution was found, as the Rfactor values are above 50 % and the correlation coefficients CC below 50 %.

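* Rmerge is defined as

$$R_{merge} = 100 \times \frac{\sum_{hkl} \sum_{i=1}^{n} \left| F^{2}_{hkl} - \langle F^{2}_{hkl} \rangle_{i} \right|}{\sum_{hkl} \sum_{i=1}^{n} F^{2}_{hkl}} \qquad \text{(Equation 3.1)}$$

where $F^{2}_{hkl}$ is the intensity of each hkl reflection and $\langle F^{2}_{hkl} \rangle_{i}$ is the mean value of i measurements of n equivalent reflections.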
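As an illustration of Equation 3.1, the short sketch below computes Rmerge from groups of symmetry-equivalent intensity measurements. The function name `r_merge` and the intensity values are hypothetical and are not taken from the 32-B dataset; data reduction programs such as HKL2000 report this statistic directly.

```python
def r_merge(groups):
    """Return Rmerge (%) for groups of equivalent intensity measurements.

    Each inner list holds the n measured intensities (F^2) of reflections
    that are equivalent by symmetry.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups:
        mean = sum(intensities) / len(intensities)
        # Sum of absolute deviations of each measurement from the group mean
        numerator += sum(abs(value - mean) for value in intensities)
        # Sum of all measured intensities
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


# Hypothetical intensities for three unique reflections (arbitrary units)
example = [[100.0, 110.0], [50.0, 50.0], [200.0, 190.0, 210.0]]
print(f"Rmerge = {r_merge(example):.2f} %")
```

A low Rmerge only indicates that equivalent measurements agree with each other under the chosen symmetry; as the table shows, it does not by itself identify the correct space group.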

No solution could be found after Molecular Replacement, and the space group could not be determined. This is most likely due to the low resolution of the data, and the low number of unique reflections per atom. As a rule of thumb, four unique reflections per atom are needed to ensure the final model is not biased by


the way the data was processed. Here, a maximum of 1.5 unique reflections per

atom was obtained with the highest symmetry space group P422, which is not enough. A new dataset should be collected on a better crystal, with a high

enough resolution so that the data could be phased with Molecular Replacement.

Unfortunately, no better crystal could be obtained.
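The text does not state how the number of molecules per asymmetric unit in Table 3.10 was estimated. One common approach, assumed here purely for illustration, is the Matthews coefficient: the unit cell volume divided by the total protein mass in the cell. The sketch below applies it to the P1 and P2 cells of Table 3.10, using the calculated 32-B molecular weight of 31.8 kDa; the function names are hypothetical.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (Å^3) of a unit cell from its edges (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews_coefficient(volume, n_asym_units, n_molecules, mw_da):
    """Matthews coefficient (Å^3/Da): cell volume per unit of protein mass."""
    return volume / (n_asym_units * n_molecules * mw_da)


MW_32B = 31800.0  # Da, calculated molecular weight of 32-B

# P1: one asymmetric unit per cell, 12 molecules per asymmetric unit
vm_p1 = matthews_coefficient(
    cell_volume(81.15, 159.35, 81.15, 90.004, 90.009, 90.042), 1, 12, MW_32B)

# P2: two asymmetric units per cell, 6 molecules per asymmetric unit
vm_p2 = matthews_coefficient(
    cell_volume(81.12, 159.29, 81.17, 90.0, 90.06, 90.0), 2, 6, MW_32B)

print(f"P1: {vm_p1:.2f} A^3/Da, P2: {vm_p2:.2f} A^3/Da")
```

Under these assumptions both space groups give about the same value, which simply reflects that the same number of molecules fills the same cell volume in each case.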

3.5.11. Dynamic Light Scattering

Dynamic Light Scattering experiments were run on the 32-B protein, in order to have a better idea of its state in solution, which could provide some information on the poor quality of the 32-B crystals.

The protein was dialyzed in its dialysis buffer: 25 mM bis-Tris HCl pH 6.5,

150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol. A protein sample at

1 mg/mL was prepared, filtered using the Millipore Ultrafree-MC filters with a 0.1

µm pore size and finally spun down at 18,000 rcf for 20 min in order to eliminate

the larger aggregates. The DLS experiment was run at 4 °C as well as 20 °C.

The results are presented below in Figure 3.16 and Table 3.11.

The 32-B sample showed some signs of slight aggregation, which only accounts for 0.5 % or less of the total mass. The interesting result, however, is that 32-B appears to be a monomer (35 kDa) at 4 °C, but starts dimerizing at room temperature (20 °C): the 46 kDa estimated molecular weight is probably due to a mixed population of monomers and dimers in solution. This result was consistently obtained when the experiment was repeated.


Figure 3.16 – 32-B Protein Dynamic Light Scattering Results

4°C 20°C

Table 3.11 – 32-B Protein Dynamic Light Scattering Results

Rh (nm) % Pd MW (kDa) % Intensity % Mass

4°C 2.7 15.7 35 90.1 99.6

20°C 3.1 16.9 46 94.7 100.0

The equilibrium between monomers and dimers seems to be more prominent at room temperature, where the 32-B crystals were grown. That mixed population was certainly one of the reasons for the low resolution obtained with the X-Ray diffraction patterns. The extra A domain must also be rather flexible, therefore inducing more disorder in the crystal lattice and accounting for some of the low quality of the diffraction.

3.5.12. Small Angle X-Ray Scattering

When X-Ray crystallography can not be used to obtain the structure of a protein, like in the case of 32-B where the crystals did not yield good enough data, solution-based techniques can be used instead. One of these techniques is


Small-angle X-Ray Scattering or SAXS, which provides molecular envelopes of

proteins in solution.

SAXS data were collected at the Argonne Advanced Photon Source near Chicago, on the ChemMat-CARS 15-ID beamline, at 1.50 Å wavelength. The resolution range of the data collected was 312 to 7.5 Å, corresponding to a momentum transfer q range of 0.02 to 0.84 Å⁻¹.

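$$\text{Resolution (Å)} = \frac{2\pi}{q} \qquad \text{(Equation 3.2)}$$

$$\text{Momentum Transfer } q = \frac{4\pi \sin\theta}{\lambda} \qquad \text{(Equation 3.3)}$$

(where θ is half the scattering angle and λ the X-Ray wavelength).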
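The relations in Equations 3.2 and 3.3 can be checked numerically against the q range quoted above. The sketch below is a minimal illustration; the function names are hypothetical, and θ is taken as half the scattering angle, which is the usual convention for this formula.

```python
import math


def resolution_from_q(q):
    """Real-space resolution (Å) for a momentum transfer q (Å^-1), Equation 3.2."""
    return 2 * math.pi / q


def q_from_angle(theta_deg, wavelength):
    """Momentum transfer q (Å^-1), Equation 3.3.

    theta_deg is half the scattering angle in degrees, wavelength is in Å.
    """
    return 4 * math.pi * math.sin(math.radians(theta_deg)) / wavelength


# Limits of the q range reported for the 32-B data collection
for q in (0.02, 0.84):
    print(f"q = {q:.2f} A^-1 -> resolution = {resolution_from_q(q):.1f} A")
```

The small difference between the value obtained for q = 0.02 Å⁻¹ and the 312 Å quoted in the text is presumably due to rounding of the reported q limit.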

32-B was dialyzed in its dialysis buffer, made of 25 mM bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol. A 100 µM (around 3 mg/mL) sample was made and all the data were collected at that concentration. The buffer alone was run first, and several images were collected at each exposure time: 5 s, 20 s and 40 s. The protein sample was then run and several images were collected at each exposure time. The images corresponding to the same exposure were averaged. Figure 3.17 shows the scattering curves that were collected, plotting the intensity I(q) versus the momentum transfer q. In curve c, the buffer and protein scattering signals are superimposed for the three exposure times. The buffer signal is then subtracted from the protein signal to obtain the scattering curve of the protein alone. This is shown in curve d. Finally, the curve with the most intense signal and the smallest measurement error, namely the 40 s exposure one, was chosen for data analysis. It is shown in curve e. The program used was 15-ID SAXS/WAXS v.3.294, which was obtained at the beamline.
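The correspondence between the 100 µM sample concentration and the quoted value of around 3 mg/mL can be verified with a simple conversion. The sketch below assumes the calculated 32-B molecular weight of 31.8 kDa; the function name is hypothetical.

```python
def micromolar_to_mg_per_ml(conc_um, mw_kda):
    """Convert a molar concentration (µM) to a mass concentration (mg/mL).

    1 µM of a 1 kDa protein corresponds to 0.001 g/L, that is 0.001 mg/mL.
    """
    return conc_um * mw_kda * 1e-3


# 32-B SAXS sample: 100 µM, assuming a molecular weight of 31.8 kDa
print(f"{micromolar_to_mg_per_ml(100.0, 31.8):.2f} mg/mL")
```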

Figure 3.17 – 32-B SAXS Data Collection

c – Buffer and 32-B scattering curves at 5 s, 20 s and 40 s exposure

d – Buffer-subtracted 32-B scattering curves at 5 s, 20 s and 40 s exposure

e – Scattering curve chosen for data analysis


The programs used for data processing and model building are all part of

the Svergun ATSAS suite of SAXS data processing programs (Petoukhov, 2007).

The experimental data from the 15-ID SAXS/WAXS program was fed into a data

reduction and regularization program such as Primus or GNOM that evaluates

the shape and size of the particle in solution. Primus only considers low q data,

while GNOM takes into account higher resolution data as well. These programs

use the momentum transfer plot to calculate a size distribution function p(r),

which is then used to calculate the extrapolated intensity at q = 0 : I(0), and the

radius of gyration of the particle in solution Rg. The respective equations for

these calculations are shown below.

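$$p(r) = \frac{1}{2\pi^{2}} \int_{0}^{\infty} q\,r\,I(q)\,\sin(qr)\,dq \qquad \text{(Equation 3.4)}$$

$$I(0) = 4\pi \int_{0}^{D_{max}} p(r)\,dr \qquad \text{(Equation 3.5)}$$

$$R_{g}^{2} = \frac{\int r^{2}\,p(r)\,dr}{2 \int p(r)\,dr} \qquad \text{(Equation 3.6)}$$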
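To make Equations 3.5 and 3.6 concrete, the sketch below integrates a tabulated p(r) numerically with the trapezoidal rule. It is not the algorithm used by GNOM, which obtains p(r) by a regularized fit; the p(r) used here is the analytical distribution of a homogeneous sphere (a hypothetical test case, for which the radius of gyration is known to be the sphere radius multiplied by the square root of 3/5), and all function names are illustrative.

```python
import math


def trapezoid(x, y):
    """Integrate y(x) over tabulated points with the trapezoidal rule."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k])
               for k in range(len(x) - 1))


def i0_and_rg(r, p):
    """Return I(0) (Equation 3.5) and Rg (Equation 3.6) from a tabulated p(r)."""
    area = trapezoid(r, p)
    second_moment = trapezoid(r, [rk**2 * pk for rk, pk in zip(r, p)])
    return 4 * math.pi * area, math.sqrt(second_moment / (2 * area))


def sphere_p(r, radius):
    """Unnormalized p(r) of a homogeneous sphere; zero beyond Dmax = 2 * radius."""
    if r > 2 * radius:
        return 0.0
    x = r / radius
    return r**2 * (1 - 0.75 * x + x**3 / 16)


radius = 10.0                                        # Å, hypothetical sphere
r = [2 * radius * k / 2000 for k in range(2001)]     # 0 to Dmax = 20 Å
p = [sphere_p(rk, radius) for rk in r]
i0, rg = i0_and_rg(r, p)
print(f"Rg = {rg:.3f} A (expected {radius * math.sqrt(3 / 5):.3f} A)")
```

For an elongated particle, p(r) is skewed toward large r, so Rg is larger than that of a sphere of the same volume.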

GNOM was used to process the 32-B data. The first 5 and last 250 or 300 data points were removed, depending on the highest resolution needed in the programs used next. The maximum dimension (Dmax) of the particle was estimated to be close to 80 Å and input in the program, where it was used to calculate the model data, which were then superimposed onto the experimental data. That is shown in Figure 3.18, plot c. The size distribution function p(r) is plotted in Figure 3.18.d. The final radius of gyration for 32-B in solution was calculated to be 25.91 ± 0.05 Å. With the lower resolution data (172 to 45 Å), it was calculated to be 26.08 ± 0.06 Å, which is very close to the higher resolution (172 to 24 Å) value. These values are also consistent with the hydrodynamic radius obtained with the Dynamic Light Scattering experiments, which was around 30 Å. The shape of the p(r) curve provides some indication as to what the shape of the particle is. In the case of 32-B, the plot tails off at high radius, indicating that the protein is elongated.

Figure 3.18 – 32-B GNOM Plots

Plots are shown for the 172 to 24 Å and the 172 to 45 Å resolution ranges.

c – experimental data and model data

d – size distribution function p(r)


The output file from GNOM is then used in Ab Initio modeling programs like DAMMIN or GASBOR. These programs model a 3D molecular envelope of the protein. Both programs output a pdb file containing dummy atoms packed in the shape of the molecular envelope. While DAMMIN packs dummy atoms within the envelope, GASBOR packs them in a chain-compatible model. Other differences include : GASBOR uses the entire q range of the data (and therefore the higher resolution output from GNOM), and outputs the most probable model only, while DAMMIN, which uses only low q data (lower resolution output from

GNOM) can be run in several modes. The best mode with which to run DAMMIN is called “keep mode”, where all possible models are output. They can then be averaged using the program DAMAVER. Both DAMMIN and GASBOR were used to determine what the shape of 32-B is. The ribbon structure of the 32 core protein was then modeled manually in the different envelopes. Figure 3.19 shows the envelopes obtained from both programs. The dimensions of the envelopes obtained in both cases allow the modeling of two chains of 32 core. This is consistent with the dynamic light scattering (see Section 3.5.11) and analytical ultra-centrifugation (Dwlgosh, 2008) results that showed that 32-B is a dimer in solution. DAMMIN was used with both the high resolution and low resolution data. The higher resolution model has a more defined shape, but in both cases the two chains were modeled in a back to back fashion. In the case of GASBOR, the envelope is a little smaller and the two chains had to be modeled in an interlocked manner to fit the model. The DAMMIN model, where the two proteins interact through hydrophobic regions, is more likely than the interlocked model.


Moreover, the high resolution model from DAMMIN is the only one that leaves enough space for the missing C-terminal 62 residues, and also has a lower χ2 value.

Figure 3.19 – 32-B 3D Molecular Envelope

Figure 3.19.a – DAMMIN Models

Low resolution model (172 to 45 Å): average of ten models, χ2 = 2.1, dimensions: 90 Å × 60 Å × 50 Å

High resolution model (172 to 24 Å): average of five models, χ2 = 1.8, dimensions: 85 Å × 80 Å × 50 Å

The C-terminus of each 32 core chain is labeled on both models.


Figure 3.19.b – GASBOR Model (172 to 24 Å resolution)

C-termini

χ2 = 2.1

Dimensions: 85 Å × 55 Å × 35 Å

The A domain, present in the 32-B protein but missing from the 32 core crystal structure, could also be modeled using the CREDO program package.

Among the programs contained in the package, CREDO and CHADD are the most widely used. CREDO creates a chain of dummy atoms corresponding to the missing fragment, while CHADD attaches that chain to the known terminal residue of the atomic structure. In addition, the dummy atoms output by CHADD are separated by 3.8 Å, corresponding to the average distance between consecutive Cα atoms in a polypeptide chain.

The program CHADD was used to model the A domain of the 32-B protein. The model is shown in Figure 3.20. However, the χ2 value on that model is higher

than with the Ab Initio models from DAMMIN and GASBOR.


Figure 3.20 – Modeling of the A Domain of 32 Protein (Chadd)

C-terminus

χ2 = 3.3

The surface model output by CHADD is shown in transparent pink, the white spheres are water molecules. The X-Ray structure of 32 core was superposed onto the CHADD output surface. The missing C-terminal or A domain is shown on the left, linked to the 32 core C-terminal residue.

The Small-Angle X-Ray Scattering results for the 32-B protein provided

some more insight concerning the structure of that protein in solution. It was confirmed that the protein exists in the dimeric form in solution, which may only be an artifact of the truncation and is not necessarily physiologically relevant. A model of the missing A domain was calculated, as well as several models of the

overall 32-B dimer. The programs used for the data processing and modeling

however do not provide the user with a definite answer, and interpretation of the models is an important part of the SAXS data analysis, which is why several possibilities were presented. Only X-Ray crystallography can yield a definite structure for the 32-B protein. However, growing high resolution crystals of 32-B proved to be difficult, and the same problems were encountered by several groups while trying to grow crystals of the full length 32 protein. The extra


C-terminal A domain and N-terminal B domain are probably very flexible, which is

why only the 32 core could be crystallized into high enough resolution crystals.

This domain flexibility could also explain the high χ2 value on the modeling of the

A domain from the 32-B SAXS data.

3.6. Bacteriophage T4 32-B Mutants

3.6.1. Introduction

As described in Sections 5.3.2 and 5.3.8, three 32-B mutants were designed in order to further probe the interaction between 32-B and RNase H.

The mutants were as follows: W144E, I151D and I60D where a Tryptophan and

two Isoleucine residues at the interface of the interaction with RNase H were

respectively mutated into a Glutamate and two Aspartate residues. Unlike the

I151D and I60D mutants, the W144E mutant could not be cloned successfully.

The protein characteristics for the native 32-B protein in comparison to the two

successful mutants were calculated using ExPASy (Gill and von Hippel, 1989;

Gasteiger et al., 2003).

Table 3.12 – 32 Protein and Truncations Characteristics

 | 32-B | I151D 32-B | I60D 32-B
Amino-acids | 286 | 286 | 286
Molecular Weight | 31.8 kDa | 31.8 kDa (36.3 kDa with His-Tag) | 31.8 kDa (36.3 kDa with His-Tag)
pI | 4.65 | 4.61 | 4.61
ε | 1.24 | 1.25 | 1.25


3.6.2. Molecular Cloning

The forward and reverse primers for the site-directed mutagenesis PCR reaction were designed according to the QuikChange® manual (see Section

2.1.3), and are presented in Figure 3.21.

Figure 3.21 – T4 32-B Mutants Site-Directed Mutagenesis Primers

Figure 3.21.a – T4 W144E 32-B Primers

Forward Primer

32-b: 5’– TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG –3’ (Y R F G K K I W D K I N A M)

Primer: 5’– CGC TTT GGT AAG AAA ATC GAA GAT AAA ATC AAT GC –3’ (R F G K K I E D K I N)

Reverse Primer

32-b: 3’– ATG GCG AAA CCA TTC TTT TAG ACC CTA TTT TAG TTA CGT TAC –5’

Primer: 3’– GCG AAA CCA TTC TTT TAG CTT CTA TTT TAG TTA CG –5’

Primer (written 5’ to 3’): 5’– GC ATT GAT TTT ATC TTC GAT TTT CTT ACC AAA GCG –3’

35 bp total, Tm = 75.0 °C

Figure 3.21.b – T4 I151D 32-B Primers

Forward Primer

32-b: 5’– AAA ATC AAT GCA ATG ATT GCG GTT GAT GTT GAA ATG –3’ (K I N A M I A V D V E M)

Primer: 5’– ATC AAT GCA ATG GAT GCG GTT GAT GTT G –3’ (I N A M D A V D V)

Reverse Primer

32-b: 3’– TTT TAG TTA CGT TAC TAA CGC CAA CTA CAA CTT TAC –5’

Primer: 3’– TAG TTA CGT TAC CTA CGC CAA CTA CAA C –5’

Primer (written 5’ to 3’): 5’– C AAC ATC AAC CGC ATC CAT TGC ATT GAT –3’

28 bp total, Tm = 73.4 °C


Figure 3.21.c – T4 I60D 32-B Primers

Forward Primer

32-b: 5’– CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC GGT TTC –3’ (Q A P F A I L V N H G F)

Primer: 5’– GCA CCA TTC GCA GAT CTT GTA AAT CAC GG –3’ (A P F A D L V N H)

Reverse Primer

32-b: 3’– GTT CGT GGT AAG CGT TAA GAA CAT TTA GTG CCA AAG –5’

Primer: 3’– CGT GGT AAG CGT CTA GAA CAT TTA GTG CC –5’

Primer (written 5’ to 3’): 5’– CC GTG ATT TAC AAG ATC TGC GAA TGG TGC –3’

29 bp total, Tm = 76.5 °C

The forward primers for the site-directed mutagenesis PCR reactions are shown first. They were aligned with the nucleotide sequence of T4 32-B. The mutated nucleotide is shown in red. Highlighted in yellow is the primer that anneals with the original 32-b gene. The reverse primers are shown second, highlighted in the same manner.
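Since each reverse primer is the reverse complement of the corresponding forward primer, the primer pairs listed in Figure 3.21 can be checked with a few lines of code. The sketch below is only an illustrative consistency check, with a hypothetical helper function; the sequences are those given in the figure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    cleaned = sequence.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# (forward primer, reverse primer), both written 5' to 3' as in Figure 3.21
primers = {
    "W144E": ("CGC TTT GGT AAG AAA ATC GAA GAT AAA ATC AAT GC",
              "GC ATT GAT TTT ATC TTC GAT TTT CTT ACC AAA GCG"),
    "I151D": ("ATC AAT GCA ATG GAT GCG GTT GAT GTT G",
              "C AAC ATC AAC CGC ATC CAT TGC ATT GAT"),
    "I60D": ("GCA CCA TTC GCA GAT CTT GTA AAT CAC GG",
             "CC GTG ATT TAC AAG ATC TGC GAA TGG TGC"),
}

for name, (forward, reverse) in primers.items():
    length = len(forward.replace(" ", ""))
    matches = reverse_complement(forward) == reverse.replace(" ", "")
    print(f"{name}: {length} nt, reverse primer is reverse complement: {matches}")
```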

Initial attempts at the site-directed mutagenesis PCR reaction were made for all three mutants, using the pEKF2 plasmid as a template, but were all unsuccessful due to the large size of that template plasmid. Optimization of the reaction was also attempted by using polymerases with a higher processivity, such as the Pfu Ultra polymerase or the KOD polymerase. Increasing the dNTP concentration, the MgCl2 concentration or the extension time, and changing the annealing temperature were also tried, but without success. Finally, the 32-b gene was recloned in a different expression vector, the pDEST-C1 vector, as described in Section 3.5.2, and the new plasmid was then used as the template for the site-directed mutagenesis PCR reactions.


Out of the three mutants, only two could be cloned, the I151D and I60D

32-B mutants. All the PCR reactions for the third W144E 32-B mutant were unsuccessful, despite the attempts at optimizing the PCR reaction: different annealing temperatures were probed, as well as different concentrations of primers and template. Below is the description of the cloning procedures for the

I151D and I60D mutants. Table 3.13 shows the parameters for the two successful PCR reactions.

Table 3.13 – Site-Directed Mutagenesis PCR Reactions for the 32-B Mutants

Table 3.13.a – I151D 32-B Reaction

PCR Reaction PCR Program

KOD buffer (10X) 5 µL Activation 95 °C, 2 minutes

Forward primer (2.5 µM) 6 µL Denaturation 95 °C, 20 seconds

Reverse primer (2.5 µM) 6 µL Annealing 55 °C, 10 seconds

dNTPs (2 mM) 5 µL Extension 70 °C, 2 minutes

MgSO4 (25 mM) 5 µL 20 cycles

KOD Polymerase (1 U/µL) 1 µL Final extension 70 °C, 20 minutes

Template 1 µL

Autoclaved water 21 µL


Table 3.13.b – I60D 32-B Reaction

PCR Reaction PCR Program

KOD buffer (10X) 5 µL Activation 95 °C, 2 minutes

Forward primer (2.5 µM) 6 µL Denaturation 95 °C, 20 seconds

Reverse primer (2.5 µM) 6 µL Annealing 60 °C, 10 seconds

dNTPs (2 mM) 5 µL Extension 70 °C, 2 minutes

MgSO4 (25 mM) 5 µL 20 cycles

KOD Polymerase (1 U/µL) 1 µL Final extension 70 °C, 20 minutes

Template 1 µL

Autoclaved water 21 µL

Upon reception, the primers were first dissolved and diluted to 250 µM with 1X TE buffer, then a 2.5 µM stock was made for each primer to be used in the PCR reaction.
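The final concentration of each component in the 50 µL reactions of Table 3.13 follows from the stock concentrations and volumes listed there. The sketch below works through that arithmetic; the dictionary layout and function name are illustrative only, and the template is left out of the concentration calculation because its stock concentration is not given.

```python
# Component: (stock concentration, unit, volume in µL), from Table 3.13
components = {
    "KOD buffer": (10.0, "X", 5.0),
    "Forward primer": (2.5, "µM", 6.0),
    "Reverse primer": (2.5, "µM", 6.0),
    "dNTPs": (2.0, "mM", 5.0),
    "MgSO4": (25.0, "mM", 5.0),
    "KOD Polymerase": (1.0, "U/µL", 1.0),
}
# Components without a stated stock concentration
other_volumes = {"Template": 1.0, "Autoclaved water": 21.0}

total_volume = (sum(volume for _, _, volume in components.values())
                + sum(other_volumes.values()))


def final_concentration(stock, volume, total):
    """Concentration after dilution of `volume` of stock into `total` volume."""
    return stock * volume / total


print(f"Total reaction volume: {total_volume:.0f} µL")
for name, (stock, unit, volume) in components.items():
    final = final_concentration(stock, volume, total_volume)
    print(f"{name}: {final:g} {unit}")

# Primer working stock: 250 µM diluted to 2.5 µM
print(f"Primer stock dilution factor: {250 / 2.5:.0f}-fold")
```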

When the reactions were finished, the template was digested by adding 1 µL of the DpnI restriction enzyme at 20 U/µL, leaving only the mutated plasmids in solution, which were then run on 1% agarose gels, shown in Figure 3.22.a. At that point, the plasmids are still double-nicked, and they run higher than their actual size when compared to the supercoiled DNA ladder. These plasmids were then transformed into competent DH5α cells (2 µL of plasmid / 25 µL of cells), and 50 µL and 100 µL of each transformation were plated on LB + Streptomycin plates. Colonies were picked and grown in LB + Streptomycin media, and the plasmids were isolated and run on 1% agarose gels. These are shown in Figure 3.22.b. The I151D 32-B plasmid from colony 3 and the I60D 32-B plasmid from colony 4 were chosen for expression studies.


Figure 3.22 – Agarose Gels for the T4 32-B Mutants Cloning

Figure 3.22.a – T4 32-B Mutants PCR Reactions

Lanes: 1- Supercoiled DNA ladder; 2- T4 I151D 32-b in pDEST-C1 – Mutagenesis PCR product; 3- Supercoiled DNA ladder; 4- T4 I60D 32-b in pDEST-C1 – Mutagenesis PCR product; 5- 1 kb DNA ladder. The 5 kb and 2 kb bands of the ladders are labeled.

Figure 3.22.b – T4 32-B Mutants (pDEST-C1) Miniprep

Lanes: 1- T4 I151D 32-b in pDEST-C1 – PCR product; 2- T4 I151D 32-b in pDEST-C1 – colony 1; 3- T4 I151D 32-b in pDEST-C1 – colony 2; 4- T4 I151D 32-b in pDEST-C1 – colony 3; 5- T4 I151D 32-b in pDEST-C1 – colony 4; 6- T4 I151D 32-b in pDEST-C1 – colony 5; 7- T4 I151D 32-b in pDEST-C1 – colony 6; 8- Supercoiled DNA ladder (5 kb and 2 kb bands labeled).

32-b in pDEST-C1: 4617 bp. All plasmids have the correct size except the one isolated from colony 3.


Lanes: 1- T4 I60D 32-b in pDEST-C1 – colony 1; 2- T4 I60D 32-b in pDEST-C1 – colony 2; 3- T4 I60D 32-b in pDEST-C1 – colony 3; 4- T4 I60D 32-b in pDEST-C1 – colony 4; 5- T4 I60D 32-b in pDEST-C1 – colony 5; 6- T4 I60D 32-b in pDEST-C1 – colony 6; 7- Supercoiled DNA ladder (5 kb and 2 kb bands labeled).

32-b in pDEST-C1: 4617 bp. All plasmids have the correct size.

3.6.3. Protein Expression and Solubility

The two plasmids described above were transformed into competent T7 express cells (0.8 µL / 50 µL of cells). After transformation, the cells were first plated on LB + Streptomycin + Tetracycline plates (50 µL and 100 µL of cells per plate, for each mutant). One colony was picked from each plate and grown overnight in LB + Streptomycin + Tetracycline media. That culture was then used to inoculate media, and the cells were grown until OD600 = 0.6, when glycerol stocks were taken and protein expression was induced by adding 1 mM IPTG. 0 h and 3 h expression samples were taken and run on an SDS-PAGE gel. The results from protein expression are presented in Figure 3.23.a. Both the I151D and the I60D 32-B mutants were overexpressed.

The cells were then lysed, according to the lysis protocol described in Section 2.3, to check for protein solubility. The lysis buffer that was used was composed of 40 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM MgCl2, 2 mM CaCl2 and 1 mM EDTA. Large amounts of protein can be found in both the pellet and the supernatant after cell lysis, as can be seen in Figure 3.23.b. However, this is not necessarily an indication of a solubility problem: the amounts of 32-B mutants expressed were so large that not enough lysis buffer might have been used to allow all the protein to be extracted from the cells.

Figure 3.23 – T4 32-B Mutants Expression and Solubility

Figure 3.23.a – T4 32-B Mutants Protein Expression

I151D 32-B gel (lanes 1 to 5):
1- T4 I151D 32-B expression (plasmid from colony 1) – 0h sample
2- T4 I151D 32-B expression (plasmid from colony 1) – 3h sample
3- T4 I151D 32-B expression (plasmid from colony 2) – 0h sample
4- T4 I151D 32-B expression (plasmid from colony 2) – 3h sample
5- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0 and 21.5 kDa)

I60D 32-B gel (lanes 1 to 5):
1- T4 I60D 32-B expression (plasmid from colony 1) – 0h sample
2- T4 I60D 32-B expression (plasmid from colony 1) – 3h sample
3- T4 I60D 32-B expression (plasmid from colony 2) – 0h sample
4- T4 I60D 32-B expression (plasmid from colony 2) – 3h sample
5- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)


Figure 3.23.b – T4 32-B Mutants Cell Lysis

I151D 32-B gel (lanes 1 to 3):
1- T4 I151D 32-B cell lysis – pellet
2- T4 I151D 32-B cell lysis – supernatant
3- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0 and 21.5 kDa)

I60D 32-B gel (lanes 1 to 3):
1- T4 I60D 32-B cell lysis – pellet
2- T4 I60D 32-B cell lysis – supernatant
3- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)

As was the case for T4 32-B expressed in pDEST-C1, the 32-B mutants are found in both the pellet and the supernatant after cell lysis. A large enough portion of the protein is soluble to carry on with purification.

Once it was known that both mutants could be expressed in a soluble

manner, the two plasmids were sent for sequencing at the Plant-Microbe

Genomics Facility at Ohio State University. Both sequences were correct, as is

shown in Appendix 3, and the mutations were confirmed.


3.6.4. Protein Purification

The protocol that was designed for the 32-B protein after a solubility

screen was applied to the two 32-B mutants. Indeed, that purification scheme

yielded large amounts of pure protein. The different sets of buffers are detailed in

Table 3.14.

Table 3.14 – Lysis and HPLC Buffers for the T4 32-B Mutants Purification

              Lysis                    Ion Exchange                 Hydrophobic
                                       (Q Sepharose, POROS HQ)      (POROS PE)

Buffers       40 mM Tris HCl pH 8.0    25 mM Tris HCl pH 7.5        25 mM Tris HCl pH 7.5
              100 mM NaCl              50 mM NaCl                   25 mM Na Citrate pH 7.5
              10 mM MgCl2              2% glycerol                  600 mM (NH4)2SO4
              2 mM CaCl2               0 - 1 M NaCl
              1 mM EDTA

Conductivity  ~ 18 mS/cm               Buffer A: ~ 16 mS/cm         ~ 110 mS/cm
                                       Buffer B: ~ 96 mS/cm

After cell lysis, the supernatant was first loaded on the low-resolution

anion-exchange Q Sepharose. The elution from the Q Sepharose was then

loaded on the hydrophobic POROS PE to get rid of any endogenous nuclease

that might be present in solution. The conductivity of the protein sample had to

be increased first by addition of 3 M Ammonium Sulfate. The protein was found

in the void fraction as expected after the POROS PE run. To decrease the

conductivity of the sample and match it back to that of Q buffer A, the protein was

dialyzed overnight in Q buffer A. It was then further purified with the

high-resolution anion-exchange column POROS HQ.
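The adjustment of the sample with the 3 M Ammonium Sulfate stock before the POROS PE run can be estimated with a simple mass balance. The sketch below is only an illustration: it assumes that volumes are additive, that the sample initially contains no Ammonium Sulfate, and that the target is the 600 mM listed for the POROS PE buffer in Table 3.14. The 40 mL pool volume is a hypothetical value, not one taken from the experiments.

```python
def stock_volume_to_add(sample_ml, stock_molar, target_molar, initial_molar=0.0):
    """Return the volume of stock (mL) that brings the sample to target_molar.

    Mass balance, assuming additive volumes:
        (sample_ml * initial + v * stock) / (sample_ml + v) = target
    """
    if not (initial_molar <= target_molar < stock_molar):
        raise ValueError("target must lie between the initial and stock concentrations")
    return sample_ml * (target_molar - initial_molar) / (stock_molar - target_molar)


# Hypothetical 40 mL Q Sepharose pool, brought to 0.6 M with the 3 M stock.
added_ml = stock_volume_to_add(sample_ml=40.0, stock_molar=3.0, target_molar=0.6)
print(f"add {added_ml:.1f} mL of 3 M ammonium sulfate")
```

In practice the conductivity of the sample would still be checked against that of the column buffer, since the salts already present in the pool also contribute to it.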


Examples of the chromatograms and corresponding SDS-PAGE gels for the I60D 32-B mutant purification are shown in Figure 3.24.

Figure 3.24 – T4 I60D 32-B Purification

Figure 3.24.a – Q Sepharose

Lanes 1 to 13:
1- Molecular Weight Marker
2- Q Sepharose – Load
3- Q Sepharose – F. 18
4- Q Sepharose – F. 33
5- Q Sepharose – F. 40
6- Q Sepharose – F. 45
7- Q Sepharose – F. 49
8- Q Sepharose – F. 58
9- Q Sepharose – Flow Through 1
10- Q Sepharose – Flow Through 2
11- Molecular Weight Marker
12- POROS PE – Void Fraction
13- POROS PE – Flow Through

Molecular weight marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Fractions 37 to 52 were pooled and run on the POROS PE column, while fractions 53 to 66 would have to be run on the Q Sepharose column again; they were pooled and kept at -80 °C until further purification. The POROS PE void fraction, containing I151D 32-B, and the flow through are shown in lanes 12 and 13.


Figure 3.24.b – POROS HQ

Lanes 1 to 12:
1- Molecular Weight Marker
2- POROS HQ – Load
3- POROS HQ – F. 12
4- POROS HQ – F. 17
5- POROS HQ – F. 20
6- POROS HQ – F. 25
7- POROS HQ – F. 31
8- POROS HQ – F. 42
9- POROS HQ – F. 52
10- POROS HQ – Rinse Fraction
11- POROS HQ – Flow Through
12- Molecular Weight Marker

Molecular weight marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa

I60D 32-B was found in the rinse fraction as well as in fractions 1 to 17; these were pooled and concentrated. More protein was found in fractions 18 to 27; these were added to the extra fractions from the Q Sepharose column and kept at -80 °C.


After the POROS HQ run, both mutants were pure enough to be concentrated. The final yields for the two mutants with the His-Tag still attached were as follows: 120 mg of I151D 32-B from a 6 L culture, and 120 mg of I60D 32-B from a 3 L culture. Some of each protein was flash frozen on dry ice in the presence of 25 % glycerol and kept at -80 °C, and the remainder was used in the TEV Protease reaction, in order to remove the His-Tag.

3.6.5. Cleaving of the His-Tag

The N-terminal His-Tag present in both mutants needs to be removed as it might interfere while studying the interaction with RNase H. The TEV protease is

a cysteine protease that specifically recognizes the amino-acid sequence

ENLYFQG and cleaves between the Glutamine and Glycine residues. The full

sequence of 32-B with the His-Tag and linker is shown below in Figure 3.25.

Figure 3.25 – TEV Protease Cleavage Site

  1 MAHHHHHHVG TGSNDDDDKS TSLYKKAGSA AAPFTENLYF Q*GLNGNKGFS SEDKGEWKLK
 61 LDNAGNGQAV IRFLPSKNDE QAPFAILVNH GFKKNGKWYI ETCSSTHGDY DSCPVCQYIS
121 KNDLYNTDNK EYSLVKRKTS YWANILVVKD PAAPENEGKV FKYRFGKKIW DKINAMIAVD
181 VEMGETPVDV TCPWEGANFV LKVKQVSGFS NYDESKFLNQ SAIPNIDDES FQKELFEQMV
241 DLSEMTSKDK FKSFEELNTK FGQVMGTAVM GGAAATAAKK ADKVADDLDA FNVDDFNTKT
301 EDDFMSSSSG SSSSADDTDL DDLLNDL

The 32-B protein sequence is highlighted in yellow. The starting Methionine is in green, the His-Tag is highlighted in pink and the TEV protease cleavage site in blue. The cut made by the protease is represented with a red star. The two isoleucines that were mutated are shown in red.


Once the TEV protease has cleaved the His-Tag, a residual Glycine residue is left at the N-terminus of the protein.
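The position of the cut can be checked directly on the sequence. The following sketch uses only the first 60 residues of the tagged construct from Figure 3.25 (with the star removed); the function name is chosen here for illustration. It locates the ENLYFQG recognition sequence and splits the chain between the Glutamine and the Glycine.

```python
TEV_SITE = "ENLYFQG"


def tev_cleave(sequence, site=TEV_SITE):
    """Split a sequence at the first TEV site, between the Q and G residues.

    Returns (released_tag, cleaved_protein); the cleaved protein keeps
    the Glycine of the recognition sequence at its N-terminus.
    """
    start = sequence.find(site)
    if start < 0:
        raise ValueError("no TEV recognition site found")
    cut = start + len(site) - 1  # index of the G that follows the cut
    return sequence[:cut], sequence[cut:]


# First 60 residues of His-tagged 32-B, copied from Figure 3.25.
n_terminal_60 = (
    "MAHHHHHHVG" "TGSNDDDDKS" "TSLYKKAGSA"
    "AAPFTENLYF" "QGLNGNKGFS" "SEDKGEWKLK"
)

tag, product = tev_cleave(n_terminal_60)
print(f"released tag and linker: {len(tag)} residues, ending in ...{tag[-6:]}")
print(f"cleaved protein starts with: {product[:10]}")
```

The fragment released by the protease is 41 residues long, and the cleaved protein begins with GLNGNKGFSS, which matches the start of the sequence given in Figure 3.27.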

I151D and I60D 32-B were mixed with the TEV protease, in a 1:20 and

1:50 mass ratio respectively. The reaction setups are presented in Table 3.15.

β-Mercaptoethanol was added to initiate the reaction, since the TEV protease is a cysteine protease.

Table 3.15 – TEV Protease Reaction Setup

Table 3.15.a – I151D 32-B Reaction

TEV Protease Reaction

Component            Concentration    Volume    Amount
I151D 32-B           20.9 mg/mL       1 mL      ~20 mg
TEV protease         1.55 mg/mL       700 µL    ~1 mg
β-mercaptoethanol    14.3 M           1 µL      final concentration ~5 mM

Table 3.15.b – I60D 32-B Reaction

TEV Protease Reaction

Component            Concentration    Volume    Amount
I60D 32-B            5 mg/mL          10 mL     ~50 mg
TEV protease         1.55 mg/mL       700 µL    ~1 mg
β-mercaptoethanol    14.3 M           2 µL      final concentration 2.5 mM
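The protease-to-substrate mass ratios follow from the concentrations and volumes listed in Table 3.15. The short calculation below is a sketch using only those table values; the helper names are illustrative.

```python
def mass_mg(conc_mg_per_ml, volume_ml):
    """Mass of protein (mg) in a given volume."""
    return conc_mg_per_ml * volume_ml


def substrate_per_protease(substrate_mg, protease_mg):
    """Mass of substrate per unit mass of protease (the x in a 1:x ratio)."""
    return substrate_mg / protease_mg


tev_mg = mass_mg(1.55, 0.700)      # 700 µL of TEV protease at 1.55 mg/mL
i151d_mg = mass_mg(20.9, 1.0)      # Table 3.15.a
i60d_mg = mass_mg(5.0, 10.0)       # Table 3.15.b

ratio_i151d = substrate_per_protease(i151d_mg, tev_mg)
ratio_i60d = substrate_per_protease(i60d_mg, tev_mg)
print(f"I151D 32-B: 1:{ratio_i151d:.0f} (protease:substrate, by mass)")
print(f"I60D 32-B:  1:{ratio_i60d:.0f} (protease:substrate, by mass)")
```

The computed values, about 1:19 and 1:46, correspond to the nominal 1:20 and 1:50 ratios once the amounts are rounded as in the tables.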


In the case of I151D 32-B, the protein was concentrated to around 20 mg/mL, which is apparently too high, as some precipitation occurred when the TEV protease was added. The protein concentration was therefore lowered to 5 mg/mL when the reaction was done with I60D 32-B. The protease reactions were left at room temperature overnight. Some light precipitate was present at the end of the reaction in both cases. The SDS-PAGE gels run for the two respective experiments are presented in Figure 3.26.

After the protease reaction, each mixture was run on the cobalt affinity Talon column. The TEV protease carries a His-Tag, so it binds to the column together with the cleaved His-Tag, while the cleaved 32-B mutant protein flows through the column. The TEV protease and the His-Tag were eluted off the column with either a 0 to 250 mM imidazole gradient or a direct 250 mM imidazole elution. The SDS-PAGE gels for the purification of both 32-B mutants are also shown in Figure 3.26.

After purification over the Talon column, the cleaved proteins still showed some higher molecular weight impurities, some of them twice the size of 32-B. This indicates that some cross-linking might be taking place, which is plausible since there are four cysteine residues in 32-B. They are highlighted in yellow in the amino-acid sequence shown in Figure 3.27.


Figure 3.26 – 32-B Mutants TEV Protease Reactions

Figure 3.26.a – I151D 32-B TEV Protease Reaction

Lanes 1 to 14:
1- I151D 32-B (+ His-Tag)
2- TEV Protease
3- I151D 32-B + TEV Protease – 0 h pellet
4- I151D 32-B + TEV Protease – 0 h sup.
5- I151D 32-B + TEV Protease – 16 h pellet
6- I151D 32-B + TEV Protease – 16 h sup.
7- Molecular Weight Marker
8- Talon – Load
9- Talon – F. 7
10- Talon – F. 19
11- Talon – F. 26
12- Talon – Flow Through
13- Talon – Wash Fraction (7.5 mM Imidazole)
14- Molecular Weight Marker

Molecular weight marker bands: 66.3, 55.4, 36.5, 31.0 and 21.5 kDa

It can be seen how from lane 1 to 6 the molecular weight of I151D 32-B decreased, as the His-Tag was cleaved off. The precipitate is mostly formed of I151D 32-B, but it might be due to a concentration problem. After the reaction, the sample was run on the Talon column. I151D 32-B was eluted in the wash fraction (lane 13) and the His-Tag can be seen in lane 11, eluted with 250 mM Imidazole.


Figure 3.26.b – I60D 32-B TEV Protease Reaction

Lanes 1 to 8:
1- I60D 32-B (+ His-Tag)
2- TEV Protease
3- I60D 32-B + TEV Protease – 0 h
4- Talon – Load
5- Talon – Wash Fraction (7.5 mM Imidazole)
6- Talon – F. 25
7- Talon – 250 mM Imidazole Elution
8- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)

Here, most of the I60D 32-B did not precipitate after the reaction and was eluted in the wash fraction from the Talon column. The TEV protease and the His-Tag were eluted first with an Imidazole gradient, and then directly with 250 mM Imidazole (the sample had to be loaded on the column in several batches).

Figure 3.27 – Cysteine Residues in the 32-B Protein

  1 GLNGNKGFSS EDKGEWKLKL DNAGNGQAVI RFLPSKNDEQ APFAILVNHG FKKNGKWYIE
 61 TCSSTHGDYD SCPVCQYISK NDLYNTDNKE YSLVKRKTSY WANILVVKDP AAPENEGKVF
121 KYRFGKKIWD KINAMIAVDV EMGETPVDVT CPWEGANFVL KVKQVSGFSN YDESKFLNQS
181 AIPNIDDESF QKELFEQMVD LSEMTSKDKF KSFEELNTKF GQVMGTAVMG GAAATAAKKA
241 DKVADDLDAF NVDDFNTKTE DDFMSSSSGS SSSADDTDLD DLLNDL

The cysteine residues are highlighted in yellow, and the mutated isoleucine residues are shown in red.


β-Mercaptoethanol was added to the protein samples, with a final

concentration of 50-100 mM, and most of the higher molecular weight bands

disappeared. This is shown for I151D 32-B in Figure 3.28.

Figure 3.28 – I151D 32-B Cross Linking

Lanes 1 to 4:
1- I151D 32-B after Talon column elution
2- Molecular Weight Marker
3- Molecular Weight Marker
4- I151D 32-B + 100 mM β-mercaptoethanol

Molecular weight marker bands: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa

A band is present in lane 1 around 66 kDa, corresponding to a dimer of I151D 32-B. This band disappears after addition of β-mercaptoethanol.

Once the His-Tag had been cleaved from both mutants and the proteins had been brought to a monomeric state by addition of a reducing agent, they were concentrated before any further experiment was done.

3.7. Conclusion

This chapter described how the 32 protein and the three 32 truncations were expressed and purified.


A great deal of time was dedicated to the 32-B truncation. The protein had

to be re-cloned in a different expression vector, in order to clone mutants. These mutants were successfully cloned and expressed. The 32-B protein was also

characterized through a number of structural and biophysical studies. The X-Ray

diffraction studies were not successful as a phase solution could not be obtained

from the data. However, scattering experiments did provide some more insight

on the state of 32-B in solution: it appears to be in equilibrium between a

monomer and a dimer form; and the shape of the A domain, which was missing

from the 32 core crystal structure, was obtained.

CHAPTER 4 - Bacteriophage T4 RNase H

4.1. Introduction

Bacteriophage T4 RNase H is a 5’ to 3’ exonuclease associated with the DNA

replication fork. Its role is to cleave off the RNA/DNA duplex primers needed to

start the synthesis of the Okazaki fragments on the DNA lagging strand. More

background information on RNase H is provided in Section 1.1.3.

This chapter mainly presents the expression and purification of the native RNase H and two of its mutants. The D132N active site mutant is an inactive nuclease, needed for the assays that require the presence of DNA. The D132N ∆N mutant is the N-terminal truncation of D132N RNase H, missing the first nine amino-acids. The N-terminus of RNase H is known to interact with the 45 protein (see Section 1.1.3).

The protein characteristics for the native RNase H as well as the D132N mutant and D132N ∆N double mutant were calculated using the ExPASy website (Gill and von Hippel, 1989; Gasteiger et al., 2003) and are summarized in Table 4.1.


Table 4.1 – RNase H characteristics

Native D132N D132N ∆N

Amino-acids 305 305 297

Molecular Weight 35.6 kDa 35.6 kDa 34.6 kDa

pI 8.61 8.75 9.11

ε 1.65 1.72 1.78

4.2. Bacteriophage T4 Native and D132N Mutant RNase H

4.2.1. Protein Expression

Glycerol stocks harboring the genes for the native and the D132N mutant RNase H were obtained from Dr. Nancy Nossal (N.I.H.). The gene encoding RNase H was cloned into the pNN2202 plasmid derived from the pT7-7 vector (Hollingsworth and Nossal, 1991) and transformed into MV1190 E. coli cells. Glycerol stocks were made and stored at -80 °C.

The protein was expressed using the large scale protein expression protocol described in Section 2.2.2. Cells transformed with the native RNase H plasmid were grown overnight at 37 °C in 25 g/L LB containing 1 mM Chloramphenicol and 1 mM Ampicillin. That culture was then used to inoculate 6 L of LB + 1 mM Ampicillin. D132N RNase H did not require Chloramphenicol in the overnight culture. Protein expression in both cases was induced with 1 mM IPTG when the OD600 reached 0.6, and the cells were harvested after three hours at 37 °C and stored at -20 °C. The expression of RNase H is shown in Figure 4.1. A total of 10 to 12 grams of cells was typically obtained for both proteins.


Figure 4.1 – SDS-PAGE of T4 native and D132N RNase H expression

Gel a (lanes 1 to 3):
1- Molecular Weight Marker
2- Native RNase H - 0h sample
3- Native RNase H - 3h sample

Gel b (lanes 4 to 6):
4- Molecular Weight Marker
5- D132N RNase H - 0h sample
6- D132N RNase H - 3h sample

Molecular weight marker bands on both gels: 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa

a – Native RNase H expression, b – D132N RNase H expression

4.2.2. Cell Lysis

The native and D132N mutant RNase H were extracted from the cells

using the same protocol, described in Section 2.3.

The lysis buffer contained 50 mM Tris HCl pH 7.5, 200 mM NH4Cl, 10 mM

MgCl2, 5% glycerol, 2 mM Dithiothreitol (DTT), 0.03% Polyethylene Imine (PEI).

A volume of 100 mL of buffer was used for every 10 g of cells. The success of the cell lysis was tested by SDS-PAGE, and an example of the D132N RNase H cell lysis can be seen in Figure 4.2. RNase H is found in the lysate.

The lysate was either directly purified or stored at -80°C for later HPLC

use.


Figure 4.2 – SDS-PAGE of T4 D132N RNase H cell lysis

Lanes 1 to 3:
1- D132N RNase H lysis - pellet
2- D132N RNase H lysis - supernatant
3- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)

4.2.3. Protein Purification

As for the cell lysis, the native RNase H and the D132N mutant were purified using the same protocol. The purification was done in three steps: first a low-resolution cation-exchange column (SP Sepharose), then a hydroxyapatite column (HA) to remove the remaining DNA in solution, and finally a high-resolution cation-exchange column (POROS HS). The different buffers used for these runs are shown in Table 4.2.


Table 4.2 – HPLC buffers for T4 RNase H purification

              Ion Exchange                 Hydroxyapatite
              (SP Sepharose, POROS HS)

Buffers       50 mM Tris HCl pH 7.5        25 mM Tris HCl pH 7.5
              100 mM NH4Cl                 100 mM NaCl
              10 mM MgCl2                  1% glycerol
              0 - 750 mM NaCl              0 - 1 M (NH4)2SO4

Conductivity  Buffer A: ~ 16 mS/cm         Buffer A: ~ 13 mS/cm
              Buffer B: ~ 75 mS/cm         Buffer B: ~ 130 mS/cm

The lysate was prefiltered, and its conductivity adjusted with 50 mM Tris

HCl pH 7.5 to match the conductivity of SP buffer A before it could be loaded onto the SP Sepharose. The protein was eluted from the column using a salt gradient. The fractions containing RNase H were pooled and the purity tested by

SDS-PAGE. A similar approach was used for the HA and POROS HS runs.

Examples of chromatograms and SDS-PAGE gels for D132N RNase H purification are shown in Figure 4.3.

4.2.4. Dialysis and Concentration

The pure RNase H was dialyzed against a buffer containing 25 mM

Bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol

(BME). After dialysis, the protein was concentrated up to 30 mg/mL and flash

frozen on dry ice after addition of a minimum of 15% glycerol. The frozen protein

was kept at -80°C until further use.


Figure 4.3 – T4 D132N RNase H purification

Figure 4.3.a – SP Sepharose

Lanes 1 to 15:
1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0 and 21.5 kDa)
2- SP Sepharose - F. 12
3- SP Sepharose - F. 13
4- SP Sepharose - F. 15
5- SP Sepharose - F. 16
6- SP Sepharose - F. 17
7- SP Sepharose - F. 18
8- SP Sepharose - F. 19
9- SP Sepharose - F. 20
10- SP Sepharose - F. 21
11- SP Sepharose - F. 22
12- SP Sepharose - F. 23
13- SP Sepharose - F. 24
14- SP Sepharose - F. 25
15- SP Sepharose - Flow Through

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Here, fractions 13 to 19 contained D132N RNase H and were pooled to be run on the hydroxyapatite column. The pooled fractions are indicated with a red line on the chromatogram.


Figure 4.3.b – Hydroxyapatite

Lanes 1 to 5:
1- HA – Load
2- HA – F. 9
3- HA – F. 12
4- HA – Flow Through
5- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)


Figure 4.3.c – POROS HS

Lanes 1 to 5:
1- POROS HS – Load
2- POROS HS – F. 19
3- POROS HS – F. 21
4- POROS HS – Flow Through
5- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)

D132N RNase H can be found in fractions 19 and 21 on the SDS-PAGE gel, and looks pure. The fractions 19 to 22 were pooled to be dialyzed and concentrated.


4.2.5. Scattering Studies

Scattering studies were done on the D132N RNase H, to further characterize the protein in solution.

Dynamic Light Scattering

D132N RNase H was dialyzed against the dialysis buffer described in Section 4.2.4. A 1 mg/mL sample was then prepared, filtered using a 0.1 µm pore size Millipore Ultrafree-MC filtering device, and finally spun down at 18,000 rcf at 4 °C for 20 minutes. The DLS readings were taken at 4 °C and 20 °C. The results are presented below in Figure 4.4 and Table 4.3.

The 4 °C and 20 °C results are consistent with one another. The polydispersity of the sample indicates some slight aggregation, and the molecular weight calculated from a hydrodynamic radius of 2.5 nm is around 30 kDa, which is reasonably close to the theoretical 35.6 kDa of the protein (Table 4.1). D132N RNase H therefore appears to be a monomer in solution.
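The hydrodynamic radius reported by DLS is derived from the measured translational diffusion coefficient through the Stokes-Einstein relation, Rh = kT / (6πηD). The sketch below illustrates that conversion only. It assumes the viscosity of pure water at 20 °C (1.002 mPa·s), which neglects the contribution of the buffer components, and it does not reproduce the empirical model used by the instrument software to convert Rh into a molecular weight.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def diffusion_coefficient(rh_nm, temp_c, viscosity_pa_s):
    """Translational diffusion coefficient (m^2/s) of a sphere of radius rh_nm."""
    t_kelvin = temp_c + 273.15
    return BOLTZMANN * t_kelvin / (6.0 * math.pi * viscosity_pa_s * rh_nm * 1e-9)


def hydrodynamic_radius_nm(diffusion_m2_s, temp_c, viscosity_pa_s):
    """Hydrodynamic radius (nm) from a diffusion coefficient (Stokes-Einstein)."""
    t_kelvin = temp_c + 273.15
    return BOLTZMANN * t_kelvin / (6.0 * math.pi * viscosity_pa_s * diffusion_m2_s) * 1e9


WATER_VISCOSITY_20C = 1.002e-3  # Pa*s, assumption: pure water, no buffer correction

# Diffusion coefficient that corresponds to the reported Rh of 2.5 nm at 20 °C.
d_20 = diffusion_coefficient(2.5, 20.0, WATER_VISCOSITY_20C)
print(f"D at 20 °C for Rh = 2.5 nm: {d_20:.2e} m^2/s")
print(f"back-calculated Rh: {hydrodynamic_radius_nm(d_20, 20.0, WATER_VISCOSITY_20C):.2f} nm")
```

Because the viscosity enters the relation directly, a reading taken at 4 °C has to be converted with the viscosity at that temperature for the two radii to be comparable.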

Figure 4.4 – D132N RNase H Dynamic Light Scattering Results

4 °C 20 °C


Table 4.3 – D132N RNase H Dynamic Light Scattering Results

          Rh (nm)    % Pd    MW (kDa)    % Intensity    % Mass
4 °C      2.5        14.0    30          94.4           100.0
20 °C     2.5        14.0    28          89.7           100.0

Small Angle X-Ray Scattering

Small Angle X-Ray Scattering experiments were done on D132N RNase

H, as part of the controls needed for the SAXS experiments carried out on the

D132N RNase H + 32 protein complex (see Section 5.3.3).

The SAXS data were collected at the Argonne Advanced Photon Source

15-ID beamline, under the same conditions described in the previous chapter

(see Section 3.5.12). The sample was prepared similarly to the DLS sample, but the sample concentration was 100 µM (~ 3.5 mg/mL). Several images were collected with a 5 s, 20 s and 40 s exposure for the buffer and the protein sample, but only the 40 s ones were averaged and used for data processing. The data collected at 40 s exposure time is shown in Figure 4.5.
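The correspondence between the molar and mass concentrations of the SAXS sample can be verified with the molecular weight of D132N RNase H from Table 4.1 (35.6 kDa). The function name below is illustrative.

```python
def micromolar_to_mg_per_ml(conc_um, mw_kda):
    """Convert a molar concentration (µM) to mg/mL for a protein of mass mw_kda.

    µmol/L * kg/mol = mg/L, hence the division by 1000 to obtain mg/mL.
    """
    return conc_um * mw_kda / 1000.0


saxs_sample = micromolar_to_mg_per_ml(conc_um=100.0, mw_kda=35.6)
print(f"100 µM D132N RNase H = {saxs_sample:.2f} mg/mL")
```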

Figure 4.5 – D132N RNase H SAXS Data Collection


The program GNOM was used next to process the data. Two datasets were made, one at higher resolution (171 to 24 Å) and one at lower resolution (171 to 45 Å). In both cases, the maximum dimension Dmax of the particle was estimated to be 65 Å. The radius of gyration obtained from the distance distribution function p(r) was 22.53 ± 0.02 Å for the higher resolution dataset, and 22.45 ± 0.02 Å for the lower resolution dataset. These values are consistent with one another, and also compatible with the hydrodynamic radius of 25 Å obtained from the DLS experiments. The plots output by GNOM for both datasets are shown in Figure 4.6.
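Two quick consistency checks can be made on these numbers. The sketch below assumes the usual SAXS convention d = 2π/q to express the resolution limits as momentum transfer values, and it compares the measured radius of gyration with that of a uniform solid sphere, for which Rg² = (3/5)R². The sphere is only a reference shape; nothing here replaces the GNOM analysis.

```python
import math


def resolution_to_q(d_angstrom):
    """Momentum transfer q (1/Å) for a real-space distance d = 2*pi/q."""
    return 2.0 * math.pi / d_angstrom


def sphere_diameter_from_rg(rg_angstrom):
    """Diameter (Å) of a uniform solid sphere with the given radius of gyration."""
    return 2.0 * math.sqrt(5.0 / 3.0) * rg_angstrom


for d in (171.0, 45.0, 24.0):
    print(f"d = {d:5.1f} Å  ->  q = {resolution_to_q(d):.4f} 1/Å")

rg = 22.53  # Å, higher resolution dataset
rh = 25.0   # Å, from DLS
print(f"equivalent sphere diameter: {sphere_diameter_from_rg(rg):.1f} Å (reported Dmax: 65 Å)")
print(f"Rg/Rh = {rg / rh:.2f} (uniform solid sphere: {math.sqrt(3.0 / 5.0):.2f})")
```

A sphere with the same radius of gyration would be about 58 Å across, somewhat less than the estimated Dmax of 65 Å, and the Rg/Rh ratio of about 0.90 is above the value of 0.77 expected for a solid sphere. Both observations would be compatible with a compact but not perfectly spherical particle.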

Figure 4.6 – D132N RNase H GNOM Plots

172 to 24 Å resolution 172 to 45 Å resolution

experimental data model data


Finally, the two datasets obtained from GNOM were used in the ab initio modeling programs DAMMIN and GASBOR. DAMMIN was used in the “keep” mode with both datasets, and the multiple models were then averaged using DAMAVER. GASBOR only uses the higher resolution data. The models output by both programs are presented in Figure 4.7.

Figure 4.7 – D132N RNase H 3D SAXS Molecular Envelopes

Figure 4.7.a – DAMMIN Models

Low resolution model (172 to 45 Å)

Average of three models χ2 = 4.2

Dimensions: 70 Å × 70 Å × 70 Å

High resolution model (172 to 24 Å)

Average of five models χ2 = 2.1 Dimensions: 70 Å × 55 Å × 45 Å


Figure 4.7.b – GASBOR Model

χ2 = 2.1 Dimensions: 65 Å × 45 Å × 35 Å

The high resolution model from DAMMIN and the one from GASBOR are

fairly similar, with identical χ2 values. The ribbon structure from the RNase H

crystal structure can be fitted in the envelopes nicely. The DAMMIN model is a

little bigger as it takes the solvation of the molecule into account. On the other

hand, the low resolution model from DAMMIN appeared to be spherical, but this is most likely irrelevant as the χ2 value is very high.

4.3. Bacteriophage T4 D132N ∆N RNase H

4.3.1. Protein Expression and Cell Lysis

A glycerol stock of BL21 (DE3) pLysS cells containing the plasmid pCJrnh1321 encoding T4 D132N ∆N RNase H was obtained from Dr. Charles Jones at N.I.H.


D132N ∆N RNase H was expressed according to the protocol described in

Section 2.2.2. The overnight culture was grown in 25 g/L LB containing 1 mM

Ampicillin and 1 mM Chloramphenicol. That culture was then used to inoculate 6

liters of fresh LB and 1 mM Ampicillin. After induction with 1 mM IPTG, the cells

were incubated at 37 °C for three hours, then harvested and stored at -20 °C.

About 10 g of cells were obtained for 6 L of culture.

The expression of D132N ∆N RNase H can be seen on the SDS-PAGE gel shown in Figure 4.8.a.

The cells were lysed according to the same protocol that was used for the native and D132N RNase H (see Section 4.2.2). The results of the D132N ∆N RNase H cell lysis can be seen on the SDS-PAGE gel in Figure 4.8.b.

Figure 4.8 – SDS-PAGE of T4 D132N ∆N RNase H Expression and Lysis

Gel a (lanes 1 to 9):
1- D132N ∆N RNase H expression – 0h sample
2- D132N ∆N RNase H expression – 3h sample
3- D132N ∆N RNase H expression – 3h sample
4- D132N ∆N RNase H expression – 3h sample
5- D132N ∆N RNase H expression – 3h sample
6- D132N ∆N RNase H expression – 3h sample
7- D132N ∆N RNase H cell lysis – pellet
8- D132N ∆N RNase H cell lysis – supernatant
9- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)


Gel b (lanes 1 to 3):
1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa)
2- D132N ∆N RNase H cell lysis – pellet
3- D132N ∆N RNase H cell lysis – supernatant

D132N ∆N RNase H expressed very well. After cell lysis, a large amount of RNase H is found in the supernatant, indicating that the protein is soluble.

4.3.2. Protein Purification

D132N ∆N RNase H was purified using cation-exchange chromatography.

The lysate was filtered before being loaded on the SP Sepharose column. The A and B buffers were similar to the ones used for native and D132N RNase H. The elution from the SP Sepharose was then further purified with the high resolution

POROS HS column.

Chromatograms and SDS-PAGE gels from the SP Sepharose and

POROS HS runs are shown in Figure 4.9.


Figure 4.9 – T4 D132N ∆N RNase H Purification

Figure 4.9.a – SP Sepharose

Lanes 1 to 15:
1- Molecular Weight Marker
2- SP Sepharose – Load
3- SP Sepharose – F. 9
4- SP Sepharose – F. 12
5- SP Sepharose – F. 15
6- SP Sepharose – F. 18
7- SP Sepharose – F. 21
8- SP Sepharose – F. 24
9- SP Sepharose – F. 27
10- SP Sepharose – F. 30
11- SP Sepharose – F. 33
12- SP Sepharose – F. 36
13- SP Sepharose – F. 43
14- SP Sepharose – Flow Through
15- Molecular Weight Marker

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Here, fractions 1 to 36 contain D132N ∆N RNase H and were pooled to be run on the high resolution POROS HS column. The pooled fractions are indicated with a red line on the chromatogram.


Figure 4.9.b – POROS HS

Lanes 1 to 15:
1- Molecular Weight Marker
2- POROS HS – Load
3- POROS HS – F. 5
4- POROS HS – F. 11
5- POROS HS – F. 17
6- POROS HS – F. 19
7- POROS HS – F. 20
8- POROS HS – F. 21
9- POROS HS – F. 22
10- POROS HS – F. 23
11- POROS HS – F. 24
12- POROS HS – F. 26
13- POROS HS – F. 30
14- POROS HS – F. 33
15- Molecular Weight Marker

D132N ∆N RNase H can be found in fractions 19 through 33 on the SDS-PAGE gel, and looks pure. The fractions 17 to 35 were pooled to be concentrated.


4.3.4. Solubility Screen

When D132N ∆N RNase H was concentrated in the HPLC buffer (50 mM Tris HCl pH 7.5, 100 mM NH4Cl, 10 mM MgCl2 and ~ 300 mM NaCl), the protein precipitated heavily. A solubility screen was therefore performed in order to determine the optimum buffer for D132N ∆N RNase H. The protocol for the solubility screen is described in Section 2.5.3.

The results are shown in Figure 4.10. PIPES pH 6.5 and MgCl2 improved the solubility of the protein; the dialysis buffer was therefore modified to 25 mM bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM BME. This is also the optimized buffer for the native and the D132N mutant RNase H proteins.

Figure 4.10 – T4 D132N ∆N RNase H Solubility Screen Results

Bar chart of the absorbance at 595 nm (axis from 0 to 0.4) for each condition. Conditions, in the order listed on the chart: Supernatant, H2O, TAPS pH 8.5, HEPES pH 7.5, PIPES pH 6.5, MES pH 5.6, Na Citrate, Na Phosphate, Na Sulfate, Na Cacodylate, Na Acetate, Na Formate, CaCl2, MgCl2, LiCl, KCl, NaCl, NH4Cl.


4.3.5. Dialysis and Concentration

D132N ∆N RNase H was dialyzed in the optimized buffer obtained from

the solubility screen. Upon concentration, it still precipitated, but not as heavily as

previously. It was also found that the addition of glycerol to a final amount of 25

to 30% reduced the precipitation to a minimal level. D132N ∆N RNase H could

be concentrated up to 15 mg/mL, but was never stable and would precipitate out

of solution over time. However, it was found to be more stable when it was

concentrated only to a maximum of 5 mg/mL. This seems to indicate that the

N-terminus of RNase H is important for solubility and correct folding of the protein.

4.4. Conclusion

The native T4 RNase H, the D132N mutant as well as the D132N ∆N

N-terminal truncation were all expressed and purified. Solubility studies were done with the D132N ∆N double mutant, since it showed some solubility

problems, and scattering studies were performed on the D132N mutant, as part

of the RNase H + 32 protein complex SAXS characterization.

CHAPTER 5 - Bacteriophage T4 RNase H + 32 Protein + DNA Interactions

5.1. Introduction

As previously mentioned in Chapter 1, the crystal structures of the core domain from 32 protein and RNase H, both from bacteriophage T4, have been

solved (Shamoo et al., 1995; Mueser et al., 1996). However, how these two

proteins interact at the replication fork is still unknown. After the individual protein

structures, this is the next level of information needed, in order to get a better

understanding of how the different proteins at the replication fork come together to organize DNA replication.

All the experiments and assays described in this chapter were carried out with the D132N mutant of RNase H, since the nuclease activity of the native protein is incompatible with the presence of DNA.

5.2. Preliminary Complex Determination

There are a number of truncations available for both RNase H and 32 protein, which were described in Chapters 4 and 3, respectively. These truncations

might interact with one another differently. Another parameter to take into


account is the nature and length of the DNA substrate. In order to sort out the

different possibilities and identify the stronger complexes that are more likely to

crystallize and yield better data, a series of non-denaturing gels (for the protein-

protein complexes) and gel-shift assays (for the protein-protein-DNA complexes)

were run. The results obtained from these gels are described in this Section.

5.2.1. Protein-Protein Interactions

Native gel electrophoresis was used to identify the RNase H + 32 protein complexes. The two types of RNase H, the D132N mutant and the D132N ∆N N-terminal truncation, were run in the presence of the four forms of 32 protein: the full length 32 protein, and the 32-A, 32-B and 32 core truncations.

The methodology for non-denaturing gel electrophoresis is described in

Section 2.9. A summary of the pIs of the different proteins involved is shown in

Table 5.1. Since the RNase Hs are all basic with a pI of 8.5 or above, and the

32s are acidic with pIs around 5, the gels were run at pH 6.5.

Table 5.1 – D132N RNase H and 32 Truncations Calculated pIs

Protein Calculated pI

D132N RNase H 8.75

D132N ∆N RNase H 9.11

32 protein 5.82

32-A 6.76

32-B 4.65

32 core 5.25
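The choice of pH 6.5 can be rationalized by comparing each calculated pI with the pH of the gel: a protein carries a net positive charge below its pI and a net negative charge above it. The sketch below applies that rule to the values of Table 5.1. The 0.5 pH unit margin used to flag proteins whose pI is close to the gel pH is an arbitrary assumption made for this illustration, and calculated pIs are only a rough guide to the behavior observed on the gels.

```python
CALCULATED_PI = {
    "D132N RNase H": 8.75,
    "D132N ∆N RNase H": 9.11,
    "32 protein": 5.82,
    "32-A": 6.76,
    "32-B": 4.65,
    "32 core": 5.25,
}


def expected_migration(pi, gel_ph=6.5, margin=0.5):
    """Predict the direction of migration in a native gel from the pI alone.

    margin is a hypothetical cut-off below which the net charge is
    considered too small for the direction to be predicted reliably.
    """
    if pi - gel_ph > margin:
        return "cathode (-)"      # net positive charge
    if gel_ph - pi > margin:
        return "anode (+)"        # net negative charge
    return "near neutral"


for name, pi in CALCULATED_PI.items():
    print(f"{name:18s} pI {pi:4.2f} -> {expected_migration(pi)}")
```

With these values, both RNase H variants are expected to carry a net positive charge and the 32 protein, 32-B and 32 core a net negative charge at pH 6.5, while the calculated pI of 32-A lies too close to the gel pH for this simple rule to be conclusive.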


The first gel, presented in Figure 5.1, was run with D132N RNase H in the

presence of the different 32 proteins.

Figure 5.1 – D132N RNase H + 32 Truncations Native Gel

Lanes 1 to 12, grouped by 32 protein variant (32, 32-A, 32-B, 32 core); the positions of the (-) and (+) electrodes are marked on the gel:
1- 32 Protein
2- 32 + D132N RNase H
3- D132N RNase H
4- 32-A Protein
5- 32-A + D132N RNase H
6- D132N RNase H
7- 32-B Protein
8- 32-B + D132N RNase H
9- D132N RNase H
10- 32 core Protein
11- 32 core + D132N RNase H
12- D132N RNase H

The 32 proteins run towards the anode (positive electrode), while RNase

H runs towards the cathode. A weak complex is formed between D132N RNase

H and 32 protein (lane 2), as well as between D132N RNase H and the 32 core protein (lane 11). The strongest complex, however, is formed between D132N

RNase H and 32-B, as a strong band can be seen in between where the separate proteins run. 32-A does not form a complex with D132N RNase H.

A similar gel was run, but this time with the N-terminal truncation D132N

∆N RNase H. It is shown in Figure 5.2.


Figure 5.2 – D132N ∆N RNase H + 32 Truncations Native Gel

Lanes 1 to 12, grouped by 32 protein variant (32, 32-A, 32-B, 32 core); the positions of the (-) and (+) electrodes are marked on the gel:
1- 32 Protein
2- 32 + D132N ∆N RNase H
3- D132N ∆N RNase H
4- 32-A Protein
5- 32-A + D132N ∆N RNase H
6- D132N ∆N RNase H
7- 32-B Protein
8- 32-B + D132N ∆N RNase H
9- D132N ∆N RNase H
10- 32 core Protein
11- 32 core + D132N ∆N RNase H
12- D132N ∆N RNase H

With the D132N ∆N RNase H, stronger complexes are observed with 32

protein and the 32 core domain, while the 32-B complex is still strong. 32-A,

similarly to what happened with D132N RNase H, does not complex with D132N

∆N RNase H.

Following these non-denaturing gels, it was determined that the D132N

RNase H + 32-B complex, as well as the D132N ∆N RNase H + 32 / 32-B / 32

core complexes, were strong enough in terms of protein-protein interactions, to

justify further studies. This is also summarized in Section 5.2.3.

5.2.2. Protein-Protein-DNA Interactions

Even though several strong RNase H + 32 protein interaction complexes

were identified using non-denaturing electrophoresis, the assumption that these

protein-protein complexes will yield strong protein + DNA complexes cannot be

made. Some protein domains might move upon DNA binding, and the two proteins would then interact differently. For this reason, another series of gels was run for the ternary complexes.

The DNA substrates were designed in such a way that they mimic the natural DNA substrate occurring at the replication fork, where RNase H and 32 protein would bind. Two substrates were designed and used, a 3’-overhang and a fork DNA substrate. The natural substrate for RNase H binding at the replication fork closely resembles the 3’-overhang DNA, since RNase H is a

5’-exonuclease. However, RNase H is not locked in a particular place on that substrate and can slide along the DNA strand, which would then create heterogeneity issues when it comes to the formation of the ternary complex.

Therefore, another substrate was designed, where a short 5’-arm was added in order to keep RNase H positioned at the fork.

Figure 5.3 – DNA Substrates

[Schematic of the 3’-overhang DNA and the fork DNA substrates, with the 5’ and 3’ ends labeled and the binding positions of RNase H and 32 protein indicated]


To identify the best RNase H + 32 protein + DNA complexes, gel-shift

assays were run. The methodology for these gels is not described in Chapter 2, as the gels were run by Dr. Charles Jones at the N.I.H. In these assays, the DNA is labeled using 32P radio-labeling. The protein-DNA complexes are run on an

agarose gel, and complex formation can be observed by the retardation of the

labeled DNA substrate.

A total of five gels were run, probing the effect of the DNA substrate nature and size, as well as the effect of truncation of RNase H and 32 protein domains, on the formation of the ternary complex. These gels are shown in Figure 5.4. Figure 5.4.a shows the difference between the DNA substrates on

RNase H + 32 protein + DNA binding. The gels in Figure 5.4.b were run to investigate the effect of the truncation of RNase H (full length versus ∆N RNase

H) as well as the 32 protein truncations on the formation of the ternary complex.

In Figure 5.4.a, it can be seen that both RNase H and 32 protein can bind to either the 3’-overhang or the fork DNA substrate. The longer the 3’-arm is, the stronger the binding of 32, as it has more room to bind (the DNA footprint of 32 protein is 5 nucleotides (Jensen et al., 1976)). It can also be seen that if a longer 5’-arm is present on the fork DNA, only RNase H can bind, and no ternary complex is formed. Finally, the short 5’-arm on the fork DNA can be 4 or 6 nucleotides long, as the length makes no difference to RNase H or 32 protein binding. The gel shown in Figure 5.4.a.c was run with the D19N RNase H, which is another inactive mutant of the protein and does not induce nuclease degradation of the DNA substrate.


Figure 5.4 – RNase H + 32 Truncations + DNA Substrates Gel Shift Assays

Figure 5.4.a – Comparison of DNA Substrates

c [Gel image; band labels: DNA + RNase H + 32, DNA + RNase H, DNA]

Here we compare the length of the 3’-overhang and its effect on the binding of 32 protein. The longer 3’-overhang shows a stronger binding of 32 protein. The results with the 12/12 fork are shown for comparison: RNase H can bind more tightly to the fork substrate, which is expected, but the 12-mer on either the 3’-arm or the 5’-arm is too short to allow binding of 32.

d [Gel image]

A new substrate was designed following the results shown in the previous gel. A short 5’-arm was added to the different 3’-overhang substrates, in order to bind RNase H in a “locked” position, meaning it cannot slide along the 3’-overhang binding site anymore. Again, the longer the 3’-arm is, the stronger 32 protein binds. The length of the 5’-arm is not critical, as a 6-mer shows the same results as the 4-mer. These particular fork substrates were chosen to study the effect of RNase H or 32 protein truncations on the formation of the ternary complex, presented in Figure 5.4.b.


The gels shown in Figure 5.4.b show the effect of the protein truncations on the formation of the complex. All of these gels were run with the fork DNA substrate. The results can be divided into two categories: formation of the ternary complex with the full length D132N RNase H, or with the D132N ∆N truncation.

Concerning D132N RNase H, the strongest binding is observed with the 32-B truncation. A somewhat strong binding also appears with the 32-A truncation, and a weak binding with the full length 32 protein. The 32 core was not tested in these gels. With D132N ∆N RNase H, the same hierarchy is observed, only all complexes seem to be stronger than their D132N RNase H counterparts. The third gel (Figure 5.4.b.d) shows again that 32-B binds more strongly to the fork

DNA + D132N ∆N RNase H than the full length 32 protein. It also confirms that a longer 3’-arm on the fork DNA substrate is synonymous with a stronger ternary complex.

One thing that should be pointed out, however, is that radioactive labeling is very sensitive, and therefore nanomolar concentrations are sufficient to observe DNA bands on the gel. In the experiments described later on, concentrations as high as millimolar had to be used, for instance in the crystallization experiments, and the increase in concentration can drive the formation of complexes that appear to be weak at nanomolar concentrations.


Figure 5.4.b – Comparison of RNase H and 32 Protein Truncations

c [Gel image; band labels: 32 + RNase H + DNA, RNase H + DNA]

This gel compares the binding of wild type RNase H versus the N-terminal truncation, as well as 32 protein versus 32-B, while binding to the fork DNA that was previously described as the best substrate. It can be seen that with the wild type 32 protein, a stronger ternary complex is obtained with the ∆N RNase H. Strong complexes are obtained with 32-B, either with the wild type RNase H or the ∆N truncation.

d [Gel image]

This gel is similar to the previous one, but it compares the binding of 32 protein versus the 32-A truncation, again with the wild type and ∆N RNase Hs. It can be seen again, in a more obvious way than before, that 32 protein forms a stronger complex in the presence of ∆N RNase H. The binding of 32-A is also stronger with ∆N RNase H.

e [Gel image]

Here we compare the binding of 32 protein versus the 32-B truncation in the presence of ∆N RNase H and two different lengths of 3’-arms. 32-B can bind more strongly than 32 to the fork DNA loaded with ∆N RNase H, as was shown before. Also, the longer the 3’-arm is, the stronger the ternary complex that is formed, which is consistent with the results from the gel shown in Figure 5.4.a.


5.2.3. Summary of the T4 RNase H + 32 Protein Complexes

To summarize the information described in the two previous sections, the

different complexes identified were compiled in Table 5.2. For the protein-protein

interaction, D132N RNase H was found to interact with only the 32-B truncation,

while D132N ∆N RNase H interacts with 32-B as well as 32 protein and the 32

core domain. As for the ternary complex, the D132N RNase H + 32-B + DNA complex was identified, as were the D132N ∆N RNase H + 32-B / 32 + DNA complexes. The D132N RNase H + 32 protein + DNA complex was considered too weak. The complexes involving 32-A were not retained, as the bands on the gel were not as strong as the ones for 32-B and were heavily smeared.

Table 5.2 – RNase H + 32 Protein ± DNA Complexes

                               D132N RNase H            D132N ∆N RNase H

Protein-Protein Complex        D132N RNase H + 32-B     D132N ∆N RNase H + 32
                                                        D132N ∆N RNase H + 32-B
                                                        D132N ∆N RNase H + 32 core

Protein-Protein-DNA Complex    D132N RNase H + 32-B     D132N ∆N RNase H + 32
                                                        D132N ∆N RNase H + 32-B

It is interesting to see that RNase H and the 32 protein only interact, and weakly at that, when DNA is bound, and that when the N-terminal domain of the 32 protein is cleaved off, the binding is much stronger. It is possible that the B domain of the 32 protein moves upon DNA binding and allows RNase H to bind, which cannot be observed when DNA is not present.


The following sections describe the work that was done on the different

complexes previously identified.

5.3. D132N RNase H + 32-B Protein Interaction

5.3.1. Complex Preparation

Before the D132N RNase H + 32-B complex was prepared, the two

proteins were always dialyzed separately in the same buffer, containing 25 mM

bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol.

After dialysis, the concentrations of the two proteins were checked, and the proteins were re-concentrated if needed. The complex was then prepared by mixing D132N RNase H and 32-B protein in an equimolar ratio rather than in equal mass concentrations, since the two proteins have different molecular weights. For instance, a 300 µM concentration

of the complex is roughly equivalent to 15 mg/mL.

5.3.2. Structural Studies

Crystal Screening and Optimization

The RNase H + 32-B complex was screened against six different

commercial and lab-made screens. This is summarized in Table 5.3. The first two

screens that were used (Crystal Screen I/II and Index) are indicated in italic,

because they were set up differently from the other ones. Native RNase H was

used, instead of the D132N mutant, and the complex was prepared at 10 mg/mL

of each protein, which could be a problem as this is not an equimolar complex.


All the screens were done in three-well Greiner plates, with RNase H alone in the first well and 32-B alone in the third well as controls, and the complex in the second well.

Table 5.3 – RNase H + 32-B Crystal Screens

Crystal Screen Concentration Temperature Drop (µL)

Crystal Screen I and II 10 mg/mL Room Temp. 0.5 + 0.5

Index 10 mg/mL Room Temp. 0.5 + 0.5

PEG Ion Screen 0.3 mM Room Temp. 0.5 + 0.5

Natrix 0.3 mM Room Temp. 0.5 + 0.5

Wizard I and II 0.3 mM Room Temp. 0.5 + 0.5

Additive Screen 0.3 mM Room Temp. 0.5 + 0.5

PEG Ion Screen 0.3 mM 4 °C 0.5 + 0.5

Natrix 0.3 mM 4 °C 0.5 + 0.5

Wizard I and II 0.3 mM 4 °C 0.5 + 0.5

Additive Screen 0.3 mM 4 °C 0.5 + 0.5
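The difference between the two ways of preparing the complex listed in Table 5.3 (10 mg/mL of each protein versus 0.3 mM of each protein) can be illustrated with a small conversion sketch. It assumes the calculated molecular weights quoted in Figure 5.11 (35.6 kDa for D132N RNase H, used here as an approximation for native RNase H as well, and 31.8 kDa for 32-B); the function names are illustrative.

```python
# Calculated molecular weights (kDa) as quoted in Figure 5.11.
# Assumption: the D132N value is also used for native RNase H.
MW_KDA = {"RNase H": 35.6, "32-B": 31.8}


def mg_per_ml_to_mM(mg_per_ml, mw_kda):
    """Convert a mass concentration (mg/mL) to a molar concentration (mM).

    mg/mL equals g/L, and dividing g/L by kDa (kg/mol) gives mmol/L.
    """
    return mg_per_ml / mw_kda


def mM_to_mg_per_ml(conc_mM, mw_kda):
    """Convert a molar concentration (mM) to a mass concentration (mg/mL)."""
    return conc_mM * mw_kda


# Equal-mass preparation: 10 mg/mL of each protein.
equal_mass_mM = {name: mg_per_ml_to_mM(10.0, mw) for name, mw in MW_KDA.items()}
molar_ratio = equal_mass_mM["32-B"] / equal_mass_mM["RNase H"]

# Equimolar preparation: 0.3 mM of each protein.
equimolar_mg_per_ml = {name: mM_to_mg_per_ml(0.3, mw) for name, mw in MW_KDA.items()}

print(equal_mass_mM, round(molar_ratio, 2))
print(equimolar_mg_per_ml)
```

Under these assumptions, 10 mg/mL of each protein corresponds to about 0.28 mM RNase H and 0.31 mM 32-B, that is, roughly a 12 % molar excess of 32-B, whereas 0.3 mM of each protein corresponds to about 10.7 mg/mL RNase H and 9.5 mg/mL 32-B.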

After the initial screening with the Crystal Screen I/II and Index screens, a few hits were obtained. Some of them are shown below in Figure 5.5. Unfortunately, as can be seen in the pictures, the morphology of the complex crystals strongly resembles that of the RNase H crystals, meaning the complex crystals most likely contain only RNase H. Moreover, the Crystal Screen I condition 18 (first pictures) is similar to the condition used to grow native RNase H crystals.


Figure 5.5 – Native RNase H + 32-B Initial Crystal Hits

[Crystal images for each condition; columns: Complex, Native RNase H]

Crystal Screen I – condition 18: 20 % PEG 8000, 0.1 M Na Cacodylate pH 6.5, 0.2 M Magnesium Acetate

Index – condition 66: 25 % PEG 3350, 0.1 M bis-Tris pH 5.5, 0.2 M Ammonium Sulfate

After this first rather unsuccessful attempt, the complex was rescreened against other screens (Wizard I and II, Natrix, PEG Ion Screen and Additive

Screen). This time, the complex was prepared in an equimolar fashion, with each protein at 0.3 mM. The D132N RNase H mutant was used instead of the native

RNase H, and the screening was done at both 4 °C and room temperature. The best hits obtained from these screens are shown in Figure 5.6. A picture of the crystals for each condition at both temperatures is shown. For three conditions out of the four presented, crystals are obtained for both temperatures. However, the crystal morphologies differ somewhat and the crystals grown at 4 °C look like the RNase H crystals more than the room temperature ones.


Figure 5.6 – D132N RNase H + 32-B Crystal Hits after Screening

[Crystal images for each condition; columns: Room Temperature, 4 °C]

Wizard I – condition 12: 20 % PEG 1000, 0.1 M Imidazole pH 8.0, 0.2 M Calcium Acetate

Wizard II – condition 18: 20 % PEG 3000, 0.1 M Tris HCl pH 7.0, 0.2 M Calcium Acetate (one of the two drops is labeled "clear drop")

Wizard II – condition 28: 20 % PEG 8000, 0.1 M Na MES pH 6.0, 0.2 M Calcium Acetate

PEG Ion Screen – condition 25: 20 % PEG 4000, 0.1 M Na HEPES pH 7.5, 0.2 M Ammonium Chloride

Expansion trays were set up for all conditions, but only the PEG 3000 (Wizard II, condition 18) and the PEG 4000 (PEG Ion Screen, condition 25) crystals could be obtained reproducibly. The Wizard II, condition 28 did not yield any crystals in the expansions, and the Wizard I, condition 12 only gave showers of crystals, even upon optimization. The other two conditions, however, could be optimized to grow large, single crystals, as is shown in Figure 5.7.

Figure 5.7 – D132N RNase H + 32-B Crystals after Optimization

[Crystal images, one per entry]

10.9 % PEG 3350, 0.1 M Tris HCl pH 7.5, 0.2 M Calcium Acetate, 3 % Glycerol; complex at 0.5 mM, room temperature, 2 µL + 2 µL hanging drop

16.4 % PEG 3350, 0.1 M Tris HCl pH 7.5, 0.2 M Calcium Acetate, 3 % Glycerol; complex at 0.3 mM, room temperature, 2 µL + 2 µL hanging drop

7.7 % PEG 4000, 0.1 M Na HEPES pH 7.5, 0.2 M Ammonium Chloride; complex at 0.3 mM, room temperature, 4 µL + 4 µL sitting drop

8.6 % PEG 4000, 0.1 M Na HEPES pH 7.5, 0.2 M Ammonium Chloride; complex at 0.3 mM, room temperature, 4 µL + 4 µL sitting drop

9.5 % PEG 4000, 0.1 M Na HEPES pH 7.5, 0.2 M Ammonium Chloride; complex at 0.3 mM, room temperature, 4 µL + 4 µL sitting drop


Crystal Handling and Freezing

Once single crystals were grown, they had to be cryoprotected and flash-

frozen in liquid nitrogen before being screened for diffraction. Table 5.4 below

summarizes the different cryoprotectants that were tried first. The crystals were

soaked in a mixture of the substitute mother liquor and X % of the cryoprotectant.

Table 5.4 – D132N RNase H + 32-B Crystals Cryoprotection

Cryoprotectant Result

25 % Glucose Crystal turns brown immediately

25 % MPD Crystal turns brown after 1 min

25 % Ethylene Glycol Crystal melts immediately

25 % PEG 400 Crystal looks good

25 % Glycerol Crystal melts after 1 min

35 % PEG 400 Crystal melts immediately

15 % Ethylene Glycol + 25 % PEG 400 Crystal turns brown immediately

15 % Ethylene Glycol + 25 % PEG 400 Crystal looks nice, but turns slightly brown after 2 min

PEG 400 seems to be the only cryoprotectant that does not degrade the

crystals, but its cryoprotecting power is rather weak and PEG 400-cryoprotected crystals tend to show ice rings upon X-Ray diffraction. A combination of PEG 400 and ethylene glycol was then used, at different concentrations, but the crystals were never stable in any cryoprotectant that was used at that point. The crystals that could be flash-frozen all showed no or very weak diffraction. It should also be noted that the biggest crystals had a tendency to crack.


Since the crystals could not be cryoprotected by soaking, as seen before, it was attempted to grow them in the presence of cryoprotectant. Concentrations of 5 to 15 % of glycerol, MPD, ethylene glycol, PEG 400, glucose or xylitol were added to the crystallization conditions. However, the crystals did not grow in the presence of any of these cryoprotectants.

Finally, it was attempted again to soak the crystals in the substitute mother liquor + cryoprotectant, but this time by slowly increasing the cryoprotectant concentration. Soaks of 5 minutes were done, starting at 0 %, then 2 %, then

5 %, and then increased by 5 % increments until reaching the concentration of cryoprotectant needed to obtain an amorphous freeze. This strategy was tested with MPD and ethylene glycol. MPD, similarly to what was seen before, turned the crystals brown and degraded the proteins. Ethylene glycol, on the other hand, could be used at up to 20 % without showing any sign of degradation of the crystals. The cryoprotection in increasing concentration of ethylene glycol

therefore became the method of choice for freezing the D132N RNase H + 32-B

crystals.
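The stepwise soak described above can be written out as a short sketch that lists the successive ethylene glycol concentrations. The function name and its parameters are illustrative; the 20 % end point is the highest ethylene glycol concentration reported above, and each step, including the 0 % step, is assumed to last 5 minutes.

```python
def soak_schedule(final_percent, first_steps=(0, 2, 5), increment=5):
    """Return the cryoprotectant concentrations (%) for a stepwise soak.

    The soak starts with `first_steps` and then rises in `increment`
    steps until `final_percent` is reached.
    """
    steps = [p for p in first_steps if p <= final_percent]
    current = steps[-1]
    while current + increment <= final_percent:
        current += increment
        steps.append(current)
    if steps[-1] != final_percent:
        # Final partial step when the target is not a multiple of the increment.
        steps.append(final_percent)
    return steps


SOAK_MINUTES = 5  # duration of each soak

schedule = soak_schedule(20)
total_minutes = SOAK_MINUTES * len(schedule)
print(schedule, total_minutes)
```

For a 20 % target this gives six soaks (0, 2, 5, 10, 15 and 20 %), or about 30 minutes of handling per crystal under the stated assumption.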

Initial Data Collection and Processing

A number of crystals were flash-frozen in liquid nitrogen using the

cryoprotection protocol described above. They were then screened for diffraction

using the high brilliance FR-E X-Ray diffractometer in the Ohio Crystallography

Consortium, located in the Instrumentation Center. The crystals that showed

good enough diffraction were then used for data collection.


A first dataset was collected in-house on the crystal shown below, in

Figure 5.8. An example of a diffraction frame is also presented.

Figure 5.8 – D132N RNase H + 32-B Crystal Data Collection 1

[Crystal image and diffraction frame (Image 1)]

Complex at 0.3 mM, room temperature, 2 µL + 2 µL hanging drop

9.5 % PEG 4000, 0.1 M Na HEPES pH 7.5, 0.2 M Ammonium Chloride

The data collection and processing statistics are shown in Table 5.5. The

crystal diffracted to 3.5 Å, but the data were scaled to only 4 Å. Even though the resolution was cut off, the Rmerge value is still extremely high, 22.1 %, indicating either that the space group is incorrect, or that the quality of the data is very low. The latter is likely, as the mosaicity is also very high, 1.81 °. Also, one cell edge is rather long, around 230 Å, meaning that the diffraction spots of that particular

direction are very close to one another. This can be seen on the diffraction frame

shown above. Another issue with this dataset is that there are only around 7,000 unique reflections, and the RNase H + 32-B complex contains roughly 4,700 atoms (not counting the hydrogen atoms). An estimate of four unique reflections


per atom is needed in order to get an unbiased model upon building and

refinement. This is clearly not the case here.
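The shortfall can be made explicit with a short calculation using only the figures quoted above (about 7,000 unique reflections, listed as 6,996 in Table 5.5, roughly 4,700 non-hydrogen atoms, and the estimate of four unique reflections per atom). The variable and function names are illustrative.

```python
UNIQUE_REFLECTIONS = 6996   # dataset 1, Table 5.5
NON_H_ATOMS = 4700          # approximate size of the RNase H + 32-B complex
REQUIRED_PER_ATOM = 4       # estimate quoted in the text


def reflections_per_atom(unique_reflections, atoms):
    """Observation-to-atom ratio for a dataset."""
    return unique_reflections / atoms


ratio = reflections_per_atom(UNIQUE_REFLECTIONS, NON_H_ATOMS)
required_reflections = REQUIRED_PER_ATOM * NON_H_ATOMS

print(f"{ratio:.2f} unique reflections per atom")
print(f"{required_reflections} unique reflections needed")
```

With these numbers the dataset provides about 1.5 unique reflections per atom, against roughly 18,800 unique reflections that the four-per-atom estimate would call for.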

Table 5.5 – Crystallographic Data for the D132N RNase H + 32-B Dataset 1

Data Collection

X-ray Wavelength                1.54 Å
Detector                        CCD
Crystal to Detector Distance    100 mm
Exposure Time                   20 s
Oscillation                     1 °
Maximum Resolution              3.5 Å
ϕ Range                         132 °
Images                          1 to 132
Kappa Offset                    30 °

Data Processing

Space Group                     P222
Cell Dimensions                 52.11 Å, 65.68 Å, 232.83 Å; 90 °, 90 °, 90 °
Resolution after Scaling        20 to 4 Å
Rmerge *                        22.1 %
I/σ                             3.7 (2.0)
Observed Reflections            31,606 (6,996 unique)
Completeness                    97.3 %
Redundancy                      4.52 (4.93)
Mosaicity                       1.81 °
Matthews Coefficient            1 mol./ASU : 3.0 (58.4 %)
                                2 mol./ASU : 1.5 (16.8 %)

The dataset was processed using the HKL2000 software (Otwinowski and Minor, 1997).

* $R_{merge} = 100 \times \left[ \sum_{hkl} \sum_{i=1}^{n} \left| F^{2}_{hkl,i} - \langle F^{2}_{hkl} \rangle \right| \right] / \sum_{hkl} \sum_{i=1}^{n} F^{2}_{hkl,i}$ (Equation 5.1)

where $F^{2}_{hkl,i}$ is the intensity of each hkl reflection and $\langle F^{2}_{hkl} \rangle$ is the mean value of the $i$ measurements of $n$ equivalent reflections.
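As an illustration of how Equation 5.1 is evaluated, the sketch below computes Rmerge for a handful of made-up intensity measurements. It is not the HKL2000 implementation: the function name and the example intensities are hypothetical, and the deviation from the mean is taken as an absolute value, as in the usual definition of Rmerge.

```python
def r_merge(measurements):
    """Compute Rmerge (%) from repeated intensity measurements.

    `measurements` maps each unique hkl index to the list of intensities
    measured for that reflection and its symmetry equivalents.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_intensity = sum(intensities) / len(intensities)
        # Sum of absolute deviations from the mean for this reflection.
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


# Hypothetical intensities, for illustration only.
example = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 50.0],
    (0, 0, 2): [20.0, 26.0, 23.0],
}

value = r_merge(example)
print(f"Rmerge = {value:.2f} %")
```

Reflections whose equivalent measurements agree exactly, such as (0, 1, 0) in this example, contribute nothing to the numerator, so Rmerge rises only with the scatter between equivalent measurements.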

After scaling, the Molecular Replacement program MolRep was used to

attempt to phase the data. One molecule of RNase H (pdb 1TFR) and one

molecule of 32 core (pdb 1GPC) were sought. The data were scaled in space group P222 as well as in all the related Laue subgroups (P222₁, P2₁2₁2 and P2₁2₁2₁), including inversion of the hand on the P222₁ and P2₁2₁2 space groups.


However, all the attempts were unsuccessful. This is probably due to the bad

quality of the data, indicated by an Rmerge value of 22 %.
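The choice of one RNase H molecule and one 32 molecule as search models is consistent with the Matthews coefficient listed in Table 5.5, which can be reproduced from the cell dimensions. The sketch below assumes four asymmetric units per unit cell for space group P222, a complex mass equal to the sum of the calculated molecular weights quoted in Figure 5.11 (35.6 kDa + 31.8 kDa), and the usual approximation that the solvent fraction is 1 − 1.23/Vm. The function names are illustrative.

```python
def matthews_coefficient(cell_edges, asu_per_cell, mass_da):
    """Matthews coefficient Vm (Å^3/Da) for an orthorhombic cell."""
    a, b, c = cell_edges
    volume = a * b * c  # all cell angles are 90 degrees
    return volume / (asu_per_cell * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm (assumes 1.23 Å^3/Da for protein)."""
    return 1.0 - 1.23 / vm


CELL_EDGES = (52.11, 65.68, 232.83)   # Å, Table 5.5
ASU_PER_CELL = 4                      # assumption: general positions of P222
COMPLEX_MASS_DA = 35600 + 31800       # RNase H + 32-B, from Figure 5.11

results = {}
for n_complexes in (1, 2):
    vm = matthews_coefficient(CELL_EDGES, ASU_PER_CELL, n_complexes * COMPLEX_MASS_DA)
    results[n_complexes] = (vm, solvent_fraction(vm))
    print(f"{n_complexes} complex/ASU: Vm = {vm:.1f}, solvent = {100 * results[n_complexes][1]:.1f} %")
```

Under these assumptions the calculation reproduces the values in Table 5.5: one complex per asymmetric unit gives a Vm of about 3.0 with 58.4 % solvent, while two complexes would leave only 16.8 % solvent, which is implausibly low for a protein crystal.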

Since the crystals seem to diffract poorly, it was proposed that a

synchrotron data collection could help, as a synchrotron X-Ray beam is more

intense than an in-house one. A new dataset was collected on the 22-ID

beamline at the Advanced Photon Source at Argonne National Laboratory. The

data were collected by Dr. Alexander Pavlovsky from Dr. Viola’s group. The

crystal used for this data collection as well as a frame from the dataset are

shown in Figure 5.9.

Figure 5.9 – D132N RNase H + 32-B Crystal Data Collection 2

Complex at 0.3 mM 20 °C 2 µL + 2 µL hanging drop

11.9 % PEG 4000 0.1 M Na HEPES pH 7.5 0.2 M Ammonium Chloride

The data collection and processing statistics are tabulated below in Table

5.6. The quality of this dataset is better than the previous one: the Rmerge is only

6.0 %, compared to 22 % before, and the mosaicity decreased to 0.62 °. Despite


these improvements, the number of unique reflections is still too low and only 1.6

unique reflections per atom are observed.

Table 5.6 – Crystallographic Data for the D132N RNase H + 32-B Dataset 2

Data Collection

X-ray Wavelength                1.0332 Å
Detector                        CCD
Crystal to Detector Distance    485 mm
Exposure Time                   5 s
Oscillation                     0.3 °
Maximum Resolution              3.26 Å
ϕ Range                         122 °
Images                          1 to 340
Kappa Offset                    0 °

Data Processing

Space Group                     P222
Cell Dimensions                 52.88 Å, 65.33 Å, 234.03 Å; 90 °, 90 °, 90 °
Resolution after Scaling        30 to 4 Å
Rmerge *                        6.0 % (10.9 %)
I/σ                             18.5 (14.1)
Observed Reflections            24,163 (7,467 unique)
Completeness                    74.2 % (47.2 %)
Redundancy                      2.5
Mosaicity                       0.62 °

The dataset was processed using the HKL2000 software (Otwinowski and Minor, 1997).

* $R_{merge} = 100 \times \left[ \sum_{hkl} \sum_{i=1}^{n} \left| F^{2}_{hkl,i} - \langle F^{2}_{hkl} \rangle \right| \right] / \sum_{hkl} \sum_{i=1}^{n} F^{2}_{hkl,i}$ (Equation 5.1)

where $F^{2}_{hkl,i}$ is the intensity of each hkl reflection and $\langle F^{2}_{hkl} \rangle$ is the mean value of the $i$ measurements of $n$ equivalent reflections.

Molecular Replacement was done, using one molecule of RNase H and

one molecule of the 32 core domain as search models, as previously

described. MolRep and AMoRe were used, but no solution was found.


Crystal Analysis

Since the diffraction quality of the complex crystals was so poor that it

made phasing of the data impossible, it was decided to carry out some

biophysical experiments on the crystals themselves, in order to determine if both

proteins were present.

The first experiment that was done was to run the crystals, as well as the

crystallization drops, on an SDS-PAGE gel. A large number of crystals were

harvested using crystal freezing loops, and washed using the substitute mother

liquor (crystallization condition + dialysis buffer). Once enough crystals were

collected, they were dissolved in a solution of 100 mM Ammonium Bicarbonate

and run on an SDS-PAGE gel, shown in Figure 5.10.c, along with the separate

proteins. The same crystal sample was run with 1 mM TCEP, in case some

cross-linking was occurring. The bands for both RNase H and 32-B can be seen

in the crystal sample, indicating that both proteins are indeed present in the

crystals. However, the band corresponding to D132N RNase H is stronger than the 32-B one. To investigate this further, more crystallization drops, similar to the previous ones, were collected, but this time the crystals were spun down and the supernatant was run on an SDS-PAGE gel, shown in Figure 5.10.d. Both proteins can be seen in the supernatant, and again the RNase H band is stronger than the 32-B one. If the lower concentration of 32-B seen in the crystal (Figure 5.10.c, lane 3) was due to missing 32-B molecules within the crystal lattice, an excess of 32-B would be found in the supernatant, as the crystal drops were set up in an equimolar ratio of D132N RNase H : 32-B. Since


this is not the case, the other possible explanation is that 32-B binds to the

Coomassie Blue dye more weakly than RNase H, and therefore shows as a

weaker band on the SDS-PAGE gel.

Another experiment that was done to further investigate this question was

to set up crystallization of the complex in several RNase H : 32-B ratios, ranging from 1:1 to 1:3. The crystals mostly grew only for the 1:1 ratio. This again indicates that the reason more RNase H can be seen on the gel is probably due to a staining problem.

At any rate, the important result from this experiment is that D132N

RNase H and 32-B can be seen on the crystal sample, indicating that the crystals do indeed contain both proteins.

Figure 5.10 – SDS-PAGE Gel of the D132N RNase H + 32-B Crystals

Gel c: 1- Molecular Weight Marker, 2- 32-B, 3- D132N RNase H + 32-B crystals, 4- D132N RNase H + 32-B crystals + TCEP, 5- D132N RNase H

Gel d: 6- Molecular Weight Marker, 7- Crystallization drop supernatant

Molecular weight marker bands: 66.3 kDa, 55.4 kDa, 36.5 kDa, 31.0 kDa, 21.5 kDa and 14.4 kDa


Mass Spectrometry was the other biophysical experiment carried out on the crystals. Similarly to what was described before, a number of crystals were looped out of the crystallization drop, rinsed with the substitute mother liquor

(minus the PEG that could interfere with the MS experiment), and dissolved in

100 mM Ammonium Bicarbonate. The sample was then sent to the Proteome

Consortium at the University of Michigan Medical School, for intact mass analysis as well as trypsin digestion and protein identification. The intact mass spectrum

is shown in Figure 5.11.

Figure 5.11 – Intact Mass Spectrum of the D132N RNase H + 32-B Crystals

32-B 31.8 kDa

D132N RNase H 35.6 kDa

The error on the experimental molecular weights determined by intact mass spectrometry with this experiment is ± 22 Da.

Two intense peaks are present, with an apparent mass of 32.076 kDa and

35.833 kDa respectively. The calculated molecular weights for the two proteins

are also shown on the spectrum. The experimental masses are close enough to the theoretical ones, and this experiment confirms that both proteins are present in the crystals. Moreover, the sample was also subjected to trypsin digestion and

MALDI-TOF analysis, which further confirmed the identity of the two proteins in the sample, as is shown in Figure 5.12.

Figure 5.12 – RNase H + 32-B Crystals MALDI-TOF Results

a – D132N RNase H, Protein Score: 159, 100 %

1 MDLEMMLDED YKEGICLIDF SQIALSTALV NFPDKEKINL SMVRHLILNS IKFNVKKAKT

61 LGYTKIVLCI DNAKSGYWRR DFAYYYKKNR GKAREESTWD WEGYFESSHK VIDELKAYMP

121 YIVMDIDKYE ANDHIAVLVK KFSLEGHKIL IISSDGDFTQ LHKYPNVKQW SPMHKKWVKI

181 KSGSAEIDCM TKILKGDKKD NVASVKVRSD FWFTRVEGER TPSMKTSIVE AIANDREQAK

241 VLLTESEYNR YKENLVLIDF DYIPDNIASN IVNYYNSYKL PPRGKIYSYF VKAGLSKLTN

301 SINEF

b – 32-B protein, Protein Score: 67, 99.99 %

1 MGFSSEDKGE WKLKLDNAGN GQAVIRFLPS KNDEQAPFAI LVNHGFKKNG KWYIETCSST

61 HGDYDSCPVC QYISKNDLYN TDNKEYSLVK RKTSYWANIL VVKDPAAPEN EGKVFKYRFG

121 KKIWDKINAM IAVDVEMGET PVDVTCPWEG ANFVLKVKQV SGFSNYDESK FLNQSAIPNI

181 DDESFQKELF EQMVDLSEMT SKDKFKSFEE LNTKFGQVMG TAVMGGAAAT AAKKADKVAD

241 DLDAFNVDDF NTKTEDDFMS SSSGSSSSAD DTDLDDLLND L

The sets of peptides corresponding to T4 D132N RNase H and the 32-B protein truncation with the highest protein score were chosen. Cysteine residues modified as carbamidomethyl are shown in blue, and oxidized Methionine residues are in green. The peptides identified by the MS experiment are shown in red. The database used for peptide matching was NCBInr.


Further Crystal Optimization

Since the two previous cryogenic datasets were not of good enough quality to allow phasing of the data, a room temperature diffraction experiment was done in order to determine whether the problem came from freezing the crystals or from the crystals themselves. The room temperature diffraction pattern, similarly to the cryogenic ones, was very anisotropic and the mosaicity was high. A lot of diffuse scattering was observed and the resolution was low. All of these indicate that the crystal lattice is somewhat disordered and therefore responsible for the poor quality of the data. In order to improve the diffraction quality of the D132N

RNase H + 32-B crystals, several optimization strategies were tested.

The first one was to slightly modify the crystallization condition, by changing concentrations (other than the precipitating agent concentration), changing the pH, or adding an extra component.

• Depending on whether the interaction between the two proteins is hydrophobic or electrostatic, the ionic strength of the crystallization condition is likely to play an important role, so a salt gradient of 0 to 1 M of Ammonium Chloride was used.

The crystals only grew at a salt concentration ranging between 0.1 and 0.2 M.

This indicates that the interaction might be electrostatic, since higher salt concentrations seem to inhibit it, but still a minimum amount of salt in solution is needed to keep the proteins stable.

• The pH of the solution is also important, so the crystals were grown at a pH varying from 6.5 to 9.5, using Tris-based buffers as well as the Na HEPES from the initial condition. Only a few crystals grew in bis-Tris HCl pH 6.5, and


they could not be reproducibly obtained. Going from HEPES pH 7.5 to Tris HCl

pH 7.5 did not make a big difference, but the crystals seemed to grow in clusters

instead of separate entities. At pH 8.5 and higher, only showers of smaller

crystals were obtained.

• As some cross-linking was observed when the crystals were run on an

SDS-PAGE gel, the crystals were grown in the presence of a reducing agent, such as 1 mM TCEP or 10 mM β-mercaptoethanol. The crystal quality improved somewhat, and one dataset could be collected on a TCEP grown crystal (see the following section). However, the improvement was not dramatic.

• It was also proposed that bacterial growth might be part of the problem.

Indeed, the crystals grown in the PEG 3350 + 0.1 M Tris HCl pH 7.5 + 0.2 M Ca

Acetate repeatedly showed contamination of the crystallization drop. A final

concentration of 0.5 mM Na Azide was added to the crystallization condition, but

did not improve the crystal quality.

Another strategy was to try to affect the nucleation and crystal growth

rates using several methods.

• The rate of nucleation seems rather high, as showers of crystals can be

obtained very easily. To tackle that issue, the protein concentration was lowered,

but it could not be lowered too much as bigger crystals are needed to obtain

enough resolution. 3 % glycerol was also added to the crystallization condition.

Glycerol has two effects: it increases the viscosity of the solution, slowing down

nucleation, and it is also a precipitating agent. Fewer and bigger single crystals


were indeed obtained in the presence of 3 % glycerol, but the diffraction quality did not improve.

• Addition of water to the crystallization drop is also known to slow down

nucleation, as it takes longer for the drop to reach super-saturation. In the case

of the D132N RNase H + 32-B crystals, no improvement was seen.

• The crystals were also grown at 4 °C, for the same reason. Interestingly,

only showers of small crystals were obtained in the cold, compared to single crystals at room temperature.

• Hanging drop versus sitting drop crystallization has been shown to make a

difference in crystal growth. Sitting drops were used, a method that allows for a

larger drop size. Larger crystals could be obtained, but the disorder of the crystal

lattice was only amplified, resulting in even more diffuse scattering and higher

mosaicity upon X-Ray diffraction. Polypropylene micro-bridges were used instead

of the regular polystyrene micro-bridges, on the surface of which protein crystals tend to grow and get stuck.

Post-crystallization crystal improvement methods were attempted, such as desiccation or cross-linking.

• The desiccation method (Haebel et al., 2001) is supposed to rid the crystal

of the extra solvent that it contains, therefore shrinking the crystal lattice and

improving diffraction. It was carried out on the D132N RNase H + 32-B crystals, but all diffraction was lost after the desiccation of the crystals.


• Cross-linking using glutaraldehyde was also proposed to strengthen the

crystal lattice. This experiment was unsuccessful as well, as the cross-linking

actually destroyed the lattice.

Finally, optimization of the flash-freezing and data collection was done, so

as to limit to a minimum the loss of resolution upon these two steps.

• The crystals were flash-frozen in a helium stream at 20 K, instead of the

regularly used liquid nitrogen freeze. This is supposed to limit damage to the

crystal lattice upon freezing.

• It has been seen that using a substitute mother liquor with a different pH

from the crystallization condition can sometimes dramatically increase the

diffraction quality of the crystals. When this experiment was tried, it resulted in a

total loss of diffraction.

• Annealing of the crystal in the cryo-stream was attempted as well,

resulting also in the total loss of diffraction.

Challenges Posed by the D132N RNase H + 32-B Crystals

To summarize the issues encountered so far with the complex crystals, several conflicting problems were found. All the crystals showed poor diffraction, diffuse scattering, high mosaicity and anisotropy of the diffraction pattern. Because one of the cell edges is long, the spots in that direction are very close together and can easily overlap. The larger crystals do diffract to better resolution, but they tend to crack upon flash-freezing, giving rise to twinned diffraction patterns, and are also more disordered. Smaller crystals, on the other hand, give better quality data, but to a lower resolution. None of the optimization methods described in the previous section resulted in a dramatic improvement of the diffraction quality of the crystals.

The key to obtaining good enough data is to find a good compromise between all these issues. In other words, the crystal must be small enough that the disorder of the crystal lattice is limited, but still large enough to give decent resolution. Moreover, the easiest way to tackle the spot overlap problem would be to collect data on a more intense synchrotron X-Ray source, which could provide better resolution and better separation.

A large number of crystals meeting these criteria were therefore flash-frozen and tested for diffraction, and the most promising candidates were saved so that a dataset could be collected at the APS synchrotron.

Data Collection (APS) and Processing

A number of crystals were taken to the Advanced Photon Source synchrotron, and four datasets were collected. The best one, with a resolution of

3.2 Å, is presented below and was used for phasing and model building. The dataset was collected on one of the crystals shown in Figure 5.13. These crystals were grown in the presence of 1 mM TCEP, which seems to improve the diffraction quality. Tris HCl pH 7.5 was also used instead of Na HEPES. The crystals were flash frozen after the stepwise ethylene glycol cryoprotection method described above.


Figure 5.13 - D132N RNase H + 32-B Crystal Used in Data Collection 3

Crystallization condition: 12.8 % PEG 4000, 0.1 M Tris HCl pH 7.5, 0.2 M Ammonium Chloride, 1 mM TCEP. Complex at 0.3 mM, room temperature, 2 µL + 2 µL hanging drop.

The dataset was collected according to the parameters described in Table 5.7. A set of diffraction images from the dataset is shown in Figure 5.14. One can already see that the resolution is higher and the mosaicity lower than in the previous datasets.

Figure 5.14 - D132N RNase H + 32-B Crystal Data Collection 3 Images

a – Image 1 (0 to 0.5°); b – Image 90 (44.5 to 45°); c – Image 180 (89.5 to 90°)

The dataset was processed using MOSFLM (Leslie, 1992) and scaled using SCALA, which is part of the CCP4 Suite (Bailey, 1994). The data were first indexed and integrated as P222; the space group was then changed to P212121 using SORTMTZ before scaling. A summary of the processing and scaling statistics is given in Table 5.7.

Table 5.7 - D132N RNase H + 32-B Crystal Data Collection and Processing

Data Collection

X-ray Wavelength                 0.90020 Å
Detector                         CCD
Crystal to Detector Distance     400 mm
Exposure Time                    5 s
Oscillation                      0.5 °
Maximum Resolution               3.2 Å
ϕ Range                          180 °
Images                           1 to 360
Kappa Offset                     0 °

Data Processing

Space Group                      P212121
Cell Dimensions                  51.22 Å, 64.78 Å, 233.81 Å; 90 °, 90 °, 90 °
Resolution after Scaling         117.0 to 3.4 Å
Rmerge *                         15.2 % (62.1 %)
I/σ                              7.4 (2.3)
Observed Reflections             46,208 (10,365 unique)
Completeness                     99.4 % (99.8 %)
Redundancy                       4.5 (4.8)
Mosaicity                        0.68 °
Matthews Coefficient             1 mol./ASU: 2.87 (57.2 %)
                                 2 mol./ASU: 1.44 (14.4 %)

The dataset was processed using the MOSFLM software (Leslie, 1992).

* $R_{merge} = 100 \times \left[ \sum_{hkl} \sum_{i=1}^{n} \left| F^2_{hkl,i} - \langle F^2_{hkl} \rangle \right| \right] \Big/ \sum_{hkl} \sum_{i=1}^{n} F^2_{hkl,i}$    Equation 5.1

where $F^2_{hkl,i}$ is the intensity of each hkl reflection and $\langle F^2_{hkl} \rangle$ is the mean value of the n measurements of equivalent reflections.
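As an illustration of Equation 5.1, the following minimal Python sketch computes Rmerge from a handful of intensity measurements. The function name, the reflection indices and the intensities are all hypothetical and chosen only for illustration; they are not taken from the dataset, and the sketch assumes that symmetry-equivalent measurements have already been mapped to the same hkl index.

```python
from collections import defaultdict


def r_merge(observations):
    """Return Rmerge (in %) for an iterable of ((h, k, l), intensity) pairs.

    Equivalent measurements are assumed to share the same (h, k, l) key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0    # sum of |I_i - <I>| over all measurements
    denominator = 0.0  # sum of I_i over all measurements
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


# Hypothetical measurements, for illustration only.
toy_data = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 2, 0), 50.0), ((0, 2, 0), 40.0), ((0, 2, 0), 60.0),
]
toy_r_merge = r_merge(toy_data)
```

Perfectly reproducible measurements give an Rmerge of zero, and the value grows as equivalent reflections disagree with their mean.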

The Matthews coefficient that was calculated indicates that only one

molecule is present in the asymmetric unit (Matthews, 1968). Molecular

replacement was performed using the maximum likelihood program PHASER

(McCoy, 2007). The search models were the 1TFR (RNase H) and 1GPC (32

core domain) coordinate files. The results are shown in Table 5.8.
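The Matthews coefficient reasoning can be sketched as follows. This is a minimal example, assuming an orthogonal cell with the dimensions of Table 5.7, four asymmetric units per unit cell for P212121, the theoretical molecular weight of 67.4 kDa (67,402 Da) for one D132N RNase H + 32-B complex, and the commonly used approximation of 1 - 1.23/VM for the solvent fraction. The function names are illustrative.

```python
def matthews_coefficient(a, b, c, z, mw_da, n_mol=1):
    """Return V_M in cubic angstroms per dalton for a cell with 90 degree angles."""
    cell_volume = a * b * c
    return cell_volume / (z * n_mol * mw_da)


def solvent_fraction(vm):
    """Approximate solvent content from V_M (assumes the usual 1.23 constant)."""
    return 1.0 - 1.23 / vm


A, B, C = 51.22, 64.78, 233.81  # cell edges in angstroms (Table 5.7)
Z = 4                           # asymmetric units per unit cell in P212121
MW_COMPLEX = 67402.0            # Da, one complex (assumed content of the ASU)

vm_one = matthews_coefficient(A, B, C, Z, MW_COMPLEX, n_mol=1)
vm_two = matthews_coefficient(A, B, C, Z, MW_COMPLEX, n_mol=2)
```

With these inputs, one complex per asymmetric unit gives a coefficient close to 2.9 and a solvent content near 57 %, while two complexes would leave only about 14 % solvent, which is why a single complex per asymmetric unit is the plausible choice.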


Table 5.8 - D132N RNase H + 32-B Molecular Replacement Results

Score                          Search Model
                               1TFR      1GPC
Rotation Function Score        11.5      6.1
Translation Function Score     26.9      27.6
Packing                        0         0
Log-Likelihood Gain            597       1030

The Log-likelihood gain (LLG) indicates how well the data agree with the model; a good molecular replacement solution will therefore have a high LLG score. The rotation (RFZ) and translation (TFZ) Z scores are then calculated from the LLG. An RFZ score higher than 5 and a TFZ score higher than 8 indicate that a solution was found. The packing value reports any clashes that may have been found by the program.
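The acceptance criteria just described can be written as a small check. This is only a sketch: the function and variable names are hypothetical, the column assignment of the Table 5.8 scores to 1TFR and 1GPC is assumed from the table header, and treating a packing value of 0 as a requirement is an added assumption rather than a rule stated above.

```python
RFZ_MIN = 5.0  # rotation Z score threshold quoted in the text
TFZ_MIN = 8.0  # translation Z score threshold quoted in the text


def looks_solved(rfz, tfz, clashes):
    """Return True when the Z scores exceed the thresholds and no clash is reported."""
    return rfz > RFZ_MIN and tfz > TFZ_MIN and clashes == 0


# Scores from Table 5.8 (column assignment assumed).
results = {
    "1TFR": {"rfz": 11.5, "tfz": 26.9, "clashes": 0},
    "1GPC": {"rfz": 6.1, "tfz": 27.6, "clashes": 0},
}
verdicts = {name: looks_solved(**scores) for name, scores in results.items()}
```

Both search models pass under these assumptions, with the translation function scores well above the threshold.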

Model Building

The solution found by PHASER was first refined using rigid body refinement, then restrained refinement, in REFMAC (Bailey, 1994; Murshudov, 1997). The refined model was then imported into COOT (Emsley, 2004), where major loop movements and clashes were fixed. The reflection file used to calculate the electron density was the one output after molecular replacement.

The electron density was calculated as follows:

$$\rho(x,y,z) = \frac{1}{V} \sum_{hkl} F_{hkl} \cos\left[2\pi\left(hx + ky + lz - \phi_{hkl}\right)\right]$$    Equation 5.2
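To make Equation 5.2 concrete, the sketch below evaluates the Fourier summation at a single point. It is a toy illustration and not the procedure used by REFMAC or COOT: the two reflections, their amplitudes and phases, and the cell volume are hypothetical, and the phase is assumed to be expressed as a fraction of a cycle so that it can sit inside the 2π factor as the equation is written.

```python
import math


def electron_density(x, y, z, reflections, cell_volume):
    """Evaluate Equation 5.2 at fractional coordinates (x, y, z).

    reflections: iterable of (h, k, l, amplitude, phase), with the phase
    given in fractions of a cycle (an assumption of this sketch).
    """
    total = 0.0
    for h, k, l, amplitude, phase in reflections:
        angle = 2.0 * math.pi * (h * x + k * y + l * z - phase)
        total += amplitude * math.cos(angle)
    return total / cell_volume


# Hypothetical reflections: (h, k, l, amplitude, phase in cycles).
toy_reflections = [(1, 0, 0, 10.0, 0.0), (0, 1, 0, 5.0, 0.25)]
rho_origin = electron_density(0.0, 0.0, 0.0, toy_reflections, cell_volume=100.0)
rho_shifted = electron_density(0.0, 0.25, 0.0, toy_reflections, cell_volume=100.0)
```

In practice the sum runs over all measured reflections and is evaluated on a grid covering the unit cell, which is what the map calculation programs do far more efficiently with a fast Fourier transform.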


Figure 5.15 – D132N RNase H + 32-B Model Building and Refinement

Molecular Replacement (PHASER model): RFZ = 6.1, TFZ = 27.6

Rigid Body Refinement: R = 39.9 %, Rfree = 40.5 %

Restrained Refinement (Weight = 0.01): R = 28.6 %, Rfree = 34.5 %

Model Building: fixed the main loop movements and clashes

Rigid Body Refinement: R = 28.1 %, Rfree = 33.5 %

Restrained Refinement (Weight = 0.01): R = 27.6 %, Rfree = 33.7 %

Procheck: Most favored = 83.7 %, Allowed = 15.2 %, Generously Allowed = 0.9 %, Disallowed = 0.2 %

After model building, the modified model was refined once again using REFMAC. At that point, the R value was around 27 % and the Rfree around 33 %. A schematic summary of the model building and refinement for the D132N RNase H + 32-B crystal structure is given in Figure 5.15.

Any additional modification made to the protein chains, even a small one, resulted either in an increase of the R value or, more often, in an increase of the Rfree without any significant decrease of the R value. This is usually an indication that the model is being built incorrectly. Since the electron density map was of rather poor quality and difficult to interpret correctly, the model output by REFMAC after the first round of model building was declared final. For the same reason, the automatic addition of water molecules with ARP/wARP was not carried out, and the missing A domain of the 32-B protein could not be built. The final statistics for refinement and validation are presented in Table 5.9.

Table 5.9 – Final Refinement and Validation Summary

Refinement

Resolution                 117.0 to 3.4 Å
R                          27.6 %
Rfree                      33.7 %
RMS Deviation:
  Bond Lengths             0.014 Å
  Bond Angles              2.001 °

Ramachandran Plot

Most Favored               83.7 %
Allowed                    15.2 %
Generously Allowed         0.9 %
Disallowed                 0.2 %

The final model for the interaction between RNase H and the 32 core domain is presented in Figure 5.16.


Figure 5.16 – Final D132N RNase H + 32-B Model

Labels: RNase H; 32 core domain.

Surface Rendering: The two proteins are color-coded similarly to the previous model.

Jones’ Rainbow Ribbons: The N-termini of the two proteins are colored in red and the C-termini in blue.

B-Factor Ribbons: The regions with the highest B-factors are colored in red, the lowest B-factors are shown in blue.


The C-terminus of RNase H, as predicted, is largely involved in the interaction with the 32 protein. The part of 32 that interacts with RNase H is the helix C region at the back of subdomain II. According to Shamoo’s description of the 32 protein structure (Shamoo et al., 1995), subdomain II forms most of the binding cleft for the single-stranded DNA, while subdomain I is involved in Zn2+ binding. The surface rendering picture shows how well the two proteins fit together. It is visible from the B-factor ribbon structure that subdomain I of the 32 protein has very high B-factors, which reflects the very poor electron density observed for this region of the protein. Since this is the Zn2+-binding domain of the 32 protein, it is possible that the disorder in that region resulted from the lack of Zn2+ in the crystals.

The separate structures of RNase H and the 32 core domain were superposed onto the complex structure, so that the domain movement is more visible. This is shown in Figure 5.17. The complex is shown in green and cyan as before, and the separate RNase H and 32 core structures in yellow. The main domain movement occurs at the helix C loop in the 32 protein, which swings down to lock onto RNase H (c in the figure). There is some more subtle movement in RNase H as well, denoted by d in the figure: the binding of the 32 protein seems to push the helices of the large subdomain towards the active site, which would strengthen the binding of the DNA substrate. Finally, shown as e, a small loop in the Zn2+ binding region of the 32 protein moves out of the way to accommodate RNase H.


Figure 5.17 – Domain Movement Observed upon Binding

[Panels: Front View and Back View; the movements discussed in the text are marked c, d and e.]

The final models of RNase H and the 32 protein are shown in green and blue respectively, consistent with the previous figure. In yellow is the superposition of the crystal structures of the two proteins by themselves (PDB files 1TFR and 1GPC). The main domain movements are indicated with a black star, and the clashes that were fixed after model building with a red star.

Figure 5.18 below shows the electrostatic surfaces of RNase H and the 32 core domain at the site of the interaction. It can be seen that the two proteins interact through hydrophobic areas, in gray on the figures.


Figure 5.18 – Electrostatic Surfaces

RNase H is colored in green and the 32 protein is cyan. The two proteins were pulled apart so that the surfaces are easier to see. The electrostatic surfaces are colored as follows: the negatively charged areas are in red, and the positively charged ones in blue. The areas that are shown in gray are hydrophobic.

Helix 12 of RNase H, which plays a critical role in the binding, is almost entirely non-polar: all the residues carrying a charge are oriented towards the active site, and the non-polar residues are oriented towards the outside and the 32 protein. A number of hydrophobic residues are present on the exposed surface of the 32 core domain as well, notably two isoleucines (I60 and I151) and a tryptophan (W144).

Moreover, it was seen on the crystal trials that at high salt concentration,

the crystals did not grow anymore. This can now easily be explained by the fact

that the interaction of the two proteins is hydrophobic.

Finally, a fork DNA substrate was modeled in the D132N RNase H + 32

core protein structure. The coordinates for the DNA were taken from the

RNase H + DNA crystal structure (Devos et al., 2007). The final model is

presented in Figure 5.19.


Figure 5.19 – Superposition of a Fork DNA Substrate

The fork DNA model was obtained from the D132N RNase H + fork DNA coordinate file (Devos et al., 2007) and superposed on the RNase H + 32 core domain model.

The RNase H + 32 protein structure complements the RNase H + fork DNA structure very well. The manner in which the two proteins were found to interact positions the groove region of the 32 protein perfectly to bind and protect the 3’-arm of single-stranded DNA coming from RNase H.

Discussion

The X-Ray diffraction data collected from the D132N RNase H + 32-B crystals, even at 3.4 Å resolution, allowed us to dock the two proteins together and observe some movements occurring upon binding. For instance, a large loop that is part of the 32 protein subdomain II locks down on RNase H. The surfaces taking part in the binding were found to be largely non-polar, meaning that the interaction of the two proteins occurs through hydrophobic contacts. This is interesting, as preliminary results during the crystallization trials indicated that the interaction was electrostatic, and not hydrophobic. Also, the modeling of a fork DNA substrate onto the RNase H + 32 protein structure showed how well the 32 protein is positioned to bind the single-stranded DNA in its cleft between subdomains I and II. This is a very good confirmation that the low resolution model of the interaction between RNase H and the 32 protein is physiologically relevant.

Since the interaction appears to be hydrophobic, a good way of testing the model would be to mutate some of the non-polar residues involved in the binding into charged residues of roughly the same size. The hydrophobic residues at the 32 protein surface previously mentioned, I60, I151 and W144, were chosen. They all point directly at RNase H and span the entire length of the binding interface. Figure 5.20 shows where these residues are located on the model. It was decided to mutate the isoleucine residues into aspartates, and the tryptophan into a glutamate. These mutations were chosen because they change the charge at the interface while roughly conserving the overall shape.

Once the three 32-B mutants have been cloned and expressed, their interaction with RNase H will be tested using the same techniques that were used for the RNase H + 32-B complex. Non-denaturing electrophoresis will be performed first to qualitatively estimate the binding, or lack thereof, between RNase H and the mutants. Quantitative techniques such as ITC or Fluorescence Anisotropy titrations will then be used to estimate how strongly the mutations affect the interaction. If it appears that the single mutants do not have a strong effect on the binding, then double mutants may have to be cloned as well.

Figure 5.20 – 32-B Mutated Residues

Labeled residues: I60, W144, I151

5.3.3. Scattering Studies

Since it proved difficult for a long time to obtain any information from the crystals, solution-based experiments such as Dynamic Light Scattering and Small-Angle X-Ray Scattering were carried out to gather more information on the state of the D132N RNase H + 32-B complex in solution.

Dynamic Light Scattering


The D132N RNase H + 32-B complex was prepared as described in Section 5.3.1, at 30 µM (approximately 1 mg/mL). The sample was filtered using a 0.1 µm pore size Ultrafree-MC filter, and centrifuged for 20 minutes at 4 °C and 18,000 rcf. The DLS readings were then taken at 4 °C and 20 °C. Unfortunately, the complex aggregates at 20 °C and no satisfactory measurements could be made at that temperature. The results for the 4 °C experiment are shown in Figure 5.21 and Table 5.10. The control measurements on the separate proteins were also carried out and are presented in their respective chapters (Section 3.5.11 for 32-B and Section 4.2.5 for D132N RNase H).

Figure 5.21 – Dynamic Light Scattering Results for the D132N

RNase H + 32-B Complex at 4 °C


Table 5.10 – Dynamic Light Scattering Results for the D132N

RNase H + 32-B Complex at 4 °C

Rh (nm)    % Pd    MW (kDa)    % Intensity    % Mass
3.3        15.4    54          100.0          100.0

The polydispersity of the sample is around 15 %, indicating that slight

aggregation is occurring. The molecular weight of 54 kDa was calculated from

the hydrodynamic radius of 3.3 nm. That value was a little small but close to the

one expected for the D132N RNase H + 32-B complex, which has a theoretical

molecular weight of 67.4 kDa.

The fact that the complex is aggregating at 20 °C might be an indication

as to why the crystals were not diffracting to high resolution. However, when the

crystallization experiment was repeated at 4 °C, only showers of small crystals

were obtained.

Small-Angle X-Ray Scattering

The SAXS sample was prepared in the same way as the DLS sample, at 100 µM concentration. The data were collected at the APS 15-ID beamline, at room temperature. As was done with the separate proteins (see Sections 3.5.12 and 4.2.5), images were collected with 5 s, 20 s and 40 s exposures for the buffer and the protein sample, and only the 40 s images were averaged for the data processing. The data used in processing are shown in Figure 5.22.c; they are superposed with the D132N RNase H and the 32-B data in Figure 5.22.d. The scattering curves for the two separate proteins look very different from that of the complex, indicating that it is indeed the complex scattering the X-Rays and not D132N RNase H or 32-B by themselves.

Figure 5.22 – D132N RNase H + 32-B SAXS Data Collection

Panels: c, d. Legend: Complex, D132N RNase H, 32-B.

The data were processed with GNOM to two different resolutions: a higher resolution dataset (171 to 24 Å) and a lower resolution one (171 to 45 Å). The estimated maximum dimension of the particle, Dmax, was 100 Å. The experimental vs. calculated data and size distribution plots are shown in Figure 5.23. The radii of gyration calculated at that point were 30.66 ± 0.09 Å for the high resolution data and 31.3 ± 0.2 Å for the low resolution data. These values are in good agreement with each other, and are also consistent with the hydrodynamic radius of 33 Å obtained from the DLS experiment.
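SAXS programs usually express the data range as a momentum transfer rather than a real-space resolution. The short sketch below converts the resolution limits quoted above, assuming the convention q = 2π/d; the function name is illustrative and the convention is an assumption, since the text only gives the limits in Å.

```python
import math


def d_to_q(d_angstrom):
    """Convert a real-space resolution d (in angstroms) to q = 2*pi/d (in inverse angstroms)."""
    return 2.0 * math.pi / d_angstrom


q_min = d_to_q(171.0)       # low-angle limit shared by both datasets
q_max_high = d_to_q(24.0)   # high-angle limit of the higher resolution dataset
q_max_low = d_to_q(45.0)    # high-angle limit of the lower resolution dataset
```

Under this convention the higher resolution dataset extends to roughly 0.26 Å⁻¹ and the lower resolution one to roughly 0.14 Å⁻¹, both starting near 0.037 Å⁻¹.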


Figure 5.23 – D132N RNase H + 32-B SAXS Data Processing (GNOM)

Panels: High Resolution (172 to 24 Å); Low Resolution (172 to 45 Å). Legend: experimental data; model data.

The GNOM datasets were then input into GASBOR or DAMMIN, two ab initio modeling programs that calculate the molecular envelope of the protein in solution. DAMMIN was used in the “keep” mode (models averaged with DAMAVER) with both the high resolution and the low resolution data, while GASBOR was only used with the high resolution data. The models output by the two programs are presented in Figure 5.24. The ribbon structures of RNase H and the 32 core were placed in the envelopes using the program COOT (Emsley, 2004). The proteins were positioned in a relative orientation matching the one observed in the crystal structure of the D132N RNase H + 32-B complex. For comparison, the surface rendering of the crystal structure is also shown in Figure 5.24.c.

Figure 5.24 – D132N RNase H + 32-B SAXS 3D Molecular Envelopes

Figure 5.24.a – DAMMIN Models

Lower resolution model (172 to 45 Å): average of seven models, χ2 = 0.71, dimensions: 110 Å × 60 Å × 50 Å

Higher resolution model (172 to 24 Å): average of five models, χ2 = 0.76, dimensions: 110 Å × 65 Å × 50 Å

Figure 5.24.b – GASBOR Model

χ2 = 1.0

Dimensions: 105 Å × 55 Å × 30 Å


Figure 5.24.c – Crystal Structure Surface Rendering

Dimensions: 95 Å × 50 Å × 40 Å

In all the figures above, RNase H is shown in green and the 32 protein in blue.

The GASBOR model seems to fit the crystal structure the best, even

though the χ2 values for the DAMMIN models are lower. In all cases, the

molecular envelope has an elongated shape, consistent with the surface

obtained from the crystal structure.

In the case of the DAMMIN and GASBOR models, the protein atomic models were manually built inside the molecular envelopes, which is clearly not the best way of obtaining a good model. The program SASREF in the ATSAS suite is a rigid body modeling program that fits known structures into SAXS molecular envelopes using simulated annealing. It was used with the 1TFR (RNase H) and 1GPC (32 core domain) coordinate files, against the GNOM dataset. This program should be run a large number of times, as a different model is obtained from every run. The models were compared, and only the best one is presented in this section. The χ2 values for these models ranged from 2.1 to more than 3. The fit obtained from the best SASREF run is shown in Figure 5.25.


Figure 5.25 – Best SASREF fit for the D132N RNase H + 32-B calculated

vs. experimental data

χ2 = 2.1

experimental data

calculated data

SASREF outputs a coordinate file for each model, corresponding to the rigid body modeling of the two proteins. The model corresponding to the fit shown previously is presented in Figure 5.26.

Figure 5.26 – Best SASREF Model for the D132N RNase H + 32-B Complex

Left: SASREF model (labels: RNase H, 32 core). Right: superposition of the SASREF model and the crystal structure (labels: 32 core (SASREF), 32 core domain (crystal)).


In this model, the 32 protein interacts with helix 4 on the side of the bridge region of RNase H (see Mueser et al., 1996, for the description of the regions of RNase H). It is known from other studies and from the crystal structure that the RNase H – 32 protein interaction is made through the C-terminus of RNase H. Indeed, when the SASREF model was superposed on the crystal structure, as seen on the right-hand side of Figure 5.26, the position of the 32 protein predicted by SASREF is very different from the actual one.

In parallel, the scattering data from the D132N RNase H + 32-B complex crystal structure was calculated using the program CRYSOL, and superimposed onto the experimental data. The curves are shown in Figure 5.27.

Figure 5.27 – CRYSOL fit for the D132N RNase H + 32-B theoretical vs. experimental data

χ2 = 1.59

experimental data

model data


The two datasets superpose very well and the χ2 value for the fit is 1.59, which is lower than that of any model produced by SASREF. This is a very good indication that the crystal structure is also relevant in solution, and that the two proteins actually interact in the way that was seen in the structural studies.

Discussion

Dynamic Light Scattering and Small-Angle X-Ray Scattering studies were

carried out on the D132N RNase H + 32-B complex, in order to better

characterize the interaction between these two proteins.

The DLS results showed that the complex is more stable at 4 °C than at

room temperature, which was an interesting result as the complex crystals could only be grown as large single crystals at room temperature. It was also confirmed that the molecular weight of the complex is consistent with a 1:1 ratio of the two proteins.

The SAXS experiment provided us with some molecular envelopes that are reasonably consistent with the crystal structure. The experimental and the

calculated scattering data were, on the other hand, remarkably consistent with

one another (CRYSOL), which was a very good confirmation of the status of the

D132N RNase H + 32-B complex in solution, as compared to the crystal. When

rigid body modeling was attempted to fit the two proteins together, no model

produced by SASREF was close to the one seen from the structural studies, and

all had a larger error than the CRYSOL fit.


In the case of the RNase H + 32 protein complex, structural studies were successfully carried out, so SAXS was mostly used to confirm that the solution 3D envelope and the crystal structure were in good agreement with each other. However, it would be much more difficult to produce a good model of the interaction between two proteins if the SAXS data were to be used alone (as was the case for the 59 protein and 32 protein complex (Dwlgosh, 2008)): the SASREF modeling program, based only on shape fitting, had difficulties coming up with a decent model.

5.3.4. Size Exclusion Chromatography

D132N RNase H, 32-B and the complex were analyzed using size

exclusion chromatography. The Superdex 200 column was chosen, since it has a void volume of 200 kDa, as compared to the Superdex 75 which has a void

volume of 75 kDa and therefore was not appropriate for the molecular weights

that were dealt with in this particular study. These molecular weights are listed in

Table 5.11.

Table 5.11 – D132N and 32-B Molecular Weights

                    D132N RNase H    32-B                   Complex
Molecular Weight    35,558 Da        31,844 Da (monomer)    67,402 Da
                                     63,688 Da (dimer)


The buffer used was composed of 25 mM Bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol. The proteins were also dialyzed in that buffer prior to the experiment, and concentrated to around 0.2 mM. Each protein, then the complex, was loaded on the column in three separate runs. The three chromatograms are shown in Figure 5.28. D132N RNase H was eluted in fractions 57 to 66 (elution time = 77 minutes), 32-B in fractions 52 to 61 (69.5 minutes). The complex run showed two peaks, one in fractions 52 to 59 (68 minutes) and the other in fractions 59 to 65 (76 minutes). The fractions highlighted with a red star were run on an SDS-PAGE gel, presented in Figure 5.29.

Several problems appeared with this experiment. First, it was seen in Chapter 3 that 32-B can exist as a dimer in solution. The molecular weight of a 32-B dimer is very close to the molecular weight of the complex (see Table 5.11), which makes the separation via gel filtration difficult. Indeed, it is very hard to tell whether the first peak observed in the complex run corresponds to the D132N RNase H + 32-B complex or to a dimer of 32-B proteins. The Superdex 75 could have provided a more efficient separation, which may nonetheless not have been good enough, but it was not used as its void volume of 75 kDa was too low. The other problem appeared with the SDS-PAGE gel shown in Figure 5.29, where D132N RNase H and 32-B run at the same level and are indistinguishable.
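The difficulty in separating the complex from a 32-B dimer can be put in numbers with a short sketch. The two monomer masses are those of Table 5.11, read here as daltons and assigned to the proteins as assumed from the table layout; the variable names are illustrative.

```python
MW_RNASE_H = 35558  # Da, D132N RNase H (Table 5.11, assumed column assignment)
MW_32B = 31844      # Da, 32-B monomer (Table 5.11)

mw_complex = MW_RNASE_H + MW_32B  # 1:1 D132N RNase H + 32-B complex
mw_32b_dimer = 2 * MW_32B         # 32-B homodimer

# Fractional mass difference between the two species that could form the first peak.
relative_gap = (mw_complex - mw_32b_dimer) / mw_complex
```

The two species differ by less than 4 kDa, or about 5.5 % of the mass of the complex, which is a small difference for a gel filtration column to resolve.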


Figure 5.28 – D132N RNase H + 32-B Complex Gel Filtration Assay

Chromatogram panels: D132N RNase H (0.23 mM); 32-B Protein (0.20 mM); D132N RNase H + 32-B Complex (0.15 mM)

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on a SDS-PAGE gel.


Figure 5.29 – D132N RNase H + 32-B Complex Gel Filtration SDS-PAGE Gel

Lanes 1 to 15:

1- D132N RNase H – Load
2- D132N RNase H – F. 62
3- D132N RNase H – F. 77
4- Molecular Weight Marker
5- 32-B – Load
6- 32-B – F. 56
7- Molecular Weight Marker
8- Complex – Load
9- Complex – F. 53
10- Complex – F. 55
11- Complex – F. 57
12- Complex – F. 59
13- Complex – F. 61
14- Complex – F. 63
15- Complex – F. 65

To solve that problem, fractions 54 and 62 from the D132N RNase H + 32-B run were analyzed by mass spectrometry. They were chosen because they correspond to the beginning of the first peak and the end of the second peak, therefore minimizing overlaps and cross-contamination of the samples. About 500 µL of each fraction was dialyzed several times in 50 mM Ammonium Bicarbonate using a Microcon™ concentrator, and then filtered. The samples were then run on the ESI-Ion Trap Mass Spectrometer available at the Instrumentation Center. Despite the multiple dialyses, the concentration of NH4+ and Mg2+ ions in solution was still high, which made the interpretation of the spectra difficult. However, it was seen that fraction 54 did contain two different proteins, although the molecular weights could not be determined precisely because of the high salt concentration. Fraction 62 only contained one protein, with a molecular weight of 34,465 Da, which has to be RNase H since 32-B is only 31.8 kDa. From these results we can say that the first peak observed contained both D132N RNase H and 32-B. That peak most likely corresponds to a mixture of the complex and of 32-B dimers, as it was twice as intense as the second peak. The identity of RNase H in the second peak was confirmed. It should be pointed out that the fractions were kept at 4 °C for a few weeks before being dialyzed. The proteins at that point were probably partly aggregated, trapping salt ions, which made the dialysis step more difficult. This could also explain why the molecular weight of RNase H came out about 1 kDa lighter than it should have been, the protein having been degraded over time.

5.3.5. Isothermal Titration Calorimetry

All the experiments that have been described so far show that D132N

RNase H does bind to 32-B, but these are all qualitative results. In order to better quantify the binding of the two proteins, ITC was performed for the D132N

RNase H + 32-B complex formation.

The two proteins were first dialyzed in the usual dialysis buffer (25 mM bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol) and then concentrated. The concentration of the protein in the syringe has to be roughly 20 times higher than that of the protein in the cell. D132N RNase H was the titrated protein, placed in the cell, and 32-B the titrant, placed in the syringe. The reason is that 32-B can be obtained in larger quantities and at a higher concentration, making it the obvious candidate for the titrant protein. The concentration of the two proteins for each run is given in Table 5.12.

Table 5.12 – Protein Concentrations Used in the ITC Experiment

Titrant / Titrated Protein     32-B         D132N RNase H
Buffer / Buffer                /            /
Buffer / D132N RNase H         /            0.0516 mM
32-B / Buffer                  0.980 mM     /
32-B / D132N RNase H           0.917 mM     0.0509 mM

All the runs were set up in a similar fashion: 40 injections of 5 µL and 10 s each were made, except for the first injection, which only contained 1 µL. The rotation speed of the syringe was 270 rpm, and the temperature of the cell was 20 °C. Several control runs had to be done first. The buffer-buffer run, or injection of buffer into buffer, provides the heat of injection. The 32-B-buffer and the buffer-D132N RNase H runs give the heat of dilution of each protein into the buffer. The different heats (injection, dilution) obtained from the control runs then have to be subtracted from the 32-B-D132N RNase H data, in order to get the actual heat of binding. Figure 5.30 shows the 32-B into D132N RNase H titration curve. The curve was then fitted assuming one-site binding, and the fit was used to calculate the thermodynamic parameters of the binding of 32-B to D132N RNase H, which are summarized in Table 5.13.


Figure 5.30 – D132N RNase H + 32-B Isothermal Titration

Table 5.13 – Thermodynamic Parameters of the D132N RNase H + 32-B

Complex Formation

Parameter Result

N 0.92 ± 0.03

Kd (µM) 3.8 ± 0.7

∆H (kJ.mol-1) 8.1 ± 0.3

∆S (J.K-1.mol-1) 31.4

T∆S (kJ.mol-1) 9.2

∆G (kJ.mol-1) -1.1


The dissociation constant calculated for the D132N RNase H + 32-B complex is in the low micromolar range, 3.8 µM, indicating that the affinity of the two proteins for each other is not very strong, but strong enough for the complex to be observed in in vitro assays, such as the non-denaturing gel. Also, the binding reaction appears to be endothermic (∆H > 0), and therefore entropy driven. This is interesting, as it implies that the complex would form more readily when the temperature increases, i.e. at room temperature versus 4 °C. This might be the reason why showers of small crystals were observed at 4 °C, and larger single crystals could only be grown at room temperature. However, the proteins are less stable at room temperature than at 4 °C, as was seen with the DLS experiments, and have a tendency to aggregate. These two opposing phenomena probably explain why the D132N RNase H + 32-B crystals could be grown at room temperature, but had some intrinsic disorder due to the lower stability of the proteins.
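The thermodynamic parameters can be cross-checked against one another with the standard relations ∆G = RT ln(Kd) and ∆G = ∆H - T∆S. The sketch below is such a consistency check, assuming a 1 M standard state, the 20 °C cell temperature, and the Kd and ∆H values of Table 5.13; the variable names are illustrative.

```python
import math

R = 8.314462618    # gas constant, J K^-1 mol^-1
T = 293.15         # K, cell temperature of 20 degrees C
KD = 3.8e-6        # M, dissociation constant from Table 5.13
DELTA_H = 8.1e3    # J mol^-1, enthalpy from Table 5.13

delta_g = R * T * math.log(KD)   # free energy of binding implied by Kd
t_delta_s = DELTA_H - delta_g    # entropy term from delta_g = DELTA_H - T * delta_s
delta_s = t_delta_s / T          # J K^-1 mol^-1
delta_s_cal = delta_s / 4.184    # same quantity in cal K^-1 mol^-1
```

With these inputs the Kd corresponds to a ∆G of about -30.4 kJ/mol and an entropy term T∆S of about 38.5 kJ/mol, which supports the entropy-driven interpretation. This differs from the -1.1 kJ/mol listed in Table 5.13; the implied ∆S of about 131 J/K/mol equals 31.4 cal/K/mol, so the tabulated 31.4 may be expressed in calories rather than joules. That reading is an assumption of this check and is not stated in the table.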

5.3.6. Fluorescence Anisotropy Titration

Fluorescence anisotropy is another experiment that allows quantitative characterization of the binding of one protein to another. One way of carrying out this experiment, in the case of the D132N RNase H + 32-B complex, would be to label one of the proteins with a fluorescent label and titrate it with the other protein. However, it has been shown that labeling some proteins can affect their interaction with other proteins or substrates. This is the case for RNase H: once labeled, it no longer binds to its DNA substrate, whereas labeled DNA binds to unlabeled RNase H with a low nanomolar affinity (Juliette M. Devos, personal communication). To circumvent these issues, a labeled DNA substrate loaded with RNase H was used as the starting point of the titration, with 32-B as the titrant. That strategy also has the advantage of bringing DNA back into the experiment, since the two proteins only interact through DNA under physiologically relevant conditions. The labeled DNA substrate that was chosen is shown in Figure 5.31. It is a 20/30 fork, labeled at its 5’-end with HEX-Fluorescein. HEX absorbs at 535 nm and emits at 556 nm.

Figure 5.31 – Fluorescence Anisotropy Titration Fork DNA Substrate

[Schematic of the 20/30 fork DNA substrate; * marks the HEX label]

The proteins were dialyzed against a buffer composed of 25 mM bis-Tris HCl pH 6.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol. They were then concentrated to around 300 µM for D132N RNase H and 700 µM for 32-B, so as to limit the dilution effect during the titration. The DNA substrate was annealed according to the protocol described in Section 2.14.2. The initial concentration of the starting material in the cuvette (fork DNA + D132N RNase H) was 10 µM, where only 5 % of the DNA was labeled and the rest was the same 20/30 fork DNA without the 5’-HEX label. The reason behind this trace-label design is that micromolar concentrations of protein and DNA are needed for titrations having a micromolar Kd, but a micromolar concentration of labeled DNA would saturate the detector, as well as being extremely expensive. This particular concentration was chosen according to the results from the ITC experiment, which gave a dissociation constant of 3.8 µM. 32-B at 400 µM was then titrated in until equilibrium was reached.

The anisotropy is calculated from the parallel and perpendicular intensities of the light emitted by the fluorophore and polarized by the instrument, as shown in the following equation.

$$ r = \frac{I_{\parallel} - I_{\perp}}{I_{\parallel} + 2 I_{\perp}} \qquad \text{(Equation 5.3)} $$

A total of ten anisotropy readings were taken after each addition and averaged before being used in the titration curve. The initial fork DNA anisotropy readings averaged around 0.07, and increased to 0.16 to 0.17 when D132N RNase H was loaded on the fork. It should be noted that the dissociation constant of RNase H for fork DNA is around 20 nM, so at a concentration of 10 µM, 98 % or more of RNase H is bound to the DNA. The titration curve for the 32-B + D132N RNase H + 20/30 fork DNA complex and the calculated fit are shown in Figure 5.32.
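The anisotropy calculation of Equation 5.3 and the averaging of repeated readings can be sketched as follows. This is a minimal illustration, not the instrument software: the function names and the intensity values are hypothetical, and instrument correction factors (often called the G factor) are left out because Equation 5.3 does not include one.

```python
from statistics import mean


def anisotropy(i_parallel, i_perpendicular):
    """Anisotropy from the parallel and perpendicular emission intensities."""
    return (i_parallel - i_perpendicular) / (i_parallel + 2.0 * i_perpendicular)


def mean_anisotropy(readings):
    """Average the anisotropy over repeated (I_parallel, I_perpendicular) readings."""
    return mean(anisotropy(i_par, i_perp) for i_par, i_perp in readings)


# Ten hypothetical intensity pairs (arbitrary units), not measured data
readings = [(1240.0, 1000.0)] * 10
print(f"Mean anisotropy: {mean_anisotropy(readings):.3f}")
```

The hypothetical intensities were chosen so that the result falls near the value of about 0.07 reported for the free fork DNA.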


Figure 5.32 – Fluorescence Anisotropy Titration of the 32-B + D132N RNase H + fork DNA Complex

The anisotropy shown on the Y axis has been normalized, so that it represents the change in anisotropy during the 32-B titration.

Kd = 3.65 ± 0.01 µM

The dissociation constant obtained from the titration is 3.65 ± 0.01 µM, indicating that the binding in the presence of DNA is very close to the D132N RNase H + 32-B binding measured in the ITC experiment (3.8 µM).
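How a dissociation constant can be extracted from such a titration curve is sketched below. The fitting software and exact model used for Figure 5.32 are not stated in this section, so this is only one common approach: a one-site quadratic binding model (which accounts for depletion of the titrant), fitted by least squares. All names and the synthetic data are hypothetical, and dilution of the cuvette contents during the titration is ignored.

```python
import math


def fraction_bound(l_total, p_total, kd):
    """Fraction of the RNase H + DNA complex bound by titrant (one-site model)."""
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


def best_amplitude(kd, l_totals, signals, p_total):
    """Linear least-squares amplitude for a fixed Kd, and the squared error."""
    f = [fraction_bound(l, p_total, kd) for l in l_totals]
    amp = sum(y * fi for y, fi in zip(signals, f)) / sum(fi * fi for fi in f)
    sse = sum((y - amp * fi) ** 2 for y, fi in zip(signals, f))
    return amp, sse


def fit_kd(l_totals, signals, p_total, lo=0.01, hi=1000.0, iterations=100):
    """Golden-section search over log10(Kd) minimizing the squared error."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log10(lo), math.log10(hi)

    def cost(x):
        return best_amplitude(10.0 ** x, l_totals, signals, p_total)[1]

    for _ in range(iterations):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if cost(c) < cost(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    kd = 10.0 ** ((a + b) / 2.0)
    return kd, best_amplitude(kd, l_totals, signals, p_total)[0]


# Synthetic, noise-free titration (µM); every value here is hypothetical
titrant = [0.0, 2.0, 5.0, 10.0, 15.0, 20.0, 30.0, 50.0, 80.0]
true_kd, true_amp, p_total = 3.65, 0.04, 10.0
signal = [true_amp * fraction_bound(l, p_total, true_kd) for l in titrant]

kd_fit, amp_fit = fit_kd(titrant, signal, p_total)
print(f"Fitted Kd = {kd_fit:.2f} µM, amplitude = {amp_fit:.3f}")
```

Because the amplitude enters the model linearly, it is solved in closed form for each trial Kd, which leaves a one-dimensional search. With real, noisy data a dedicated nonlinear least-squares routine that also reports parameter uncertainties would be the more usual choice.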

To complete the study of the binding of D132N RNase H to 32 protein, the same experiment was repeated with the other truncations of 32 protein available to us, namely 32-A, the core domain and the full length 32 protein. This was made possible by the fact that fluorescence anisotropy titrations consume little time and material. The same cannot be said of Isothermal Titration Calorimetry, which is why, even though the binding of the 32 protein and 32 core domain to D132N RNase H was seen on the native gels shown in Figure 5.1, it was too weak to justify ITC titrations. On the other hand, the gel-shift assays (Figure 5.4.b) showed that 32 protein and the 32-A truncation do bind quite strongly to D132N RNase H in the presence of a fork DNA, justifying the FA titrations.

The titration curves and fits for the four 32 protein truncations titrated into D132N RNase H + fork DNA are shown in Figure 5.33. A summary of the dissociation constants for each truncation is given in Table 5.14. From this, it can be seen that the strongest binding is observed with the 32-B truncation (3.65 µM), followed by the 32 core domain (13.14 µM), the 32-A truncation (31.98 µM) and finally the full length 32 protein (105.8 µM). These results are consistent with the relative strength of binding observed in the gel shift assays (see Section 5.2.2), with the exception of the 32 core domain, which was not used in those assays.

Table 5.14 – Summary of the Dissociation Constants for the 32 Truncations + D132N RNase H + Fork DNA Complex

32 Truncation    Kd (µM)
32 Protein       105.8 ± 0.1
32-B             3.65 ± 0.01
32-A             31.98 ± 0.07
32 core          13.14 ± 0.04

[Bar chart of Kd (µM) for 32, 32-B, 32-A and 32 core]


Figure 5.33 – Fluorescence Anisotropy Titrations of the 32 Truncations + D132N RNase H + Fork DNA Complex

[Panels: 32-B, 32, 32 core, 32-A]

The anisotropy shown on the Y axis has been normalized and brought back to zero, so that the titration curve only shows the change in anisotropy induced by the binding of the 32 protein.

The 32-A titration did show some additional interesting features. While for the other three titrations the anisotropy only increased by 0.04 at the most, it increased by 0.12 with 32-A, a three-fold increase compared to the other 32 truncations. Such a jump in anisotropy can only be explained by the formation of a much larger complex in this particular case. As described in Sections 1.1.2 and 3.1, the B domain that is present in the 32-A truncation is responsible for the cooperative binding of the 32 protein to itself, resulting in the formation of filaments (Giedroc et al., 1991; Casas-Finet et al., 1992). There is a good chance that 32-A also displays this cooperative binding. Therefore, when 32-A is titrated into the RNase H + fork DNA complex, it binds not only to RNase H but also to itself when added in excess. This would not have been observed in the gel shift assays, which were run at low nanomolar concentrations.

5.3.7. Protein-Protein-DNA Crystallization

An important step in studying the interaction between D132N RNase H and the 32-B protein truncation was to screen the ternary complex for crystals. A crystal structure of RNase H and 32 protein in the presence of DNA would provide a lot more information about the way these proteins come together and organize the lagging strand replication.

The same DNA substrates, the 3’-overhang and the fork DNA, that have previously been described were used in the screens. As a reminder they are shown again in Figure 5.34.

Figure 5.34 – DNA Substrates Used in the Ternary Complex Screens

[Schematics of the two substrate families]

3’-overhang DNA: 12/24, 12/27 and 12/30 3’-overhang
Fork DNA: 16/24, 16/27 and 16/30 fork DNA


The ternary complex was always prepared in a 1:1:1 ratio, by incubating D132N RNase H with the DNA substrate first, and then adding the 32 protein. When this part of the project was initiated, the D132N RNase H + 32-B crystals had already been obtained, but not optimized, and the two crystallization conditions described in Figure 5.7 were known. A first crystallization attempt was made using these. The ternary complex (D132N RNase H + 32-B + 12/24, 12/27 or 12/30 3’-overhang) was screened at 0.2 mM and room temperature, in 2 µL + 2 µL hanging drops. The crystallization conditions were:

• 10-15 % PEG 4000 + 0.1 M HEPES pH 7.5 + 0.2 M Ammonium Chloride

• 10-20 % PEG 3350 + 0.1 M Tris HCl pH 7.5 + 0.2 M Calcium Acetate

Only microcrystals or precipitate were obtained with both conditions.

The D132N RNase H + 32-B + fork DNA complex was then screened using the Crystal Screen I/II and the Wizard I/II commercial screens. A summary of the different screens is given in Table 5.15. The screens were set up by hand in Greiner trays, with 0.2 mM of the D132N RNase H + 32-B + DNA substrate complex.

Table 5.15 – D132N RNase H + 32-B + DNA Crystal Screens

Crystal Screen            DNA Substrate     Temperature   Drop (µL)
Crystal Screen I and II   16/30 fork DNA    Room Temp.    1 + 1
Wizard I and II           16/30 fork DNA    Room Temp.    1 + 1
Crystal Screen I and II   16/27 fork DNA    4 °C          1 + 1
Crystal Screen I and II   16/30 fork DNA    4 °C          1 + 1
Wizard I and II           16/27 fork DNA    4 °C          1 + 1
Wizard I and II           16/30 fork DNA    4 °C          1 + 1


First of all, a trend was observed, where the ternary complex made with the shorter DNA substrate seemed to be more stable and to crystallize more easily than that made with the longer DNA substrate, which had a tendency to precipitate more readily. It was initially thought that the longer DNA substrate would be better, as the binding seen in the gel shift assays was stronger (see Figure 5.4.a). However, as mentioned before, these gels were run at nanomolar concentrations, whereas the crystallization experiments were run in the sub-millimolar range (0.2 mM), which is about a 10,000 fold increase in concentration. This is probably enough to drive the binding of the 32 protein onto a shorter DNA strand. Moreover, a shorter 3’-arm DNA would also mean a more homogeneous ternary complex, resulting in better crystallization results.

The best crystal hits were obtained from the 4 °C screens, even though the most promising hits were all microcrystals. They are shown in Figure 5.35 below. The first one, however, is a salt crystal, as it did not stain with the Izit blue dye.

These hits were used as the starting point for crystal optimization experiments. Unfortunately, these crystals could not be reproduced.


Figure 5.35 – D132N RNase H + 32-B + Fork DNA Crystals after Screening

[Crystal images; DNA substrate labels in the order they appear: 16/27 fork, 16/30 fork, 16/27 fork, 16/30 fork]

• Crystal Screen I – Condition 6: 30 % PEG 4000, 0.1 M Tris HCl pH 8.5, 0.2 M Magnesium Chloride
• Wizard I – Condition 3: 15 % ethanol, 0.1 M Na CHES pH 9.5
• Crystal Screen II – Condition 11: 1,6-hexanediol, 0.1 M Na Acetate pH 4.6, 0.01 M Cobalt Chloride
• Wizard II – Condition 28: 20 % PEG 8000, 0.1 M Na MES pH 6.0, 0.2 M Calcium Acetate

5.3.8. 32-B Mutants Studies

As previously discussed in Section 5.3.2, three mutants were designed in order to further characterize the interaction between D132N RNase H and 32-B. The cloning, expression and purification of these mutants are described in Section 3.6. Only two of the three mutants that were designed could be successfully cloned; the W144E mutant never got past the PCR stage. As a reminder, the location of the other two mutated residues, I151 and I60, is shown below in Figure 5.36.

Figure 5.36 – Location of the 32-B Mutated Residues at the Interface between D132N RNase H and 32-B

[Panel labels: I60, I151, 32 protein, RNase H]

32 protein is represented in blue and RNase H in green. The two mutated residues are colored in pink. The RNase H electrostatic map is also shown in mesh.

A non-denaturing gel was run first, to assess whether the 32-B mutants can still bind to D132N RNase H. This gel is shown in Figure 5.37. It was run in a bis-Tris HCl pH 6.5 buffer, which is why it looks different from the gel in Figure 5.1, which was run in a Tris HCl pH 6.5 buffer. The wild type 32-B + D132N RNase H complex is in lane 2 and did not move from the well, and no extra bands can be observed for the separate 32-B and RNase H proteins either. For the two mutants, on the other hand (lanes 5 and 8), a band can be seen close to the well, especially for I151D 32-B, but bands for the 32-B mutant and D132N RNase H are also observed. This indicates that binding between RNase H and the 32-B mutants still occurs, but is weaker than with the wild type 32-B. Also, comparing lanes 5 and 8, the I60D 32-B mutation seems to disrupt the interaction with RNase H more than I151D 32-B does.

Figure 5.37 – D132N RNase H + 32-B Mutants Native Gels

[Gel image; lanes 1 to 3: WT, lanes 4 to 6: I151D, lanes 7 to 9: I60D]

1- 32-B Wild Type Protein
2- 32-B + D132N RNase H
3- D132N RNase H
4- I151D 32-B Protein
5- I151D 32-B + D132N RNase H
6- D132N RNase H
7- I60D 32-B Protein
8- I60D 32-B + D132N RNase H
9- D132N RNase H

In order to quantify the effect of the mutations on the 32-B + D132N RNase H interaction, fluorescence anisotropy titrations were performed. Another parameter that was tested was the influence of the N-terminal His-Tag of the 32-B mutant. Titrations were carried out similarly to the 32-B titration into 20/30 fork DNA + D132N RNase H. Since the binding of the 32-B mutants to D132N RNase H is expected to be weaker, the initial concentration of D132N RNase H + fork DNA was increased to 15 µM. The titration curves and fits for all four titrations are presented in Figure 5.38.

Figure 5.38 – Fluorescence Anisotropy Titration of the 32-B Mutants + D132N RNase H + fork DNA Complex

[Panels: I151D 32-B, I151D 32-B (N-term His-Tag), I60D 32-B, I60D 32-B (N-term His-Tag)]

The anisotropy shown on the Y axis has been normalized and brought back to zero, so that the titration curve only shows the change in anisotropy induced by the binding of the 32 protein.

The dissociation constants calculated from each titration are tabulated in

Table 5.16 below.


Table 5.16 – Summary of the Dissociation Constants for the 32-B Mutants + D132N RNase H + fork DNA Complex

32-B                     Kd (µM)
Wild Type                3.65 ± 0.01
I151D 32-B               30.99 ± 0.02
I151D 32-B + His-Tag     31.92 ± 0.06
I60D 32-B                34.21 ± 0.05
I60D 32-B + His-Tag      39.99 ± 0.05

The first important point is that for both mutants, the 32-B + D132N RNase H interaction is about ten times weaker than with the wild type 32-B protein. Moreover, confirming what was seen on the native gel (Figure 5.37), the I60D mutation does seem to have a stronger effect on the interaction with RNase H. In the 32-B / RNase H crystal structure, I60 appears to be more buried in the hydrophobic interface than I151, which could explain why its mutation has a more dramatic effect. Overall, both mutants showed a decrease in affinity for D132N RNase H, and this result confirms that the crystal structure is indeed physiologically relevant, and that the two proteins bind via hydrophobic interactions.
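The loss of affinity reported in Table 5.16 can also be expressed as a change in binding free energy, ∆∆G = RT ln(Kd,mutant / Kd,wild type). The sketch below is illustrative only: the function name is hypothetical, and the temperature of the fluorescence anisotropy titrations is not stated in this section, so 20 °C (the ITC cell temperature) is assumed.

```python
import math

R_KJ = 8.314e-3  # gas constant, kJ/(K*mol)


def ddg_kj(kd_mutant_um, kd_wt_um, temp_c=20.0):
    """Change in binding free energy (kJ/mol); temp_c = 20 is an assumption."""
    return R_KJ * (temp_c + 273.15) * math.log(kd_mutant_um / kd_wt_um)


kd_wt = 3.65  # µM, wild type 32-B (Table 5.16)
mutants = {"I151D 32-B": 30.99, "I60D 32-B": 34.21}  # µM, without His-Tag

for name, kd in mutants.items():
    fold = kd / kd_wt
    print(f"{name}: {fold:.1f}-fold weaker, ddG = {ddg_kj(kd, kd_wt):.1f} kJ/mol")
```

Under this temperature assumption, the two mutations correspond to a loss of roughly 5.2 and 5.5 kJ/mol of binding free energy, respectively.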

The other result from these titrations is that the N-terminal His-Tag has no effect on the binding to RNase H. Two things follow from this. First, it confirms that the N-terminus of 32-B is completely out of the way of the interaction with RNase H, as was seen in the crystal structure. Second, and more practically, the two versions of each 32-B mutant, with or without the His-Tag, can be used interchangeably, and that will be done later on in this study.

5.3.9. Conclusion

The D132N RNase H + 32-B complex was extensively characterized and described in this chapter.

Biophysical experiments were done to characterize the interaction between the two proteins, both qualitatively and quantitatively. Non-denaturing gel electrophoresis showed that the interaction is strong enough to be seen on a gel, a result that was confirmed by size exclusion chromatography. The SEC, however, revealed that the complex is in equilibrium with the two separate proteins. The interaction was quantitatively characterized with Isothermal Titration Calorimetry and fluorescence anisotropy titrations. The two experiments were in very good agreement with one another and gave dissociation constants of 3.8 µM and 3.65 µM, respectively.

The complex was further characterized by structural studies. A crystal structure was obtained, where the C-terminus of RNase H interacts with the 32 core domain, and from which a number of 32-B mutants were designed in order to test the X-Ray data model. It was then shown that the two 32-B mutants did have an effect on the RNase H + 32-B complex, and through FA titrations that the interaction was about an order of magnitude weaker.

Solution-based studies were also performed, such as Dynamic Light Scattering and Small-Angle X-Ray Scattering. The SAXS data agreed rather well with the crystal structure, which was a good indication that the interaction that is seen in the crystal is also relevant in solution.

Crystallization experiments were done with the D132N RNase H + 32-B + DNA ternary complex, but no diffraction quality crystals could be grown.

Finally, the interaction between RNase H, the different 32 protein truncations and a fork DNA substrate was characterized with fluorescence anisotropy titrations. It appears that the strongest complexes are obtained with the 32-B truncation and the 32 core domain, the weakest being observed with the full length 32 protein and, to some extent, 32-A. Even though 32-B interacts with RNase H through the core domain, the A domain must be playing an important role, as its truncation weakens the complex considerably. There must also be some domain movement associated with the B domain upon DNA binding, or some influence of the cooperative binding of 32, since the full length 32 protein appears to bind more weakly to RNase H than the 32-B truncation. Unfortunately, the crystal structure did not show where the A domain is located when the 32 protein is bound to RNase H.


5.4. D132N ∆N RNase H + 32-B Protein Interaction

5.4.1. Complex Preparation

Preparing the D132N ∆N RNase H + 32 protein complexes is not as straightforward as preparing the D132N RNase H + 32 protein complexes. As described in Section 4.3, D132N ∆N RNase H had solubility issues and could not be obtained at high concentration. However, in order to make the protein complexes, the separate proteins are needed at a high enough concentration, so that upon mixing and dilution the complex can be made at a reasonable concentration too. Since D132N ∆N RNase H can only be concentrated in the presence of glycerol (see Section 4.3.5), the separate proteins were concentrated and the complex prepared before it was dialyzed into its optimum buffer (25 mM Tris HCl pH 7.5, 150 mM NH4Cl, 10 mM MgCl2 and 2 mM β-mercaptoethanol). After dialysis, the complex was re-concentrated if need be, before it was used.

5.4.2. Protein-Protein Crystallization

The first crystallization trial for the D132N ∆N RNase H + 32-B complex

was done using the crystallization conditions obtained from the D132N RNase H

+ 32-B complex (Section 5.3.2). As a reminder, here are the two crystallization

conditions again:

• 5-15 % PEG 4000 + 0.1 M HEPES pH 7.5 + 0.2 M Ammonium Chloride

• 10-20 % PEG 3350 + 0.1 M Tris HCl pH 7.5 + 0.2 M Calcium Acetate


Crystals were obtained only with the first condition, the other one yielding

only precipitate. Some of these crystals are shown in Figure 5.39.

Figure 5.39 – D132N ∆N RNase H + 32-B Initial Crystals

[Two crystal images, both grown under the same conditions]

7.6 % PEG 4000, 0.1 M HEPES pH 7.5, 0.2 M Ammonium Chloride
Complex at 0.15 mM, room temperature, 2 µL + 2 µL hanging drop

The crystals were cryoprotected similarly to the D132N RNase H + 32-B crystals, and flash-frozen in liquid nitrogen. They were then screened for X-Ray

diffraction but none of them diffracted.

The D132N ∆N RNase H + 32-B complex was therefore screened for

crystal hits, as shown in Table 5.17 below.


Table 5.17 – D132N ∆N RNase H + 32-B Crystal Screens

Crystal Screen            Concentration   Temperature   Drop (µL)
Crystal Screen I and II   0.11 mM         Room Temp.    1 + 1
Wizard I and II           0.11 mM         Room Temp.    1 + 1
Additive Screen           0.1 mM          Room Temp.    1 + 1
Index                     0.085 mM        Room Temp.    1 + 1
PEG Ion Screen            0.09 mM         Room Temp.    1 + 1
Natrix                    0.09 mM         Room Temp.    1 + 1

A number of crystal hits were obtained from the screens, but most of them were microcrystals. Some of the best ones are presented in Figure 5.40.

Figure 5.40 – D132N ∆N RNase H + 32-B Crystal Hits after Screening

• Crystal Screen I – Condition 46: 18 % PEG 8000, 0.1 M Na Cacodylate pH 6.5, 0.2 M Calcium Acetate
• Crystal Screen I – Condition 36: 8 % PEG 8000, 0.1 M Tris HCl pH 8.5
• Wizard II – Condition 18: 20 % PEG 3000, 0.1 M Tris HCl pH 7.0, 0.2 M Calcium Acetate


It was noticed that a lot of the hits were obtained in PEG 4000 or 8000, and that the identity of the buffer or the salt (if present) did not have an influence on the crystal morphology. A first expansion screen, shown in Figure 5.41.c, was designed with only PEG 4000 or 8000 and different buffers. It appeared that microcrystals formed better at higher pH, but the absence of salt in that screen did seem to have an influence on crystal growth, as a lot of the drops were precipitated. Another expansion screen was set up, as shown in Figure 5.41.d. Four different salts were tested at a 0.1 M concentration. These two screens were done at room temperature and 0.3 mM protein concentration, in a 2 µL + 2 µL hanging drop setup.


Figure 5.41 – D132N ∆N RNase H + 32-B Crystal Optimization

c. PEG 4000 or 8000, from 5 % to 25 %, combined with one of:
• 0.1 M Na Cacodylate pH 6.5
• 0.1 M Na HEPES pH 7.5
• 0.1 M Tris HCl pH 8.5
• 0.1 M Na CHES pH 9.5

d. PEG 3400, 4000 or 8000, from 0 % to 10 %, + 0.1 M Na CHES pH 9.5, combined with one of:
• 0.1 M Magnesium Acetate
• 0.1 M Calcium Acetate
• 0.1 M Sodium Formate
• 0.1 M Lithium Sulfate

The best crystals that were grown with this strategy are presented in

Figure 5.42.

Figure 5.42 – D132N ∆N RNase H + 32-B Crystal Hits after Optimization

4.0 % PEG 4000, 0.1 M Na CHES pH 9.5, 0.2 M Magnesium Acetate
Complex at 0.3 mM, room temperature, 2 µL + 2 µL hanging drop


Since these crystals could not be grown bigger than what is shown in the picture, the macroseeding technique was used: some of the small crystals were used as seeds and placed in a new crystallization drop. The attempt was, however, unsuccessful, and no large single crystals were obtained for the D132N ∆N RNase H + 32-B complex.

5.4.3. Protein-Protein-DNA Complex

The D132N ∆N RNase H + 32-B + DNA complex had been identified from the gel shift assays as one of the strongest ternary complexes.

The complex was prepared at around 0.1 mM with the 16/30 fork DNA shown in Figure 5.34. D132N ∆N RNase H was loaded onto the DNA substrate first and left to incubate for fifteen minutes on ice before the 32-B protein was added. The addition of D132N ∆N RNase H to the DNA caused precipitation, but this phenomenon had been observed before with D132N RNase H, where the complex redissolved when the 32 protein was added. Here, however, the addition of 32-B to the D132N ∆N RNase H + fork DNA did not re-solubilize the complex.

5.4.4. 32-B Mutants Studies

Two biophysical techniques, non-denaturing electrophoresis and

fluorescence anisotropy titration, were applied to study the interaction between

the ∆N truncation of RNase H and the 32-B protein.


The native gel in Figure 5.43 was run at pH 6.5 and 100 µM protein concentration, similarly to the D132N RNase H + 32-B mutants native gel presented in Figure 5.37. The D132N ∆N RNase H + 32-B complex, in lane 2, looks rather strong, and the two complexes with the 32-B mutants, in lanes 5 and 8, appear to be weaker.

Figure 5.43 – D132N ∆N RNase H + 32-B Mutants Native Gels

[Gel image; lanes 1 to 3: WT, lanes 4 to 6: I151D, lanes 7 to 9: I60D]

1- 32-B Wild Type Protein
2- 32-B + D132N ∆N RNase H
3- D132N ∆N RNase H
4- I151D 32-B Protein
5- I151D 32-B + D132N ∆N RNase H
6- D132N ∆N RNase H
7- I60D 32-B Protein
8- I60D 32-B + D132N ∆N RNase H
9- D132N ∆N RNase H

Next, fluorescence anisotropy titrations were run under the same conditions as the D132N RNase H + 32-B mutants titrations. The WT 32-B titration was done at 10 µM D132N ∆N RNase H + 20/30 fork DNA, while the I151D and I60D 32-B titrations were done at 15 µM.


The dissociation constant for the D132N ∆N RNase H + WT 32-B + DNA complex (Figure 5.44) is comparable to that of the D132N RNase H complex, even though slightly larger. The D132N ∆N RNase H complex, however, did look stronger on the native gels (see Figure 5.1 and Figure 5.2). As with the full length RNase H, the two mutants bind to D132N ∆N RNase H more weakly than the wild-type 32-B does, but the difference in affinity between the wild-type 32-B and the mutants is not as dramatic with the truncated RNase H. The N-terminus of RNase H, even though it was disordered in the crystal structure, is located near the binding site of the 32 protein. Therefore, when the N-terminus is missing, as in the case of D132N ∆N RNase H, it is possible that the binding of 32 protein is less restricted to a specific area, and that the mutated residues can be accommodated by sliding along RNase H towards where the N-terminus would be. Another point is that the I151D 32-B mutant appears to bind quite strongly to D132N ∆N RNase H and the 20/30 fork DNA. However, the His-Tag version of that protein was used in this titration, and the His-Tag might be playing a role in the increase in binding affinity that is observed.
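The comparison between the two RNase H forms can be made explicit by computing, for each mutant, the ratio of its Kd to the wild type Kd. The sketch below uses the values of Table 5.16 (mutants without His-Tag) and Figure 5.44; the dictionary and function names are hypothetical. As noted above, the I151D titration with D132N ∆N RNase H used the His-Tag version of the protein, so that ratio should be read with caution.

```python
# Kd values in µM: full length D132N RNase H (Table 5.16, no His-Tag)
full_length = {"Wild Type": 3.65, "I151D 32-B": 30.99, "I60D 32-B": 34.21}
# Kd values in µM: D132N ∆N RNase H (Figure 5.44)
delta_n = {"Wild Type": 6.70, "I151D 32-B": 10.51, "I60D 32-B": 21.0}


def fold_weakening(kd_by_variant):
    """Ratio of each mutant Kd to the wild type Kd within one data set."""
    wt = kd_by_variant["Wild Type"]
    return {name: kd / wt for name, kd in kd_by_variant.items() if name != "Wild Type"}


fl_ratios = fold_weakening(full_length)
dn_ratios = fold_weakening(delta_n)
for name in fl_ratios:
    print(f"{name}: {fl_ratios[name]:.1f}-fold (full length) "
          f"vs {dn_ratios[name]:.1f}-fold (∆N)")
```

With these values, the mutations weaken binding about 8.5- and 9.4-fold with the full length RNase H, but only about 1.6- and 3.1-fold with the ∆N truncation.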


Figure 5.44 – Fluorescence Anisotropy Titrations of the D132N ∆N RNase H + 32-B Mutants + Fork DNA Complex

[Panels: WT 32-B, I151D 32-B, I60D 32-B]

32-B          Kd (µM)
Wild Type     6.70 ± 0.05
I151D 32-B    10.51 ± 0.03
I60D 32-B     21.0 ± 0.1


5.5. D132N ∆N RNase H + 32 Protein Interaction

5.5.1. Complex Preparation

Similarly to the D132N ∆N RNase H + 32-B complex, the D132N ∆N

RNase H + 32 protein complex had to be prepared with concentrated stocks of

D132N ∆N RNase H and 32 protein, then dialyzed and finally concentrated (see

Section 5.4.1).

5.5.2. Protein-Protein Crystallization

The D132N ∆N RNase H + 32 protein complex was screened for crystal hits, according to Table 5.18 below. The screens were set up at 4 °C in order to help keep the proteins in solution, as D132N ∆N RNase H has a tendency to precipitate.

Table 5.18 – D132N ∆N RNase H + 32 Protein Crystal Screens

Crystal Screen            Concentration   Temperature   Drop (µL)
Crystal Screen I and II   0.1 mM          4 °C          1 + 1
Wizard I and II           0.1 mM          4 °C          1 + 1

A few crystal hits were obtained, and two of them are shown in Figure 5.45. All the hits were microcrystalline.

Expansion trays were then set up using these two conditions, but crystals large enough for diffraction studies could not be obtained.


Figure 5.45 – D132N ∆N RNase H + 32 Protein Crystals after Screening

• Crystal Screen I – Condition 27: 20 % 2-propanol, 0.1 M Na HEPES pH 7.5, 0.2 M Sodium Citrate
• Crystal Screen II – Condition 43: 50 % MPD, 0.1 M Tris HCl pH 8.5, 0.2 M Ammonium Phosphate monobasic

5.5.3. Protein-Protein-DNA Crystallization

The ternary complex (D132N ∆N RNase H + 32 protein + fork DNA) was also screened for crystals, as it was one of the DNA complexes identified by native gels (Section 5.2.2). The screens are listed in Table 5.19.

Table 5.19 – D132N ∆N RNase H + 32 Protein + DNA Crystal Screens

Crystal Screen            Concentration   Temperature   Drop (µL)
Crystal Screen I and II   0.1 mM          4 °C          1 + 1
Wizard I and II           0.1 mM          4 °C          1 + 1

Unfortunately, no crystal hits were obtained with these screens, and all the

wells were precipitated.

5.5.4. Fluorescence Anisotropy

Finally, fluorescence anisotropy titrations were performed. As was done with D132N RNase H and the 32 truncations, all four 32 protein truncations were used with D132N ∆N RNase H. These experiments were run identically to the previous ones, with the same labeled DNA substrate.

The results are shown in Figure 5.46 and Table 5.20. The strongest binding is observed with the 32-B truncation, with a Kd of 6.70 µM. Next are the full length 32 and the 32 core proteins, with Kd values of 12.3 and 12.4 µM, respectively. Finally, the 32-A truncation shows the weakest binding, with a Kd of 23 µM. It is interesting to see that the full length 32 protein is able to bind more strongly to RNase H when RNase H is missing its N-terminal domain. This is consistent with what was said earlier in Section 5.2.3 about domains moving upon DNA binding in order to allow the binding of one protein to the other. In this particular case, the absence of the RNase H N-terminus allows the full length 32 protein to bind, just as the 32-B truncation could bind quite strongly to the full length RNase H whereas the full length 32 could not.


Figure 5.46 – Fluorescence Anisotropy Titrations of the 32 Truncations + D132N ∆N RNase H + Fork DNA Complex

[Panels: 32-B, 32, 32 core, 32-A]

Table 5.20 – Summary of the Dissociation Constants from the FA Titrations

32 Truncation    Kd (µM)
32 Protein       12.31 ± 0.03
32-B             6.70 ± 0.05
32-A             22.98 ± 0.04
32 core          12.42 ± 0.05

[Bar chart of Kd (µM) for 32, 32-B, 32-A and 32 core]


5.6. D132N ∆N RNase H + 32 Core Protein Interaction

5.6.1. Complex Preparation

Similarly to the D132N ∆N RNase H + 32-B complex, the D132N ∆N

RNase H + 32 core protein complex had to be prepared with concentrated stocks of D132N ∆N RNase H and 32 core protein, then dialyzed and finally

concentrated (see Section 5.4.1).

5.6.2. Protein-Protein Crystallization

The D132N ∆N RNase H + 32 core domain complex was screened for crystal hits, as described in Table 5.21 below.

Table 5.21 – D132N ∆N RNase H + 32 Core Crystal Screens

Crystal Screen            Concentration   Temperature   Drop (µL)
Crystal Screen I and II   0.1 mM          4 °C          1 + 1
Wizard I and II           0.1 mM          4 °C          1 + 1

A few nice crystal hits were obtained from the screens. Some of them are shown in Figure 5.47. These crystallization conditions were then expanded on, but only the 30 - 40 % MPD + 0.1 M Tris HCl pH 8.5 + 0.2 M Ammonium Phosphate monobasic condition grew single crystals that were big enough to be flash-frozen and screened for X-Ray diffraction. These crystals are shown in Figure 5.48. However, only faint diffraction was observed, so further optimization of these crystals was needed.


Figure 5.47 – D132N ∆N RNase H + 32 Core Crystals after Screening

• Crystal Screen I – Condition 27: 20 % 2-propanol, 0.1 M Na HEPES pH 7.5, 0.2 M Sodium Citrate
• Crystal Screen II – Condition 43: 50 % MPD, 0.1 M Tris HCl pH 8.5, 0.2 M Ammonium Phosphate monobasic
• Wizard I – Condition 12: 20 % PEG 1000, 0.1 M Imidazole pH 8.0, 0.2 M Calcium Acetate
• Wizard I – Condition 46: 10 % PEG 8000, 0.1 M Imidazole pH 8.0, 0.2 M Calcium Acetate

Figure 5.48 – D132N ∆N RNase H + 32 Core Crystals after Optimization

35.4 % MPD, 0.1 M Tris HCl pH 8.5, 0.2 M Ammonium Phosphate
Complex at 0.47 mM, 4 °C, 2 µL + 2 µL hanging drop


In order to increase the size of the crystals, larger drops were set up using the sitting drop method. First, round micro-bridges from Nextal (now Qiagen) were used, but this resulted in the precipitation of the protein. Polypropylene micro-bridges from Hampton Research were used next, but only microcrystals mixed with precipitate could be grown with these. The sitting drop method or the plastic used in the micro-bridges might not have been appropriate for this particular complex, even though the same phenomenon was not observed with the D132N RNase H + 32-B protein complex. The hanging drop method was then used instead, but no large crystals could be obtained.

5.6.3. Fluorescence Anisotropy

The results from the D132N ∆N RNase H + 32 core + fork DNA fluorescence anisotropy titration were presented in Section 5.5.4. The dissociation constant was calculated to be 12.4 µM. As a comparison, the D132N RNase H + 32 core + fork DNA dissociation constant was around 13.1 µM, so the two affinities are fairly similar.


5.7. Conclusion

The work done to characterize bacteriophage T4 RNase H, the 32 protein

and their interaction was described in the previous three chapters.

The 32 protein, along with three truncations, was expressed and purified.

The 32-B truncation, missing the N-terminal cooperativity domain, was

characterized in further detail. Even though a crystal structure could not be

obtained, scattering studies showed that the 32-B protein in solution is in

equilibrium between a monomeric and a dimeric form. The shape of the A

domain was also obtained from the SAXS experiment. Two 32-B mutants were

cloned, expressed and purified, and were used in the RNase H + 32 protein

study.

The native RNase H, the D132N RNase H mutant and the D132N ∆N

RNase H N-terminal truncation were all expressed and purified.

The D132N RNase H + 32 protein complex was characterized using different techniques. The different 32 protein truncations were tested in complex with RNase H, and it appeared that the 32-B truncation formed the strongest

RNase H + 32 complex. Dissociation constants were obtained by fluorescence anisotropy titrations for RNase H complexed with each 32 truncation, in the presence of the fork DNA substrate. The 32-B + RNase H dissociation constant in the absence of DNA was also obtained by ITC, and was consistent with the one from the fluorescence anisotropy titration. The 32-B + RNase H complex being the most robust, it was extensively characterized. A crystal structure at

3.4 Å resolution was obtained, which showed that the two proteins interact with


each other through the C-terminus of RNase H and the core domain of 32.

However, the A and B domains of the 32 protein seem to have an influence on the interaction; this could not be seen from the crystal structure, as the B domain was missing and the A domain disordered. The crystal structure was validated by

the SAXS experiment. Modeling of a fork DNA substrate in the RNase H + 32

core model further validated the crystal structure. Finally, site-directed

mutagenesis studies provided another confirmation of the RNase H + 32 model.

An N-terminal truncation of RNase H was then used in complex with the

32 truncations, to further probe the role of the 32 domains. Complexes of the

D132N ∆N RNase H with the 32-B, 32 core domain and 32 full length were

obtained, but working with these was made difficult by the low solubility of the

RNase H double mutant. It appeared that the N-terminus of RNase H, even

though it is not directly involved in binding to 32 but rather to the 45 clamp protein, still played a role in the RNase H / 32 protein interaction. The A domain

and the cooperative binding of 32 again seemed to play a role; however, which of the two could not be determined from the biophysical or structural studies.

Since the N-terminal domain of RNase H and the A domain of the 32 protein were both disordered in the T4 D132N RNase H + 32-B crystal structure, it would be interesting to see if more information can be obtained from homologous proteins from a related bacteriophage. This new project was initiated and the work accomplished so far is described in the following chapter.

CHAPTER 6 - Bacteriophage Rb69

6.1. Introduction

Bacteriophage Rb69 is a T4-like phage, meaning it is a close relative of bacteriophage T4. The genomic map of Rb69 is shown in Figure 6.1 and, as can be seen, most of the Rb69 genomic DNA is orthologous to that of T4.

Highlighted in yellow are the genes that differ between Rb69 and T4; they do not include the genes for RNase H and the 32 protein (helix destabilizing protein), which are highlighted in red on the figure. Compared with the genomic map of bacteriophage T4, both genes are located in the same 150 region in the two organisms.

A common strategy in X-Ray crystallography is to study the same protein from closely related organisms. The protein is expected to have the same overall fold and structure, and the residues important for substrate binding and catalysis are most commonly conserved; however, surface residues can vary. These are the residues involved in the lattice contacts that form the crystal. Therefore, one protein might crystallize better than the other and give better diffracting crystals.

Since the bacteriophage T4 RNase H + 32-B protein complex did not produce crystals diffracting to high resolution, switching to the Rb69 RNase H + 32-B complex might yield better results in terms of X-Ray diffraction resolution and provide a better picture of how the complex comes together. Moreover, it could be interesting to study the Rb69 RNase H + 32-B complex and compare it to the results that were obtained from the T4 complex.

Figure 6.1 – Bacteriophage Rb69 Genomic Map

The map was obtained from http://www.phage.org/images/rb69small.png

The following sections describe the work that was done on bacteriophage

Rb69 RNase H, both the native protein and the D132N mutant. Additional work on the Rb69 32-B protein that was done by other lab members is also detailed.


6.2. Bacteriophage Rb69 Native RNase H

6.2.1. Introduction

The bacteriophage Rb69 RNase H is a 5’ to 3’ exonuclease. For more information on the RNase H protein in general, see Section 1.1.3. The amino-acid sequence was aligned with that of its T4 homologue, and the sequence alignment is presented in Figure 6.2. The two proteins are 75% identical, with an 85% positive score. This shows how closely related the Rb69 and T4 proteins are, which is why studying the Rb69 RNase H + 32-B complex could shed more light on some of the results obtained with the T4 proteins.

Figure 6.2 – Sequence Alignment of T4 RNase H and Rb69 RNase H

1 MDLEMMLDEDYKEGIALADFSNIALAAALNNFEDGDKITVPMVRHVVLNSIRKNVVMFRK 60 MDLEMMLDEDYKEGI L DFS IAL+ AL NF D +KI + MVRH++LNSI+ NV + 1 MDLEMMLDEDYKEGICLIDFSQIALSTALVNFPDKEKINLSMVRHLILNSIKFNVKKAKT 60

61 QGYTKFVLCMDNATSGYWRRDFAYYYKKNRKTDREASKWDWEGYFTALHQVVDEIKKYMP 120 GYTK VLC+DNA SGYWRRDFAYYYKKNR RE S WDWEGYF + H+V+DE+K YMP 61 LGYTKIVLCIDNAKSGYWRRDFAYYYKKNRGKAREESTWDWEGYFESSHKVIDELKAYMP 120

121 YVVMDIDKYEADDHIGVLTKYLSLAGHKVCIVASDGDFTQLHKYPNVKQWSPPQKKWVKI 180 Y+VMDIDKYEADDHI VL K SL GHK+ I++SDGDFTQLHKYPNVKQWSP KKWVKI 121 YIVMDIDKYEADDHIAVLVKKFSLEGHKILIISSDGDFTQLHKYPNVKQWSPMHKKWVKI 180

181 KNGSAEIDCMTKILKGDRKDGVASVRVRGDFWFTRVEGERTPSMKTTIIEALANDRSQAE 240 K+GSAEIDCMTKILKGD+KD VASV+VR DFWFTRVEGERTPSMKT+I+EA+ANDR QA+ 181 KSGSAEIDCMTKILKGDKKDNVASVKVRSDFWFTRVEGERTPSMKTSIVEAIANDREQAK 240

241 VLLSAEEYKRYQENLVLIDFDYIPDNIASTIIEYYNSYQPQPKGKIYSYFVKSGLSKLTS 300 VLL+ EY RY+ENLVLIDFDYIPDNIAS I+ YYNSY+ P+GKIYSYFVK+GLSKLT+ 241 VLLTESEYNRYKENLVLIDFDYIPDNIASNIVNYYNSYKLPPRGKIYSYFVKAGLSKLTN 300

301 VINEF 305 INEF 301 SINEF 305

The Rb69 RNase H sequence is shown in blue, the T4 RNase H is in green. Identities = 218/289 (75%), Positives = 246/289 (85%)


Rb69 RNase H is a 305 amino-acid protein, with a molecular weight of

35.4 kDa. Its calculated pI from the Expasy website is 6.97 and the extinction coefficient at 280 nm is 1.78 (Gill and von Hippel, 1989; Gasteiger et al., 2003). A summary of the Rb69 RNase H characteristics is given in Table 6.1.

Table 6.1 – Rb69 RNase H Characteristics

Amino-acids Molecular Weight Calculated pI ε

305 35.4 kDa (40.0 kDa with HisTag) 6.97 1.78
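The extinction coefficient quoted in Table 6.1 can be cross-checked against the Rb69 sequence printed in Figure 6.2. The sketch below is only an illustration under stated assumptions: it takes molar coefficients of 5500, 1490 and 125 M⁻¹ cm⁻¹ for Trp, Tyr and cystine (these values are assumed here and are not given in the text), it assumes that ε = 1.78 is the absorbance at 280 nm of a 1 mg/mL solution, and it uses the 35.4 kDa molecular weight from the table rather than recomputing it. The function name is hypothetical.

```python
# Rb69 RNase H sequence as printed in Figure 6.2 (305 residues).
RB69_RNASE_H = (
    "MDLEMMLDEDYKEGIALADFSNIALAAALNNFEDGDKITVPMVRHVVLNSIRKNVVMFRK"
    "QGYTKFVLCMDNATSGYWRRDFAYYYKKNRKTDREASKWDWEGYFTALHQVVDEIKKYMP"
    "YVVMDIDKYEADDHIGVLTKYLSLAGHKVCIVASDGDFTQLHKYPNVKQWSPPQKKWVKI"
    "KNGSAEIDCMTKILKGDRKDGVASVRVRGDFWFTRVEGERTPSMKTTIIEALANDRSQAE"
    "VLLSAEEYKRYQENLVLIDFDYIPDNIASTIIEYYNSYQPQPKGKIYSYFVKSGLSKLTS"
    "VINEF"
)

# Assumed molar coefficients at 280 nm (M^-1 cm^-1); not taken from the text.
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125


def absorbance_1mg_per_ml(sequence: str, mw_da: float) -> float:
    """Estimate A280 of a 1 mg/mL solution from the Trp, Tyr and Cys content."""
    n_trp = sequence.count("W")
    n_tyr = sequence.count("Y")
    n_cystine = sequence.count("C") // 2  # assumes all Cys pairs form cystines
    molar_eps = EPS_TRP * n_trp + EPS_TYR * n_tyr + EPS_CYSTINE * n_cystine
    return molar_eps / mw_da


# 35.4 kDa is the molecular weight reported in Table 6.1.
a280 = absorbance_1mg_per_ml(RB69_RNASE_H, 35400.0)
print(len(RB69_RNASE_H), round(a280, 2))
```

Under these assumptions the estimate lands close to the tabulated value, which suggests that ε in Table 6.1 is expressed per mg/mL rather than per mole.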

6.2.2. Initial Cloning and Expression

An initial construct of Rb69 native RNase H in pET101 was previously

made by Hillary H. Voss, an undergraduate student in the Mueser lab. The

primers were designed according to the Rb69 rnh gene sequence found in the

NCBI GenBank database, with a CACC overhang on the forward primer to allow

insertion in the pET101 vector using the TOPO-assisted directional cloning

Gateway technology. The Rb69 rnh gene encodes a 290 amino-acid protein, which was surprising as the T4 RNase H contains 305 amino-acids. Since T4 and Rb69 are closely related bacteriophages, it was expected that the two proteins would contain the same number of amino-acids and have high sequence similarity. The BLAST sequence alignment between the Rb69 RNase H and the T4 RNase H, in Figure 6.3, shows that Rb69 RNase H is missing 15 amino-acids at the C-terminus. The characteristics of that truncated Rb69 RNase H are shown in Table 6.2, along with the T4 Native RNase H characteristics for comparison.

Figure 6.3 – Sequence Alignment of T4 Native and Rb69 Truncated RNase H

1 MDLEMMLDEDYKEGIALADFSNIALAAALNNFEDGDKITVPMVRHVVLNSIRKNVVMFRK 60 MDLEMMLDEDYKEGI L DFS IAL+ AL NF D +KI + MVRH++LNSI+ NV + 1 MDLEMMLDEDYKEGICLIDFSQIALSTALVNFPDKEKINLSMVRHLILNSIKFNVKKAKT 60

61 QGYTKFVLCMDNATSGYWRRDFAYYYKKNRKTDREASKWDWEGYFTALHQVVDEIKKYMP 120 GYTK VLC+DNA SGYWRRDFAYYYKKNR RE S WDWEGYF + H+V+DE+K YMP 61 LGYTKIVLCIDNAKSGYWRRDFAYYYKKNRGKAREESTWDWEGYFESSHKVIDELKAYMP 120

121 YVVMDIDKYEADDHIGVLTKYLSLAGHKVCIVASDGDFTQLHKYPNVKQWSPPQKKWVKI 180 Y+VMDIDKYEADDHI VL K SL GHK+ I++SDGDFTQLHKYPNVKQWSP KKWVKI 121 YIVMDIDKYEADDHIAVLVKKFSLEGHKILIISSDGDFTQLHKYPNVKQWSPMHKKWVKI 180

181 KNGSAEIDCMTKILKGDRKDGVASVRVRGDFWFTRVEGERTPSMKTTIIEALANDRSQAE 240 K+GSAEIDCMTKILKGD+KD VASV+VR DFWFTRVEGERTPSMKT+I+EA+ANDR QA+ 181 KSGSAEIDCMTKILKGDKKDNVASVKVRSDFWFTRVEGERTPSMKTSIVEAIANDREQAK 240

241 VLLSAEEYKRYQENLVLIDFDYIPDNIASTIIEYYNSYQPQPKGKIYSYL 290 VLL+ EY RY+ENLVLIDFDYIPDNIAS I+ YYNSY+ P+GKIYSY ------241 VLLTESEYNRYKENLVLIDFDYIPDNIASNIVNYYNSYKLPPRGKIYSYFVKAGLSKLTN 300

----- 301 SINEF 305

The Rb69 RNase H sequence is shown in blue, the T4 RNase H is in green. Identities = 218/289 (75%), Positives = 246/289 (85%)

Table 6.2 – Truncated Rb69 RNase H vs. T4 RNase H Characteristics

Amino-Acids Molecular Weight pI ε

Rb69 RNase H 290 33.7 kDa 6.54 1.87

T4 RNase H 305 35.6 kDa 8.61 1.65

Nonetheless, the gene was amplified using PCR and was then inserted in the pET101 vector, as shown in Figure 6.4. The correct insertion was verified by

DNA sequencing at the Plant-Microbe Genomics Facility at Ohio State University.


Figure 6.4 – pET 101 Insert of the Rb69 rnh Gene

1- Rb69 rnh in pET 101
2- Supercoiled DNA ladder (5 kb and 2 kb bands labeled)

The correct sizes are as follows: • rnh 873 bp • pET101 5753 bp • Total 6623 bp

The pET101 + rnh construct runs between 6 and 7 kb, which indicates that rnh is inserted in the vector.

The pET101 + rnh vector was transformed into several expression cell lines, namely BL21 (DE3) pLysS, Rosetta Blue (DE3) and T7 Express lacIq. Protein expression was induced with 1 mM IPTG and checked on an SDS-PAGE gel, shown in Figure 6.5. Rb69 RNase H is expressed in both the Rosetta Blue (DE3) and T7 Express cell lines.

Figure 6.5 – Rb69 RNase H Expression

1- BL21 (DE3) pLysS expression – 0h sample
2- BL21 (DE3) pLysS expression – 3h sample
3- Rosetta Blue (DE3) expression – 0h sample
4- Rosetta Blue (DE3) expression – 3h sample
5- T7 Express expression – 0h sample
6- T7 Express expression – 3h sample
7- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0 and 21.5 kDa bands labeled)


A 1 L culture was grown for both cell lines, and the cells were lysed in the

presence of a low salt or high salt buffer. The low salt buffer was composed of

50 mM Tris HCl pH 8.0, 200 mM NH4Cl, 10 mM MgCl2, 5 % glycerol, 2 mM DTT

and 0.03% PEI. The high salt buffer had the same composition but contained 1 M

NH4Cl. The solubility of Rb69 RNase H was assessed by SDS-PAGE gel, which is shown in Figure 6.6. The protein was present in the pellet and therefore insoluble in all cases.

Figure 6.6 – Rb69 RNase H Cell Lysis

1- Rosetta Blue (DE3) low salt cell lysis – pellet
2- Rosetta Blue (DE3) low salt cell lysis – supernatant
3- Rosetta Blue (DE3) high salt cell lysis – pellet
4- Rosetta Blue (DE3) high salt cell lysis – supernatant
5- T7 Express low salt cell lysis – pellet
6- T7 Express low salt cell lysis – supernatant
7- T7 Express high salt cell lysis – pellet
8- T7 Express high salt cell lysis – supernatant
9- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa bands labeled)

The fact that Rb69 RNase H is insoluble is not very surprising, since the

C-terminus of the T4 RNase H is known to play an important role in solubility.

C-terminal deletions of the T4 RNase H were also found to have solubility issues

(Gangisetty et al., 2005). Thus, a closer look was taken at the nucleotide sequence of the C-terminus of Rb69 RNase H, which was aligned with the T4 nucleotide sequence. This is shown in Figure 6.7.a.


Figure 6.7 – Nucleotide and Amino-Acid C-terminus Sequence Alignment of T4 and Rb69 RNase H

6.7.a – Truncated Rb69 RNase H Sequence Alignment

Rb69 GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA TAC 283
Rb69 E Y Y N S Y Q P Q P K G K I Y S Y
T4 AAT TAC TAT AAT TCA TAT AAA TTA CCA CCG CGT GGC AAA ATT TAT TCA TAT 283
T4 N Y Y N S Y K L P P R G K I Y S Y

Rb69 TTG TAA AAT CCG GTC TTT CTA AAT TAA CAA GTG TAA TTA ATG AAT TCT GAG 290
Rb69 L stop N P V F L N stop Q V stop L M N S E
T4 TTT GTA AAA GCG GGT CTT TCT AAA TTA ACT AAT AGC ATT AAT GAA TTT TGA 290
T4 F V K A G L S K L T N S I N E F stop

6.7.b – Full Length Rb69 RNase H Sequence Alignment

Rb69 GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA TAC 283
Rb69 E Y Y N S Y Q P Q P K G K I Y S Y
T4 AAT TAC TAT AAT TCA TAT AAA TTA CCA CCG CGT GGC AAA ATT TAT TCA TAT 283
T4 N Y Y N S Y K L P P R G K I Y S Y

Rb69 TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA 290
Rb69 F V K S G L S K L T S V I N E F stop
T4 TTT GTA AAA GCG GGT CTT TCT AAA TTA ACT AAT AGC ATT AAT GAA TTT TGA 290
T4 F V K A G L S K L T N S I N E F stop

The Rb69 RNase H amino-acid sequence is shown in blue, the T4 RNase H is in green. The extra Thymidine base that was added in the full length Rb69 RNase H nucleotide sequence is shown in red.

It can be seen that the codons in the Rb69 RNase H sequence are out of frame with the ones in the T4 RNase H sequence, right after amino-acid 290 (Leucine 290). If a Thymidine base is added between the Thymidine and Guanosine bases in the codon coding for Leucine 290, as shown in Figure 6.7.b, the sequences fall back in frame with one another. Also, the new C-terminal amino-acid sequence for Rb69 RNase H aligns much better with that of T4, and a stop codon is created after Phenylalanine 305. This indicates that a sequencing error was made in the sequence deposited in the NCBI GenBank. The new sequence alignment for the full length Rb69 RNase H and the T4 RNase H is shown in Figure 6.7.b.
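The frame-shift argument can be reproduced by translating the C-terminal nucleotide sequences of Figure 6.7 before and after inserting the extra Thymidine. The sketch below is a minimal illustration using the standard genetic code; the variable and function names are hypothetical, and only the nucleotide strings come from the figure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate every complete codon of a DNA string, reading from base 1."""
    dna = dna.replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Sequences downstream of Tyr 289, copied from Figure 6.7.
GENBANK_RB69 = "TTG TAA AAT CCG GTC TTT CTA AAT TAA CAA GTG TAA TTA ATG AAT TCT GAG"
T4_TAIL = "TTT GTA AAA GCG GGT CTT TCT AAA TTA ACT AAT AGC ATT AAT GAA TTT TGA"

# Insert one T between the second T and the G of the Leu 290 codon (TTG).
raw = GENBANK_RB69.replace(" ", "")
corrected_rb69 = raw[:2] + "T" + raw[2:]

print(translate(GENBANK_RB69))    # GenBank reading frame
print(translate(corrected_rb69))  # reading frame after the insertion
print(translate(T4_TAIL))         # T4 reference
```

With the insertion, the Rb69 translation ends in the same ...INEF-stop motif as the T4 protein, which is the observation used above to argue for a sequencing error.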

6.2.3. Molecular Cloning

New primers had to be designed in order to clone the full length RNase H from Rb69. The reverse primer was designed according to the new C-terminal sequence that is described in the previous section. Also, the calculated pI of Rb69 full length RNase H (see Table 6.1) is very close to 7, and it might be challenging to purify the protein using regular ion-exchange chromatography.

Therefore, a new forward primer was designed so that the rnh gene can be inserted in the pDEST-C1 expression vector, which incorporates a 6xHis-Tag on the N-terminus of the protein. The forward primer contains a CACC overhang necessary for TOPO-assisted directional insertion in the entry vector, and a TEV protease cleavage site which will allow the His-Tag to be cleaved off after purification. The new set of primers is shown in Figure 6.8.

Figure 6.8 – Rb69 Full Length RNase H PCR Primers

Forward Primer
rnh 5’- ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TA… -3’
primer 5’- C ACC GAG AAC CTC TAC TTC CAA GGA ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TA -3’

32 bp from rnh, 57 bp total; 28% GC content; Tm = 55°C


Reverse Primer (Inverse Complement)
rnh 5’- …CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA -3’
primer 3’- GAA AGA TTT AAT TGT TCA CAT TAA TTA CTT AAG ACT -5’
primer 5’- TCA GAA TTC ATT AAT TAC ACT TGT TAA TTT AGA AAG -3’

36 bp total; 22% GC content; Tm = 55°C

The forward primer was aligned with the nucleotide sequence of the rnh gene. The start codon is highlighted in green. In red is the CACC overhang necessary for insertion in the pDEST-C1 vector, and the sequence coding for the TEV protease cleavage site is shown in blue. The reverse primer is shown as the inverse complement of rnh, where the stop codon is highlighted in red.
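The primer sequences of Figure 6.8 can be rebuilt from the gene ends with a few lines of code. The sketch below is a minimal illustration with hypothetical names; it only checks lengths, the inverse complement and the GC content, and does not attempt to reproduce the melting temperatures, since the formula used for them is not stated. The 28% GC content quoted for the forward primer appears to refer to its 32-base gene-specific portion, which is what is computed here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the inverse complement of a DNA sequence, written 5' to 3'."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    s = seq.replace(" ", "")
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


# Gene ends and adapter copied from Figure 6.8.
RNH_5PRIME = "ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TA"
RNH_3PRIME = "CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA"
CACC_OVERHANG = "CACC"
TEV_SITE = "GAG AAC CTC TAC TTC CAA GGA"  # codes for the TEV cleavage site

forward = (CACC_OVERHANG + TEV_SITE + RNH_5PRIME).replace(" ", "")
reverse = reverse_complement(RNH_3PRIME)

print(forward, len(forward), round(gc_percent(RNH_5PRIME)))
print(reverse, len(reverse), round(gc_percent(reverse)))
```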

The PCR reaction was performed as described in Table 6.3. The annealing temperature was chosen as the melting temperature + 2°C, so 57°C. It is recommended to use 1 minute/kb for the extension time, therefore 1 minute was used as the rnh gene is 918 bp long.

Table 6.3 – Rb69 RNase H PCR Reaction

PCR Reaction
Proof Start Buffer (10X) 5 µL
Primer solution (10 µM) 10 µL
dNTPs (10 mM) 2 µL
Proof Start Polymerase (2.5 U/µL) 2 µL
Rb69 genome 1 µL
Autoclaved water 35 µL

PCR Program
Activation 95 °C, 4 minutes
Denaturation 95 °C, 1 minute
Annealing 57 °C, 1 minute
Extension 68 °C, 1 minute
32 cycles

The PCR product was then run on a 1.5% agarose gel that is shown in Figure 6.9.a. Since the band corresponding to the rnh gene amplified by PCR runs at the correct size, it was excised from the gel and the gene was purified using the MiniElute® gel purification kit from Qiagen.

The rnh PCR product was first inserted in the pET101 expression vector.

The advantage of pET101 is that it allows direct insertion of the PCR product in the expression vector, therefore making the cloning process faster. However, pET101 constructs seldom yield expression of the protein of interest. The reaction setup is shown in Table 6.4, and the reaction was left to incubate at room temperature for 30 minutes.

Table 6.4 – Rb69 RNase H Insertion in pET101 Reaction

pET101 Insertion Reaction

Purified PCR product 2 µL

Salt Solution 1 µL

pET101 vector 1 µL

Autoclaved water 2 µL

The salt solution is provided with the pET101 cloning kit, and is composed of 1.2 M NaCl + 0.06 M MgCl2. The pET101 vector is already linearized and covalently linked to Topoisomerase I.

A 2 µL sample of the reaction was transformed into 50 µL of competent Top10 cells. After the transformation reaction, 50 µL and 100 µL of the cells were plated on two separate Petri dishes containing LB Agar + Ampicillin, as pET101 contains the AmpR Ampicillin resistance gene. The plates were then incubated at 37 °C until good-sized colonies could be seen. Several colonies


were picked from each plate and grown overnight in a liquid medium containing LB

and Ampicillin. The next day, the cells were pelleted down and the plasmid DNA

was isolated using the MiniPrep® kit from Qiagen. Glycerol stocks of all the colonies were also made at OD600 = 0.4. The plasmids obtained from the different

colonies were run on a 1% agarose gel that is shown in Figure 6.9.b. The

plasmids from colonies 3, 4 and 6 seem to have the right insert, despite the poor quality of the gel. All three plasmids were transformed into a competent expression

host to check for protein expression.

Because of the limited success of pET101 constructs in terms of protein

expression, the PCR purified gene was also inserted in the pENTR-D entry

vector. This work was done in parallel with the cloning in pET101. The pENTR-D

reaction setup is shown in Table 6.5. Once all the components were added to the

reaction, it was incubated at Room Temperature for 30 minutes.

Table 6.5 – Rb69 RNase H Insertion in pENTR-D Reaction

pENTR-D Insertion Reaction

Purified PCR product 2 µL

Salt Solution 1 µL

pENTR-D vector 1 µL

Autoclaved water 2 µL

The salt solution is provided with the pENTR-D cloning kit, and is composed of 1.2 M NaCl + 0.06 M MgCl2. The pENTR-D vector is already linearized and covalently linked to Topoisomerase I.


A 2 µL sample of the reaction was transformed into 50 µL of competent

Top10 cells. Then, 50 µL and 100 µL of the transformed cells were respectively

plated on two Petri dishes containing LB Agar + Kanamycin, as the pENTR-D

vector carries the Kanamycin resistance gene. The plates were incubated at

37 °C overnight. Several colonies were picked from each plate and grown overnight in a liquid medium containing LB and Kanamycin. The next day, the cells

were pelleted down and the plasmid DNA was isolated using the MiniPrep kit.

Glycerol stocks of all the colonies were also made at OD600 = 0.4. The plasmids

obtained from the different colonies were run on a 1% agarose gel that is shown

in Figure 6.9.c. All four plasmids have the correct size, indicating that the gene

was correctly inserted in the pENTR-D entry vector. However, four bands can be

seen on the gel for each plasmid, when only three are expected (supercoiled

DNA, linear DNA and nicked DNA). The presence of a fourth band is an

indication that another DNA species is present and might hinder any chance of

success in the following step, which is the insertion of the gene in the expression

vector. To avoid any difficulty, the supercoiled plasmid from colony 2, corresponding to the strongest, lowest band around 3500 bp, was gel purified with the MiniElute kit.

Next, the LR Clonase reaction was performed, in order to swap the rnh

gene from the entry vector into the pDEST-C1 expression vector. The reaction setup is described in Table 6.6. It should be noted that in this case, the LR

Clonase enzyme is not present in the destination vector and has to be added


separately. The 1X TE buffer is composed of 20 mM Tris HCl pH 8.0 + 1 mM

EDTA.

Table 6.6 – Rb69 RNase H Insertion in pDEST-C1 Reaction

LR Clonase Reaction

Gel Purified rnh in pENTR-D 1 µL

pDEST-C1 1 µL

LR Clonase Mix 2 µL

TE buffer (1X) 6 µL

The reaction was incubated at Room Temperature for 2 hours and

terminated by adding 1 µL of Proteinase K at 2 µg/µL followed by incubation for

10 minutes at 37°C. Proteinase K is a protease that degrades the LR Clonase enzyme and leaves the DNA untouched, thereby ending the recombination reaction. A 2 µL sample of the reaction was transformed into 100 µL of

competent DH5α cells. Top10 cells could not be used with the pDEST-C1 vector

since that particular cloning host is resistant to Streptomycin. The pDEST-C1

vector also carries the Streptomycin resistance gene, therefore no selection for

pDEST-C1 containing colonies can be achieved. Again, 50 µL and 100 µL of the

transformed DH5α cells were plated on two plates made with LB Agar +

Streptomycin. Colonies were picked on each plate, grown overnight in LB +

Streptomycin, and the plasmid was isolated from the cells with the MiniPrep kit.

Glycerol stocks were taken for all colonies at OD600 = 0.6. The different plasmids

were run on a 1% agarose gel, shown in Figure 6.9.d.


Figure 6.9 – Agarose Gels for Rb69 RNase H Cloning

a
1- Rb69 rnh PCR product
2- Rb69 rnh PCR product
3- Rb69 rnh PCR product
4- 100 bp ladder (1000 bp, 500 bp and 100 bp bands labeled)

The rnh gene is 918 bp long, the PCR product has the correct size.

b
1- Supercoiled DNA ladder (5 kb and 2 kb bands labeled)
2- Rb69 rnh insert in pET101 – colony 1
3- Rb69 rnh insert in pET101 – colony 2
4- Rb69 rnh insert in pET101 – colony 3
5- Rb69 rnh insert in pET101 – colony 4
6- Rb69 rnh insert in pET101 – colony 5
7- Rb69 rnh insert in pET101 – colony 6
8- Rb69 rnh insert in pET101 – colony 7

The correct sizes are as follows: • rnh 918 bp • pET101 5753 bp • Total 6671 bp

The pET101 + rnh construct should run between 6 and 7 kb. Colonies 3, 4 and 6 look like they have the right insert, but the quality of the gel makes it difficult to assess the actual size of the different plasmids.

c
1- Rb69 rnh insert in pENTR-D – colony 1
2- Rb69 rnh insert in pENTR-D – colony 2
3- Rb69 rnh insert in pENTR-D – colony 3
4- Rb69 rnh insert in pENTR-D – colony 4
5- Supercoiled DNA ladder (5 kb and 2 kb bands labeled)

The correct sizes are as follows: • rnh 918 bp • pENTR-D 2580 bp • Total 3498 bp

The pENTR-D + rnh constructs run between 3 and 4 kb, which indicates that rnh is inserted in the vector.

d
1- Supercoiled DNA ladder (5 kb and 2 kb bands labeled)
2- Rb69 rnh insert in pDEST-C1 – colony 1
3- Rb69 rnh insert in pDEST-C1 – colony 2
4- Rb69 rnh insert in pDEST-C1 – colony 3
5- Rb69 rnh insert in pDEST-C1 – colony 4
6- Rb69 rnh insert in pDEST-C1 – colony 5
7- Rb69 rnh insert in pDEST-C1 – colony 6
8- Rb69 rnh insert in pDEST-C1 – colony 7

The correct sizes are as follows: • rnh 918 bp • pDEST-C1 + 5334 bp • ccdB - 1600 bp • Total 4652 bp

The pDEST-C1 + rnh construct should run between 4 and 5 kb. Colonies 1 and 3 are contaminated with the pENTR-D + rnh construct (3500 bp), but all the other colonies show the right size.


Some contamination with the pENTR-D rnh insert can be seen on the plasmids isolated from colonies 1 and 3. These colonies were picked from the same plate (the one inoculated with 50 µL of transformed cloning host). The plasmid isolated from colony 2 has the correct size and shows the strongest band; it was therefore chosen to be transformed into an expression host.
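The expected construct sizes listed in the legends of Figure 6.9 follow from simple arithmetic on the vector and insert lengths. The short sketch below reproduces them as an illustration; the names are hypothetical, the vector sizes are the ones quoted in the legends, and the 1600 bp subtracted for pDEST-C1 is assumed to be the ccdB-containing segment that the rnh gene replaces.

```python
def gene_length_bp(n_residues: int) -> int:
    """Coding length in bp: three bases per residue plus one stop codon."""
    return 3 * (n_residues + 1)


RNH_BP = gene_length_bp(305)  # full length Rb69 rnh gene

# Vector sizes (bp) as quoted in the legends of Figure 6.9.
PET101_BP = 5753
PENTR_D_BP = 2580
PDEST_C1_BP = 5334
CCDB_SEGMENT_BP = 1600  # assumed to be replaced by the insert

expected_sizes = {
    "pET101 + rnh": PET101_BP + RNH_BP,
    "pENTR-D + rnh": PENTR_D_BP + RNH_BP,
    "pDEST-C1 + rnh": PDEST_C1_BP - CCDB_SEGMENT_BP + RNH_BP,
}

for construct, size in expected_sizes.items():
    print(f"{construct}: {size} bp")
```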

6.2.4. Protein Expression and Solubility

The three pET101 + rnh plasmids isolated from colonies 3, 4 and 6 were all transformed into competent T7 Express cells (1 µL of plasmid / 25 µL of cells). That specific expression cell line was chosen because protein expression is more tightly regulated due to the presence of the lacIq promoter. Leaky expression of the toxic RNase H is therefore greatly reduced, and better yields can be achieved. T7 Express cells are resistant to Tetracycline (see Table 2.1).

After the transformation, 50 µL and 100 µL of the reaction were separately

incubated overnight in 10 mL LB + Ampicillin + Tetracyclin. The next day, 500

µL of the overnight culture were again incubated in fresh LB + Ampicillin +

Tetracyclin, and the cells were allowed to grow until they reached OD600 = 0.6. At

that point, glycerol stocks were taken as well as SDS-PAGE samples, and

protein expression was induced by adding 1 mM IPTG. After three hours of

expression, samples were again taken. The SDS-PAGE for RNase H expression

is shown in Figure 6.10.a. It can be seen that a 55 kDa protein is strongly

expressed, while Rb69 RNase H is a 35.4 kDa protein. It was concluded that the

competent T7 express cells used for the transformation reaction were

contaminated with a plasmid containing a gene coding for a 55 kDa protein. This particular 55 kDa protein was not identified.

A new transformation reaction was performed with a fresh batch of competent T7 express cells. The SDS-PAGE gel for RNase H expression is shown in Figure 6.10.b. This time, weak expression around 28 kDa can be seen, but again no RNase H was produced.

Figure 6.10 – Rb69 RNase H (pET101) Expression

a
1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa bands labeled)
2- Rb69 RNase H expression (plasmid from colony 3) – 0h sample
3- Rb69 RNase H expression (plasmid from colony 3) – 3h sample
4- Rb69 RNase H expression (plasmid from colony 4) – 0h sample
5- Rb69 RNase H expression (plasmid from colony 4) – 3h sample
6- Rb69 RNase H expression (plasmid from colony 6) – 0h sample
7- Rb69 RNase H expression (plasmid from colony 6) – 3h sample

Three hours after IPTG induction, a 55 kDa protein is expressed. Rb69 RNase H should run around 35 kDa, so it is not RNase H that is being produced.


b
1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa bands labeled)
2- Rb69 RNase H expression (plasmid from colony 3) – 0h sample
3- Rb69 RNase H expression (plasmid from colony 3) – 3h sample
4- Rb69 RNase H expression (plasmid from colony 4) – 0h sample
5- Rb69 RNase H expression (plasmid from colony 4) – 3h sample
6- Rb69 RNase H expression (plasmid from colony 6) – 0h sample
7- Rb69 RNase H expression (plasmid from colony 6) – 3h sample

Weak expression of a 28 kDa protein is occurring, but this is too small a protein to be RNase H. No band is seen at 35 kDa.

Since expression of Rb69 RNase H from the pET101 plasmid was

unsuccessful, it was decided to focus on the pDEST-C1 plasmid instead and test

it for protein expression.

As previously mentioned, the pDEST-C1 + rnh plasmid isolated from colony 2 (Figure 6.9.d) was chosen for expression, and 0.7 µL of it was transformed into competent T7 Express cells. A 100 µL sample of the transformed T7 Express cells was directly incubated in 10 mL of LB + Streptomycin + Tetracycline and allowed to grow overnight. The overnight cell culture was then used the next day to inoculate fresh LB + Streptomycin + Tetracycline. The cells were grown until OD600 = 0.6, when a glycerol stock was taken and protein expression was induced by adding 1 mM IPTG. 0 h and 3 h samples were taken and run on an SDS-PAGE gel, shown in Figure 6.11.a.

Once protein expression was assessed, a 1 L culture was grown using the same protocol so that a decently sized cell pellet could be obtained for cell lysis.

A 1 g pellet was obtained and lysed in the following buffer: 50 mM Tris HCl pH 7.5, 200 mM NH4Cl, 10 mM MgCl2, 5% glycerol, 2 mM DTT and 0.03% PEI.

The cell lysis protocol is described in Section 2.3. The pellet and supernatant samples were run on an SDS-PAGE gel shown in Figure 6.11.b. Some Rb69

RNase H is found insoluble in the pellet, but a good fraction of the protein is also soluble and found in the supernatant. The same phenomenon was observed for

T4 RNase H. The protein was declared soluble, and a 6 L culture was prepared for further purification studies.

Figure 6.11 – Rb69 RNase H (pDEST-C1) Expression and Cell Lysis

a
1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa bands labeled)
2- Rb69 RNase H expression – 0h sample
3- Rb69 RNase H expression – 3h sample


b
1- Molecular Weight Marker (66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa bands labeled)
2- Rb69 RNase H cell lysis – pellet
3- Rb69 RNase H cell lysis – supernatant

Once protein expression and solubility were assessed, the pDEST-C1 +

rnh plasmid chosen for the expression studies was sent to the Plant-Microbe

Genomics Facility at Ohio State University for DNA sequencing. The results are presented in Appendix 3, and show that the Rb69 rnh gene was correctly amplified and inserted in the pDEST-C1 vector.

6.2.5. Protein Purification

To purify Rb69 RNase H, it was decided to take advantage of the presence of an N-terminal 6xHisTag and to purify the protein using a cobalt affinity (Talon) column. However, the protein first had to be roughly isolated from the lysate by ion-exchange chromatography.

The first purification scheme that was designed was the following:

RNase H was first loaded on the low-resolution cation-exchange column SP


Sepharose. The elution from the SP Sepharose was then further purified with the

Talon column. Since the pI of Rb69 RNase H calculated from Expasy (Gill and

von Hippel, 1989) is very close to 7, as is shown in Table 6.1, the pH of the lysis

buffer had to be lowered below 7 so that the protein would be in a cationic form.

Thus, the cell pellet was first lysed with the buffer shown in Table 6.7.

Table 6.7 – Lysis and HPLC buffers for Rb69 RNase H purification scheme 1

Lysis
Buffer: 50 mM bis-Tris HCl pH 6.5, 200 mM NH4Cl, 10 mM MgCl2, 5% glycerol, 2 mM DTT, 0.03% PEI
Conductivity: ~ 18 mS/cm
Elution: /

Ion Exchange (SP Sepharose)
Buffers: 25 mM bis-Tris HCl pH 6.5, 100 mM NH4Cl, 10 mM MgCl2, 0 - 1 M NaCl
Conductivity: Buffer A ~ 16 mS/cm, Buffer B ~ 96 mS/cm
Elution: 100% A → 100% B

Metal Affinity (Talon)
Buffers: 50 mM Na Phosphate pH 8.0, 300 mM NaCl, 10 mM MgCl2, + 7.5 mM Imidazole (wash) or 150 mM Imidazole (elution)
Conductivity: ~ 35 mS/cm
Elution: 100% A → 50% B

The results of the protein expression and cell lysis with that particular

buffer are presented in Figure 6.12.a. The majority of the protein was found in the

lysis pellet and very little in the lysate, which indicates that the protein is much

less soluble at pH 6.5, as compared to the cell lysis that was done at pH 7.5

(Figure 6.11.b) where a good fraction of the protein was found in the lysate.

Nonetheless, the soluble fraction was loaded on the SP Sepharose. The buffers used for that step are also shown in Table 6.7; the chromatogram and SDS-PAGE gel from the run are presented in Figure 6.12.b. RNase H bound to the SP resin and was eluted in fairly pure form for a first purification step. The fractions


containing the protein were pooled to be run on the metal affinity Talon column.

They were first dialyzed in the Talon equilibration buffer that contains no

imidazole. The different buffers used with the Talon column are shown in Table

6.7. It should be noted that magnesium chloride was added to the buffers to

increase the solubility of RNase H. As a result, magnesium phosphate tribasic

Mg3(PO4)2 precipitated out of solution and made that purification step difficult.

The protein eluted from the SP Sepharose was loaded in several batches on the

Talon column and eluted with 75 mM imidazole. The chromatogram and SDS-

PAGE gel from the first batch are shown in Figure 6.12.c. The fractions containing the protein were again pooled and concentrated. The pH of the solution also had to be lowered to 6.0 to avoid precipitation of magnesium phosphate.
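The 75 mM imidazole elution can be related to the Talon gradient in Table 6.7 by a simple linear mixing calculation. The sketch below assumes that buffer A is the imidazole-free equilibration buffer and that buffer B contains 150 mM imidazole; the function name is illustrative only.

```python
def gradient_concentration(conc_a_mM, conc_b_mM, percent_b):
    """Solute concentration when buffers A and B are mixed linearly."""
    fraction_b = percent_b / 100.0
    return conc_a_mM * (1.0 - fraction_b) + conc_b_mM * fraction_b

# Assumed: buffer A = 0 mM imidazole, buffer B = 150 mM imidazole.
imidazole_at_end = gradient_concentration(0.0, 150.0, 50.0)
print(f"Imidazole at 50% B: {imidazole_at_end:.1f} mM")
```

Under these assumptions, the end point of the 100% A → 50% B gradient corresponds to the imidazole concentration quoted for the elution.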

Figure 6.12 – Rb69 RNase H purification scheme 1

Figure 6.12.a – Cell lysis

[SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa]

1- Molecular Weight Marker
2- Rb69 RNase H expression – 0h sample
3- Rb69 RNase H expression – 3h sample
4- Rb69 RNase H cell lysis – pellet
5- Rb69 RNase H cell lysis – supernatant

Figure 6.12.b – SP Sepharose

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa]

1- Molecular Weight Marker
2- SP Sepharose – F. 7
3- SP Sepharose – F. 10
4- SP Sepharose – F. 13
5- SP Sepharose – Flow Through

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Here, fractions 11 to 18 contained RNase H and were pooled to be run on the Talon column. The pooled fractions are indicated with a red line on the chromatogram.

Figure 6.12.c – Talon

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa]

1- Talon – Load
2- Talon – F. 18
3- Talon – F. 32
4- Talon – Flow Through
5- Molecular Weight Marker

Here, fractions 14 to 40 contained RNase H and were pooled for concentration. A large portion of the protein did not bind to the resin and was found in the Flow Through fraction.

The first purification scheme yielded less than a milligram of pure protein (from 11.6 g of cells), the low yield being mostly due to the inability to extract RNase H in the lysis buffer at pH 6.5.

As more protein needed to be purified, a second purification scheme was designed. This time, the lysis buffer had a higher pH of 8.0 to improve the extraction efficiency, and the lysate was first loaded on a low-resolution anion-exchange column, the Q Sepharose, before being run on the Talon column. The new set of buffers is described in Table 6.8.

Table 6.8 – Lysis and HPLC buffers for Rb69 RNase H purification scheme 2

               Lysis                       Ion Exchange (Q Sepharose)    Metal Affinity (Talon)
Buffers        50 mM Tris HCl pH 8.0       25 mM bis-Tris HCl pH 8.0     50 mM Na Phosphate pH 8.0
               200 mM NH4Cl                100 mM NH4Cl                  300 mM NaCl
               10 mM MgCl2                 10 mM MgCl2                   7.5 mM Imidazole (wash)
               5% glycerol                 0 - 1 M NaCl                  or 150 mM Imidazole (elution)
               2 mM DTT
               0.03% PEI
Conductivity   ~ 18 mS/cm                  Buffer A: ~ 16 mS/cm          ~ 30 mS/cm
                                           Buffer B: ~ 98 mS/cm
Elution        /                           100% A → 100% B               100% A → 50% B

After cell lysis, a larger amount of RNase H was found in the supernatant than in Scheme 1. The SDS-PAGE gel for the lysis is shown in Figure 6.13.a. The soluble fraction was then loaded on the low-resolution anion-exchange Q Sepharose column. The buffers for this step are presented in Table 6.8, and the chromatogram and SDS-PAGE gel in Figure 6.13.b. RNase H did not bind very tightly to the resin, and most of it was eluted while the column was being rinsed with buffer A. The rest of the protein was eluted by the salt gradient at the beginning of the run. All the fractions containing RNase H were pooled and dialyzed overnight in the Talon equilibration buffer. After dialysis, precipitate was found in the protein solution. This could again have been magnesium phosphate, as the Q buffer contains Mg2+ and the Talon buffer contains phosphate ions, but the precipitate did not dissolve when the pH of the solution was decreased. The precipitate was then run on an SDS-PAGE gel and identified as RNase H (see Figure 6.13.c, lane 1). It was pelleted and resuspended in the Q buffer A, then kept at 4 °C to allow the protein to go back into solution, but none of the precipitate redissolved. The supernatant from the dialysis step was further purified with the Talon column, using the buffers described in Table 6.8. As in Scheme 1, RNase H was loaded onto the column in several batches. The chromatogram and SDS-PAGE gel for one of the Talon runs are shown in Figure 6.13.c. It should be noted that some of the protein was eluted when washing the column with the rinse buffer containing 7.5 mM imidazole, as can be seen in the rinse fraction in lane 6. The fractions containing RNase H were pooled and concentrated.

Figure 6.13 – Rb69 Full Length RNase H purification scheme 2

Figure 6.13.a – Cell lysis

[SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Molecular Weight Marker
2- Rb69 RNase H cell lysis – pellet
3- Rb69 RNase H cell lysis – supernatant

Figure 6.13.b – Q Sepharose

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Q Sepharose – Load
2- Q Sepharose – F. 11
3- Q Sepharose – F. 13
4- Q Sepharose – F. 15
5- Q Sepharose – F. 18
6- Q Sepharose – F. 21
7- Q Sepharose – F. 23
8- Q Sepharose – F. 25
9- Q Sepharose – F. 27
10- Q Sepharose – Rinse fraction
11- Q Sepharose – Flow Through
12- Molecular Weight Marker

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Here, fractions 1 to 16 as well as the rinse fraction contained RNase H and were pooled to be run on the Talon column. The pooled fractions are indicated with a red line on the chromatogram.

Figure 6.13.c – Talon

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa]

1- Precipitate from the dialysis
2- Talon – Load
3- Talon – F. 21
4- Talon – F. 28
5- Talon – Flow Through
6- Talon – Rinse fraction (concentrated)
7- Talon – 150 mM Imidazole Elution
8- Molecular Weight Marker

Here, fractions 18 to 40 contained RNase H and were pooled for concentration. A large portion of the protein did not bind to the resin and was found in the Rinse fraction. This particular fraction was concentrated before being loaded on the gel, which is why its band appears more intense.

The final yield for this purification scheme was around 2 mg of pure protein from 6.8 g of cells. Most of the protein was lost to precipitation and could not be brought back into solution.

A different approach was used for the third purification scheme. Since RNase H did not seem to bind well to the metal affinity column, possibly because the His-Tag is not accessible for binding to Co2+, the protein was to be purified using ion-exchange chromatography only. The different buffers for this scheme are detailed in Table 6.9.

Table 6.9 – Lysis and HPLC buffers for Rb69 RNase H purification scheme 3

               Lysis                    Ion Exchange (SP Sepharose and POROS HS)   Size Exclusion (Superdex 75)
Buffers        50 mM Tris HCl pH 7.5    25 mM bis-Tris HCl pH 6.5                  25 mM bis-Tris HCl pH 6.5
               200 mM NH4Cl             100 mM NH4Cl                               150 mM NH4Cl
               10 mM MgCl2              10 mM MgCl2                                10 mM MgCl2
               5% glycerol              0 - 1 M NaCl
               2 mM DTT
               0.03% PEI
Conductivity   ~ 18 mS/cm               Buffer A: ~ 16 mS/cm                       /
                                        Buffer B: ~ 96 mS/cm
Elution        /                        100% A → 100% B                            /

The cell pellet was first lysed at pH 7.5 to yield soluble protein, using the lysis buffer described in Table 6.9. The soluble portion, containing RNase H, was run on an SDS-PAGE gel, which is shown in Figure 6.14.a, lane 2. This supernatant was then loaded on the same low-resolution cation-exchange column that was used in Scheme 1, the SP Sepharose, because RNase H seemed to bind more strongly to the cation-exchange resin than to the anion-exchange one. The chromatogram and SDS-PAGE gel corresponding to the SP Sepharose run are presented in Figure 6.14.a. As in Scheme 1, the protein was eluted fairly pure from the SP Sepharose, and the fractions containing RNase H were pooled and loaded on the next column, the high-resolution cation-exchange Poros HS. The protein eluted from the Poros HS was still contaminated with higher molecular weight impurities, as can be seen in Figure 6.14.b. Therefore, the protein was further purified on a size exclusion column. The Superdex 75 was chosen because proteins of 75 kDa and larger elute in its void fraction, a limit significantly larger than RNase H (around 40 kDa with the His-Tag). Thus, RNase H should be eluted during the run, and the impurities either earlier in the run or in the void fraction. Indeed, RNase H was purified away from the impurities (see Figure 6.14.c) and was then concentrated.

Figure 6.14 – Rb69 Full Length RNase H purification scheme 3

Figure 6.14.a – SP Sepharose

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Molecular Weight Marker
2- SP Sepharose – Load
3- SP Sepharose – F. 10
4- SP Sepharose – F. 14
5- SP Sepharose – Flow Through

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Here, fractions 13 to 19 contained RNase H and were pooled to be run on the Poros HS. The pooled fractions are indicated with a red line on the chromatogram.

Figure 6.14.b – Poros HS

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Poros HS – Load
2- Poros HS – F. 12
3- Poros HS – Flow Through
4- Molecular Weight Marker

Here, RNase H is found in the main peak from fraction 6 to 16, and is almost pure. These fractions were pooled and run on a size exclusion column.

Figure 6.14.c – Superdex 75

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Molecular Weight Marker
2- Superdex 75 – Load
3- Superdex 75 – F. 2
4- Superdex 75 – F. 10
5- Superdex 75 – F. 12
6- Superdex 75 – F. 20

The high molecular weight impurities are found in fraction 10 while RNase H is in fraction 12. A low molecular weight impurity was also eluted from the column in fraction 20. Fractions 11 to 14 were pooled and concentrated.

The final yield for this purification scheme was close to 5 mg from a starting mass of 7 g of cells. A large amount of protein was lost because of problems with the fraction collector, and the expected yield should be around 15 to 20 mg. This is a considerable improvement over the previous purification schemes, and Scheme 3 was chosen as the preferred purification protocol for Rb69 RNase H.
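To compare the three schemes on an equal footing, the reported yields can be normalized to the mass of cells used. The short sketch below uses only the figures quoted in the text; because Scheme 1 gave "less than a milligram", 1.0 mg is taken as an upper bound, which is an assumption made for illustration.

```python
# (scheme, pure protein in mg, wet cell mass in g) as reported in the text.
# Scheme 1 is an upper bound: the text only states "less than a milligram".
runs = [
    ("Scheme 1 (upper bound)", 1.0, 11.6),
    ("Scheme 2", 2.0, 6.8),
    ("Scheme 3", 5.0, 7.0),
]

def yield_per_gram(protein_mg, cells_g):
    """Purified protein per gram of cells, in mg/g."""
    return protein_mg / cells_g

for name, protein_mg, cells_g in runs:
    print(f"{name}: {yield_per_gram(protein_mg, cells_g):.2f} mg protein per g of cells")
```

On this basis, Scheme 3 recovers roughly 0.7 mg of protein per gram of cells, compared with about 0.3 mg/g for Scheme 2 and under 0.1 mg/g for Scheme 1.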

However, these yields are still rather low, most likely because native RNase H is toxic to the cells once it is overexpressed. Following the strategy adopted for the bacteriophage T4 RNase H, an inactive mutant, D132N RNase H, was cloned. This mutant has greatly reduced nuclease activity, so larger amounts of protein can be obtained. Moreover, an inactive RNase H would have to be used for DNA binding studies, providing another reason for cloning this mutant.

6.3. Bacteriophage Rb69 D132N RNase H

6.3.1. Introduction

The D132N mutant of the bacteriophage T4 RNase H is inactive, as Asp 132 is involved in Mg2+ binding in the active site. Mutating this aspartate into an asparagine prevents metal binding in site 1 (see Section 1.1.3) and therefore removes nuclease activity. The same mutant can be made for the Rb69 RNase H. A sequence alignment (see Figure 6.15) shows that residue 132 in Rb69 RNase H is the same aspartate that is involved in metal binding in T4 RNase H; thus the D132N mutant of Rb69 RNase H should also be inactive.

Figure 6.15 – Sequence Alignment of T4 RNase H and Rb69 RNase H

  1 MDLEMMLDEDYKEGIALADFSNIALAAALNNFEDGDKITVPMVRHVVLNSIRKNVVMFRK  60
    MDLEMMLDEDYKEGI L DFS IAL+ AL NF D +KI + MVRH++LNSI+ NV   +
  1 MDLEMMLDEDYKEGICLIDFSQIALSTALVNFPDKEKINLSMVRHLILNSIKFNVKKAKT  60

 61 QGYTKFVLCMDNATSGYWRRDFAYYYKKNRKTDREASKWDWEGYFTALHQVVDEIKKYMP 120
     GYTK VLC+DNA SGYWRRDFAYYYKKNR   RE S WDWEGYF + H+V+DE+K YMP
 61 LGYTKIVLCIDNAKSGYWRRDFAYYYKKNRGKAREESTWDWEGYFESSHKVIDELKAYMP 120

121 YVVMDIDKYEADDHIGVLTKYLSLAGHKVCIVASDGDFTQLHKYPNVKQWSPPQKKWVKI 180
    Y+VMDIDKYEADDHI VL K  SL GHK+ I++SDGDFTQLHKYPNVKQWSP  KKWVKI
121 YIVMDIDKYEADDHIAVLVKKFSLEGHKILIISSDGDFTQLHKYPNVKQWSPMHKKWVKI 180

181 KNGSAEIDCMTKILKGDRKDGVASVRVRGDFWFTRVEGERTPSMKTTIIEALANDRSQAE 240
    K+GSAEIDCMTKILKGD+KD VASV+VR DFWFTRVEGERTPSMKT+I+EA+ANDR QA+
181 KSGSAEIDCMTKILKGDKKDNVASVKVRSDFWFTRVEGERTPSMKTSIVEAIANDREQAK 240

241 VLLSAEEYKRYQENLVLIDFDYIPDNIASTIIEYYNSYQPQPKGKIYSYFVKSGLSKLTS 300
    VLL+  EY RY+ENLVLIDFDYIPDNIAS I+ YYNSY+  P+GKIYSYFVK+GLSKLT+
241 VLLTESEYNRYKENLVLIDFDYIPDNIASNIVNYYNSYKLPPRGKIYSYFVKAGLSKLTN 300

301 VINEF 305
     INEF
301 SINEF 305

The Rb69 RNase H sequence (upper row of each block) is shown in blue, the T4 RNase H (lower row) in green.
Identities = 218/289 (75%), Positives = 246/289 (85%)
The Aspartate mutated to an Asparagine in the D132N mutant is highlighted in red.
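The identity statistic quoted for the alignment is simply the fraction of aligned positions carrying the same residue. As a minimal illustration, the sketch below applies that definition to the 20-residue window of Figure 6.15 that starts at position 121 and contains residue 132; the window choice and the variable names are assumptions made for the example, and the result applies to this window only, not to the full alignment.

```python
def percent_identity(seq_a, seq_b):
    """Count identical positions in two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, 100.0 * matches / len(seq_a)

# Residues 121-140 of the two rows of the alignment in Figure 6.15.
window_start = 121
upper_row = "YVVMDIDKYEADDHIGVLTK"
lower_row = "YIVMDIDKYEADDHIAVLVK"

matches, pct = percent_identity(upper_row, lower_row)
residue_132 = (upper_row[132 - window_start], lower_row[132 - window_start])

print(f"{matches}/{len(upper_row)} identical positions ({pct:.0f}%)")
print("Residue 132 in each sequence:", residue_132)
```

In this window both sequences carry an aspartate at position 132, which is the residue targeted by the D132N mutation.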

The characteristics of Rb69 D132N RNase H are shown in Table 6.10. They are similar to those of the native RNase H, except for the calculated pI, which is significantly higher.

Table 6.10 – Rb69 D132N RNase H characteristics

Amino-acids 305

Molecular Weight 35.4 kDa (40.0 kDa with HisTag)

Calculated pI 7.62

ε 1.78

6.3.2. Molecular Cloning

The forward and reverse primers for the site-directed mutagenesis reaction were designed according to the QuikChange® protocol (see Section 2.1.3); they are shown in Figure 6.16.

Figure 6.16 – Site-Directed Mutagenesis PCR primers for Rb69 D132N RNase H Cloning

Forward Primer

rnh     5’– ATT GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA –3’
             I   D   K   Y   E   A   D   D   H   I   G   V   L
Primer  5’–     GAC AAA TAC GAA GCG AAT GAC CAT ATC GGC G       –3’
                 D   K   Y   E   A   N   D   H   I   G

5’– GAC AAA TAC GAA GCG AAT GAC CAT ATC GGC G –3’

Reverse Primer

rnh     3’– TAA CTG TTT ATG CTT CGC CTA CTG GTA TAG CCG CAT AAT –5’
Primer  3’–     CTG TTT ATG CTT CGC TTA CTG GTA TAG CCG C       –5’

5’– C GCC GAT ATG GTC ATT CGC TTC GTA TTT GTC –3’

31 bp total

Tm = 76.3 °C

The forward primer for the site-directed mutagenesis PCR reaction is shown at the top, aligned with the nucleotide sequence of Rb69 RNase H. A guanosine in the GAT codon coding for D132 was mutated into an adenosine, the new AAT codon now coding for N132. The mutated nucleotide is shown in red. Highlighted in yellow is the part of the primer that anneals with the original rnh gene. The reverse primer is shown at the bottom, highlighted in the same manner.
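The primer design in Figure 6.16 can be checked with a few lines of code. The sketch below confirms that the two primers are reverse complements, locates the single base change, translates the affected codons, and estimates the melting temperature. The Tm expression used, Tm = 81.5 + 0.41(%GC) − 675/N − %mismatch with unrounded percentages, is assumed here to be the formula from the QuikChange manual; the function names and the reduced codon table are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons that occur in this primer region are listed.
CODON_TABLE = {
    "GAC": "D", "AAA": "K", "TAC": "Y", "GAA": "E", "GCG": "A",
    "GAT": "D", "AAT": "N", "CAT": "H", "ATC": "I", "GGC": "G",
}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def quikchange_tm(primer, n_mismatches):
    """Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch (assumed QuikChange formula)."""
    n = len(primer)
    gc_percent = 100.0 * (primer.count("G") + primer.count("C")) / n
    mismatch_percent = 100.0 * n_mismatches / n
    return 81.5 + 0.41 * gc_percent - 675.0 / n - mismatch_percent

# Sequences copied from Figure 6.16, with spaces removed.
wild_type = "GACAAATACGAAGCGGATGACCATATCGGCG"  # rnh region covered by the primer
forward = "GACAAATACGAAGCGAATGACCATATCGGCG"
reverse = "CGCCGATATGGTCATTCGCTTCGTATTTGTC"

changes = [(i, w, m) for i, (w, m) in enumerate(zip(wild_type, forward)) if w != m]
tm = quikchange_tm(forward, n_mismatches=len(changes))

print("Primers are reverse complements:", reverse_complement(forward) == reverse)
print("Base changes (0-based index, wild type, mutant):", changes)
print("Wild type:", translate(wild_type), "| mutant:", translate(forward))
print(f"Length: {len(forward)} nt, estimated Tm: {tm:.1f} °C")
```

With these inputs, the single G to A change converts the GAT (Asp) codon into AAT (Asn), and the estimated Tm agrees with the value given in the figure.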

The PCR reaction was performed as described in Table 6.11. The template used in the reaction was the rnh + pDEST-C1 plasmid from colony 2 (see Section 6.2.3). The annealing temperature of 55 °C was chosen according to the QuikChange® protocol.

Table 6.11 – Site-Directed Mutagenesis PCR Reaction for D132N RNase H

PCR Reaction PCR Program

KOD buffer (10X) 5 µL Activation 95 °C, 2 minutes

Forward primer (2.5 µM) 6 µL Denaturation 95 °C, 20 seconds

Reverse primer (2.5 µM) 6 µL Annealing 55 °C, 10 seconds

dNTPs (2 mM) 5 µL Extension 70 °C, 2 minutes

MgSO4 (25 mM) 5 µL 20 cycles

KOD Polymerase (1 U/µL) 1 µL Final extension 70 °C, 20 minutes

Template 1 µL

Autoclaved water 21 µL

Upon reception, the primers were first dissolved and diluted to 250 µM with 1X TE buffer (20 mM Tris HCl pH 8.0 + 1 mM EDTA), then a 2.5 µM stock was made for each primer to be used in the PCR reaction.
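The final concentration of each component in the reaction follows from its stock concentration and its share of the total volume. The sketch below works this out from the values in Table 6.11; the dictionary layout and variable names are illustrative only.

```python
# Stock concentration, unit and volume (µL) for each component of Table 6.11.
reaction = {
    "KOD buffer": (10.0, "X", 5.0),
    "Forward primer": (2.5, "µM", 6.0),
    "Reverse primer": (2.5, "µM", 6.0),
    "dNTPs": (2.0, "mM", 5.0),
    "MgSO4": (25.0, "mM", 5.0),
    "KOD Polymerase": (1.0, "U/µL", 1.0),
}
# Components listed in the table without a stock concentration.
other_volumes_uL = {"Template": 1.0, "Autoclaved water": 21.0}

total_uL = sum(vol for _, _, vol in reaction.values()) + sum(other_volumes_uL.values())

# Final concentration = stock concentration x (component volume / total volume).
final = {name: stock * vol / total_uL for name, (stock, _, vol) in reaction.items()}

for name, (_, unit, _) in reaction.items():
    print(f"{name}: {final[name]:.3g} {unit}")
print(f"Total volume: {total_uL:.0f} µL")
```

The same dilution arithmetic applies to the primer stocks: going from 250 µM to 2.5 µM is a 100-fold dilution.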

After the reaction, 1 µL of the restriction enzyme DpnI at 20 U/µL was added to the PCR product in order to digest the original, methylated pDEST-C1 + rnh plasmid and leave only the pDEST-C1 + D132N rnh plasmid in solution. The mixture was incubated at 37 °C for an hour. Next, the digested PCR product was run on a 1% agarose gel, presented in Figure 6.17.a. It should be noted that after the PCR reaction, the amplified plasmid has not been ligated yet and is double-nicked. Its actual size therefore cannot be directly assessed by comparison with the supercoiled DNA ladder or the 1 kb linear ladder, as the nicked plasmid runs slower than either one: on the gel, the pDEST-C1 + D132N rnh plasmid runs next to the 8 kb supercoiled DNA and 7 kb linear DNA bands, although it is around 4600 bp. However, the purpose of this gel was to make sure the PCR reaction yielded a clean product before moving on to the transformation step. First, 2 µL of the PCR product were transformed into 25 µL of DH5α competent cells, and after the transformation 50 µL and 100 µL of the cells, respectively, were plated on LB Agar + Streptomycin. The next day, colonies were picked from both plates and, after an overnight growth, the plasmids were isolated from the cells using the Miniprep kit. These plasmids were run on a 1% agarose gel, shown in Figure 6.17.b. The plasmid isolated from colony 2 showed the strongest band and was chosen for transformation into the expression host.

Figure 6.17 – Agarose Gels for Rb69 D132N RNase H Cloning

a

[Agarose gel; ladder bands labeled 10 kb, 5 kb, 2 kb, 5 kb and 1 kb]

1- Supercoiled DNA ladder
2- Rb69 D132N rnh in pDEST-C1 - Mutagenesis PCR product
3- 1 kb DNA ladder

b

[Agarose gel; ladder bands labeled 5 kb and 2 kb]

1- Supercoiled DNA ladder
2- Rb69 D132N RNase H in pDEST-C1 – colony 1
3- Rb69 D132N RNase H in pDEST-C1 – colony 2
4- Rb69 D132N RNase H in pDEST-C1 – colony 3
5- Rb69 D132N RNase H in pDEST-C1 – colony 4
6- Rb69 D132N RNase H in pDEST-C1 – colony 5
7- Rb69 D132N RNase H in pDEST-C1 – colony 6

6.3.3. Protein Expression and Solubility

A 1 µL sample of the pDEST-C1 + D132N rnh plasmid from colony 2 was transformed into 50 µL of competent T7 express cells. The cells were first plated on LB Agar + Streptomycin + Tetracycline, and one colony from each plate was picked and grown overnight in LB + Streptomycin + Tetracycline. The next day, protein expression was induced with 1 mM IPTG when the cells reached OD600 = 0.6. Glycerol stocks were also taken at that point. Samples for the 0 h and 3 h expression were run on an SDS-PAGE gel, presented in Figure 6.18.a. A protein is expressed around 40 kDa, which is consistent with Rb69 D132N RNase H with the N-terminal His-Tag.

Next, the cells were lysed to check for protein solubility. The lysis buffer was composed of 50 mM Tris HCl pH 7.5, 200 mM NH4Cl, 10 mM MgCl2, 5% glycerol, 0.03% PEI and 2 mM DTT. This is the same lysis buffer that was used for the native Rb69 RNase H and yielded soluble protein. The pellet and supernatant samples after cell lysis are shown in Figure 6.18.b. As is always the case for RNase H, some protein was found insoluble in the pellet, but a decent amount was also soluble.

Figure 6.18 – Rb69 D132N RNase H Expression and Cell Lysis

a

[SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Rb69 D132N RNase H expression (colony 1) – 0h sample
2- Rb69 D132N RNase H expression (colony 1) – 3h sample
3- Rb69 D132N RNase H expression (colony 2) – 0h sample
4- Rb69 D132N RNase H expression (colony 2) – 3h sample
5- Molecular Weight Marker

Rb69 D132N RNase H is expressed around 40 kDa, which is consistent with the calculated molecular weight of RNase H including the His-Tag.

b

[SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Molecular Weight Marker
2- Rb69 D132N RNase H cell lysis – pellet
3- Rb69 D132N RNase H cell lysis – supernatant

Rb69 D132N RNase H is partially distributed between pellet and supernatant after the cell lysis, but a reasonable amount of protein is found soluble.

After protein expression and solubility were assessed, the pDEST-C1 + D132N rnh plasmid from colony 2 that was used in the expression studies was sent to the Plant-Microbe Genomics Facility at Ohio State University for DNA sequencing. The results are detailed in Appendix 3. The plasmid was correctly sequenced and the mutation in codon 132 was confirmed. Both the forward and reverse primer reactions are shown, for the same reason that was explained in Section 6.2.4.

6.3.4. Protein Purification

Rb69 D132N RNase H was purified according to the final purification protocol that was designed for the native protein (see Section 6.2.5). The different sets of buffers are presented in Table 6.12.

Table 6.12 – Lysis and HPLC buffers for Rb69 D132N RNase H purification

               Lysis                    Ion Exchange (SP Sepharose and POROS HS)   Size Exclusion (Superdex 75)
Buffers        50 mM Tris HCl pH 7.5    25 mM bis-Tris HCl pH 6.5                  25 mM bis-Tris HCl pH 6.5
               200 mM NH4Cl             100 mM NH4Cl                               150 mM NH4Cl
               10 mM MgCl2              10 mM MgCl2                                10 mM MgCl2
               5% glycerol              0 - 1 M NaCl
               2 mM DTT
               0.03% PEI
Conductivity   ~ 18 mS/cm               Buffer A: ~ 16 mS/cm                       /
                                        Buffer B: ~ 96 mS/cm
Elution        /                        100% A → 100% B                            /

The soluble portion after cell lysis was first loaded on the low-resolution cation-exchange SP Sepharose. The majority of D132N RNase H was eluted during the salt gradient; however, a small amount of protein was also found in the flow through fraction. That flow through fraction was loaded onto the SP Sepharose column again, but no more RNase H bound to the resin, and the same amount was found in the flow through fraction once again. The chromatogram from the first run, as well as the SDS-PAGE gels for both runs, labeled SP Sepharose (1) and (2), are presented in Figure 6.19.a. Next, the eluted fractions from run 1 were pooled and loaded on the high-resolution cation-exchange Poros HS. D132N RNase H bound more strongly to the Poros resin, but the eluted protein was still contaminated with higher molecular weight proteins. The results from the Poros HS step are shown in Figure 6.19.b. Finally, the protein was further purified by size exclusion chromatography using Superdex 75 (see Figure 6.19.c). After this final purification step, D132N RNase H was pure enough; the total amount of pure protein was 3 mg, which was concentrated to 6 mg/mL. However, the protein started precipitating upon concentration.

Figure 6.19 – Rb69 D132N RNase H purification

Figure 6.19.a – SP Sepharose

[Chromatogram and two SDS-PAGE gels; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Molecular Weight Marker
2- SP Sepharose (1) – Load
3- SP Sepharose (1) – F. 10
4- SP Sepharose (1) – F. 14
5- SP Sepharose (1) – Flow Through
6- SP Sepharose (2) – Load
7- SP Sepharose (2) – F. 10
8- SP Sepharose (2) – Flow Through
9- Molecular Weight Marker

On the chromatogram, the OD260 is shown in purple, the OD280 in green, the conductivity in red and the % B in black. The red stars indicate which fractions were run on an SDS-PAGE gel. Here, fractions 12 to 20 contained RNase H and were pooled to be run on the Poros HS column. The pooled fractions are indicated with a red line on the chromatogram. Some of the RNase H did not bind to the resin and was found in the Flow Through fraction, which was reloaded onto the column. However, this RNase H again failed to bind to the resin.

Figure 6.19.b – Poros HS

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa]

1- Poros HS – Load
2- Poros HS – F. 12
3- Molecular Weight Marker

For this step, D132N RNase H was found in fractions 8 to 17 after elution from the Poros HS. Some higher molecular weight contaminants were still present in the protein sample.

Figure 6.19.c – Superdex 75

[Chromatogram and SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0 and 21.5 kDa]

1- Molecular Weight Marker
2- Superdex 75 – Load
3- Superdex 75 – F. 11
4- Superdex 75 – F. 13
5- Superdex 75 – F. 21

After the Superdex 75 run, the impurities were eluted in fraction 11 while D132N RNase H was present in fraction 13. Fractions 12 to 15 were pooled for concentration.

6.3.5. Cleaving of the His-Tag

The structure of the bacteriophage T4 RNase H shows that the N-terminus and the C-terminus of the protein are in close proximity to each other. Even though the structure of bacteriophage Rb69 RNase H is still unknown, it is expected to be closely related to that of T4 RNase H. The C-terminus of RNase H is involved in the interaction with 32 protein, and the presence of an N-terminal His-Tag might interfere with that interaction. Therefore, the N-terminal His-Tag had to be cleaved off before the interaction with Rb69 32-B could be studied.

D132N RNase H and the TEV protease were dialyzed together in a buffer containing 50 mM Tris HCl pH 8.5 and 0.5 mM EDTA. It should be noted that, as RNase H was precipitating during the previous concentration step, the precipitate was spun down and only the supernatant was used in dialysis. The proteolysis reaction was then set up as described in Table 6.13 and left to incubate at room temperature overnight, since the TEV protease is only active at room temperature or higher.

Table 6.13 – TEV Protease Reaction Setup

TEV Protease Reaction

Component             Concentration   Volume    Amount
Rb69 D132N RNase H    0.27 mg/mL      2.2 mL    0.6 mg
TEV Protease          0.5 mg/mL       50 µL     0.05 mg
β-mercaptoethanol     7 M             2.5 µL    final concentration 1 mM

After the reaction, more precipitate had formed. It was pelleted, and both the pellet and supernatant were run on an SDS-PAGE gel. The results from the gel are presented in Figure 6.20. It appears that most of the D132N RNase H was lost to precipitation, as only 0.6 mg out of 3 mg were left. Two bands can be seen around 35 kDa in the supernatant sample, where the cleaved D132N RNase H is expected, indicating that the His-Tag might have been partially cleaved. Another strong band is seen around 70 kDa, which could be an RNase H dimer. However, the amount of RNase H left in solution at that point was too low to allow for any further experiment with that batch of protein.

Figure 6.20 – TEV Protease Reaction Results

[SDS-PAGE gel; marker bands labeled at 66.3, 55.4, 36.5, 31.0, 21.5 and 14.4 kDa]

1- Molecular Weight Marker
2- D132N RNase H after Superdex 75
3- TEV Protease reaction pellet
4- TEV Protease reaction supernatant

6.4. Bacteriophage Rb69 32-B and Future Work

The objective in cloning, expressing and purifying Rb69 RNase H was to study its interaction with the Rb69 32-B protein. The 32-B protein was cloned by Dr. Juliette M. Devos in the Mueser lab. The truncated 32 gene was inserted in the pDEST-C1 vector, but no TEV protease cleavage site was inserted between the N-terminal HisTag and the protein. The plasmid was then transformed into the T7 express cell line, and soluble protein was expressed under the same conditions that were used for Rb69 RNase H. However, the protein purification proved to be much more challenging than for the T4 32-B protein, and despite several attempts no pure Rb69 32-B was obtained in large enough quantities for further studies.

Concerning Rb69 RNase H, the yields were rather low, even when using the D132N mutant, which also had solubility issues. In the future, one should try to resolve this solubility problem by using the solubility screen to find an optimal buffer that would enhance the solubility of the protein.

Once the purification and solubility issues for both Rb69 RNase H and 32-B have been solved, it would be interesting to apply the strategy that was used for the T4 proteins to the Rb69 complex. If crystals can be obtained, they might diffract better because of different surface residues that lead to better packing of the proteins, and provide a clearer picture of the interaction between RNase H and the 32 protein at the T4-like phage replication fork.

CHAPTER 7 - Escherichia coli DNA-Binding Protein from Starved Cells

7.1. Introduction

The E. coli Dps protein (DNA-binding protein from starved cells) is found in E. coli cells entering stationary phase. Even though the mechanism by which this protein binds DNA is not clear, it seems to protect the DNA from oxidative damage and to induce compaction of the genome (see Section 1.2 for more details).

This chapter is divided into two main sections: the previous work that was accomplished by other members of the lab, and the current work that follows up on their results.

7.2. Previous Work

The Dps project was initiated by Brandon K. Collins and Stephen J. Tomanicek, former students in the lab. The work they accomplished is described in this section and can be found in Stephen Tomanicek’s Ph.D. dissertation (Tomanicek, 2005).

7.2.1. Expression and Purification

Dps was found as an overexpressed impurity in preparations of recombinant archaeal FEN-1 proteins. FEN-1 was expressed in BL21(DE3) cells and protein production was induced with IPTG at 37 °C. The molecular weight of Dps was found to be around 19 kDa on an SDS-PAGE gel.

Dps could be purified away from FEN-1 by size exclusion chromatography on a Superdex 75 column, and was found in the void fraction, meaning its actual size is much larger than 19 kDa.

It should be noted that Dps was obtained from a few batches of FEN-1 protein, notably several Aeropyrum pernix (Ape) FEN-1, Archaeoglobus fulgidus (Afu) FEN-1 and Thermococcus zilligii (Tzi) FEN-1 samples. All the work described in this section was done with early batches of Dps purified from Ape/Tzi FEN-1 by Brandon Collins, which will be referred to below as (BKC Ape/Tzi FEN-1) Dps. Dps purified by Stephen Tomanicek will be referred to as (SJT Ape/Ave FEN-1) Dps.

7.2.2. Characterization

Dps was identified as such using N-terminal sequencing (Midwest Analytical Inc., St Louis, MO) and MALDI-TOF-TOF Mass Spectrometry (The Michigan Proteome Consortium, U. of M. Medical School, Ann Arbor, MI), with a 100 % confidence score.

It is important to note that the N-terminal sequencing showed that the endogenous Dps expressed from E. coli is truncated, missing the first nine amino-acids, as shown in Figure 7.1.

Figure 7.1 – Amino-Acid Sequence of E. coli Dps

1 MSTAKLVKSK ATNLLYTRND VSDSEKKATV ELLNRQVIQF IDLSLITKQA HWNMRGANFI

61 AVHEMLDGFR TALIDHLDTM AERAVQLGGV ALGTTQVINS KTPLKSYPLD IHNVQDHLKE

121 LADRYAIVAN DVRKAIGEAK DDDTADILTA ASRDLDKFLW FIESNIE

The nine missing residues are highlighted in yellow.

The protein characteristics were calculated using that sequence, with the ExPASy website (Gill and von Hippel, 1989; Gasteiger et al., 2003). They are summarized in Table 7.1.

Table 7.1 – Dps Characteristics

Dps

Amino-acids 158

Molecular Weight 17.7 kDa

pI 5.33

ε 0.87

The molecular weight of the truncated Dps was calculated to be 17.7 kDa. However, Dynamic Light Scattering (DLS) measurements gave a molecular weight of 141.6 kDa, suggesting that Dps is a heptamer or an octamer in solution. This result is consistent with the fact that Dps was found in the void fraction during size exclusion chromatography.
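The oligomeric state suggested by DLS can be estimated by dividing the measured mass by the mass of one subunit. The sketch below does this for two candidate monomer masses taken from the text, the calculated value in Table 7.1 and the apparent SDS-PAGE value from Section 7.2.1; treating the DLS mass as exact is an assumption, since DLS mass estimates also depend on particle shape.

```python
dls_mass_kDa = 141.6  # molecular weight from the DLS measurement

# Two candidate monomer masses: the calculated value for the truncated
# protein (Table 7.1) and the apparent SDS-PAGE value (Section 7.2.1).
monomer_masses_kDa = {"calculated": 17.7, "SDS-PAGE": 19.0}

subunits = {label: dls_mass_kDa / mass for label, mass in monomer_masses_kDa.items()}

for label, n in subunits.items():
    print(f"{label} monomer mass: about {n:.1f} subunits")
```

The two estimates fall between seven and eight subunits, which is the range quoted above.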

7.2.3. X-Ray Diffraction Studies

Dps crystals were obtained at 21 °C, using the hanging drop vapor diffusion method, in drops containing 2 µL of the reservoir solution and 2 µL of the protein solution at 19 mg/mL. Examples of the crystals obtained and the conditions they were grown in are shown in Figure 7.2.

Figure 7.2 – Crystals of (BKC Tzi FEN-1) Dps and (BKC Ape FEN-1) Dps


a - (BKC Tzi FEN-1) Dps crystal grown in ~ 17 % PEG 400, 100 mM Na HEPES pH 7.5 and 200 mM MgCl2.

b - (BKC Ape FEN-1) Dps crystals grown in ~ 14 % PEG 1000, 100 mM Na Cacodylate pH 6.5 and 200 mM MgCl2.

The crystals shown in Figure 7.2.a were flash frozen in liquid nitrogen with 30 % PEG 400 as a cryoprotectant, while the crystals shown in Figure 7.2.b were cryoprotected using either 30 % PEG 400 or 25 % ethylene glycol and then flash frozen in liquid nitrogen.

The crystals were screened for X-Ray diffraction and four datasets were collected on (BKC Tzi FEN-1) Dps crystals. No dataset was collected on the (BKC Ape FEN-1) Dps crystals as they were not diffraction quality crystals.

The datasets were processed, and the space groups I222 or I212121 were found to have the highest symmetry and the lowest Rmerge values, around 6.1 %. However, phasing with Molecular Replacement (PDB entry of the search model: 1DPS, (Grant et al., 1998)) was unsuccessful and the datasets had to be reprocessed as P1. Eight Dps molecules were then found in the asymmetric unit, but the solution could not be refined.

7.2.4. Discussion and Future Work

The fact that Molecular Replacement, using the full-length Dps structure

as the search model, did not yield any solution could mean that the truncated

Dps has a different structure. More crystals of this protein

should be grown and more data collected, in order to solve the structure and see how different it is from the full-length Dps structure. Moreover, it could be interesting to see if the Dps truncation also forms a spherical dodecamer.

7.3. Project Follow-up

Different samples of purified Dps were obtained from Stephen Tomanicek,

namely (BKC Ave FEN-1) Dps, (BKC Ape FEN-1) Dps and (SJT Ape FEN-1)

Dps. That last protein was used initially, as it was present in greater quantity.


The protein was dialyzed in 50 mM bis-Tris HCl pH 6.5, 100 mM NH4Cl

and 10 mM MgCl2. A hanging drop vapor diffusion expansion was set up at room

temperature using a coarse gradient in a 4x6 format, with the condition

previously stated in Section 7.2.3. that gave the best crystals, that is 10-30 %

PEG 400, 100 mM Na HEPES pH 7.5 and 200 mM MgCl2. Various amounts of

ethylene glycol (0, 5, 15 and 25 %) were added to each row for cryoprotection.

Two drops were set up on the cover slide, containing 2 µL of the well solution

and 2 µL of the protein at either 15 mg/mL or 18.5 mg/mL. Surprisingly, this

experiment was unsuccessful and no crystals were obtained.
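The layout of such a 4x6 expansion tray can be sketched in a few lines of Python. This is only an illustration: the even spacing of the PEG 400 gradient across the six columns is an assumption, and the function and variable names are hypothetical.

```python
def linear_gradient(start, stop, steps):
    """Evenly spaced concentrations from start to stop, inclusive."""
    step = (stop - start) / (steps - 1)
    return [round(start + i * step, 1) for i in range(steps)]

# Columns: PEG 400 from 10 to 30 % (linear spacing is an assumption).
peg_percent = linear_gradient(10.0, 30.0, 6)
# Rows: ethylene glycol percentages quoted in the text.
ethylene_glycol_percent = [0, 5, 15, 25]

# Every well also contains 100 mM Na HEPES pH 7.5 and 200 mM MgCl2.
tray = [[(peg, eg) for peg in peg_percent] for eg in ethylene_glycol_percent]

for row in tray:
    print("  ".join(f"PEG {peg:4.1f}% / EG {eg:2d}%" for peg, eg in row))
```

With twelve steps instead of six, the same function gives 11.8, 13.6 and 15.5 % as its second to fourth values, which matches the PEG 1000 concentrations reported for the 2x12 setup in Section 7.3.2.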

Since the protein used in that crystallization experiment had been kept at

4 °C for a long time, there was some concern that it might have degraded over

time. It was run, along with the other two samples ((BKC Ave FEN-1) Dps, (BKC

Ape FEN-1) Dps), on an SDS-PAGE gel, which is shown in Figure 7.3.

Figure 7.3 – SDS-PAGE Gel of the Truncated Dps Samples

Lanes: 1 - Molecular Weight Marker; 2 - (SJT Ape FEN-1) Dps; 3 - (BKC Ape FEN-1) Dps; 4 - (BKC Ave FEN-1) Dps.

Marker bands: 66.3 kDa, 55.4 kDa, 36.5 kDa, 31.0 kDa, 21.5 kDa and 14.4 kDa.


As can be seen on the SDS-PAGE gel, the (SJT Ape FEN-1) Dps that was used for the crystallization trials has a molecular weight of around 20 kDa, whereas the two BKC Dps samples both have a molecular weight of about 18 kDa.

This difference in size might explain why no crystals were obtained, even though the crystallization condition that was used was known to yield crystals.

The Dps that was sequenced as missing nine amino-acids at the

N-terminus was one of the BKC proteins, and its molecular weight on the

SDS-PAGE gel is consistent with the calculated value of 17.7 kDa. It is possible that the (SJT Ape FEN-1) Dps, which was purified from a different batch of

FEN-1 protein, might be the full-length Dps (its molecular weight calculated with the ExPASy ProtParam tool is 18.7 kDa) or a truncation missing fewer than nine amino-acids.

The next section will deal with the characterization of (SJT Ape FEN-1)

Dps. However, the focus should remain on the known truncated Dps proteins for the crystallization studies, since the structure of the full-length Dps is already known.

7.3.1. Further Characterization of (SJT Ape FEN-1) Dps

Dynamic Light Scattering

In order to determine if (SJT Ape FEN-1) Dps in solution behaves like the truncated Dps, a DLS experiment was performed at room temperature. The protein was dialyzed in 50 mM Bis-Tris HCl pH 6.5, 100 mM NH4Cl and 10 mM

MgCl2, and filtered using a 0.45 µm pore size Ultrafree-MC filter unit from


Millipore. The Dps sample used in the experiment had a concentration of 1.08 mg/mL.

The results of the experiment are shown in Figure 7.4 and Table 7.2. The

protein sample is homogeneous, with a polydispersity of 16.5%. Some trace

amounts of aggregates can be seen, but they do not account for any of the mass.

The molecular weight was calculated to be about 134 kDa, which is again

consistent with Dps being a heptamer or an octamer in solution.

Figure 7.4 – DLS Results for (SJT Ape FEN-1) Dps



Table 7.2 – DLS Results for (SJT Ape FEN-1) Dps

Peak   Rh (nm)   % Pd   MW (kDa)    % Intensity   % Mass
1      4.828     16.5   134         84.9          100
2      43.75     0      23,270      3.7           0
3      43817     0      2.44 E 11   11.4          0

Mass Spectrometry

The SDS-PAGE gel shown in Figure 7.3 was sent to the Michigan

Proteome Consortium at the University of Michigan Medical School (Ann Arbor,

MI) for trypsin digestion and MALDI-TOF Mass Spectrometry analysis. The

objective of this experiment was to determine if the (SJT Ape FEN-1) Dps protein

is also an N-terminal truncation missing nine residues, as was previously

determined for the (BKC Ape FEN-1) Dps. The results are shown in Figure 7.5.

Figure 7.5 – MALDI-TOF Mass Spectrometry Results

a – (SJT Ape FEN-1) Dps, Protein Score: 121, 100 %

1 MSTAKLVKSK ATNLLYTRND VSDSEKKATV ELLNRQVIQF IDLSLITKQA HWNMRGANFI

61 AVHEMLDGFR TALIDHLDTM AERAVQLGGV ALGTTQVINS KTPLKSYPLD IHNVQDHLKE

121 LADRYAIVAN DVRKAIGEAK DDDTADILTA ASRDLDKFLW FIESNIE

b – (BKC Ape FEN-1) Dps, Protein Score: 121, 100 %

1 MSTAKLVKSK ATNLLYTRND VSDSEKKATV ELLNRQVIQF IDLSLITKQA HWNMRGANFI

61 AVHEMLDGFR TALIDHLDTM AERAVQLGGV ALGTTQVINS KTPLKSYPLD IHNVQDHLKE

121 LADRYAIVAN DVRKAIGEAK DDDTADILTA ASRDLDKFLW FIESNIE


c – (BKC Ave FEN-1) Dps, Protein Score: 125, 100 %

1 MSTAKLVKSK ATNLLYTRND VSDSEKKATV ELLNRQVIQF IDLSLITKQA HWNMRGANFI

61 AVHEMLDGFR TALIDHLDTM AERAVQLGGV ALGTTQVINS KTPLKSYPLD IHNVQDHLKE

121 LADRYAIVAN DVRKAIGEAK DDDTADILTA ASRDLDKFLW FIESNIE

The sets of peptides corresponding to E. coli Dps with the highest protein score were chosen. a and b were labeled as “DNA protection during starvation condition (E. coli CFT073)”, and c as “PexB”. Highlighted in gray are the missing amino-acids at the N terminus. Glutamine residues cyclizing into pyroglutamate are shown in blue, and oxidized Methionine residues are in green. The peptides identified by the MS experiment are shown in red. The database used for peptide matching was NCBInr.

The MALDI-TOF experiment identified (SJT Ape FEN-1) Dps as E. coli

Dps. The first ten amino-acids are missing from the peptide list, which suggests that (SJT Ape FEN-1) Dps is missing the N-terminus the same way (BKC Ape

FEN-1) Dps is. However, trypsin, which was used in this experiment, cleaves after each lysine and arginine residue unless it is followed by a proline. The

N-terminus of E. coli Dps contains three lysines, at positions 5, 8 and 10. The three

following peptides would then be created by trypsin from the N-terminus: MSTAK,

LVK and SK, with molecular weights of about 537 Da, 358 Da and 233 Da,

respectively. Peptides this small would not have been detected by the

experiment. The solution then would be to treat the (SJT Ape FEN-1) Dps

sample with a different protease and run the MS experiment again.

Unfortunately, that sample was not available anymore and the experiment could

not be done.
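The tryptic cleavage rule described above can be illustrated with a short in silico digest. The Python sketch below is not the analysis performed by the proteomics facility. It applies the stated rule (cleavage after lysine or arginine unless the next residue is proline) to the first 18 residues of the E. coli Dps sequence from Figure 7.5, and estimates each peptide mass from standard average residue masses plus one water. The function names are hypothetical.

```python
# Average residue masses in Da for the residues that occur in this fragment.
RESIDUE_MASS = {
    "A": 71.08, "K": 128.17, "L": 113.16, "M": 131.19, "N": 114.10,
    "R": 156.19, "S": 87.08, "T": 101.11, "V": 99.13, "Y": 163.18,
}
WATER = 18.02  # one water per free peptide (N-terminal H plus C-terminal OH)

# First 18 residues of E. coli Dps, copied from Figure 7.5.
N_TERMINUS = "MSTAKLVKSKATNLLYTR"

def trypsin_digest(sequence):
    """Cleave after K or R unless the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

def peptide_mass(peptide):
    """Average mass in Da of a free peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER

for peptide in trypsin_digest(N_TERMINUS):
    print(f"{peptide:10s} {peptide_mass(peptide):7.1f} Da")
```

A digest of this kind also shows which alternative protease would leave a longer N-terminal peptide, since only the cleavage rule in `trypsin_digest` needs to change.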


7.3.2. X-Ray Diffraction Studies

The three samples previously mentioned ((SJT Ape FEN-1) Dps, (BKC

Ape FEN-1) Dps and (BKC Ave FEN-1) Dps) were dialyzed in the following

buffer: 50 mM bis-Tris HCl pH 6.5, 100 mM NH4Cl, 150 mM NaCl and 10 mM

MgCl2. Each sample was then concentrated and filtered with a Millipore 0.45 µm

pore size Ultrafree-MC filter unit. A coarse gradient expansion, using the 2 + 2

hanging drop vapor diffusion method, was set up in 24-well Costar trays at room temperature. This time, the two conditions that previously gave crystals were

used in a 2x12 setup (see Figure 7.2). The best crystals were obtained for (BKC

Ape FEN-1) Dps at 21.2 mg/mL, in 11.8-15.5 % PEG 1000, 100 mM Na

Cacodylate pH 6.5 and 200 mM MgCl2, and are shown in Figure 7.6.

Figure 7.6 – (BKC Ape FEN-1) Dps Crystals

a – 11.8 % PEG 1000, 100 mM Na Cacodylate pH 6.5 and 200 mM MgCl2

b – 13.6 % PEG 1000, 100 mM Na Cacodylate pH 6.5 and 200 mM MgCl2

c – 15.5 % PEG 1000, 100 mM Na Cacodylate pH 6.5 and 200 mM MgCl2

Other crystals were obtained for the other two proteins at around

20 mg/mL, but they were too small for diffraction.


Crystals shown in Figure 7.6.b and 7.6.c were soaked in a substitute

mother liquor containing 50 mM bis-Tris HCl pH 6.5, 100 mM NH4Cl, 150 mM

NaCl, 13.6 or 15.5 % PEG 1000 respectively, 100 mM Na Cacodylate pH 6.5 and

210 mM MgCl2, before 25 % Ethylene Glycol was added for cryoprotection. The

crystals were then flash frozen in liquid Nitrogen and screened for X-Ray

diffraction.

One dataset was collected on the in-house Rigaku FR-E diffractometer

(the Ohio Crystallography Consortium at the University of Toledo, Toledo, OH,

USA), using a wavelength of 1.54 Å. The crystal diffracted to a resolution of

3.0 Å. A total of 220 images were collected in 0.5° steps, each with an

oscillation of 0.5° and an exposure time of 15 s. The distance to the

detector was set at 80 mm. Some of the images collected are shown in Figure

7.7.

Figure 7.7 – X-Ray Diffraction Images of the (BKC Ape FEN-1) Dps Crystals


a – Image 1 at 0° b – Image 90 at 45° c – Image 180 at 90°


The dataset was indexed, integrated and scaled using the HKL2000 software (Minor et al., 2002). A summary of the data processing with the different space groups that were used is shown in Table 7.3.

Table 7.3 – (BKC Ape FEN-1) Dps Data Processing Summary

Space Group                     P1 (1)         C2 (5)         F222 (22)      I222 (23)      I4 (79)
Resolution after scaling        20 to 3.5 Å    20 to 3.5 Å    20 to 3.5 Å    20 to 3.5 Å    20 to 3.5 Å
Unit cell a (Å), α (°)          85.04, 116.9   145.0, 90.0    114.19, 90.0   88.73, 90.0    88.78, 90.0
Unit cell b (Å), β (°)          85.00, 95.32   88.76, 127.8   125.48, 90.0   88.97, 90.0    88.78, 90.0
Unit cell c (Å), γ (°)          85.09, 117.1   89.00, 90.0    125.53, 90.0   114.5, 90.0    114.3, 90.0
Rmerge *                        8.0 %          10.0 %         48.2 %         10.7 %         46.3 %
Mosaicity                       0.945          0.913          0.968          0.964          0.955
Number of reflections           16,474         9,753          5,775          6,052          5,618
                                (25,896)       (26,159)       (25,248)       (26,212)       (25,763)
Reflections per atom            13 / atom      8 / atom       4.6 / atom     5 / atom       4.5 / atom
Completeness                    74.3 %         84.2 %         99.9 %         98.2 %         99.9 %
# molecules / asymmetric unit   8 to 12        5 to 6         2 to 3         2 to 3         2 to 3

* Equation 7.1:

$$R_{merge} = 100 \times \frac{\sum_{hkl}\sum_{i=1}^{n}\left|F^{2}_{hkl,i} - \langle F^{2}_{hkl}\rangle\right|}{\sum_{hkl}\sum_{i=1}^{n} F^{2}_{hkl,i}}$$

where $F^{2}_{hkl,i}$ is the intensity of each hkl reflection and $\langle F^{2}_{hkl}\rangle$ is the mean value of the measurements of the n equivalent reflections.
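As an illustration of how Equation 7.1 is evaluated, the following Python sketch computes Rmerge from a small set of repeated intensity measurements. It uses the absolute deviation of each measurement from the mean of its equivalent reflections, as in the usual definition. The intensities are made up for the example and the function name is hypothetical; the values in Table 7.3 came from HKL2000, not from this code.

```python
def r_merge(observations):
    """R_merge (%) from a mapping of hkl index -> list of measured intensities."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        # Sum of absolute deviations from the mean of the equivalent reflections.
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return 100.0 * numerator / denominator

# Made-up intensities for two reflections, for illustration only.
example = {
    (1, 0, 0): [100.0, 110.0, 90.0],
    (0, 2, 0): [50.0, 50.0],
}
print(f"R_merge = {r_merge(example):.1f} %")  # prints "R_merge = 5.0 %"
```

Merging in too high a symmetry averages reflections that are not truly equivalent, which inflates the numerator and is why the wrong space groups show much larger Rmerge values.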

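The Rmerge values indicate that the actual space group is most likely I222: it is the highest-symmetry space group tested that still merges well (10.7 %, compared with 48.2 % for F222 and 46.3 % for I4). That space group choice was confirmed using Pointless in the CCP4 program suite.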

The Matthews cell content analysis program used next predicted three molecules in the asymmetric unit (Matthews, 1968), with a Matthews coefficient of 2.13


Å³·Da⁻¹ and a solvent content of 42.24 %. Phasing using Molecular Replacement was then done on the I222 data, as well as on the P1 data. The program used for

Molecular Replacement was MolRep, which is part of the CCP4 program suite

(Bailey, 1994), and the search model was the E. coli full length Dps crystal structure (pdb: 1DPS, (Grant et al., 1998)). A summary of the Molecular

Replacement results can be found in Table 7.4.
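The Matthews analysis mentioned above can be checked by hand. The sketch below is a minimal Python version and not the CCP4 program itself. It assumes the I222 unit cell from Table 7.3, eight asymmetric units per I222 unit cell, the 17.7 kDa monomer weight from Table 7.1, and the commonly used factor of 1.23 for converting the Matthews coefficient into a solvent fraction.

```python
# Unit cell for the I222 processing, from Table 7.3 (all angles are 90 degrees).
a, b, c = 88.73, 88.97, 114.5   # Angstrom
MONOMER_MW = 17700.0            # Da, truncated Dps (Table 7.1)
ASYMMETRIC_UNITS = 8            # asymmetric units per I222 unit cell

def matthews(n_molecules):
    """Return (V_M in A^3/Da, solvent fraction) for n molecules per asymmetric unit."""
    cell_volume = a * b * c     # valid for an orthorhombic cell only
    v_m = cell_volume / (ASYMMETRIC_UNITS * n_molecules * MONOMER_MW)
    solvent = 1.0 - 1.23 / v_m  # 1.23 assumes a typical protein density
    return v_m, solvent

for n in (2, 3, 4):
    v_m, solvent = matthews(n)
    print(f"{n} molecules: V_M = {v_m:.2f} A^3/Da, solvent = {100 * solvent:.1f} %")
```

For three molecules this reproduces the coefficient of 2.13 Å³·Da⁻¹ and a solvent content close to the 42 % quoted above.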

Table 7.4 – (BKC Ape FEN-1) Dps MolRep Molecular Replacement Summary

Space group     Search         Result
P1 (1)          1 dodecamer    Rfactor = 56 %, CC = 31.4 %
P1 (1)          12 monomers    Rfactor = 59 %, CC = 26 % (11 monomers found)
P1 (1)          6 monomers     Rfactor = 59 %, CC = 26 % (11 monomers found?)
P1 (1)          dimer          Rfactor = 59.3 %, CC = 22 %
I222 (23)       12 monomers    No solution found
I222 (23)       3 monomers     Rfactor = 42.8 %, CC = 59.7 %
I212121 (24)    3 monomers     Rfactor ~ 60 %, CC ~ 40 %

P1 Molecular Replacement: The high R and low correlation values suggest that P1 is not the right space group, and no satisfactory solution was found after Molecular Replacement.

I222 Molecular Replacement: No solution was found when 12 Dps monomers were searched for. However, when 3 Dps monomers were searched for, the lower R and higher correlation values indicate that a possible solution was found. The same search with the dataset processed as I212121 did not yield any acceptable solution.

The I222 Molecular Replacement solution was refined using the CCP4 refinement program REFMAC in the restrained mode (Bailey, 1994; Murshudov et al.,

1997). The R value was calculated to be 29.6 % and the Rfree 41.8 %.


Since the refinement results after molecular replacement with MolRep

were not very good, another molecular replacement program, namely Phaser,

was used, in order to compare the two molecular replacement solutions. Three

molecules of 1DPS were searched for. The results from Phaser are shown in

Table 7.5.

Table 7.5 - (BKC Ape FEN-1) Dps Phaser Molecular Replacement Summary

Score Search Model (1DPS × 3)

Rotation Function Score 7.1 8.1 9.0

Translation Function Score 11.6 23.0 31.1

Packing 0 0 0

Log-Likelihood Gain 127 490 1222

The log-likelihood gain (LLG) indicates how well the data agree with the model; a good molecular replacement solution will therefore have a high LLG score. The rotation (RFZ) and translation (TFZ) Z scores are then calculated from the LLG. An RFZ score higher than 5 and a TFZ score higher than 8 indicate that a solution was found. The packing is an indication of clashes that may have been found by the program.

The Phaser scores looked good, and after a number of restrained

refinement cycles, the R value was calculated to be 18.3 % and the Rfree 31.1 %.

The R value is a little low compared with the Rfree, but this gap could most likely be reduced by a few cycles of building and refinement.

The two solutions, from MolRep and from Phaser, are shown below in

Figure 7.8. Even though these solutions look different, it is important to remember that the space group after processing was I222; after generating symmetry molecules with the I222 symmetry operators, the two models are therefore actually the same.


Figure 7.8 – MolRep vs. Phaser Solutions

MolRep Phaser

The Phaser solution will be used for the rest of the model description, as the Molecular Replacement and refinement statistics were better.

After generation of the symmetry molecules, a spherical dodecamer was

obtained. It is presented in Figure 7.9. Surface rendering shows a hollow

sphere, which is shown in Figure 7.9.c. The dimensions of the sphere

are as follows: 85 to 90 Å for the outside diameter and 45 Å for the hollow core

diameter. These dimensions are very close to the ones reported by Grant et al.

(1998), which were 90 and 45 Å for the outside and the hollow core,

respectively.


Figure 7.9 – Final Dps Model from the (BKC Ape FEN-1) Dps Crystals

a – Three monomers of Dps found after Molecular Replacement

b – Spherical dodecameric structure of Dps

c – Surface rendering of a half-sphere of the Dps dodecamer

Dimension labels in the figure: 85 Å and 45 Å.

The (BKC Ape FEN-1) Dps dodecameric structure was superimposed onto the Dps dodecamer reported earlier (Grant et al., 1998). This is shown in

Figure 7.10 below. The two structures superimpose very closely.


Figure 7.10 – Final Dps Model from the (BKC Ape FEN-1) Dps Crystals

The Dps dodecamer model is shown in gray (Grant et al., 1998) and the (BKC Ape FEN-1) Dps model in orange.

It appears that the truncation of the N-terminus of Dps does not affect the

interaction of the proteins forming the dodecameric structure. The reason the

earlier dataset could not be phased is therefore most likely the poor quality of the

X-Ray diffraction data. This is another piece of evidence that the N-terminus of Dps

is not involved in hollow sphere formation and aggregation leading to genome

compaction, but more likely in DNA binding, as it has been suggested before

(Ceci et al., 2004).

7.4. Conclusion

An N-terminal truncation of the E. coli Dps protein was characterized.

Biophysical studies, such as Dynamic Light Scattering and Mass Spectrometry, were done but did not provide any groundbreaking results. On the other hand, a

crystal structure of the protein was obtained. The Dps truncation adopts the same overall dodecameric form as the full-length Dps does. This result indicates that the N-terminus of Dps, which is lysine rich, is not involved in cooperative binding but more likely in DNA binding, a result that is consistent with some of the evidence that has been published by other groups.


BIBLIOGRAPHY

Alberts, B. M., Frey, L. (1970) T4 bacteriophage gene 32: a structural protein in the replication and recombination of DNA. Nature, 227 1313-18.

Azam, T. A. and Ishihama, A. (1999) Twelve species of the nucleoid-associated protein from Escherichia coli. Sequence recognition specificity and DNA binding affinity. J Biol Chem, 274 (46), 33105-13.

Bailey, S. (1994) The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr, D50 (5), 760-3.

Bhagwat, M., Hobbs, L. J. and Nossal, N. G. (1997) The 5'-exonuclease activity of bacteriophage T4 RNase H is stimulated by the T4 gene 32 single-stranded DNA-binding protein, but its flap endonuclease is inhibited. J Biol Chem, 272 (45), 28523-30.

Bhagwat, M., Meara, D. and Nossal, N. G. (1997) Identification of residues of T4 RNase H required for catalysis and DNA binding. J Biol Chem, 272 (45), 28531-8.

Bhagwat, M. and Nossal, N. G. (2001) Bacteriophage T4 RNase H removes both RNA primers and adjacent DNA from the 5' end of lagging strand fragments. J Biol Chem, 276 (30), 28516-24.

Bradford, M. M. (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem, 72 248-54.

Casas-Finet, J. R., Fischer, K. R. and Karpel, R. L. (1992) Structural basis for the nucleic acid binding cooperativity of bacteriophage T4 gene 32 protein: the (Lys/Arg)3(Ser/Thr)2 (LAST) motif. Proc Natl Acad Sci U S A, 89 (3), 1050-4.

Ceci, P., Cellai, S., Falvo, E., Rivetti, C., Rossi, G. L. and Chiancone, E. (2004) DNA condensation and self-aggregation of Escherichia coli Dps are coupled phenomena related to the properties of the N-terminus. Nucleic Acids Res, 32 (19), 5935-44.

Chastain, P., Makhov, A. M., Nossal, N. G., Griffith, J. D. (2003) Architecture of the replication complex and DNA loops at the fork generated by the bacteriophage T4 proteins. J Biol Chem, 278 21276-21285.

Collins, B. K., Tomanicek, S. J., Lyamicheva, N., Kaiser, M. W. and Mueser, T. C. (2004) A preliminary solubility screen used to improve crystallization trials: crystallization and preliminary X-ray structure determination of Aeropyrum pernix flap endonuclease-1. Acta Crystallogr D Biol Crystallogr, 60 (Pt 9), 1674-8.

DeLano, W. L. and Lam, J. W. (2005) PyMOL: A communications tool for computational models. 230th ACS National Meeting, Washington, DC, United States.

Devos, J. M., Tomanicek, S. J., Jones, C. E., Nossal, N. G. and Mueser, T. C. (2007) Crystal structure of bacteriophage T4 5' nuclease in complex with a branched DNA reveals how flap endonuclease-1 family nucleases bind their substrates. J Biol Chem, 282 (43), 31713-24.

Dwlgosh, J. M. (2008). The study of protein-protein interactions involved in lagging strand DNA replication and repair, the University of Toledo. Ph.D. Dissertation.

Emsley, P. and Cowtan, K. (2004) Coot: Model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr, D60 2126-32.

Gangisetty, O., Jones, C. E., Bhagwat, M. and Nossal, N. G. (2005) Maturation of bacteriophage T4 lagging strand fragments depends on interaction of T4 RNase H with T4 32 protein rather than the T4 gene 45 clamp. J Biol Chem, 280 (13), 12876-87.

Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R. D. and Bairoch, A. (2003) ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res, 31 (13), 3784-8.

Genschel, J., Curth, U. and Urbanke, C. (2000) Interaction of E. coli single-stranded DNA binding protein (SSB) with exonuclease I. The carboxy-terminus of SSB is the recognition site for the nuclease. Biol Chem, 381 (3), 183-92.

Giedroc, D. P., Keating, K. M., Williams, K. R. and Coleman, J. E. (1987) The function of zinc in gene 32 protein from T4. Biochemistry, 26 (17), 5251-9.

Giedroc, D. P., Keating, K. M., Williams, K. R., Konigsberg, W. H. and Coleman, J. E. (1986) Gene 32 protein, the single-stranded DNA binding protein from bacteriophage T4, is a zinc metalloprotein. Proc Natl Acad Sci U S A, 83 (22), 8452-6.

Giedroc, D. P., Khan, R. and Barnhart, K. (1991) Site-specific 1,N6-ethenoadenylated single-stranded oligonucleotides as structural probes for the T4 gene 32 protein-ssDNA complex. Biochemistry, 30 (33), 8230-42.

Gill, S. C. and von Hippel, P. H. (1989) Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem, 182 (2), 319-26.

Grant, R. A., Filman, D. J., Finkel, S. E., Kolter, R. and Hogle, J. M. (1998) The crystal structure of Dps, a ferritin homolog that binds and protects DNA. Nat Struct Biol, 5 (4), 294-303.

Haebel, P. W., Wichman, S., Goldstone, D. and Metcalf, P. (2001) Crystallization and initial crystallographic analysis of the disulfide bond isomerase DsbC in complex with the alpha domain of the electron transporter DsbD. J Struct Biol, 136 (2), 162-6.

Han, E. S., Cooper, D. L., Persky, N. S., Sutera, V. A., Jr., Whitaker, R. D., Montello, M. L. and Lovett, S. T. (2006) RecJ exonuclease: substrates, products and interaction with SSB. Nucleic Acids Res, 34 (4), 1084-91.

Hollingsworth, H. C. and Nossal, N. G. (1991) Bacteriophage T4 encodes an RNase H which removes RNA primers made by the T4 DNA replication system in vitro. J Biol Chem, 266 (3), 1888-97.

Horanyi, P. S., Griffith, J., Wang, B. C. and Jenney, F. E. (2006) Vectors for high throughput expression of ccdB polypeptides. U.S. Pat. Appl. Publ.


Huber, C. G. (2000) Biopolymer Chromatography. Encyclopedia of Analytical Chemistry, R. A. M. Eds, 11250-78.

Hurley, J. M., Chervitz, S. A., Jarvis, T. C., Singer, B. S., Gold L. (1993) Assembly of the bacteriophage T4 replication machine requires the acidic carboxy terminus of gene 32 protein. J Mol Biol, 229 398-418.

Ilari, A., Ceci, P., Ferrari, D., Rossi, G. L. and Chiancone, E. (2002) Iron incorporation into Escherichia coli Dps gives rise to a ferritin-like microcrystalline core. J Biol Chem, 277 (40), 37619-23.

Izaac, A., Schall, C. A. and Mueser, T. C. (2006) Assessment of a preliminary solubility screen to improve crystallization trials: uncoupling crystal condition searches. Acta Crystallogr D Biol Crystallogr, 62 (Pt 7), 833-42.

Jensen, D. E., Kelly, R. C. and von Hippel, P. H. (1976) DNA "melting" proteins. II. Effects of bacteriophage T4 gene 32-protein binding on the conformation and stability of nucleic acid structures. J Biol Chem, 251 (22), 7215-28.

Jones, C. E., Mueser, T. C. and Nossal, N. G. (2004) Bacteriophage T4 32 protein is required for helicase-dependent leading strand synthesis when the helicase is loaded by the T4 59 helicase-loading protein. J Biol Chem, 279 (13), 12067-75.

Karam, J. D. (1994) Molecular Biology of Bacteriophage T4. Washington D.C., American Society of Microbiology.

Karpel, R. L. (1990). The biology of non-specific DNA-protein interactions, 103-130.

Koch, M. H., Vachette, P. and Svergun, D. I. (2003) Small-angle scattering: a view on the properties, structures and structural changes of biological macromolecules in solution. Q Rev Biophys, 36 (2), 147-227.

Kornberg, A., Baker, T.A. (1992) DNA Replication, second edition. New York, W.H. Freeman and Company.

Kuzmic, P. (1996) Program DYNAFIT for the analysis of enzyme kinetic data: application to HIV proteinase. Anal Biochem, 237 (2), 260-73.

Laskowski, R. A., MacArthur, M. W., Moss, D. S. and Thornton, J. M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst, 26 283-91.

Leslie, A. G. W. (1992) Recent changes to the MOSFLM package for processing film and image plate data. Joint CCP4 + ESF-EAMCB Newsletter on Protein Crystallography, (26).

Little, J. W. and Mount, D. W. (1982) The SOS regulatory system of Escherichia coli. Cell, 29 (1), 11-22.

Liu, Y., Kao, H. I. and Bambara, R. A. (2004) Flap endonuclease 1: a central component of DNA metabolism. Annu Rev Biochem, 73 589-615.

Martinez, A. and Kolter, R. (1997) Protection of DNA during oxidative stress by the nonspecific DNA-binding protein Dps. J Bacteriol, 179 (16), 5188-94.

Matthews, B. W. (1968) Solvent content of protein crystals. J Mol Biol, 33 (2), 491-7.


McCoy, A. J., Grosse-Kunstleve R. W., Adams P. D., Winn M. D., Storoni L. C. and Read R. J. (2007) Phaser crystallographic software. J. Appl. Cryst., 40 658-74.

Minor, W., Cymborowski, M. and Otwinowski, Z. (2002) Automatic system for crystallographic data collection and analysis. Acta Physica Polonica, A, 101 (5), 613-19.

Molineux, I. J. and Gefter, M. L. (1975) Properties of the Escherichia coli DNA-binding (unwinding) protein interaction with nucleolytic enzymes and DNA. J Mol Biol, 98 (4), 811-25.

Mueser, T. C., Nossal, N. G. and Hyde, C. C. (1996) Structure of bacteriophage T4 RNase H, a 5' to 3' RNA-DNA and DNA-DNA exonuclease with sequence similarity to the RAD2 family of eukaryotic proteins. Cell, 85 (7), 1101-12.

Murshudov, G., Vagin A. and Dodson E. (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr, D53 240-55.

Nossal, N. G. (1992) Protein-protein interactions at a DNA replication fork: bacteriophage T4 as a model. Faseb J, 6 (3), 871-8.

Nossal, N. G. (1994) The Bacteriophage T4 DNA Replication Fork. Molecular Biology of Bacteriophage T4, American Society of Microbiology, 43-53.

Otwinowski, Z. and Minor, W. (1997) Processing of X-ray diffraction data collecting in oscillation mode. In Carter, C.W. and Sweet, R.M. (eds.). Methods Enzymol, 276 307-326.

Pan, T., Giedroc, D. P. and Coleman, J. E. (1989) 1H NMR studies of T4 gene 32 protein: effects of zinc removal and reconstitution. Biochemistry, 28 (22), 8828-32.

Petoukhov, M. V., Konarev, P. V., Kikhney, A. G. and Svergun, D. I. (2007) ATSAS 2.1 - Towards Automated and Web-Supported Small Angle Scattering Data Analysis. J Appl Cryst, 40 223-28.

Pflugrath, J. W. (1999) The finer things in X-Ray diffraction data collection. Acta Crystallogr D Biol Crystallogr, D55 1718-25.

Pierce, M. M., Raman, C. S. and Nall, B. T. (1999) Isothermal titration calorimetry of protein-protein interactions. Methods, 19 (2), 213-21.

Rao, R. N. (1984) Construction and properties of plasmid pKC30, a pBR322 derivative containing the pL-N region of phage lambda. Gene, 31 (1-3), 247-50.

Reuven, N. B., Staire, A. E., Myers, R. S. and Weller, S. K. (2003) The herpes simplex virus type 1 alkaline nuclease and single-stranded DNA binding protein mediate strand exchange in vitro. J Virol, 77 (13), 7425-33.

Rodgers, D. W. (1994) Cryocrystallography. Structure, 2 (12), 1135-40.

Sandigursky, M., Franklin, W. A. (1993) E. coli single-stranded DNA-binding protein stimulates the DNA deoxyribophosphodiesterase activity of Exonuclease I. Nucleic Acids Res, 22 (2), 247-50.

Savvides, S. N., Raghunathan, S., Futterer, K., Kozlov, A. G., Lohman, T. M., Waksman G. (2004) The C-terminal domain of full-length E. coli SSB is disordered even when bound to DNA. Protein Sci, 13 1942-47.


Senger, A. B. and Mueser, T. C. (2005) Rapid preparation of custom grid screens for crystal growth optimization. J. Appl. Cryst., 38 847-50.

Shamoo, Y., Friedman, A. M., Parsons, M. R., Konigsberg, W. H. and Steitz, T. A. (1995) Crystal structure of a replication fork single-stranded DNA binding protein (T4 gp32) complexed to DNA. Nature, 376 (6538), 362-6.

Shatzman, A. R. and Rosenberg, M. (1987) Expression, identification, and characterization of recombinant gene products in Escherichia coli. Methods Enzymol, 152 661-73.

Tanford, C. Light Scattering. Physical Chemistry of Macromolecules.

Tomanicek, S. J. (2005). Crystallographic Studies of DNA Replication and Repair Proteins, the University of Toledo. Ph.D. dissertation.

Tomanicek, S. J., Devos, J. M., Mueser, T. C. Metal-free crystal structure of bacteriophage T4 RNase H. in preparation.

Waidner, L. A., Flynn, E. K., Wu, M., Li, X. and Karpel, R. L. (2001) Domain effects on the DNA-interactive properties of bacteriophage T4 gene 32 protein. J Biol Chem, 276 (4), 2509-16.

Williams, K. R., Spicer, E. K., LoPresti, M. B., Guggenheimer, R. A., Chase, J. W. (1983) Limited proteolysis studies on the E. coli single-stranded DNA binding protein. J Biol Chem, 258 (5), 3346-3355.

Yoakum, G. H. (1983) Amplification of DNA repair genes using plasmid pKc30. Methods Enzymol, 101 138-55.


APPENDICES

Appendix 1 – Maps of the pENTR-D, pET 101 and pDEST-C1 Vectors

Appendix 2 – HPLC Columns

Appendix 3 – DNA Sequencing Results


Appendix 1 – Maps of the pENTR-D, pET 101 and pDEST-C1 Vectors


A – pENTR™ / D-TOPO® Gateway® entry vector (Invitrogen)

B – pET101 / D-TOPO® Gateway® expression vector (Invitrogen)

C – pDEST-C1 expression vector (Horanyi et al., 2006)


Appendix 2 – HPLC Columns

HPLC Column (supplier), type of chromatography
Chemistry of the Resin

SP Sepharose (Amersham Biosciences), Low Resolution Cation Exchange
The cross-linked agarose-dextran matrix is coated with sulfopropyl groups that bind to the positively charged residues on the protein. The protein is then eluted with a NaCl gradient.

Q Sepharose (Amersham Biosciences), Low Resolution Anion Exchange
The cross-linked agarose-dextran matrix is coated with quaternary amine groups that bind to the negatively charged residues on the protein. The protein is then eluted with a NaCl gradient.

POROS HS (Applied Biosystems), High Resolution Cation Exchange
The poly(styrene-divinylbenzene) polymer matrix (PS/DVB) is coated with sulfopropyl groups that bind to the positively charged residues on the protein. The protein is then eluted with a NaCl gradient.

POROS HQ (Applied Biosystems), High Resolution Anion Exchange
The PS/DVB matrix is coated with quaternary amine groups that bind to the negatively charged residues on the protein. The protein is then eluted with a NaCl gradient.

Hydroxyapatite (Bio-Rad), Ion Exchange
The [Ca5(PO4)3OH]2 resin of the hydroxyapatite column binds proteins and competes with the phosphate backbone of the DNA present in the sample. The protein is eluted with a salt gradient of Ammonium Sulfate.

POROS PE (Applied Biosystems), Hydrophobic Interaction
The PS/DVB matrix is coated with phenyl-ether hydrophobic groups that bind to nucleases present in the sample. The protein is eluted in the flow through.

Superdex 75 (Amersham Biosciences), Size Exclusion
The macroporous gel matrix of agarose and dextran forms porous beads. Proteins larger than 75 kDa cannot diffuse in the pores and are eluted in the void fraction. The smaller proteins are eluted according to their hydrodynamic radius (the smaller ones being eluted last).

Superdex 200 (Amersham Biosciences), Size Exclusion
The principle is the same as the Superdex 75, but here the pores are larger and allow proteins up to 200 kDa to diffuse.

Talon (Clontech), Metal Affinity
Cobalt ions are immobilized on a Superflow resin via a tetradentate ligand (carboxyl and amine groups). The His-Tags located on the protein can then chelate the cobalt ions. The protein is eluted with an Imidazole gradient.


Appendix 3 – DNA Sequencing Results

a - T4 32-B Protein (pEKF2 plasmid)

Forward Primer Sequencing Reaction

ATG CTG ATG TTT AAA CGT AAA TCT ACT GCT GAA CTC GCT GCA CAA ATG GCT AAA CTG 1 M F K R K S T A E L A A Q M A K L

AN GGN NN AAA GGT TTT TCT TCT GAA NAT ANA GGC NAG TGG AAA CTG AAA AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA GGC GAG TGG AAA CTG AAA 18 N G N K G F S S E D K G E W K L K

CTC NAT AAT GNG GGT AAN GGT CAA GCN NTA ATT CNT TTT CTT CCG TCN AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA 35 L D N A G N G Q A V I R F L P S K

AAT GAT GAA CAA GCA CCA TTC NCA ATT CTT GTA AAT CAC GGT TTC NAG AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC GGT TTC AAG AAA 52 N D E Q A P F A I L V N H G F K K

AAT GGT AAA TGG TAT ATT GAA NCA TGT TCA TCN ACC CAT GGT GAT TAC GAT AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT 69 N G K W Y I E T C S S T H G D Y D

TCT TGC CCA GTA TGT CAA TAC ATC NGT AAA AAT GAT CTA TAC AAC ACT GAC TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC 86 S C P V C Q Y I S K N D L Y N T D

AAT AAA GAG TAN AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCC AAC NTT AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC ATT 103 N K E Y S L V K R K T S Y W A N I

CTT GTA GTA AAA GAC CCA GCT GCT CCA NAA AAC GAA NGT NAA GTA TTN AAA CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA 120 L V V K D P A A P E N E G K V F K

TAC CGW TNC GGN AAN AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT 137 Y R F G K K I W D K I N A M I A V

GAT GTT GAA ATG GGT GAA CAN CCA NTT GAT GTA ACT TGT NCG NGG GAA GGT GAT GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT 154 D V E M G E T P V D V T C P W E G

GCT AAC TTT GNA CTG AAA GTT ANA CAA GTT TCT GGA TTT AGT AAC TAC NAT GCT AAC TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT 171 A N F V L K V K Q V S G F S N Y D

GAA TCT NAA TTC CTG AAT CAA NCT GCN ATT CCA AAC ATT GAC GAT GAA TCT GAA TCT AAA TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT 188 E S K F L N Q S A I P N I D D E S


TTC CAN AAA GAA CTG TTC NAA CAA ATG GTT GAC CTT TCT GAN ATG ACT TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT 205 F Q K E L F E Q M V D L S E M T S

AAA GAT AAA TTC AAA TCG TTT GAA GAA CNT AAT ACT AAA TTC GGT CAA GTT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT AAT ACT AAA TTC GGT CAA GTT 222 K D K F K S F E E L N T K F G Q V

ATG NGA ACT GCT GTG ATG GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT 239 M G T A V M G G A A A T A A K K A

GAT AAA GTT GCT GAT GAT TTG NAT GCA TTC ANT GTT GAT GAC TTC AAT ACA GAT AAA GTG GCT GAT GAT TTG GAT GCA TTC AAT GTT GAT GAC TTC AAT ACA 256 D K V A D D L D A F N V D D F N T

NAA ACT GAA NAT GAT TTT ATG AGC TCA AGC TCT GGT AGT TCA TCT AGT GCN AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT TCA TCT AGT GCT 273 K T E D D F M S S S S G S S S S A

GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA 290 D D T D L D D L L N D L stop

Reverse Primer Sequencing Reaction

NNG CTG ATG TTT AAA CGT AAA TCT ACT GCT GAA CTC GCT GCA CAA ATG GCT AAA CTG 1 M F K R K S T A E L A A Q M A K L

AAT GGC AAT AAA NGG TTT TTCT TNT GAA GAT AAA GGN GAG TGG AAA CTG AAA AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA GGC GAG TGG AAA CTG AAA 18 N G N K G F S S E D K G E W K L K

CTC GAT AAT GCG GGT AAC GGT CAA GCA GTN ATT NGT TTT CTT CNG TNT AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA 35 L D N A G N G Q A V I R F L P S K

AAT GAT GAA CAA GCA CCA TTN GCA ATT CTT NGTA AAT CAC GGT TTC AAG AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC GGT TTC AAG AAA 52 N D E Q A P F A I L V N H G F K K

AAT GGT AAA TGG TAT ANN GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT 69 N G K W Y I E T C S S T H G D Y D

TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC 86 S C P V C Q Y I S K N D L Y N T D

AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCC AAC ATT AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC ATT 103 N K E Y S L V K R K T S Y W A N I


CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA 120 L V V K D P A A P E N E G K V F K

TAC CGT TTC GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT 137 Y R F G K K I W D K I N A M I A V

GAT GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GAT GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT 154 D V E M G E T P V D V T C P W E G

GCT AAC TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GCT AAC TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT 171 A N F V L K V K Q V S G F S N Y D

GAA TCT AAA TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT GAA TCT AAA TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT 188 E S K F L N Q S A I P N I D D E S

TTC CAG AAA GAA CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT 205 F Q K E L F E Q M V D L S E M T S

AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT AAT ACT AAA TTC GGT CAA GTT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT AAT ACT AAA TTC GGT CAA GTT 222 K D K F K S F E E L N T K F G Q V

ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT 239 M G T A V M G G A A A T A A K K A

GAT AAA GTT GCT GAT GAT TTG GAT GCA TTC AAT GTT GAT GAC TTC AAT ACA GAT AAA GTG GCT GAT GAT TTG GAT GCA TTC AAT GTT GAT GAC TTC AAT ACA 256 D K V A D D L D A F N V D D F N T

AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT TCA TCT AGT GCT AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT TCA TCT AGT GCT 273 K T E D D F M S S S S G S S S S A

GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA 290 D D T D L D D L L N D L stop

b – T4 32-B Protein (pDEST-C1 plasmid)

Forward Primer Sequencing Reaction

ATG GCA CAT CAC CAC CAC CAT CAC GTG GGT ACC GGT TCG AAT GAT GAC NAC NAC

AAA TCA ACA AGT TTG TAC AAA AAA GCA GGC TCC GCG GCC GCC CCC TTC ACC GAG

AAC CTC TAC TTC CAA GGA CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA


GGC GAG TGG AAA CTG AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT GGC GAG TGG AAA CTG AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT

TTT CTT CCG TCT AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC TTT CTT CCG TCT AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC

GGT TTC AAG AAA AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GGT TTC AAG AAA AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT

GAT TAC GAT TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC GAT TAC GAT TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC

ACT GAC AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCC AAC ACT GAC AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC

ATT CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA ATT CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA

TAC CGT TTC GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT

GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC

TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA

TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA

CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA

TCG TTT GAA GAA CTT AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG TCG TTT GAA GAA CTT AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG

GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT GAT AAA GTT GCT GAT GAT TTG GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT GAT AAA GTG GCT GAT GAT TTG

GAT GCA TTC AAT GTT GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC GAT GCA TTC AAT GTT GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC

TCA AGC TCT GGT AGT TCA TCT AGT GCT GAT GAC ACG GAC CTG GNT GAC CTT TTG TCA AGC TCT GGT AGT TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG

AAT GAC CTT TAA AAT GAC CTT TAA

Reverse Primer Sequencing Reaction

CTG AAN NNC NNT AAA NNT TTT TCT TCT GAA GAT AAA NGN GAG TGG AAA CTG ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA GGC GAG TGG AAA CTG


AAA CTC GAT AAT GCG GGT AA GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA

AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAG GTT TCA AGA AAA A T AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC GGT TTC AAG AAA AAT

GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT TCT TGC GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT TCT TGC

CC GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC AAT AAA GAG CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC AAT AAA GAG

TAC AGT CTT GTT AAA CGT AAA CT TCT TAC TGG GCC AAC ATT CTT GTA GTA AAA TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC ATT CTT GTA GTA AAA

GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AA TAC CG TTTCGGT AAG GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA TAC CGC TTT GGT AAG

AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT*TTT GTA CTG AAA GTT AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT TTT GTA CTG AAA GTT

AAA CAA GTT TCT GGA TTT AG AAC TAC GAT GAA TCT AAA TTC CTG AAT CAA TCT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTC CTG AAT CAA TCT

GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GA CTG TTC GAA CAA ATG GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG

GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT

AA ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA

ACT GCT GCT AAG AAA GCT GA AAA GTT GCT GAT GAT TTG GAT GCA TTC AAT GTT ACT GCT GCT AAG AAA GCT GAT AAA GTG GCT GAT GAT TTG GAT GCA TTC AAT GTT

GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AG TCA AGC TCT GGT AGT GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT

TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA

* GTTGAAATGGGTGAAACCCAGTTGATGTAACTTGTCCGTGGGAAGGTGCTAACTTTGTACTGAAAGTT

c – T4 I151D 32-B Protein (pDEST-C1 plasmid)

Forward Primer Sequencing Reaction

ATG GCA CAT CAC CAC CAC CAT CAC GTG GGT ACC GGT TCG AAT GAT GAC GAC NAC

AAA TCA ACA AGT TTG TAC AAA AAA GCA GGC TCC GCG GCC GCC CCC TTC ACC GAG

AAC CTC TAC TTC CAA GGA CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA


GGC GAG TGG AAA CTG AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT GGC GAG TGG AAA CTG AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT

TTT CTT CCG TCT AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC TTT CTT CCG TCT AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC

GGT TTC AAG AAA AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GGT TTC AAG AAA AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT

GAT TAC GAT TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC GAT TAC GAT TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC

ACT GAC AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCC AAC ACT GAC AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC

ATT CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA ATT CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA

TAC CGT TTC GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG GAT GCG GTT GAT TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT

GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC

TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA

TTC CTG AAT CAT TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA

CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA

TCG TTT GAA GAA[CTT AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG TCG TTT GAA GAA AGC TGA TAA AGT GGC TGA TGA TTT GGA TGC ATT CAA TGT TGA

GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT GAT AAA GTT GCT GAT GAT TTG TGA CTT CAA TAC AAA AAC TGA AGA TGA TTT TAT GAG CTC AAG CTC TGG TAG TTC

GAT GCA TTC AAT GTT GAT GAC TTC NAT ACA AAA CTG AAG ATG] AT TTT ATG AGC ATC TAG TGC TGA TGA CAC GGA CCT GGA TGA CCT TTT GAA TGA CAT TTT ATG AGC

TCA AGC TCT GGT AGT TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG TCA AGC TCT GGT AGT TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG

AAT GAC CTT TAA AAT GAC CTT TAA

Reverse Primer Sequencing Reaction

CTG AAT GNC AAT AAA NGT TTT TCT TCT GAA GAT AAA GGC GAG TGG AAA CTG ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA GGC GAG TGG AAA CTG

AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA


AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC GGT TTC AAG AAA AAT AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC GGT TTC AAG AAA AAT

GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT TCT TGC GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT TCT TGC

CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC AAT AAA GAG CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC AAT AAA GAG

TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCC AAC ATT CTT GTA GTA AAA TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC ATT CTT GTA GTA AAA

GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA TAC CGT TTC GGT AAG GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA TAC CGC TTT GGT AAG

AAA ATC TGG GAT AAA ATC AAT GCA ATG GAT GCG GTT GAT GTT GAA ATG GGT GAA AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT GTT GAA ATG GGT GAA

ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC TTT GTA CTG AAA GTT ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC TTT GTA CTG AAA GTT

AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTC CTG AAT CAT TCT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTC CTG AAT CAA TCT

GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG

GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT

AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA

ACT GCT GCT AAG AAA GCT GAT AAA GTT GCT GAT GAT TTG GAT GCA TTC AAT GTT ACT GCT GCT AAG AAA GCT GAT AAA GTG GCT GAT GAT TTG GAT GCA TTC AAT GTT

GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT

TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA

d – T4 I60D 32-B Protein (pDEST-C1 plasmid)

Forward Primer Sequencing Reaction

ATG GCA CAT CAC CAC CAC CAT CAC GTG GGT ACC GGT TCG AAT GAT GAC GAC GAC

AAA TCA ACA AGT TTG TAC AAA AAA GCA GGC TCC GCG GCC GCC CCC TTC ACC GAG

AAC CTC TAC TTC CAA GGA CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA


GGC GAG TGG AAA CTG AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT GGC GAG TGG AAA CTG AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT

TTT CTT CCG TCT AAA AAT GAT GAA CAA GCA CCA TTC GCA GAT CTT GTA AAT CAC TTT CTT CCG TCT AAA AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AAT CAC

GGT TTC ANN AAA AAN GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAN GGT GGT TTC AAG AAA AAT GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT

GAT TAC NAT TCT TGC CCA NTA TGT CNN TAC NTC NGN AAN AAT GAT CTA TAC AAC GAT TAC GAT TCT TGC CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC

ACT GAC NAT AAA GAG TAN NGT CTT GTT AAA CGT AAA ACT TCN TAC TGG GCC NNC ACT GAC AAT AAA GAG TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC

ATT CTT GTA NTA AAA GAC CCN GCT GCT CCA NAA AAC GAA NGT AAA GTA TTT AAA ATT CTT GTA GTA AAA GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA

TAC CGN TTC GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAN TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT

GTT GAA ATG GGT GAA ACT CAN GTT GAN GTA ACT TGT NCG TGN GAA GGT GCT AAN GTT GAA ATG GGT GAA ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC

TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT ANT AAC TAC GAT GAA TCT AAN TTT GTA CTG AAA GTT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA

TTC CTG AAT CAA TCT GCN ATT CCN AAC ATT GAC GAT GAA TCT TTC CAG ANA GAA TTC CTG AAT CAA TCT GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA

CTG TTC GAA CAN ATG GTT GAC CTT TCT GAN NTG ACT TCT AAA GAT ANA TTC NNA CTG TTC GAA CAA ATG GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA

TCG NTT GAA NAA CTT AAT ACT ANA TTC NGT CAA GTT ATN GGA ACT GCT GTG ATG TCG TTT GAA GAA CTT AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG

GGC GGT GCT GCT GCA ACT GCT GCT AAN AAA GCT GAT ANA GNT GCT GAT GAT TTG GGC GGT GCT GCT GCA ACT GCT GCT AAG AAA GCT GAT AAA GTG GCT GAT GAT TTG

GAT GCA TTN NAT GTT GAT GAC TTC NAT ACA AAA CT GAN NT GAT TNT ATG AGC GAT GCA TTC AAT GTT GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC

TCA AGC TCT GGT AGN TCA TCT AGT GNT GAT GAC NCG GAC CTG NNT GAC CNT TTG TCA AGC TCT GGT AGT TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG

AAT GAC CTT TAA AAT GAC CTT TAA

Reverse Primer Sequencing Reaction

CTG AAN NNC NAN AAN GNT TTT TCT TCN NAA GAT AAAAGGC GAG TNG AAN CTG ATG CTG AAT GGC AAT AAA GGT TTT TCT TCT GAA GAT AAA GGC GAG TGG AAA CTG

NAN CTC GAT AAT GNG GGN ANN GGT CAA GCA GTA ATN NGT NTT CTT CCG NNN ANA AAA CTC GAT AAT GCG GGT AAC GGT CAA GCA GTA ATT CGT TTT CTT CCG TCT AAA


AAT GAT NAA CAA GCA CCA TTC GCA GAT CTT GTA AAT CAC GGT TTC AAG AAA AAT AAT GAT GAA CAA GCA CCA TTC GCA ATT CTT GTA AATCAC GGT TTC AAG AAA AAT

GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ANN CAT GGT GAT TAC GAT TCT TGC GGT AAA TGG TAT ATT GAA ACA TGT TCA TCT ACC CAT GGT GAT TAC GAT TCT TGC

CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC AAT AAA GAG CCA GTA TGT CAA TAC ATC AGT AAA AAT GAT CTA TAC AAC ACT GAC AAT AAA GAG

TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCC AAC ATT CTT GTA GTA AAA TAC AGT CTT GTT AAA CGT AAA ACT TCT TAC TGG GCT AAC ATT CTT GTA GTA AAA

GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA TAC CGT TTC GGT AAG GAC CCA GCT GCT CCA GAA AAC GAA GGT AAA GTA TTT AAA TAC CGC TTT GGT AAG

AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT GTT GAA ATG GGT GAA AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT GTT GAA ATG GGT GAA

ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC TTT GTA CTG AAA GTT ACT CCA GTT GAT GTA ACT TGT CCG TGG GAA GGT GCT AAC TTT GTA CTG AAA GTT

AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTC CTG AAT CAA TCT AAA CAA GTT TCT GGA TTT AGT AAC TAC GAT GAA TCT AAA TTC CTG AAT CAA TCT

GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG GCG ATT CCA AAC ATT GAC GAT GAA TCT TTC CAG AAA GAA CTG TTC GAA CAA ATG

GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT GTT GAC CTT TCT GAA ATG ACT TCT AAA GAT AAA TTC AAA TCG TTT GAA GAA CTT

AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA AAT ACT AAA TTC GGT CAA GTT ATG GGA ACT GCT GTG ATG GGC GGT GCT GCT GCA

ACT GCT GCT AAG AAA GCT GAT AAA GTT GCT GAT GAT TTG GAT GCA TTC AAT GTT ACT GCT GCT AAG AAA GCT GAT AAA GTG GCT GAT GAT TTG GAT GCA TTC AAT GTT

GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT GAT GAC TTC AAT ACA AAA ACT GAA GAT GAT TTT ATG AGC TCA AGC TCT GGT AGT

TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA TCA TCT AGT GCT GAT GAC ACG GAC CTG GAT GAC CTT TTG AAT GAC CTT TAA

e – Rb69 RNase H (pDEST-C1 plasmid)

Forward Primer Sequencing Reaction

ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA

GAC TTT AGT AAC ATT GCA TTG GCA GCT GCA TTA AAC AAC TTT GAA GAT GGT GAT GAC TTT AGT AAC ATT GCA TTG GCA GCT GCA TTA AAC AAC TTT GAA GAT GGT GAT

AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC


GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC

GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT

AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT

CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT

GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA

GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC

AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT

AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT

AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC

GAA NGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT GAA GGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT

GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA NAA TAT AAA CGG TAC CAA GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA

GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC

ATT ATA GAG TAT TAT AAC TCA TAT CAN CCA CAA CCT AAA GGC AAG ATT TAT TCN ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA

TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACN AGT GTA ATT AAT GAA TTC TGA TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA

Reverse Primer Sequencing Reaction

ATG GNT TTA GAA ATG ATN TG GAT GAN GAT TNC AAA GAA GGT ATT GCG CTT GCA ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA

GAC TTT AGT ANC ATT GCA TNG NCA GCT GCA TTA AAC AAC TT GAA GAT GGT GAT GAC TTT AGT AAC ATT GCA TTG GCA GCT GCA TTA AAC AAC TTT GAA GAT GGT GAT

AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT NGT AAA AAC AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC

GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC


GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT

AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT

CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT

GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA

GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC

AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT

AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT

AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC

GAA GGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT GAA GGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT

GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA

GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC

ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA

TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA

f – Rb69 D132N RNase H (pDEST-C1 plasmid)

Forward Primer Sequencing Reaction

ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA

GAC TTT AGT AAC ATT GCA TTG GCA GCT GCA TTA AAC AAC TTT GAA GAT GGT GAT GAC TTT AGT AAC ATT GCA TTG GCA GCT GCA TTA AAC AAC TTT GAA GAT GGT GAT

AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC

GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC


GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT

AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT

CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT

GAC AAA TAC GAA GCG AAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA

GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC

AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT

AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT

AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC

GAA NGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT GAA GGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT

GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA

GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC

ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAN GGC AAG ATT TAT TCA ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA

TAC TTT GTA AAA NCC CGG TCT TTC TAN TTA ANN AGT GTA ATT NAT GAA TTC TGA TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA

Reverse Primer Sequencing Reaction

ATG NN TT AGA AAT GAT GTN GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA ATG GAT TTA GAA ATG ATG TTG GAT GAA GAT TAC AAA GAA GGT ATT GCG CTT GCA

GAC TTT AGT ANC AT GCA TNG GCA GCT GCA TTA AAC AAC TT GAA GAT GGT GAT GAC TTT AGT AAC ATT GCA TTG GCA GCT GCA TTA AAC AAC TTT GAA GAT GGT GAT

AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC AAA ATT ACC GTT CCG ATG GTT CGT CAT GTA GTC TTG AAT TCA ATT CGT AAA AAC

GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAA AAC GTA GTG ATG TTC CGT AAG CAA GGT TAT ACA AAA TTT GTA TTG TGC ATG GAT AAC

GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT GCT ACT TCT GGG TAT TGG CGA CGC GAC TTT GCT TAC TAC TAC AAG AAA AAT CGT


AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT AAA ACT GAT CGT GAA GCT TCA AAG TGG GAT TGG GAA GGA TAT TTT ACT GCA CTT

CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT CAT CAA GTC GTT GAT GAG ATT AAG AAA TAT ATG CCA TAC GTT GTA ATG GAT ATT

GAC AAA TAC GAA GCG AAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA GAC AAA TAC GAA GCG GAT GAC CAT ATC GGC GTA TTA ACT AAA TAT TTG TCA TTA

GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC GCT GGT CAT AAG GTG TGT ATT GTT GCA TCA GAT GGT GAC TTT ACA CAA TTA CAC

AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT AAA TAC CCT AAC GTT AAA CAG TGG TCG CCA CCG CAG AAA AAA TGG GTT AAA ATT

AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT AAG AAT GGT TCT GCC GAA ATT GAT TGC ATG ACT AAA ATT CTT AAA GGC GAC CGT

AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC AAA GAT GGT GTT GCG TCT GTT CGA GTT CGT GGT GAT TTC TGG TTT ACT CGA GTC

GAA GGC GAA CGA ACT CCA AGC ANG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT GAA GGC GAA CGA ACT CCA AGC ATG AAA ACA ACG ATC ATT GAA GCA CTT GCC AAT

GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA GAT CGT TCT CAA GCT GAA GTA TTA TTA AGT GCA GAA GAA TAT AAA CGG TAC CAA

GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC GAA AAT TTG GTT CTC ATT GAT TTT GAT TAT ATC CCT GAT AAT ATT GCT TCA ACC

ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA ATT ATA GAG TAT TAT AAC TCA TAT CAA CCA CAA CCT AAA GGC AAG ATT TAT TCA

TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGN GTA ATT AAT GAA TTC TGA TAC TTT GTA AAA TCC GGT CTT TCT AAA TTA ACA AGT GTA ATT AAT GAA TTC TGA

The results from the sequencing reactions are highlighted in yellow, with the nucleotide sequence of the gene of interest (GOI) shown below each result. Mismatches between the two sequences are not highlighted. In the forward primer reactions, the start codon present in the pDEST-C1 plasmid is highlighted in green and the stop codon in red. The sequence coding for the His-Tag is shown in pink and the sequence coding for the TEV protease site in blue; the linker sequence lies between the two. None of these elements are present in the reverse primer reactions. The mutated nucleotides are shown in red.
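The codon-by-codon comparison between a sequencing result and the GOI sequence can also be carried out programmatically. The following Python sketch is an illustration only and was not part of the original analysis. The function names (`translate`, `compare_codons`) are hypothetical, the reads are assumed to be written as space-separated codons already in frame with the reference, and only the ambiguity code N is tolerated (other IUPAC codes such as W are reported as differences). The example input is taken from the I151D forward primer reaction listed above.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(codon):
    """Return the one-letter amino acid, or 'X' if the codon is ambiguous or malformed."""
    return CODON_TABLE.get(codon, "X")


def compare_codons(read, reference):
    """Compare two in-frame, space-separated codon strings.

    Returns a list of (position, read_codon, ref_codon, read_aa, ref_aa, kind)
    for every codon that differs. An 'N' in the read matches any base.
    """
    differences = []
    pairs = zip(read.split(), reference.split())
    for position, (r, ref) in enumerate(pairs, start=1):
        if len(r) != 3 or len(ref) != 3:
            # Dropped or extra bases break the reading frame of this codon.
            differences.append((position, r, ref, "X", translate(ref), "malformed"))
            continue
        if all(a == b or a == "N" for a, b in zip(r, ref)):
            continue  # identical, or only differs at uncalled (N) positions
        if "N" in r:
            kind = "ambiguous"
        elif translate(r) == translate(ref):
            kind = "silent"
        else:
            kind = "substitution"
        differences.append((position, r, ref, translate(r), translate(ref), kind))
    return differences


# Excerpt from the I151D 32-B forward primer reaction and the GOI sequence.
read = "TAC CGT TTC GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG GAT GCG GTT GAT"
goi = "TAC CGC TTT GGT AAG AAA ATC TGG GAT AAA ATC AAT GCA ATG ATT GCG GTT GAT"

for diff in compare_codons(read, goi):
    print(diff)
```

In this excerpt the comparison reports two silent differences (CGT/CGC and TTC/TTT, both leaving the residue unchanged) and one amino acid substitution (GAT in place of ATT, i.e. Asp in place of Ile), which corresponds to the intended mutation. Positions are counted from the start of the excerpt, not from the start of the gene.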