
Sequence Specificity of BUZ, PDZ, SH2, and Tandem BRCT Domains

DISSERTATION

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the Graduate

School of The Ohio State University

By

Ryan Hard, B.S.

Ohio State Biochemistry Program

*****

The Ohio State University 2013

Dissertation Committee:

Professor Dehua Pei, Advisor

Professor Michael Ibba

Professor Jennifer Ottesen

Copyright by

Ryan Hard

2013

ABSTRACT

Src Homology 2 (SH2), Post-Synaptic Density-95/Discs Large/Zonula Occludens-1 (PDZ), Binder of Ubiquitin Zinc Finger (BUZ), and BRCA1 C-terminal (BRCT) domains are short peptide-binding modules that recognize different types of peptide motifs within their protein binding partners. Determination of the type of peptide motifs preferred by each domain would lead to a better understanding of the in vivo role of each domain, including which protein(s) they recognize and how they recognize them.

To determine the binding specificity of these domains, we constructed combinatorial one-bead-one-compound (OBOC) peptide libraries and screened them against each domain, sequenced positive hits from the screens by partial Edman degradation/mass spectrometry, and sorted the binding sequences into recognizable patterns. We used in vitro peptide binding assays, site-directed mutagenesis, and pull-down assay techniques to validate the screening results and help understand the nature of the protein-peptide interactions.

The SH2 domains of PLCγ1 (both SH2 domains) and TSAd were screened against phosphotyrosine (pY)-containing OBOC peptide libraries. The SH2 domain of TSAd selected for only one class of sequences, while the other SH2 domains each selected multiple classes of ligands. Generally, the SH2 domains showed selectivity for residues at the +1, +2, and +3 positions (relative to pY) and not for residues N-terminal to pY.

PDZ and BUZ domains recognize their protein binding partners at the C-terminus, specifically requiring the free carboxylate group for efficient binding. In order to probe the binding specificity of these C-terminal binding domains, we constructed spatially segregated OBOC peptide libraries, where peptides in the exterior of each bead presented a free C-terminus and could therefore interact with the PDZ and BUZ domains, while peptides in the bead interior presented a free N-terminus so that they could be sequenced using partial Edman degradation and mass spectrometry. Using this type of library, the C-terminal sequence specificity of the PDZ domains of Tiam1 and Tiam2 and the BUZ domains of Ubp-M and HDAC6 was examined. Each PDZ and BUZ domain selected for multiple classes of binding sequences. In vitro peptide binding studies were used to verify the BUZ/PDZ-peptide interactions, including ones with a Tiam1 mutant that helped to illustrate the basis of PDZ domain ligand affinity and specificity.

Furthermore, the BUZ domain of Ubp-M was used in a pull-down assay of a protein containing a C-terminus matching one of its consensus binding sequences, which further validated the screening results.

Tandem BRCT domains recognize their protein binding partners either at internal or C-terminal peptide motifs. In either case, they usually require at least one phosphoserine (pS) or phosphothreonine (pT) residue to bind with high affinity. In order to determine the phosphopeptide binding specificity of the tandem BRCT domains, we constructed two pS/pT OBOC libraries. One pS/pT OBOC library, like the libraries used to study the PDZ and BUZ domains, presented peptides with free C-termini on the library bead surfaces and contained encoding peptides with free N-termini in the bead interior.

A second library was constructed where the peptides were presented in the normal N-terminal to C-terminal direction on both the interior and exterior portions of the library beads. All 16 known human tandem BRCT domains (or their closely related mouse orthologs) were screened against both phosphopeptide libraries, with 8 of the 16 domains giving either well-defined or general consensus binding motifs. The C-terminal BRCT repeat of PTIP was found to bind to a novel dual phosphoserine motif. The remainder of the domains either failed to show binding selectivity or showed binding selectivity in the peptide library screens but failed to bind to resynthesized peptides in solution.


DEDICATION

Dedicated to my family


ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Dehua Pei, for his guidance throughout my career at Ohio State. His advice on how to think about and practice science will help guide me in my future career. I would also like to thank my committee members for their guidance on my candidacy exam and dissertation defense. I also would like to thank Dr. Pei Zhou, Dr. Ernesto Fuentes, and Dr. Junjie Chen for their collaborations.

I am grateful to my current and former labmates for their assistance in my career at Ohio State. I am especially grateful to Dr. Sang Hoon Joo for synthesizing the library I first worked with and to Dr. Tao Liu for his advice on protein purification and peptide library synthesis. I would also like to thank Dr. Pauline Tan for her guidance on a variety of different subjects.

Finally, I would like to thank my parents and grandparents for their support and encouragement over the years. My parents pushed me to work hard, which was critical to my academic success. I would also like to thank them for their financial support, which allowed me to focus more on my academic career and less on my basic needs.


VITA

May 2007 ……………………………... B.S. Biochemistry

Ohio Northern University

2007-2013 ……………………………. Graduate Teaching and Research Associate

The Ohio State University

PUBLICATIONS

1. Hard, R., Liu, J., Shen, J., Zhou, P., and Pei, D. “HDAC6 and Ubp-M BUZ domains recognize specific C-terminal sequences of proteins”, Biochemistry 2010, 49, 10737-10746.

2. Shepherd, T., Hard, R., Murray, A., Pei, D., and Fuentes, E. “Distinct ligand specificity of the Tiam1 and Tiam2 PDZ domains”, Biochemistry 2011, 50, 1296-1308.

3. Zhang, Y., Zhang, J., Yuan, C., Hard, R., Park, I., Li, C., Bell, C., and Pei, D. “Simultaneous Binding of Two Peptidyl Ligands by a Src Homology 2 Domain”, Biochemistry 2011, 50, 7637-7646.

FIELDS OF STUDY

Major Field: Biochemistry


TABLE OF CONTENTS

Abstract
Dedication
Acknowledgements
Vita
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
1.1 Modular Protein Domains
1.2 PDZ Domains
1.2.1 Structure of PDZ Domains and Mechanism of C-Terminal Ligand Recognition
1.2.2 Function of PDZ Domains
1.3 BUZ Domains
1.3.1 Structure of BUZ Domains and Mechanism of Ubiquitin Recognition
1.3.2 Function of BUZ Domains
1.4 SH2 Domains
1.4.1 Structure of SH2 Domains and Mechanism of Ligand Recognition
1.4.2 Function of SH2 Domains
1.5 Tandem BRCT Domains
1.5.1 Structure of Tandem BRCT Domains and Mechanism of Ligand Recognition
1.5.2 Function of Tandem BRCT Domains
1.6 Methods to Determine the Sequence Specificity of PDZ, BUZ, SH2 and Tandem BRCT Domains
1.6.1 Biological Library Techniques
1.6.1.1 Two-Hybrid System
1.6.1.2 Phage Display
1.6.1.3 lacI Repressor
1.6.1.4 FRET-Based Screening Assay
1.6.1.5 Co-immunoprecipitation/Pull-Down Assays
1.6.2 Chemical Library Techniques
1.6.2.1 Solution Phase Pool Library
1.6.2.2 SPOT Library
1.6.2.3 Oriented Peptide Array Library
1.6.2.4 Protein Microarray
1.6.2.5 One-Bead-One-Compound (OBOC) Library

Chapter 2: Determination of the Sequence Specificity of the SH2 Domains of PLCγ1 and TSAd Using One-Bead-One-Compound pY Libraries
2.1 Introduction
2.2 Experimental Procedures
2.2.1 Materials
2.2.2 Expression, Purification, and Biotinylation of the SH2 Domains
2.2.3 Library Synthesis
2.2.4 Library Screening
2.2.5 Peptide Sequencing
2.2.6 SMALI Analysis of Sequences
2.3 Results
2.3.1 Sequence Specificity of the TSAd SH2 Domain
2.3.2 Sequence Specificity of the PLCγ1 N- and C-terminal SH2 domains
2.4 Discussion
2.5 Acknowledgements

Chapter 3: Distinct Ligand Specificity of the Tiam1 and Tiam2 PDZ Domains
3.1 Introduction
3.2 Experimental Procedures
3.2.1 Expression, Purification, and Labeling of the PDZ Domains
3.2.2 Library Screening
3.2.3 Peptide Sequencing
3.2.4 SMALI Analysis of Sequences
3.2.5 Synthetic Peptides
3.2.6 Peptide Binding Affinity Determination
3.3 Results
3.3.1 Determination of Tiam1 and Tiam2 PDZ Domain Specificities by Screening a Peptide Library
3.3.2 Sequence Specificity of the Tiam1 PDZ Domain
3.3.3 Sequence Specificity of the Tiam2 PDZ Domain
3.3.4 Binding Affinities of Selected Peptides to the PDZ Domains
3.3.5 Potential Tiam1- and Tiam2-Binding Proteins
3.3.6 S0 and S-2 Residues Selectively Modulate Ligand Affinity and Specificity
3.3.7 Double-Mutant Cycle Analysis of the S0 and S-2 Binding Pocket Residues
3.3.8 Residues in S0 and S-2 Pockets Determine Tiam1 and Tiam2 PDZ Domain Specificity
3.4 Discussion
3.5 Acknowledgements

Chapter 4: HDAC6 and Ubp-M BUZ Domains Recognize Specific C-Terminal Sequences of Proteins
4.1 Introduction
4.2 Experimental Procedures
4.2.1 Materials
4.2.2 Expression, Purification, and Biotinylation of the BUZ Domains
4.2.3 Synthesis of Nα-Boc-Glu(δ-NHS)-O-CH2-CH=CH2
4.2.4 Library Synthesis
4.2.5 Library Screening
4.2.6 Synthesis of Individual Peptides
4.2.7 Determination of Binding Affinity by Fluorescence Polarization
4.2.8 GST Pull-Down Assay
4.3 Results
4.3.1 Synthesis and Screening of a Peptide Library with Free C-Termini
4.3.2 Sequence Specificity of the Ubp-M BUZ Domain
4.3.3 Sequence Specificity of the HDAC6 Domain
4.3.4 Binding Affinity of the BUZ Domains for Selected Peptides
4.3.5 Determination of Critical Positions for Binding by Alanine Scan
4.3.6 Database Search for Potential Interacting Partners of Ubp-M and HDAC6 BUZ Domains
4.3.7 In Vitro Interaction between Ubp-M and HDAC6 BUZ Domains and Protein C-Termini
4.4 Discussion
4.5 Acknowledgements

Chapter 5: Systematic Inspection of the Phosphopeptide Binding Specificity of Tandem BRCT Domains
5.1 Introduction
5.2 Experimental Procedures
5.2.1 Materials
5.2.2 Cloning of the Tandem BRCT Domains
5.2.3 Expression, Purification, and Biotinylation of the Tandem BRCT Domains
5.2.4 Library Synthesis
5.2.4.1 Synthesis of OBOC Peptide Library with Free C-Terminus
5.2.4.2 Synthesis of OBOC Peptide Library Containing Internal Recognition Motifs
5.2.5 Library Screening
5.2.6 Peptide Synthesis and Labeling
5.2.7 Fluorescence Polarization Assays
5.2.8 SMALI Analysis
5.2.9 Cell Culture and Transfection of PTIP Plasmid
5.2.10 Peptide Pull-Down Assay
5.3 Results
5.3.1 Library Construction and Screening
5.3.2 Sequence Specificity of the Tandem BRCT Domains of BRCA1, MDC1, and MCPH1
5.3.3 Sequence Specificity of the Tandem BRCT Domains of PTIP
5.3.4 Sequence Specificity of Nibrin FHA-BRCT2
5.3.5 Sequence Specificity of the BRCT Domains of ANKRD32, BARD1, TopBP1 BRCT 1-2, TopBP1 BRCT 4-5, TopBP1 BRCT 7-8, and XRCC1
5.3.6 Sequence Specificity of the BRCT Domains of ECT2, DNA Ligase IV, and TP53BP1
5.3.7 Fluorescence Polarization Binding Assay for PTIP BRCT 5-6 and Nibrin FHA-BRCT2
5.3.8 In Vitro Pull-Down Assay
5.4 Discussion
5.5 Acknowledgements

Bibliography

LIST OF TABLES

Table 2.1 Libraries used for the SH2 screens
Table 2.2 Peptides selected from TSAd SH2 screens
Table 2.3 Peptides selected from PLCγ1 N-SH2 screens
Table 2.4 Peptides selected from PLCγ1 C-SH2 screens
Table 2.5 Summary of the sequence specificities of the SH2 domains
Table 2.6 Potential TSAd SH2 protein interaction partners
Table 2.7 Potential PLCγ1 N-SH2 protein interaction partners (class I)
Table 2.8 Potential PLCγ1 N-SH2 protein interaction partners (class II)
Table 2.9 Potential PLCγ1 C-SH2 protein interaction partners (class II)
Table 3.1 Peptides selected from Tiam1 PDZ screens
Table 3.2 Peptides selected from Tiam2 PDZ screens
Table 3.3 Dissociation constants of selected peptides against the PDZ domains
Table 3.4 Summary of the sequence specificities of the PDZ domains
Table 3.5 Database search for potential Tiam1 PDZ-binding proteins
Table 3.6 Database search for potential Tiam2 PDZ-binding proteins
Table 3.7 ΔΔΔGint for Tiam1 PDZ Mutants as a function of binding to selected peptides
Table 4.1 Peptides selected from Ubp-M BUZ domain screens
Table 4.2 Peptides selected from HDAC6 BUZ domain screens
Table 4.3 Dissociation constants of selected peptides against the BUZ domains
Table 4.4 Database search for potential Ubp-M and HDAC6 BUZ-binding proteins
Table 5.1 Peptides from BRCA1 BRCT2 domain screens (C-terminal library)
Table 5.2 Peptides from BRCA1 BRCT2 domain screens (X7 library)
Table 5.3 Peptides from MDC1 BRCT2 domain screens (C-terminal library)
Table 5.4 Peptides from MDC1 BRCT2 domain screens (X7 library)
Table 5.5 Peptides from MCPH1 BRCT2 domain screens (C-terminal library)
Table 5.6 Peptides from MCPH1 BRCT2 domain screens (X7 library)
Table 5.7 Peptides from PTIP BRCT 3-4 domain screens (C-terminal library)
Table 5.8 Peptides from PTIP BRCT 5-6 domain screens (C-terminal library)
Table 5.9 Peptides from PTIP BRCT 5-6 domain screens (X7 library)
Table 5.10 Peptides from WT Nibrin FHA-BRCT2 domain screens (C-terminal library)
Table 5.11 Peptides from WT Nibrin FHA-BRCT2 domain screens (X7 library)
Table 5.12 Peptides from R28A Nibrin FHA-BRCT2 domain screens (C-terminal library)
Table 5.13 Peptides from R28A Nibrin FHA-BRCT2 domain screens (X7 library)
Table 5.14 Peptides from BARD1 BRCT2 domain screens (C-terminal library)
Table 5.15 Peptides from BARD1 BRCT2 domain screens (X7 library)
Table 5.16 Peptides from DNA Ligase IV BRCT2 domain screens (C-terminal library)
Table 5.17 Peptides from DNA Ligase IV BRCT2 domain screens (X7 library)
Table 5.18 Peptides from ECT2 BRCT2 domain screens (C-terminal library)
Table 5.19 Peptides from TopBP1 BRCT 1-2 domain screens (C-terminal library)
Table 5.20 Peptides from TopBP1 BRCT 1-2 domain screens (X7 library)
Table 5.21 Peptides from TopBP1 BRCT 4-5 domain screens (C-terminal library)
Table 5.22 Peptides from TopBP1 BRCT 4-5 domain screens (X7 library)
Table 5.23 Peptides from TopBP1 BRCT 7-8 domain screens (C-terminal library)
Table 5.24 Peptides from TP53BP1 BRCT2 domain screens (C-terminal library)
Table 5.25 Peptides from TP53BP1 BRCT2 domain screens (X7 library)
Table 5.26 Peptides from XRCC1 BRCT2 domain screens (C-terminal library)
Table 5.27 Peptides from XRCC1 BRCT2 domain screens (X7 library)
Table 5.28 Peptides from ANKRD32 BRCT2 domain screens (C-terminal library)
Table 5.29 Peptides from ANKRD32 BRCT2 domain screens (X7 library)
Table 5.30 Summary of the sequence specificities of the tandem BRCT domains of BRCA1, MDC1, MCPH1, PTIP (BRCT 5-6), and WT Nibrin
Table 5.31 Summary of the sequence specificities of the tandem BRCT domains of ANKRD32, BARD1, TopBP1, and XRCC1
Table 5.32 Dissociation constants of selected peptides against the tandem BRCT domains of ANKRD32, BARD1, TopBP1, and XRCC1
Table 5.33 Dissociation constants of selected peptides against the tandem BRCT domains of PTIP (BRCT 5-6) and the FHA-BRCT2 domains of Nibrin

LIST OF FIGURES

1.1 Cartoon representation of a canonical PDZ domain
1.2 Structure of the BUZ domain of USP5
1.3 Structure of the C-terminal SH2 of PIK3R1
1.4 Ribbon diagram of the tandem BRCT domains of BRCA1
1.5 One-bead-one-compound peptide library split-and-pool synthesis
1.6 screening methods
1.7 Peptide sequencing by partial Edman degradation-mass spectrometry (PED-MS)
2.1 Synthesis scheme of the OBOC phosphotyrosine peptide library
2.2 Sequence specificity of the TSAd SH2 domain
2.3 Sequence specificity of the PLCγ1 N-SH2 domain (class I sequences)
2.4 Sequence specificity of the PLCγ1 C-SH2 domain (class I sequences)
3.1 Conservation and structure of the Tiam1 and Tiam2 PDZ domains
3.2 Structure of the inverted peptide library used in the PDZ domain screens
3.3 Sequence specificity of the Tiam1 and Tiam2 PDZ domains
3.4 The Tiam1 PDZ domain quadruple mutant (QM) has the same specificity as the Tiam2 PDZ domain
3.5 Four subfamilies of the Tiam PDZ domains
4.1 Synthesis scheme of a C-terminal OBOC peptide library
4.2 Sequence specificity of the Ubp-M and HDAC6 BUZ domains
4.3 Representative plots showing the binding of FITC-labeled peptides to Ubp-M and HDAC6 BUZ domains
4.4 Fluorescence anisotropy assay showing the competition between FITC-BBRGMGG and Ubiquitin for binding to GST-HDAC6 BUZ domain
4.5 Fluorescence anisotropy assay showing the binding of FITC-BBLQDGF to Glutathione S-transferase
4.6 Fluorescence anisotropy assay showing the interaction of the C-terminal peptides of histone H4 and FAT10 with Ubp-M and HDAC6 BUZ domains
4.7 GST pull-down assay of the histone H3-H4 complex by Ubp-M BUZ
5.1 Cartoon of tandem BRCT domains cloned into a pET22-ybbR13 vector
5.2 Synthesis scheme of the tandem BRCT one-bead-one-compound peptide libraries
5.3 Representative fluorescence polarization binding curve for the BRCT domains
5.4 Sequence specificity of BRCA1 BRCT2 to the X7 library
5.5 Sequence specificity of BRCA1 BRCT2 to the C-terminal library
5.6 Sequence specificity of MDC1 BRCT2 to the C-terminal library
5.7 Sequence specificity of MDC1 BRCT2 to the X7 library
5.8 Sequence specificity of MCPH1 BRCT2 to the C-terminal library
5.9 Sequence specificity of MCPH1 BRCT2 to the X7 library
5.10 Sequence specificity of PTIP BRCT 3-4 to the C-terminal library
5.11 Sequence specificity of PTIP BRCT 5-6 to the C-terminal library
5.12 Sequence specificity of PTIP BRCT 5-6 to the X7 library (group I sequences)
5.13 Sequence specificity of PTIP BRCT 5-6 to the X7 library (group II sequences)
5.14 Sequence specificity of WT Nibrin FHA-BRCT2 to the X7 library
5.15 Sequence specificity of WT Nibrin FHA-BRCT2 to the C-terminal library
5.16 Sequence specificity of R28A Nibrin FHA-BRCT2 to the X7 library
5.17 Sequence specificity of R28A Nibrin FHA-BRCT2 to the C-terminal library
5.18 Sequence specificity of ANKRD32 BRCT2 to the X7 library
5.19 Sequence specificity of BARD1 BRCT2 to the X7 library
5.20 Sequence specificity of TopBP1 BRCT 1-2 to the X7 library
5.21 Sequence specificity of TopBP1 BRCT 4-5 to the X7 library
5.22 Sequence specificity of XRCC1 BRCT2 to the X7 library
5.23 Sequence specificity of ANKRD32 BRCT2 to the C-terminal library
5.24 Sequence specificity of BARD1 BRCT2 to the C-terminal library
5.25 Sequence specificity of TopBP1 BRCT 1-2 to the C-terminal library
5.26 Sequence specificity of TopBP1 BRCT 4-5 to the C-terminal library
5.27 Sequence specificity of TopBP1 BRCT 7-8 to the C-terminal library
5.28 Sequence specificity of XRCC1 BRCT2 to the C-terminal library
5.29 Sequence specificity of DNA Ligase IV BRCT2 to the X7 library
5.30 Sequence specificity of TP53BP1 BRCT2 to the X7 library
5.31 Sequence specificity of DNA Ligase IV BRCT2 to the C-terminal library
5.32 Sequence specificity of ECT2 BRCT2 to the C-terminal library
5.33 Sequence specificity of TP53BP1 BRCT2 to the C-terminal library
5.34 Peptide pull-down of PTIP BRCT 5-6 by various phosphopeptides

LIST OF ABBREVIATIONS

α Alpha

Abu L-α-aminobutyric acid

Ac Acetyl

Ac2O Acetic anhydride

β Beta

B β-alanine

BCIP 5-bromo-4-chloro-3-indolyl phosphate

Boc tert-butyloxycarbonyl

Boc2O Boc anhydride

BRCT BRCA1 C-terminal

BSA Bovine serum albumin

BUZ Binder of ubiquitin zinc finger

Bzl Benzyl

°C Degrees Celsius

CLEAR Cross-linked ethoxylate acrylate resin

δ Delta

DCM Dichloromethane

ddH2O Double distilled water

DIC Diisopropylcarbodiimide

DIPEA N,N-diisopropylethylamine

DMAP 4-dimethylaminopyridine

DMEM Dulbecco’s Modified Eagle’s Medium

DMF N,N-dimethylformamide

DMSO Dimethylsulfoxide

DNA Deoxyribonucleic acid

EDTA N, N, N’, N’-ethylenediamine tetraacetate

equiv Equivalent(s)

FHA Forkhead-associated

FITC Fluorescein isothiocyanate

Fmoc 9-fluorenylmethoxycarbonyl

Fmoc-OSu N-(9-fluorenylmethoxycarbonyloxy) succinimide

g Gram(s)

γ Gamma

ΔGb Gibbs free energy

ΔΔΔGint Pairwise Coupling Free Energy of Interaction

GST Glutathione S-Transferase

h Hour(s)

HATU 2-(1H-7-azabenzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate

HBTU 2-(1H-benzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate

HEK293 Human Embryonic Kidney cells

HEPES 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid

(His)6 Six histidine affinity tag

HOBt N-hydroxybenzotriazole

HPLC High performance liquid chromatography

HRP Horseradish peroxidase

IPTG Isopropyl-beta-D-thiogalactopyranoside

KD Dissociation constant

L Liter(s)

LB Luria-Bertani

M Moles per liter

m Milli

μ Micro

μm Micron

MALDI-TOF Matrix-Assisted Laser Desorption/ Ionization-Time of Flight

Min Minute(s)

Minipeg 8-amino-3,6-dioxaoctanoic acid

Mol mole(s)

MS Mass spectrometry

m/z Mass to Charge ratio

NHS N-hydroxysuccinimidyl

Nle L-norleucine

NMM N-methylmorpholine

OBOC One-Bead-One-Compound

PAGE Polyacrylamide gel electrophoresis

PCR Polymerase Chain Reaction

PDZ Post-synaptic density-95/Discs large/Zonula occludens-1

PEG Polyethylene glycol

PITC Phenyl isothiocyanate

pS Phosphoserine

pT Phosphothreonine

PTP Protein tyrosine phosphatase

pY Phosphotyrosine

PyBOP Benzotriazole-1-yl-oxy-tris-pyrrolidino-phosphonium hexafluorophosphate

rpm Revolutions per minute

rt Room temperature

RU Response unit

s Second(s)

SA-AP Streptavidin-Alkaline Phosphatase

SDS Sodium Dodecyl Sulfate

SH2 Src homology 2

SPR Surface Plasmon Resonance

TCEP Tris(2-carboxyethyl) phosphine

TFA Trifluoroacetic acid

THF Tetrahydrofuran

Tris Tris(hydroxymethyl) aminomethane

WT Wild-type

ybbr13 DSLEFIASKLA peptide sequence

Standard one-letter codes are used for deoxynucleotides and standard one- or three-letter codes are used for amino acids


CHAPTER 1

INTRODUCTION

1.1 Modular Protein Domains

Many protein-protein interactions are mediated by domains that occur in many different proteins (or several times within the same protein). Such domains are considered to be modular (1, 2). Modular protein domains are generally 40-200 amino acids in length and recognize short peptide motifs in their binding partners.

Modular domains may recognize internal motifs, or may specifically recognize motifs at the N- or C-termini of their interaction partners (3, 4). They may also specifically require posttranslational modifications (PTMs) to bind, such as phosphorylation or acetylation.

By mediating protein-protein interactions, modular domains are critically involved in such processes as the formation of signaling complexes, controlling protein subcellular localization, altering protein conformations, and controlling the catalytic activity of enzymes. The binding specificity of four such modular domains (SH2, PDZ, BUZ, and tandem BRCT domains) is the subject of this report.

1.2 PDZ Domains

There are at least 250 Post-synaptic density-95/Discs large/Zonula occludens-1 (PDZ) domains found in over 150 human proteins (5). They generally consist of 80-90 amino acids, and a single protein may contain one to several PDZ domains. Proteins with PDZ domains fall into three general structural classes (6), with the first class consisting entirely of PDZ domains. The second class is the MAGUK (membrane-associated guanylate kinase) family, in which each protein has 1-3 PDZ domains, one Src homology 3 (SH3) domain, and a guanylate kinase domain. The third class consists of proteins with both PDZ domains and other modular domains (such as Pleckstrin homology (PH) and WW domains). The majority of proteins with PDZ domains are found in the cytoplasm and function as adaptors that hold protein complexes together.

1.2.1 Structure of PDZ Domains and Mechanism of C-Terminal Ligand Recognition

The canonical PDZ domain found in multicellular organisms consists of six β-strands and two α-helices, in the form of β1-β2-β3-α1-β4-β5-α2-β6 (figure 1.1) (5-7). The most common binding mode of a PDZ domain is to a C-terminal peptide on its protein binding partner. Peptide ligand binding does not greatly perturb PDZ domain structure and is referred to as “β-strand addition” (8). A bound peptide forms an extra β-strand between β2 and α2, with its backbone NH and CO groups bonding with the peptide backbone of the PDZ domain. The carboxylate group at the C-terminus of the peptide binds to a R/K-XXX-G-Φ-G-Φ binding loop (X is any amino acid and Φ is a hydrophobic residue). The carboxylate group of the peptide forms hydrogen bonds with the NH groups of the binding loop peptide backbone, along with an ordered water molecule positioned by a conserved R/K residue. Along with the C-terminus, the five C-terminal residues generally play a role in ligand recognition, and residues beyond the fifth amino acid from the C-terminus rarely make a significant contribution to binding selectivity (9). Commonly, the side chains of the C-terminal amino acid and the third residue from the C-terminus point into conserved binding pockets within the PDZ domain and are therefore particularly important for PDZ ligand recognition. Other amino acids of the peptide can interact with protein residues adjacent to the peptide binding groove to enhance the binding interaction. So far, PDZ domains have been found to interact with 16 different classes of peptide sequences. Some PDZ domains are known to be promiscuous, recognizing multiple classes of peptide sequences.
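The R/K-XXX-G-Φ-G-Φ loop pattern described above can be written as a regular expression, which may help make the notation concrete. The following is only an illustrative sketch: the set of residues counted as hydrophobic (Φ), the function name, and the toy sequence are assumptions made for this example and are not taken from the studies cited here.

```python
import re

# Assumption: the residues treated as "hydrophobic" (Φ) are a choice made
# for this sketch; other definitions of Φ are equally reasonable.
HYDROPHOBIC = "AVILMFWY"

# R/K-XXX-G-Φ-G-Φ: R or K, any three residues, then G-Φ-G-Φ.
LOOP_PATTERN = re.compile(rf"[RK].{{3}}G[{HYDROPHOBIC}]G[{HYDROPHOBIC}]")


def find_binding_loops(sequence):
    """Return (0-based start index, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in LOOP_PATTERN.finditer(sequence.upper())]


# Hypothetical toy sequences, used only to exercise the pattern.
toy_with_loop = "MSHRGSTGLGFNIVGG"
toy_without_loop = "MSHRGSTGDGFNIVGG"  # D at a Φ position breaks the motif

hits = find_binding_loops(toy_with_loop)
misses = find_binding_loops(toy_without_loop)
```

A pattern match of this kind only flags candidate loops by sequence; whether a given stretch actually forms a carboxylate-binding loop depends on the folded structure.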

1.2.2 Function of PDZ Domains

PDZ domains function as protein scaffolds, assembling multiple proteins into complexes that perform specific cellular tasks (such as G protein-coupled receptor signaling complexes that regulate excitatory synapses (10) or visual signaling (11) and receptor tyrosine kinase complexes involved with cellular proliferation (12, 13)). PDZ domains are critical in maintaining cell polarity through the formation of tight junction (14, 15) and adherens junction (16) complexes. They play roles in organizing glutamate receptor complexes (17), recycling membrane proteins (18, 19), linking membrane proteins to the cytoskeleton (20, 21), and several other critical cellular functions.

Although the majority of PDZ domains recognize their binding partners at the C-terminus, a subset of PDZ domains can also recognize internal peptide motifs (22-24). This interaction mechanism allows multiple PDZ domains to bind to the same protein (25) or one PDZ domain to bind multiple classes of ligands (26).

Certain PDZ domains are also capable of interacting with phospholipids, such as phosphatidylinositol 4,5-bisphosphate and phosphatidylinositol (3,4,5)-trisphosphate (27, 28). One function of such an interaction is to help target PDZ-containing proteins to the plasma membrane. The PDZ domain is therefore a scaffolding module with at least three different modes of ligand recognition.

1.3 BUZ Domains

Ubiquitin is a highly prevalent protein found in eukaryotes, consisting of 76 amino acids and conjugated to proteins to signal for their degradation (29). Ubiquitin is also added to proteins to signal for processes such as DNA repair (30, 31), transcriptional regulation (32, 33), signal transduction (34, 35), and endocytosis and sorting (36-38). There are at least 17 ubiquitin-binding domains found in the human proteome, including BUZ (binder of ubiquitin zinc finger) domains (39, 40). BUZ domains are composed of ~100 residues (41-43) and are mainly found in ubiquitin-specific processing proteases (USPs) (44). USPs are cysteine proteases that function to remove ubiquitin from protein substrates (45). BUZ domains are also found in an E3 ubiquitin ligase (46) and a cytoplasmic histone deacetylase (39, 47).

1.3.1 Structure of BUZ Domains and Mechanism of Ubiquitin Recognition

BUZ domains contain a central five-stranded twisted β-sheet, which is next to a conserved α-helix (41-44) (figure 1.2). Beyond the conserved β-sheet/α-helix exists a more varied region of the domain, with each BUZ domain having its own set of loops and secondary structural elements that define its function. BUZ domains are zinc metalloproteins, where zinc atoms are bound by conserved cysteine/histidine residues. Zinc is important for ubiquitin binding by BUZ domains, as mutation of the zinc-coordinating residues of the BUZ domains of IsoT and HDAC6 severely impaired their ability to bind ubiquitin (42, 48).

BUZ domains interact primarily with the C-terminal (RLRGG-COOH) tail of ubiquitin (41, 42, 44, 49, 50). The only detailed structural analysis of the mechanism of BUZ domain recognition of the ubiquitin C-terminus to date shows the L71-G76 (LRLRGG-COOH) tail of ubiquitin penetrating into a deep, hydrophobic pocket in the IsoT (USP5) BUZ domain (42). A network of direct and water-mediated hydrogen bonds, van der Waals forces, and hydrophobic contacts exists in the binding pocket between the ubiquitin C-terminus and the BUZ domain. Notably, the carbonyl oxygens of the C-terminal carboxylate group of G76 form four hydrogen bonds (two for each oxygen) directly with the BUZ Y261 hydroxyl group and the amide group of BUZ R221. Two other hydrogen bonds are mediated through two water molecules in proximity to G76. The BUZ D264 side chain interacts with both the R72 and R74 side chains of ubiquitin. The BUZ Y223 side chain packs together with ubiquitin L71 and L73. The authors created a G76A model of the complex and noted that the side chain of an alanine would sterically clash with the side chains of Y259 and Y261 of IsoT.
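As a small worked check of the residue numbering used above, the sketch below maps the six-residue LRLRGG tail onto positions in the 76-residue ubiquitin sequence. The function name is hypothetical and chosen only for this illustration; the tail sequence and protein length are the ones stated in the text.

```python
def number_c_terminal_tail(tail, protein_length):
    """Map each residue of a C-terminal tail to its 1-based position in the full protein."""
    first = protein_length - len(tail) + 1
    return {first + i: residue for i, residue in enumerate(tail)}


UBIQUITIN_LENGTH = 76  # ubiquitin consists of 76 amino acids
tail_positions = number_c_terminal_tail("LRLRGG", UBIQUITIN_LENGTH)

for position, residue in sorted(tail_positions.items()):
    print(f"{residue}{position}")
```

The mapping places leucine at positions 71 and 73, arginine at 72 and 74, and glycine at 75 and 76, consistent with the L71, R72, L73, R74, and G76 labels used in the structural description.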

1.3.2 Function of BUZ Domains

BUZ domains seem to function as regulatory modules in USPs, where their binding to the free C-terminus of ubiquitin alters the level of USP catalytic activity (42, 51, 52). For USP5 (ubiquitin-specific processing protease 5), the binding of the free C-terminal tail of ubiquitin to its BUZ domain enhances its catalytic activity, allowing it to disassemble polyubiquitin chains that have already been removed from proteins (42). It is thought that the BUZ domain of USP5 acts to ensure that only polyubiquitin chains already removed from proteins are processed by USP5, as the BUZ domain could not interact with conjugated ubiquitin (which lacks a free C-terminus).

USP3 and USP16 (also known as Ubp-M) have been identified as histone deubiquitinases for histones H2A and H2B (or just histone H2A for USP16) (53, 54). The BUZ domains of each USP are believed to regulate the histone deubiquitinase activity of the enzymes, where binding of free ubiquitin activates their catalytic activity and frees histones H2A and H2B from ubiquitin, thereby altering expression and affecting cell cycle progression (44). The BUZ domains of these two USPs would therefore act as sensors of the overall levels of free ubiquitin in the cell, which vary depending on levels of cellular stress (55). The HDAC6 BUZ domain binds the ubiquitin associated with protein aggregates, while a separate domain on HDAC6 binds to the dynein motor complex associated with the cytoskeleton (50, 56, 57), allowing for the transport of protein aggregates to aggresomes. BUZ domains therefore have at least two general functions, one as a regulatory domain on USPs and the other as part of a ternary protein complex (protein aggregate-HDAC6-dynein motor complex) in HDAC6 that functions to transport misfolded proteins to aggresomes.

1.4 SH2 Domains

The phosphorylation of tyrosine is a post-translational modification used in a diverse set of cellular signaling pathways that control processes such as cellular differentiation, proliferation, migration, apoptosis, and the immune system response (58-60). The most abundant phosphotyrosine (pY) binding domain in the human proteome is the Src homology 2 (SH2) domain, 120 of which are found in 110 human proteins (58, 61). SH2 domains are composed of about 100 amino acids and usually occur only once within a protein (although a few proteins contain two different SH2 domains). They are frequently found in proteins with several other types of protein domains, such as PH and Src homology 3 (SH3) domains. SH2 domain-containing proteins can be divided into 11 distinct functional families (adaptors, scaffolds, kinases, signal transduction, phosphatases, transcription, ubiquitination, phospholipid second messenger signaling, cytoskeletal regulation, chromatin remodeling, and small GTPase signaling) (61).

1.4.1 Structure of SH2 Domains and Mechanism of Ligand Recognition

SH2 domains share a conserved central β-sheet composed of seven β-strands with two closely associated α-helices (58, 59) (see figure 1.3). The pY peptide binding pocket varies between different SH2 domains, but all share a conserved phosphotyrosine pocket, which contains a highly conserved arginine residue that forms a salt bridge with the phosphate group of pY (64). Specificity for residues other than pY occurs through binding pockets on the SH2 domain that recognize residues between the -2 and +5 positions (relative to pY), although most SH2 domains primarily recognize residues between the +1 and +3 positions (65). Flexible loops of the SH2 domain can limit access to or block certain binding pockets, adding an additional level of control over binding specificity. The canonical SH2-peptide interaction involves one singly phosphorylated pY peptide bound to the conserved SH2 binding pocket, but SH2 domains are capable of more exotic binding mechanisms. One such mechanism is that of the Vav1 SH2 domain, which is capable of recognizing two pY residues within the same peptide (66). Another mechanism involves the SH2 domain of Crk, which binds both a pY peptide at its conserved binding pocket and an SH3 domain via a proline-rich loop (67). Yet another mechanism is that of Spt6 SH2, which recognizes phosphoserine rather than phosphotyrosine (68).

1.4.2 Function of SH2 Domains

As previously mentioned, SH2 domains are found in 11 different functional families. As SH2 domains lack intrinsic catalytic activity, their function is to mediate protein-protein interactions by recognizing specific (usually pY) sequences within a protein target. They can bridge proteins together in signaling pathways in a pY-dependent manner, as is the case for Nck SH2 (69). The Nck SH2 domain recognizes pY-motifs on activated receptors, followed by the Nck SH3 domain binding to downstream effectors, which results in cytoskeletal reorganization. SH2 domains can also inhibit the catalytic activity of kinases or phosphatases, as is the case for SHIP2 (70). SHIP2 contains an SH2 domain, which acts to inhibit its phosphatase activity until SHIP2 is phosphorylated at specific tyrosine sites. The mechanism of SHIP2 inhibition is most likely based on an initial inhibitory conformation, followed by a conformational change of SHIP2 caused by its SH2 domain recognizing pY sites within SHIP2. In human c-Src kinase, phosphorylation of tyrosine 530 causes a conformational change in the protein, where the c-Src SH2 domain binds to pY530. This results in the inhibition of c-Src kinase activity (71). Dephosphorylation of pY530, or the SH2 domain binding to pY sites in other proteins, reactivates c-Src kinase activity. An SH2 domain can also increase catalytic activity, as is the case for the Fes tyrosine kinase (72). Binding of the Fes SH2 domain to phosphotyrosine causes a conformational change in Fes, which leads to an increase in kinase activity. SH2 domains that occur in tandem can confer very high binding affinity and specificity, with each SH2 domain of the tandem recognizing its own pY motif (73). In general, SH2 domains act to link tyrosine kinase activity to downstream cellular effects.

1.5 Tandem BRCT Domains

BRCA1 C-terminal (BRCT) domains are found in over 30 human proteins (74). They can exist as isolated domains, tandem domains (where they are separated by a linker region of the protein that varies in length), or as a triple BRCT structure (75). They can also exist as tandem domains immediately adjacent to an FHA domain (76). The number of BRCT domains in a single protein ranges from only one domain to nine (in the case of TopBP1). BRCT domains generally consist of ~90-100 amino acids and are found in proteins involved with the DNA damage response pathway and cell cycle regulation (77-79). Through various binding mechanisms, they mediate protein-protein and protein-DNA interactions (80, 81). Tandem BRCT domains have been found to function as a single phosphopeptide binding unit (mainly to phosphoserine (pS) peptides) (82, 83), where both BRCT domains participate in recognizing the phosphopeptide (84-85).

1.5.1 Structure of Tandem BRCT Domains and Mechanism of Ligand Recognition

The canonical tandem BRCT structure consists of two β1-α1-β2-β3-α2-β4-α3 folds, connected by a generally short (~30-50 residue) linker region (figure 1.4). The two domains pack together in a head-to-tail manner through a hydrophobic interface region (87-91). The typical phosphopeptide interaction mode is deemed the “2-knob” mode, where a pS/pT is bound through a hydrogen bonding network in a pocket in the N-terminal BRCT domain, and a secondary binding pocket in the BRCT repeat interface and C-terminal BRCT domain recognizes residues immediately C-terminal to pS/pT, including a hydrophobic binding pocket recognizing the +3 position (relative to pS/pT). Previous phosphopeptide binding specificity studies on the tandem BRCT domains of BRCA1, MDC1, MCPH1, and PTIP BRCT 5-6 (the 5th and 6th BRCT domains) have shown that the domains generally prefer pS over pT (and not pY) at the first binding pocket, along with having a significant preference for aromatic/hydrophobic residues at the +3 position (83, 92-94). Another aspect of tandem BRCT domain specificity is their recognition of a free C-terminus after the +3 position, caused by the formation of a double salt bridge between the C-terminal carboxylate and the guanidinium side chain of a conserved arginine. Tandem BRCT domains differ in their preference for a free C-terminus. For example, the MDC1 and MCPH1 tandem BRCTs heavily prefer a free C-terminus after the +3 position. The BRCT tandem of MDC1 binds a pS peptide amidated at the C-terminus with 66-fold lower affinity compared to the corresponding peptide with a free C-terminus, while MCPH1 binds with over 100-fold lower affinity to the amidated peptide (85, 94). The BRCT tandems of BRCA1 and PTIP (repeat 5-6) do not have as stringent a requirement for free C-termini after the +3 position, with BRCA1 binding with only 7-fold lower affinity to a pS peptide amidated at the C-terminus (compared to the free C-terminal peptide) and PTIP BRCT 5-6 binding with 14-fold lower affinity (90, 91). The preference for a free C-terminus is therefore an additional specificity factor for tandem BRCT domains.

Interestingly, the structure of the tandem BRCT domains of TopBP1 (the repeat formed by the 7th and 8th BRCT domains) in the free and bound states shows a non-canonical mode of phosphopeptide recognition (95). The tandem BRCT domains undergo a dramatic conformational shift upon binding to a phosphothreonine peptide, creating a deep binding pocket. This stands in contrast to other tandem BRCT domains, which do not dramatically change their conformations upon peptide binding and have relatively shallow peptide binding pockets. Another unusual feature of TopBP1 BRCT 7-8 peptide binding is that it possesses a hydrophobic binding pocket for both the +3 and +4 residues, rather than one for only the +3 position. In contrast to the other tandem BRCT domains, the 7th and 8th BRCT domains of TopBP1 do not prefer phosphoserine over phosphothreonine. Peptide binding studies show that there is very little difference in the binding affinity between a pT peptide and a peptide where the pT is substituted with a pS residue.

As previously mentioned, TopBP1 possesses 9 BRCT domains. Unlike its C-terminal tandem BRCT domains (domains 7-8), the N-terminus of TopBP1 contains a triple BRCT repeat (75). The overall structure of each domain is basically the same as other BRCT domains, except that they contain perpendicular central beta sheets rather than the central parallel beta sheet found in the canonical tandem BRCT fold. The domains are separated by shorter linkers (17 and 22 amino acids) than those found in canonical tandem BRCT domains. Interestingly, the second and third BRCT domains of the triple repeat each contain a phosphoamino acid binding pocket, while the triple repeat structure apparently lacks a binding pocket in the BRCT repeat interface region that would bind residues C-terminal to pS/pT, like those found in other tandem BRCT structures. It is currently unclear how a triple repeat recognizes phosphopeptide ligands.

Another BRCT repeat that deviates from the canonical tandem BRCT structure is that of Nibrin (or NBS1). Although no structure has been published of the human tandem BRCT domains of Nibrin, a crystal structure of the yeast (S. pombe) Nibrin BRCT tandem revealed an FHA-BRCT1-BRCT2 architecture (96). Sequence homology between the yeast and human proteins suggests that the overall structure is conserved. Rather than being separated by a flexible linker region, the FHA domain closely associates with the 1st BRCT domain through a large, hydrophobic interface. The arrangement of the tandem BRCT domains resembles the canonical BRCT fold, where the BRCT domains are separated by a ~30 amino acid linker. Like canonical BRCT tandems, the N-terminal BRCT domain in human Nibrin contains conserved pS interacting residues, although it lacks other conserved residues associated with the recognition of phosphopeptides. It is therefore not clear exactly how the FHA-BRCT1-BRCT2 domains would recognize a phosphopeptide, although in vitro peptide binding studies suggest that they could interact with dual pS/pT motifs (motifs containing a pS and a nearby pT or two nearby pS residues). It was speculated that both the FHA domain (which is known to specifically bind pT peptide sequences (97)) and the BRCT domains might play roles in ligand recognition.

A few tandem BRCT repeats contain the pS/pT binding pocket in the C-terminal BRCT domain (98). One such tandem is that of TopBP1 BRCT 4-5, where the phosphoamino acid binding pocket resides in the 5th BRCT domain, rather than the 4th one (75). The 4th-5th BRCT tandem apparently lacks a conserved hydrophobic binding pocket for residues other than pS/pT, making it unclear how it recognizes phosphopeptides. A few tandem BRCT domains have been found to bind to proteins in a phosphorylation-independent manner, such as TP53BP1 BRCT2 and DNA Ligase IV BRCT2 (99, 100). The fact that these domains can recognize non-phosphorylated proteins does not mean they cannot bind phosphopeptide ligands, as both tandem BRCTs of TP53BP1 and DNA Ligase IV contain conserved phosphoamino acid binding pockets (101).

Overall, tandem BRCT domains generally exist as one functional unit that recognizes phosphopeptide ligands, although they can also exist as triple domain units (with another BRCT or an FHA domain). Although the canonical tandem BRCT domain contains two conserved binding pockets, there are BRCT tandems which lack the secondary binding pocket normally responsible for binding residues C-terminal to pS/pT.

1.5.2 Function of Tandem BRCT Domains

There are at least 15 tandem BRCT domains in the human proteome (74). The most prevalent role for tandem BRCT domains is to act as mediators of protein-protein interactions (PPIs) in the DNA damage response pathway and cell cycle regulation processes (98, 102). A major mechanism by which tandem BRCT domains recognize their binding partners is the phosphorylation of proteins on serine/threonine residues upon DNA damage (82, 83, 103). These interactions are crucial to the formation of DNA repair complexes, which must repair DNA damage prior to cell cycle progression (104). Examples of proteins recruited by tandem BRCT domains to repair DNA include the BACH1 DNA helicase by BRCA1 (BACH1 must contain a pS residue for this interaction) (82, 105), DNA exonuclease MRE11 (this interaction is indirect, where the BRCT tandem of MDC1 binds pS-containing H2AX at sites of DNA damage and then recruits MRE11) (106-108), and the DNA recombinase RAD51 by MCPH1 (this is also an indirect interaction, where the tandem BRCT domains of MCPH1 bind to BRCA2, which is bound to RAD51) (109).

The interactions are also critical in controlling the cell cycle, which must be halted to allow for DNA damage repair before cells undergo DNA replication and mitosis (110). Examples of such interactions include the interaction between the BRCA1 BRCT tandem domains and CtIP, a pS-dependent interaction that is critical for controlling the G2/M checkpoint prior to entering mitosis (111). Another includes the pT-dependent TopBP1 BRCT tandem (BRCT 7-8)-BACH1 interaction, which is important for the DNA replication checkpoint (112). The tandem BRCT domains of BRCA1 bind to p53, which stimulates the transcription of the cell cycle inhibitor p21WAF1 (113), while the tandem BRCT domains of ECT2 negatively regulate cytokinesis through an autoinhibitory mechanism (114). The interaction between MDC1 and the BRCT repeat region of NBS1 is important for the intra-S-phase checkpoint (115).

Tandem BRCT domains are critical for the repair of DNA damage because they recruit repair proteins to sites of the damage and are critical to cell cycle regulation because they mediate the formation of protein complexes that can halt the cell cycle at various stages. Tandem BRCT domains have been found to be mutated in various cancers (116), the most famous of which are BRCA1 BRCT mutations. This is not surprising, as cells with nonfunctional BRCT tandems would be expected to accumulate DNA mutations and would not be effective at regulating cellular division.

Overall, tandem BRCT domains serve a critical role in both the DNA damage repair pathway and cell cycle regulation, acting as mediators of protein-protein interactions that are often based upon phosphorylation of pS/pT residues.

1.6 Methods to Determine the Sequence Specificity of PDZ, BUZ, SH2, and Tandem BRCT Domains

In the past, several groups have used both ribosomally and chemically synthesized libraries to study the binding specificity of PDZ, BUZ, SH2, and tandem BRCT domains. In ribosomally synthesized libraries, a biological system (such as a bacterial or yeast cell) produces the peptide/protein members of the library. One advantage of using biologically synthesized libraries is their size (>10⁸ members for lacI repressor libraries and >10¹² members for a phage display system) (117, 118). Methods such as yeast two-hybrid and pull-down assays offer the advantage of detecting interactions with whole proteins, rather than short peptide ligands. The disadvantage is that they are largely incompatible with posttranslational modifications (PTMs), as biological systems are limited to the 20 proteinogenic amino acids. These techniques are thus not suitable for screening against SH2 and BRCT domains.

Chemically synthesized libraries have no PTM limitation, since they are not limited by the genetic code. The incorporation of pY, pS, and pT residues into such libraries is straightforward and makes them useful for screening against PTM-recognizing domains. The disadvantage of chemically synthesized libraries is their smaller diversity, which is usually limited to between 10⁴ and 10⁶ members (119, 120). The choice of which type of library method to use to determine the binding specificity of a protein or protein domain is therefore dependent on how a given protein/domain recognizes its binding partners.
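To put the library sizes quoted above in perspective, the following sketch counts the distinct sequences in a fully randomized peptide and asks how long a random region each library type could cover exhaustively. It is only an illustration: it assumes that every random position is drawn from the 20 proteinogenic amino acids and treats the quoted sizes (roughly 10⁶, 10⁸, and 10¹² members) as upper bounds, and the function names are hypothetical.

```python
# Illustrative sketch (not from the original work): theoretical diversity of a
# fully randomized peptide library, assuming 20 residues per random position.

def theoretical_diversity(n_positions, n_residues=20):
    """Number of distinct sequences with n_positions random positions."""
    return n_residues ** n_positions

def max_fully_covered_length(limit, n_residues=20):
    """Longest random region whose every sequence fits within `limit` members."""
    length = 0
    while theoretical_diversity(length + 1, n_residues) <= limit:
        length += 1
    return length

# Approximate library sizes quoted in the text, treated here as upper bounds.
library_limits = {
    "chemical library": 10**6,
    "lacI repressor library": 10**8,
    "phage display library": 10**12,
}

for name, limit in library_limits.items():
    print(name, max_fully_covered_length(limit))
```

Under these assumptions, a chemical library of 10⁶ members can contain every possible sequence only for a random region of four residues (20⁴ = 160,000), whereas 20⁵ = 3,200,000 already exceeds it.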

1.6.1 Biological Library Techniques

1.6.1.1 Yeast Two-Hybrid System

The yeast two-hybrid (Y2H) system was developed in 1989 by Fields and Song (121). The basic Y2H screen involves genetically fusing a protein of interest to a DNA binding domain (“bait”) and fusing another protein (or library of proteins) to an activation domain (“prey”). Upon expression in S. cerevisiae cells, the bait construct is bound upstream of a reporter gene via its DNA binding domain. If the bait protein binds to the prey protein, the activation domain comes in close proximity to the DNA binding domain, which activates transcription from the downstream reporter gene. The reporter gene can vary, but is commonly β-galactosidase (122). Expression of β-galactosidase labels the yeast cell in the presence of a colorimetric substrate.

One common problem with Y2H screens is their reproducibility, thought to be caused by differences in factors such as the selection stringency, gene reporters, and types of plasmids (122, 123). False negatives may arise because of factors such as protein solubility or the requirement for PTMs in a protein-protein interaction. False positives are a notorious problem in Y2H screens, arising from phenomena such as prey proteins directly binding to the DNA binding domain of the bait and misfolded prey proteins non-specifically binding to the bait protein complex. Although the Y2H method allows for the detection of protein-protein interactions that occur in the context of a cellular environment, it clearly suffers from many drawbacks.

Y2H screens have been used to study the binding specificity of PDZ domains (124, 125), the BUZ domain of HDAC6 (126), SH2 domains (127-131), and tandem BRCT domains (74, 134-139). For Y2H screens involving SH2 domains, it is necessary to co-express a tyrosine kinase, as S. cerevisiae lacks endogenous tyrosine kinases (140). For Y2H screens of tandem BRCT domains, co-expression of serine/threonine kinases is not necessary, as S. cerevisiae possesses over 100 serine/threonine kinases. Even with the co-expression or endogenous expression of kinases, it is not clear whether hits from Y2H screens are actually due to phosphorylation-dependent protein-protein interactions, nonphosphorylation-dependent protein-protein interactions, or false positives.

1.6.1.2 Phage Display

Phage display of proteins/peptides on viral coat proteins was first developed by Smith (141). The basic method involves cloning the DNA sequence encoding a peptide or protein (or library of peptides/proteins) into the gene of the phage viral coat protein. A population of phages, each with different coat protein/peptide sequences, is exposed to an immobilized target (such as a protein domain). Unbound phages are washed away, followed by elution of binding phages by harsher washing conditions. Binding phages are allowed to infect bacteria and are therefore amplified. The new batch of phages from the bacteria is biased towards peptide/protein sequences that bind to the domain. Further rounds of phage binding, washing, and elution can be used to select for the best binders. Finally, the DNA sequences of binding proteins/peptides are determined (142).
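The effect of repeated rounds of binding, washing, elution, and amplification can be illustrated with a deliberately simple model. The sketch below assumes that each round recovers binding phages with one fixed probability and non-binding phages with a much lower one, and that amplification preserves the proportions of the eluted pool. The recovery values and the starting fraction of binders are hypothetical numbers chosen for illustration, not measurements from any cited study.

```python
# Toy model of phage panning enrichment (hypothetical parameters throughout).

def enrich(fraction_binders, recovery_binder, recovery_nonbinder):
    """Fraction of binders in the eluted pool after one round of selection."""
    eluted_binders = fraction_binders * recovery_binder
    eluted_others = (1.0 - fraction_binders) * recovery_nonbinder
    return eluted_binders / (eluted_binders + eluted_others)

def panning(initial_fraction, rounds, recovery_binder=0.5, recovery_nonbinder=0.001):
    """Binder fraction before selection and after each successive round."""
    fractions = [initial_fraction]
    for _ in range(rounds):
        fractions.append(enrich(fractions[-1], recovery_binder, recovery_nonbinder))
    return fractions

# Start with one binder per million phages and run three rounds.
for round_number, fraction in enumerate(panning(1e-6, 3)):
    print(round_number, fraction)
```

With these assumed values each round improves the binder-to-non-binder ratio 500-fold, so a binder that starts at one in a million dominates the pool after three rounds. The model ignores the sequence biases and avidity effects that complicate real selections.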

Although the diversity of phage libraries can be very large (>10¹² variants), they suffer from certain drawbacks. One is that phage libraries may be biased towards certain sequences, due to codon bias within the infected bacteria and differences in viral translocation efficiency across the bacterial membrane based upon the sequence of their viral coats (143). Another potential problem is the selection of relatively weak binders, due to an avidity effect caused by the presence of multiple coat proteins per virus (144). In order to screen phage display libraries against tandem BRCT domains and SH2 domains, the phage peptides must first be phosphorylated by kinases in the bacteria or in solution prior to their addition to the immobilized domain (145-146). Potential issues that would arise from using the phage display technique for studying SH2 or tandem BRCT domain specificity would be uncertainty about the degree of phosphorylation of each phage peptide, along with phosphorylation bias of certain sequences due to the sequence selectivity of the kinase(s) used. Determination of the C-terminal binding specificity of PDZ and BUZ domains requires that the phage peptides present a free C-terminus. Although this has not been done for BUZ domains, the C-terminal binding specificity of PDZ domains has been examined by C-terminal phage display, where the peptide library was fused to the C-terminus of protein-8 of the M13 phage (147, 148) or presented on the surface of T7 phages (149).

1.6.1.3 lacI Repressor

The lacI repressor technique for screening protein binding specificity was developed by Schatz (150). The method involves fusing a peptide library to the C-terminus of the lac repressor, which binds tightly to its own encoding plasmid at lacO sequences. Once the lac repressor-plasmid complex is formed, bacteria containing the complex are lysed, and the complex is added to a column containing an immobilized protein target. After a washing step, the bound repressor-plasmid complex is eluted, followed by transformation into E. coli. Bacteria are grown so that the binding sequences are amplified. This process can be repeated as many times as necessary to obtain highly specific peptide binders to the immobilized protein target.

Like phage display, this technique allows for the generation of large (>10¹⁰ member) peptide libraries (151). Also like phage display, it is difficult to incorporate PTMs into the displayed peptides. SH2, BUZ, and tandem BRCT domains have not been screened against lac repressor libraries, although the technique has been used to examine the C-terminal binding specificity of PDZ domains (152-153).

1.6.1.4 FRET-Based Screening Assay

Protein-protein interactions can be detected by fluorescence resonance energy transfer (FRET) between mutants of green fluorescent protein (GFP) (154). This method works by genetically fusing two proteins to different versions of GFP (one GFP mutant acts as a FRET donor and the other as the acceptor). If the two proteins of interest interact, the FRET donor and acceptor are in close enough proximity to allow for energy transfer. The method has been used to determine the binding specificity of a PDZ domain, using a PDZ-GFP acceptor and 15-mer peptide library-GFP donor co-expressed in E. coli cells (154). If a PDZ-peptide interaction occurred, cells would emit a FRET signal and would be sorted using fluorescence-activated cell sorting (FACS). Positive cells would be grown, then recycled through FACS to enrich the population of cells harboring binding peptide sequences. The PDZ-peptide dissociation constant (KD) would then be determined by lysis of positive cells and serial dilution of the sample. DNA sequencing would be performed to determine the identity of each peptide.

This method has the advantage of allowing for KD determination of protein-peptide interactions without the need for peptide resynthesis/labeling but does not allow for the easy incorporation of amino acids with PTMs. Other issues, such as differences in the solubility of peptide-GFP donor constructs and rates of peptide proteolysis, could bias the library towards certain types of peptide sequences.

1.6.1.5 Co-immunoprecipitation/Pull-Down Assays

Pull-down assays involve the immobilization of a protein of interest to an affinity column (such as a GST-fusion protein to glutathione agarose), followed by exposure of the immobilized protein to a cell lysate. After a washing step, proteins which are bound to the immobilized fusion protein are eluted. Alternatively, an antibody specific for one protein (“protein X”) can be added to a cell lysate, followed by capture of protein X and any protein(s) associated with protein X by protein A/G-coupled resin (protein A or protein G specifically bind the antibody). Protein complexes can then be eluted and analyzed. This method is known as co-immunoprecipitation (co-IP) (156).

A common method to determine the identity of eluted proteins from pull-down/co-IP assays is gel electrophoresis and mass spectrometry (MS) (157). Briefly, eluted proteins are separated by gel electrophoresis, followed by excision of the protein bands from the gel and proteolytic digestion. Afterwards, peptides can be separated by liquid chromatography and analyzed by various MS methods. MS analysis reveals the identity of each eluted protein. An alternative to MS detection of the eluted proteins is a Western blot, where proteins are separated on polyacrylamide gels, transferred to a membrane (such as nitrocellulose), then detected with an antibody specific for a particular protein (158).

Pull-down/co-IP assays coupled with MS or Western blot detection have been used to find the binding partners of PDZ domains (159-162), SH2 domains (163-164), BUZ domains (41), and tandem BRCT domains (74, 82-84, 165). The advantage of using pull-down/co-IP assays to discover protein-protein interactions, compared to techniques like Y2H and phage display, is that human (or mammalian) cell lines are generally used. This means that detected interacting partners can presumably interact in the human/mammalian cellular environment, rather than just in yeast or in vitro. One disadvantage of these methods is in detecting protein-protein interactions that require a PTM. Another disadvantage lies in the detection of low abundance proteins by MS, which can be problematic due to the presence of other proteins in the mixture that are much more abundant. Detection of proteins by Western blot requires a specific antibody for each potential interacting partner.

1.6.2 Chemical Library Techniques

1.6.2.1 Solution Phase Pool Library

The basic method of this technique is to immobilize a protein domain onto an affinity column (such as a GST-SH2 domain onto a glutathione column) and to expose the domain to a library of peptides (in solution) (166). Unbound peptides are washed away, followed by elution of binding peptides by using a selective elution step (such as using phenylphosphate buffer to elute pY peptides bound to an SH2 domain) or a harsh washing condition. Binding peptides are collectively sequenced using Edman degradation. The abundance of each amino acid at each position in the binding peptides reveals the binding specificity of the domain in question. SH2, PDZ, and tandem BRCT domains have all been screened against these types of libraries (83, 92-94, 166-168). The disadvantage of this technique is that individual binding sequences cannot be determined. If a protein domain selects for multiple classes of ligands, the composite binding data cannot be deconvoluted into individual groups based upon their ligand class.
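The loss of information in pooled sequencing can be made concrete with a small example. The sketch below tabulates the abundance of each amino acid at each position for two sets of binding peptides. The three-residue sequences are invented for illustration (they are not data from any screen): the first set contains two distinct ligand classes, while the second contains the same residues recombined into different sequences.

```python
from collections import Counter

def positional_counts(sequences):
    """Count how often each residue occurs at each position of the pool."""
    length = len(sequences[0])
    return [Counter(seq[i] for seq in sequences) for i in range(length)]

# Hypothetical pool with two ligand classes, "EEI" and "VNV".
hits_two_classes = ["EEI", "EEI", "VNV", "VNV"]

# A different hypothetical pool built from the same residues per position.
hits_recombined = ["ENV", "ENV", "VEI", "VEI"]

profile_a = positional_counts(hits_two_classes)
profile_b = positional_counts(hits_recombined)

# Both pools give the same composite profile, so the pooled data alone
# cannot tell which individual sequences were actually bound.
print(profile_a == profile_b)
```

Because the two pools are indistinguishable at the level of per-position abundances, a composite profile is compatible with more than one set of underlying ligand classes.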

1.6.2.2 SPOT Library

Thousands of individual peptides can be synthesized on spots on a cellulose membrane support (169). Peptide synthesis is generally done robotically. Screening of SPOT libraries, where each spot represents one peptide sequence, can be performed by the addition of a fusion protein (such as a GST-PDZ domain), followed by a washing step. Afterwards, an antibody-enzyme conjugate (such as anti-GST antibody-horseradish peroxidase (HRP)) is added. A chemiluminescent signal then develops at spots containing binding peptides (170). Sequencing of peptides is unnecessary, because the position of each peptide is already known. This technique has been used to study the binding specificity of PDZ domains (171, 172), SH2 domains (173), and tandem BRCT domains (92, 174). The main advantages of this technique, compared to solution phase pool libraries, include being able to determine individual binding sequences and not having to sequence binding peptides. The limitation of the method is that SPOT libraries are usually limited to 10,000 members, which stands in contrast to solution libraries (>10⁶ members).

1.6.2.3 Oriented Peptide Array Library

Rather than synthesizing only one peptide per spot, a library of peptides can be synthesized per spot (175). These oriented peptide array libraries (OPAL) work by having each spot contain a fixed residue surrounded by random positions. For example, for a 7-mer library X1X2X3X4X5X6X7 (X = one of 20 proteinogenic amino acids), one spot of the array would contain a (K)X2X3X4X5X6X7 library (where all library members have lysine at the first position). There would be a total of 140 spots (20 spots, one for each amino acid, at each of the 7 positions). Screening of these libraries is performed in the same way as for SPOT libraries. The chemiluminescent intensity at each spot indicates the preference a protein domain has for that particular amino acid at that position in the library. No sequencing is necessary because the experimenter would know which amino acid is fixed per spot. This method has been used to screen for the binding specificity of SH2 domains, where every spot library presented a fixed pY residue (173, 174, 176). Like solution phase pool libraries, OPAL libraries do not allow for the sequence determination of individual binding sequences.
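The layout of the 7-mer example above can be enumerated directly. The sketch below builds one pattern per spot, with a single fixed residue and `X` marking each random position; the function name and the string representation are illustrative choices, not part of any published protocol.

```python
# Enumerate the spots of an oriented peptide array library (OPAL) for a
# 7-mer with one fixed position per spot. "X" marks a randomized position.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids

def opal_spots(length=7, residues=AMINO_ACIDS):
    """Return one pattern string per spot: a fixed residue among X positions."""
    spots = []
    for position in range(length):
        for residue in residues:
            pattern = ["X"] * length
            pattern[position] = residue
            spots.append("".join(pattern))
    return spots

spots = opal_spots()
print(len(spots))            # 7 positions x 20 residues
print("KXXXXXX" in spots)    # the lysine-at-position-1 spot from the text
```

Each spot is itself a sub-library: with six random positions it contains up to 20⁶ = 64,000,000 sequences, which is why the signal at a spot reports a positional preference rather than an individual binding sequence.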

1.6.2.4 Protein Microarray

The protein microarray is much like a SPOT library, except that proteins are immobilized onto a surface instead of peptides. Protein microarrays can be used to immobilize hundreds of different proteins (or domains) onto glass slides (177-181). The protein microarrays can then be used to probe the binding specificity of the immobilized proteins. For example, one study immobilized 157 different mouse PDZ domains and exposed the domains to 217 different fluorescently labeled peptides (179). The fluorescent intensity of each spot for each peptide indicates the affinity of the domain for the peptide. The authors confirmed interactions by using solution-phase fluorescence polarization assays. In another study, almost every SH2 domain in the human proteome was immobilized onto glass arrays, followed by exposure of the SH2 domains to 61 fluorescent pY peptides derived from the ErbB receptors (181). The study used a range of concentrations for each peptide to create binding curves and KD estimates for each SH2-peptide interaction. The microarray technology can also be used to probe for protein-protein interactions. One study exposed 145 different protein domains to MCF7 cell lysate, followed by a washing step. Afterwards, the protein chip was exposed to an antibody against a protein of interest. Exposure of the chip to a fluorophore-labeled secondary antibody revealed that the protein of interest bound to immobilized WW and SH3 domains (180). An advantage of the protein microarray method in determining the binding specificity of proteins is that it requires very small quantities of each protein (μg quantities). A disadvantage is that each peptide must be individually added to the array (one at a time), making it difficult to screen more than a few hundred peptides. Although PDZ and SH2 domains have been screened against these types of arrays, tandem BRCT and BUZ domains have not yet been screened.

1.6.2.5 One-Bead-One-Compound (OBOC) Library

The basic idea of combinatorial OBOC peptide libraries is that millions of different peptides can be synthesized on microbeads (~100 μm diameter) in such a way that each bead presents only one peptide sequence (182, 183). This is accomplished by the “split-and-pool” synthesis technique, where a collection of millions of microbeads is split into different reaction vessels (see figure 1.5). Each microbead contains a reactive functional group (generally an amine) that reacts with an added amino acid (the amino acid is first reacted with an activating agent like HBTU, which facilitates nucleophilic attack of the amine on the carboxylic acid group). To each reaction vessel, a different amino acid coupling mixture is added so that only one type of amino acid is coupled to each bead. The result of the reaction is the formation of an amide bond between the microbead amine and the amino acid carboxylic acid group. Afterwards, resin from each reaction vessel is combined into a larger vessel and the amino protecting group of the coupled amino acids (generally the Fmoc group) (184) is removed. By combining the beads into a larger vessel, the beads are also mixed together. When re-split into smaller reaction vessels for the next coupling reaction, each vessel has a heterogeneous mixture of beads (rather than all the beads having only one type of amino acid). After re-splitting the beads, the next added amino acid mixture reacts with the free amine of the previously added amino acid. The process of split and pool is repeated until the desired number of positions is added to the library. Finally, the amino acid side-chain protecting groups are removed, generally by treatment with a solution of concentrated trifluoroacetic acid (TFA). Along with TFA, the solution generally contains carbocation scavengers, such as H2O and 1,2-ethanedithiol. The total number of possible members of the library is the number of amino acids found at a position raised to the power of the number of random positions (for example, if each position has 20 different amino acids and there are five random positions, the library has 20^5 (3.2 million) possible sequences).
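The library-size arithmetic and the split-and-pool logic described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names are hypothetical, beads are dealt evenly into vessels after a random shuffle, and every coupling is assumed to go to completion. Each new residue is prepended because solid-phase synthesis extends the peptide from the C-terminus toward the N-terminus.

```python
import random


def library_size(residues_per_position: int, random_positions: int) -> int:
    """Theoretical diversity: residues per position raised to the number of random positions."""
    return residues_per_position ** random_positions


def split_and_pool(n_beads: int, alphabet: str, n_positions: int, seed: int = 0) -> list[str]:
    """Simulate split-and-pool synthesis; returns the single sequence carried by each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_positions):
        rng.shuffle(beads)  # pooling mixes the beads before they are re-split
        # Split: bead k goes to vessel k % len(alphabet) and receives that vessel's residue.
        beads = [alphabet[k % len(alphabet)] + seq for k, seq in enumerate(beads)]
    return beads


print(library_size(20, 5))  # 3200000 possible sequences

beads = split_and_pool(n_beads=1000, alphabet="AFR", n_positions=4)
print(len(set(beads)), "distinct sequences on", len(beads), "beads")
```

Each bead ends the simulation with exactly one sequence, even though every vessel holds a heterogeneous mixture of beads after the first round. This is the one-bead-one-compound property.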

The advantage of using OBOC peptide libraries, compared to solution phase and OPAL peptide libraries, is that library beads selected from screens of protein domains can be individually sequenced. Because each bead contains a unique sequence, individual binding sequences can be determined. OBOC peptide libraries can present millions of peptides, rather than only the ~10,000 sequences that SPOT libraries can present. An advancement in OBOC peptide library synthesis, introduced a decade after the first OBOC peptide library was reported, made them even better alternatives to other synthetic peptide library techniques. Lam et al. introduced a technique that allowed for the spatial segregation of library beads, where one type of peptide could be presented on the surface of beads to interact with proteins while an encoding peptide could reside in the bead interior (185). This advancement overcame a limitation of OBOC peptide libraries containing peptides (or peptidomimetics) that are incompatible with Edman degradation chemistry (such as cyclic or beta-peptides). Before this development, encoding tags (such as the linear version of the cyclic peptide) would be co-presented on the bead surface, making it unclear whether a positive hit from a library screen was the result of a protein binding to the desired ligand or the encoding ligand. Another technique made possible by this technology was the reduction in surface peptide density on library beads (186). One source of false positives from screens of protein domains against OBOC peptide libraries is the high ligand density on the bead surface, which leads to multidentate peptide-protein interactions (such as charge-charge interactions). This allows beads with peptides rich in basic or acidic residues to be picked from screens. To overcome this problem, beads are spatially segregated into outer and inner layers, followed by reaction of the majority of the free amines on the bead surface with a capping agent. The majority of the amines on the bead surface can no longer be reacted with amino acids, while the interior amines can still do so. This allows enough peptides to exist on the bead surface to engage in specific peptide-protein interactions while avoiding interactions resulting from an avidity effect. The number of peptides in the bead interior does not change, so bead sequencing is not affected.

Another application of the bead segregation technique is the synthesis of an OBOC peptide library presenting peptides with free C-termini on the bead surface and encoding peptides in the normal (N-terminal to C-terminal) orientation in its interior (187). This was accomplished by the selective addition of a modified glutamic acid on the exterior of library beads, followed by coupling a compound presenting a hydroxyl functional group to the glutamic acid. As a result, amino acids coupled to the resin were added to the hydroxyl group on the exterior of the bead (forming an ester bond) and to amines in the bead interior. After the peptide synthesis, the peptides on the bead exterior were cyclized, with the N-terminus of the peptide reacting with the glutamic acid carboxylic acid group. The ester bond of the exterior cyclic peptides was then cleaved, producing a linear peptide presenting a free C-terminus. Prior to the development of spatial segregation technology, C-terminal peptides would have to be co-presented on the bead exterior with peptides in the normal N-to-C orientation. This would complicate screening by making it unclear whether a protein of interest is binding to the C-terminal or N-terminal peptide. The exclusive presentation of peptides with free C-termini on the bead exterior made screening C-terminal specific domains with OBOC peptide libraries more effective.

OBOC peptide libraries are commonly screened against a protein of interest by first labeling the protein with a fluorophore or a biotin group on lysine residue(s) (figure 1.6). If the protein is labeled with a fluorescent group, protein binding causes the beads to fluoresce when viewed under a fluorescence microscope. Alternatively, proteins labeled with biotin will recruit streptavidin-alkaline phosphatase (SA-AP) to binding beads. The alkaline phosphatase will then remove the phosphate group of 5-bromo-4-chloro-3-indolyl phosphate (BCIP), which is then oxidized into an indigo dimer. Because of the hydrophobicity of the dimer, it is deposited into the hydrophobic interior of the binding library bead, giving it a turquoise color. It should be noted that the BCIP/SA-AP screening system works best if the library beads have polystyrene cores (such as Tentagel beads), rather than hydrophilic cores (such as PEGA resin). Regardless of the screening technique, positive beads are picked by micropipette under a microscope.

The identification of hits from OBOC peptide library screens requires the sequencing of selected library beads. This can be accomplished by Edman degradation of each bead (182, 188, 189), which can be expensive and time consuming if dozens or hundreds of beads are selected from a screen. Alternatively, tandem MS analysis of peptides from library beads has been employed (190). This technique can produce complex, ambiguous spectra that make sequence determination difficult. A modification of Edman degradation is partial Edman degradation (191-193) (figure 1.7), where the N-terminal amino acids at each position are treated with a mixture of PITC and Fmoc-OSu. If the amino acid reacts with Fmoc-OSu, it is capped and protected from degradation. If not, it reacts with PITC and is later degraded when the library beads are treated with TFA. The process of exposing the peptide amino acids to Fmoc-OSu/PITC and then TFA is repeated until all the random positions are either capped or degraded. This process results in the formation of a peptide ladder. Afterwards, the Fmoc groups are removed by piperidine treatment and the peptides are cleaved from the resin. This is followed by MALDI-TOF MS analysis of the peptide ladder from each bead. The mass difference between adjacent peaks in the MS spectrum corresponds to the amino acid which was degraded by PITC/TFA treatment. As a sequencing technique for OBOC library beads, partial Edman degradation and mass spectrometry (PED/MS) is both rapid and inexpensive (the sequences of 100 peptides can be determined in 1-2 days and it costs less than $1 per peptide). It also produces spectra that are clean and easy to interpret.
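The way a peptide ladder is read can be sketched in a few lines. The example below is an idealized illustration and not the analysis software used in this work. The residue values are standard monoisotopic residue masses, while the linker mass of 1000.0 Da, the 0.02 Da tolerance, and the function names are hypothetical. Leucine and isoleucine have identical masses and are reported together as L here, so extra information is needed to tell isobaric residues apart.

```python
# Monoisotopic residue masses (Da). Isoleucine is isobaric with leucine and is omitted.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def ladder_from_sequence(seq: str, linker_mass: float) -> list[float]:
    """Simulate ladder masses: the full-length peptide, then each N-terminal truncation."""
    return [
        linker_mass + sum(RESIDUE_MASS[r] for r in seq[start:])
        for start in range(len(seq) + 1)
    ]


def read_ladder(peaks: list[float], tol: float = 0.02) -> str:
    """Read the sequence (N- to C-terminus) from mass differences between adjacent peaks."""
    ordered = sorted(peaks, reverse=True)  # heaviest peak is the full-length peptide
    sequence = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        diff = heavier - lighter
        residue, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - diff))
        sequence.append(residue if abs(mass - diff) <= tol else "?")
    return "".join(sequence)


peaks = ladder_from_sequence("AHWPG", linker_mass=1000.0)  # hypothetical linker mass
print(read_ladder(peaks))  # AHWPG
```

A mass difference that matches no residue within the tolerance is reported as "?", which corresponds to a gap or an ambiguous peak in a real spectrum.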

OBOC peptide libraries have previously been used to study the binding specificity of PDZ and SH2 domains (187, 194-199), but have not been used to study the binding specificity of BUZ or tandem BRCT domains. In this work, the OBOC peptide library synthesis, screening, and sequencing methodology was extended to the N-terminal and C-terminal SH2 domains of Phosphoinositide Phospholipase C-Gamma-1 (PLCγ1) and the SH2 domain of T Cell-Specific Adapter Protein (TSAd). C-terminal OBOC peptide libraries were used to study the sequence specificity of the PDZ domains of T-Lymphoma Invasion and Metastasis-Inducing Protein 1 (Tiam-1) and Tiam-2, along with the BUZ domains of Ubiquitin-Processing Protease (Ubp-M) and Histone Deacetylase 6 (HDAC6). OBOC pS/pT peptide libraries were used to study the binding specificity of all known human tandem BRCT domains. Furthermore, the mechanism of binding of the N-terminal SH2 domain of the protein-tyrosine phosphatase SHP-2 to a peptide selected from OBOC pY peptide library screens was studied by X-ray crystallography, NMR spectroscopy, and surface plasmon resonance.


Figure 1.1 Cartoon representation of a canonical PDZ domain. The ribbon diagram is of the second PDZ domain of the protein tyrosine phosphatase PTP-BL. The N- and C-terminal strands are colored blue and red, and the α-helices and β-strands are sequentially numbered. Structure from (1), PDB code 1GM1.


Figure 1.2 Structure of the BUZ domain of USP5, from (38) (PDB code 2G43). The ribbon diagram shows the conserved central five-stranded twisted β-sheet (β-strands 1-5) and the conserved α-helix (αA helix). Also shown are flexible loops L1 and L2A, along with a chelated zinc ion. The USP5 BUZ domain also has a second α-helix (αB, orange).


Figure 1.3 Structure of the C-terminal SH2 of PIK3R1, from (196) (PDB code 1QAD). α-helices are colored in red, while β-strands are colored cyan. Each α-helix and β-strand is sequentially lettered.


Figure 1.4 Ribbon diagram of the tandem BRCT domains of BRCA1, from (83) (PDB code 1JNX). The α-helices are colored yellow while the β-strands are cyan. The BRCT linker region is blue (including a linker α-helix). The α-helices and β-strands are sequentially numbered.


Figure 1.5 One-bead-one-compound peptide library split-and-pool synthesis. In the scheme, the resin is split into four different reaction vessels, followed by the coupling of four different amino acids (alanine (red), phenylalanine (green), (cyan), and arginine (purple)). Afterwards, all the resin is pooled into the larger reaction vessel and the alpha amines of the coupled amino acids are deprotected. The process is repeated as many times as desired.


Figure 1.6 Protein domain screening methods. (a) SA-AP/BCIP screening method, where a biotin-labeled protein domain recruits SA-AP to a binding library bead. (b) Fluorescence screening method, where a fluorophore-labeled protein domain binds to a library bead.


Figure 1.7 Peptide sequencing by partial Edman degradation-mass spectrometry (PED-MS). Shown below the PED scheme is an analysis of a MALDI-TOF mass spectrum, with the sequence AAHWPGGRGBBLM* (RGBBLLM* is the linker used in the library, M* is a lactone).

CHAPTER 2

DETERMINATION OF THE SEQUENCE SPECIFICITY OF THE SH2 DOMAINS OF PLCγ1 AND TSAd USING ONE-BEAD-ONE-COMPOUND pY LIBRARIES

2.1 Introduction

T cell specific adaptor protein (TSAd) contains a single SH2 domain, along with a C-terminal proline-rich region and several tyrosine phosphorylation sites (201). One role for TSAd is in modulating the function of the tyrosine kinase Lck in T cells by either inhibiting or mediating Lck phosphorylation of its substrates (202, 203, 204). TSAd is crucial to the formation of the G protein β subunit/Lck/ZAP-70 complex in T cells, which leads to T-cell migration (205). TSAd has also been shown to mediate T cell migration by its interaction with laminin binding protein (LBP), where it uses both its SH2 domain and proline-rich region to bind LBP (206). In endothelial cells, TSAd binds vascular endothelial growth factor receptor 2 (VEGFR-2), leading to both cell migration and actin stress fiber formation (207). TSAd also acts as a transcription factor in T cells, where its SH2 domain is required for both its nuclear import and for its ability to activate transcription (208).

Phospholipase C-Gamma 1 (PLCγ1) contains two PH domains, an EF hand domain, a C2 domain, a catalytic domain, two SH2 domains, and a single SH3 domain (209). T cell receptor (TCR) ligation leads to the recruitment of PLCγ1 to the T cell membrane by its interaction with the lipid raft-associated protein LAT (along with proteins associated with LAT) via its SH2 and SH3 domains (210). Once localized to the LAT complex, PLCγ1 is phosphorylated to become catalytically active (211), leading to the hydrolysis of phosphatidylinositol-4,5-bisphosphate (PIP2) into inositol-1,4,5-trisphosphate (IP3) and diacylglycerol (DAG). IP3 stimulates calcium release from ER stores while DAG activates protein kinase C and RasGRP (a Ras GEF) (212-214). The downstream effects of PIP2 hydrolysis are T cell activation/proliferation. PLCγ1 binds villin in epithelial cells to induce cell migration, using one of its SH2 domains (215). PLCγ1 is activated by a variety of growth factor receptor tyrosine kinases (RTKs), along with certain nonreceptor tyrosine kinases, which leads to PLCγ1 hydrolysis of PIP2 and generation of IP3 and DAG in those cells (216). The release of IP3 and DAG into these cells then helps to regulate cellular processes such as metabolism, growth, and secretion (217). PLCγ1 can also serve as a guanine nucleotide exchange factor (GEF) for dynamin and PIKE through its SH3 domain, which links PLCγ1 to the regulation of receptor-mediated endocytosis and cellular proliferation by acting on GTPases (218).

Along with its many roles in normally proliferating cells, PLCγ1 has been found to be overexpressed in certain cancers (219-221). Overexpression of PLCγ1 has been associated with cancer tumorigenesis, invasion, and metastasis. One report found that overexpression of just the SH2-SH2-SH3 portion of PLCγ1 led to tumorigenesis, highlighting the role PLCγ1 SH2/SH3 domains play in cell proliferation (222). In one study, knockdown of PLCγ1 inhibited lung cancer metastasis in mice, revealing the central role PLCγ1 plays in cancer metastasis.

Although protein binding partners have already been found for the SH2 domains of TSAd and PLCγ1, screening the SH2 domains to determine their peptide binding specificity could help identify novel binding partners or reveal how the SH2 domains bind to already known protein interacting partners. Although the pY peptide binding specificities of the TSAd and PLCγ1 SH2 domains have already been reported (166, 176), previous studies used either solution phase pool or OPAL libraries. They therefore could not read individual binding sequences or discover multiple classes of binding ligands. In order to obtain a better picture of the binding specificity of the SH2 domains of TSAd and PLCγ1, they were screened against several types of OBOC pY peptide libraries. Because the TSAd SH2 and the N-terminal SH2 of PLCγ1 were previously found not to recognize residues beyond the +3 position (relative to pY), they were screened against OBOC pY libraries which did not have random positions beyond the +4 position. The C-terminal SH2 domain of PLCγ1 was previously reported to recognize residues beyond the +4 position (223), so an OBOC pY peptide library with random positions up to the +5 position was used to screen the domain.

2.2 Experimental Procedures

2.2.1 Materials

Talon cobalt resin and glutathione resin were purchased from Clontech Laboratories (Palo Alto, CA) and Q sepharose resin was from GE Healthcare Life Sciences (Pittsburgh, PA). Sephadex G-25 resin was from Sigma Aldrich (St. Louis, MO). Fmoc/Boc-protected amino acids were from Advanced Chemtech (Louisville, KY), NovaBiochem (La Jolla, CA), and Peptides International (Louisville, KY). HBTU and HOBt were also from Peptides International. Alloc-OSu was purchased from Chem-Impex (Wood Dale, IL). EZ-Link NHS-chromogenic biotin was from Thermo Scientific (Rockford, IL). Solvents and chemical reagents were from Sigma Aldrich, Fisher Scientific (Pittsburgh, PA), and VWR (West Chester, PA). PITC was purchased from Sigma Aldrich in 1-mL sealed ampoules. SA-AP enzyme (1 mg/mL stock) was from Prozyme (Hayward, CA). Tentagel S NH2 resin (90 μm, 0.26 mmol/g, ~100 pmol/bead) was from Peptides International. BCIP was purchased from Sigma Aldrich. IPTG was purchased from Anatrace (Maumee, OH) and antibiotics were from Sigma Aldrich.

2.2.2 Expression, Purification, and Biotinylation of the SH2 Domains

The plasmid encoding the GST fusion SH2 domain of TSAd was kindly provided by the lab of Dr. Shawn Li (University of Western Ontario) (176). The plasmids encoding the N- and C-terminal SH2 domains of PLCγ1 were kindly provided by the lab of Dr. Gavin MacBeath (Harvard University) (both were in pET-32c(+) vectors). The vectors were transformed into E. coli R. BL21(DE3) cells and plated onto LB agar plates (with added ampicillin/chloramphenicol antibiotics). A 50 mL LB culture was inoculated with a single bacterial colony containing the SH2 plasmid and rotated at 37 °C for 14-16 hours. Afterwards, 10 mL of culture was added to 1 L of LB media and rotated at 37 °C until the OD600 was 0.6. Protein production was induced by the addition of 90 μM IPTG (for GST-TSAd) or 500 μM IPTG (for the PLCγ1 SH2 domains). For GST-TSAd, cells were rotated at 30 °C for 5 h, while PLCγ1 N-SH2 cells were rotated at 20 °C for 15 h and PLCγ1 C-SH2 cells at 18 °C for 16 h. The cells were then pelleted at 5000 rpm (15 minutes) at 4 °C, followed by resuspension of the cells in lysis buffer (20 mM HEPES, 150 mM NaCl, 1 mM β-mercaptoethanol, pH 7.4 for GST-TSAd and 20 mM HEPES, 300 mM NaCl, 5 mM imidazole, 1 mM β-mercaptoethanol, 0.5% Triton X-100, pH 8 for both SH2 domains of PLCγ1). Protease inhibitors were added to the lysis buffers (35 mg/L phenylmethanesulfonyl fluoride (PMSF), 20 mg/L soybean trypsin inhibitor, and 1 mg/L pepstatin). Cells were lysed by the addition of 1 mg/mL lysozyme (from chicken egg white), followed by sonication at 4 °C. Cell debris was removed by centrifugation at 15,000 rpm for 15 minutes. The supernatant was loaded onto either glutathione resin (for GST-TSAd) or Talon cobalt resin (for PLCγ1 SH2 domains), followed by washing of the resins with 200-250 mL of wash buffer (20 mM HEPES, 150 mM NaCl, 1 mM β-mercaptoethanol, pH 7.4 for glutathione resin and 20 mM HEPES, 300 mM NaCl, 5 mM imidazole, pH 8 for cobalt resin). Protein was then eluted with 20 mM HEPES, 150 mM NaCl, 10 mM glutathione, 10 mM β-mercaptoethanol, pH 7.4 (for GST-TSAd) or 20 mM HEPES, 300 mM NaCl, 125 mM imidazole, pH 7.4 (for PLCγ1 SH2 domains). For GST-TSAd, the protein was passed through a G-25 size exclusion chromatography (SEC) column to remove glutathione (using 30 mM HEPES, 150 mM NaCl, pH 7.4 buffer). Both PLCγ1 N- and C-SH2 domains were further purified by anion exchange chromatography (Q-sepharose resin, using 20 mM HEPES/pH 8 buffer). The SH2 domains were labeled by reaction with EZ-Link NHS-chromogenic biotin. Briefly, protein (at a concentration of at least 2 mg/mL) was diluted with 1 M NaHCO3 (pH 8.3) buffer so that it was labeled in 200 mM NaHCO3. Between 2 and 3 equivalents of EZ-Link NHS-chromogenic biotin were then added to the protein solution, which was rotated at 4 °C for 1 h. To remove unreacted biotin, the SH2 domains were then passed through G-25 columns using 30 mM HEPES/150 mM NaCl/pH 7.4 buffer. After confirming protein labeling by measuring chromogenic biotin absorbance at 354 nm, glycerol was added to the proteins so that the final protein solution contained 30% (v/v) glycerol. The proteins were then flash frozen in a dry ice/isopropanol bath and stored at -80 °C until use.

2.2.3 Library Synthesis

Library I (figure and table 2.1) was synthesized on 5 g of Tentagel S NH2 resin. In order to reduce the density of free amines on the bead surface, beads were soaked overnight in water. Afterwards, the water was drained and the resin was resuspended in 55:45 (v/v) DCM/diethyl ether containing 0.05 equiv Fmoc-Met-OSu and 0.45 equiv Boc-Met-OSu for 30 min (on a rotary shaker). The resin was then washed with the DCM/ether mixture and DMF (10x). The remainder of the free amines were reacted with 2 equiv of Fmoc-Met-OH/HBTU/HOBt and 4 equiv NMM for 90 min. The Boc group was removed by treatment of the beads with a 50:50 (v/v) TFA/DCM mixture, followed by capping of the resulting free amines with excess acetic anhydride (acetic anhydride/DMAP/NMM capping) for 30 min. The Met Fmoc group was then removed with 20% piperidine in DMF (5 min, then 15 min). The rest of the linker was coupled to the resin using standard Fmoc/HBTU/HOBt chemistry. The random positions of the library contained the 18 proteinogenic amino acids, plus L-norleucine (methionine replacement) and L-α-aminobutyric acid (cysteine replacement). Random positions of the library were added by first splitting the resin into 20 different reaction vessels (250 mg per vessel), followed by coupling the resin to 5 equiv Fmoc-amino acid/5 equiv HBTU/5 equiv HOBt and 10 equiv NMM for 1 hr (in DMF). This procedure was repeated once, followed by pooling the resin into a larger reaction vessel and removing the Fmoc protecting groups with 20% piperidine/DMF treatment. In order to differentiate isobaric amino acids, 5% (mol/mol) CD3CO2D was added to Lys and Leu coupling reactions, while 5% (mol/mol) CH3CD2CO2D was added to each norleucine coupling reaction (193). Amino acid side chain protecting groups were removed by modified reagent K treatment (7.5% phenol, 5% thioanisole, 5% H2O, 2.5% ethanedithiol, and 1% anisole). The resin was then exhaustively washed with TFA, DCM, 5% DIPEA in DMF, then DCM. The library was then dried under vacuum and stored at -20 °C.
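A rough estimate of how many beads library I contains follows from the resin specification given in section 2.2.1 (0.26 mmol/g loading, ~100 pmol/bead) and the 5 g of resin used. The sketch below only restates that arithmetic. The function name is hypothetical, and because the per-bead capacity is approximate, the results should be read as order-of-magnitude figures.

```python
def beads_per_gram(loading_mmol_per_g: float, capacity_pmol_per_bead: float) -> float:
    """Estimate bead count per gram from resin loading and per-bead capacity."""
    loading_pmol_per_g = loading_mmol_per_g * 1e9  # 1 mmol = 1e9 pmol
    return loading_pmol_per_g / capacity_pmol_per_bead


per_gram = beads_per_gram(0.26, 100)    # roughly 2.6 million beads per gram
library_beads = per_gram * 5            # 5 g of resin used for library I
screen_beads = per_gram * 0.010         # 10 mg of resin in a small-scale screen
resin_per_vessel_mg = 5000 / 20         # 5 g split across 20 reaction vessels

print(f"{per_gram:.2e} beads/g")
print(f"{library_beads:.2e} beads in 5 g")
print(f"{screen_beads:.2e} beads in a 10 mg screen")
print(f"{resin_per_vessel_mg:.0f} mg resin per vessel")
```

Under these assumptions the 5 g batch holds on the order of ten million beads, and the 250 mg per vessel stated in the procedure is consistent with splitting 5 g across 20 vessels.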

Library II was synthesized in essentially the same manner as library I, except that it contained one additional random position (C-terminal to pY) and it lacked L-α-aminobutyric acid in its random positions. For library III, five random positions were C-terminal to the pY residue. An alanine was then coupled to pY by using 6 equiv Fmoc-Ala/6 equiv HATU/12 equiv NMM (the first coupling was 1 hr in DMF, the second coupling was 1 hr in a 70:30 (v/v) DMF/DCM mixture). The Fmoc group of the alanine was removed, followed by capping of the free N-terminus with Alloc-OSu (5 equiv Alloc-OSu/5 equiv DIPEA) in a 1:1 (v/v) DMF/DCM mixture (1 hr). The library was then deprotected with modified reagent K, dried, and stored at -20 °C. It should be noted that libraries I-III were synthesized by Pauline Tan and Yanyan Zhang (The Ohio State University).

2.2.4 Library Screening

For small-scale screens, 10 mg of a pY library was added to a Bio-Rad micro BioSpin column (0.8 mL), swelled in DCM (10 min), washed with DMF (6x), washed with ddH2O (6x), and then incubated in blocking buffer (30 mM HEPES, 150 mM NaCl, 20 mM imidazole, 1 mM TCEP, 0.05% Tween 20, 0.1% gelatin, pH 7.4) for 4 hr. Afterwards, the blocking buffer was drained, followed by the addition of blocking buffer and protein (final biotin-protein concentration was 500 nM for GST-TSAd screens and 1 μM for screens of PLCγ1 SH2 domains). After being gently rotated overnight at 4 °C, the solution was drained the next day. Afterwards, the resin was resuspended in 1 mL SA-AP binding buffer (30 mM Tris-HCl, 250 mM NaCl, 10 mM MgCl2, 70 µM ZnCl2, 20 mM imidazole, 20 mM potassium phosphate, pH 7.4, 1 μg SA-AP enzyme) and gently rotated at 4 °C for 10 min. The solution was then drained, followed by three quick washing steps (once with 500 μL of SA-AP binding buffer, once with 500 μL of blocking buffer, and then once with SA-AP staining buffer (30 mM Tris-HCl, 100 mM NaCl, 5 mM MgCl2, 20 µM ZnCl2, 20 mM imidazole, 0.01% Tween 20, pH 8.5)). After the washing steps, resin was resuspended in 900 μL SA-AP staining buffer and transferred to one well of a 12-well BD Falcon plate. 100 μL of a BCIP solution (5 mg/mL BCIP in SA-AP staining buffer) was added to the 900 μL solution. Library resin was then shaken vigorously on a rotator until the development of turquoise-colored beads was observed under a dissecting microscope. The reaction was then quenched by the addition of 500 μL of 4 M HCl. Colored beads were then picked by micropipette under a microscope. Beads which were darker blue were separated from those with a lighter blue coloration. For large-scale screens, 30-100 mg of library was used instead of 10 mg. For TSAd SH2 screens, library I was used. Library II was used for PLCγ1 N-SH2, while library III was used for PLCγ1 C-SH2 screens.

Before beads selected from screens of library III could be sequenced, it was necessary to remove the Alloc protecting group from the N-terminal alanine. This was done by first placing the positive library hits into PED vessels (homemade glass vessels with a fine-porosity frit on the bottom; darker colored beads were placed in one PED vessel while the lighter colored beads were placed in another vessel). The beads were then washed with ddH2O (3x), DMF (3x), DCM (3x), and anhydrous THF (3x). Afterwards, 50 mg of triphenylphosphine was dissolved in 1 mL anhydrous THF. 7.5 μL of formic acid was added to the solution, which was then briefly vortexed. The solution was then placed on ice, followed by addition of 20.7 μL diethylamine. The solution was again briefly vortexed and placed on ice. The THF solution was then used to dissolve 23 mg of tetrakis(triphenylphosphine) palladium. Finally, 500 μL of the solution was added to the PED vessel with the darker colored library hits and 500 μL was added to the vessel with the lighter colored hits. Parafilm was placed over each PED vessel, along with aluminum foil. After an overnight reaction (at room temperature), the solution was drained from each vessel. The beads were then washed with THF (3x), DCM (3x), and DMF (3x), followed by incubation with 1% DIPEA in DMF (10 min). The solution was then drained and the beads were washed with DMF (3x), followed by the addition of a solution of 1% sodium dimethyldithiocarbamate hydrate in DMF. After a 10 min incubation, the solution was drained, and the beads were washed with DMF (3x) and pyridine (3x). The beads were then sequenced by PED-MS.

2.2.5 Peptide Sequencing

The sequences of positive library hits were determined by PED/MALDI-TOF MS, as previously described by our group (193). Briefly, beads selected from a screen were placed into PED vessels (the darker colored beads were placed into one PED vessel while the lighter colored beads were placed in a separate vessel). Next, the beads were washed with ddH2O (3x), DMF (3x), and pyridine (3x). The beads were then suspended in a 2:1 (v/v) pyridine/ddH2O solution (plus 0.1% triethylamine), followed by the addition of a 160 μL pyridine solution (containing 8.5 mM Fmoc-OSu and 589 mM PITC, a 69:1 molar excess of PITC to Fmoc-OSu). The vessel was then rotated at room temperature for 6 minutes. The pyridine/water solution was then drained, followed by washes with pyridine (3x), DCM (3x), and TFA (3x). Approximately 1 mL of TFA was then added to each vessel, followed by a 6 minute rotation (at room temperature). The beads were then drained and resuspended in 1 mL TFA, followed by another 6 minute rotation. Upon draining the TFA, the resin was washed in TFA (3x), DCM (3x), and pyridine (3x). The whole process was repeated 7-8 times (7 times for library III and 8 times for libraries I and II). After the final cycle, beads were washed with TFA (3x), DCM (3x), and DMF (3x). To each vessel, 1 mL of 20% piperidine in DMF was added. The vessels were rotated for 5 minutes, drained of the piperidine solution, and incubated for another 15 minutes with 20% piperidine in DMF. The beads were washed with DMF (3x) and then DCM (6x). Beads were then incubated for 20 minutes with a 1 mL TFA solution containing 25 mg ammonium iodide and 20 μL dimethyl sulfide (for this step, vessels were placed on ice). The solution was then drained, followed by washes with ddH2O (10x). Beads were then transferred from each PED vessel into a Petri dish and picked manually under a dissecting microscope. Each bead was placed into its own microcentrifuge tube and incubated overnight in a 20 μL CNBr solution (40 mg/mL CNBr dissolved in 70% TFA in water) in the dark. The following day, the CNBr solution was removed by use of a SpeedVac system. For MALDI-TOF MS analysis, the peptide ladder in each microcentrifuge tube was re-dissolved in 0.1% TFA in ddH2O. The ladders were then prepared for spotting by mixing 1 μL of a peptide ladder solution with 2 μL of an α-cyano-4-hydroxycinnamic acid solution (8 mg/mL α-cyano-4-hydroxycinnamic acid dissolved in a 50:50:0.1 acetonitrile/water/TFA solution). 1 μL of that mixture was then added to a spot on the MALDI-TOF plate.


2.2.6 SMALI Analysis of Sequences

Sequences selected by each SH2 domain were sorted in Microsoft Excel into groups following an observed pattern. Next, every sequence in a group was assigned a SMALI score (224) and the sequences were ranked according to their scores. For each position in a peptide sequence, each amino acid (20 for library I and 19 for libraries II and III) was given its own score using the following formulas:

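$$R_{i,p} = \frac{X_{i,p}}{\sum_{i=1}^{N} X_{i,p}}$$

$$W_t = \log_2 N - \left(-\sum_{i=1}^{N} R_{i,p} \log_2 R_{i,p}\right)$$

$$S_{i,p} = R_{i,p}\left(\log_2 N - \left(-\sum_{i=1}^{N} R_{i,p} \log_2 R_{i,p}\right)\right)$$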

Here, Xi,p is the number of times amino acid i occurs at position p. Xi,p is used to calculate Ri,p (the probability of amino acid i appearing at position p). Ri,p and N (the number of residues included in the random positions of the library) are used to calculate the weight (Wt) of each position. The weight of a position reflects the degree of selectivity seen at that position in selected sequences. The higher the degree of selectivity, the larger the weight. Finally, Si,p (the score of amino acid i at position p) is calculated by multiplying Ri,p by the weight of position p. The SMALI (Sm) score for a peptide is simply the sum of the Si,p values of the amino acids in a peptide sequence. A higher Sm score indicates that a given sequence is more preferred by an SH2 domain than a sequence with a lower score.
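The scoring scheme defined by the formulas above can be sketched as follows. This is an illustrative implementation of those three formulas only, not the software used in this work or in reference 224. The function names and the four toy sequences (covering the +1 to +3 positions) are hypothetical, and amino acids never observed at a position are given a score of zero.

```python
import math
from collections import Counter


def position_scores(sequences: list[str], n_residues: int):
    """Return (scores, weights): scores[p][aa] is S_{i,p}; weights[p] is Wt for position p."""
    scores, weights = [], []
    for p in range(len(sequences[0])):
        counts = Counter(seq[p] for seq in sequences)            # X_{i,p}
        total = sum(counts.values())
        freqs = {aa: c / total for aa, c in counts.items()}      # R_{i,p}
        entropy = -sum(r * math.log2(r) for r in freqs.values())
        weight = math.log2(n_residues) - entropy                 # Wt
        weights.append(weight)
        scores.append({aa: r * weight for aa, r in freqs.items()})  # S_{i,p}
    return scores, weights


def smali_score(peptide: str, scores: list[dict]) -> float:
    """Sum the per-position scores of a peptide; unobserved residues contribute zero."""
    return sum(scores[p].get(aa, 0.0) for p, aa in enumerate(peptide))


hits = ["DNL", "DNV", "ANL", "DNI"]  # hypothetical selected sequences (+1 to +3)
scores, weights = position_scores(hits, n_residues=20)

print([round(w, 2) for w in weights])      # the invariant middle position has the largest weight
print(round(smali_score("DNL", scores), 2))
print(round(smali_score("ANV", scores), 2))
```

In this toy set the middle position is invariant, so its weight reaches the maximum of log2(20), and a peptide built from the most frequent residues scores higher than one built from rarer residues.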

2.3 Results

2.3.1 Sequence Specificity of the TSAd SH2 Domain

The SH2 domain of TSAd selected for a single peptide consensus sequence when screened against library I, [(h)-(h/r)-pY-(A/D)-(N)-(l/v)] (lowercase letters represent residues that were only slightly preferred over other amino acids, while uppercase letters indicate residues that were heavily preferred at that position in the library; see figure 2.2 and table 2.2). TSAd SH2 clearly shows the highest degree of selectivity at the +1 and +2 positions, with a minor preference for basic residues N-terminal to pY and hydrophobic residues at the +3 position. A previous study of the binding specificity of TSAd SH2 reported a consensus of (H/E/P)-pY-(D/E/S)-(N) (172). In that study, the selectivity of the SH2 domain was most pronounced at the +1 and +2 positions, with D being the most preferred residue at the +1 position and N being heavily preferred at the +2 position. The results from the OBOC pY library screen of TSAd SH2 are therefore similar to the results obtained from the reported OPAL library screen, although the OBOC screens showed neither a significant selection of residues N-terminal to pY nor a selection of E/S at the +1 position.

2.3.2 Sequence Specificity of the PLCγ1 N- and C-terminal SH2 domains

The N-terminal SH2 domain of PLCγ1 was screened against library II and selected for three classes of sequences. Class I followed the general pattern of [(Y/r)-(x)-pY-(I/V/t/m)-(d/i/v)-(I/V/t/f/m)-(x)], where x indicates that little or no selectivity was seen at a position and M represents L-norleucine (see figure 2.3 and table 2.3). Class II was a minor class with the consensus [(x)-(x)-pY-(a)-(Y)-(Q)-(f/y)]. Class III sequences showed no obvious pattern. The C-terminal SH2 domain of PLCγ1 also selected for three classes of sequences when screened against library III: the first class showed a consensus of [pY-(I/V/t/l/m/w/f/y)-(d/e/n/r/w)-(v)-(x)-(x)], the second class showed a consensus of [pY-(I/V/y)-(K/r/h/p)-(P)-(g/p/h)-(r/v)], and the third class followed no obvious pattern (see figure 2.4 and table 2.4). For both SH2 domains, class I sequences were the most frequently selected in the screens. In the class I sequences of both the N- and C-terminal SH2 domains, I/V were the most preferred residues at the +1 position. Both domains selected for D at the +2 position and V at the +3 position. For PLCγ1 N-SH2, the -2, +1, and +3 positions were the most selective, while PLCγ1 C-SH2 only showed a strong selection for residues at the +1 position. The SH2 domains also differed in which types of class II sequences they selected.

One previous study examined the binding specificity of the N-terminal SH2 domain of PLCγ1 using a solution pool library (166). The solution pY library gave a consensus of [pY-(L/I/V)-(E/D)-(L/I/V)], where the greatest selectivity was seen at the +1 and +3 positions. Because the library used in that study lacked random residues N-terminal to pY, its authors could not have observed the selection for tyrosine at the -2 position seen in this study. As reported in the previous study, PLCγ1 N-SH2 selected for I/V at both the +1 and +3 positions in the class I sequences. It did not, however, strongly select for E/D at the +2 position. The previous study did not report any other consensus sequence.

Two previous studies examined the binding specificity of the C-terminal SH2 domain of PLCγ1, one using a solution pool library (166) and the other using an OPAL library (176). The solution library gave a consensus of [pY-(V/I/L)-(I/L)-(P/V/I)] and the OPAL consensus was [(x)-(x)-pY-(L/V/I/T)-(L/M)-(P/I/L)-(x)]. The consensus sequences reported by both studies are very similar to the class I sequences from this study at the +1 position (all show the selection of hydrophobic residues). Both previous studies reported that the greatest selectivity was observed at the +1 position, again consistent with this report. Like the class I sequences from this study, the solution pool library screens showed selection for V at the +3 position. In contrast to the previous reports, this study did not show a selection for hydrophobic (I/L/M) residues at the +2 position in class I sequences. In agreement with the OPAL library results, this study showed no significant selectivity for residues beyond the +3 position. This result seems to contradict a structural study of PLCγ1 C-SH2 bound to a peptide, which showed interactions between the SH2 binding pocket and amino acid side chains at the +4, +5, and +6 positions of the peptide (223).

Because the previous binding specificity studies used a solution library and an OPAL library, they could not determine individual binding sequences. As a result, they could not divide the selected sequences into separate binding classes. This may explain why both previous studies reported the selection of proline at the +3 position. Those studies grouped all sequences together into one class (including ones with proline at the +3 position), whereas this study separated the sequences containing P at the +3 position into a separate class (class II). Unlike the consensus sequences from the previous studies, the class II sequences show a significant selection for basic residues (H/R/K) at the +2 position. This selection indicates that the class II sequences may have a different binding mode than the class I sequences, which lacked any significant selection for residues at the +2 position. The discovery of two separate binding consensus motifs for both the N-terminal and C-terminal SH2 domains of PLCγ1 highlights the advantage of using OBOC libraries in determining protein domain binding specificity.

2.4 Discussion

In this study, the binding specificities of three SH2 domains were examined by screening the domains against OBOC pY peptide libraries. Although others have already reported the binding specificities of these domains, the peptide libraries they used to determine the specificities did not allow for individual sequence determination. By using OBOC pY libraries, sequences could be grouped into different classes based upon differences in their selection for residues in the random positions. This led to the discovery of multiple classes of selected peptides for the SH2 domains of PLCγ1 (summarized in table 2.5). Future studies will be necessary to confirm that the class II sequences of the PLCγ1 SH2 domains are indeed true binders (possibly by both structural and peptide binding affinity studies, along with the eventual determination of which protein binding partners bind to the PLCγ1 SH2 domains through the class II motifs). Those studies could also include selected class III sequences, which did not seem to follow a recognizable pattern.

The consensus binding motif of the TSAd SH2 domain reported in this study strongly suggests that in vivo binding partners of the SH2 domain of TSAd would most likely possess D/A residues at the +1 position and N at the +2 position (relative to a pY). In support of this, the SH2 domain of TSAd was found to interact with pY1214 of VEGFR2 (sequence: FHpYDNT) (225). The mouse version of TSAd (Lad) reportedly binds MEKK2 between amino acids 228-282 (226). This region contains a TD(Y)DNP motif and may therefore interact with TSAd SH2 (if phosphorylated). It is possible that the SH2 domain of TSAd mediates other protein-protein interactions that have yet to be discovered. A database search (http://www.phosphosite.org) was conducted to find other proteins that may bind to TSAd via its SH2 domain, using the search motif pY-[AD]-[N]-[LV] (table 2.6).

Based upon the class I sequence motifs selected by both SH2 domains of PLCγ1, both should be able to bind proteins at sites containing a hydrophobic residue at the +1 position (especially I or V). Of those sequences, PLCγ1 N-SH2 would bind motifs with tyrosine at the -2 position and hydrophobic residues at the +3 position with the highest affinity, while PLCγ1 C-SH2 would not have a strong preference for residues C-terminal to the +1 position. In support of this model, both SH2 domains of PLCγ1 can bind to a pY132 peptide from the human LAT protein (PGpYLVV) (227) and to a pY1068 peptide from EGFR (PEpYINQ) (228). Compared with the class I sequences, each PLCγ1 SH2 domain should be more specific for its class II sequences. The N-SH2 of PLCγ1 should bind better to pYAYQ-type sequences than C-SH2, while the C-SH2 domain should interact better with pY(I/V)KP-type sequences than the N-SH2 domain. A literature search yielded no proteins known to bind to PLCγ1 through motifs matching either class II sequence, suggesting that these motifs may belong to undiscovered binding partners of the PLCγ1 SH2 domains. As for TSAd SH2, a database search was conducted to find potential binding partners of the SH2 domains of PLCγ1 (also with http://www.phosphosite.org). The class I and II motifs for PLCγ1 N-SH2 and the class II motif for PLCγ1 C-SH2 were used for the searches (tables 2.7-2.9). The class I motif for PLCγ1 C-SH2 was not used because it was too broad to serve as a search motif.
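The motif searches described in this section can be illustrated with regular expressions. The sketch below is not the PhosphoSite search interface, which is not modeled here; it only shows how the four search motifs quoted in the text and in the footnotes to tables 2.6-2.9 translate into patterns. The dictionary `MOTIFS`, the function `matching_domains`, and the convention of writing the phosphotyrosine as "pY" inside a plain string are assumptions made for this example.

```python
import re

# Search motifs as quoted in the text and table footnotes. "pY" marks the
# phosphotyrosine and [A-Z] stands in for X (any residue).
MOTIFS = {
    "TSAd SH2": r"pY[AD]N[LV]",
    "PLCg1 N-SH2 class I": r"Y[A-Z]pY[IV][A-Z][IV]",
    "PLCg1 N-SH2 class II": r"pY[A-Z]YQ",
    "PLCg1 C-SH2 class II": r"pY[IVY][HKR]P",
}


def matching_domains(site):
    """Return the names of all motifs found anywhere in a site string."""
    return [name for name, pattern in MOTIFS.items() if re.search(pattern, site)]


# Example sites taken from the text and tables of this chapter.
for site in ["GLpYDNL", "FHpYDNT", "YDpYIDV", "CApYVYQ", "GTpYVKP"]:
    print(site, matching_domains(site))
```

Note that the VEGFR2 site FHpYDNT cited above satisfies the +1 and +2 preferences of TSAd SH2 but carries T at the +3 position, so the strict pY-[AD]-[N]-[LV] search motif does not return it. A search motif is therefore a filter for likely candidates and not a complete description of what a domain can bind.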

The binding specificities of the TSAd and PLCγ1 SH2 domains were determined by screening the domains against OBOC pY peptide libraries. Future studies should be able to use the data from this study to help identify the protein binding partners of these SH2 domains, along with the pY motifs recognized by each domain. This will lead to a better understanding of the cellular roles of TSAd and PLCγ1.

2.5 Acknowledgements

The plasmid encoding the SH2 domain of TSAd was provided by the lab of Dr. Shawn Li (University of Western Ontario), while the PLCγ1 SH2 plasmids were from the lab of Dr. Gavin MacBeath (Harvard University). The OBOC pY peptide libraries used to screen TSAd and PLCγ1 SH2 domains were synthesized by Pauline Tan and Yanyan Zhang of the Pei lab at The Ohio State University.

Table 2.1 Libraries used for the SH2 screens^a

Library I:    AAXXpYXXXLNBBRM (reduced density)
Library II:   AXXpYXXXXLNBBRM (reduced density)
Library III:  Alloc-ApYXXXXXLNBBRM (reduced density)

^a For library I, X represents 18 proteinogenic amino acids plus L-α-aminobutyric acid (cysteine replacement) and L-norleucine (methionine replacement). Libraries II and III lack L-α-aminobutyric acid in their random positions. B, β-alanine; Alloc, N-allyloxycarbonyl protecting group.

Table 2.2 Peptides selected from TSAd SH2 screens^a

Peptide   SM     Peptide   SM
TRpYDNV   3.82   NIpYCNP   3.08
VRpYDNF   3.79   HCpYHNM   3.07
QTpYDNL   3.78   XXpYCNR   2.97
QLpYDNL   3.71   FHpYXNX   2.91
AEpYDNL   3.71   HXpYXNW   2.90
VRpYANI   3.70   XSpYALL   1.09
MTpYDNC   3.69   KMpYDMV   1.07
MRpYANX   3.69   MTpYAIY   1.01
CYpYDNV   3.69   GHpYDXX   0.90
HVpYANK   3.69   WGpYDXX   0.89
HIpYANN   3.68   AFpYDXX   0.83
HKpYANY   3.66   LCpYAXX   0.79
WNpYDNP   3.65   FGpYTLV   0.51
KGpYANK   3.64   IHpYQYS   0.35
NTpYANF   3.64   WRpYXXX   0.22
CYpYANN   3.59   FRpYXXX   0.22
LHpYCNL   3.22   HPpYSXX   0.21
HHpYGNK   3.12   RRpYXXX   0.19
IRpYHNH   3.11

^a C, L-α-aminobutyric acid; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of the peptide sequence. TSAd SH2 was screened against library I.

Table 2.3 Peptides selected from PLCγ1 N-SH2 screens^a

Class I (37 Sequences)
Peptide    SM     Peptide    SM
YDpYIVIR   1.53   XXpYVIVI   0.90
YPpYIDIA   1.52   XXpYVVIE   0.89
YDpYIDTT   1.46   IYpYTVVS   0.89
YQpYVIIQ   1.38   RRpYTVTM   0.84
YVpYIDFT   1.36   RDpYYYVP   0.76
YDpYINTR   1.32   YKpYFSMP   0.75
YYpYTDVY   1.22   PYpYTIMT   0.74
XXpYIDIW   1.15   IApYIDFD   0.66
RRpYIWVE   1.14   VYpYLQID   0.66
YQpYVMFW   1.08   SKpYFVTR   0.62
SEpYVIIG   1.08   IIpYADFW   0.61
GEpYIRIQ   1.08   HApYIAVI   0.61
DWpYIITF   1.07   MSpYTYFS   0.59
TQpYIIMH   1.04   KGpYVMAV   0.58
FHpYVDVF   0.99   RHpYMAYG   0.58
YRpYMYMN   0.94   DEpYLEMY   0.53
YEpYVTAS   0.92   VGpYMNYR   0.50
HTpYMIIR   0.92   XXpYYETV   0.44
RQpYMEVQ   0.91

Class II (5 Sequences)
Peptide    SM
RApYAYQF   9.44
PMpYYYQF   8.98
AFpYHIQY   7.83
XWpYIYQI   7.55
XXpYANQY   6.84

Class III (36 Sequences)
Peptide    Peptide
XXXXMII    PQpYRYMY
XXXIXXX    PRpYINGN
XVpYFMHA   NEpYQMVS
XWpYEXXX   IRpYVDXX
XYpYEIVY   ITpYHFVI
YTpYVLPR   HQpYVFXX
YPpYRYRA   HEpYQEVY
YMpYHQPY   HSpYMMPS
YRpYGPSF   HVpYHMIQ
WQpYLXXX   HApYHTFA
VDpYXXXX   HTpYGVYR
VVpYYIRI   GVpYIMXX
VIpYKFFR   EDpYLYXX
VYpYHMVI   DGpYIIXX
TVpYNNFS   ATpYHXXX
RYpYTWKY   PMpYYYQF
RKpYNSIH   PRpYYMHY
QHpYHMTI   PIpYVGHW

^a M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of the peptide sequence; Class I, sequences of the (Y/r)-(x)-pY-(I/V/t/m)-(d/i/v)-(I/V/t/f/m)-(x) consensus; Class II, sequences of the (x)-(x)-pY-(a)-(Y)-(Q)-(f/y) consensus; Class III, sequences that did not follow any observable pattern. PLCγ1 N-SH2 was screened against library II.

Table 2.4 Peptides selected from PLCγ1 C-SH2 screens^a

Class I (96 Sequences)
Peptide   SM     Peptide   SM
pYVEVHQ   0.39   pYTVFHL   0.19
pYVDNYH   0.38   pYTLIYT   0.19
pYVPMYD   0.38   pYWDMYK   0.19
pYVRTYS   0.37   pYTRHWN   0.19
pYVDYIE   0.37   pYWNFMD   0.19
pYVNTNQ   0.37   pYTVISV   0.19
pYVGVEA   0.37   pYWWMQG   0.19
pYVETAV   0.37   pYLNYQA   0.18
pYVWSSI   0.37   pYWRTFA   0.18
pYVKFKH   0.37   pYTPNWW   0.18
pYVEWES   0.37   pYWQFIH   0.18
pYVGFSK   0.36   pYLPYIH   0.18
pYVLMMP   0.36   pYWESSF   0.18
pYVEWTI   0.36   pYLDRNL   0.18
pYVQHYW   0.36   pYWTVMH   0.18
pYVLFGD   0.36   pYLSVAL   0.18
pYVMSQA   0.36   pYWNRVY   0.17
pYVQKSE   0.36   pYLFVLE   0.17
pYVVHNQ   0.36   pYWIYVG   0.17
pYVMGPQ   0.36   pYMDMHV   0.16
pYVFLLH   0.36   pYLKWEM   0.16
pYVHHYW   0.35   pYWIDAT   0.16
pYVAADH   0.35   pYLYRAQ   0.16
pYVAANL   0.35   pYMWEMG   0.15
pYVXXXX   0.32   pYMENFE   0.15
pYIDLHE   0.26   pYMNYTS   0.15
pYINEWG   0.26   pYYWVKG   0.15
pYINEAA   0.26   pYMMEPN   0.14
pYIRNYN   0.26   pYMQHIH   0.14
pYIDLLL   0.25   pYYEIHD   0.13
pYIWKWT   0.25   pYMAKSR   0.13
pYINDTA   0.25   pYMRDXX   0.13
pYIGVQK   0.25   pYLXXXX   0.13
pYIPWFY   0.25   pYLXXXX   0.13
pYIATYI   0.25   pYYPTKH   0.13
pYIMRGG   0.25   pYYYVPD   0.12
pYIEQKF   0.24   pYFDIMG   0.12
pYIWKRF   0.24   pYYQIDD   0.12
pYIVSVG   0.24   pYFVLHA   0.12
pYIWXXX   0.23   pYYIGWK   0.11
pYIPXXX   0.22   pYFMLEN   0.11
pYTWVIT   0.20   pYFLLET   0.10
pYTNFSF   0.20   pYFIRKE   0.10
pYTLVHV   0.20   pYARYFN   0.09
pYTWIKP   0.19   pYARWFK   0.09
pYLDVHM   0.19   pYSRIFY   0.08
pYTMFPH   0.19   pYAAGWF   0.07
pYTNEAP   0.19   pYSSWFM   0.06

Class II (11 Sequences)
Peptide   SM
pYVKPHV   5.80
pYVHPGV   5.60
pYIHPHR   5.60
pYVPPPQ   5.50
pYIPPEK   5.38
pYMKPFR   5.35
pYIEPGN   5.30
pYFKPRI   5.24
pYYRPLP   5.21
pYYYPPA   5.13
pYWRPAG   5.04

Class III (12 Sequences)
Peptide
pYESVSQ
pYEFEYT
pYEMTMF
pYHEIRM
pYHYDMR
pYHQIGR
pYHFARK
pYHYKTW
pYHYIKR
pYQYVRG
XXWKK
XXXXW

^a M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of the peptide sequence; Class I, sequences of the pY-(I/V/t/l/m/w/f/y)-(d/e/n/r/w)-(v)-(x)-(x) consensus; Class II, sequences of the pY-(I/V/y)-(K/r/h/p)-(P)-(g/p/h)-(r/v) consensus; Class III, sequences that did not follow any observable pattern. PLCγ1 C-SH2 was screened against library III.

Table 2.5 Summary of the sequence specificities of the SH2 domains^a

TSAd SH2:      (h)-(h/r)-pY-(A/D)-(N)-(l/v)

PLCγ1 N-SH2:   (Y/r)-(x)-pY-(I/V/t/m)-(d/i/v)-(I/V/t/f/m)-(x) (Class I)
               (x)-(x)-pY-(a)-(Y)-(Q)-(f/y) (Class II)

PLCγ1 C-SH2:   pY-(I/V/t/l/m/w/f/y)-(d/e/n/r/w)-(v)-(x)-(x) (Class I)
               pY-(I/V/y)-(K/r/h/p)-(P)-(g/p/h)-(r/v) (Class II)

^a Uppercase letters indicate a strong preference for an amino acid, while lowercase letters indicate a slight preference for an amino acid at a given position. M, L-norleucine; X, no significant selectivity. The SH2 domains of PLCγ1 also selected for class III sequences, but they did not show a binding consensus.

Table 2.6 Potential TSAd SH2 protein interaction partners^a

Protein        Phosphotyrosine Site   Motif
AFAP1 (iso3)   Y537                   GLpYDNL
ALS2CR11       Y129                   QHpYANL
ARHGAP12       Y243                   PVpYANL
BEST1          Y131                   IRpYANL
C12orf35       Y95                    ITpYANV
Cas-L          Y12                    ALpYDNV
CCDC63         Y192                   AApYDNV
CENTD1         Y77                    PIpYANV
CLIC2          Y239                   NTpYANV
CSPG5          Y507                   AHpYDNV
CTBP2          Y108                   SGpYDNV
Dok2           Y402                   TEpYDNV
DPY19L2        Y602                   QGpYANL
ERK2           Y43                    SApYDNV
GJB4           Y117                   SLpYDNL
Jun            Y170                   PVpYANL
KCC1           Y17                    GDpYDNL
LIMD1          Y180                   DYpYDNL
MAIR-I         Y231                   LHpYANL
MTMR3          Y614                   RSpYDNL
MYPT1          Y68                    INpYANV
NT5C1A         Y34                    IFpYDNL
P130Cas        Y12                    ALpYDNV
PATL1          Y401,Y566              DPpYANL/SMpYDNL
PDK1           Y376                   GNpYDNL
PTPRD          Y1386                  NRpYANV
PTPRF          Y1381                  NRpYANV
PTPRS          Y1422                  NRpYANV
RAB1B          Y96                    ESpYANV
RICS           Y2023                  SQpYDNL
SgK269         Y616                   NApYDNL
SHANK3         Y932                   SPpYANL
SHARP          Y2877                  AGpYANV
SIRT2          Y104                   GLpYDNL
SNX18          Y78                    ARpYANV
SPTA1          Y1499                  GDpYANL
StARD13        Y459                   SIpYDNV
TAFII31        Y261                   DDpYDNL
THOC2          Y590                   QKpYDNL
USP6NL         Y551                   SQpYDNV
XYLT2          Y284                   QGpYDNV
ZDHHC5         Y533                   VRpYDNL
ZDHHC8         Y541                   VRpYDNL
ZNF406         Y1022                  EEpYANV

^a List of proteins from Phosphosite search using Y-[AD]-[N]-[LV] as search criteria (http://www.phosphosite.org). Listed phosphotyrosine sites are tyrosines suspected to be phosphorylated in vivo.

Table 2.7 Potential PLCγ1 N-SH2 protein interaction partners (class I)^a

Protein    Phosphotyrosine Site   Motif
AFAP       Y453                   YDpYIDV
LEMD2      Y438                   YPpYVGI
MVP        Y15                    YHpYIHV
NUDT15     Y92                    YHpYVTI
PCOLCE2    Y212                   YDpYVAV
SLCO3A1    Y626                   YLpYVSI

^a List of proteins from Phosphosite search using Y-X-Y-[IV]-X-[IV] as search criteria (http://www.phosphosite.org).

Table 2.8 Potential PLCγ1 N-SH2 protein interaction partners (class II)^a

Protein        Phosphotyrosine Site   Motif
AKR1B10        Y47                    CApYVYQ
AKR1D1         Y56                    GApYIYQ
ALK            Y1584                  VNpYGYQ
CBX4           Y150                   GKpYYYQ
CENTD2         Y343                   GSpYIYQ
CNNM1          Y660                   EHpYLYQ
CNP            Y141                   DQpYQYQ
CYLD           Y713                   DCpYFYQ
DAZ1           Y685                   PVpYNYQ
DAZ2           Y355                   PVpYNYQ
FAM125B        Y241                   TDpYEYQ
FLG            Y3952                  SSpYHYQ
GANC           Y254                   DVpYGYQ
GPATC2         Y199                   RApYQYQ
GSK3B          Y161                   KLpYMYQ
Helicase B     Y102                   RSpYQYQ
IRAK2          Y4                     ACpYIYQ
IRX5           Y7                     QGpYLYQ
KDELC2         Y367                   FKpYKYQ
KLC1           Y360                   VEpYYYQ
Kv1.3          Y161                   ILpYYYQ
LARP2          Y174                   YSpYGYQ
LKAP           Y1501                  HTpYHYQ
MAGE-B10       Y124                   LLpYKYQ
MAP1B          Y1870                  FSpYAYQ
MEA1           Y68                    AGpYSYQ
MTAC2D1        Y131                   PFpYMYQ
Muc5ac iso1    Y1467                  LCpYNYQ
Myogenin       Y9                     SPpYFYQ
NPDC1          Y249                   EMpYHYQ
NPM-ALK        Y644                   VNpYGYQ
PFAAP5         Y535                   DRpYEYQ
RICS           Y2005                  VLpYQYQ
RNF148         Y130                   IIpYNYQ
RPN1           Y264                   SRpYDYQ
SLC27A6        Y80                    DIpYTYQ
SMAD2          Y165                   NPpYHYQ
SMAD3          Y125                   NPpYHYQ
SORBS1         Y536                   SIpYEYQ
SOX10          Y171                   PDpYKYQ
SOX8           Y169                   PDpYKYQ
SOX9           Y172                   PDpYKYQ
SPTAN1         Y976                   ALpYDYQ
SYTL2 iso17    Y634                   RKpYTYQ
SYTL2 iso7     Y110                   RKpYTYQ
Tensin1        Y796                   SPpYDYQ
WDR1           Y96                    LKpYEYQ
ZNF512B        Y124                   LKpYHYQ

^a List of proteins from Phosphosite search using Y-X-Y-Q as search criteria (http://www.phosphosite.org).

Table 2.9 Potential PLCγ1 C-SH2 protein interaction partners (class II)^a

Protein     Phosphotyrosine Site   Motif
ACOX1       Y265                   GTpYVKP
BLNK        Y189                   ENpYIHP
C4orf37     Y124                   PApYYKP
C5orf49     Y44                    SYpYYRP
C6orf167    Y925                   LKpYIKP
CABLES2     Y334                   IEpYVKP
CEP192      Y1129                  PEpYVKP
CEP76       Y323                   CSpYVKP
Clca1       Y91                    ADpYVRP
Diminuto    Y299                   GNpYYKP
EXOC3L2     Y235                   VEpYVRP
GMD         Y323                   LKpYYRP
GPX1        Y98                    LKpYYRP
HIGD1C      Y86                    KDpYIRP
hnRNP R     Y435                   DYpYYHP
HSP27       Y54                    PGpYVRP
INTS6       Y269                   LIpYVRP
KIAA1143    Y9                     VSpYVRP
MTMR12      Y23                    VSpYVRP
Nck1        Y268                   CDpYIRP
OR10C1      Y258                   FIpYIRP
OGDH        Y968                   YDpYVKP
OR6C74      Y257                   FMpYVKP
OR9A4       Y259                   FLpYVKP
p27Kip1     Y88                    EFpYYRP
PHF16       Y733                   QCpYVKP
PSMD11      Y72                    LKpYVRP
PTPRCAP     Y64                    GGpYYHP
SHANK2      Y1201                  GNpYVHP
SND1        Y421                   VDpYIRP
SPATA2      Y242                   KDpYYKP
STAT5A      Y694                   DGpYVKP
STAT5B      Y699                   DGpYVKP
TBX21       Y220                   RLpYVHP
Titin       Y33006                 DFpYYRP

^a List of proteins from Phosphosite search using Y-[IVY]-[HKR]-[P] as search criteria (http://www.phosphosite.org).

Figure 2.1 Synthesis scheme of the OBOC phosphotyrosine peptide library^a

[Scheme: composition of the bead sites after each step. Starting resin: H2N. After (a-b): 45% Boc-Met, 5% Fmoc-Met, 50% H2N. After (c): 45% Boc-Met, 5% Fmoc-Met, 50% Fmoc-Met. After (d): 45% Ac-Met, 5% Fmoc-Met, 50% Fmoc-Met. After (e-f): 45% Ac-Met, 5% H2N-AAXXpYXXXLNBBRM, 50% H2N-AAXXpYXXXLNBBRM.]

^a Shown is the synthesis scheme for library I: (a) soak in ddH2O overnight; (b) 0.45 equiv Boc-Met-NHS and 0.05 equiv Fmoc-Met-NHS in 55:45 (v/v) DCM/Et2O; (c) Fmoc-Met-OH and HBTU; (d) 50:50 (v/v) TFA/DCM then Ac2O/NMM/DMAP; (e) standard Fmoc/HBTU chemistry; (f) modified reagent K. Library synthesized on 90 μm TentaGel S NH2 resin (0.26 mmol/g loading capacity).

Figure 2.2 Sequence specificity of the TSAd SH2 domain^a

^a Sequence specificity is shown from the N-terminus (-2 position) to the C-terminus (+3 position). The y-axis represents percent occurrence of each amino acid while the x-axis represents each amino acid in the random positions. C, L-α-aminobutyric acid; M, L-norleucine.

Figure 2.3 Sequence specificity of the PLCγ1 N-SH2 domain (class I sequences)^a

^a Sequence specificity is shown from the N-terminus (-2 position) to the C-terminus (+4 position). The y-axis represents percent occurrence of each amino acid while the x-axis represents each amino acid in the random positions. M, L-norleucine.

Figure 2.4 Sequence specificity of the PLCγ1 C-SH2 domain (class I sequences)^a

^a Sequence specificity is shown from the N-terminal random position (+1) to the C-terminal random position (+5). The y-axis represents percent occurrence of each amino acid while the x-axis represents each amino acid in the random positions. M, L-norleucine.

CHAPTER 3

DISTINCT LIGAND SPECIFICITY OF THE TIAM1 AND TIAM2 PDZ DOMAINS1

3.1 Introduction

The T-cell lymphoma invasion and metastasis 1 (Tiam1) protein and its homologue, Tiam2, also known as STEF (SIF and Tiam1-like exchange factor), are guanine nucleotide exchange factor proteins that specifically activate the Rho family GTPase Rac1 (229, 230). Tiam1 is important for the integrity of adherens junctions (231, 232), tight junctions (233, 234), and cell-matrix interactions (235, 236). In addition, Tiam1 is involved in axon formation (237) and neurite outgrowth (238-241). Deregulation of Tiam1 has been implicated in invasive and metastatic forms of lung (242) and colorectal (243) cancer and may be a predictor of poor patient outcome for renal cell (244), prostate (245), and hepatocellular (246) carcinomas. Tiam2 has been shown to be important for focal adhesion disassembly (247), neuronal development, and neurite growth (248, 249), yet its role in disease has not been established. In mammalian cells, the spatial and temporal regulation of Tiam-like proteins is achieved via distinct protein-protein interaction domains. However, the mechanisms that regulate each of these interaction domains are not fully understood.

1 Reproduced from Shepherd, T., Hard, R., Murray, A., Pei, D., and Fuentes, E. Biochemistry 50, 1296-1308, Copyright 2010 American Chemical Society.

Tiam1 and Tiam2 are both composed of a set of diverse interaction domains, including a Pleckstrin homology-coiled-coil-extension (PHn-CC-Ex) cassette that is important for subcellular localization, a Ras binding domain (RBD) known to bind activated Ras, a Dbl homology-Pleckstrin homology (DH-PHc) catalytic bidomain that activates Rac1, and a post-synaptic density-95/discs large/zonula occludens-1 (PDZ) domain that binds cell adhesion molecules (Figure 3.1A). PDZ domains are ubiquitous protein-protein binding domains found in bacteria and eukaryotes (250, 251) that typically interact with carboxy-terminal residues 4-10 of partner proteins to form higher-order signal transduction complexes. Recent studies identified Syndecan1 (236) and CADM1 (252) as PDZ domain-binding proteins for the Tiam1 PDZ domain, but to date, no binding partners have been reported for the Tiam2 PDZ domain. Although other interaction domains have a high degree of sequence identity across the Tiam family of proteins, the Tiam1 and Tiam2 PDZ domains are only ~28% identical (Figure 3.1A and B). This suggests potential differences in ligand specificity and biological function.

Previously, we determined the structure of the Tiam1 PDZ domain free and bound to a model peptide ligand (236). This structure shows that the peptide binding cleft is formed by β2 and α2 of the PDZ domain, and that ligand specificity is partially derived from two pockets (S0 and S-2, S for site) (Figure 3.1C and D). The S0 pocket is formed by the side chains of F860, Y858, L915, and L920 and accommodates the side chain of the most C-terminal residue of the ligand (P0, where P-n denotes the residue position n amino acids from the C-terminus). Residues L911 and K912 form the S-2 pocket that accommodates the side chain of the ligand at P-2. NMR-based titrations of the Tiam1 PDZ domain with C-terminal eight-residue peptides that differ primarily at position P0 (Syndecan1 and Caspr4) showed that the chemical shifts of L911, K912, and L915 were significantly perturbed upon binding of the Syndecan1 peptide, but not upon binding of the Caspr4 peptide. In contrast, the chemical shift of L920 was not perturbed upon binding of Syndecan1 but was shifted upon binding of the Caspr4 peptide (236). These data suggest that residues L911, K912, L915, and L920 are determinants for Tiam1 PDZ domain specificity. Notably, these residues are not conserved between the Tiam1 and Tiam2 PDZ domains (Figure 3.1B), further suggesting that there might be differences in Tiam1 and Tiam2 PDZ domain specificity.

The specificity of the Tiam1 and Tiam2 PDZ domains for C-terminal ligands has been previously investigated, but the results from these studies appear to be inconsistent. Songyang et al. (167) used immobilized PDZ domains to select binding peptides derived from a synthetic, randomized peptide library and determined that the Tiam1 PDZ domain had a preference for ligands with Ala and Phe at position P0. Using phage display, Tonikian et al. (9) determined the specificity for approximately half of the PDZ domains found in humans, including the Tiam1 and Tiam2 PDZ domains. The consensus sequence determined by Tonikian et al. (9) for the Tiam1 PDZ domain was in reasonable agreement with that found by Songyang et al. (167). For the Tiam2 PDZ domain, however, there is an apparent inconsistency among the reported specificities. Tonikian et al. (9) reported a clear preference for Val at position P0, whereas Stiffler et al. (179) identified Caspr4, which contains a C-terminal Phe, as a Tiam2 PDZ domain-binding protein using a fluorescence-based protein microarray assay. These discrepancies prompted us to conduct a comprehensive study comparing the specificity of the Tiam1 and Tiam2 PDZ domains. We screened a combinatorial peptide library to determine the consensus binding sequence for each PDZ domain and found they possess overlapping but distinct specificities. These differences were corroborated by binding assays using several physiologically relevant ligands and resolve the apparent inconsistency regarding the specificity of the Tiam2 PDZ domain. Furthermore, we identified four residues in the Tiam1 PDZ domain that are crucial for the differences in specificity and investigated their thermodynamic origin by site-directed mutagenesis and double-mutant cycle analyses. Finally, mutation of these four residues in the Tiam1 PDZ domain was sufficient to switch its specificity to that of the Tiam2 PDZ domain.

3.2 Experimental Procedures

3.2.1 Expression, Purification, and Labeling of the PDZ Domains

The Tiam1 PDZ domain expression plasmid has been described previously (236). All Tiam1 PDZ domain mutants (single, double, and quadruple) were produced using oligonucleotide-directed mutagenesis (QuikChange, Stratagene) with the wild-type or mutant Tiam1 PDZ domain DNA as a template. All mutants were verified by automated DNA sequencing (University of Iowa DNA Facility). The Tiam2 PDZ domain expression plasmid was constructed by amplification of the DNA sequence encoding the PDZ domain (residues 809-994) from the full-length mouse Tiam2 sequence using polymerase chain reaction (PCR) (253). The amplified PDZ domain DNA was ligated into a modified pET21a vector (Novagen) that contains an N-terminal His6 tag and a tobacco etch virus (rTEV) protease cleavage site. The nucleotide coding sequence of the pET21a-Tiam2 PDZ vector was verified by automated DNA sequencing (University of Iowa DNA Facility). Glutathione S-transferase (GST)-fused Tiam1 and Tiam2 PDZ domains were constructed by subcloning the PDZ domain-encoding DNA into a modified pGEX vector (GE Healthcare).

All proteins were produced in BL21(DE3) (Invitrogen) cells. Typically, E. coli cells were grown at 37 °C in Luria-Bertani (LB) medium supplemented with ampicillin (100 μg/mL) under vigorous agitation until an A600 of 0.6-1.0 had been reached. Cultures were subsequently cooled to 25 °C, and protein expression was induced by the addition of isopropyl 1-thio-D-galactopyranoside to a final concentration of 1 mM. Induced cells were incubated for an additional 6-8 h at 25 °C and harvested by centrifugation.

The histidine-tagged Tiam1 and Tiam2 PDZ domains were purified by nickel-chelate and size-exclusion (G-50 or S-75) chromatography (GE Healthcare). The N-terminal His6 affinity tag was removed by incubating the histidine-tagged PDZ protein with recombinant rTEV protease for 12-16 h at room temperature. Undigested fusion protein, cleaved His6 tag, and histidine-tagged rTEV were separated from the digested PDZ domain by nickel-chelate chromatography. The final yield was ~20 mg of PDZ protein from 1 L of culture, at ~95% purity as judged by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Samples were used immediately or stored at -20 °C. GST-Tiam1 and GST-Tiam2 PDZ domain fusion proteins were purified by standard affinity chromatography using glutathione Sepharose 4B medium (GE Healthcare) followed by size-exclusion (S-75) chromatography. The concentration of the protein in solution was determined by measuring the UV absorbance at 280 nm using the extinction coefficient of the protein calculated from the protein sequence (SEDNTERP version 1.09).
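The concentration calculation described above follows the Beer-Lambert law, and a minimal sketch is given here for illustration. It does not reproduce SEDNTERP; it assumes the commonly used per-residue values at 280 nm (Trp 5500, Tyr 1490, and cystine 125 M^-1 cm^-1), which may differ from the values SEDNTERP applies, and the function names and the example sequence are hypothetical.

```python
def molar_extinction_280(sequence, n_cystines=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumes widely used residue contributions: Trp 5500, Tyr 1490, cystine 125.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon <= 0:
        raise ValueError("extinction coefficient must be positive")
    return a280 / (epsilon * path_cm)


# Hypothetical sequence with one Trp and two Tyr residues.
eps = molar_extinction_280("MKWAYGY")
print(eps, molar_concentration(0.848, eps))
```

A sequence without Trp, Tyr, or cystine gives an extinction coefficient of zero under these assumptions, in which case absorbance at 280 nm cannot be used and the function raises an error.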

GST-Tiam1 and GST-Tiam2 PDZ domain proteins (≥ 2 mg/mL) were labeled with biotin by incubation with 2 equiv of (+)-biotin N-hydroxysuccinimide ester (Sigma) in 0.1 M NaHCO3 (pH 8.4) for 30 min. The reaction was quenched by the addition of 50 μL of 1 M Tris buffer (pH 8.4). The reaction mixture was passed through a G-25 size-exclusion column (eluted with phosphate-buffered saline) to remove any free biotin. The proteins were also labeled with Texas Red (Texas Red-X succinimidyl ester, Invitrogen) in a similar manner (2 equiv of Texas Red, 45 min reaction, and quenching with 5 μL of 1 M Tris buffer).

3.2.2 Library Screening

An inverted peptide library was synthesized on 2.0 g of TentaGel S NH2 resin (90 μm, 0.29 mmol/g, 2.86 × 10^6 beads/g) and characterized as previously described (187). For a typical screening experiment, 10-50 mg of the library resin was swollen in dichloromethane (DCM) in a 1.2 mL spin column (Micro Bio-Spin, Bio-Rad) for 2 h, washed with dimethylformamide (DMF) and water, and suspended in HBST-gelatin buffer [30 mM HEPES, 150 mM NaCl, 0.05% Tween 20, and 0.1% gelatin (pH 7.4)] for 4 h. The resin was drained and resuspended in the HBST-gelatin buffer containing 1-2 μM biotinylated PDZ domain protein. After overnight incubation at 4 °C (with shaking), the protein solution was removed and the resin resuspended in streptavidin-alkaline phosphatase (SA-AP) buffer [10 mM MgCl2, 30 mM Tris-HCl, 70 μM ZnCl2, and 250 mM NaCl (pH 7.4)] with 1 μg/mL SA-AP enzyme. After 10 min at 4 °C, the resin was drained and then washed twice with the SA-AP buffer and twice with SA-AP reaction buffer [5 mM MgCl2, 30 mM Tris-HCl, 20 μM ZnCl2, and 100 mM NaCl (pH 8.5)]. The resin was transferred into one well of a 12-well plate (BD Falcon) using SA-AP reaction buffer. After the addition of 100 μL of 5 mg/mL 5-bromo-4-chloro-3-indolyl phosphate (BCIP), the resin was incubated at room temperature on a rotary shaker until positive beads took on a turquoise color. The staining reaction was terminated by the addition of 100 μL of 1 M HCl to the resin, and the positive beads were removed manually with a micropipet with the aid of a dissecting microscope.

For screening against Texas Red-labeled PDZ domains, 10-50 mg of the peptide library resin was swollen and washed in DCM, DMF, water, and the gelatin buffer as described previously (187). The resin was then transferred to a Petri dish (60 mm x 15 mm) (BD Falcon), and fluorescently labeled protein was added to produce a final concentration of 0.7-1 μM. The dishes were incubated overnight at 4 °C with gentle shaking. The resulting beads were then examined under an Olympus SZX12 fluorescence microscope (Texas Red filter), and beads having the brightest red fluorescence were isolated using a micropipet and sequenced by PED-MS (193).
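For orientation, the number of beads involved in a screen can be estimated from the bead density quoted for the resin in this section. The short sketch below only restates that arithmetic; the helper name `beads_in` is hypothetical, and the calculation assumes the quoted density applies uniformly to every portion of resin.

```python
BEADS_PER_GRAM = 2.86e6  # bead density of the 90 μm resin, as quoted in the text


def beads_in(mass_mg):
    """Approximate number of beads in a given mass of dry resin (mg)."""
    return mass_mg / 1000 * BEADS_PER_GRAM


# Whole library (2.0 g) and the 10-50 mg range used for a typical screen.
for mass in (2000, 10, 50):
    print(mass, "mg ->", round(beads_in(mass)), "beads")
```

Under this assumption, a typical screen of 10-50 mg of resin samples on the order of 10^4 to 10^5 beads out of several million in the full library.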

3.2.3 Peptide Sequencing

Positive hits from the screens were sequenced as described in section 2.2.5, except that a 40:1 molar excess of PITC to Fmoc-OSu was used in the PED procedure.

3.2.4 SMALI Analysis of Sequences

The procedure for assigning SMALI scores to peptide sequences is the same as described in section 2.2.6.

3.2.5 Synthetic Peptides

All peptides were synthesized and purified by GenScript Inc. (Piscataway, NJ). The peptides used were >95% pure as judged by analytical HPLC and mass spectrometry. Peptides for fluorescence anisotropy-based binding assays were dansylated at their N-termini. The concentration of peptide in solution was determined by measuring the UV absorbance at 280 nm using the extinction coefficient of the dansyl-peptide calculated from the peptide sequence (SEDNTERP version 1.09) and the dansyl group (254). The following peptides were used in this study: library-derived Tiam-binding peptides (YAAKAFRFCOOH, YAAYRYRACOOH, YAAGRKHFCOOH, YAALIHKFCOOH, YAAEKYWACOOH, YAARKFAKCOOH, YAAKRTYVCOOH, and YAAQKHFHCOOH), Model (SSRKEYYACOOH) (163), human Syndecan1 (residues 303-310, TKQEEFYACOOH), human Caspr4 (residues 1301-1308, ENQKEYFFCOOH), Caspr4 (F→A) (ENQKEYFACOOH), Syndecan1 (A→F) (TKQEEFYFCOOH), and human Neurexin1 (residues 1470-1477, NKDKEYYVCOOH).

3.2.6 Peptide Binding Affinity Determination

Fluorescence anisotropy was used to monitor the binding of dansylated peptides to the Tiam1 and Tiam2 PDZ domains. The PDZ domain protein (1.5 mM stock) was titrated into a solution containing 1.3 mL of 1 μM dansylated peptide until little or no change in the measured anisotropy was evident. Fluorescence anisotropy measurements were recorded in a quartz cuvette at 25 °C with constant stirring using a Fluorolog-3 (Horiba Jobin Yvon) or LS-55 (Perkin-Elmer) spectrofluorometer. The excitation and emission wavelengths were set to 340 and 550 nm, respectively, and the excitation and emission slit widths were set to 3 and 8 nm, respectively. Individual measurements were integrated over 3 s. The data were baseline corrected using a buffer blank, and the titration curves were fit to a standard hyperbolic binding model (Eq. 1):

Bmax[PDZ ] A Amin  (1) Kd [PDZ ]

where A is the anisotropy at each titration step, Amin is the initial anisotropy, Bmax is the maximal change in anisotropy at PDZ domain saturation, Kd is the dissociation constant, and [PDZ] is the total concentration of the PDZ domain in solution. Bmax and Kd were determined by fitting the titration data to Eq. 1 using nonlinear regression analysis (SigmaPlot, Systat Software Inc.). For presentation (see Figure 3.4), data were normalized to the fitted Bmax. The reported dissociation constants are the average of at least three independent experiments. Using this assay, reliable quantification of the dissociation constant for PDZ domain-model peptide interactions was possible to approximately 250 μM.
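The fitting step can be illustrated with a minimal sketch. It is not the SigmaPlot procedure used in this work: it substitutes a dependency-free least-squares search, holds Amin fixed at the initial anisotropy, and uses a synthetic, noise-free titration. All function names and numerical values below are hypothetical.

```python
import math

def anisotropy(pdz, a_min, b_max, kd):
    """Eq. 1: A = Amin + Bmax*[PDZ] / (Kd + [PDZ])."""
    return a_min + b_max * pdz / (kd + pdz)

def fit_binding(pdz, aniso, a_min, kd_lo=1e-3, kd_hi=1e4, iters=200):
    """Least-squares estimate of (Bmax, Kd) with Amin held fixed.

    For a trial Kd the model is linear in Bmax, so Bmax has a closed-form
    solution. Kd is refined by golden-section search on log10(Kd), which
    assumes the squared error has a single minimum in the search range.
    """
    y = [a - a_min for a in aniso]

    def solve(log_kd):
        kd = 10.0 ** log_kd
        x = [p / (kd + p) for p in pdz]
        b_max = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        sse = sum((yi - b_max * xi) ** 2 for xi, yi in zip(x, y))
        return sse, b_max, kd

    g = (math.sqrt(5) - 1) / 2
    lo, hi = math.log10(kd_lo), math.log10(kd_hi)
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if solve(c)[0] < solve(d)[0]:
            hi = d
        else:
            lo = c
    _, b_max, kd = solve((lo + hi) / 2)
    return b_max, kd

# Synthetic titration (hypothetical values; concentrations in uM).
pdz_um = [0, 2, 5, 10, 20, 40, 80, 160, 320]
data = [anisotropy(p, 0.05, 0.10, 20.0) for p in pdz_um]
b_max_fit, kd_fit = fit_binding(pdz_um, data, a_min=0.05)
```

With real data, each titration point would first be corrected for dilution, and the replicate Kd values would be averaged as described above.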

The change in the Gibbs free energy of the PDZ domain-ligand interaction was determined by Eq. 2:

ΔGb = RT ln(Kd) (2)

where R is the gas constant, T is 298.15 K, and Kd is the fitted dissociation constant. The standard error in ΔGb was propagated by standard methods (255). The coupling free energy was determined by Eq. 3:

ΔΔΔGint = (ΔGWT – ΔGM1) – (ΔGM2 – ΔGDM) (3)

where ΔGWT is the free energy of peptide binding for the wild-type Tiam1 PDZ domain, ΔGDM is the free energy of binding for the PDZ domain double mutant, and ΔGM1 and ΔGM2 are the binding energies for the respective PDZ domain single mutants.
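Eqs. 2 and 3 can be evaluated with a few lines of code. The sketch below assumes Kd is expressed in mol/L (1 M standard state) and that the "standard methods" of error propagation correspond to the first-order formula σ(ΔGb) = RT·σ(Kd)/Kd. The Kd values are hypothetical and are not taken from Table 3.7.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)
T_KELVIN = 298.15      # temperature stated in the text
RT = R_KCAL * T_KELVIN

def binding_free_energy(kd_molar):
    """Eq. 2: dG_b = R*T*ln(Kd), with Kd in mol/L."""
    return RT * math.log(kd_molar)

def binding_free_energy_error(kd_molar, kd_err_molar):
    """Assumed first-order propagation: sigma(dG) = R*T*sigma(Kd)/Kd."""
    return RT * kd_err_molar / kd_molar

def coupling_free_energy(kd_wt, kd_m1, kd_m2, kd_dm):
    """Eq. 3: (dG_WT - dG_M1) - (dG_M2 - dG_DM)."""
    dg = binding_free_energy
    return (dg(kd_wt) - dg(kd_m1)) - (dg(kd_m2) - dg(kd_dm))

# Hypothetical example: a 10 uM ligand with a 10% uncertainty in Kd.
dg_example = binding_free_energy(10e-6)
dg_example_err = binding_free_energy_error(10e-6, 1e-6)

# Hypothetical mutant cycles (Kd in mol/L).
# Additive: single mutants weaken binding 3x and 2x, double mutant 6x.
additive = coupling_free_energy(10e-6, 30e-6, 20e-6, 60e-6)
# Coupled: the double mutant weakens binding 12x instead of 6x.
coupled = coupling_free_energy(10e-6, 30e-6, 20e-6, 120e-6)
```

A coupling energy of zero indicates that the two mutations act independently; a nonzero value indicates that the two sites are energetically coupled.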

3.3 Results

3.3.1 Determination of Tiam1 and Tiam2 PDZ Domain Specificities by Screening a Peptide Library

We and others have previously reported on the specificity and protein binding partners of the Tiam1 PDZ domain (9, 167, 236). However, less is known about the Tiam2 PDZ domain, and questions remain about its specificity for peptide ligands. Here, we investigated differences in the specificities of the Tiam1 and Tiam2 PDZ domains and determined the thermodynamic origin of these differences.

We have previously synthesized a one-bead-one-compound (OBOC) peptide library with a free C-terminus on 90 μm TentaGel resin (187). In this library, each bead is topologically segregated into two layers. The surface layer displays an inverted peptide with variable sequence and a free C-terminus of the form resin-MLLBBEAAX-4X-3X-2X-1X0COOH, where B is β-alanine and X-4 through X0 represent L-α-aminobutyrate (Abu or C, a replacement of cysteine), L-norleucine (Nle or M, a replacement of methionine), or any of the remaining 18 amino acids (Figure 3.2). The interior layer of the bead contains the corresponding peptide in the normal orientation (with a free N-terminus), which serves as an encoding tag for determining the sequence of the peptide. The library has a theoretical diversity of 20^5, or 3.2 × 10^6. We conducted library screening by incubating the OBOC library (typically 10-50 mg of the library resin) with fluorescently labeled or biotinylated GST-PDZ domain fusion protein and selecting those beads that were fluorescent or turquoise-colored (using an on-bead streptavidin-alkaline phosphatase and BCIP substrate) (187). The stringency of the screening was controlled (by varying the amount of GST-PDZ domain fusion protein and staining reaction time) so that ~0.05% of the library beads (typically 10-60 beads) became positive. The resulting positive beads, which should carry peptides of the highest affinities to the target protein, were manually removed from the library and individually sequenced by PED-MS (193).

Note that macromolecules such as the GST-PDZ domain fusion proteins are too large to diffuse into the TentaGel beads and therefore only accessible to the inverted peptides on the bead surface. Thus, the encoding peptides in the bead interior do not interfere with library screening.

3.3.2 Sequence Specificity of the Tiam1 PDZ Domain

The Tiam1 PDZ domain was screened against a total of 102 mg of the library (~290,000 beads) in six separate experiments to give a total of 126 positive beads. Although the number of beads and/or peptides screened represented only ~10% of the library sequence space, we have previously shown that the consensus sequence(s) of a typical protein domain can be unambiguously determined by screening only 10% of the library, because not all of the random positions are required for binding to the target proteins (193, 194). Sequencing of the 126 beads by PED-MS (193) gave 106 reliable sequences (Table 3.1); for the other 20 beads, the quality of their mass spectra was too poor to allow for definitive sequence assignment. All six screening experiments gave the same types of (though not identical) sequences, demonstrating the reproducibility of the screening procedure. The 106 sequences were initially sorted into four different groups on the basis of sequence similarities with the aid of Microsoft Excel. Subsequently, each group of sequences was treated with SMALI analysis to construct a position-weighted matrix and assign a SMALI score (Sm), which indicates the propensity of the peptide to bind to the query PDZ domain (Table 3.1). The group I ligands have a consensus sequence of X-Φ-[F/Y]-X-[A/Abu]COOH (where Φ is a hydrophobic residue, especially a β-branched amino acid, and X is any amino acid) (Table 3.1 and Figure 3.3). The 10 sequences that do not contain Ala or Abu at position P0 all have very low SMALI scores (<1), suggesting that they are either weak binders or false positives. The group II ligands (23 sequences) have a similar preference for a Phe/Tyr at P-2 and predominantly contain Phe at position P0, which is occasionally replaced with Nle or Tyr. Interestingly, the group II ligands show a significant preference for a positively charged residue (Arg or Lys) at position P-4, a trend not observed for the group I ligands (Table 3.1). The group III peptides are rich in basic residues (Arg and Lys) at all positions, whereas the group IV peptides each contain an H-X-H or H-X-X-H motif. We have observed both group III and IV sequences during our studies with other PDZ and SH2 domains and found that they have no measurable affinity for the protein domains used in the screens (186, 187). One group III peptide (YAARKFAKCOOH) and one group IV peptide (YAAQKHFHCOOH) were synthesized with a dansyl group at the N-terminus and tested for binding to the Tiam1 and Tiam2 PDZ domains in the solution phase. Both peptides failed to bind to either PDZ domain (vide infra). The group III sequences are likely caused by the high ligand density on TentaGel beads (~100 mM), which permits a negatively charged protein molecule (or a negatively charged surface patch) to electrostatically interact with multiple peptides on the bead surface (186). The origin of the group IV peptides, which are primarily associated with the SA-AP/BCIP screening method, remains unknown. Thus, we consider both group III and IV sequences as false positives, although we cannot rule out the possibility that some of them may bind weakly to the Tiam1 PDZ domain, as our previous work has shown that certain weak ligands can be selected from a library if their specific binding (desired) is enhanced by undesired multidentate interactions (186).
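The library-coverage and hit-rate figures quoted at the start of this section can be checked with simple arithmetic. The sketch below uses only numbers stated in the text. The Poisson correction for repeated sequences is an added assumption (uniform random sampling of sequences by beads) and is not part of the original analysis.

```python
import math

n_positions = 5           # randomized positions X-4 through X0
n_building_blocks = 20    # 18 natural residues plus Abu and Nle
diversity = n_building_blocks ** n_positions    # 3,200,000 sequences

beads_screened = 290_000  # ~102 mg of resin
positive_beads = 126

# Fraction of sequence space if every bead carried a unique sequence.
naive_coverage = beads_screened / diversity

# Assumption: beads sample sequences uniformly at random, so some repeat.
# The expected fraction of distinct sequences seen is then 1 - exp(-N/D).
expected_distinct = 1.0 - math.exp(-beads_screened / diversity)

# Fraction of screened beads scored as positive.
hit_rate = positive_beads / beads_screened
```

Both coverage estimates fall slightly below 10%, and the hit rate is approximately 0.04%, in line with the ~10% and ~0.05% values given in the text.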

3.3.3 Sequence Specificity of the Tiam2 PDZ Domain

Screening against the PDZ domain of Tiam2 yielded the same four groups of sequences (Table 3.2). The group I peptides that bind this PDZ domain are similar to those of the Tiam1 PDZ domain, except that Val was also selected at the C-terminus. The group II peptides were also similar to the group II peptides of the Tiam1 PDZ domain, in that they both selected an aromatic residue (Phe or Tyr) at the C-terminal position.

However, the two PDZ domains show some key differences in their specificities. First, the Tiam1 PDZ domain selected predominantly Phe at position P0, whereas the Tiam2 PDZ domain appears to bind Phe and Tyr equally well (Figure 3.3). Second, while the Tiam1 PDZ domain selected more group I peptides (47 sequences) than group II peptides (23 sequences), the Tiam2 PDZ domain selected a much greater number of group II peptides (10:44 group I:group II ratio), suggesting that it has higher affinity for the group II sequences. Additionally, the group II peptides selected by the Tiam2 PDZ domain exhibited broader specificity at the C-terminal position, accepting bulkier side chains (e.g., Val) (Tables 3.1 and 3.2). Third, the Tiam1 PDZ domain strongly prefers Phe or Tyr at position P-2, whereas the Tiam2 PDZ domain accepts both Tyr and positively charged residues (His, Lys, and Arg) (Figure 3.3). Overall, the results from the peptide library screen suggest that the Tiam1 and Tiam2 PDZ domains have overlapping but distinct specificities for peptide ligands and indicate that positions P0 and P-2 are the most crucial determinants for their specificities. The group III and IV sequences are deemed false positives. The Tiam2 PDZ domain selected a smaller number of false positive sequences as compared to the Tiam1 PDZ domain. This is likely due to the fact that the Tiam2 PDZ domain generally has higher affinities for the library peptides (vide infra) and is therefore less susceptible to interference from nonspecific binding (194).

3.3.4 Binding Affinities of Selected Peptides to the PDZ Domains

To validate the library screening results, we randomly selected and synthesized three representative peptides for each PDZ domain (YAAKAFRFCOOH, YAAYRYRACOOH, and YAAEKYWACOOH for Tiam1 and YAAGRKHFCOOH, YAALIHKFCOOH, and YAAKRTYVCOOH for Tiam2) and determined their binding affinities by a fluorescence anisotropy assay. These peptides represent group I and II peptides that make up the major binding motif found in the library screen. As mentioned above, two group III and IV peptides were also tested and did not bind to either the Tiam1 or Tiam2 PDZ domain. In contrast, the three group I and II peptides tested for the Tiam2 PDZ domain bound with affinities typical for PDZ domains (Kd = 10-80 μM) and showed no detectable binding to the Tiam1 PDZ domain (Table 3.3). Interestingly, the three library-derived Tiam1 peptides only weakly bound the Tiam1 PDZ domain (Kd values from 90 to >250 μM) but bound the Tiam2 domain with higher affinities (Kd ~ 20 μM). The weak affinity of the Tiam1 PDZ domain also explains why library screening against this domain was more problematic and resulted in a larger number of false positive beads as compared to the number for the Tiam2 PDZ domain (Tables 3.1 and 3.2). A possible explanation for the generally weak binding of the Tiam1 PDZ domain is that residue P-5 may also contribute to binding (236). To test this notion, we synthesized and tested a peptide containing a Gln at position P-5, because several Tiam1-interacting proteins contain a Gln at this position (Table 3.3). However, the addition of a Gln at position P-5 did not enhance binding, suggesting that this position does not significantly contribute to binding. Despite the fact that the selected library sequences had only weak affinity for the Tiam1 PDZ domain, the obtained preferences at each position closely recapitulate the consensus sequences identified by the previous studies of Songyang et al. (167) and Tonikian et al. (9), further validating the results of our screening. The derived consensus sequences for the Tiam1 and Tiam2 PDZ domains are listed in Table 3.4.

3.3.5 Potential Tiam1- and Tiam2-Binding Proteins

Using a composite consensus sequence based on the studies presented here and previous studies (9, 167, 179, 236), we searched the PROSITE database (256) for candidate Tiam1 and Tiam2 PDZ domain binding proteins. This analysis yielded 12 human proteins for Tiam1 and 43 proteins for Tiam2 (Tables 3.5 and 3.6). Among the identified candidate proteins were Syndecan proteins, Contactin-associated protein-like 4 (Caspr4), and Neurexin1. Syndecans are cell-cell and cell-matrix adhesion proteins (257, 258), and Caspr4 (259) and Neurexin1 (260, 261) are neuronal cell-cell adhesion proteins of the Neurexin family. The four isoforms of Syndecan (1-4) have very similar C-terminal sequences, with the last four residues being EFYACOOH; the four C-terminal residues of Caspr4 are EYFFCOOH, and Neurexin1 has a C-terminus of EYYVCOOH. The putative Tiam1 PDZ domain-binding proteins were consistent with those found previously (236), while those for the Tiam2 PDZ domain constitute a novel set of proteins that may link Tiam2 to new functions (Table 3.6).
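The idea of matching candidate C-termini against a consensus can be illustrated with a simplified sketch. It is not the PROSITE search used here, and the composite consensus of Table 3.4 is not reproduced. As a stated simplification, the hypothetical function below considers only positions P0 and P-2 of the Tiam1 group I and group II motifs described in section 3.3.2, and it omits Abu and Nle because they do not occur in natural proteins.

```python
def classify_c_terminus(peptide):
    """Assign a peptide to Tiam1 group I or II using only P0 and P-2.

    Simplification (assumption): group I = [F/Y] at P-2 and Ala at P0;
    group II = [F/Y] at P-2 and Phe or Tyr at P0. Returns None when
    neither simplified motif is matched.
    """
    p0, p_minus2 = peptide[-1], peptide[-3]
    if p_minus2 not in "FY":
        return None
    if p0 == "A":
        return "group I"
    if p0 in "FY":
        return "group II"
    return None

# C-terminal octapeptides listed in section 3.2.5.
candidates = {
    "Syndecan1": "TKQEEFYA",
    "Caspr4": "ENQKEYFF",
    "Neurexin1": "NKDKEYYV",
}
groups = {name: classify_c_terminus(seq) for name, seq in candidates.items()}
```

Under this simplified rule, Syndecan1 falls in group I, Caspr4 in group II, and Neurexin1 in neither group, because its C-terminal Val matches neither simplified Tiam1 motif.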

We next tested the two PDZ domains against peptide ligands derived from the C-termini of known and putative PDZ domain binding proteins. We have previously shown that the C-terminal peptides from Syndecan1 (TKQEEFYACOOH) and Caspr4 (ENQKEYFFCOOH) bound the Tiam1 PDZ domain with typical PDZ domain ligand affinities, whereas the Neurexin1 peptide (NKDKEYYVCOOH) bound with a very low affinity (Table 3.3) (236). Because the library screening indicated differences in specificity between the Tiam1 and Tiam2 PDZ domains, we investigated whether the Tiam2 PDZ domain could bind these physiologically relevant peptide ligands. The Caspr4 peptide matches both the Tiam1 and Tiam2 PDZ domain consensus sequences, and it indeed bound the Tiam2 PDZ domain with a Kd value of 3.4 μM, ~6-fold lower than that of the Tiam1 PDZ domain (Table 3.3). The Syndecan1 peptide, which does not match the Tiam2 consensus sequence, bound only very weakly to the Tiam2 PDZ domain (Kd ~ 200 μM). Given the general trend that the Tiam2 PDZ domain binds ligands with higher affinity than the Tiam1 PDZ domain, the weak binding of the Tiam2 PDZ domain for the Syndecan1 peptide suggests that this interaction is clearly not favored. In contrast, the Tiam2 PDZ domain bound the Neurexin1 peptide with high affinity (Kd = 5.0 μM), even though it contains a Val at the C-terminus, which was less frequently selected by the Tiam2 PDZ domain during library screening (Figure 3.3). This observation highlights the importance of acquiring individual binding sequences during library screening, as minor consensus sequences of this type (which bind to the target protein with high affinity but have low abundance in the library) would have been overlooked by some of the other library methods that select for both affinity and abundance [e.g., the oriented peptide library method (167)].

We previously employed Caspr4 and Syndecan1 peptides with point mutations at P0 to examine the specificity of the Tiam1 PDZ domain (236). Here, we used these peptides to further probe the specificity of the Tiam2 PDZ domain at position P0. Specifically, the C-terminal residues of the Syndecan1 and Caspr4 ligands were switched to Phe and Ala, respectively [denoted as Syndecan1 (A→F) and Caspr4 (F→A)]. We found that substitution of Ala for the C-terminal Phe of the Caspr4 peptide reduced its binding affinity for the Tiam2 PDZ domain by 23-fold, whereas replacement of the C-terminal Ala of the Syndecan1 peptide with a Phe increased the affinity by 44-fold (Table 3.3). These experiments corroborate the findings obtained from the peptide library screen and indicate that position P0 is a key determinant that defines the specificity difference between the Tiam1 and Tiam2 PDZ domains and that a lack of optimal residues in P0 seems to be partially offset in Tiam2 by the presence of a Tyr at position P-2.

3.3.6 S0 and S-2 Residues Selectively Modulate Ligand Affinity and Specificity

To establish the thermodynamic origin of the distinct specificity between the two Tiam PDZ domains, we probed the individual specificity pockets identified in the Tiam1 PDZ domain structure. To understand the roles of residues L915, L920, L911, and K912 in Tiam1 PDZ domain affinity and specificity, we created single-site mutants at each of these positions, changing the residue to the corresponding amino acid found in the Tiam2 PDZ domain. In addition, the double mutants L915F/L920V and L911M/K912E were created. For each mutant, the Kd and free energy of binding (ΔGb, Eq. 2) were determined for the interaction with the Syndecan1 and Caspr4 peptides (Table 3.7). Syndecan1 binding was disrupted by all of the mutations in the S0 pocket: the Kd was 3-fold higher in the case of L915F, 2-fold higher for L920V, and ~5-fold higher for the L915F/L920V double mutant. The Caspr4 binding data were more heterogeneous, with a ~3-fold increase in Kd for the L915F mutant, a ~2-fold decrease in Kd for the L920V mutant, and a 4-fold increase in Kd for the L915F/L920V double mutant. Binding of Neurexin1 to each single mutant was too weak to allow for reliable measurement of the Kd changes and was therefore not analyzed in detail. Notably, however, the L915F/L920V double mutant bound the Neurexin1 peptide with a Kd ~14-fold lower than that of the wild-type Tiam1 PDZ domain.

When residues in the S-2 pocket were probed, the L911M mutation did not disrupt binding to either the Syndecan1 or Caspr4 peptide, as both interactions had affinities similar to that of the wild-type Tiam1 PDZ domain (Table 3.7). The K912E mutation disrupted binding to both the Syndecan1 (5-fold) and Caspr4 (3-fold) peptides. Both the L911M and K912E mutants bound very weakly to the Neurexin1 peptide and were not analyzed further. The L911M/K912E double mutant, however, exhibited significant impairment in Syndecan1 binding (~8-fold), yet its affinity for the Caspr4 peptide was nearly the same as that for the wild-type Tiam1 PDZ domain. Finally, the Neurexin1 peptide bound to the L911M/K912E double mutant with a Kd ~9-fold lower than that for the wild-type Tiam1 PDZ domain. These results suggest that residues in the S0 and S-2 pockets are important for determining the specificity of the Tiam1 PDZ domain, and that their relative importance is dependent upon the peptide ligand being examined.

3.3.7 Double-Mutant Cycle Analysis of the S0 and S-2 Binding Pocket Residues

We used double-mutant cycle analysis to further characterize the energy of interaction between sites mutated in the Tiam1 PDZ domain. This type of analysis involves construction of thermodynamic cycles from two mutants made individually and then together within the same protein, and measurement of free energy to determine whether these individual mutations cause additive changes in energetics (262). The results of the double-mutant cycle analysis are listed in Table 3.7. The two amino acids of the S0 binding pocket that were probed (L915 and L920) were found to act cooperatively with respect to both Syndecan1 (ΔΔΔGint = 0.36 kcal/mol) and Caspr4 (ΔΔΔGint = 0.38 kcal/mol) binding. In both cases, most of the loss of binding energy was attributable to the L915F mutant, and the L920V mutant was unable to rescue binding. Double-mutant cycle analysis was also applied to residues L911 and K912 of the S-2 pocket. These residues were not coupled with respect to Syndecan1 binding (ΔΔΔGint = 0.07 kcal/mol) but did act cooperatively (ΔΔΔGint = -0.25 kcal/mol) with respect to Caspr4 peptide binding. These data indicate that residues L915 and L920 in the S0 pocket work cooperatively to provide selectivity for both ligands. In contrast, energetic coupling between residues L911 and K912 was ligand-dependent, suggesting that they interact in a distinct manner with different ligands.

3.3.8 Residues in S0 and S-2 Pockets Determine Tiam1 and Tiam2 PDZ Domain Specificity

Having established that single and double Tiam1 PDZ domain mutations within the S0 and S-2 binding pockets can modify ligand specificity, we next combined the four mutations in the binding pockets to produce a Tiam1 quadruple mutant (QM) and determined the combined effect on binding the Syndecan1, Caspr4, and Neurexin1 peptides. While none of the single or double mutants was able to fully re-create the specificity profiles of the Tiam2 PDZ domain, the quadruple mutant Tiam1 PDZ domain did; it was able to bind both the Caspr4 and Neurexin1 peptides but not the Syndecan1 peptide (Figure 3.4). The Tiam1 QM PDZ domain bound the Caspr4 peptide with approximately the same Kd as the wild-type PDZ domain, while the affinity for the Neurexin1 peptide was enhanced 52-fold. In contrast, the Tiam1 QM PDZ domain bound the Syndecan1 peptide with a 5-fold greater Kd than did the wild-type PDZ domain.

Comparison of the Tiam1 QM PDZ domain with the Tiam2 PDZ domain indicated that although the peptide specificity profiles of the two are the same, their affinity for these peptides was distinct. For example, the QM PDZ domain bound the Caspr4 peptide with a 5-fold higher Kd, the Neurexin1 peptide with a 9-fold higher Kd, and the Syndecan1 peptide with a 1.6-fold lower Kd than did the Tiam2 PDZ domain. Taken together, these data show that the four mutations in the Tiam1 PDZ domain alone are sufficient to effectively recapitulate the binding specificity of the Tiam2 PDZ domain, but additional mutations are needed to match the affinities of the Tiam2 PDZ domain precisely.
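The fold changes reported for the quadruple mutant can be placed on an energy scale by applying Eq. 2 to the ratio of two dissociation constants. The sketch below is illustrative only: it uses the rounded fold changes quoted in the text rather than the fitted Kd values, and the function name is hypothetical.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol at 298.15 K

def ddg_from_kd_ratio(kd_ratio):
    """ddG = RT*ln(Kd_variant / Kd_reference), which follows from Eq. 2."""
    return RT * math.log(kd_ratio)

# QM versus wild-type Tiam1 PDZ domain, from the fold changes in the text.
ddg_neurexin1 = ddg_from_kd_ratio(1 / 52)  # Kd 52-fold lower (tighter binding)
ddg_syndecan1 = ddg_from_kd_ratio(5)       # Kd 5-fold higher (weaker binding)
```

On this scale, the 52-fold gain for Neurexin1 corresponds to roughly -2.3 kcal/mol, and the 5-fold loss for Syndecan1 corresponds to roughly +1.0 kcal/mol.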

3.4 Discussion

The Tiam1 and Tiam2 GEF proteins have generally been assumed to have similar and overlapping functions within the cell (263, 264). This notion is reinforced by the fact that both proteins have similar domain compositions and high degrees of sequence conservation in the PHn-CC-Ex and DH-PHc catalytic domains (Figure 3.1A). Furthermore, the PHn-CC-Ex domains of both proteins show redundancy in structure and binding partners (265), and both DH-PHc domains are known to specifically activate the Rac1 GTPase (249). Nevertheless, close examination of other domains in Tiam1 and Tiam2 proteins, such as the PDZ domain, suggests that this functional redundancy may not apply to all domains. Previous studies on the specificity of the Tiam1 and Tiam2 PDZ domains have generated significant discrepancies. We sought to resolve these discrepancies by using a novel combinatorial peptide screen (187) to independently determine the consensus binding sequences for the Tiam1 and Tiam2 PDZ domains. Furthermore, we were interested in determining the molecular origin of the Tiam family PDZ domain specificity.

The study by Songyang et al. (167) showed that the Tiam1 PDZ domain has a preference for peptides containing Phe or Ala at the C-terminal position (P0) (Table 3.4). Tonikian et al. (9) reported that the Tiam1 PDZ domain preferred Phe at P0, but not Ala. Our results show that both Phe and Ala are preferred residues at position P0. Additionally, our study reveals a preference for Abu, used as a cysteine replacement. The preference for Ala (or Abu) can be readily explained by the structure of the Tiam1 PDZ domain, which has a shallow S0 pocket accepting the small methyl group of Ala, whereas a conformational change(s) in this pocket or the peptide ligand may be necessary to accommodate the larger side chain of Phe (236). Table 3.4 shows that the Tiam1 PDZ domain had amino acid preferences for N-terminal residues of the ligand as well. All previous studies indicate that the PDZ domain prefers a hydrophobic side chain (Tyr and Trp) at position P-1 (Table 3.4). Our results show that Arg is also preferred and almost all amino acids are tolerated. The Tiam1 PDZ domain-model peptide structure shows that the P-1 side chain points to the solvent, although it can make hydrophobic and/or hydrogen bonding interactions with residues S861, N876, and S877 in the S-1 pocket (236). At position P-2, we found that the Tiam1 PDZ domain is highly selective for aromatic residues (Phe and Tyr), in agreement with the results of Songyang et al. (167). The Tiam1 PDZ domain-model peptide structure shows that the Tyr side chain fits snugly into a large, deep pocket on the PDZ domain surface (236). It is unclear why the phage display experiment of Tonikian et al. (9) selected Gly exclusively at this position. The Tiam1 PDZ domain again tolerates a variety of amino acids at position P-3 but has some preference for residues containing larger hydrophobic side chains (e.g., Ile, Tyr, Leu, and Arg) (Figure 3.3). In the PDZ domain-model peptide structure (Figure 3.1C), the P-3 side chain is exposed to the solvent and therefore many different residues are accommodated at this position. However, the β- and γ-methylene groups of the P-3 Glu are engaged in hydrophobic interactions with both the PDZ domain surface and the side chain of residue P-1 (Tyr), and this may provide some degree of selectivity. Finally, the Tiam1 PDZ domain has broad specificity with some preference for charged residues (Arg, Lys, and Glu) at position P-4. The broad specificity is explained by the fact that the P-4 side chain is mostly exposed to the solvent in the Tiam1 PDZ domain-model peptide structure (Figure 3.1D). Thus, the particular preferences for ligand residues at positions P0 to P-4 by the Tiam1 PDZ domain are readily rationalized by the Tiam1 PDZ domain-model peptide structure.

Our results show that the Tiam2 PDZ domain has a distinct but overlapping specificity compared to that of the Tiam1 PDZ domain. A major distinction between these two PDZ domains is at position P0. Although the two PDZ domains accept (and selected from the library) the same set of amino acids at position P0 (Ala, Abu, Phe, and Tyr), they have very different relative preferences for these residues. The Tiam1 PDZ domain selected Ala and Phe with similar frequencies, and the actual binding affinity for each (Ala vs. Phe) depends on the sequence context of positions P-5 to P-1. The Tiam2 PDZ domain, on the other hand, clearly prefers Phe and Tyr over Ala (Figure 3.3 and Table 3.3). However, with optimal sequences at other positions, peptides with a C-terminal Ala and Val may still bind to the Tiam2 PDZ domain with respectable affinities (Table 3.3). In particular, the peptide from Neurexin1 (containing Val at the C-terminus) was in fact one of the most potent peptide ligands of the Tiam2 PDZ domain identified in this study (Kd = 5.0 μM). Thus, it appears that the work of Tonikian et al. (9) identified a minor consensus class of Tiam2 ligands. It is not yet clear why their method did not select the major consensus class of ligands, which are generally more potent ligands than those of the minor class.

The Tiam2 PDZ domain was also selective for residues P-4 to P-1 of the ligand. Similar to the Tiam1 PDZ domain, it prefers hydrophobic and positively charged residues at position P-1. A homology model of the Tiam2 PDZ domain based on the structure of the Tiam1 PDZ domain suggests that the residues surrounding residue P-1 form a hydrophobic patch and that D845 in β3 of the Tiam2 PDZ domain (S877 in the Tiam1 PDZ domain) could form a salt bridge with surrounding basic side chains. Another major difference in specificity compared to the Tiam1 PDZ domain is at position P-2. The Tiam2 PDZ domain strongly prefers Tyr and the positively charged His, Lys, or Arg at this position. The preference for positively charged amino acids at this position is likely due to the presence of a glutamic acid near the P-2 binding pocket in the Tiam2 PDZ domain (corresponding to residue K912 in the Tiam1 PDZ domain). The selection of charged residues at position P-4 might also be due to the Glu at position 912. Together, these results indicate that the Tiam1 and Tiam2 PDZ domains have overlapping but distinct specificities, and that the P0 and P-2 positions in the ligand are the major determinants of Tiam1 and Tiam2 PDZ domain specificity.

On the basis of the results from the peptide screen, structural data, and the primary sequence alignment of the Tiam1 and Tiam2 PDZ domains, we hypothesized that the differences in the specificities of these proteins are determined primarily by the residues at positions P0 and P-2 of the ligand and residues in the S0 and S-2 pockets of the PDZ domain that participate in protein-ligand interactions. Here we sought to identify the residues that determine the specificity of these two PDZ domains and to establish if we could rationally re-engineer PDZ domain specificity by making targeted mutations in the S0 and S-2 pockets.

As described in our previous study, the S0 binding pocket in the Tiam1 PDZ domain is shallow and formed in part by L915 in α2 and L920 found in strand β6 (Figure 3.1D). Notably, neither of these residues is conserved between the Tiam1 and Tiam2 PDZ domains (Figure 3.1B). On the basis of the Tiam1 PDZ domain-model peptide structure (236), we hypothesized that substituting residues in this pocket with those found at these positions in Tiam2 might enable the Tiam1 PDZ domain to bind Tiam2 PDZ domain ligands. Our results with the S0 pocket mutants showed that the L915F mutant disrupted Caspr4 peptide binding but that the L920V mutant exhibited enhanced binding. In addition, the S0 pocket double mutant L915F/L920V disrupted Caspr4 binding (~4-fold) and Syndecan1 binding (~9-fold), revealing a modest negative in both cases. These results suggest that L915 and L920 contribute to ligand specificity by selecting residues in P0 and, in the case of the Tiam1 PDZ domain, are fine-tuned to accommodate Ala and Phe. In contrast, residues in the S0 pocket of the Tiam2 PDZ domain prefer ligands with a C-terminal Phe/Tyr and Val over those that contain Ala at this position. While neither single S0 mutant could accommodate a Val at the C-terminal position, the double mutant acquired the ability to bind the Neurexin1 peptide at the expense of its ability to bind the Syndecan1 peptide, arguing that these two residues synergistically contribute to Tiam1 PDZ domain specificity.

Examination of the Tiam1 PDZ domain-model peptide structure indicates that residues L911 and K912 form a pocket that packs against the Tyr at position P-2 of the ligand (Figure 3.1C, D) (236). In addition, this structure shows that residue Lys P-4 of the model ligand is ~7 Å from K912 of the PDZ domain. In the case of the Syndecan1 peptide, residue P-4 is a Glu, and one can imagine that a salt bridge might form between this side chain and that of K912. This hypothesis is supported by the binding results with the K912E mutant, which reduced the level of Syndecan1 peptide binding to the level seen with the model peptide. The L911M mutant did not have a strong effect on the binding of either the Syndecan1 or Caspr4 ligand, and this mutation was incapable of rescuing Syndecan1 ligand binding in the presence of K912E. In contrast, the L911M/K912E double mutant had a cooperative effect, restoring Caspr4 binding to a level near that seen with the wild-type protein. Interestingly, this double mutant also exhibited a marked increase in affinity for Neurexin1. Together, these results show that the S-2 pocket has little influence on Caspr4 peptide specificity and affinity yet profoundly affects interactions with Syndecan1 and Neurexin1. Thus, it appears that PDZ domain-ligand interactions are optimized distinctly for each ligand.

Each of the double mutants tested had vestigial Tiam2 PDZ domain specificity, such that Caspr4 peptide binding was mildly perturbed at the expense of Syndecan1 peptide binding and interactions with the Neurexin1 peptide were always enhanced.

Remarkably, the four mutations were highly synergistic when combined. The Tiam1 QM PDZ domain bound the Caspr4 peptide with an affinity indistinguishable from that measured with the wild-type Tiam1 PDZ domain, while the Neurexin1 peptide bound with a Kd ~52-fold lower than that of the wild-type PDZ domain. In addition, the Tiam1 QM PDZ domain significantly disrupted the Syndecan1 peptide interaction. Thus, the four mutations found in the S0 and S-2 pockets effectively switched the specificity of the Tiam1 PDZ domain to that of the Tiam2 PDZ domain. The structural origin of this change in specificity is not fully understood but likely reflects stabilizing interactions between the S0 and S-2 pockets.

Our results show that PDZ domain specificity can be rationally re-engineered via incorporation of only a few specific mutations. Thus, engineered Tiam family PDZ domains with particular specificities could be used to probe PDZ domain-dependent functions. In general, such switches have been difficult to achieve and have been possible only when elaborate biological screens or computational algorithms were implemented (266-268). The re-engineering process was simplified in this case by the fact that the Tiam1 and Tiam2 PDZ domains have overlapping specificity. In particular, the interaction between residue K912 and ligand residue P-4 appears to be important in establishing PDZ domain specificity, playing a critical role in promoting Syndecan1 peptide binding. Our findings also revealed that specificity subsites in the Tiam1 PDZ domain are coupled and that this coupling is dependent upon the ligand. Although not tested here, cooperativity between subsites might also be important. The fact that mutations in the S0 and S-2 pockets were synergistic in changing the Tiam1 PDZ domain specificity hints that these couplings might exist. Our results support the notion that PDZ domain affinity and specificity are regulated by determinants spread across the entire binding interface rather than by a few discrete subsites (179, 236, 269, 270). This feature would likely allow the broad range of PDZ domain specificities needed to accommodate interactions with many potential ligands but would also complicate the design of novel specificities. Thus, cooperative effects may present a significant challenge for designing de novo PDZ domain specificities when starting from a template backbone structure.

Having shown that four nonconserved residues are vital for determining the specificity of the Tiam1 PDZ domain, we examined if these residues might be evolutionarily conserved throughout the Tiam family of proteins. In addition, we were curious whether the PDZ domain could be used to effectively classify Tiam family GEFs, as shown by Sakarya et al. (271) for other PDZ domain-containing proteins. To this end, we compiled a collection of Tiam family PDZ domain sequences and aligned them on the basis of the known structure of the Tiam1 PDZ domain (Figure 3.5). Figure 3.5 clearly shows that the Tiam family PDZ domains segregate into four distinct families: Tiam1-like, Tiam2-like, zebrafish Tiam-like, and Drosophila Still Life (SIF, a GEF)-like (272). This analysis revealed that the four residues targeted in our study were differentially conserved across these four Tiam PDZ domain families, suggesting that PDZ domain specificities among members of this family are distinct across subfamilies but evolutionarily conserved within them. For example, the four residues investigated here are absolutely conserved within the vertebrate Tiam1 and Tiam2 PDZ domains. In the zebrafish Tiam-like and SIF-like counterparts, different sets of these four residues are conserved and the specificities of these hybrid PDZ domains remain unknown. Overall, these observations of PDZ domain conservation support the notion that Tiam family GEFs have divergent, PDZ domain-dependent functions that are evolutionarily conserved. Because only two in vivo PDZ domain-binding partners for Tiam1 have been identified (236, 252) and none have been identified for either Tiam2 or SIF, additional studies will be required to experimentally assess the function of the Tiam2 and hybrid PDZ domains.

Previously, we identified the cell adhesion proteins Syndecan1 and Caspr4 as putative Tiam1-binding proteins. In the case of Syndecan1, we showed that it was indeed a physiological partner of Tiam1 involved in cell migration and cell-matrix adhesion (236). Furthermore, we showed that this interaction occurred via the PDZ domain of Tiam1 and the C-terminus of Syndecan1. Here, we have identified 43 putative Tiam2-binding proteins, including the neuronal adhesion proteins Neurexin1 and Caspr4 (Table 3.6). The Tiam2 PDZ domain is capable of binding peptides from both Neurexin1 and Caspr4 (Table 3.3) but not Syndecan1. In contrast, the Tiam1 PDZ domain is capable of binding Syndecan1 and Caspr4 C-terminal peptides but not a Neurexin1 peptide. Combined, these results strongly suggest that the Tiam1 and Tiam2 PDZ domains target distinct proteins, leading to divergent functions. Importantly, our results predict that Neurexin1 and Tiam2 are binding partners in vivo, whereas Neurexin1 and Tiam1 are not. It is interesting to note that Neurexins and Caspr4 primarily function within neuronal cells, where Tiam2 is known to function. It remains to be seen if the cell-cell adhesion function ascribed to the Neurexin family of receptor proteins may be connected with Tiam2’s role in neurite extension via a PDZ domain interaction. Additional biological studies will be required to verify these predictions.

In the study presented here, we investigated the origin of the specificities of the Tiam1 and Tiam2 PDZ domains. Using a combinatorial peptide library screen and PDZ domain ligands from native proteins, we determined that these two PDZ domains have overlapping but distinct specificities. This result is of particular interest because these two homologous proteins are thought to be functionally redundant in many contexts (245, 264); the distinct PDZ domain specificities suggest they may instead have unique biological functions conferred by each PDZ domain. Another important feature of our study was the identification of four residues in two specificity pockets (S0 and S-2) in the Tiam1 PDZ domain that are crucial for ligand affinity and specificity. Remarkably, replacing these four residues with the corresponding amino acids in the Tiam2 PDZ domain was sufficient to switch the specificity of the Tiam1 PDZ domain to that of the Tiam2 PDZ domain. Additionally, we found that residues in the S0 and S-2 pockets were energetically coupled, and that the degree of coupling was dependent upon the identity of the ligand. Together, these results suggest that the interactions between PDZ domains and their ligands are highly evolved, and that specificity is derived from determinants spread across the entire binding interface. Finally, inspection of available Tiam family PDZ domain sequences provided further evidence that the Tiam family PDZ domains have evolved distinct specificities that likely translate into distinct PDZ domain-dependent functions.

3.5 Acknowledgements

The full-length mouse Tiam2 DNA was provided by Dr. M. Hoshino (Kyoto University, Kyoto, Japan). Cloning, protein purification, and fluorescence anisotropy binding assays were performed by the lab of Dr. Ernesto Fuentes (The University of Iowa). Members of the Fuentes lab and Dr. C. M. Blaumueller (The University of Iowa) helped with the preparation of the manuscript.

Table 3.1 Peptides selected from Tiam1 PDZ screensᵃ

Group I (47)                    Group II (23)   Group III (23)  Group IV (13)
Sequence  SM    Sequence  SM    Sequence  SM    Sequence        Sequence
RTYYC     1.88  VTFTA     1.49  KRFTF     4.29  FPRVR           PLHCH
ATYYC     1.87  EVFIC     1.49  RRFHF     4.23  YRYQR           YSHLH
IIYDC     1.86  HTFMA     1.48  RLFRF     4.15  FSDWK           ACHSH
GIYYA     1.85  RVFLA     1.45  KFFRF     4.10  XXXRR           HKHPY
QIYDA     1.85  QHFSA     1.40  KAFRF*    4.10  FTKRR           FTHKH
HTYRC     1.84  GILYC     1.26  KEFFF     4.02  CRFRK           CHLHV
NIYFC     1.83  YIIHC     1.25  TVFPF     3.72  RKFRR           HPHCR
AIYHA     1.80  YLIDC     1.18  ELYYF     3.37  KRFPR           HIHNC
NVYYC     1.79  RLRYC     1.16  CRYRF     3.37  KYRVR           HEHLG
TMYRC     1.78  WTWEC     1.16  KYTYF     3.27  RKFKR           HQHMR
ITYKA     1.77  WFIDA     1.10  EIYIF     3.20  RKFAK           RHSLH
YRYRA*    1.75  KMMFC     1.10  YRRYF     3.12  XXGFK           HRVGH
YYYKC     1.74  SYWIA     1.03  PIYVF     3.07  GTKWR           RHLQH
TMYLA     1.71  KYYWS     0.94  EHMHF     3.03  XARTR
RWYTA     1.66  WRFVT     0.75  DLMFF     3.02  RRSRV
EKYWA     1.66  FRTWT     0.40  TRWQF     3.01  YYKRD
DFYCA     1.66  EMMLT     0.35  XXXYF     2.62  KRWTE
AEYVA     1.65  QVTPN     0.30  RIFTM     2.22  RYKYN
KIFRC     1.60  GLRII     0.27  WRFEM     2.13  RRVIT
QIFDA     1.58  CWLKS     0.26  RQFYY     1.96  RRKYW
SRFYC     1.54  GPTHQ     0.25  FCFEM     1.86  KRRHY
TRFRC     1.54  CLDWE     0.22  KYMDM     1.45  KRKTY
TTFCA     1.51  QCESL     0.18  RYQWY     0.91  LRKRY
QFFVC     1.49

ᵃUnderlined sequences were selected by the SA-AP/BCIP method, whereas the rest of the sequences were selected against Texas Red-labeled GST-PDZ. M, norleucine; C, (S)-2-aminobutyric acid; X, amino acid identity could not be determined; *, sequences selected for further binding analysis. SM scores were rounded to the 2nd decimal place.

Table 3.2 Peptides selected from Tiam2 PDZ screensᵃ

Group I (10)    Group II (44)                   Group III (3)   Group IV (12)
Sequence  SM    Sequence  SM    Sequence  SM    Sequence        Sequence
FKYCC     3.32  CHKHF     2.04  GYHRY     1.54  TTSKR           YKHAH
IKYFA     3.24  GRKHF*    1.96  KYELF     1.53  DKQYR           QKHFH
LKYFA     3.24  SLKHF     1.93  HCKRY     1.52  YMRRR           RRHFH
SKFYV     3.04  RHHCF     1.88  MHRAY     1.52                  MRHFH
YHYAV     3.02  IHKSF     1.87  VKNFF     1.48                  CRHIH
XKYYL     3.01  KYHVF     1.81  IKRIY     1.47                  IYHKH
YHRHV     2.15  SRHAF     1.81  ERAHY     1.43                  KSHRH
KRTYV     2.10  LQKYF     1.80  XXYWF     1.42                  EKHTH
WPMGC     1.52  KFHLF     1.79  GMRYY     1.41                  IRHVH
MLHMC     1.52  FRHVF     1.79  NKRCY     1.41                  CHLHR
                KHCHF     1.79  CHYRY     1.41                  RHSHR
                KHPHF     1.79  XXXIF     1.37                  HFHKR
                YSKYF     1.78  RLTEY     1.22
                IIHTF     1.78  RQINY     1.20
                LIHKF*    1.77  TGLLY     1.19
                HPHKF     1.74  MRHNW     0.67
                ILKHY     1.71  KLRYH     0.64
                TKHHY     1.70  CVRYW     0.58
                YKAHF     1.70  EVYAH     0.41
                VCKHY     1.66  LVQIW     0.36
                FCKHY     1.66  PAVTH     0.30
                YHKKY     1.65
                KHRTY     1.58

ᵃUnderlined sequences were selected against Texas Red-labeled GST-PDZ, whereas the rest of the sequences were selected by the SA-AP/BCIP method, followed by a secondary screen with Texas Red-labeled protein. M, norleucine; C, (S)-2-aminobutyric acid; X, amino acid identity could not be determined; *, sequences selected for further binding analysis. SM scores were rounded to the 2nd decimal place.

Table 3.3 Dissociation constants of selected peptides against the PDZ domains

Peptide                       Sequence   Tiam1 PDZ Kd (μM)   Tiam2 PDZ Kd (μM)
Tiam1 peptide 1 (Group I)     YAAYRYRA   NBᵃ                 22.5 ± 1.1
Tiam1 peptide 2 (Group I)     YAAEKYWA   90.3 ± 8.3          10.9 ± 0.4
Tiam1 peptide 3 (Group II)    YAAKAFRF   200 ± 50            7.7 ± 1.3
Tiam1 peptide 4 (Group III)   YAARKFAK   NBᵃ                 NBᵃ
Tiam2 peptide 1 (Group I)     YAAKRTYV   NBᵃ                 77.6 ± 2.4
Tiam2 peptide 2 (Group II)    YAAGRKHF   NBᵃ                 9.9 ± 2.2
Tiam2 peptide 3 (Group II)    YAALIHKF   NBᵃ                 28.7 ± 4.9
Tiam2 peptide 4 (Group IV)    YAAQKHFH   NBᵃ                 NBᵃ

Peptide                       Sequence   Tiam1 PDZ Kd (μM)   Tiam2 PDZ Kd (μM)
Model                         SSRKEYYA   112 ± 15ᵇ,ᶜ         24.1 ± 7.1
Syndecan1                     TKQEEFYA   26.9 ± 0.9ᵇ         200 ± 20
Syndecan1 (A→F)               TKQEEFYF   55.7 ± 3.6ᵇ         4.5 ± 0.2
Caspr4                        ENQKEYFF   19.0 ± 0.4          3.4 ± 0.3
Caspr4 (F→A)                  ENQKEYFA   64.8 ± 5.9ᵇ         78.7 ± 7.8
Neurexin1                     NKDKEYYV   2400 ± 250ᵇ,ᶜ       5.0 ± 0.2

ᵃNB, no binding detected. ᵇData taken from ref. 236. ᶜAffinity based on NMR titration.
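
The overlapping but distinct specificities in Table 3.3 can be made explicit by comparing the two dissociation constants measured for each native peptide. The following is a minimal sketch rather than part of the original analysis: the dictionary layout and the function name fold_preference are illustrative assumptions, and only the Kd values are taken from the table.

```python
# Kd values (μM) transcribed from Table 3.3; the data layout is illustrative.
KD_UM = {
    "Syndecan1": {"Tiam1": 26.9, "Tiam2": 200.0},
    "Caspr4": {"Tiam1": 19.0, "Tiam2": 3.4},
    "Neurexin1": {"Tiam1": 2400.0, "Tiam2": 5.0},
}


def fold_preference(kd_by_domain):
    """Return (preferred domain, fold difference in Kd) for one peptide.

    The preferred domain is the one with the lower Kd (tighter binding).
    """
    preferred = min(kd_by_domain, key=kd_by_domain.get)
    other = max(kd_by_domain, key=kd_by_domain.get)
    return preferred, kd_by_domain[other] / kd_by_domain[preferred]


for peptide, kds in KD_UM.items():
    domain, fold = fold_preference(kds)
    print(f"{peptide}: prefers the {domain} PDZ domain by {fold:.1f}-fold")
```

On these numbers, Syndecan1 favors the Tiam1 PDZ domain by roughly 7-fold, whereas Neurexin1 favors the Tiam2 PDZ domain by several hundred-fold; the experimental uncertainties are ignored in this simple ratio.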

Table 3.4 Summary of the sequence specificities of the PDZ domainsᵃ

                       P-4    P-3     P-2      P-1    P0
Tiam1 (ref. 163)       [X]    [I]     [FY]     [YH]   [AF]
Tiam1 (ref. 5)         [F]    [ILM]   [G]      [W]    [F]
Tiam1 (this study)ᵇ    [RK]   [IR]    [FY]     [YR]   [ACF]
Tiam2 (ref. 5)         [R]    [STE]   [ST]     [SR]   [V]
Tiam2 (this study)ᵇ    [K]    [RKH]   [YRKH]   [YH]   [FY]

ᵃAbbreviations: C, (S)-2-aminobutyric acid; X, any amino acid. ᵇThe indicated amino acids had an occurrence of 10% at each position, one standard deviation from the average.

Table 3.5 Database search for potential Tiam1 PDZ-binding proteinsᵃ

UniProtKB  Gene Name      Carboxy Tail  Protein Name
A6NJV1     UPF0573_HUMAN  DCYFEFRA      UPF0573 Protein C2orf70
Q9H7N3     CN056_HUMAN    LFSFLFFF      Uncharacterized protein C14orf56
Q9C0A0     CNTP4_HUMAN    ENQKEYFF      Contactin-associated protein-like 4
P08263     GSTA1_HUMAN    EARKIFRF      Glutathione S-transferase A1
R09210     GSTA2_HUMAN    EARKIFRF      Glutathione S-transferase A2
Q16772     GSTA3_HUMAN    EARKIFRF      Glutathione S-transferase A3
Q7RTV2     GSTA5_HUMAN    EARKIFRF      Glutathione S-transferase A5
Q60330-2   PCDGC_HUMAN    QIFFLFFF      Protocadherin γ-A12 (isoform 2)
P18827     SDC1_HUMAN     TKQEEFYA      Syndecan-1
P34741     SDC2_HUMAN     APTKEFYA      Syndecan-2
O75076     SDC3_HUMAN     DKQEEFYA      Syndecan-3
ZCPW1      ZCPW1_HUMAN    RRPREFRF      Zinc finger CW-type PWWP domain protein 1 (isoform 2)

ᵃDatabase search was conducted using PROSITE (256) ([FRKE]-[ILMRE]-[FYG]-[YWRF]-[AF]> as the search motif).

Table 3.6 Database search for potential Tiam2 PDZ-binding proteinsᵃ

UniProtKB  Gene Name      Carboxy Tail  Protein Name
O95477     ABCA1_HUMAN    EKVKESYV      ATP-binding cassette sub-family A member 1
P05023     AT1A1_HUMAN    WVEKETYY      Na+/K+-transporting ATPase subunit α-1
P50993     AT1A2_HUMAN    WVEKETYY      Na+/K+-transporting ATPase subunit α-2
P13637     AT1A3_HUMAN    WVEKETYY      Na+/K+-transporting ATPase subunit α-3
Q13733     AT1A4_HUMAN    WVEKETYY      Na+/K+-transporting ATPase subunit α-4
Q13733-2   AT1A4_HUMAN    WVEKETYY      Na+/K+-transporting ATPase subunit α-4 isoform 2
Q4G0S7     CC152_HUMAN    SHLKRRRF      Coiled-coil domain-containing protein 152
Q9NS75     CLTR2_HUMAN    WLRKETRV      Cysteinyl leukotriene receptor 2
Q9C0A0     CNTP4_HUMAN    ENQKEYFF      Contactin-associated protein-like 4
Q13216-2   ERCC8_HUMAN    RFNKKKRY      Isoform 2 of DNA excision repair protein ERCC-8
Q9Y4F4-2   F179B_HUMAN    EEVRTKYF      Isoform 2 of Protein FAM179B
Q6PXP3     GTR7_HUMAN     SPAKETSF      Solute carrier family 2, facilitated glucose transporter member 7
Q9P2D3     HTR5B_HUMAN    IKLKTSFF      HEAT repeat-containing protein 5B
Q9P2D3-3   HTR5B_HUMAN    IKLKTSFF      Isoform 3 of HEAT repeat-containing protein 5B
O14649     KCNK3_HUMAN    LMKRRSSV      Potassium channel subfamily K member 3
Q9NPC2     KCNK9_HUMAN    LMKRRKSV      Potassium channel subfamily K member 9
Q9H0V9     LMA2L_HUMAN    EQSRKRFY      VIP36-like protein
Q9H0V9-2   LMA2L_HUMAN    EQSRKRFY      Isoform 2 of VIP36-like protein
Q15233     NONO_HUMAN     APNKRRRY      Non-POU domain-containing octamer-binding protein
Q9ULB1     NRX1A_HUMAN    NKDKEYYV      Neurexin-1-alpha
Q9ULB1-2   NRX1A_HUMAN    NKDKEYYV      Isoform 2 of Neurexin-1-alpha
P58400     NRX1B_HUMAN    NKDKEYYV      Neurexin-1-beta
Q9P2S2     NRX2A_HUMAN    NKDKEYYV      Neurexin-2-alpha
Q9P2S2-2   NRX2A_HUMAN    NKDKEYYV      Isoform 2 of Neurexin-2-alpha
P58401     NRX2B_HUMAN    NKDKEYYV      Neurexin-2-beta
Q9Y4C0     NRX3A_HUMAN    NKDKEYYV      Neurexin-3-alpha
Q9Y4C0-2   NRX3A_HUMAN    NKDKEYYV      Isoform 2 of Neurexin-3-alpha
Q9HDB5     NRX3B_HUMAN    NKDKEYYV      Neurexin-3-beta
Q9HDB5-2   NRX3B_HUMAN    NKDKEYYV      Isoform 2 of Neurexin-3-beta
Q9H342     O51J1_HUMAN    LLEKRRRV      Olfactory receptor 51J1
O43660     PLRG1_HUMAN    EIIKRKRF      Pleiotropic regulator 1
O43660-2   PLRG1_HUMAN    EIIKRKRF      Isoform 2 of Pleiotropic regulator 1
Q8WXF1     PSPC1_HUMAN    GPNKRRRY      Paraspeckle component 1
P57052     RBM11_HUMAN    KSKKKKRY      Putative RNA-binding protein 11
P57052-2   RBM11_HUMAN    KSKKKKRY      Isoform 2 of Putative RNA-binding protein 11
Q9H628     RERGL_HUMAN    FGKRRKSV      Ras-related and estrogen-regulated growth inhibitor-like protein
O94822     RN160_HUMAN    PLCRETFF      RING finger protein 160
P48066     S6A11_HUMAN    ITEKETHF      Sodium- and chloride-dependent GABA transporter 3
A6NNU9     SSX11_HUMAN    RPPKSKYF      Protein SSX11
Q658P3     STEA3_HUMAN    LAEKTSHV      Metalloreductase STEAP3
Q658P3-2   STEA3_HUMAN    LAEKTSHV      Isoform 2 of Metalloreductase STEAP3
Q658P3-3   STEA3_HUMAN    LAEKTSHV      Isoform 3 of Metalloreductase STEAP3
Q8N7E2     ZN645_HUMAN    HQRRHRRY      Zinc finger protein 645

ᵃDatabase search was conducted using PROSITE (256) ([RK]-[STERKH]-[STYRKH]-[SRYHF]-[VFY]> as the search motif).
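
The C-terminal search motifs quoted in the footnotes of Tables 3.5 and 3.6 can be checked against individual carboxy tails with an ordinary regular expression. The sketch below is an illustration only and is not the PROSITE tool used for the database search: the helper names prosite_to_regex and matching_tails are hypothetical, and the converter handles just the pattern syntax that appears in the footnotes (bracketed residue classes, single residues, the x wildcard, and the < and > terminus anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supported elements: [ABC] residue classes, single residues, 'x'
    wildcards, and '<' / '>' anchors for the N- and C-terminus.
    """
    prefix = "^" if pattern.startswith("<") else ""
    suffix = "$" if pattern.endswith(">") else ""
    core = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in core.split("-"):
        if element == "x":
            parts.append(".")
        elif re.fullmatch(r"\[[A-Z]+\]|[A-Z]", element):
            parts.append(element)
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return prefix + "".join(parts) + suffix


# Search motifs as given in the footnotes of Tables 3.5 and 3.6.
TIAM1_MOTIF = "[FRKE]-[ILMRE]-[FYG]-[YWRF]-[AF]>"
TIAM2_MOTIF = "[RK]-[STERKH]-[STYRKH]-[SRYHF]-[VFY]>"

# Carboxy tails of three proteins discussed in the text.
TAILS = {
    "Syndecan-1": "TKQEEFYA",
    "Caspr4": "ENQKEYFF",
    "Neurexin-1": "NKDKEYYV",
}


def matching_tails(motif, tails):
    """Return the sorted names of the tails that end with the motif."""
    regex = re.compile(prosite_to_regex(motif))
    return sorted(name for name, tail in tails.items() if regex.search(tail))


print("Tiam1 motif:", matching_tails(TIAM1_MOTIF, TAILS))
print("Tiam2 motif:", matching_tails(TIAM2_MOTIF, TAILS))
```

With these three tails, the Tiam1 motif matches Syndecan-1 and Caspr4, and the Tiam2 motif matches Caspr4 and Neurexin-1, which mirrors the binding pattern reported in Table 3.3.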

Table 3.7 ΔΔΔGint for Tiam1 PDZ mutants as a function of binding to selected peptidesᵃ

Syndecan1: TKQEEFYA-COOH

Tiam1 PDZ     Kd (μM)       ΔGb¹          ΔΔGb²         ΔΔΔGint³      ∑singles
Wild-type     26.9 ± 0.90   6.23 ± 0.02
L911M         34.7 ± 0.70   6.08 ± 0.01   0.15 ± 0.02
K912E         140 ± 20      5.25 ± 0.08   0.98 ± 0.09
L911M/K912E   211 ± 36      5.00 ± 0.10   1.20 ± 0.10   0.07 ± 0.13   1.13 ± 0.09
L915F         81 ± 7        5.58 ± 0.05   0.65 ± 0.06
L920V         46 ± 3        5.92 ± 0.03   0.31 ± 0.04
L915F/L920V   250 ± 20      4.91 ± 0.08   1.32 ± 0.09   0.36 ± 0.13   0.96 ± 0.07
QM            122 ± 8       5.33 ± 0.04   0.89 ± 0.04

Caspr4: ENQKEYFF-COOH

Tiam1 PDZ     Kd (μM)       ΔGb¹          ΔΔGb²         ΔΔΔGint³      ∑singles
Wild-type     19.0 ± 0.40   6.43 ± 0.01
L911M         14.0 ± 0.30   6.61 ± 0.01   0.18 ± 0.02
K912E         58.6 ± 4.10   5.77 ± 0.04   0.67 ± 0.04
L911M/K912E   28.9 ± 0.90   6.19 ± 0.02   0.24 ± 0.02   0.25 ± 0.04   0.49 ± 0.04
L915F         61 ± 4        5.74 ± 0.04   0.69 ± 0.04
L920V         10.8 ± 0.60   6.77 ± 0.03   0.33 ± 0.04
L915F/L920V   76.5 ± 3.4    5.61 ± 0.04   0.75 ± 0.05   0.38 ± 0.10   0.36 ± 0.06
QM            18.3 ± 0.30   6.46 ± 0.01   0.02 ± 0.02

Neurexin1: NKDKEYYV-COOH

Tiam1 PDZ     Kd (μM)       ΔGb¹          ΔΔGb²         ΔΔΔGint³      ∑singles
Wild-type     2400 ± 250⁴   3.60 ± 0.10
L911M/K912E   270 ± 150     4.90 ± 0.30   1.30 ± 0.30
L915F/L920V   166 ± 12      5.15 ± 0.07   1.58 ± 0.10
QM            46 ± 2        5.91 ± 0.04   2.30 ± 0.10

ᵃAbbreviations: b, binding; int, interaction; QM, quadruple mutant (L911M/K912E/L915F/L920V). ¹Units of kcal/mol. ²ΔΔGb = ΔGb(mutant) − ΔGb(wild-type). ³ΔΔΔGint = ΔΔGb(mutant1,2) − [ΔΔGb(mutant1) + ΔΔGb(mutant2)]. ⁴Data taken from ref. 236.
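
The double-mutant cycle arithmetic behind Table 3.7 can be reproduced from the dissociation constants alone. The sketch below is a hedged illustration and not the original analysis: it assumes ΔGb = RT ln(Kd) and a temperature of 298.15 K (the assay temperature is not stated in the table), and the function names ddg and coupling_energy are hypothetical. Error propagation is omitted.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol·K)
TEMP_K = 298.15     # assumed temperature; not given in Table 3.7


def ddg(kd_mutant, kd_wild_type):
    """ΔΔGb = ΔGb(mutant) − ΔGb(wild-type), assuming ΔGb = RT ln(Kd).

    Both Kd values must be in the same units; only their ratio matters.
    """
    return R_KCAL * TEMP_K * math.log(kd_mutant / kd_wild_type)


def coupling_energy(kd_wt, kd_single_1, kd_single_2, kd_double):
    """ΔΔΔGint = ΔΔGb(double) − [ΔΔGb(single 1) + ΔΔGb(single 2)]."""
    sum_of_singles = ddg(kd_single_1, kd_wt) + ddg(kd_single_2, kd_wt)
    return ddg(kd_double, kd_wt) - sum_of_singles


# Syndecan1 Kd values (μM) from Table 3.7 for the S-2 pocket mutants.
KD_WT, KD_L911M, KD_K912E, KD_DOUBLE = 26.9, 34.7, 140.0, 211.0

print(f"L911M       ΔΔGb    = {ddg(KD_L911M, KD_WT):.2f} kcal/mol")
print(f"K912E       ΔΔGb    = {ddg(KD_K912E, KD_WT):.2f} kcal/mol")
print(f"L911M/K912E ΔΔGb    = {ddg(KD_DOUBLE, KD_WT):.2f} kcal/mol")
print(f"L911M/K912E ΔΔΔGint = "
      f"{coupling_energy(KD_WT, KD_L911M, KD_K912E, KD_DOUBLE):.2f} kcal/mol")
```

Under these assumptions the single-mutant values come out near the tabulated 0.15 and 0.98 kcal/mol, and the coupling energy of about 0.09 kcal/mol falls within the reported 0.07 ± 0.13 kcal/mol. The same ratio approach gives the ~52-fold change quoted in the text for Neurexin1 binding to the QM (2400 μM versus 46 μM).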

Figure 3.1 Conservation and structure of the Tiam1 and Tiam2 PDZ domains

(A) Domain architecture and motifs within Tiam family GEF proteins: DH, Dbl homology domain; PH, Pleckstrin homology domain; CC, coiled-coil domain; Ex, extension domain; RBD, Ras-binding domain; PDZ, PSD-95/Dlg/ZO-1 domain; and P, PEST sequence. Myristoylation is indicated by the twisting line at the N-terminus. The percent identity (% ID) between each homologous domain is indicated. (B) Amino acid residues involved in Tiam1 and Tiam2 PDZ domain specificity. Primary sequence alignment of Tiam1 and Tiam2 PDZ domains with residues that are discussed in the text labeled and boxed. (C) Ribbon diagram of the Tiam1 PDZ domain-Model peptide complex (PDB entry 3KZE) (236). (D) Expanded view of the boxed region in panel C showing the residues targeted for mutagenesis (colored yellow) in this study.

Figure 3.2 Structure of the inverted peptide library used in the PDZ domain screensᵃ

ᵃB, β-alanine; X, random position. The encoding peptide is colored red, while the exterior binding peptide is colored black.

Figure 3.3 Sequence specificity of the Tiam1 and Tiam2 PDZ domainsᵃ

ᵃThe histograms indicate the amino acids identified in the combinatorial peptide screen. Each position in the peptide is labeled P0-P-4 starting from the C-terminus. The “Occurrence (%)” on the y-axis represents the percentage of the selected sequences (groups I and II) that contained a particular amino acid at a given position. C, (S)-2-aminobutyric acid (Abu); M, norleucine (Nle).
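
The percentage occurrence plotted in Figure 3.3 is a per-position residue count over the selected sequences. A minimal sketch of that tally is shown below; the function name positional_occurrence is a hypothetical helper, and the five input sequences are only a small subset of Group I from Table 3.1 chosen for illustration, so the resulting percentages do not reproduce the figure.

```python
from collections import Counter


def positional_occurrence(sequences):
    """Percentage occurrence of each residue at each position.

    Positions are labeled from the C-terminus: the last residue is P0,
    the one before it is P-1, and so on.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    table = {}
    for offset in range(length):
        label = "P0" if offset == 0 else f"P-{offset}"
        counts = Counter(seq[-1 - offset] for seq in sequences)
        table[label] = {
            residue: 100.0 * n / len(sequences) for residue, n in counts.items()
        }
    return table


# Five Group I sequences from Table 3.1 (C = Abu in the library notation).
subset = ["RTYYC", "ATYYC", "IIYDC", "GIYYA", "QIYDA"]
occurrence = positional_occurrence(subset)

for position, residues in occurrence.items():
    ranked = sorted(residues.items(), key=lambda item: -item[1])
    print(position, ", ".join(f"{res} {pct:.0f}%" for res, pct in ranked))
```

Applying the same tally to all Group I and II sequences of a screen would give the histogram values for each position, from which residues above a chosen occurrence threshold can be read off as in Table 3.4.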

Figure 3.4 The Tiam1 PDZ domain quadruple mutant (QM) has the same specificity as the Tiam2 PDZ domainᵃ

ᵃRepresentative peptide binding curves for (A) the Tiam2 PDZ domain and (B) the Tiam1 PDZ domain QM (L911M, K912E, L915F, and L920V). Individual curves are for interactions with the (○) Caspr4, (▼) Neurexin1, and (●) Syndecan1 peptides. Each titration curve was measured in triplicate.

Figure 3.5 Four subfamilies of the Tiam PDZ domainsᵃ

ᵃForty-three protein sequences were collected from the SMART database (273, 274). The PDZ domain of each “Tiam-like” protein was aligned with the Tiam1 PDZ domain using ClustalX2 (275). The PDZ domains were grouped into four classes (A-D) based on the conservation of residues 911, 912, 915, and 920, respectively (numbered according to the Tiam1 PDZ domain). The protein accession numbers for these proteins are as follows: (A) Tiam1-like (Pt, UPI0000E25843; Pa, XP_002830672; Hs, Q13009-1; Mm, UPI0000D9A603; Cj, XP_002761403; Rn, UPI0000DA3756; Mm, Q60610; Tb, ENSTBEP00000012843; St, ENSSTOP00000005270; Oc, XP_002716851; Bt, UPI0000F33E72; Ec, UPI00015606CF; Fc, ENSFCAP00000002181; Md, ENSMODP00000037826; Gg, UPI0000ECD4E6; Tg, XP_002188532), (B) Tiam2-like (Hs, Q8IVF5; Pt, UPI0000E21226; Mm, UPI0000DC5170; Bt, UPI0000F3232D; Cl, UPI0000EB44DA; St, ENSSTOP00000000705; Ec, UPI00015607E2; Rn, UPI0000DC1DFF; Mm, UPI0000DC5170; Oa, ENSOANP00000010710; Ol, ENSOANP00000010712; Md, ENSMODP00000029251; Gg, UPI0000ECC8B0; Tg, XP_002198583), (C) zebrafish Tiam-like (Dr-1, XP_002664748; Tn-1, Q4RFR9; Dr-2, XP_001924044; Tn-2, Q4SFX7; Ol, ENSORLP00000005549), and (D) SIF-like (Dm-1, P91621; Dm-2, P91620; Dp, Q2LZN8; Da, B3M3N2; Ag, Q7PI66; Aa, Q17DC1; Cq, B0X8E7; Bf, C3YH98).

CHAPTER 4

HDAC6 AND UBP-M BUZ DOMAINS RECOGNIZE SPECIFIC C-TERMINAL SEQUENCES OF PROTEINS²

² Reproduced from Hard, R., Liu, J., Shen, J., Zhou, P., and Pei, D. Biochemistry 49, 10737-10746, Copyright 2010 American Chemical Society.

4.1 Introduction

The conjugation of ubiquitin, a highly conserved 76-amino acid protein, to proteins is a key signaling event in cellular processes such as the proteolysis of cellular proteins (29), cell cycle control (276), and transcriptional regulation (32, 33). Several proteins involved in these processes, such as deubiquitylating enzymes (DUBs) (44, 277), certain trafficking proteins (57, 278), and ubiquitin-conjugating enzymes (279), must specifically recognize ubiquitin to perform their functions. To recognize ubiquitin, many proteins utilize ubiquitin-binding domains, such as UBA, UIM, MIU, DUIM, CUE, GAT, NZF, A20 ZnF, BUZ, UBZ, Ubc, UEV, UBM, GLUE, Jab1/MPN, and PFU domains (280). Most of these domains bind ubiquitin at a hydrophobic patch surrounding Ile-44 of ubiquitin (281, 282). The BUZ domain, however, has been found to interact with ubiquitin by binding to its free C-terminus (41, 42).

The BUZ domain, which is also known as the Znf-UBP (zinc finger ubiquitin-specific processing protease) domain, DAUP (deacetylase/ubiquitin-specific protease) domain, or PAZ (polyubiquitin-associated zinc finger) domain, consists of approximately 100 residues and is generally organized around a central five-stranded twisted β-sheet with a nearby α-helix (41, 42, 43). BUZ domains require one to three zinc ions to maintain their structural integrity (41, 43, 48). The BUZ domain is present in at least 10 human ubiquitin-specific processing proteases (USPs) (44), which are the largest class of deubiquitylating enzymes. It is also found in BRCA1-associated protein 2 (BRAP2) (46), an E3 ubiquitin ligase, and in cytoplasmic histone deacetylase 6 (HDAC6) (39, 47). Although the precise roles of BUZ domains in these proteins are not completely understood, it appears that one major function is to regulate the activity of BUZ-containing proteins by binding to the C-terminus of ubiquitin (44). In the case of BUZ-containing USPs, binding of ubiquitin has been shown to increase their catalytic activity (42, 51, 52), while binding of ubiquitin by the BUZ domain of HDAC6 may prevent HDAC6 from transporting polyubiquitinated proteins to aggresomes (41, 57).

Recent biochemical and structural analysis of the isopeptidase T (isoT) BUZ domain revealed a distinct ubiquitin binding mode, in which the C-terminal RGG motif of ubiquitin inserts into a deep pocket on the BUZ domain (42). The more N-terminal residues, including Arg-72 and Leu-73, also make specific contacts with the BUZ domain, likely contributing to both specificity and overall affinity. In addition, in the case of isoT, Phe-224 from the L2A loop of the BUZ domain interacts with a small hydrophobic patch on the ubiquitin surface (formed by Leu-8 and Ile-36). Consistent with this observation, any modification that disrupts the free C-terminal Gly-Gly motif abolished the binding of the ubiquitin peptide to BUZ domains (41). Because ubiquitin is the only currently known ligand of BUZ domains, it is not clear whether the BUZ domains can bind to other peptides and/or proteins. One approach to addressing this question is to screen the BUZ domains against a combinatorial peptide library (119). We have previously developed a methodology for synthesizing and screening one-bead-one-compound (OBOC) peptide libraries that display support-bound peptides with free C-termini and used it to profile the sequence specificity of postsynaptic density-95/discs large/zonula occludens-1 (PDZ) domains (187). Herein, we extended this approach to systematically profile the sequence specificity of the BUZ domains of mutant ubiquitin processing protease (Ubp-M, also known as USP16) and HDAC6. Our results demonstrate that the BUZ domain is a sequence-specific protein-binding module, with each domain recognizing a specific subset of C-terminal sequences. This suggests that the BUZ domains may bind to other cellular proteins in addition to ubiquitin.

4.2 Experimental Procedures

4.2.1 Materials

Fmoc L-amino acids and coupling reagents for peptide synthesis were purchased from Advanced ChemTech (Louisville, KY) and NovaBiochem (La Jolla, CA). TentaGel resin was purchased from Peptides International (Louisville, KY), while Wang resin was from Advanced ChemTech. Streptavidin-alkaline phosphatase (SA-AP) conjugate was purchased from Prozyme (San Leandro, CA). Phenyl isothiocyanate (PITC) was purchased in 1 mL sealed ampules from Sigma (St. Louis, MO). 5-Bromo-4-chloro-3-indolyl phosphate (BCIP) was from Sigma. Boc-Glu(OFm)-OH was purchased from Chem-Impex International Inc. (Wood Dale, IL). Thrombin (from bovine plasma) was purchased from Fisher Scientific (Pittsburgh, PA). DNA plasmids for GST-Ubp-M BUZ (BUZ residues 10-143) and GST-HDAC6 BUZ (residues 1059-1215) (both pGEX-2T vectors) and (His)6-tagged Ubp-M BUZ (in pET-15b vector, residues 22-143) have previously been described (41).

4.2.2 Expression, Purification, and Biotinylation of the BUZ Domains

Escherichia coli Rosetta BL21(DE3) cells were transformed with a GST-HDAC6 BUZ, GST-Ubp-M BUZ, or (His)6-Ubp-M BUZ plasmid and grown in Luria-Bertani medium (containing 500 μM ZnSO4) at 37 °C until the OD600 reached 0.6. For the production of GST-Ubp-M BUZ fusion protein, the cells were induced by addition of 90 μM isopropyl β-D-thiogalactoside (IPTG) for 5 h at 30 °C. For (His)6-Ubp-M BUZ and GST-HDAC6 BUZ proteins, the cells were induced with 200 μM IPTG for 15 h at 20 °C. The cells were collected by centrifugation at 5000 rpm for 20 min in a Sorvall RC-5C Plus rotor and lysed by sonication in either 50 mM sodium phosphate (pH 8.0), 300 mM NaCl, and 5 mM imidazole [for (His)6-Ubp-M BUZ] or 20 mM HEPES (pH 7.4), 150 mM NaCl, and 1 mM β-mercaptoethanol (for GST fusion proteins) containing protease inhibitors phenylmethanesulfonyl fluoride (35 mg/L), trypsin inhibitor (20 mg/L), and pepstatin (1 mg/L). The GST fusion proteins were purified on a glutathione-agarose column according to the manufacturer’s instructions. Free glutathione was removed by size exclusion chromatography in 30 mM HEPES (pH 7.4) and 150 mM NaCl. For library screening, the GST fusion proteins (≥2 mg/mL) were biotinylated by treatment with 2 equiv of (+)-biotin N-hydroxysuccinimide (NHS) ester (a 10 mg/mL biotin-NHS stock solution prepared in DMSO). The pH of the reaction solution was adjusted to ~8 by the addition of 1 M NaHCO3 (pH 8.4), and the reaction was allowed to proceed for 1 h at 4 °C. Any unreacted biotin-NHS was quenched by the addition of 1 M Tris buffer (pH 8.3) to a final concentration of 50 mM. Free biotin was then removed by size exclusion chromatography in 30 mM HEPES (pH 7.4) and 150 mM NaCl. The protein concentration was determined by the Bradford method, using bovine serum albumin as the standard. The proteins were flash-frozen in 33% glycerol using dry ice and isopropyl alcohol and stored at -80 °C. The (His)6-tagged Ubp-M BUZ domain was purified by metal affinity chromatography (Ni-NTA column) and ion exchange chromatography (Q-Sepharose). For fluorescence polarization experiments, the proteins were exchanged into a buffer containing 20 mM sodium phosphate (pH 7.0) and 100 mM NaCl by size exclusion chromatography after affinity purification. For fluorescence polarization studies with the HDAC6 BUZ domain, the GST tag was removed by treatment of the fusion protein still bound to the glutathione resin with thrombin (GE Healthcare) for 16 h at 4 °C. The GST-free protein was eluted from the resin with a buffer containing 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, and 2.5 mM CaCl2.

4.2.3 Synthesis of Nα-Boc-Glu(δ-NHS)-O-CH2-CH=CH2

Boc-Glu(OFm)-OH (0.426 g, 1 mmol) was dissolved in 1.4 mL of DCM, followed by the addition of NaHCO3 (0.168 g, 2 mmol) and H2O (1.7 mL). Allyl bromide (0.363 g, 3 mmol) was then added at 0 °C, followed by Aliquat 336 (0.388 g, 0.96 mmol). The reaction mixture was stirred at 35 °C for 16 h. After that, the organic and aqueous phases were separated, the aqueous fraction was extracted with DCM (2 × 1 mL), and the organic fractions were combined and dried over MgSO4. The solvent was removed by evaporation under vacuum, and the crude product was purified by silica gel column chromatography (2:1 hexane:ethyl acetate ratio) to give a white solid after being dried under vacuum overnight (0.37 g, 80%). The product was dissolved in 10% (v/v) piperidine in DCM (16 mL) and stirred for 2 h at room temperature. The solvent was removed under reduced pressure. The product was dissolved in 10% NaHCO3 and extracted with diethyl ether (20 mL). The aqueous layer was then acidified to pH ~4 with 1 M HCl. The desired product was extracted with ethyl acetate (3 × 20 mL) and dried over Na2SO4. After removal of the solvent under reduced pressure, the product (0.174 g, 76.3%) was dissolved in 20 mL of DCM containing 1.2 equiv (0.72 mmol, 0.083 g) of N-hydroxysuccinimide, and the mixture was stirred vigorously for 30 min at room temperature. Next, diisopropylcarbodiimide (0.72 mmol, 0.091 g) was added to the mixture, and the solution was stirred overnight at room temperature. The solvent was removed under reduced pressure, and the reaction mixture was extracted with ethyl acetate (3 × 30 mL) and water (20 mL). The organic portions were then combined, washed with brine (40 mL), and dried with Na2SO4. The solvent was evaporated under reduced pressure, and the product was purified by silica gel column chromatography (1:2 hexane:ethyl acetate ratio) to give a white solid (0.18 g, 78%): 1H NMR (250 MHz, CDCl3) δ 1.28 (s, 9H), 1.84-1.99 (m, 1H), 2.08-2.23 (m, 1H), 2.53-2.62 (m, 2H), 2.68 (s, 4H), 4.15-4.29 (m, 1H), 4.49 (d, J = 5.7 Hz, 2H), 5.11 (dd, J = 1.2, 10.3 Hz, 2H), 5.22 (d, J = 1.4 Hz, 1H), 5.68-5.83 (m, 1H); HRESI-MS C17H24N2O8Na+ ([M + Na]+) calcd 407.1425, found 407.1421.
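
The calculated HRESI-MS value quoted above can be checked from the molecular formula. The following is a minimal sketch under stated assumptions: the monoisotopic masses and the electron mass are standard reference values entered by hand (not taken from this work), and the helper name cation_mz is hypothetical.

```python
# Monoisotopic atomic masses (u); standard reference values, entered by hand.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}
ELECTRON_MASS = 0.00054858  # u


def cation_mz(composition, charge=1):
    """m/z of a cation with the given elemental composition.

    The composition must already include the adduct atom (here Na); the
    mass of the removed electron(s) is subtracted.
    """
    neutral = sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())
    return (neutral - charge * ELECTRON_MASS) / charge


# [M + Na]+ for C17H24N2O8, written as the composition C17H24N2O8Na.
sodium_adduct = {"C": 17, "H": 24, "N": 2, "O": 8, "Na": 1}
calculated = cation_mz(sodium_adduct)
observed = 407.1421  # "found" value reported in the text
error_ppm = (observed - calculated) / calculated * 1e6

print(f"calculated m/z = {calculated:.4f}")
print(f"mass error     = {error_ppm:.1f} ppm")
```

The calculated value agrees with the reported 407.1425 to four decimal places, and the reported measurement lies within about 1 ppm of it.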

4.2.4 Library Synthesis

The library was synthesized on 2.0 g of TentaGel S NH2 resin (90 μm, 0.28 mmol/g) via modification of a previously reported procedure (187). The BBLLM linker was synthesized with 4 equiv of Fmoc-amino acids, using HBTU, HOBt, and NMM as the coupling agents. Each amino acid was coupled for 1 h, followed by exhaustive washing of the resin with DMF and DCM. Before the coupling of the next amino acid, the N-terminal Fmoc group was removed by treatment of the resin with 20% piperidine in DMF (5 + 15 min), followed by exhaustive washing of the resin with DMF, DCM, and DMF again. To segregate the beads into outer and inner layers, we soaked the resin (after removing the N-terminal Fmoc group) in 75% DMF in water (5 min), 50% DMF in water (5 min), 25% DMF in water (5 min), and then washed the resin with water. The resulting resin was incubated in water overnight. The following day, the water was drained and the resin was quickly resuspended in 30 mL of a 55:45 (v/v) DCM/diethyl ether mixture containing 0.5 equiv of Nα-Boc-Glu(δ-N-hydroxysuccinimidyl)-O-CH2CH=CH2 (0.28 mmol) and incubated for 30 min on a rotary shaker. The resin was washed with a 55:45 (v/v) DCM/diethyl ether mixture (3 x 50 mL) and DMF (8 x 50 mL) and treated with 4 equiv of Fmoc-Gly-OH with HBTU, HOBt, and NMM (in 30 mL of DMF) for 1 h. Next, the Nα-Boc group was removed by treatment with a solution containing 90% trifluoroacetic acid (TFA), 2.5% triisopropylsilane, 2.5% ethanedithiol, and 5% DCM for 30 min. The resin was drained and neutralized with 10% triethylamine in DMF for 10 min. 4-Hydroxymethylphenoxyacetic acid (HMPA, 2 equiv) was then coupled to the surface layer of the beads using HBTU. The N-terminal Fmoc group of the peptides in the inner layer was removed with piperidine, and Fmoc-Arg(Pbf)-OH (0.7 equiv) was coupled to the free N-terminus using HBTU, HOBt, and NMM (45 min).

Upon removal of the Fmoc protecting group, the resin was split into 20 equal portions (100 mg each), and each was coupled to a different Fmoc-amino acid (4 equiv). The addition of the first random residue employed diisopropylcarbodiimide (4 equiv) and 4-dimethylaminopyridine (0.1 equiv) in DCM as the coupling reagents (6 h), while the other random residues were coupled using the standard HBTU, HOBt, and NMM chemistry (1 h). Each coupling reaction was repeated once to ensure complete reaction. To differentiate isobaric amino acids by MS sequencing, 5% (mol/mol) CD3CO2D was added to the coupling reaction mixtures of Leu and Lys while 5% (mol/mol) CH3CD2CO2D was added to the coupling reaction mixtures of Nle (192, 193).

After synthesis of the random region of the library, an Ala-Ala dipeptide was added to the N-terminus of all peptides. The resin was treated overnight with tetrakis(triphenylphosphine)palladium (1 equiv), triphenylphosphine (3 equiv), formic acid (10 equiv), and diethylamine (10 equiv) in anhydrous THF. The resin was washed with 1% diisopropylethylamine in DMF, 1% sodium dimethyldithiocarbamate hydrate in DMF, DMF, DCM, and DMF. The Fmoc group was removed with piperidine, and the resin was washed with DMF, 1 M HOBt in DMF, DMF, and DCM. The surface peptides were cyclized via incubation of the resin in a solution of PyBOP, HOBt, and NMM (5, 5, and 10 equiv, respectively) in DMF for 3 h. The resin was washed with DMF and DCM and treated with 50 mL of a modified Reagent K (7.5% phenol, 5% water, 5% thioanisole, 2.5% ethanedithiol, and 1% anisole in TFA) for 2 h. The resulting resin was washed with TFA and DCM, dried under vacuum, and stored at -20 °C.
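The reagent amounts in this procedure are given in equivalents relative to the resin loading. As a minimal sketch of that arithmetic (the function and variable names are illustrative and not part of the original protocol), the following reproduces the 0.28 mmol quoted for 0.5 equiv of the glutamate active ester on 2.0 g of resin:

```python
# Resin scale taken from the text: 2.0 g of TentaGel S NH2 at 0.28 mmol/g.
RESIN_MASS_G = 2.0
LOADING_MMOL_PER_G = 0.28


def reagent_mmol(equiv, resin_mass_g=RESIN_MASS_G, loading=LOADING_MMOL_PER_G):
    """Return the mmol of reagent for a number of equivalents relative to resin sites."""
    return equiv * resin_mass_g * loading


total_sites_mmol = reagent_mmol(1.0)   # all amine sites on the resin
glu_ester_mmol = reagent_mmol(0.5)     # surface-layer acylation step
fmoc_aa_mmol = reagent_mmol(4.0)       # a standard 4-equiv coupling on the full batch

print(f"total sites: {total_sites_mmol:.2f} mmol")
print(f"0.5 equiv:   {glu_ester_mmol:.2f} mmol")
print(f"4 equiv:     {fmoc_aa_mmol:.2f} mmol")
```

For the split portions (100 mg each), the same function can be called with `resin_mass_g=0.1`.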

4.2.5 Library Screening

A typical screening reaction involved 10-50 mg of the peptide library in a Micro Bio-Spin column (0.8 mL, Bio-Rad). The resin was swelled in DCM for 10-15 min, thoroughly washed with DMF and water, and incubated in 1 mL of HBST-gelatin buffer [30 mM HEPES, 150 mM NaCl, 0.05% Tween 20, and 0.1% gelatin (pH 7.4)] for 4 h. Afterward, the gelatin buffer was drained and the resin was resuspended in 1 mL of HBST-gelatin buffer containing 1 μM biotinylated BUZ domain protein and 1 mM tris(2-carboxyethyl)phosphine (TCEP). The resin was incubated overnight at 4 °C with gentle mixing. The protein solution was gently drained, and the resin was resuspended in 1 mL of SA-AP binding buffer [30 mM Tris-HCl (pH 7.4), 250 mM NaCl, 10 mM MgCl2, 70 μM ZnCl2, and 20 mM imidazole] containing 1 μg/mL SA-AP. The resin was incubated for 10 min at 4 °C. The SA-AP buffer was drained, and the resin was washed with the SA-AP binding buffer (2 x 1 mL) and SA-AP reaction buffer [30 mM Tris-HCl (pH 8.5), 100 mM NaCl, 5 mM MgCl2, 20 μM ZnCl2, 20 mM imidazole, and 0.01% Tween 20] (2 x 1 mL). The resin was then transferred into a single well of a 12-well plate (BD Falcon) by using the SA-AP reaction buffer (3 x 300 μL). After the addition of 100 μL of a BCIP solution (5 mg/mL in SA-AP reaction buffer), the resin was incubated at room temperature on a rotary shaker and monitored under a dissecting microscope for the development of turquoise color on positive beads. Once the desired color intensity was developed on the positive beads (typically ~1 h), the staining reaction was quenched by the addition of 100 μL of 1 M HCl. The colored beads were manually removed by using a micropipette under a dissecting microscope. The peptides on the positive beads were individually sequenced by partial Edman degradation/mass spectrometry (PED-MS) as previously described (192, 193).

4.2.6 Synthesis of Individual Peptides

Each peptide was synthesized on 50 mg of Wang resin (0.8 mmol/g). The first amino acid was coupled to the resin using diisopropylcarbodiimide as the coupling reagent and dimethylaminopyridine as a catalyst. The remaining amino acids were coupled using standard Fmoc/HBTU chemistry. To generate fluorescently labeled peptides, 100 μL of an FITC solution (20 mg/mL FITC in anhydrous DMSO) was added to ~5 mg of the resin in a 0.5 mL microcentrifuge tube, along with 3.2% (v/v) diisopropylethylamine. The resin was incubated with FITC at room temperature for 4 h in the dark. The resin was then transferred into a 0.8 mL Micro Bio-Spin column (Bio-Rad) and washed with dichloromethane. Side chain deprotection and peptide cleavage from the resin were achieved by treatment with 1 mL of the modified Reagent K for 2 h. The crude peptide was triturated three times with cold diethyl ether and purified by semipreparative HPLC on a C18 column. The identity of each peptide was confirmed by MALDI-TOF mass spectrometry. Peptide concentrations were determined by the absorbance of FITC at 495 nm.

4.2.7 Determination of Binding Affinity by Fluorescence Polarization

The binding affinities of the BUZ domains for individual peptides were determined by fluorescence polarization at 25 °C on a SpectraMax M5 Multi-Mode Microplate Reader (Molecular Devices, Sunnyvale, CA). A fluorescently labeled peptide was dissolved in ddH2O containing 1% (w/v) bovine serum albumin and added to solutions containing increasing concentrations of GST-HDAC6 BUZ or (His)6-Ubp-M BUZ protein [in 20 mM sodium phosphate (pH 7.0) and 100 mM NaCl] to give a final concentration of 90 nM peptide and 0.1% (w/v) bovine serum albumin. For binding studies with HDAC6 BUZ (no GST), the buffer contained 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, and 2.5 mM CaCl2. Fluorescence anisotropy (A) values were measured in triplicate, and the dissociation constant (KD) was determined by nonlinear regression fitting of the anisotropy data against the protein concentration using KaleidaGraph version 3.6 (Synergy Software, Reading, PA):


$$
A = \frac{A_f + \left(A_b \dfrac{Q_b}{Q_f} - A_f\right) \dfrac{[L_T] + [P_T] + K_D - \sqrt{\left([L_T] + [P_T] + K_D\right)^2 - 4[L_T][P_T]}}{2[L_T]}}{1 + \left(\dfrac{Q_b}{Q_f} - 1\right) \dfrac{[L_T] + [P_T] + K_D - \sqrt{\left([L_T] + [P_T] + K_D\right)^2 - 4[L_T][P_T]}}{2[L_T]}}
$$

where $A$ is experimentally measured anisotropy, $A_f$ is the anisotropy of the unbound peptide, $A_b$ is the anisotropy of the peptide-protein complex, $Q_f$ is the fluorescence intensity of the free peptide, $Q_b$ is the fluorescence intensity of the bound peptide, $[L_T]$ is the total concentration of the labeled peptide, and $[P_T]$ is the total concentration of the protein.
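The binding model above can be written as a small function, which is the form a nonlinear regression routine would evaluate at each protein concentration. This is a minimal sketch, not the KaleidaGraph procedure used in this work: the function names are illustrative, concentrations are assumed to be in μM, the default labeled-peptide concentration of 0.09 μM corresponds to the 90 nM stated in the text, and the anisotropy values in the usage lines are hypothetical.

```python
import math


def fraction_bound(lt, pt, kd):
    """Fraction of labeled peptide bound, from the quadratic solution for 1:1 binding."""
    s = lt + pt + kd
    return (s - math.sqrt(s * s - 4.0 * lt * pt)) / (2.0 * lt)


def anisotropy(pt, kd, af, ab, q_ratio=1.0, lt=0.09):
    """Predicted anisotropy A at total protein concentration pt.

    af, ab  : anisotropy of free and bound peptide
    q_ratio : Qb/Qf, fluorescence intensity of bound relative to free peptide
    lt      : total labeled peptide concentration (same units as pt and kd)
    """
    fb = fraction_bound(lt, pt, kd)
    return (af + (ab * q_ratio - af) * fb) / (1.0 + (q_ratio - 1.0) * fb)


# Hypothetical example: Af = 0.05, Ab = 0.20, KD = 0.79 uM, no intensity change.
for pt in (0.0, 0.79, 10.0, 100.0):
    print(pt, round(anisotropy(pt, kd=0.79, af=0.05, ab=0.20), 4))
```

With `q_ratio = 1` the expression reduces to a linear interpolation between `af` and `ab` weighted by the bound fraction, which is a convenient sanity check on any fitted curve.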

4.2.8 GST Pull-Down Assay

DNA constructs encoding Xenopus laevis histone proteins H3 and H4 were kindly provided by K. Luger (Colorado State University, Fort Collins, CO). The (H3+H4)2 tetrameric protein complex was overexpressed and purified as described previously (283). A small column filled with ~200 μL of glutathione-agarose (Sigma) was washed with PBS buffer [25 mM sodium phosphate and 100 mM NaCl (pH 7)]. Either the GST-BUZ domain protein (4.0 nmol) or GST alone (negative control) was loaded onto the column, which was exhaustively washed with PBS until no GST-BUZ or GST protein was detected in the flow-through fractions. An equal amount of the (H3+H4)2 tetrameric complex (4.0 nmol) was added to the column and incubated on ice for 10 min with rotary mixing. The column was then washed with 10 mL of PBS containing 100 mM NaCl and eluted with PBS containing 100 mM NaCl (3 x 200 μL), 300 mM NaCl (3 x 200 μL), and 700 mM NaCl (3 x 200 μL). All of the fractions were analyzed by SDS-PAGE and visualized by Coomassie blue staining.


4.3 Results

4.3.1 Synthesis and Screening of a Peptide Library with Free C-Termini

Earlier studies have shown that the BUZ domains of isoT and Ubp-M interact with ubiquitin by recognizing its extreme C-terminal sequence (41, 42). A peptide corresponding to the last five residues of ubiquitin bound to the Ubp-M BUZ domain with a KD value of 15.9 μM, similar to the binding affinity between the BUZ domain and full-length ubiquitin. However, it is currently unknown whether BUZ domains are capable of binding to other peptides and/or proteins. To answer this question, we decided to systematically profile the sequence specificity of BUZ domains by screening them against a combinatorial OBOC peptide library. Because the BUZ domains require a free C-terminus for binding (41), we designed a peptide library in the form of resin-MLLBBE’AAX5X4X3X2X1-CO2H [where B is β-alanine, E’ is a modified glutamic acid, and X1-X5 are L-α-aminobutyrate (Abu or U, used as a Cys replacement), L-norleucine (Nle or M, used as a Met replacement), or any of the 18 proteinogenic amino acids other than Cys and Met], which featured five random residues near the free C-terminus (Figure 4.1). Because solid-phase peptide synthesis usually starts from the C-terminus, the desired peptide library cannot be prepared by conventional peptide synthesis chemistry. We have recently developed a novel strategy for overcoming this technical difficulty (187). In this strategy, each resin bead is topologically segregated into two layers: the surface layer displays an inverted peptide containing a free C-terminus, and the inner core contains the same peptide sequence in the normal orientation (attached to the support via its C-terminus) as an encoding tag for later sequence analysis (Figure 4.1).


The peptide library was synthesized on TentaGel microbeads (90 μm, 0.28 mmol/g, 2.86 x 10^6 beads/g) by a modification of our previously reported method (187) and has a theoretical diversity of 20^5, or 3.2 x 10^6. The library was screened against the biotinylated GST-BUZ domain proteins as previously described for other protein domains (284). Briefly, binding of a biotinylated BUZ domain to a resin bead recruits SA-AP to the bead surface. Subsequent incubation in the presence of 5-bromo-4-chloro-3-indolyl phosphate results in the formation of turquoise color on the positive beads (284). Note that during the library screening, the BUZ domains were too large to diffuse into the beads and therefore had access only to the inverted peptides on the bead surfaces.

4.3.2 Sequence Specificity of the Ubp-M BUZ Domain

The BUZ domain of Ubp-M was chosen for this study because it is a representative of a family of BUZ domain-containing ubiquitin-specific processing (USP) proteases (44). HDAC6, on the other hand, is one of only a few BUZ domain-containing proteins that are not in the USP family. For each BUZ domain, a total of ~300 mg of the library (~850,000 beads) was screened in several separate experiments. Although the amount of resin used did not cover the entire sequence space of the peptide library, we have previously demonstrated that screening only a fraction of the library (~10% or more of its theoretical diversity) is sufficient for defining the sequence specificity of a binding domain (194, 285). For the Ubp-M BUZ domain, library screening resulted in a total of 128 hits, which can be divided into five different classes on the basis of sequence similarity (Table 4.1). The class I peptides (35 sequences) all contained a C-terminal Gly-Gly motif, similar to the C-terminal sequence of ubiquitin (RLRGG-CO2H). The Ubp-M BUZ domain has broader but still significant specificity at the more N-terminal positions. For example, it strongly prefers a positively charged residue (Arg or Lys) at the -2 position (relative to the C-terminal residue, which is defined as position 0) (Figure 4.2). It tolerates a wide variety of residues at the -3 and -4 positions, with some preference for an aromatic hydrophobic residue at the -3 position and a basic or hydrophobic residue at the -4 position. The class II peptides (18 sequences) contained a consensus motif of DG(F/Y), often but not always at the C-terminus. These sequences were apparently selected because of their ability to bind to the GST tag (vide infra). The class III peptides (19 sequences) all contained an HPQ motif, which has previously been identified as a specific ligand of streptavidin (286). Apparently, beads containing the class III peptides directly bound and recruited SA-AP to their surfaces. The class IV peptides (six sequences) contained an HXH or HXXH motif (where X represents any amino acid). We have previously shown that they were false positives during screening of inverted peptide libraries against GST-PDZ fusion proteins, and the origin of their selection is not yet clear (187). Finally, a large number of peptides (50 total) with diverse sequences were selected and were collectively categorized as class V peptides. The origin of their selection is not yet clear, and we have not previously observed this type of sequence during our library screening against numerous other protein domains. Interestingly, some of the selected sequences were highly similar to each other (e.g., PSDHV and PSDYV, LRKMG and RLRMG), suggesting that they were positively selected against some component of the library screening system.
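The library coverage implied by these numbers can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (20 building blocks at five positions, 2.86 x 10^6 beads/g, ~300 mg screened per domain). The final estimate of distinct sequences sampled is an added assumption: it treats each bead as an independent, uniformly random draw from the library (a Poisson approximation), which the text does not state.

```python
import math

LIBRARY_DIVERSITY = 20 ** 5      # five random positions, 20 building blocks each
BEADS_PER_GRAM = 2.86e6          # TentaGel microbeads, 90 um
RESIN_SCREENED_G = 0.300         # ~300 mg per BUZ domain

beads_screened = BEADS_PER_GRAM * RESIN_SCREENED_G
beads_per_sequence = beads_screened / LIBRARY_DIVERSITY

# Assumption: beads are independent uniform draws, so the expected fraction
# of distinct sequences represented at least once is 1 - exp(-beads/diversity).
fraction_sampled = 1.0 - math.exp(-beads_per_sequence)

print(f"theoretical diversity: {LIBRARY_DIVERSITY:,}")
print(f"beads screened:        {beads_screened:,.0f}")
print(f"beads per sequence:    {beads_per_sequence:.2f}")
print(f"fraction sampled:      {fraction_sampled:.2f}")
```

Under these assumptions the screens sample roughly a quarter of the sequence space, comfortably above the ~10% threshold cited above.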

4.3.3 Sequence Specificity of the HDAC6 BUZ Domain

The HDAC6 BUZ domain also selected the same five classes of peptides (Table 4.2). Again, the class I peptides (50 sequences) all contained a C-terminal GG motif. Among the class I peptides, HDAC6 has weaker selectivity for basic residues at the -2 position than Ubp-M; although Lys was frequently selected, Arg was not (Figure 4.2). On the other hand, hydrophobic residues Leu and Nle were also selected at this position. At the -4 position, the HDAC6 BUZ domain prefers an Arg, His, or Phe residue. The four class II peptides all contained the DG(M/Y) motif and were selected against the GST tag. Screening against the HDAC6 BUZ domain also led to the same false positive sequences containing the HPQ (class III) and HXH (class IV) motifs as the Ubp-M domain. Again, the HDAC6 BUZ domain selected a large number of class V peptides (137 sequences), some of which are highly homologous to each other (e.g., MYRHF and MSRHF, QTVFL and QLVLL, YRRWQ and SRRWQ, SWRAR and SLRAR).
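The five classes described above are defined by simple sequence motifs, so a first-pass sorting of hits can be sketched with regular expressions. This is only an approximation of the classification in Tables 4.1 and 4.2, which was done on the basis of overall sequence similarity; the rule order and the pooled DG(F/Y/M) pattern are assumptions made for illustration, and the sketch will not reproduce every manual assignment. In the one-letter codes used here, M denotes norleucine and U denotes aminobutyrate.

```python
import re

# Rules are tested in order; the first match determines the class.
RULES = [
    ("I", re.compile(r"GG$")),         # C-terminal Gly-Gly
    ("II", re.compile(r"DG[FYM]")),    # DG(F/Y) for Ubp-M, DG(M/Y) for HDAC6
    ("III", re.compile(r"HPQ")),       # streptavidin-binding motif
    ("IV", re.compile(r"H.H|H..H")),   # HXH or HXXH
]


def classify(sequence):
    """Assign a selected pentapeptide to class I-IV by motif, else class V."""
    for label, pattern in RULES:
        if pattern.search(sequence):
            return label
    return "V"


for seq in ("IAKGG", "NALGG", "LQDGF", "FDGFM", "RHPQA", "IYHQH", "PSDHV", "SLRAR"):
    print(seq, classify(seq))
```

Testing the C-terminal GG rule first reflects the fact that class I is defined by position (the extreme C-terminus), whereas the other motifs may occur anywhere in the random region.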

4.3.4 Binding Affinity of the BUZ Domains for Selected Peptides

To confirm the library screening results, we individually synthesized six representative peptides selected against the two BUZ domains and tested their binding affinities for the BUZ domains by fluorescence polarization (Table 4.3, peptides 1-6). Peptides IAKGG and NALGG were class I ligands selected against Ubp-M and HDAC6 BUZ domains, respectively. Two class II peptides (LQDGF and FDGFM) were chosen to contain the DGF motif at the C-terminus and an internal position, respectively. Finally, one representative class V peptide was selected for each BUZ domain (PSDHV for Ubp-M and SLRAR for HDAC6). As a control, we also synthesized a peptide corresponding to the C-terminal sequence of ubiquitin (RLRGG). All of the peptides contained a free C-terminus and an N-terminal FITC-β-Ala-β-Ala motif.

As expected, both class I peptides exhibited strong binding to the BUZ domains. Peptide IAKGG, which was selected against Ubp-M, bound to both BUZ domains with similar affinities (KD ~10 μM) (Table 4.3 and Figure 4.3). Peptide NALGG, on the other hand, bound to its cognate BUZ domain (from HDAC6) with a much higher affinity than to the Ubp-M domain (KD values of 0.79 and 33 μM for HDAC6 and Ubp-M BUZ domains, respectively). This is in excellent agreement with the screening results, which showed that the HDAC6 BUZ domain has a stronger preference for aliphatic hydrophobic residues (Leu, Nle, and Ile) at the -2 position than the Ubp-M domain (Figure 4.2). As a comparison, the ubiquitin C-terminal peptide had KD values of 2.2 and 0.34 μM for Ubp-M and HDAC6 BUZ domains, respectively. It was previously reported that the Ubp-M BUZ domain binds the full-length ubiquitin with a KD value of 6.5 μM (41). Thus, the binding affinities of the selected peptides (IAKGG and NALGG) for the Ubp-M BUZ domain were similar to those of known binding partners of this BUZ domain. To ascertain that the BUZ domain was responsible for the observed binding, we also conducted the binding study with the isolated HDAC6 BUZ domain (without the GST tag) and obtained a slightly higher affinity for peptide NALGG (KD = 0.23 μM). This discrepancy is likely caused by GST dimerization (KD = 0.33 μM) (283), because the GST dimer produces a stronger polarization signal than a monomer and therefore artificially increases the magnitude of the signal at high protein concentrations. Unfortunately, production of GST-free HDAC6 protein was problematic, and only small amounts of the protein could be produced at a time. Consequently, most of the binding studies involving the HDAC6 domain were conducted with the GST fusion protein. A KD value of 60 nM was reported for the interaction between the HDAC6 BUZ domain and the full-length ubiquitin (48). To determine whether the class I peptides bind to the same site as the full-length ubiquitin, the fluorescence polarization experiment was conducted in the presence of increasing concentrations of ubiquitin. Addition of ubiquitin decreased the anisotropy value in a concentration-dependent manner and completely abolished the binding of peptide FITC-BBRGMGG to the HDAC6 BUZ domain at ≥ 7 μM ubiquitin (Figure 4.4). We thus conclude that the selected class I peptides bind to the ubiquitin-binding site of the BUZ domains.

Class II peptides LQDGF and FDGFM bound to the GST-HDAC6 BUZ domain with KD values of 4.0 and 10.8 μM, respectively. However, neither peptide exhibited significant binding to the Ubp-M BUZ domain (KD > 150 μM), which contained a six-histidine tag but not the GST tag. When the GST tag was removed by partial proteolysis, the isolated HDAC6 BUZ domain had no detectable binding to the peptides (Figure 4.5). This suggests that the peptides might be binding to the GST tag. To test this notion, we performed the binding studies with a GST-Abl2 SH2 domain fusion protein and found that the LQDGF peptide bound to the GST-SH2 protein with a KD value of 4.6 μM, essentially the same as the binding affinity of the LQDGF peptide for the GST-BUZ domain. Moreover, the binding of the peptide to the GST-SH2 protein was inhibited by glutathione, with an IC50 of ~200 μM. Therefore, we conclude that the class II peptides were selected from the peptide library because of their ability to bind to the GST portion of the GST-BUZ domain proteins. Class V peptide PSDHV exhibited very weak binding to the Ubp-M BUZ domain (KD > 140 μM), while class V peptide SLRAR exhibited no binding toward HDAC6 BUZ (Table 4.3 and Figure 4.3).

4.3.5 Determination of Critical Positions for Binding by Alanine Scan

To determine which positions are critical for binding to a BUZ domain, we performed an “alanine scan” of the ubiquitin C-terminal peptide (RLRGG), during which each of the five C-terminal residues of the peptide was replaced with an alanine. For both BUZ domains, the C-terminal dipeptide GG is critical for binding; substitution of Ala for the terminal Gly completely abolished binding, while mutation of the Gly at position -1 either abolished binding (HDAC6) or greatly reduced the binding affinity (~30-fold for Ubp-M) (Table 4.3). Interestingly, the two BUZ domains display differential sensitivities to Ala substitution at the more N-terminal positions. At the -2 position, while mutation of the Arg to Ala decreased the affinity for the Ubp-M BUZ domain by ~8-fold, it had a weaker effect on the HDAC6 BUZ domain (2.6-fold). This is consistent with the library screening results, which showed a clear preference for Arg at this position by Ubp-M but not HDAC6 (Figure 4.2 and Table 4.2). The opposite trend was observed at the -4 position, where Arg → Ala mutation reduced the affinity for the HDAC6 BUZ domain by 7.4-fold but had a minimal effect on the Ubp-M domain, again in agreement with the specificity profile obtained from library screening. Replacement of the Leu at the -3 position with Ala had no effect on the Ubp-M BUZ domain and resulted in a minor reduction in affinity for the HDAC6 domain (1.5-fold). Thus, we conclude that for Ubp-M and HDAC6 BUZ domains and likely many other BUZ domains, the C-terminal GG motif is the most critical element of the specificity determinant. This is consistent with the structure of the ubiquitin C-terminus bound to the BUZ domain of USP5 (isoT), where molecular modeling predicted a steric clash between the Ala-76 side chain of a G76A mutant ubiquitin and tyrosine residues in the binding pocket of the USP5 BUZ domain (42). Our results suggest that BUZ domains make additional contacts with the more N-terminal residues for enhanced affinity and specificity. It appears that different BUZ domains may contact different positions in a peptide (e.g., -2 position vs -4 position) as well as recognize different amino acid residues at the same position (e.g., Arg vs Leu at the -2 position).
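The fold changes quoted for the alanine scan follow directly from the KD values in Table 4.3 (entries 7 and 9-12). A minimal sketch of that calculation is given below; the dictionary layout and function name are illustrative, and `None` stands for the "NA" (no significant binding) entries.

```python
# KD values in uM, copied from Table 4.3; None marks "no significant binding".
KD_UM = {
    "RLRGG": {"Ubp-M": 2.2, "HDAC6": 0.34},   # wild-type ubiquitin C-terminus
    "RLRAG": {"Ubp-M": 62.0, "HDAC6": None},  # Gly -> Ala at position -1
    "RLAGG": {"Ubp-M": 16.0, "HDAC6": 0.90},  # Arg -> Ala at position -2
    "RARGG": {"Ubp-M": 2.0, "HDAC6": 0.47},   # Leu -> Ala at position -3
    "ALRGG": {"Ubp-M": 2.3, "HDAC6": 2.5},    # Arg -> Ala at position -4
}


def fold_change(variant, domain, reference="RLRGG"):
    """Ratio KD(variant)/KD(reference); values above 1 mean weaker binding."""
    kd = KD_UM[variant][domain]
    if kd is None:
        return None
    return kd / KD_UM[reference][domain]


for variant in ("RLRAG", "RLAGG", "RARGG", "ALRGG"):
    for domain in ("Ubp-M", "HDAC6"):
        fc = fold_change(variant, domain)
        label = "no binding" if fc is None else f"{fc:.1f}-fold"
        print(f"{variant} / {domain}: {label}")
```

Expressing each substitution as a ratio to the wild-type KD makes the domain-specific pattern easy to see: position -2 matters more for Ubp-M, position -4 more for HDAC6.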

4.3.6 Database Search for Potential Interacting Partners of Ubp-M and HDAC6 BUZ Domains

We searched an Expasy proteomics server database (http://ca.expasy.org/) for human proteins that contain a diglycine motif at the C-terminus. The search was conducted by entering the motif XXXGG> (the greater than sign limits the search to C-terminal GG sequences, and X represents any amino acid) into the search site. Only proteins that are known to exist either at the transcript level or at the protein level are included. The search resulted in 73 human proteins with a C-terminal GG motif. On the basis of the sequence specificity determined above, we predict 11 of the proteins as potential targets of the Ubp-M BUZ domain and 24 proteins as potential partners of the HDAC6 BUZ domain, one of which is ubiquitin (Table 4.4).
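The C-terminal anchoring expressed by the `>` in the query XXXGG> has a direct analogue in a regular expression anchored at the end of the sequence. The sketch below illustrates only this filtering step on locally held sequences; it is not the Expasy search itself, the function name is illustrative, and the short test strings are C-terminal pentapeptides taken from the text rather than full protein sequences.

```python
import re

# Any three residues followed by Gly-Gly at the very end of the sequence.
C_TERMINAL_GG = re.compile(r"...GG$")


def has_c_terminal_gg(protein_sequence):
    """Return True if the sequence ends in a Gly-Gly dipeptide preceded by >= 3 residues."""
    return bool(C_TERMINAL_GG.search(protein_sequence.strip()))


examples = {
    "ubiquitin C-terminus": "RLRGG",
    "histone H4 C-terminus": "YGFGG",
    "internal GG only": "KGGGA",
    "class V library hit": "PSDHV",
}
for name, seq in examples.items():
    print(f"{name}: {has_c_terminal_gg(seq)}")
```

A second pass applying the position-specific preferences of each BUZ domain would be needed to narrow such matches to the predicted partners; those selection rules are not spelled out in the text and are therefore not sketched here.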

4.3.7 In Vitro Interaction between Ubp-M and HDAC6 BUZ Domains and Protein C-Termini

We next synthesized four peptides corresponding to the C-terminal sequences of four of the predicted BUZ-binding partners, F-box only protein 11 (FBXO11, STLGG), histone H4 (YGFGG), prostate tumor overexpressed gene 1 protein (PTOV1, RGMGG), and FAT10 (or ubiquitin D, YUIGG) (Table 4.3). FBXO11 contains an F-box motif, which potentially links it to the protein ubiquitination pathway (288). Histone H4 was chosen because it has already been established that Ubp-M interacts with another histone protein, histone H2A (289). PTOV1 is a protein overexpressed in prostate cancer (290). It has previously been reported that HDAC6 interacts with FAT10 and both the catalytic and BUZ domains of HDAC6 contribute to the binding interaction (291). It was not clear, however, whether the BUZ domain recognizes the C-terminal sequence of FAT10. All four peptides bound to the HDAC6 BUZ domain with KD values in the low micromolar range (1.3-6.2 μM) (Figure 4.6 and Table 4.3). We therefore propose that binding to the C-terminus of FAT10 by the BUZ domain is a key mechanism for the observed HDAC6-FAT10 interaction. The four peptides also bound to the Ubp-M BUZ domain, although the binding affinities were somewhat lower than those of the HDAC6 domain (KD = 12-17 μM). To test whether the Ubp-M BUZ domain is capable of binding to intact proteins (other than ubiquitin), we performed in vitro GST pull-down assays using heterotetramer (H3+H4)2 of recombinant Xenopus laevis histone proteins H3 and H4 and the GST-Ubp-M BUZ domain. As shown in Figure 4.7, the (H3+H4)2 heterotetramer bound to the GST-BUZ domain but not to GST alone. The bound histone protein required 0.7 M NaCl to elute, indicating that the interaction was relatively strong. Taking all the data together, we conclude that the BUZ domains are sequence-specific protein-binding modules that recognize the free C-termini of cellular proteins, with each interacting with a specific subset of partner proteins (including ubiquitin).

4.4 Discussion

In this work, we have systematically profiled the sequence specificities of Ubp-M and HDAC6 BUZ domains. Several conclusions have emerged from our studies. First, the BUZ domain is a sequence-specific protein-binding module that recognizes the free C-termini of proteins. Second, different BUZ domains apparently have distinct specificity profiles, with each domain recognizing a different subset of peptide sequences, although overlapping specificities are possible. For the two BUZ domains examined in this work, a Gly-Gly motif at the extreme C-terminus is required for binding. This specificity is unique among all of the protein-binding modules that have been biochemically and/or structurally characterized and is endowed by having a deep, narrow pocket on the BUZ domain surface that is sufficiently large to fit only the small Gly residues (42). At positions N-terminal to the Gly-Gly motif, the two BUZ domains exhibit different specificities. For example, while the Ubp-M BUZ domain strongly prefers a basic residue at the -2 position but has broad specificity at the -3 and -4 positions, the BUZ domain of HDAC6 prefers a basic residue at the -4 position and tolerates a variety of amino acids at the -2 and -3 positions. The ability of the BUZ domains to bind with high affinity to peptide sequences that are different from the ubiquitin C-terminus suggests that they may bind to other cellular proteins in vivo, in addition to ubiquitin. A database search identified 11 and 24 other human proteins as potential binding partners of Ubp-M and HDAC6 BUZ domains, respectively (Table 4.4). Peptides corresponding to the C-terminal sequences of four of these potential targets that were selected for further testing all bound to the two BUZ domains with low micromolar KD values. Furthermore, one of the four proteins (FAT10) has previously been reported to bind to HDAC6 via its BUZ domain (291), while our current work has demonstrated that histone H4 also binds to the GST-Ubp-M BUZ domain in an in vitro pull-down assay (Figure 4.7). We propose that many of the other predicted interactions in Table 4.4 may be physiologically relevant. In addition, proteolytic cleavage may generate protein fragments that bear C-terminal Gly-Gly motifs, which may bind to the BUZ domains. In this regard, several BUZ domains, including those of Ubp8p, USP20, USP22, and USP33, have been shown or are expected to be incapable of binding to ubiquitin (43, 44, 292). We suggest that they may function by binding to non-ubiquitin proteins. Finally, the BUZ domain of HDAC6 generally has a higher affinity for peptide ligands than the Ubp-M domain (Table 4.3). It is likely that BUZ domains interact with their physiological targets with a wide range of affinities to perform proper functions.

Screening of the inverted peptide library also resulted in a large number of other peptide sequences (classes II-V in Tables 4.1 and 4.2). We have shown that the class II peptides with the DG(F/Y) consensus motif bind to GST with KD values of ~4 μM. To the best of our knowledge, this represents a novel class of high-affinity peptide ligands for GST (293). Because GST fusion proteins are widely used in numerous applications, the new peptide ligands may provide useful tools for research and development purposes. For example, because DG(F/Y) peptides do not contain free thiol or amine functionalities (unlike free glutathione), DG(F/Y) peptides could be used to elute GST fusion proteins from glutathione columns and the eluted protein could be directly labeled with amine- or thiol-reactive reagents without the need for dialysis and/or size exclusion chromatography. They may also be used to immobilize GST fusion proteins onto solid surfaces to generate protein chips with a uniform orientation (294). The selection of class III and IV ligands has previously been shown to be caused by either direct binding to SA-AP (used in library screening) or nonspecific binding of unknown origin.

The origin of the class V peptides is currently unknown. Although the two peptides we have tested failed to bind to the BUZ domains, we cannot rule out the possibility that some of the peptides may be bona fide ligands of the BUZ domains. It should be pointed out that among the numerous protein domains that have been subjected to peptide library screening in this laboratory, the two BUZ domains were unusual in that screening led to so many noncognate peptides. The underlying reason for their appearance is not yet clear. However, our ability to obtain individual peptide sequences, and therefore to classify the selected peptides into different families, speaks highly of the advantage of the OBOC method over other library methods (e.g., oriented peptide libraries or SPOT libraries) (167, 170, 295).

In conclusion, we have demonstrated the BUZ domains to be a family of sequence-specific protein-binding domains and determined the sequence specificity profiles for two BUZ domains. Our results suggest that the BUZ domains may bind to cellular proteins other than ubiquitin and the specificity data should be useful for the identification of these protein targets.

4.5 Acknowledgements

Dr. Juan Shen (The Ohio State University) synthesized the Nα-Boc-Glu(δ-N-hydroxysuccinimidyl)-O-CH2-CH=CH2 molecule used in the peptide library synthesis. The BUZ domain constructs were kindly provided by the lab of Dr. Pei Zhou (Duke University), while the DNA constructs encoding Xenopus laevis histone proteins H3 and H4 were kindly provided by K. Luger (Colorado State University, Fort Collins, CO). The pull-down assays were performed by Dr. Jiangxin Liu (Duke University).


Table 4.1 Peptides selected from Ubp-M BUZ domain screens^a

Class I: WYAGG, KFAGG, RRAGG, LHFGG, RKGGG, MYHGG, IRHGG, KVHGG, RYIGG, YSKGG, QDKGG, IAKGG*, RWKGG, MFKGG, MNKGG, KYKGG, EFKGG, NNKGG, HTKGG, HRLGG, NPLGG, KSMGG, YGPGG, YFQGG, IIQGG, DIRGG, DKRGG, MGRGG, SPRGG, KLRGG, IMRGG, RURGG, UFYGG, IUYGG, AKYGG

Class II: LSDGF, LQDGF*, RWDGF, LYDGF, MDGLF, XDGMF, DGFWF, FDGFM*, DWDGM, LDGYM, UDGYM, WDGYS, NDGFW, HDGFY, FRDGY, LMDGY, YFDGY, LFDGY

Class III: RHPQA, HPQNU, THPQF, HPQUG, HPQWG, HPQWG, SHPQN, HPQRP, WRHPQ, WNHPQ, LYHPQ, GWHPQ, RHPQQ, WHPQR, HPQXX, HPQXX, HPQRY, WUHPQ, WTHPQ

Class IV: IYHQH, YVHTH, HSHKT, HKHYT, HHHXX, QHLTH

Class V: PSDHV*, PSDYV, YRFDA, QFLDA, KGGGA, VVVPU, LDHRD, SVIKE, HGMSE, UKFAF, DLVGF, LHFLF, ISVQF, UGLVF, LRKMG, RLRMG, MITWG, YNSGH, PFYMH, WGPYH, FMUAI, QLYQI, FDAGK, VASGL, PSHHL, IFISL, AEAAM, FIYDM, HRAGM, GMGLM, YDDQM, HQKGQ, PULGQ, YFGKQ, KIQLQ, SIWUR, DMUGR, UUKSS, QKKRT, HQVNW, FYEMW, YTHRV, FRAGR, ESHGD, RNRNV, KRYHH, YINGY, LFWMY, QRDTY, PDIWY

^a M, norleucine; U, (S)-2-aminobutyric acid; X, amino acids whose identity could not be unambiguously determined. *, sequences selected for resynthesis and binding analysis.


Table 4.2 Peptides selected from HDAC6 BUZ domain screens^a

Class I: HHQGG, FRAGG, RYUGG, FKUGG, RKDGG, QREGG, RUEGG, RSEGG, WUFGG, WYHGG, HAHGG, VKHGG, WMHGG, HEIGG, YLIGG, XXKGG, LHKGG, HDKGG, TRKGG, AFKGG, LLKGG, IKKGG, XXKGG, RNKGG, REKGG, NALGG*, RVLGG, HTLGG, EGLGG, DHLGG, QELGG, FTMGG, RRMGG, MYMGG, WWMGG, FNMGG, HWPGG, NYPGG, SWPGG, MIQGG, LLQGG, IRQGG, ULQGG, UMRGG, RFSGG, QRWGG, YIWGG, HHYGG, FYYGG, HVYGG

Class II: DDGMF, FEDGM, WADGY, ADGMY

Class III: SHPQU, SHPQF, HPQXX, HPQXX, HPQXX, WHPQG

Class IV: HSHIH, FRHYH, YHKHT, HKHYW, HRHPY, HRHXX, HTHRR, HIHRR, HAHXX, YHDHV

Class V: LSGPA, ETWUU, WRIFU, WMQFU, RRWNU, WMEPD, MFNMD, RWHAF, LELAF, YRRFF, RRAGF, MYRHF, MSRHF, UHHKF, EYFLF, RHGLF, GIILF, MPVLF, SRHMF, LAKMF, TRSRF, LYUIG, ADSIG, LRUMG, MVRHH, WKWIH, LRPKH, HRWLH, UKQRH, FLRSH, RHIWH, RTTWH, RFQYH, YIYAI, THMDI, WVYGI, VLRHI, VVGLI, IWIMI, LFFQI, IFEYI, YRHFK, URQKK, PGNDL, QTVFL, QLVLL, IMAQL, IYFQL, WWHRL, RKQRL, LSMSL, FHKVL, MNWFM, IRHHM, YEALM, FQULM, KPMLM, KGPQM, RVSQM, KIQRM, LURRM, TLUSM, DVHTM, RMHVM, IYWYM, HIYYM, VVDQP, GLRVP, RQFRQ, TILTQ, YRRWQ, SRRWQ, SWRAR, SLRAR*, WQKAR, KRYAR, NALER, IRIFR, VMPGR, SRUHR, AWHHR, YYHHR, GYMHR, VIRHR, QVRHR, FKWHR, RYDIR, RQLLR, RVTLR, LRUMR, VRSMR, WSARR, RYFRR, LKGRR, LLHRR, AGIRR, RIQRR, KFYRR, LRATR, YHNTR, GIRVR, QHPWR, QRUYR, FKFFS, MLMGS, MUITS, FLLAT, HPVHT, YRRQT, LRLRT, NNMTT, IFLYT, LUEHV, UYRHV, KIYKV, FSIRV, AKDWV, RHGRW, HUUWW, RRUWW, UTYWW, LQIHA, MYRER, YFRRU, QVRRL, WWHLR, RGYRR, WNGGY, RREHY, RIHKY, WKALY, YGHPY, DIAYR, MIKRY, RFQRY, LIKYY, MYYYY

^a M, norleucine; U, (S)-2-aminobutyric acid; X, amino acids whose identity could not be unambiguously determined. *, sequences selected for resynthesis and binding analysis.


Table 4.3 Dissociation constants of selected peptides against the BUZ domains

Entry  Peptide      Peptide      KD (µM)
No.    Sequenceᵃ    Source       Ubp-M BUZᵇ    HDAC6 BUZᶜ
1      IAKGG        library      12 ± 3        8.8 ± 1.1
2      NALGG        library      33 ± 7        0.79 ± 0.12
                                               0.23 ± 0.02 (non-GST)ᵈ
3      LQDGF        library      >150          4.0 ± 1.3
                                               NA (non-GST)ᵈ
4      FDGFM        library      >150          10.8 ± 2.5
5      PSDHV        library      >140          ND
6      SLRAR        library      ND            NA
7      RLRGG        ubiquitin    2.2 ± 0.4     0.34 ± 0.02
8      RLRGA                     NA            NA
9      RLRAG                     62 ± 11       NA
10     RLAGG                     16 ± 4        0.90 ± 0.10
11     RARGG                     2.0 ± 0.3     0.47 ± 0.02
12     ALRGG                     2.3 ± 0.4     2.5 ± 0.4
13     STLGG        FBXO11       16 ± 4        3.7 ± 0.5
14     YGFGG        Histone H4   13 ± 3        6.2 ± 0.6
15     RGMGG        PTOV1        17 ± 3        1.3 ± 0.2
16     YUIGG        FAT10        12 ± 2        1.3 ± 0.1

ᵃ Each peptide was labeled at the N-terminus with FITC via a β-Ala-β-Ala linker and contained a free C-terminus. ᵇ The Ubp-M BUZ domain contained an N-terminal six-histidine tag. ᶜ A GST-BUZ fusion protein was used in the binding studies unless otherwise noted. ᵈ The isolated BUZ domain (non-GST fusion) was used. M, norleucine; U, (S)-2-aminobutyric acid; NA, no significant binding affinity; ND, affinity not determined.


Table 4.4 Database search for potential Ubp-M and HDAC6 BUZ-binding proteins

C-terminal ID number Proteins Sequence P98198-2 Probable phospholipid-transporting ATPase ID, isoform 2 STLGGb Q86XK2-4 Isoform 4 of F-box only protein 11 O60942-4 Isoform 4 of mRNA-capping enzyme LERGGa Q9HC96-6 Calpain-10, isoform F Coiled-coil-helix-coiled-coil-helix domain-containing protein 3, LEKGGa Q9NX63 mitochondrial RVSGGb O14578-2 Isoform 2 of Citron Rho-interacting kinase QRLGGb Q9GZP9 Derlin-2 KSFGGb Q6P158-2 Isoform 2 of Putative ATP-dependent RNA helicase DHX57 LYKGGa Q8TBM8 DnaJ homolog subfamily B member 14 LYKGGa Q8TBM8-2 Isoform 2 of DnaJ homolog subfamily B member 14 LRRGGa Q9H819 DnaJ homolog subfamily C member 18 YGFGGa,b P62805 Histone H4 HPKGGa,b P49641-2 Isoform Short of Alpha-mannosidase 2x Q9NZB8-3 Isoform 3 of Molybdenum protein 1 ILIGGb Q9NZB8-5 Isoform MOCS1A of Molybdenum cofactor biosynthesis protein 1 Q9NZB8-6 Isoform 2 of Molybdenum cofactor biosynthesis protein 1 RGCGGb Q71RS6 Sodium/potassium/calcium exchanger 5 RWVGGb Q9H1B4-5 Isoform E of Nuclear RNA export factor 5 NTLGGb Q8NGS3 Olfactory receptor 1J1 [Pyruvate dehydrogenase [acetyl-transferring]]-phosphatase 2, YYKGGa Q9P2J9 mitochondrial Q86YD1 Prostate tumor-overexpressed gene 1 protein RGMGGb Q86YD1-2 Isoform 2 of Prostate tumor-overexpressed gene 1 protein Q86YD1-3 Isoform 3 of Prostate tumor-overexpressed gene 1 protein GKKGGa P10155-2 Isoform Short of 60 kDa SS-A/Ro ribonucleoprotein Q9BZJ4 Solute carrier family 25 member 39 RLLGGb Q9BZJ4-2 Isoform 2 of Solute carrier family 25 member 39 IIVGGb P60059 Protein transport protein Sec61 subunit gamma RGSGGb Q58EX2-4 Isoform 4 of Protein sidekick-2 TPRGGa Q9Y675 SNRPN upstream reading frame protein YCIGGb O15205 Ubiquitin D RMLGGb P35544 Ubiquitin-like protein FUBI RLRGGa,b P62988 Ubiquitin

ᵃ Potential binding partners of the Ubp-M BUZ domain; ᵇ potential binding partners of the HDAC6 BUZ domain.


Figure 4.1 Synthesis scheme of a C-terminal OBOC peptide libraryᵃ

ᵃ Reagents and conditions: (a) standard Fmoc/HBTU chemistry, (b) soak in water and then 0.5 equiv of Nα-Boc-Glu(δ-N-hydroxysuccinimidyl)-O-CH2CH=CH2, (c) Fmoc-Gly-OH/HBTU, (d) TFA, (e) HMPA/HBTU, (f) piperidine, (g) Fmoc-Arg(Pbf)-OH/HBTU, (h) Fmoc-AA/DIC, (i) Pd(PPh3)4, (j) piperidine, (k) PyBOP and HOBt, (l) modified Reagent K. B, β-alanine; X, random residues.

Figure 4.2 Sequence specificity of the Ubp-M and HDAC6 BUZ domainsᵃ

[Panels (a) and (b): histograms of occurrence (%) for each amino acid (D, E, N, Q, H, K, R, W, F, Y, M, L, I, V, U, T, S, A, G, P) at positions 0 to -4.]

ᵃ Displayed are the amino acids identified at each position from the C-terminus (position 0) to the -4 position (z-axis) for the Ubp-M BUZ (a) and HDAC6 BUZ (b) domains. Occurrence on the y-axis represents the percentage of selected sequences that contained a particular amino acid at a certain position.

Figure 4.3 Representative plots showing the binding of FITC-labeled peptides to Ubp-M and HDAC6 BUZ domainsᵃ

ᵃ The Ubp-M BUZ domain contained an N-terminal (His)6 tag, whereas the HDAC6 BUZ domain contained an N-terminal GST fusion tag. Protein concentration (x-axis) is in μM. B, β-alanine.


Figure 4.4 Fluorescence anisotropy assay showing the competition between FITC-BBRGMGG and ubiquitin for binding to the GST-HDAC6 BUZ domainᵃ

ᵃ The peptide and BUZ domain were kept at fixed concentrations (92 nM and 2.6 μM, respectively), while the concentration of ubiquitin was varied (0–30 μM). M, L-norleucine; B, β-alanine.


Figure 4.5 Fluorescence anisotropy assay showing the binding of FITC-BBLQDGF to glutathione S-transferaseᵃ

ᵃ For the binding assay between HDAC6 BUZ (without GST) and FITC-BBLQDGF, binding was too weak for a KD to be determined. The binding affinities of GST-HDAC6 BUZ and GST-Abl2 SH2 for FITC-BBLQDGF were almost identical, indicating that the peptide motif interacts with GST. For the competition assay between FITC-BBLQDGF and glutathione for GST-Abl2 SH2, the peptide concentration was fixed at 90 nM and the SH2 concentration was fixed at 33 μM. The glutathione concentration was varied from 0 to 909 μM. All x-axis concentration values are in μM; B, β-alanine.


Figure 4.6 Fluorescence anisotropy assay showing the interaction of the C-terminal peptides of histone H4 and FAT10 with the Ubp-M and HDAC6 BUZ domainsᵃ

ᵃ The Ubp-M BUZ domain contained an N-terminal (His)6 tag, whereas the HDAC6 BUZ domain contained an N-terminal GST fusion tag. All protein concentrations (x-axis) are in μM; B, β-alanine; U, (S)-2-aminobutyric acid. The peptide YGFGG is derived from the C-terminus of histone H4, while YUIGG is derived from the FAT10 C-terminus.


Figure 4.7 GST pull-down assay of the histone H3-H4 complex by Ubp-M BUZᵃ

ᵃ GST-BUZ (or GST) protein and the (H3+H4)2 tetramer were sequentially loaded onto the column as input. The column was washed with 10 mL of PBS containing 0.1 M NaCl and then eluted with 3 x 200 μL of PBS containing 0.1 M NaCl (fractions 1-3), 3 x 200 μL of PBS containing 0.3 M NaCl (fractions 1-3), and 3 x 200 μL of PBS containing 0.7 M NaCl (fractions 1-3). Each fraction was analyzed by SDS-PAGE and stained with Coomassie blue. (a) Elution profile of the column loaded with GST-BUZ protein. (b) Elution profile of the column loaded with GST alone. M, molecular mass markers.


CHAPTER 5

SYSTEMATIC INSPECTION OF THE PHOSPHOPEPTIDE BINDING SPECIFICITY OF TANDEM BRCT DOMAINS

5.1 Introduction

The cellular response to DNA damage requires the phosphorylation of proteins by serine/threonine kinases such as ATM, ATR, and DNA-PK (296). Phosphorylation of specific protein sites generates phosphopeptide motifs that can be recognized by phosphoserine- or phosphothreonine-binding proteins, such as 14-3-3 proteins, Forkhead-Associated (FHA) domains, and BRCA1 C-terminal (BRCT) domains (103). The recognition of specific sequences phosphorylated upon DNA damage by modular domains allows for the formation of DNA damage repair complexes at the sites of DNA breaks. One protein domain critical to the DNA damage repair pathway is the BRCT domain (78, 79). Consistent with the importance of the BRCT domain in the DNA damage response, BRCT domains are mutated in certain cancers (297-299). The BRCT domain is found in at least 30 human proteins (74) and can occur as a single structural unit, a tandem structural unit, or even a triple repeat structure (75, 98). In one case, an FHA domain was found to form a single structural unit with a BRCT domain tandem (76).

The diversity of the arrangement of BRCT domains in the proteins in which they are found is matched by the diversity of their modes of interaction with binding targets. They recognize proteins in both phosphorylation-dependent (82, 83) and -independent (88, 100, 300) manners, and are also capable of binding to DNA (80, 301). Although there is conflicting evidence as to whether single BRCT domains can act as phosphopeptide binding modules (82, 92, 98, 302, 303), it is well established that tandem BRCT domains can specifically recognize phosphopeptides (83-86, 92, 174). Tandem BRCT domains that have been shown to interact with phosphopeptides generally have a preference for binding phosphoserine-containing peptides over phosphothreonine-containing peptides (83, 304).

Determination of the binding targets of tandem BRCT domains will help to understand their biological functions, for example, their mechanism of action in the DNA damage response pathway. Previous attempts at determining the sequence specificity of tandem BRCT domains employed either oriented phosphopeptide libraries or SPOT phosphopeptide libraries (83, 92-94). Oriented peptide libraries can only reveal general trends in the binding specificity of a domain, but not individual binding sequences. SPOT peptide libraries can provide individual binding sequences, but are generally limited to ~10^4 members. Our lab has previously utilized one-bead-one-compound (OBOC) peptide libraries to determine the binding specificities of a variety of protein domains (49, 194, 305, 306). Those studies sometimes led to the discovery of binding motifs that would be missed using chemically-synthesized peptide libraries that do not allow for individual binding sequence determination (194, 306). Despite the fact that there are at least 15 tandem BRCT domains present in the human proteome (74), their binding specificities have not yet been systematically evaluated using peptide library approaches. Furthermore, almost all previous binding specificity studies of tandem BRCT domains were performed with libraries containing only one random position which included phosphoserine (pS) or phosphothreonine (pT); only the tandem BRCT domain, (BRCT)2, of Breast Cancer-Associated Protein 1 (BRCA1) was screened against a library with pS or pT present in all of the random positions of the library (92). Tandem BRCT domains of Pax Transactivation Domain-Interacting Protein (PTIP), Nijmegen Breakage Syndrome Protein 1 (NBS1, also known as Nibrin), Ankyrin Repeat Domain-Containing Protein 32 (ANKRD32), and DNA Topoisomerase 2-Binding Protein 1 (TopBP1) have been reported to interact with dual phosphoserine-containing motifs (76, 307-310). Even though at least 4 tandem BRCT domains have been shown to preferentially bind phosphopeptides which contain a free carboxylate group after the third amino acid C-terminal to phosphoserine (85, 86, 90, 91), only two tandem BRCT domains have been screened against peptide libraries which present a free C-terminus after the +3 position (relative to pS) in the library (93, 94).

In this study, we systematically investigated the binding specificity of all the known tandem BRCT domains in the human proteome using two OBOC phosphopeptide libraries, one presenting free C-termini and the other presenting internal pS/pT motifs. Our results suggest that at least eight of the BRCT repeats recognize specific phosphopeptide motifs. We found that the FHA domain of Nibrin, and not its BRCT repeat, is responsible for its binding to a (D/E)-(D/E)-pT-X-I consensus sequence. We also found that the most C-terminal tandem BRCT domains of PTIP bind a novel dual phosphoserine motif, where both phosphoserine residues are important for the binding interaction.

5.2 Experimental Procedures

5.2.1 Materials

All DNA modifying enzymes used to clone the BRCT domains were purchased from New England Biolabs (Ipswich, MA). For PCR, Pfu Turbo polymerase was from Stratagene (Santa Clara, CA). Primers used for PCR or site-directed mutagenesis were ordered from Integrated DNA Technologies (Coralville, IA). TentaGel and CLEAR-Amide resins were purchased from Peptides International (Louisville, KY), while Wang resin was purchased from Advanced ChemTech (Louisville, KY). Coupling reagents and most Fmoc L-amino acids were purchased from Advanced ChemTech and NovaBiochem (La Jolla, CA). Fmoc L-Ser(HPO3Bzl)-OH, Fmoc L-Thr(HPO3Bzl)-OH, and Fmoc-8-amino-3,6-dioxaoctanoic acid were purchased from Chem-Impex International (Wood Dale, IL). Nα-Boc-Glu(δ-N-hydroxysuccinimidyl)-O-CH2CH=CH2 was synthesized by Dr. Juan Shen as described previously (49). Streptavidin-alkaline phosphatase (SA-AP) conjugate was purchased from Promega (Madison, WI). Phenyl isothiocyanate (PITC) was purchased in 1 mL sealed ampoules from Sigma (St. Louis, MO). Trypsin inhibitor (Soybean) was purchased from Millipore (Temecula, CA). Leupeptin was from Enzo Life Sciences (Farmingdale, NY). N-(9-fluorenylmethoxycarbonyloxy) succinimide (Fmoc-OSu) was from Advanced ChemTech. 5-Bromo-4-chloro-3-indolyl phosphate (BCIP) was from Sigma. Streptavidin-agarose resin was purchased from Thermo Fisher Scientific (Rockford, IL). Plasmid Mini Extraction and PCR purification kits were purchased from Bioneer (Alameda, CA). Plasmid Plus Maxi Kit (for large-scale plasmid purification) was purchased from Qiagen (Germantown, MD). Fluorescein 5-isothiocyanate (5-FITC) was purchased from Anaspec (Fremont, CA) and EZ-Link NHS-chromogenic biotin was from Thermo Scientific. Rabbit anti-PTIP polyclonal antibody was purchased from Bethyl Laboratories (Montgomery, TX), while the goat anti-rabbit IgG-peroxidase secondary antibody was from Thermo Fisher Scientific. Immun-Star HRP substrate was from Bio-Rad (Hercules, CA). Kodak BioMax light film was from Sigma. Dulbecco’s Modified Eagle’s Medium (DMEM) was from Sigma, while fetal bovine serum (FBS), phosphate-buffered saline (PBS), 0.25% Trypsin-EDTA-Phenol Red, and Opti-MEM reduced serum medium were from Gibco Life Technologies (Grand Island, NY). Lipofectamine 2000 (transfection reagent) was purchased from Invitrogen (Carlsbad, CA). HEK293 cells were from ATCC (Manassas, VA), and 150 x 25 mm culture dishes were from Corning Life Sciences (Tewksbury, MA). Micro Bio-Spin columns and end caps were purchased from Bio-Rad. Parafilm was from Pechiney Plastic Packaging (Menasha, WI).

5.2.2 Cloning of the Tandem BRCT Domains

Plasmids encoding the human tandem BRCT domains GST-BRCA1 BRCT2, GST-MDC1 BRCT2, GST-MCPH1 BRCT2, GST-TopBP1 BRCT 7-8 and mouse tandem BRCT domains GST-PTIP BRCT 1-2 and GST-PTIP BRCT 3-4 were kindly provided by Dr. Junjie Chen (MD Anderson Center, University of Texas) (92). The plasmid encoding human GST-BARD1 BRCT2 (residues 554-777) was provided by Dr. Mark Glover (University of Alberta) (311). The plasmid DNA encoding human ECT2 BRCT2 was kindly provided in a pCMV-Myc vector by Dr. Channing Der (Department of Pharmacology, University of North Carolina at Chapel Hill). cDNA encoding human PTIP BRCT 5-6, mouse TopBP1 BRCT 1-2, mouse TopBP1 BRCT 4-5, human DNA Ligase IV BRCT2, human TP53BP1 BRCT2, mouse ANKRD32 BRCT2, mouse XRCC1 BRCT2, and human Nibrin FHA-BRCT2 were purchased from Open Biosystems, Thermo Scientific (clone IDs 4509264, 6510112 (for both TopBP1 BRCT 1-2 and 4-5), 5259632, 8327629, 30714028, 4207343, and 9020793, respectively). The DNA fragments encoding the tandem BRCT domains of ECT2 BRCT2 (residues 144-376), PTIP BRCT 5-6 (residues 862-1069), TopBP1 BRCT 1-2 (residues 1-290), TopBP1 BRCT 4-5 (residues 534-758), DNA ligase IV BRCT2 (residues 618-911), TP53BP1 BRCT2 (residues 1723-1972), ANKRD32 BRCT2 (residues 1-200), XRCC1 BRCT2 (residues 310-631), and Nibrin FHA-BRCT2 (residues 1-382) were amplified and isolated by polymerase chain reaction (PCR) using the following primers: 5’-tga ata gca tat ggc tga ttg tag agt tat tgg acc acc-3’ and 5’-tgt cga cgg aat tcg ggg tat tta gag aaa gca ttg aca ctg a-3’ (ECT2), 5’-tga ata gca tat gac tcc aga att gac ccc ttt tgt g-3’ and 5’-gga att cgg gtc gac gtt aaa ctt ata tga ttc ata gtc cag cgt-3’ (PTIP BRCT 5-6), 5’-tga ata gca tat gtc cag aaa tga cca aga gcc gt-3’ and 5’-gga att cgg gtc gac tgc ttc tac tct ggt ttc agc ctt g-3’ (TopBP1 BRCT 1-2), 5’-tga ata gca tat gga gga aaa caa gtc gtc tgt cag tca tt-3’ and 5’-gga att cgg gtc gac tcc att tgg tat ctt agt ttc taa aac ttg-3’ (TopBP1 BRCT 4-5), 5’-tga ata gca tat gtg gaa act gct gcc cgc c-3’ and 5’-gga att cgg gtc gac ttt tgg cct ttc act caa atc cca-3’ (Nibrin FHA-BRCT2), 5’-tga ata gca tat ggg tgg tga tga tga acc aca aga a-3’ and 5’-gga att cgg gtc gac aat caa ata ctg gtt ttc ttc ttg taa ttc aca c-3’ (DNA Ligase IV BRCT2), 5’-tga ata gca tat gcc tct caa caa gac ctt gtt tct gg-3’ and 5’-gga att ccg ggg tcg acg tga gaa aca taa tcg tgt tta tat ttt gg-3’ (TP53BP1 BRCT2), 5’-tga ata gca tat gga aga tag tgc cac aaa aca tat ca-3’ and 5’-gga att ccg ggt cga ctt ctt tct cta ata gaa aat ctc cta aat act g-3’ (ANKRD32 BRCT2), 5’-tga ata gca tat gcg cac tgg acc cca aga gct tgg ca-3’ and 5’-gga att ccg ggt cga cgg cct ggg gca cca ccc cat ag-3’ (XRCC1 BRCT2). Primers used for site-directed mutagenesis of the Nibrin FHA domain were 5’-gtt gag tac gtt gtt gga gcg aaa aac tgt gcc att ctg-3’ and 5’-cag aat ggc aca gtt ttt cgc tcc aac aac gta ctc aac-3’ (R28A Nibrin FHA-BRCT2 mutant).
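Mutagenic primer pairs of this kind are usually designed as exact reverse complements of one another. The text does not state this for the R28A pair, so the following is only a minimal consistency check under that assumption; the function and constant names are illustrative and not part of the original work.

```python
# Sanity check that two mutagenic primers are reverse complements.
# Assumption: the R28A pair was designed as a fully complementary pair.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return "".join(primer.split()).upper()


def reverse_complement(primer):
    """Return the reverse complement of a DNA primer written 5'->3'."""
    return normalize(primer).translate(COMPLEMENT)[::-1]


def are_complementary(primer_a, primer_b):
    """True if primer_b is exactly the reverse complement of primer_a."""
    return reverse_complement(primer_a) == normalize(primer_b)


# Primer sequences copied from the text (5'->3').
R28A_FORWARD = "gtt gag tac gtt gtt gga gcg aaa aac tgt gcc att ctg"
R28A_REVERSE = "cag aat ggc aca gtt ttt cgc tcc aac aac gta ctc aac"

if __name__ == "__main__":
    print(are_complementary(R28A_FORWARD, R28A_REVERSE))
```

A check like this catches transcription errors in a primer list before ordering; it says nothing about melting temperature or secondary structure.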

The coding sequence for ECT2 was amplified by PCR from the provided pCMV-Myc plasmid, while the sequences of the other BRCT domains were amplified by PCR from the cDNA purchased from Open Biosystems (Thermo Scientific). The primers introduced restriction sites at the 5’ and 3’ termini of each PCR product. The ECT2 PCR product was digested with NdeI and EcoRI, while the rest of the PCR products were digested with NdeI and SalI. The PCR product encoding the DNA Ligase IV BRCT domains contained an internal NdeI restriction site, so that after the restriction digests, it was necessary to sequentially ligate each fragment into the pET22-ybbR13 vector. The vector was treated with calf intestinal phosphatase after digestion with appropriate endonucleases to prevent re-ligation of the vector prior to insert ligation. Digestion products were purified using a PCR purification kit (Bioneer). Afterwards, the PCR products were ligated into the pET22-ybbR13 vector (Figure 5.1) (284, 312) using T4 DNA ligase. Ligated vectors were used to transform E. coli DH5α cells. Transformed cells were grown in LB media (with added ampicillin) and plasmids were isolated by using a Bioneer Plasmid Mini Extraction Kit. The DNA sequences were confirmed by dideoxy sequencing. For site-directed mutagenesis of the Nibrin FHA domain, 18 thermal cycles (55 °C annealing temperature and 68 °C elongation temperature (13.5 minutes per cycle)) were used to synthesize the mutated DNA, followed by DNA purification by minipreparation. The wild-type DNA was digested with DpnI, and the mutated DNA was transformed into E. coli cells. The DNA sequence was confirmed by dideoxy sequencing.
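The restriction sites carried by the cloning primers can be located with a short scan. The sketch below is illustrative only: it uses the standard recognition sequences for NdeI (CATATG), EcoRI (GAATTC), and SalI (GTCGAC) and the ECT2 primer pair as listed in Section 5.2.2, and the helper names are hypothetical.

```python
# Locate NdeI, EcoRI, and SalI recognition sites in a primer.
# All three sites are palindromic, so scanning one strand is sufficient.

RECOGNITION_SITES = {"NdeI": "CATATG", "EcoRI": "GAATTC", "SalI": "GTCGAC"}


def normalize(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return "".join(primer.split()).upper()


def find_sites(primer, sites=RECOGNITION_SITES):
    """Map each enzyme name to the 0-based start positions of its site."""
    seq = normalize(primer)
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[enzyme] = positions
    return hits


# ECT2 primers copied from the text (5'->3').
ECT2_FORWARD = "tga ata gca tat ggc tga ttg tag agt tat tgg acc acc"
ECT2_REVERSE = "tgt cga cgg aat tcg ggg tat tta gag aaa gca ttg aca ctg a"

if __name__ == "__main__":
    print("forward:", find_sites(ECT2_FORWARD))
    print("reverse:", find_sites(ECT2_REVERSE))
```

Running the same scan over a full insert sequence would also flag internal sites, such as the internal NdeI site reported for the DNA Ligase IV fragment.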

5.2.3 Expression, Purification, and Biotinylation of the Tandem BRCT Domains

E. coli Rosetta BL21(DE3) cells containing plasmids encoding BRCT domains were grown in LB media supplemented with 50 mg/L ampicillin and 35 mg/L chloramphenicol at 37 °C until the OD600 reached 0.6. For most BRCT domains, protein production was induced with the addition of 90 µM IPTG at 18 °C for 16 hours. For GST-BARD1 BRCT2, protein was produced after induction with 500 µM IPTG at 16 °C for 20 hours. Cells containing the plasmid for TP53BP1 BRCT2 were grown at 37 °C until the OD600 reached 0.8; protein was then produced by the addition of 300 µM IPTG for 16 hours at 30 °C. For most GST fusion proteins, protein purification was performed as described previously (49, 306). For GST-BARD1 BRCT2 purification, cells were lysed in 10 mM phosphate buffered saline (10 mM sodium phosphate, 150 mM NaCl, 10 mM 2-mercaptoethanol, pH 7.5) containing phenylmethanesulfonyl fluoride (35 mg/L), trypsin inhibitor (20 mg/L), and pepstatin (1 mg/L). GST-protein was immobilized on a glutathione-agarose column, washed with PBS, and eluted with borate buffer (50 mM borate, 300 mM NaCl, 10 mM 2-mercaptoethanol, 10 mM glutathione, pH 9). GST-BARD1 BRCT2 was passed through a size-exclusion chromatography column to remove glutathione, labeled with biotin, and passed through a size-exclusion column in borate buffer to remove excess biotin. For BRCT domains containing C-terminal six-histidine tags, cells were lysed in a buffer containing 20 mM HEPES, 300 mM NaCl, 1 mM 2-mercaptoethanol, 0.5% Triton X-100, pH 8. The proteins were then purified by Talon cobalt affinity chromatography, followed by ion-exchange chromatography on a Q-sepharose column (or a SP-sepharose column for ANKRD32 BRCT2). Both GST fusion and six-histidine fusion proteins were labeled with biotin and further purified as described previously (49, 306), except that EZ-Link NHS-Chromogenic Biotin (Thermo Scientific) was used to label proteins rather than NHS-biotin.

5.2.4 Library Synthesis

5.2.4.1 Synthesis of OBOC Peptide Library with a Free C-Terminus

The C-terminal pS/pT library was synthesized on 2.8 g Tentagel S NH2 resin (90 µm, 0.28 mmol/g loading capacity, Figure 5.2a). The library was synthesized as previously described up to the third random position (49). After the third random position, the library was split equally into two reaction vessels (1.4 g per vessel), and the resin coupled to either Fmoc L-Ser(HPO3Bzl)-OH or Fmoc L-Thr(HPO3Bzl)-OH (3 equiv of each phosphoamino acid, 3 equiv HATU, 9 equiv DIPEA) in DMF for 3 hr. Each phosphoamino acid was then coupled again for an additional 3 hr in a 60:40 (v/v) DMF/DCM mixture. Afterwards, any remaining free amines were reacted with 10 equiv di-tert-butyl dicarbonate (Boc anhydride), along with 11 equiv NMM and 0.1 equiv DMAP in dry DCM (50 min). To ensure that no free amines were still present, the resin was coupled to Boc-Gly-OH (5 equiv Boc-Gly-OH, 5 equiv HATU, 10 equiv NMM) in DMF (45 min), followed by acetic anhydride capping (10 equiv acetic anhydride, 11 equiv NMM, 0.1 equiv DMAP) in DCM (20 min). The final two random positions after pS/pT were coupled using standard HBTU/HOBt/NMM chemistry (90 min per reaction). Like the first three random positions, the last two random positions of the library included 18 proteinogenic amino acids (Ala, Arg, Asn, Asp, Gly, Gln, Glu, His, Ile, Leu, Lys, Phe, Pro, Ser, Thr, Trp, Tyr, Val) plus L-α-aminobutyrate (cysteine replacement) and L-norleucine (methionine replacement). To monitor the completeness of each reaction, remaining free amines were detected with the acetaldehyde/chloranil test (313). Uncoupled amines were reacted with 10 equiv Boc anhydride, 11 equiv NMM, 0.1 equiv DMAP in dry DCM (50 min). Any remaining free amines were capped with acetic anhydride. In order to reduce the peptide density on the bead surface by 100-fold, a 9:1 ratio of Boc-Ala-OH and Fmoc-Ala-OH was coupled after the final random position (4.5 equiv Boc-Ala-OH, 0.5 equiv Fmoc-Ala-OH, 5 equiv HBTU, 10 equiv NMM, 1 hr). The procedure was repeated for the second alanine so that only 1% of the final alanine was protected by Fmoc at the N-terminus. The deallylation, cyclization, and sidechain deprotection of the library were carried out as previously described (49).
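The 100-fold density reduction follows from two successive 9:1 couplings, each leaving one tenth of the chains Fmoc-protected. A minimal sketch of that arithmetic is given below; it assumes that Boc-Ala-OH and Fmoc-Ala-OH couple with equal efficiency, which the text does not state, and the function name is illustrative.

```python
# Fraction of chains left Fmoc-protected after repeated mixed couplings.
# Assumption: Boc-Ala-OH and Fmoc-Ala-OH react with equal efficiency, so the
# product ratio equals the reagent ratio in each round.


def fmoc_fraction(boc_equiv, fmoc_equiv, rounds):
    """Return the fraction of chains carrying Fmoc after `rounds` couplings."""
    if boc_equiv < 0 or fmoc_equiv <= 0 or rounds < 1:
        raise ValueError("equivalents must be positive and rounds >= 1")
    per_round = fmoc_equiv / (boc_equiv + fmoc_equiv)
    return per_round ** rounds


if __name__ == "__main__":
    # 4.5 equiv Boc-Ala-OH and 0.5 equiv Fmoc-Ala-OH, applied for two alanines.
    fraction = fmoc_fraction(4.5, 0.5, rounds=2)
    print(f"Fmoc-protected fraction: {fraction:.2%}")
    print(f"Fold reduction: {1 / fraction:.0f}")
```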

5.2.4.2 Synthesis of OBOC Peptide Library Containing Internal Recognition Motifs

The X7 pS/pT library was synthesized on 5 g Tentagel S NH2 resin (90 µm, 0.27 mmol/g, Figure 5.2b). To reduce the surface density, the resin was swollen in DCM, soaked in DMF, and sequentially incubated in 75% DMF in H2O (10 min), 50% DMF in H2O (10 min), 25% DMF in H2O (10 min), and H2O overnight. The following day, the water was gently drained from the resin, and the resin was quickly resuspended in 35 mL of 55:45 (v/v) DCM/diethyl ether containing 0.45 equiv Boc-Met-OSu and 0.05 equiv Fmoc-Met-OSu, along with 0.5 equiv DIPEA (30 min). Afterwards, the beads were washed with the 55:45 (v/v) DCM/ether mixture, DCM, DMF, then coupled to 5 equiv Fmoc-Met-OH/5 equiv HBTU/10 equiv NMM in DMF (75 min). The resin was then washed with DMF and DCM, followed by removal of the Boc protecting group from methionine by addition of a 1:1 (v/v) solution of DCM/TFA (30 min). The solution was drained and the resin washed with DCM, followed by the addition of 5% DIPEA in DCM to neutralize the resin (10 min). The resulting amines were capped with acetic anhydride to reduce the number of free amines on the surface of the library by 10-fold. The NBBRRM linker was coupled to the resin by standard HBTU/HOBt/NMM chemistry (5 equiv, 1 hr couple, 1 hr double couple per amino acid). The random positions of the library contained 21 different amino acids per position. For the first four random positions, 98% of the resin was coupled to 18 proteinogenic Fmoc-amino acids (Ala, Arg, Asn, Asp, Gly, Gln, Glu, His, Ile, Leu, Lys, Phe, Pro, Ser, Thr, Trp, Tyr, Val) plus Fmoc-L-norleucine-OH (Met replacement), while cysteine was excluded from the library. To allow for the distinction between isobaric amino acids by MS sequencing, 5% (mol/mol) CD3CO2D was added to coupling reactions of Leu and Lys and 5% (mol/mol) CH3CD2CO2D was added to the coupling reactions of Nle (193). The remaining 2% of the library was coupled to N-α-Fmoc-O-benzyl-L-phosphoserine or N-α-Fmoc-O-benzyl-L-phosphothreonine (1% to phosphoserine (pS) and 1% to phosphothreonine (pT)). For the first five random positions, all amino acids except pS and pT were coupled for 90 min (4 equiv amino acid/4 equiv HBTU/4 equiv HOBt/8 equiv NMM) in DMF, and then coupled again in DMF to ensure completion of the reaction. The phosphoamino acids were coupled for 3 hr in DMF and then coupled again for 2 hr in a 60:40 (v/v) solution of DMF/DCM (5 equiv pS or pT/5 equiv HATU/15 equiv DIPEA). After each random position, the resin was pooled and capped with acetic anhydride (5 equiv acetic anhydride/5 equiv NMM/0.1 equiv DMAP) in DCM for 30 min. The fifth random position contained 25% pS and 25% pT, while the remaining 50% of the resin was coupled to the 19 other amino acids. For the final two random positions, 98% of the resin was coupled to the 19 non-phosphoamino acids, while 1% was coupled with pS and 1% with pT. After the fifth random position, all amino acids were coupled twice, once with DMF as solvent (2 hr, 5 equiv amino acid/5 equiv HBTU/5 equiv HOBt/10 equiv NMM for non-phosphoamino acids and 2 hr, 5 equiv pS or pT/5 equiv HATU/15 equiv DIPEA for phosphoamino acids), and a second time with a 70:30 (v/v) mixture of DMF/DCM as the solvent. As for the first five random positions, the resin was pooled and capped with acetic anhydride after the sixth and seventh random positions. After the seventh random position, an alanine was added to the N-terminus of all peptides (5 equiv Fmoc-Ala-OH/5 equiv HATU/10 equiv NMM) for 2 hr in DMF, and for a second time for 2 hr in 70:30 (v/v) DMF/DCM. After removal of the Fmoc group from alanine, the N-terminal amine was capped with Alloc-OSu (5 equiv Alloc-OSu, 5 equiv DIPEA, in 1:1 (v/v) DMF/DCM for 1 hr). The amino acid side chain protecting groups were removed by treatment of the resin with modified Reagent K (7.5% phenol, 5% water, 5% thioanisole, 2.5% ethanedithiol, and 1% anisole in TFA) for 2.5 hr. The resin was then washed with TFA (1x), DCM (3x), DMF (3x), DCM (3x), TFA (1x), DCM (3x), DMF (3x), DCM (3x), diethyl ether (1x), DCM (3x), DMF (3x), H2O (3x), DMF (3x), and DCM (3x). The library was dried under vacuum and then stored at -20 °C.
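The split ratios described above fix how much of the resin carries each residue at each random position, and therefore how over- or under-represented pS and pT are relative to the 19 other residues. The sketch below tabulates those shares; it assumes the non-phospho portion of the resin was divided equally among the 19 residues, and the names used are illustrative.

```python
# Per-position composition of the X7 pS/pT library and the relative abundance
# of phosphorylated versus non-phosphorylated residues.
# Assumption: the non-phospho share of the resin is split equally among the
# 19 residues (18 proteinogenic amino acids plus norleucine).
from fractions import Fraction

N_STANDARD = 19


def position_composition(phospho_percent_each):
    """Return (% of resin per standard residue, % of resin per pS or pT)."""
    phospho = Fraction(phospho_percent_each)
    standard = (100 - 2 * phospho) / N_STANDARD
    return standard, phospho


# X1-X4, X6, X7: 1% pS and 1% pT; X5: 25% pS and 25% pT.
std_flank, ph_flank = position_composition(1)
std_x5, ph_x5 = position_composition(25)

# How many times more abundant a standard residue is than pS (or pT)
# at the flanking positions, and pS (or pT) than a standard residue at X5.
ratio_flank = std_flank / ph_flank
ratio_x5 = ph_x5 / std_x5

if __name__ == "__main__":
    print(f"flanking positions: {float(std_flank):.2f}% per standard residue")
    print(f"X5: {float(std_x5):.2f}% per standard residue")
    print(f"standard/phospho at flanking positions: {float(ratio_flank):.2f}")
    print(f"phospho/standard at X5: {float(ratio_x5):.1f}")
```

Under this assumption the two ratios reproduce the correction factors of 5.16 and 9.5 applied in the SMALI analysis (Section 5.2.8).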

5.2.5 Library Screening

Most screens of BRCT domains were performed essentially as described previously (49), except that the washing steps of the peptide library after overnight incubation with protein involved SA-AP binding buffer (30 mM Tris-HCl, 250 mM NaCl, 10 mM MgCl2, 70 µM ZnCl2, 20 mM imidazole, 20 mM potassium phosphate, pH 7.4, 1 x 0.5 mL wash), HBST-gelatin buffer (30 mM HEPES, 150 mM NaCl, 0.05% Tween 20, 0.1% gelatin, pH 7.4, 1 x 0.5 mL wash), and SA-AP reaction buffer (30 mM Tris-HCl, 100 mM NaCl, 5 mM MgCl2, 20 µM ZnCl2, 20 mM imidazole, 0.01% Tween 20, pH 8.5, 1 x 0.5 mL wash). For GST-BARD1 BRCT2, the buffer compositions were modified. The blocking buffer was composed of 50 mM borate, 300 mM NaCl, 0.05% Tween 20, 0.1% gelatin, pH 9. The SA-AP binding buffer and the SA-AP reaction buffer were both adjusted to pH 9. All screens used 1 µM of biotin-protein for the overnight incubation of library resin and BRCT domain. Beads selected from screens were sequenced using partial Edman degradation/mass spectrometry (PED-MS) essentially as described before (192, 193), except that a 45:1 molar ratio of PITC to Fmoc-OSu was used for PED of the C-terminal pS/pT library, while a 65:1 molar ratio of PITC to Fmoc-OSu was used for the X7 pS/pT library.

5.2.6 Peptide Synthesis and Labeling

Peptides containing a free C-terminus were synthesized on Wang resin (0.8 mmol/g) while all other peptides were synthesized on CLEAR-amide resin (0.37 mmol/g). For synthesis on Wang resin, the first amino acid was coupled using diisopropylcarbodiimide (DIC) as the coupling reagent and dimethylaminopyridine as the catalyst (in 9:1 (v/v) dry DCM:DMF) for 6 h (twice). After the first position, all amino acids were coupled using standard Fmoc/HBTU/NMM chemistry except for pS or pT. Coupling of pS or pT was performed using HATU as the coupling reagent and DIPEA as base (pS/pT were coupled twice to each position, with the first coupling reaction using DMF as the solvent and the second coupling using a 70:30 (v/v) DMF:DCM mixture). All couplings after pS or pT also used HATU as the coupling reagent and NMM as base (each position was coupled twice, using DMF as the solvent for the first coupling step and 70:30 (v/v) DMF:DCM for the second coupling step). After each coupling, acetic anhydride was used to cap any remaining free amines. Peptides synthesized on CLEAR-amide resin were synthesized the same way except that no DIC coupling was necessary for the first coupling step. For labeling peptides at the N-terminus with FITC, labeling was performed on resin. Approximately 5 mg of resin was incubated with 100 µL of 20 mg/mL FITC (dissolved in DMSO) plus 3.2 µL of DIPEA for 2 h at room temperature in the dark. The resin was then transferred to a 0.8 mL Bio-Rad Micro-Spin column and washed with DCM, followed by addition of 1 mL modified reagent K. After 2 h, the crude peptide was triturated with cold diethyl ether (3x), dried under vacuum, and purified by HPLC on a C18 column. For labeling of peptides at an N-terminal lysine, peptides were cleaved from the resin and side chain deprotected with modified reagent K, followed by cold diethyl ether trituration (as before). The crude peptide was then dissolved in 20 µL DMSO containing 5 equiv FITC and 10 equiv DIPEA. The reaction ran for 30 min in the dark (room temperature) and was quenched by addition of 10 µL of 1 M Tris-HCl (pH 8.3). The peptides were then purified by HPLC as before. Peptides were labeled at the N-terminus with biotin by the addition of a 150 µL solution of DMSO containing 10 equiv biotin-NHS and 11 equiv DIPEA to 15 mg resin. The peptide resin was reacted with biotin-NHS for 1 h at 37 °C, then transferred to a 0.8 mL Bio-Rad Micro-Spin column and washed with DMF and DCM. The peptides were cleaved off the resin by modified reagent K treatment, triturated with cold diethyl ether, and further purified by HPLC (as before). The identity of each peptide was confirmed by MALDI-TOF mass spectrometry. For peptides labeled with FITC, peptide concentration was determined by absorbance at 495 nm. All biotin-peptides contained a single tyrosine residue, allowing for concentration determination by absorbance at 280 nm.
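Concentration determination by absorbance rests on the Beer-Lambert relation, A = ε·c·l. The sketch below shows the calculation in general form. The extinction coefficients used in this work are not given in the text, so the coefficient is left as an input; the value of 1490 M^-1 cm^-1 in the example is a commonly quoted figure for a single tyrosine at 280 nm and is an assumption, not a value taken from this work.

```python
# Beer-Lambert concentration estimate: A = epsilon * c * l.
# The extinction coefficient must be supplied by the user; none is reported
# in the text for the FITC- or biotin-labeled peptides.


def concentration_molar(absorbance, epsilon, path_cm=1.0, dilution=1.0):
    """Return the stock concentration in mol/L.

    absorbance: measured absorbance of the (possibly diluted) sample
    epsilon:    molar extinction coefficient in M^-1 cm^-1
    path_cm:    optical path length in cm
    dilution:   fold dilution applied before the measurement
    """
    if epsilon <= 0 or path_cm <= 0 or dilution <= 0:
        raise ValueError("epsilon, path_cm, and dilution must be positive")
    return absorbance * dilution / (epsilon * path_cm)


if __name__ == "__main__":
    # Assumed coefficient for one tyrosine at 280 nm (not from the source).
    TYR_EPSILON_280 = 1490.0
    conc = concentration_molar(0.149, TYR_EPSILON_280)
    print(f"{conc * 1e6:.0f} µM")
```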


5.2.7 Fluorescence Polarization Assays

The instrument, equation, and procedures used to determine the dissociation constants of various BRCT domains to FITC-labeled peptides were much the same as described previously (49). The buffers used to dilute the BRCT domains depended on which buffer the protein was stored in after protein purification. The following buffers were used for each BRCT domain: 30 mM HEPES, 150 mM NaCl, pH 7.4 (for GST-

TopBP1 BRCT 7-8 and ANKRD32 BRCT2), 20 mM HEPES, 200 mM NaCl, pH 7.5 (for

Nibrin FHA-BRCT2 and R28A Nibrin FHA-BRCT2), 20 mM HEPES, 300 mM NaCl, pH

8 (for XRCC1 BRCT2), 20 mM HEPES, 300 mM NaCl, 125 mM imidazole, pH 8

(for TopBP1 BRCT 1-2 and TopBP1 BRCT 4-5), and 50 mM borate, 300 mM NaCl, pH 9 (for GST-BARD1 BRCT2). For PTIP BRCT 5-6, the protein was stored in 20 mM

HEPES, 150 mM NaCl, pH 8 but had imidazole and EDTA added to produce a solution of 20 mM HEPES, 150 mM NaCl, 150 mM imidazole, 10 mM EDTA, pH 8. Serial dilutions of PTIP BRCT 5-6 were carried out in the same buffer. Peptides for PTIP

BRCT 5-6 were diluted in ddH2O with 1% BSA plus 100 mM imidazole and 10 mM

EDTA. Imidazole and EDTA were used for fluorescence polarization assays of PTIP

BRCT 5-6 to chelate any metals which might otherwise bind the dual phosphoserine or phosphothreonine motifs on the peptides and therefore potentially prevent them from binding to PTIP BRCT 5-6. See Figure 5.3 for a representative fluorescence polarization binding curve.

5.2.8 SMALI Analysis

A position-weighted matrix was constructed to sort the binding sequences of the

BRCT domains of BRCA1, MDC1, MCPH1, PTIP BRCT 5-6, ANKRD32, BARD1,

TopBP1 (repeats 1-2, 4-5, and 7-8), XRCC1, and Nibrin FHA-BRCT2 using scoring matrix-assisted ligand identification (SMALI) (224) (see Tables 5.1, 5.2, 5.3, 5.5, 5.9,

5.11, 5.14, 5.19, 5.21, 5.23, 5.26, and 5.28). The SMALI calculations were performed as previously described for sequences from the C-terminal pS/pT library (306). It was necessary to modify the SMALI analysis for sequences from the X7 library because there was not an equal abundance of amino acids at each random position. For the X5 random position, where pS or pT were 9.5 times more abundant than the other amino acids, the number of pS or pT residues found at the X5 position was divided by 9.5 to correct for their higher likelihood of being selected. For the other random positions, amino acids other than pS or pT were 5.16 times more likely to be selected, so it was necessary to multiply the number of times pS and pT were selected by 5.16 to correct for this difference. Afterwards, a SMALI score was calculated for each sequence. A higher

SMALI score indicates that a peptide sequence binds with higher affinity to the query

BRCT domain.
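The two correction factors quoted above (9.5 and 5.16) follow directly from the stated composition of the X7 library. The sketch below reproduces that arithmetic and applies it to raw residue counts; the function names are hypothetical, and this covers only the abundance correction, not the full SMALI scoring procedure.

```python
N_OTHER = 19  # 18 proteinogenic amino acids plus norleucine

def per_residue_fraction(phospho_each, n_other=N_OTHER):
    """Return (fraction of pS or pT individually, fraction of each other residue)."""
    other_total = 1.0 - 2 * phospho_each
    return phospho_each, other_total / n_other

# X5: 25% pS, 25% pT, 50% shared among the other 19 residues.
p_x5, o_x5 = per_residue_fraction(0.25)
x5_factor = p_x5 / o_x5        # over-representation of pS/pT at X5

# Remaining positions: 1% pS, 1% pT, 98% shared among the other 19 residues.
p_rest, o_rest = per_residue_fraction(0.01)
rest_factor = o_rest / p_rest  # under-representation of pS/pT elsewhere

def corrected_count(raw_count, residue, position):
    """Scale a raw pS/pT count so that it is comparable to the other residues."""
    if residue not in ("pS", "pT"):
        return float(raw_count)
    if position == 5:
        return raw_count / x5_factor
    return raw_count * rest_factor

print(round(x5_factor, 2), round(rest_factor, 2))
```

For example, 19 pS residues observed at X5 would count as 2 after correction, whereas 2 pS residues observed at any other position would count as roughly 10.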

SMALI analysis was not performed for the sequences selected by the following tandem BRCT domains: ECT2, TP53BP1, DNA Ligase IV, R28A Nibrin FHA-BRCT2, and PTIP BRCT 3-4 (screens of these domains were considered unsuccessful). Also,

SMALI analysis was not performed on the sequences selected by the BRCT domains of

MDC1, MCPH1, BARD1, TopBP1 BRCT 1-2, TopBP1 BRCT 4-5, XRCC1, and

ANKRD32 against the X7 library, as these screens were also considered failures. Also not subjected to SMALI analysis were the C-terminal sequences selected by PTIP BRCT

5-6 and WT Nibrin FHA-BRCT2 because the screens were considered less successful than the screens of these domains against the X7 library. The sequences selected from

screens of these domains (without the SMALI analysis) can be found in Tables 5.4, 5.6,

5.7, 5.8, 5.10, 5.12, 5.13, 5.15, 5.16, 5.17, 5.18, 5.20, 5.22, 5.24, 5.25, 5.27, and 5.29.

5.2.9 Cell Culture and Transfection of PTIP Plasmid

HEK293 cells were grown in two 150 x 25 mm cell culture dishes at 37°C in 5%

CO2 (in DMEM media + 10% FBS). Once cells reached 90% confluency, the media was changed to DMEM + 2% FBS. Afterwards, 87 µg of PTIP plasmid (in pCMV-Sport6 vector, Open Biosystems clone ID 4509264, residues # 313-1069) was transiently transfected into the cells using Lipofectamine 2000/Opti-MEM reduced serum media by following the manufacturer’s instructions (Invitrogen).

5.2.10 Peptide Pull-Down Assay

After 48 hours, cells were dissociated from the cultured dishes by treatment with

0.25% Trypsin-EDTA. Cells were pelleted at 100g for 5 minutes, resuspended in 1 mL

PBS buffer, transferred to a 2.0 mL microcentrifuge tube and re-pelleted at 500g for 3 minutes. Cells were lysed by resuspension in 1.5 mL of buffer containing 50 mM

HEPES, 150 mM NaCl, 10 mM EDTA, 150 mM imidazole, 10 mM NaF, 23 µM leupeptin, 10 µg/mL soybean trypsin inhibitor, 1% Triton X-100, pH 7.5. Cells were incubated in lysis buffer for 20 minutes at 4°C. Afterwards, the nuclei were removed by centrifugation at 14,000 rpm for 10 minutes. The supernatant was then diluted to a final volume of 30 mL in 50 mM HEPES, 150 mM NaCl, 10 mM EDTA, 150 mM imidazole,

10 mM NaF, 23 µM leupeptin, 10 µg/mL soybean trypsin inhibitor, pH 7.5. An equal volume (6 mL) of lysate was added to 5 different streptavidin-agarose columns. The columns were 0.8 mL Bio-Rad Micro Bio-Spin columns containing either 150 µL of blank streptavidin-agarose beads or 150 µL of streptavidin-agarose beads presenting 12

nmol biotin-peptide. After addition of the lysate to each column, the beads were washed with 1 mL of buffer containing 50 mM HEPES, 150 mM NaCl, 10 mM EDTA, 150 mM imidazole, 0.05% Triton X-100, pH 7.5. Protein was eluted from the columns by first capping the bottom of the columns with Bio-Rad end caps (which contained Parafilm plastic inside the caps to prevent leakage). To each column, 50 µL of 2X SDS-PAGE sample loading buffer (120 mM Tris (pH 6.8), 10% 2-mercaptoethanol, 4% SDS, 20% glycerol, 2% bromophenol blue) was added. The columns were partially submerged in a boiling water bath for 30 seconds to help dissociate protein from the beads. The protein was then collected by removing the end caps, inserting the columns into 1.5 mL microcentrifuge tubes, and briefly centrifuging the columns/tubes so that the protein solution would transfer from the spin columns into the microcentrifuge tubes.

To detect PTIP in the protein samples, each tube was first heated in a boiling water bath for 5 minutes, followed by separation of the sample on a 10% SDS-PAGE gel. Protein was then transferred to a nitrocellulose membrane, which was then incubated with rabbit anti-PTIP polyclonal antibody overnight at 4°C. The next day, the blot was washed and incubated with goat anti-rabbit IgG-horseradish peroxidase conjugate for 1 hour at room temperature. The blot was again washed, followed by addition of a 1:1 mixture of Luminol/enhancer and peroxide buffer solution. The membrane was exposed to Kodak Biomax film and developed using a Konica Minolta

SRX-101A film processor.


5.3 Results

5.3.1 Library Construction and Screening

It was previously reported that some tandem BRCT domains have a significantly higher binding affinity for phosphopeptides with free C-termini after the +3 position

(relative to pS or pT) compared to the same peptide sequences lacking a free C-terminus

(84, 86, 90, 91). A reduced density phosphopeptide (phosphoserine or phosphothreonine

(pS/pT)) OBOC peptide library was synthesized (Figure 5.2a) in the form of resin-

MLLBBE’AAX6X5X4X3X2X1-COOH (this peptide is present on the surface of the beads,

B = β-alanine, E’ = modified glutamic acid, X4 is either pS or pT). Each bead displays a peptide with a free C-terminus on its surface which can interact with BRCT domains, while containing the same peptide in the normal orientation (free N-terminus) in its interior as an encoding tag. The BRCT domains cannot enter the interior of the bead and therefore only interact with the inverted peptides on the bead surface. The library contained six random positions, with one position containing a 1:1 ratio of pS and pT

(50% pS and 50% pT), while the rest of the random positions contained 18 proteinogenic amino acids plus L-norleucine (Met replacement) and L-α-aminobutyrate (Cys replacement). The theoretical diversity of the pS/pT library is 6.4 x 10^6. The BRCT domains were biotinylated on a surface lysine residue and screened against the peptide library using the SA-AP/BCIP method (284). Briefly, biotin-BRCT domains bound to individual beads of the peptide library recruit SA-AP to the surface of the bead, which removes the phosphate of the BCIP molecule. Upon dephosphorylation, the molecule oxidizes into a hydrophobic dimer, which is attracted to the hydrophobic core of the


Tentagel resin. The result is the formation of turquoise colored beads which represent positive hits from the screen. Positive beads are then picked and sequenced by PED-MS.
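The theoretical diversity of the C-terminal pS/pT library can be checked from its design: five fully random positions with 20 building blocks each, and one position restricted to pS or pT. The variable names in this short sketch are illustrative only.

```python
# 18 proteinogenic amino acids + norleucine + alpha-aminobutyrate
BUILDING_BLOCKS_PER_RANDOM_POSITION = 20
RANDOM_POSITIONS = 5          # six library positions minus the pS/pT position
PHOSPHO_CHOICES = 2           # pS or pT at X4

c_terminal_diversity = (
    BUILDING_BLOCKS_PER_RANDOM_POSITION ** RANDOM_POSITIONS * PHOSPHO_CHOICES
)
print(f"{c_terminal_diversity:.1e}")
```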

Although some BRCT domains require a free C-terminus for optimal binding

(such as MDC1 BRCT2 (85), BRCA1 BRCT2 (91), MCPH1 BRCT2 (86), and PTIP

BRCT 5-6 (90)), other tandem BRCT domains are also capable of interacting with internal phosphopeptide motifs in proteins (82, 111, 112). Furthermore, it was reported that tandem BRCT domains of PTIP (BRCT 3-4/5-6), Nibrin, TopBP1 (BRCT 1-2), and

ANKRD32 may interact with dual pS/pT motifs on their protein interaction partners (76,

307-310). In order to better probe the binding specificity of tandem BRCT domains that may bind internal or dual phosphopeptide motifs, a reduced density X7 library was constructed (Figure 5.2b) in the form of resin-MRRBBNX1X2X3X4X5X6X7A-NH-Alloc

(this peptide was present on both the surface and interior of each bead, Alloc = allyloxycarbonyl) which contained 18 proteinogenic amino acids, L-norleucine (Met replacement), pS, and pT in the random positions (cysteine was excluded from the library). The library was designed in such a manner that the majority of the peptide sequences contained at least one pS or pT residue, while a small fraction of the sequences contained two or more pS/pT residues. One position (X5) contained 25% pS, 25% pT, and 50% of the other amino acids, while the remaining random positions contained 1% pS, 1% pT, and 98% of the remaining amino acids. The theoretical diversity of the

library is 1.8 x 10^9. The X7 library was screened against biotin-BRCT domains in the same manner as the C-terminal pS/pT library.
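The design of the X7 library can be checked numerically. The sketch below computes the theoretical diversity (21 building blocks at each of seven positions) and, from the stated per-position pS/pT percentages, the expected fraction of sequences carrying zero, exactly one, or two or more phosphorylated residues. It assumes that each position is filled independently; the variable names are illustrative.

```python
BUILDING_BLOCKS = 21   # 18 proteinogenic amino acids + norleucine + pS + pT
POSITIONS = 7
x7_diversity = BUILDING_BLOCKS ** POSITIONS

P_PHOSPHO_X5 = 0.50    # 25% pS + 25% pT at X5
P_PHOSPHO_REST = 0.02  # 1% pS + 1% pT at each of the other six positions
N_REST = POSITIONS - 1

# Probability that none of the six low-density positions is phosphorylated.
p_rest_none = (1 - P_PHOSPHO_REST) ** N_REST
# Probability that exactly one of them is phosphorylated (binomial term).
p_rest_one = N_REST * P_PHOSPHO_REST * (1 - P_PHOSPHO_REST) ** (N_REST - 1)

p_zero = (1 - P_PHOSPHO_X5) * p_rest_none
p_exactly_one = P_PHOSPHO_X5 * p_rest_none + (1 - P_PHOSPHO_X5) * p_rest_one
p_at_least_one = 1 - p_zero
p_two_or_more = 1 - p_zero - p_exactly_one

print(f"diversity = {x7_diversity:.1e}")
print(f"P(>=1 pS/pT) = {p_at_least_one:.3f}, P(>=2 pS/pT) = {p_two_or_more:.3f}")
```

Under this independence assumption, a little over half of the sequences contain at least one pS/pT and about 6% contain two or more, in line with the design goal described above.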


5.3.2 Sequence Specificity of the Tandem BRCT Domains of BRCA1, MDC1, and MCPH1

The binding specificities of BRCA1, MDC1, and MCPH1 (along with PTIP BRCT

5-6 and Nibrin) are summarized in Table 5.30. The binding specificity of BRCA1

BRCT2 has been previously examined using both oriented phosphopeptide libraries and amino acid scans on phosphopeptide arrays (83, 92). Neither library presented a free C-terminus after the +3 position relative to pS or pT. The oriented peptide library screens produced a binding consensus of pS/pT-Q-(V/T/I)-(F/Y)-X-F (79) and pS-(F/Y)-

(V/F/I)-(F/Y) (92), while the phosphopeptide amino acid scans showed a consensus motif of pS-X-X-(F/Y) (83) and pS-X-(T/V)-F-X-K (92), where X represents any amino acid and bold font indicates a position with fixed residues. All library screens showed that

BRCA1 BRCT2 has a clear preference for phenylalanine over tyrosine at the +3 position.

Neither study reported any significant binding selection for residues N-terminal to phosphoserine. BRCA1 BRCT2 screens against the X7 library revealed a binding selectivity similar to those reported by other groups (Figure 5.4), with a consensus of pS-

(H/Y)-(V/T)-(F/Y). Screening against the C-terminal pS/pT library also showed a similar selection of residues, with a consensus of pS-(R/K)-(R/K)-(F/Y) (Figure 5.5). The selection for R/K at the +1 and +2 positions, rather than for H/Y and V/T, seemed to be the only significant difference between the C-terminal library consensus sequence and the

X7 library consensus sequence. The results suggest that the presence of a free C-terminus after the +3 position is not required for binding of BRCA1 BRCT2 to phosphopeptides.

This is consistent with a previous study on the requirement for free C-termini after the +3 position for peptides that bind BRCT domains (91). Both library results showed that


BRCA1 BRCT2 has an overwhelming preference for pS over pT, along with a clear selection for F over Y at the +3 position.

Screening of MDC1 BRCT2 against the C-terminal pS/pT library showed a consensus of V-H-pS-(R/K)-W-(Y/F/W) (Figure 5.6). Again, a large selection of pS over pT was observed, along with a preference for tyrosine over phenylalanine and tryptophan at the +3 position. These results are similar to the results of a previous study of the binding specificity of MDC1 BRCT2 using a C-terminal phosphoserine oriented peptide library, which reported a consensus of pS-(I/P/V)-(E/I/V)-Y (93). A study of the binding specificity of MDC1 BRCT2 against an oriented phosphopeptide library without a free C-terminus showed a consensus of pS-I-(E/D/V)-(Y/F) (92). In contrast, MDC1 BRCT2 screened against the X7 library showed no obvious binding specificity (Figure 5.7). This is consistent with previous studies showing a strong preference of MDC1 BRCT2 for a free C-terminus after the +3 position relative to pS (91, 93).

MCPH1 BRCT2 screens against the C-terminal pS/pT library showed a consensus of R-pS-X-W-(Y/W/F) (Figure 5.8), where pS was heavily preferred over pT and tyrosine was the preferred aromatic residue at the +3 position. The sequence selectivity is similar to that reported previously (94), where MCPH1 BRCT2 was screened against a C-terminal phosphoserine oriented peptide library and showed a consensus of pS-X-X-Y.

There seemed to be a significant selection of tryptophan at both the +2 and +3 positions, which was not observed in the previously reported study. MCPH1 BRCT2 was also screened against the X7 library, but no clear binding specificity was observed (Figure

5.9). As with MDC1 BRCT2, it was previously reported that the BRCT domains of MCPH1 have a strong preference for a free C-terminus after the +3 position (86, 94).

5.3.3 Sequence Specificity of the Tandem BRCT Domains of PTIP

PTIP contains three tandem BRCT domains (repeats 1-2, 3-4, and 5-6) (314).

Screens of PTIP BRCT 1-2 against either the C-terminal or X7 pS/pT libraries either failed to result in positive hits (no blue beads) or led to poor contrast between positive hits and the rest of the beads in the library (too many blue beads). PTIP BRCT 3-4 screened against the C-terminal pS/pT library showed a selection of pS over pT, along with H/R residues at the other random positions (Figure 5.10). There was not a selection for aliphatic or aromatic residues C-terminal to pS/pT, as seen with the BRCT domains of

BRCA1, MDC1, and MCPH1. The abundance of basic residues in the selected sequences may have been caused by nonspecific charge-charge interactions between

PTIP BRCT 3-4 and the library beads, a phenomenon seen before by our group for screens of other types of domains (186, 187). When PTIP BRCT 3-4 was screened against the X7 library, no significant contrast was seen between positive hits and background beads.

In contrast to the first two tandem BRCT domains of PTIP, hits from screens of

PTIP BRCT 5-6 against both the C-terminal and X7 libraries were more similar to the sequences selected from screens of BRCA1, MDC1, and MCPH1. Sequences selected from the C-terminal library showed a consensus of R-pS-(R/K)-(R/K)-(F/W) (Figure

5.11). Sequences selected from the X7 library which contained only a single pS or pT followed a similar pattern, with a consensus of (R/Y)-R-pS-X-V-(F/M)-H (Figure 5.12).

These patterns are similar to those reported in a previous study of the binding specificity of PTIP BRCT 5-6, which used an oriented phosphopeptide library and a phosphopeptide amino acid scan (79). The reported consensus was pS/pT-Q-V-(F/L/I) (for the oriented

phosphopeptide library, fixed residues are in bold font) and pS-X-X-(F/I/L/V/Y) (for the phosphopeptide amino acid scan). It should be noted that unlike the BRCT repeats of

BRCA1, MDC1, and MCPH1, PTIP BRCT 5-6 selected for a broader range of hydrophobic amino acids at the +3 position. Both C-terminal and X7 library results show that aliphatic residues L/I/V were selected almost as frequently as aromatic residues. The similar consensus sequences selected from the C-terminal and X7 libraries suggest that, like BRCA1 BRCT2, PTIP BRCT 5-6 does not require a free C-terminus for respectable binding to phosphopeptides. This is consistent with a previously reported peptide binding study of PTIP BRCT 5-6 with an internal phosphopeptide motif, where the

BRCT domains of PTIP bound to the peptide with a binding constant (KD) of 280 nM

(83).

A second class of sequences that emerged from screens of PTIP BRCT 5-6 against the X7 library were of the consensus pS-X-pS-(F/L)-X-X-H (Figure 5.13).

Although it has been reported that the BRCT domains of PTIP can interact with a dual pS motif in TP53BP1 (at phospho-Ser25 and phospho-Ser29) (307), the authors of the study found that only phosphorylation of serine-25 was required for the interaction. The study also found that both C-terminal tandem BRCT domains of PTIP (repeats 3-4 and 5-6) were required for this interaction. The dual pS peptide tested for binding in the study was

DTPCLIIEDpSQPEpSQVLEDD (where the 1st pS represents phospho-Ser25 and the 2nd pS represents phospho-Ser29 of TP53BP1). The peptide used by the authors does not conform to the consensus motif of the dual pS peptide selected from the X7 library screens of PTIP BRCT 5-6. Peptides of the consensus pS-X-pS-(F/L)-X-X-H therefore represent a novel binding motif for the 5th and 6th BRCT domains of PTIP.

5.3.4 Sequence Specificity of Nibrin FHA-BRCT2

Nibrin contains an N-terminal FHA domain, followed by tandem BRCT domains

(315). Based on the crystal structure of the N-terminal region of Nibrin from S. pombe, along with sequence analysis of Nibrin across several species of eukaryotes, it is believed that the FHA-BRCT1-BRCT2 domains form a single structural and functional unit (76,

316). Screens of Nibrin FHA-BRCT2 against the X7 library revealed a binding consensus of (D/E)-(D/E)-pT-X-I (Figure 5.14), which is similar to the consensus from the C-terminal library screens of Nibrin, (D/E)-I-pT-X-(M/I)-(H/R) (Figure 5.15). The success of the screens against the X7 library suggests that the FHA-BRCT2 domains do not require a free

C-terminus for ligand recognition. Both screens showed an overwhelming selection of pT over pS, which suggested that the FHA domain may be primarily responsible for the binding specificity. To determine whether the BRCT repeat of Nibrin could select its own class of sequences, the R28A FHA mutant version of Nibrin FHA-BRCT2 was also screened against both the X7 and C-terminal libraries. The X7 library results showed that the R28A Nibrin mutant mainly associated with basic sequences and did not select phosphoserine-containing peptides (Figure 5.16). The lack of selection of phosphoserine-containing sequences, along with the selection of sequences rich in arginine and lysine, indicates that the BRCT domains of Nibrin do not bind phosphopeptides. Arginine and lysine were most likely selected from the screens of the BRCT domains because of nonspecific charge-charge interactions between the protein and positively-charged library beads, in a similar manner to PTIP BRCT 3-4. Screens of the C-terminal library with the mutant showed a similar pattern with the selection of basic residues, although there was apparently some selection for pS over pT (Figure 5.17). The positively charged residues

flanking pS/pT in selected sequences indicate that binding to the C-terminal beads was mainly non-specific in nature. Overall, the screening results suggest that the FHA domain of Nibrin is either wholly or mostly responsible for its binding specificity to the

(D/E)-(D/E)-pT-X-I consensus sequence and the BRCT domains do not strongly associate with their own type of peptide sequence.

It was previously reported that the Nibrin FHA-BRCT2 domains were capable of interacting with diphosphorylated pSDpTD motifs in proteins such as MDC1 (76, 115,

317). Furthermore, it was shown that both the FHA and BRCT domains of Nibrin could interact separately with diphosphorylated pSDpTD or pSDpSD motifs using in vitro phosphopeptide binding studies (76). The consensus binding motif of (D/E)-(D/E)-pT-X-

I selected from screens of the X7 library with wild-type Nibrin FHA-BRCT2 is similar to the pSDpTD repeats found in MDC1. Figure 5.14 shows that pS is selected with a moderate frequency at the -2 position (relative to pT). The fact that the pS bar is lower than the D/E bars at the -2 position does not mean D/E residues are preferred over pS, due to the fact that pS was approximately five-fold less abundant at that position in the library. The screening data therefore is consistent with the FHA-BRCT2 domains of

Nibrin interacting with the pSDpTD motifs of MDC1. The fact that the R28A FHA

Nibrin mutant screens failed to show any obvious consensus sequence is contrary to the model of the Nibrin BRCT repeat separately interacting with dual pS/pT motifs, although it is possible that no consensus sequence was found from screens because its binding affinity for dual pS/pT peptide sequences is too weak to detect using our screening methodology. The selection of isoleucine at the +2 position (or norleucine and isoleucine for the C-terminal library) indicates that the FHA domain of Nibrin interacts with

proteins other than MDC1, as MDC1 does not contain any pSDpTD motifs with isoleucine or methionine at the +2 position relative to pT.

5.3.5 Sequence Specificity of the BRCT Domains of ANKRD32, BARD1, TopBP1 BRCT 1-2, TopBP1 BRCT 4-5, TopBP1 BRCT 7-8, and XRCC1

Screens of the remaining tandem BRCT domains seemed to show less binding selectivity than those of BRCA1, MDC1, MCPH1, PTIP BRCT 5-6, and WT Nibrin

FHA-BRCT2. Screens of the BRCT repeats of ANKRD32, BARD1, TopBP1 (repeats 1-

2 and 4-5), and XRCC1 against the X7 library showed either no obvious binding selectivity or the selection of positively charged amino acids at all the random positions

(see Figures 5.18-5.22). TopBP1 BRCT 7-8 screened against the X7 library failed to generate blue beads distinguishable from the rest of the library beads. Results of the screens of the C-terminal library followed the same trend, except that all domains selected more pS than pT at the X4 position (Figures 5.23-5.28). Careful analysis of the selected C-terminal sequences showed that the BRCT repeats selected for three general groups of sequences (summarized in Table 5.31). The BRCT tandem domains selected for group I sequences (X-X-(pS/pT)-X-X-ψ-COOH for ANKRD32, BARD1, TopBP1 repeat 4-5, and XRCC1, X-X-(pS/pT)-X-(H/R/K)-ψ-COOH for TopBP1 repeat 1-2, and

X-X-(pS)-X-ψ-ψ-COOH for TopBP1 repeat 7-8, where X represents any amino acid and

ψ represents hydrophobic/aromatic amino acids). Group II sequences showed a consensus of X-X-(pS/pT)-ψ-ψ-X-COOH (for ANKRD32 and XRCC1), X-X-(pS/pT)-

X-ψ-H-COOH (for BARD1), and X-X-(pS/pT)-X-ψ-(H/R/K)-COOH for all BRCT repeats of TopBP1. All remaining sequences were collectively classified as group III


(see SMALI tables 5.14, 5.19, 5.21, 5.23, 5.26, and 5.28 for complete lists of sequences in each group).

Certain C-terminal sequences selected from screens of the BRCT domains of

ANKRD32, BARD1, TopBP1 BRCT 1-2, TopBP1 BRCT 4-5, TopBP1 BRCT 7-8, and

XRCC1 which were representative of peptides from groups I, II, and III were resynthesized and tested for binding to each domain via fluorescence polarization assays

(see Table 5.32). Although the binding affinity between the selected peptides and the

BRCT domains was too weak to determine a dissociation constant (KD) for most of the domains, it was found that ANKRD32 BRCT2 bound to a group I peptide (FITC-

BBDNpSYHY-COOH, B = β-alanine, FITC = fluorescein isothiocyanate) with a KD =

14.2 ± 1.4 μM. TopBP1 BRCT 7-8 bound with weaker affinity to a group II peptide

(FITC-BBFRpSLMH-COOH (M = L-norleucine), KD = 46 ± 17 μM). The BRCT repeats of TopBP1 (repeat 4-5) and XRCC1 did not bind with any significant affinity towards either type of peptide. None of the BRCT repeats bound with any significant affinity towards the group III peptide (FITC-BBRRpSKQR-COOH), suggesting that peptides within group III may largely be false positives, presumably selected as a result of non-specific charge-charge interactions. Overall, the screening and fluorescence polarization assay data suggest that the BRCT repeats of ANKRD32, BARD1, TopBP1 BRCT 1-2, and TopBP1 BRCT 7-8 bind selectively (albeit with somewhat weak affinity) to sequences falling within groups I and II. It is not clear whether TopBP1 BRCT 4-5 and

XRCC1 BRCT2 actually prefer peptides in groups I or II because they selected those types of sequences from library screens but failed to bind with any affinity to them in fluorescence polarization binding assays. It should be noted that TopBP1 was reported to

contain an additional BRCT domain at its N-terminus and that it exists as a triple-BRCT domain structure (75). This BRCT domain was included in the TopBP1 BRCT 1-2 clone.

5.3.6 Sequence Specificity of the BRCT Domains of ECT2, DNA Ligase IV, and TP53BP1

The screens of the BRCT tandem domains of DNA Ligase IV and TP53BP1 against the X7 library showed very little binding specificity (other than for pS by DNA

Ligase IV at the X5 position, see Figures 5.29 and 5.30). ECT2 BRCT2 screened against the X7 library failed to produce beads that were distinctly blue compared to the other beads of the library. When screened against the C-terminal library, the BRCT domains of

ECT2, DNA Ligase IV, and TP53BP1 selected for sequences rich in basic amino acids, along with pS over pT at the X4 position (Figures 5.31-5.33). As with screens of PTIP

BRCT 3-4 and R28A Nibrin FHA-BRCT2, the positively charged sequences were most likely selected due to nonspecific charge-charge interactions between the BRCT domains and the library beads. Unlike the BRCT repeats of ANKRD32, BARD1, TopBP1

(repeats 1-2, 4-5, and 7-8), and XRCC1, the BRCT tandems of ECT2, DNA Ligase IV, and TP53BP1 did not select for sequences that fell into general consensus groups.

5.3.7 Fluorescence Polarization Binding Assay for PTIP BRCT 5-6 and Nibrin FHA-BRCT2

A dual phosphoserine peptide selected from an X7 library screen of PTIP BRCT

5-6 was synthesized and tested for binding by fluorescence polarization assay. The peptide, Ac-ApSEpSFHAFNK-FITC (Ac = acetyl group), bound with a KD of 1.7 ± 0.2

µM (Table 5.33, peptide 1). Each pS of the peptide was replaced with a glutamic acid residue and also tested for binding to PTIP BRCT 5-6. Ac-ApSEEFHAFNK-FITC

(peptide 2) bound with a KD of 13.0 ± 2.0 µM, which is nearly an 8-fold decrease in

binding affinity compared to peptide 1. The binding affinity of Ac-AEEpSFHAFNK-

FITC (peptide 3) towards PTIP BRCT 5-6 was too weak to determine. The binding data suggest that PTIP BRCT 5-6 can simultaneously and specifically interact with two pS residues. Furthermore, the first pS residue in the dual pS sequence is more important for binding than the second pS residue. PTIP BRCT 5-6 was previously reported to recognize the C-terminus of γH2AX and bind with high affinity to a peptide derived from this protein (KKATQApSQEY-COOH) (90). It was found that amidation of the free C-terminus of the peptide reduced binding to PTIP BRCT 5-6 by approximately 14-fold. A truncated version of peptide 1 (FITC-BBApSEpSF-COOH, peptide 4), which presented a

free C-terminus after the +3 position (relative to the 1st pS), bound with a KD of 4.3 ± 0.5

µM. The difference between the KD of peptides 1 and 4 was less than 3-fold, which suggests that a free C-terminus is no longer required for high-affinity binding by PTIP

BRCT 5-6 when a pS is present at the pS +2 position.
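The fold differences quoted above can be reproduced from the reported KD values. This is a small arithmetic sketch; the dictionary and function names are illustrative, and only the mean KD values from the text are used (the reported uncertainties are ignored).

```python
# Mean KD values (in micromolar) for PTIP BRCT 5-6, as reported in the text.
KD_UM = {
    "peptide 1": 1.7,   # Ac-ApSEpSFHAFNK-FITC
    "peptide 2": 13.0,  # Ac-ApSEEFHAFNK-FITC
    "peptide 4": 4.3,   # FITC-BBApSEpSF-COOH
}

def fold_change(kd_test, kd_reference):
    """Fold loss in affinity of the test peptide relative to the reference."""
    return kd_test / kd_reference

fold_2_vs_1 = fold_change(KD_UM["peptide 2"], KD_UM["peptide 1"])
fold_4_vs_1 = fold_change(KD_UM["peptide 4"], KD_UM["peptide 1"])
print(f"peptide 2 vs 1: {fold_2_vs_1:.1f}-fold; peptide 4 vs 1: {fold_4_vs_1:.1f}-fold")
```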

A literature search of known protein binding partners of PTIP which also may contain dual pS motifs similar to the consensus pS-X-pS-(F/L)-X-X-H found from library screens of PTIP BRCT 5-6 was conducted. It was reported that TP53BP1 bound to PTIP

BRCT 5-6 in a phosphorylation-dependent manner (83). TP53BP1 contains 8 sequences of the consensus (S/T)-X-(S/T)-(F/I/L/M). Eight peptides, with the S/T residues replaced with pS/pT residues, were tested for binding to PTIP BRCT 5-6. Two of the eight bound with moderate affinity (peptides 6 and 7 in Table 5.33, corresponding to TP53BP1 residues 479-488 (for peptide 6) and residues 954-963 (for peptide 7)). The rest of the peptides bound with affinities too weak to obtain a KD value. Although peptides 6 and 7 bound with weaker affinity than the hit from the X7 library (peptide 1), they bound with

similar affinities to peptide 5, a known physiologically relevant ligand derived from the

C-terminus of γH2AX. Overall, the fluorescence polarization data for PTIP BRCT 5-6 demonstrate that the BRCT repeat does not necessarily require a free C-terminus after the

+3 position to bind with high affinity to peptide ligands as long as they contain a motif identical or similar to the dual pS binding consensus pS-X-pS-(F/L)-X-X-H. The data also shows that PTIP BRCT 5-6 may interact with TP53BP1 at residues 479-488 or 954-

963 (or on some other protein) via a dual pS motif.
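A search for candidate dual-phosphorylation sites of the form (S/T)-X-(S/T)-(F/I/L/M) can be expressed as a regular expression. The sketch below is one possible way to run such a scan and is not necessarily how the search in this work was carried out; the function name is hypothetical. It is demonstrated on the unphosphorylated sequence of TP53BP1 residues 479-488 rather than on the full-length protein.

```python
import re

# (S/T)-X-(S/T)-(F/I/L/M); the lookahead allows overlapping matches to be reported.
DUAL_SITE = re.compile(r"(?=([ST].[ST][FILM]))")

def find_dual_sites(sequence, first_residue_number=1):
    """Return (residue number of the first S/T, matched 4-mer) for each hit."""
    return [
        (match.start() + first_residue_number, match.group(1))
        for match in DUAL_SITE.finditer(sequence)
    ]

# Unphosphorylated form of the TP53BP1 479-488 peptide.
hits = find_dual_sites("DMHSSSLTVE", first_residue_number=479)
print(hits)  # [(482, 'SSSL')]
```

The single hit starts at Ser482, with the second serine of the motif at position 484.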

A peptide sequence selected from an X7 pS/pT library screen of Nibrin FHA-

BRCT2 was synthesized and tested for binding (peptide 8, Table 5.33). Replacement of the aspartic acid residue at the -2 position (relative to pT) with a pS produced an almost identical binding affinity (peptide 9). To determine whether the BRCT tandem of Nibrin contributes to binding the peptides, the fluorescence polarization assay was repeated with the R28A FHA Nibrin mutant. The binding affinity of the mutant towards peptides 8 and

9 was too weak to determine. The fact that there was little difference between the binding affinities of peptides 8 and 9 towards wild-type Nibrin indicates that its BRCT repeat was not responsible for the selection of pS at the -2 position from the X7 library screens. If the BRCT repeat was indeed responsible for the selection of pS, it would be expected to effectively distinguish between aspartic acid and phosphoserine in the peptide binding experiment. The lack of any significant binding affinity of the R28A

FHA mutant towards peptides 8 and 9 suggests that the FHA domain, and not the BRCT repeat, is either wholly or mainly responsible for the affinity of Nibrin towards the selected peptides. This finding is in agreement with the library screening data, where the

R28A FHA mutant failed to select for any obvious consensus sequence.

5.3.8 In Vitro Pull-Down Assay

In order to determine whether the dual phosphoserine motif found from peptide library screens of PTIP BRCT 5-6 can precipitate PTIP from cell lysate, a peptide pull-down assay of PTIP was performed with two peptides containing the consensus pS-X-pS-

(F/L) binding motif, along with two control peptides derived from the C-terminus of

γH2AX. Cell lysate from HEK293 cells transiently transfected with PTIP (residues 313-

1069, which encompasses the 3-4 and 5-6 BRCT repeats of PTIP, along with a -rich region (314)) was passed through streptavidin-agarose columns containing various immobilized biotin-peptides. Bound protein was eluted from columns and detected by Western blot (Figure 5.34). The two negative controls, which included resin without peptide (lane 1) and resin with bound biotin-(minipeg)2-BBQApSQEY-NH2 (lane

2) failed to bind PTIP. This result is consistent with PTIP BRCT 5-6 having a low binding affinity towards the γH2AX peptide with an amidated C-terminus (90). As a positive control, PTIP was precipitated by biotin-(minipeg)2-BBQApSQEY-COOH (lane

3). This is consistent with a previous study, which demonstrated that PTIP BRCT 5-6 recognizes the γH2AX C-terminus (90). Biotin-(minipeg)2-YBBApSEpSFHAFNK-NH2

(from an X7 library hit from screens of PTIP BRCT 5-6, lane 4) and biotin-(minipeg)2-

YBBDMHpSSpSLTVE-NH2 (from residues 479-488 of TP53BP1, lane 5) both bound

PTIP. Consistent with the fluorescence polarization data, biotin-(minipeg)2-

YBBApSEpSFHAFNK-NH2 was significantly better at precipitating PTIP than either biotin-(minipeg)2-BBQApSQEY-COOH or biotin-(minipeg)2-YBBDMHpSSpSLTVE-NH2. Overall, the data show that a near full-length version of PTIP can interact with peptides containing the dual phosphoserine motif found from peptide library

screens of PTIP BRCT 5-6 with equal or greater affinity than a previously established binding peptide containing only a single phosphoserine. The data also shows that PTIP may interact with TP53BP1 at residues 479-488 if both serines 482 and 484 are phosphorylated.
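
As an illustration of how the TP53BP1 site relates to the pS-X-pS-(F/L) consensus, the following minimal Python sketch scans an unphosphorylated sequence for S-X-S-(F/L) candidates. The fragment is the TP53BP1 portion of the pull-down peptide (residues 479-488). The function name find_dual_sites and the regular expression are hypothetical helpers introduced here and were not part of the screening workflow. A sequence match only flags a candidate site and says nothing about whether the serines are phosphorylated in vivo.

```python
import re

# Unphosphorylated one-letter sequence of TP53BP1 residues 479-488,
# taken from the pull-down peptide DMHpSSpSLTVE.
FRAGMENT = "DMHSSSLTVE"
FRAGMENT_START = 479  # residue number of the first letter

# Candidate dual-site pattern S-X-S-(F/L). The lookahead lets the scan
# report overlapping candidates as well.
DUAL_SITE = re.compile(r"(?=(S.S[FL]))")


def find_dual_sites(sequence, start_residue=1):
    """Return (first_Ser, second_Ser, matched_text) for each S-X-S-(F/L) site."""
    hits = []
    for match in DUAL_SITE.finditer(sequence):
        first = start_residue + match.start()
        hits.append((first, first + 2, match.group(1)))
    return hits


sites = find_dual_sites(FRAGMENT, FRAGMENT_START)
print(sites)  # [(482, 484, 'SSSL')]
```

The single hit corresponds to serines 482 and 484, the two positions phosphorylated in the pull-down peptide.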

5.4 Discussion

We have probed the binding specificity of all known tandem BRCT domains found in the human proteome (although certain BRCT repeats were of mouse origin, their sequences were highly similar to their human orthologs). BRCT repeats considered to have truncated or degenerated motifs were not included (74). Overall, 4 of the 16 BRCT repeats gave well-defined consensus binding sequences (those of BRCA1, MDC1, MCPH1, and PTIP (BRCT 5-6)). The general consensus for the BRCT tandem domains of BRCA1, MDC1, and MCPH1 was pS-X-X-(F/Y/W), where selectivity for residues N-terminal to pS and at the +1/+2 positions (and the +4 position for the X7 library) was generally lower than the selectivity for pS over pT and for aromatic residues at the +3 position. The domains showed subtle differences in their selection of residues at positions other than the +3 position, along with differences in their selection for aromatic residues at the +3 position and their requirement for a free C-terminus after the +3 position. PTIP BRCT 5-6 selected for sequences with either one or two phosphoserines. The sequences selected from screens of either the X7 or C-terminal library containing only one pS residue were fairly similar to those selected from the screens of the BRCT repeats of BRCA1, MDC1, and MCPH1, although PTIP BRCT 5-6 selected for both aliphatic and aromatic residues at the +3 position, rather than primarily aromatic residues. PTIP BRCT 5-6 also displayed a unique dual phosphoserine binding mode, of the consensus pS-X-pS-(F/L)-X-X-H. This represents a novel type of binding motif for the C-terminal BRCT repeat of PTIP. Other tandem BRCT domains have recently been reported to interact with motifs presenting dual phosphoserines, or phosphoserine along with a phosphothreonine or phosphotyrosine residue. These include the BRCT repeats of MCPH1 (94), TopBP1 (BRCT 1-2) (309, 310), ANKRD32 (308), and the single BRCT domain of TopBP1 (BRCT 5) (318). The interaction of BRCT domains with dual phosphoamino acid-containing peptides is therefore emerging as a novel mode of ligand recognition by these domains. PTIP BRCT 5-6, like BRCA1 BRCT2, did not require a free C-terminus after the +3 position to show an obvious binding consensus motif. The binding specificity profiles of the BRCT repeats of BRCA1, MDC1, MCPH1, and PTIP (BRCT 5-6) are largely consistent with profiles previously determined by other groups (except for the dual pS binding motif discovered for PTIP BRCT 5-6), as are their differing requirements for free C-termini after the +3 position.
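
The consensus motifs discussed above can be expressed as simple position checks on a tokenized peptide. The sketch below is illustrative only: it assumes the table notation used in this chapter (pS/pT/pY written as two characters, X for an undetermined residue), and the helper names tokenize, matches_single, and matches_dual are hypothetical. It does not reproduce the SMALI scoring or the manual sorting used in this study.

```python
import re

# Split table-style notation such as "ELpSKKF" into residues, keeping the
# phosphoresidues pS/pT/pY as single tokens.
TOKEN = re.compile(r"p[STY]|[A-Z]")


def tokenize(peptide):
    """Return the peptide as a list of residue tokens."""
    return TOKEN.findall(peptide)


def matches_single(peptide, plus3=frozenset("FYW")):
    """True if some pS has one of `plus3` at the +3 position: pS-X-X-(F/Y/W)."""
    t = tokenize(peptide)
    return any(
        res == "pS" and i + 3 < len(t) and t[i + 3] in plus3
        for i, res in enumerate(t)
    )


def matches_dual(peptide):
    """True if the peptide contains pS-X-pS-(F/L)."""
    t = tokenize(peptide)
    return any(
        t[i] == "pS" and t[i + 2] == "pS" and t[i + 3] in ("F", "L")
        for i in range(len(t) - 3)
    )


# Sequences taken from Tables 5.1 and 5.9.
print(matches_single("ELpSKKF"))   # True: F at +3
print(matches_single("IGpSYRL"))   # False: L at +3 is not aromatic
print(matches_dual("pSEpSFHAF"))   # True: pS-E-pS-F
print(matches_dual("NNpSYVFH"))    # False: only one pS
```

Note that pSEpSFHAF also satisfies the single-pS check, because F sits at the +3 position relative to the first pS.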

Screens of the Nibrin FHA-BRCT1-BRCT2 domains showed a good consensus as well, even though the FHA domain was most likely responsible for the observed screening results. Although in this study the FHA domain of Nibrin seemed to be responsible for the selection of phosphopeptide sequences from screens, it is possible that the BRCT repeat can separately bind its own class of phosphopeptides. If the BRCT repeat binds weakly to its cognate ligands, it could still effectively interact with them, so long as the FHA domain (or some other section of Nibrin) is already bound to a protein containing the BRCT recognition motif. The selection of acidic residues at the -1 and -2 positions (relative to pT), the overwhelming selection of pT, and the lack of selection of aspartic acid at the +1 position in X7 library screens of the FHA-BRCT1-BRCT2 domains (Figure 5.14) suggest that Nibrin recognizes the MDC1 pSDpTD repeats primarily at the pS-D-pT residues (and not at the D residue immediately C-terminal to pT). Future studies may find other proteins with this motif recognized by the FHA domain of Nibrin. Binding partners may contain an isoleucine (or other aliphatic residue) at the +2 position, as this was selected by Nibrin in both the X7 and C-terminal library screens.

Screens of the BRCT repeats of ANKRD32, BARD1, TopBP1 (repeats 1-2, 4-5, and 7-8), and XRCC1 did not produce binding consensus motifs as well defined as those of BRCA1, MDC1, MCPH1, PTIP BRCT 5-6, and WT Nibrin FHA-BRCT1-BRCT2. The plots of the sequences of these domains include those rich in basic residues (arginine/lysine/histidine), which have previously been shown by our group to be associated with the nonspecific association of proteins with peptide library beads (186). Due to the presence of likely false positives within the data, it was necessary to sort the binding sequences into various groups based upon observed patterns. The presence of certain types of amino acids (such as hydrophobic or aromatic residues) at specific positions (relative to phosphoserine or phosphothreonine) can be used to sort the binding sequences into various classes, followed by resynthesis of representative peptides from each class and determination of the dissociation constant (KD) between each peptide and the domain being studied. This allows for the distinction between the preferred ligands of a domain and false positives, a strategy employed by our lab in previous studies (49, 187, 306).
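
To illustrate why a KD measurement can separate preferred ligands from false positives, the sketch below computes the fraction of labeled peptide bound across a protein titration. It assumes a simple 1:1 binding model with the peptide at a concentration far below KD; this is a textbook simplification and not necessarily the fitting equation used in this work. The concentrations and KD values are hypothetical.

```python
def fraction_bound(protein_conc, kd):
    """Fraction of labeled peptide bound for 1:1 binding.

    Assumes the peptide concentration is far below KD, so that free
    protein is approximately equal to total protein.
    """
    return protein_conc / (kd + protein_conc)


# Hypothetical titration points and KD values, all in micromolar.
protein_um = [0.1, 1.0, 10.0, 100.0]
for kd_um in (1.0, 1000.0):
    curve = [round(fraction_bound(p, kd_um), 3) for p in protein_um]
    print(f"KD = {kd_um} uM:", curve)
```

Under these assumptions, a peptide with a KD near 1 uM approaches saturation within the titration range, whereas one with a KD near 1 mM stays below 10% bound at the highest protein concentration, so its curve cannot be fit reliably.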

Upon sorting the binding sequences of ANKRD32, BARD1, TopBP1 (repeats 1-2, 4-5, and 7-8), and XRCC1, it appeared that a significant number of sequences from C-terminal library screens followed particular trends, summarized in Table 5.31. The peptide binding affinity of the domains varied, with ANKRD32 and TopBP1 BRCT 7-8 binding to group I and group II sequences well enough to obtain a KD value and the remaining domains binding too weakly to determine the dissociation constant (Table 5.32). Although the BRCT repeats of BARD1 and TopBP1 (repeat 1-2) bound with some affinity to the group I and II peptides, the BRCT repeats of XRCC1 and TopBP1 (repeat 4-5) did not show any significant binding affinity. It is therefore questionable whether TopBP1 BRCT 4-5 and XRCC1 BRCT2 actually prefer peptides in groups I or II. The only one of these six BRCT repeat domains to have previously been screened against a peptide library was BARD1 BRCT2, which was reported to recognize an internal pS-(D/E)-(D/E)-(E) consensus binding motif (92). This stands in contrast to the results obtained in this study, where BARD1 BRCT2 interacted with peptides of the consensus (pS/pT)-X-X-ψ-COOH and (pS/pT)-X-ψ-H-COOH (X = any residue, ψ = aromatic/hydrophobic residues). It was previously shown through in vitro peptide binding studies that BARD1 BRCT2 can recognize a BRIP1 phosphopeptide (ISRSTpSPTFNKQTK) (319), which falls within the (pS/pT)-X-X-ψ-COOH consensus motif of BARD1 BRCT2. The BRIP1 peptide lacks a free C-terminus after the +3 position, so it is not clear how important a free C-terminus is for BARD1 BRCT2 recognition of cognate ligands.
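
The two BARD1 BRCT2 consensus patterns can be written as a small classifier for C-terminal library sequences, shown below as a minimal sketch. The exact membership of ψ is an assumption made here for illustration (F, Y, W, L, I, V, and M, where M denotes L-norleucine in the library tables), and the helper names are hypothetical. Sequences without an identified pS/pT in the third position are sent to group III, which may differ from the manual sorting used for the tables.

```python
import re

TOKEN = re.compile(r"p[STY]|[A-Z]")

# Assumption: psi = aromatic/hydrophobic residues; this exact set is illustrative.
PSI = set("FYWLIVM")
BASIC = set("RKH")


def classify(peptide):
    """Assign a C-terminal library sequence X-X-(pS/pT)-X-X-X-COOH to a group."""
    t = TOKEN.findall(peptide)
    if len(t) != 6 or t[2] not in ("pS", "pT"):
        return "III"
    if t[5] in PSI:
        return "I"   # (pS/pT)-X-X-psi-COOH
    if t[4] in PSI and t[5] == "H":
        return "II"  # (pS/pT)-X-psi-H-COOH
    return "III"


def basic_fraction(peptide):
    """Fraction of residues that are R, K, or H (a flag for likely false positives)."""
    t = TOKEN.findall(peptide)
    return sum(r in BASIC for r in t) / len(t) if t else 0.0


def plus3_is_psi(peptide):
    """True if any pS/pT in an internal peptide has a psi residue at +3."""
    t = TOKEN.findall(peptide)
    return any(
        r in ("pS", "pT") and i + 3 < len(t) and t[i + 3] in PSI
        for i, r in enumerate(t)
    )


# Sequences from Table 5.14 and the BRIP1 phosphopeptide.
print(classify("WRpSARY"), classify("GYpTFMH"), classify("PCpSRRR"))  # I II III
print(basic_fraction("PCpSRRR"))        # 0.5
print(plus3_is_psi("ISRSTpSPTFNKQTK"))  # True: F at +3 relative to pS
```

The last check only tests the +3 residue; it does not model the free C-terminus, which the BRIP1 peptide lacks.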

The BRCT repeats of ECT2, DNA Ligase IV, and TP53BP1 failed to select for peptides following any recognizable pattern, except for peptides with sequences rich in basic residues and/or sequences containing phosphoserine. The cause of the failure of ECT2 BRCT2 screens is not clear, especially since it has already been demonstrated that the BRCT repeats of ECT2 recognize at least one protein in a phosphorylation-dependent manner (320). It is possible that ECT2 requires a secondary binding motif in addition to the pS motif for high-affinity interaction, or that it has too low a binding affinity for its preferred ligand(s) to produce an obvious binding consensus using our screening methodology. The BRCT repeats of DNA Ligase IV and TP53BP1 have been shown by X-ray crystal structure to recognize fairly large surfaces of their protein binding partners (88, 300, 321), interaction interfaces that could not be covered by the relatively short peptide motifs found in either peptide library.

Future studies will hopefully determine the optimal binding motifs of the BRCT repeats whose binding specificity was not effectively characterized in this report. Knowledge of these motifs would aid in understanding the BRCT protein-protein interaction network crucial to the DNA damage response pathway and cell cycle control.

It would be interesting to find the identity of the protein(s) which contain the dual phosphoserine motif recognized by PTIP BRCT 5-6. In this study, we showed by fluorescence anisotropy that PTIP BRCT 5-6 binds with moderate affinity to dual phosphoserine peptides derived from serines 482/484 and 957/959 of TP53BP1. We also showed that the dual phosphoserine sequence derived from serines 482/484 of TP53BP1 precipitates PTIP in a peptide pull-down assay. These data suggest that PTIP BRCT 5-6 may bind TP53BP1 at one (or both) of these sites. This would, however, contradict a previous study which found that PTIP binds TP53BP1 at phosphoserine-25 (307). Other PTIP-interacting partners should therefore be considered in the search for a protein which binds PTIP BRCT 5-6 in this manner. It would also be interesting to determine the structural basis of dual phosphoserine recognition by PTIP BRCT 5-6. A crystal structure of PTIP BRCT 5-6 bound to a peptide derived from the C-terminus of γH2AX (KKATQApSQEY-COOH) shows hydrogen bonding and charge-charge interactions between the glutamic acid sidechain of γH2AX and lysine/threonine sidechains of PTIP (K907/T909) (90). It is tempting to speculate that a phosphoserine residue in the equivalent position would make better bonding interactions with these or other residues. In support of this model, replacement of the second pS of peptide 1 with glutamic acid (peptide 2 in Table 5.33) reduced binding affinity to PTIP BRCT 5-6 nearly 8-fold. It is also possible that the dual phosphoserine motif has a different binding mode to the BRCT repeat of PTIP than single pS-containing peptides. Support for this possibility comes from the fact that the dual pS motif did not interact better with PTIP BRCT 5-6 when it contained a free C-terminus after the +3 position relative to the first pS residue (peptides 1 and 4 of Table 5.33). This stands in contrast to a previous report, which found that amidation of the free C-terminus of a γH2AX peptide (containing only one pS) reduced binding affinity to PTIP BRCT 5-6 by approximately 14-fold (90). Further studies will be required to determine exactly how PTIP BRCT 5-6 interacts with the dual pS motif discovered in this paper.
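
The fold changes in affinity quoted above can be put on an energy scale using the standard relation ΔΔG = RT ln(KD,2/KD,1). The short sketch below applies it to the 8-fold and 14-fold values; the temperature of 298.15 K is an assumption made here for illustration and is not taken from the binding assay conditions, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_change, temperature_k=298.15):
    """Binding free-energy difference (kcal/mol) for a fold change in KD.

    The default temperature (25 degrees C) is an assumed value.
    """
    return R_KCAL * temperature_k * math.log(fold_change)


for fold in (8, 14):
    print(f"{fold}-fold change in KD ~ {ddg_from_fold(fold):.2f} kcal/mol")
```

Under this assumption, the 8-fold and 14-fold changes correspond to roughly 1.2 and 1.6 kcal/mol, respectively, which is on the order of a single hydrogen bond or salt bridge.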

In conclusion, this report has demonstrated that the tandem BRCT domains of BRCA1, MDC1, MCPH1, and PTIP (BRCT 5-6) can select for well-defined consensus binding motifs from phosphopeptide library screens. The BRCT repeat domains of ANKRD32, BARD1, TopBP1 BRCT 1-2, TopBP1 BRCT 4-5, TopBP1 BRCT 7-8, and XRCC1 select for two types of consensus sequences, but bind with low affinity to their cognate ligands. The BRCT repeats of ECT2, DNA Ligase IV, and TP53BP1 failed to select for peptide ligands that we did not consider to be false positives. We also found that PTIP BRCT 5-6 is capable of interacting with both single and dual phosphoserine-containing motifs. Furthermore, the FHA domain of Nibrin, and not its tandem BRCT domain, appears to be responsible for its interaction with the consensus motifs found from OBOC peptide library screens in this study. Overall, eight BRCT repeats (those of BRCA1, MDC1, MCPH1, PTIP (repeat 5-6), ANKRD32, BARD1, and TopBP1 (repeats 1-2 and 7-8)) can recognize short phosphopeptide motifs, while the BRCT repeats of ECT2, DNA Ligase IV, Nibrin, PTIP (repeats 1-2 and 3-4) and TP53BP1 failed to do so. From this study, it is not clear whether the BRCT repeats of TopBP1 (repeat 4-5) and XRCC1 interact with short phosphopeptide motifs.

5.5 Acknowledgements

Plasmids for GST-BRCA1 BRCT2, GST-MDC1 BRCT2, GST-MCPH1 BRCT2, GST-TopBP1 BRCT 7-8 and mouse tandem BRCT domains GST-PTIP BRCT 1-2 and GST-PTIP BRCT 3-4 were kindly provided by the lab of Dr. Junjie Chen (The University of Texas). The plasmid encoding human GST-BARD1 BRCT2 (residues 554-777) was provided by the lab of Dr. Mark Glover (University of Alberta). The plasmid DNA encoding human ECT2 BRCT2 was kindly provided in a pCMV-Myc vector from the lab of Dr. Channing Der (University of North Carolina at Chapel Hill).

Table 5.1 Peptides from BRCA1 BRCT2 domain screens (C-terminal library)a

Peptide  SM    Peptide  SM    Peptide  SM    Peptide  SM    Peptide  SM
ELpSKKF  2.64  NFpSKTF  2.51  CNpSNAF  2.39  CMpSKFY  1.76  IGpSYRL  1.15
EApSRKF  2.63  IHpSHLF  2.51  AEpSTLF  2.39  YGpSNMY  1.75  XXpSNCM  1.13
VRpSKRF  2.62  HKpSKSF  2.48  XCpSHTF  2.38  RVpSVHY  1.75  PYpSFKH  1.11
ARpSNKF  2.60  GRpSLRF  2.48  IGpSAAF  2.38  KHpSMHY  1.75  KFpSHLH  1.11
DVpSRKF  2.60  KHpSIHF  2.47  LKpSAYF  2.38  LFpSQHY  1.74  RTpSYRR  1.10
KDpSNKF  2.58  VApSRIF  2.47  MRpSRKY  1.95  LYpSHTY  1.73  XXXKKY   1.09
RHpSRMF  2.57  XXpSFKF  2.46  HRpSQKY  1.88  YIpSGRY  1.73  RSpSATL  1.09
LMpSRRF  2.57  GEpSHLF  2.46  HMpSKRY  1.88  XXXKLF   1.70  NHpSHVL  1.08
IFpSRRF  2.57  SApSIRF  2.45  IRpSIKY  1.86  XXXKLF   1.70  XXXYKY   0.99
CHpSRLF  2.57  HYpSGRF  2.44  TRpSVKY  1.86  XXXKMF   1.68  RRpSQXX  0.98
CVpSRRF  2.56  SQpSRQF  2.43  LYpSAKY  1.85  XXpSAMY  1.67  SRpSNXX  0.98
KIpSVKF  2.55  NLpSQHF  2.42  RIpSNRY  1.83  ETpTHMF  1.65  XXXKTY   0.96
ETpSVKF  2.53  XXpSRIF  2.41  XXpSKLY  1.80  XXXKSF   1.63  RPpSCNW  0.95
GGpSRHF  2.53  MMpSNCF  2.41  XXpSQKY  1.79  XXXTMF   1.54  XXpTAVM  0.31
XXpSNKF  2.53  YSpSYMF  2.41  RApSRSY  1.79  GHpSYHM  1.24  XXXCCM   0.26
VQpSRLF  2.52  TRpSVVF  2.40  XXpSKHY  1.78  XXpSKCM  1.20
VHpSKNF  2.51  KKpSQYF  2.39  KHpSYTY  1.77  GHpSCIM  1.15

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. BRCA1 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.2 Peptides from BRCA1 BRCT2 domain screens (X7 library)a

Peptide SM Peptide SM Peptide SM Peptide SM RpSYVFAT 4.92 HEpSYNFY 4.73 GGpSFQFR 4.64 PGpSPIYN 3.55 FHpSpTVFN 4.91 GMpSHNFI 4.73 NDApSIAF 4.64 TIVpSpTSY 3.55 ARpSNVFN 4.90 TFpSYSFE 4.72 XpSApSFIH 4.63 MNpSIRYV 3.55 XKpSHVFL 4.90 EVpSFSFN 4.72 XpSpSMEFX 4.62 DTpSFAYI 3.54 GEpSPVFI 4.89 AVpSISFA 4.71 SEFpSPLF 4.60 RDpSYKYK 3.53 QHpSKVFY 4.86 VVpSISFN 4.71 GRpSYVYG 3.80 pTGpSWSYS 3.50 FHpSNVFA 4.86 QRpSINFV 4.71 KpSpTVYGA 3.76 MNpSVKYT 3.17 pSTpSEVFI 4.86 QFpSPAFG 4.71 FIpSYVYL 3.75 RTpSEYVF 2.99 pSFApSPVF 4.83 LQpSYAFF 4.70 WTpSHVYM 3.75 GYTEHRF 2.93 HYpSMVFQ 4.83 XKpSYIFE 4.70 IKpSTVYY 3.72 HAQEHHF 2.82 IpTpSWVFX 4.82 HIpSRIFG 4.70 SApSAVYA 3.72 NRpSFVWF 2.81 XPpSQVFM 4.80 QEpSAAFY 4.70 PYpSFVYH 3.71 AMpSRVKG 2.70 IGpSATFN 4.79 pTTpSRIFR 4.70 IpSpSVYIA 3.70 SEpSMVXX 2.65 ESpSYTFE 4.78 HQpSAKFH 4.69 EpSDVYLG 3.69 HMpSYNWV 2.62 VRpSVTFpT 4.78 XYpSPKFV 4.69 IVpSHTYL 3.68 QRpSNTXX 2.60 NTpSHRFH 4.77 PGpSHAFX 4.69 TAWpSTVY 3.67 IIpSRTXX 2.52 ARpSHNFY 4.76 VLpSFRFF 4.69 SRpSWTYR 3.65 TTpSSEGF 2.45 SYpSHSFN 4.76 LNpSHMFA 4.69 SApSPTYA 3.64 HRpSXXXX 2.43 MSpSpSHKF 4.76 QEpSVRFQ 4.68 ETpSMTYR 3.62 QApSQXXX 2.39 XXpSITFN 4.76 GHpSYMFE 4.68 NFpSHRYY 3.62 HNpSpTYRF 0.27 FKpSHKFF 4.75 DRpSWMFY 4.67 NKpSQTYH 3.62 LFHRXXX 0.03 XIpSVTFV 4.75 FVpSTKFM 4.67 SIpSFTYW 3.61 XXXXXXL HTpSHAFL 4.75 TpSEIFAA 4.67 LKpSLTYH 3.61 TWpSHSFG 4.75 RRpSWHFE 4.66 AEpSPRYG 3.60 AHpSHNFG 4.74 GEpSYLFD 4.66 HKpSYSYH 3.60 NVpSNTFW 4.74 DSpSKIFN 4.66 VLEpSWTY 3.59 PRpSPRFF 4.74 GMpSAIFT 4.66 XRpSYKYK 3.58 SRpSHHFV 4.74 HHpSFHFR 4.66 GGpSIKYY 3.57 LRTpSMTF 4.74 YGpSHQFD 4.65 GMpSTRYV 3.56

a M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. BRCA1 BRCT2 was screened against the X7 OBOC pS/pT peptide library. If a residue was not between the -2 and +4 positions (relative to pS or pT), it was not used in the SMALI analysis and is in italics.

Table 5.3 Peptides from MDC1 BRCT2 domain screens (C-terminal library)a

Peptide  SM    Peptide  SM    Peptide  SM    Peptide  SM    Peptide  SM
VFpSRQY  2.60  XXpSLHY  2.45  XXpSPQY  2.38  XXpSAWF  1.56  XIpSNVR  0.93
VFpSREY  2.59  CDpSCSY  2.45  VHpSRCF  1.77  XTpSQQF  1.55  XTpSXX   0.93
FHpSRQY  2.57  XXpSKQY  2.45  XXXRWY   1.70  XXpSHIF  1.55  XXpSARR  0.90
KHpSRTY  2.57  XXpSIWY  2.45  XXXKWY   1.70  XXpSYTF  1.54  XXpSKXX  0.88
QHpSHCY  2.55  YKpSMDY  2.44  YHpSRIF  1.69  XXpSLEF  1.53  XXXKIF   0.76
STpSLTY  2.54  XXpSHCY  2.43  XXXLWY   1.67  XXpSEVF  1.50  XXXVWF   0.76
SLpSLIY  2.52  ADpSFDY  2.43  XXXYWY   1.66  XHpSRSW  1.40  XXXFWF   0.76
MLpSHCY  2.52  XXpSKYY  2.43  XXXKTY   1.66  ERpSKSW  1.37  XXpTGHF  0.73
VNpSVNY  2.52  XCpSVNY  2.42  XXXIWY   1.64  QVpSHVW  1.36  XXXTLF   0.65
VGpSVNY  2.52  XXpSYVY  2.41  ECpSQWF  1.64  LDpSEHW  1.35  XXpTKHW  0.56
XXpSRWY  2.51  RYpSSEY  2.41  XXXKNY   1.64  XVpSKEW  1.34  XXXCHL   0.13
MMpSHTY  2.51  XXpSITY  2.41  SSpSLIF  1.62  XXpSKSW  1.32  XXXYIK   0.12
VWpSDNY  2.50  XIpSCDY  2.41  XXXXWY   1.62  XXpSKAW  1.30
GTpSHFY  2.50  XXpSHYY  2.40  XXXXWY   1.62  XXpSQHW  1.30
HCpSLVY  2.49  XXpSMCY  2.40  XKpSRHF  1.61  XXpSCHW  1.30
TEpSLHY  2.49  XXpSMQY  2.39  XVpSRCF  1.60  XXpSLSW  1.29
LYpSHCY  2.49  XXpSYAY  2.39  GEpSATF  1.60  XXpSQSW  1.27
GApSITY  2.48  XXpSYAY  2.39  XXpSKHF  1.59  XXpSFDW  1.25
FFpSAVY  2.46  XXpSFEY  2.39  XLpSRYF  1.58  MHpSQXX  0.97
HSpSMIY  2.46  XXpSYFY  2.39  XXpSCWF  1.57  XHpSRXX  0.97

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. MDC1 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.4 Peptides from MDC1 BRCT2 domain screens (X7 library)a

RPAWWWI HPKVKHA NGNFTYY LHNPIAH ERpSHKRM MIRPFQI NWFHVYI PQKVIGG RGNVEEF RDPKESD ELpTYIGT WWTWSRL MFKNYNY QTLRMLN LHNKFSI AQpSLMWR WLpTDLER WGVDNSD XXXXLIK

a M, L-norleucine; X, could not determine amino acid identity; MDC1 BRCT2 was screened against the X7 OBOC pS/pT peptide library

Table 5.5 Peptides from MCPH1 BRCT2 domain screens (C-terminal library)a

Peptide  SM    Peptide  SM    Peptide  SM    Peptide  SM    Peptide  SM
RDpSRWY  2.20  WRpSYIY  2.01  XXpSYWW  1.68  XXXIWY   1.28  XXpSVXX  0.80
IQpSEWY  2.19  GTpSAYY  1.97  HQpSHYW  1.65  XXXIWY   1.28  XXXVYW   0.76
ENpSQWY  2.17  HEpSRCY  1.96  VRpSRHW  1.61  XXpSRRR  1.25  XXXLYW   0.74
RRpSHRY  2.16  DFpSACY  1.96  ERpSAVF  1.60  FHpSRRI  1.24  XXXKWF   0.74
RFpSRVY  2.14  XXpSCVY  1.90  FGpSEWF  1.58  MHpSHRH  1.24  XXXKWF   0.74
QRpSRRY  2.14  XXpSEYY  1.89  XXpSHYW  1.52  XXpSKCF  1.23  XXXMYW   0.73
XXpSAWY  2.08  KFpSVWW  1.85  XXpSLVW  1.52  XXpSWRR  1.20  XXXFYW   0.73
MEpSVYY  2.06  XXpSRIY  1.82  CFpSCFW  1.51  XXpSQCF  1.20  XXXKWR   0.64
XXpSYWY  2.06  XXpSRIY  1.82  FWpSEDW  1.47  XXXKVY   1.17  MYpTRRR  0.63
XXpSWWY  2.06  XXpSKCY  1.81  MEpSWYF  1.45  RIpSHRC  1.15  XXXWWR   0.61
WHpSTVY  2.05  WGpSMTY  1.79  XXpSDWF  1.44  XXXKWW   0.95  XXXCFW   0.61
XXpSFWY  2.04  XXpSAIY  1.79  ILpSQSW  1.41  IHpSXXX  0.93  KWpTQAV  0.21
DWpSAVY  2.03  VRpSEVW  1.76  YHpSYRR  1.37  YQpSXXX  0.88  XXXGFM   0.10
HTpSAVY  2.02  ERpSHWF  1.74  XXpSKRF  1.32  XIpSVXX  0.85
ENpSLVY  2.01  MFpSCVW  1.74  YRpSRLR  1.32  XXpSVXX  0.80

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. MCPH1 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.6 Peptides from MCPH1 BRCT2 domain screens (X7 library)a

TDEEPGH VFHAKRA YVNSRMY MHpSAKWF RKRFKSY NRYRSIR TKFHRRA MQIAGQM AIpSYpSAY DPpTHpTFT YRRHKRQ YMYPQTT AKGMLHR REIGAVA DApSQSIN QLpTYEDI PRSRRHF XXXIWEM SFGHAQF FWKRISL ERpSMYHH FIRGIRS IRTWWGR XXXXHKK SWGATPD KRLTWVD KVpSWPFR IWRRKNY KEVFNKQ XXXXIWK XXXXSMI

a M, L-norleucine; X, could not determine amino acid identity; MCPH1 BRCT2 was screened against the X7 OBOC pS/pT peptide library

Table 5.7 Peptides from PTIP BRCT 3-4 domain screens (C-terminal library)a

CRpSRXX MKpSHXX RHpSQRH WHpSMRH XHpSKLH FRpSRXX MWpSRRR HTpSIRH WRpSRMR XYpSKRR HGpTHRH PHpSYRR SHpSSHR YCpSHHR XXpSHKR HMpSHHI QDpSHWH SNpSHQH YHpTRVH XXXXKR

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; PTIP BRCT 3-4 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.8 Peptides from PTIP BRCT 5-6 domain screens (C-terminal library)a

AYpSHWA XXpSFIF TNpSRSI RVpSRFR XXpSYKT PNpSRHY AHpSVIC IMpSRRH LQpSLIL XXpSFWR PYpSFLV PIpSXXX GRpSLIF QFpSWRH XXpSKIL XXpSHYR HRpSVMW FRpTYRY IKpSHMF XXpSRMH XXpSKVL XXpSKRR MKpSRAW XXXKWF IPpSRRF XXpSWKH XXpSMCL XXpSKYR VSpSRRW XXXKQI RQpSRKF KFpSYCI HNpSYHM HRpSYKI XKpSRRW XXXKVV XRpSVTF MRpSRFI NYpSYRM LFpSMVT HRpSMKY XXXXXV XXXKWWW XXXXKW XXXKRY

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; PTIP BRCT 5-6 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.9 Peptides from PTIP BRCT 5-6 domain screens (X7 library)a

Group I (22) Group II (19) Group III (34)

Peptide SM Peptide SM Peptide Peptide

pSRpSLLWH 5.94 NNpSYVFH 2.68 YIpSNXXX FSRAERM pSRpSMEMH 5.70 IEYNFHL 2.55 FRpSAFXX RRSRIIK pSEpSFHAF* 5.60 YLpSEVFV 2.30 XKMApSXX GLRRXXX pSTpSLMQR 5.55 AYpTAVFT 2.17 KRpTSINH WMIKAMD pSApSLNWT 5.51 XHpSQVMM 2.13 RGpSMYSI RWRXXXX pSYpSIYYM 5.47 IDLVVMY 1.99 YPpTQKQA GFARYTY pSFpSFVHN 5.45 IRpSMVLE 1.99 RRpSpTXXX RRVPKXX pSRpSMYRM 5.45 YHpSHFMS 1.96 LRRILpSP NFRVRRH pSHpSLWXX 5.43 MWFEHAI 1.95 HRpSKWHP SYHLLRH pSLpSHSMH 5.41 RSpSYIVG 1.87 pTEPWPFM HRTRXXX pSNpSFXXX 5.32 RYpTKAMR 1.80 RpSpTXXXX NQWLKWE pSKpSMFPV 5.29 XXpSPTMH 1.76 RVRYYRY XXXXXXL pSHpSAWMS 5.26 XXpSMVVA 1.73 QNYGTGT pSIpSVAEF 5.12 XXpSVAFY 1.62 RQIHFXX pSPELILA 4.71 TGpTYTIY 1.54 FRTRFMR pSEpTLLTH 4.61 XXpTYPMH 1.53 EDNGQPP pSHpTIKTH 4.40 ANpTFTTS 1.45 GYLYTVR pSQpTIXXX 3.82 XIpTVYLL 1.23 QRRHLXX pTNpSFYYP 3.26 pSPILXXX 1.19 QFKTQYF pTRpSIMQV 3.20 GHQRVAV pTIpSFFXX 3.03 RMRHNXX pTEpTFPFK 1.88 RRHRQWF

a M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. PTIP BRCT 5-6 was screened against the X7 OBOC pS/pT peptide library. For the group II sequences, if a residue was not between the -2 and +4 positions (relative to pS or pT), it was not used in the SMALI analysis and is in italics. Sequences were divided into group I (dual pS/pT sequences of the consensus (pS/pT)-X-(pS/pT), or sequences with a single pS and a glutamic acid believed to substitute for the other pS or pT residue), group II (single pS/pT sequences of the consensus (pS/pT)-X-X-(F/M/L/I/V/T); a few sequences were included where D/E were believed to substitute for pS/pT), and group III (sequences which did not conform to any observable pattern). *The pSEpSFHAF sequence was used for fluorescence polarization binding assays.

Table 5.10 Peptides from WT Nibrin FHA-BRCT2 domain screens (C-terminal library)a

FRpSNRR CHpTMXX DQpTRXX ETpTHSF HSpTQIL VYpTFFH AIpTXXX DHpTSVF DRpTYLM EVpTFVH RVpTYMF VIpTXXX APpTLXX DIpTCMH EEpTXXX FDpTNMR TEpTFFS YVpTHMY CDpTMMR DKpTMXX EIpTHIY HIpTTIW VDpTXXX CHpTIIH DNpTYIV EIpTRXX HNpTQMR VLpTHVY

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; Nibrin FHA-BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.11 Peptides from WT Nibrin FHA-BRCT2 domain screens (X7 library)a

Peptide SM Peptide SM Peptide SM Peptide SM pSTpTVIPM 3.01 DEpTHFFW 2.58 DHpTDVYV 2.43 XXpTHAWM 2.13 VpSpTMITE 2.93 pTRpTSVDI 2.57 DApTDYLM 2.42 XXpTYQYQ 2.13 pTSpTIINY 2.92 WNpTDIHA 2.57 EEpTTYXX 2.42 XXpTYAVA 2.11 PHpSpTMIL 2.88 XIpTVIFX 2.56 ELpTSLTF 2.42 EALVpTRK 2.11 YDVpTHID 2.81 DDpTDMFE 2.56 DPpTFDGF 2.38 XRpTpTHXX 2.09 DMpTSITS 2.80 XLpTTIIL 2.55 MDpTIMDS 2.37 XXpTFXXX 2.06 DTpTRIDS 2.80 DEpTSYAT 2.55 XEpTYVHA 2.37 DIpTYDFM 1.29 NEpTLIHF 2.78 DDpTWMTL 2.55 IpSpTXXXX 2.36 pSFpTSDTW 1.16 HEpTLITF 2.77 XApTIIXX 2.54 EFpTNVAI 2.35 DIpSFIRH 0.92 RDpSDpTHL 2.75 HpTpSISGY 2.54 NDpTHFpSS 2.35 RFDYSIH 0.86 EMpTHIDV 2.74 XXDpSRpTW 2.54 FEpTFELF 2.33 QEpTIELY 0.73 EYpTYIDT 2.73 DEpTSQAF 2.53 VEpTIWXX 2.31 XXpSRIIH 0.62 AEpTTILE 2.72 pSApTXXXX 2.52 QVpTSMYV 2.30 XXXYIET 0.54 EpTVIIYA 2.70 XXpTFIXX 2.52 HIpTYVPW 2.29 XXXMITL 0.53 pSRpTHYMQ 2.69 DSpTHLSY 2.52 XVpTHVHT 2.28 XXXKIQY 0.52 EQpTAIYH 2.68 XXpTNIGX 2.51 EDpTFXXX 2.28 XXXKIQN 0.50 NIpTFISI 2.66 XpSpTHFEX 2.50 ADpTGFFH 2.25 XXpSHLWT 0.33 TDpTMIMY 2.63 EpSpTEXXX 2.50 XXpTYMSD 2.25 XXXKMIF 0.27 XIpTNIVF 2.63 DApTILMS 2.49 XXpTIMKG 2.24 XXXNMHK 0.24 DEpTHLKI* 2.62 XXpTWIXX 2.49 XXpTYLFE 2.24 XXXEMHK 0.23 pSNpTHNYR 2.60 EEpTNYFY 2.48 VEpTXXXX 2.22 XXXMLKF 0.23 XRpSTpTFK 2.59 EEpTFFRL 2.48 VpTpTXXXX 2.21 XXpSEWGS 0.22 XpSpTFMIX 2.58 NEpTNMNE 2.47 XXpTHFTX 2.19 XXXXXQK 0.03 GEpSpTIQH 2.58 DSpTDVLN 2.44 RpTpTXXXX 2.18 DEpTYVNW 2.58 YpTpTHLLS 2.43 XXpTKWLF 2.17

a M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. Nibrin FHA-BRCT2 was screened against the X7 OBOC pS/pT peptide library. If a residue was not between the -2 to +4 position (relative to pS or pT), it was not used in the SMALI analysis and is in italics. *Sequence resynthesized for binding analysis.

Table 5.12 Peptides from R28A Nibrin FHA-BRCT2 domain screens (C-terminal library)a

ALpSYKR KRpSFYR QEpSKFH VYpSKYR DYpTFLD WVpTHIR AYpSRRV KSpSYRR QRpSKRI WTpSMYR DYpTQIF YCpTXXX ERpSLFM LWpSRRY QRpSRRR WYpSRDI FGpTWIH YMpTRRR FRpSFPR MCpSRRK RKpSRXX YKpSRRK FRpTRSK YTpTNIW FRpSHTF MFpSVRR RKpSVFR XWpSFRY HIpTXXX XRpTYKR FRpSQHR MRpSRKI RLpSRHK XWpSNAR MRpTRFR XXpTWIK FYpSRRF NRpSKYR RTpSCRR XXpSRRW RCpTRRF XXpTWIR HRpSRIK NVpSYDL RYpSWYR XXpSRXX WIpTWIR XXXMF KLpSRRL NYpSKRW VHpSKMA DVpTQIH WRpTLWR

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; R28A Nibrin FHA-BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.13 Peptides from R28A Nibrin FHA-BRCT2 domain screens (X7 library)a

RKAIRYF RYFRKQF SIKYLAF AYRSKGF KRTKHYR XXXXIRK SRASYWQ RRGMRGY WRKRRXX FKRRRYT FTYNNKV XXXXXXK YSARAKY HRHRRIH LMpTYAGA RRRFKGP XXXKpTII GHFTGIW YRHHRKR XSpTXXXX XRRKIYM XXXKRKR RGFRFXX NNIYFDR XWQRRRM XHSKYIR XXXKYRH

a M, L-norleucine; X, could not determine amino acid identity; R28A Nibrin FHA-BRCT2 was screened against the X7 OBOC pS/pT peptide library.

Table 5.14 Peptides from BARD1 BRCT2 domain screens (C-terminal library)a

Group I (24 Sequences) Group II (8 Sequences) Group III (30 Sequences)

Peptide SM Peptide SM Peptide SM Peptide Peptide WRpSARY 2.29 QIpTGDF 1.29 HMpTGYH 6.57 XXpSYHH RApSRYR MRpSWRY 2.16 FYpTHEW 1.29 GYpTFMH 6.57 HSpSLRH XSpSRKR PRpSVRF 2.06 XXpSYLF 1.20 FRpSYYH 6.52 KVpSHSH PCpSRRR RRpTWRW 2.00 XXpSWHF 1.17 RHpSGWH 6.31 NHpSNHH HTpTRHK XRpTWRW 1.88 XXpSMCY 1.17 FGpSLMH 6.25 XXpSTSH XRpTRKK WGpSHLF 1.78 XXpTHFF 1.12 XXpSFMH 5.70 KRpSLRH RSpTTRR PGpSMHY 1.66 XKpSKRI 0.99 XXpSKMH 5.43 SRpSHHH RFpTYMR QEpSMFY 1.66 XXXIFW 0.99 XXpTGFH 5.42 XXpSHKH KFpTRYR XVpSHLY 1.43 XXpTFWW 0.87 HMpSHFR HRpTFWR XXpSKRW 1.42 HFpSQHR WRpTLRR XIpSYLF 1.38 TCpSRHR KRpTFYR AYpSIMY 1.37 YGpSKRR HRpTFRR HEpSTHW 1.35 IRpSTRR XYpTHRR WQpTRRI 1.35 RFpSRFR XXXHKH XQpSYCY 1.35 WFpSCRR XXXKHK

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. BARD1 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library. Group I, X-X-(pS/pT)-X-X-ψ-COOH sequences; Group II, X-X-(pS/pT)-X-ψ-H-COOH sequences; Group III, sequences not within the group I or group II consensus pattern; ψ, hydrophobic/aromatic residues.

Table 5.15 Peptides from BARD1 BRCT2 domain screens (X7 library)a

SKEIFVL HYLQLTN FTpSINLH MQpSLTNP GYpTGNQV KLTDRPM AGIMSYS RVNHNRA IKpSNMPQ pTHpSHXXX IRpTYRMI PGTAKEQ KNIWHMS LRMYTPV ILpSYLNK RLpSYQWS RpSpTVLHR XYTMMIG PRIFPAA RQMHGpSI LApSVQGS RRpSRIYV VGpTFKDH XLVLLWG QRIMNHL WFPGAXX LHpSELAL SpSpSTPPH YNpTRESE XIWGLWF KDKGPIL FNpSRRYA MKpSMNEI EPpTKNAY XIpTMFMR pTpSXXXXX FILSYIT FSpSRYKF MQpSHESN FVpTFWHH LRSQKXX XXXpSKGI XXXXQpSK

a M, L-norleucine; X, could not determine amino acid identity; BARD1 BRCT2 was screened against the X7 OBOC pS/pT peptide library.

Table 5.16 Peptides from DNA Ligase IV BRCT2 domain screens (C-terminal library)a

CFpSWHR LVpSRHR RYpSCKR XXpSYRR XXpTHKR FRpSQKR MKpSYKR WRpSQRW LKpTRRR XXXWKR HIpSRRV RHpSYYR XKpSRKM PRpTFFR XXXXHH IRpSYRK RPpSFRW XXpSRXX RRpTKCR XXXXRK RRpTMKR

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; DNA Ligase IV BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.17 Peptides from DNA Ligase IV BRCT2 domain screens (X7 library)a

YVINRKW AKpSIYWQ NIpSASSM YPpTFGTG XILNFYF IKpSGGHQ YYpSSRKW FHRSRNV HAPSQRW MYpSWFHM HYpTIVMH MXXXXXX

a M, L-norleucine; X, could not determine amino acid identity; DNA Ligase IV BRCT2 was screened against the X7 OBOC pS/pT peptide library.

Table 5.18 Peptides from ECT2 BRCT2 domain screens (C-terminal library)a

APpSRKR IFpSYKI RRpSYMR XKpSYMR XXpSKRR KHpTRKH CKpSQRR KNpSQKR SQpSRHR XMpSTKR XXpSMLT KVpTFRR FYpSKMH LYpSGHF WHpSRXX XRpSYKH XXpSRVK LHpTRKR GRpSRXX MCpSRRK WPpSXXX XMpSTKR HKpTRMR NRpTMRR HHpSVFK NLpSRRK YFpSQKL XRpSYKH HRpTHRR SKpTRMR HKpSRMR QVpSYKR XFpSRQR XYpSKXX HRpTRHT VKpTKRC HNpSFWR RApSXXX XKpSRHR XYpSKXX IHpTRRR XXpTRKK IFpSWRR RApSYMR XKpSWYR XXpSHHW KGpTKRK XXXCHK XXXHLP XXXKHK XXXKKK XXXVMK XXXXXL

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; ECT2 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.

Table 5.19 Peptides from TopBP1 BRCT 1-2 domain screens (C-terminal library)a

Group I (20 Sequences) Group II (25 Sequences) Group III (43 Sequences)

Peptide SM Peptide SM Peptide SM Peptide Peptide FEpSFHL 3.65 IYpSYWR 3.05 XLpSYMH 1.97 XXpTFWF XWpSRRH VKpSRHY 3.53 RRpSYWH 3.01 AVpSEWK 1.47 XXpTFHK VIpSRVL XRpSFHL 3.50 SGpSRWR 2.98 QSpSGYK 0.85 FHpSFCR CWpTRRR XWpTYHL 3.48 PYpSYWH 2.93 XXXFWW GSpSRFT FRpTRHL 3.47 RRpSIWR 2.93 XXXFWW LWpSRVW YEpTFHL 3.42 YRpSTWR 2.85 YRpSHMF RYpSRYW VWpTMHY 3.36 KYpSNWR 2.81 SHpSIMC RYpSRLY FFpSYHW 3.35 LYpSRFR 2.74 NCpSIMS FHpSRXX CWpTYHM 3.29 XFpSRLR 2.57 LKpSKFF RIpSSRH XWpSYHM 3.22 VHpSRFR 2.56 PYpSKWF INpSWHR MKpSKHM 3.21 MVpSRLR 2.56 XXXKHK XXXWKR RRpSVHI 3.20 CFpTFWH 2.53 XXpSKSM XXpSWRR MYpTIHF 3.10 IHpSRFH 2.49 IIpSKHQ XKpTYHR WCpSRHW 3.06 RYpTFLH 2.37 NRpSKHR RCpSYRR RRpTFRL 2.31 XXpTFWH 2.35 NVpSKHR WWpSYRR MFpSQRL 2.04 HLpSRYH 2.28 PCpSKTV HWpSYWT QFpSHRY 1.75 XKpSTLR 2.22 FRpSKLW ADpTYMW RYpSRKY 1.59 PRpSWVR 2.19 XXpSKAY KYpTYXX XNpSRRF 1.50 RGpSCVR 2.18 FRpSLHR PRpSXXX XXpSKRY 1.49 LWpSYYH 2.17 WHpSMDW RTpTXXX REpTWFH 2.14 MYpSQHR MFpSRRD YEpTLFH 2.06 XXpSQRR

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. TopBP1 BRCT 1-2 was screened against the C-terminal OBOC pS/pT peptide library. Group I, X-X-(pS/pT)-X-(H/R/K)-ψ-COOH sequences; Group II, X-X-(pS/pT)-X-ψ-(H/R/K)-COOH sequences; Group III, sequences not within the group I or group II consensus pattern; ψ, hydrophobic/aromatic residues.

Table 5.20 Peptides from TopBP1 BRCT 1-2 domain screens (X7 library)a

HFARWDR NRKYRNY YRMNMFH QRQQRMK RFRRTAF XXRYKYW RLARARV RRKYSHR XXMHKRN ARRRHGA RMRFDXX YHTRRXX RRAHYGF XXKHRRA LRPRRHL FRRMHRN RNRHYNS RTVRTKK NHENEAP QFLPGTM RRpSRXXX KDRYMWR RNRTLXX YRWKNRS IQHRRKL WRLRNFR SYpSNMNM LARWRXX RVRMGSR XXWFRRW REHWXXX RQMVDVE VHpSWAAH MRRQHXX TRRWAIY XXWWRAR RRHAVRW RRMFYRY EDpTEEYL NRRKFHR WHRGRIV KRYKYHV AYKRNWR RYMHNDR PEpTIVMV QRRFXXX XRRNKYY NRYRXXX TRYKVHY RRXXXXX XXXKLRR XXXXAQR XXXXKRR

aM, L-norleucine; X, could not determine amino acid identity; TopBP1 BRCT 1-2 was screened against the X7 OBOC pS/pT peptide library.

Table 5.21 Peptides from TopBP1 BRCT 4-5 domain screens (C-terminal library)a

Group I (34 Sequences) Group II (19 Sequences) Group III (29 Sequences)

Peptide SM Peptide SM Peptide SM Peptide Peptide PHpSRRF 2.49 XXpSRKI 1.74 YLpSKFR 3.70 CFpSWHR WRpSYIS WApSRRF 2.44 WMpSQRL 1.71 WKpSHFR 3.66 FDpSMKP YRpTVHR RTpSRRM 2.26 FPpSCIF 1.65 PFpSYFR 3.62 FEpSRKR YHpSRRH IYpSRWF 2.21 XHpSKTL 1.61 VKpSHYR 3.15 HHpSHHR XLpSQHR ARpSRTF 2.16 TNpSHKW 1.52 XFpSFMR 3.10 HRpSMRR XXXXKK XVpSRYF 2.11 YYpSIFT 1.46 FMpTKFR 3.00 IFpSARP XKpSRKR KYpSHRF 2.09 CRpSFKY 1.45 REpSWMR 2.96 KLpSMFA XXXKKR KHpSRHT 2.08 XXpSKIV 1.43 YFpSFFK 2.86 KRpSQHC XXXWNR IFpSFRF 2.06 HIpSFLM 1.40 XXpSWYR 2.86 LRpSVHK XYpSYPG RTpSRHV 2.04 MNpSYGM 1.38 XWpSAYR 2.81 MIpSRHK XXpSYRR RMpSKHF 2.03 QEpSYVL 1.35 GLpSIFH 2.52 NWpSRWS RFpSRFV 1.99 TGpSNPM 1.33 XXpSCVR 2.51 PFpSRRR AFpSRHY 1.98 XXpSCFI 1.28 XQpSIFK 2.41 PSpSCVC RHpSHRL 1.93 XXXXWT 0.25 XXpSFFH 2.36 QRpSYHR GYpSKRY 1.84 XXXXXW 0.18 XXXWWR 2.20 RYpSLHK XLpSRYW 1.82 FQpSRWH 2.12 RYpSKNR WHpSVRT 1.79 XXpSFWK 1.85 RRpSMPP XXpSKRW 1.76 IYpSHTK 1.78 THpSVIS LVpSMHF 1.74 AGpSRLH 1.71 VIpSHRR

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. TopBP1 BRCT 4-5 was screened against the C-terminal OBOC pS/pT peptide library. Group I, X-X-(pS/pT)-X-X-ψ-COOH sequences; Group II, X-X-(pS/pT)-X-ψ-(H/R/K)- COOH sequences; Group III, sequences not within the group I or group II consensus pattern; ψ, hydrophobic/ aromatic residues.

Table 5.22 Peptides from TopBP1 BRCT 4-5 domain screens (X7 library)a

GGEFAHR YGHNFRL PDpSYGWW RFYWAWF XXXKSLT NNFYDDR YQHXXXX MSpTRRRQ XXYQAYL XXXWTYA HVHFIPG HGLNFMG XTWSHHI TDXXXXX XXXXXPT XXXXXXK

a M, L-norleucine; X, could not determine amino acid identity; TopBP1 BRCT 4-5 was screened against the X7 OBOC pS/pT peptide library.


Table 5.23 Peptides from TopBP1 BRCT 7-8 domain screens (C-terminal library)a

Group I (61 Sequences) Group II (51 Sequences) Group III (61 Sequences)

Peptide SM Peptide SM Peptide SM Peptide SM Peptide Peptide FFpSHLF 1.81 ICpSFLI 1.57 FNpSFFR 3.32 HPpSMFH 2.71 XXpSFFA MHpSYHQ WLpSHLF 1.79 RWpSRYI 1.57 HYpSKFR 3.32 PHpSRWH 2.69 TKpSYMA FLpSQMQ IMpSRFF 1.77 AQpSYLY 1.55 RFpSRFR 3.30 LIpSMFH 2.68 KHpSIRA HRpSRMQ FMpSYFI 1.74 XXpSYMI 1.54 KFpSHFR 3.30 FLpSFYH 2.63 XXpSFFC ANpSWAR AFpSNMF 1.73 PNpSCMM 1.53 KApSHFR 3.22 AFpSMWH 2.59 IMpSFMC DHpSKAR FWpSNWF 1.73 XXpSLMI 1.53 MRpSRYR 3.20 XXpSKFH 2.58 XXpSIMC QKpSYAR HKpSRFW 1.71 XXpSLFL 1.53 HYpSFYR 3.20 FRpSLMH* 2.49 TDpSVYC WSpSHER FLpSHIF 1.71 RKpSYYY 1.53 MQpSMWR 3.20 PHpSKLH 2.49 XXXVAG CRpSVHR WRpSVFF 1.70 INpSRFV 1.53 NHpSFYR 3.19 XXpSAFH 2.47 IYpSHRG WYpSFHR SIpSHMF 1.70 ATpSSWM 1.51 RLpSKWR 3.18 XXpSMYH 2.44 FHpSFEH XXpSMKR ALpSAYF 1.66 MMpSLMV 1.51 RHpSIYR 3.17 DRpSKLH 2.39 FSpSNHH RVpSYPR HFpSYLM 1.66 KEpSNWL 1.50 VIpSIFR 3.16 RSpSWIH 2.26 RApSWKH KIpSKQR HYpSRMY 1.65 XXpSMLI 1.49 MMpSYWR 3.16 XWpSRVH 2.17 IYpSNPH AFpSYRR FRpSHVF 1.65 XXpSMMW 1.49 HQpSHYR 3.15 XXpSFFK 2.02 FRpSHRH RCpSYRR FHpSNWM 1.64 XXpSNLW 1.49 LRpSLWR 3.14 FQpSFMK 1.95 MLpSKRH VRpSFRR MHpSHLL 1.63 MVpSEYW 1.48 PFpSCYR 3.12 GRpSYMK 1.89 MHpTHRH QRpSYFS MKpSYMY 1.63 HIpSRTM 1.48 YHpSSWR 3.10 XXpSMYK 1.87 QCpSFRH XXXYFS XYpSRYF 1.62 XXpSMLL 1.46 FHpSIMR 3.08 XXpSYIK 1.66 RYpTHRH CYpSAMS LYpSQWL 1.62 RDpSMMT 1.46 XXpSYWR 3.04 XXpSHLK 1.63 WWpSHRH FFpSHMS WLpSLIF 1.61 DApSEWW 1.46 XXpSYWR 3.04 XCpSWVK 1.54 WYpSHSH XXpSLMS QMpSRFY 1.60 VCpSQYW 1.46 PRpSRIR 3.02 XHpTHTH SCpSYCT HFpSNWL 1.60 XXpSDMW 1.45 XXpSKYR 3.01 WFpSRHK XXpSWIT DDpSFYF 1.60 XXpSFYW 1.43 FFpSQMR 3.00 XXXHHK RDpSMMT MRpSFWI 1.60 XKpSLWT 1.42 XCpSVWR 2.98 XXXKHK FNpSSTT HKpSRFV 1.60 KLpSQVY* 1.39 WRpSYIR 2.96 KRpSKRK XKpSLWT XXpSHFM 1.59 XXpSHVI 1.38 KYpSKLR 2.96 MFpSRRK GVpSXXX YHpSFFL 1.59 XXpSLIM 1.37 XXpSHMR 2.84 PHpSRRK LEpTMXX YLpSCLI 1.58 XXpSYAM 1.34 MIpSFFH 2.75 XFpSRRK WRpSXXX WRpSIFL 1.58 FNpSSTT 1.32 PHpSYWH 2.73 IFpSHTN YPpSXXX VLpSAFY 1.58 XXpSWIT 1.20 KWpSFFH 2.73 ELpSMCQ XFpSNXX FLpSHAM 1.57 HVpSMFH 2.71 
FEpSHFQ

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. TopBP1 BRCT 7-8 was screened against the C-terminal OBOC pS/pT peptide library. Group I, X-X-(pS)-X-ψ-ψ-COOH sequences; Group II, X-X-(pS/pT)-X-ψ-(H/R/K)-COOH sequences; Group III, sequences not within the group I or group II consensus pattern; ψ, hydrophobic/aromatic residues. *Sequences selected for resynthesis and binding analysis.

Table 5.24 Peptides from TP53BP1 BRCT2 domain screens (C-terminal library)a

KHpSRMW VKpSKRR XXpSTMR SRpTFRY XTpTRRY KKpSRFR XKpSLLY RFpTRYR VRpTRFF XXXXRR LVpSRKR XXpSRCR RHpTRRF XRpTMRR XXXXXK

aC, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; TP53BP1 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library.


Table 5.25 Peptides from TP53BP1 BRCT2 domain screens (X7 library)a

YVEKRSL ASIMGYN XYMYMKY NFRLRXX AGVLKMI XXXIGKA IDFSAYF VMKPVIG AVpSAFSG PMRRFKR AMVYRYG XXXXEDK LKFGRSR YKLNYSL DYpSPIMG TRRNHXX YIVHRNA XXXXKKK LLHQYVY XMMYHpTT YRpSRRRV XXRWRRW YVVMHAT XXXXXHH

a M, L-norleucine; X, could not determine amino acid identity; TP53BP1 BRCT2 was screened against the X7 OBOC pS/pT peptide library.

Table 5.26 Peptides from XRCC1 BRCT2 domain screens (C-terminal library)a

Group I (23 Sequences)          Group II (6 Sequences)   Group III (28 Sequences)

Peptide   SM    Peptide   SM    Peptide   SM             Peptide   Peptide
XRpSKRM   2.65  IApSRTI   1.80  DDpSIVF   3.33           KRpSRRA   XXpSRYR
KCpSKRY   2.46  XXpSKIY   1.79  RPpTTVA   2.96           CHpSRKG   DKpSHIR
GRpSNRL   2.38  XFpSRQW   1.68  AWpTIYH   2.90           PIpSVRH   WApSRLR
KPpSKRM   2.38  DDpSIVF   1.59  XNpSMMR   2.07           XXpSMRH   XXpSKFR
TRpSRKT   2.32  XQpSSKT   1.49  XXpSVMK   1.49           XXpSKHH   RCpSHPR
QTpSKRY   2.31  AEpSYHV   1.45  XXpSFFY   1.09           XXpSGKH   XXXWRR
HRpSMRF   2.29  XXpSNGI   1.39                           TNpSRNH   FNpSRRR
PRpSRYF   2.27  XXpSFFY   1.37                           VRpSKRH   AApSKKR
VRpSRCV   2.22  XXpSTNW   1.25                           PKpSRVH   YRpSKKR
GHpSNRF   2.18                                           XXpSNRK   LRpSRWR
VCpSRTM   2.07                                           QYpSIRK   RRpSKQR*
YHpSKHL   1.95                                           SNpSHNQ   KFpSRKR
XXpSKKT   1.83                                           RRpTKRQ   XVpSHRR
XXpSIRM   1.81                                           XXXKRR    NGpTPCS

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. XRCC1 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library. Group I, X-X-(pS/pT)-X-X-ψ-COOH sequences; Group II, X-X-(pS/pT)-ψ-ψ-X-COOH sequences; Group III, sequences not within the group I or group II consensus pattern; ψ, hydrophobic/aromatic residues. *Sequence resynthesized for binding analysis.

Table 5.27 Peptides from XRCC1 BRCT2 domain screens (X7 library)a

GGAPFAA SSIYKST NSRMQAV WRVRFIY XXXYLKW QLAVYYK LLKNLVS HQTQVRA HEYAWRQ XXXXMWK DKGVRSS ARMYKFQ HSTAGXX YNYRKGY XXXXNKF FLHRKNR RIpSSGLN AKVLMLY XXXLTFK XXXXXRR IAHNVHW GHQMRVY RYVMNNY XXXpTMRX XXXXXYA

a M, L-norleucine; X, could not determine amino acid identity; XRCC1 BRCT2 was screened against the X7 OBOC pS/pT peptide library.


Table 5.28 Peptides from ANKRD32 BRCT2 domain screens (C-terminal library)a

Group I (43 Sequences)          Group II (19 Sequences)  Group III (10 Sequences)

Peptide   SM    Peptide   SM    Peptide   SM             Peptide
PFpSHHW   2.02  XXXYMW    1.11  AYpSYYH   3.49           XCpTFQH
XXXYHW    1.91  XXpTMHM   1.07  YEpSYYH   3.38           FFpTFRH
XXXYHW    1.91  XXpTFEW   1.02  FYpSLFH   3.28           WFpTRRH
XXXYHW    1.91  XXpSHMW   1.02  XYpSFFH   3.13           XFpSHIH
XXpSMHW   1.73  GEpSFFF   1.01  XYpSCFH   2.92           YFpSFHR
XXpSDHW   1.73  XXpSLEW   1.00  ILpTYFW   2.65           WVpTHHH
XXpSIHW   1.71  FIpSHWY   0.98  XLpTFMH   2.60           IYpSWRH
FEpTVHY   1.69  XLpTCIW   0.97  FFpTIYH   2.41           XXpSINH
DNpSYHY*  1.68  MYpTYTY   0.96  MYpTYTY   2.37           XXpSDRH
HVpSNHF   1.58  XXpSHPW   0.93  XXXFFH    2.27           XXpSFHR
XHpTLHY   1.55  XHpSEWF   0.92  YEpTYTF   2.25
LDpSYHI   1.54  VIpSEWY   0.85  GEpSFFF   1.98
XXpSHHF   1.53  XXpTTVW   0.83  XHpSMMH   1.94
XXXCHF    1.44  XFpTREF   0.79  XXXYMW    1.72
XHpSFHI   1.43  XXXXXW    0.76  XXpSIIH   1.69
XXXWHF    1.42  XXXYMY    0.73  XXXYMH    1.46
YYpTEHV   1.42  XXXSWF    0.67  XLpTCIW   1.43
XXpTYHI   1.39  XWpTPFY   0.64  XXpTYWE   1.05
ILpTYFW   1.37  XXXDYF    0.56  XXpTTVW   0.71
XXpSHHI   1.29  XXXEFI    0.49
XYpSLHM   1.27  XXpTFEL   0.31
YEpTYTF   1.14

a C, L-α-aminobutyrate; M, L-norleucine; X, could not determine amino acid identity; SM, SMALI score of sequence. ANKRD32 BRCT2 was screened against the C-terminal OBOC pS/pT peptide library. Group I, X-X-(pS/pT)-X-X-ψ-COOH sequences; Group II, X-X-(pS/pT)-ψ-ψ-X-COOH sequences; Group III, sequences not within the group I or group II consensus pattern; ψ, hydrophobic/aromatic residues. *Sequence selected for resynthesis and binding analysis.

Table 5.29 Peptides from ANKRD32 BRCT2 domain screens (X7 library)a

MAAINPY ARNLFVF SpTpSLHFT MGpTTDYM IpTTYHHF RNAYTQV MQNAGRM VpTpSVLAV SIpTTMLI FYYGFWH YHHNMIS IGpSRLSH YHpSIIGN YEpTNMIM XXXXRLG XXIIIVF ISpSMLTK EIpTRLAA XXpTKGYI TAMVSPV pTSpSMFIS IDpTRKNR RHRNMKN

a M, L-norleucine; X, could not determine amino acid identity; ANKRD32 BRCT2 was screened against the X7 OBOC pS/pT peptide library.


Table 5.30 Summary of the sequence specificities of the tandem BRCT domains of BRCA1, MDC1, MCPH1, PTIP (BRCT 5-6), and WT Nibrin

BRCT Repeat         C-Terminal pS/pT Library      X7 pS/pT Library

BRCA1               X-X-pS-(RK)-(RK)-(FY)-COOH    X-X-pS-(HY)-(VT)-(FY)-X
MDC1                V-H-pS-(RK)-W-(YFW)-COOH      None
MCPH1               X-R-pS-X-W-(YWF)-COOH         None
PTIP (Repeat 5-6)   X-R-pS-(RK)-(RK)-(FW)-COOH    (RY)-R-pS-X-V-(FM)-H
                                                  pS-X-pS-(FL)-X-X-H
Nibrina             (DE)-I-pT-X-(MI)-(HR)-COOH    (DE)-(DE)-pT-X-I-X-X

a Abbreviations: X, no significant selectivity; M, L-norleucine. Nibrin contains an FHA-BRCT2 triple repeat domain.

Table 5.31 Summary of the sequence specificities of the tandem BRCT domains of ANKRD32, BARD1, TopBP1, and XRCC1

BRCT Repeat      Group I                        Group II

ANKRD32          X-X-(pS/pT)-X-X-ψ-COOH         X-X-(pS/pT)-ψ-ψ-X-COOH
BARD1            X-X-(pS/pT)-X-X-ψ-COOH         X-X-(pS/pT)-X-ψ-H-COOH
XRCC1            X-X-(pS/pT)-X-X-ψ-COOH         X-X-(pS/pT)-ψ-ψ-X-COOH
TopBP1 (1-2)a    X-X-(pS/pT)-X-(HRK)-ψ-COOH     X-X-(pS/pT)-X-ψ-(HRK)-COOH
TopBP1 (4-5)b    X-X-(pS/pT)-X-X-ψ-COOH         X-X-(pS/pT)-X-ψ-(HRK)-COOH
TopBP1 (7-8)c    X-X-pS-X-ψ-ψ-COOH              X-X-(pS/pT)-X-ψ-(HRK)-COOH

Note: Group III contained any sequence not within Groups I or II. Abbreviations: X, no significant selectivity; ψ, hydrophobic/aromatic residues. a TopBP1 BRCT 1-2 contained a triple BRCT repeat structure (BRCT1-BRCT2-BRCT3); b TopBP1 BRCT 4-5; c TopBP1 BRCT 7-8.
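
The group definitions in Table 5.31 can be read as position-by-position rules, which the following minimal sketch applies to a C-terminal library hit. It is illustrative only and is not the procedure used to build the tables: the residue set taken as ψ (`PSI`), the helper names, and the treatment of undetermined residues (X never satisfies a specific position) are all assumptions, so the output will not reproduce every group assignment listed above.

```python
import re

# Assumption: residues treated as hydrophobic/aromatic (psi). The text does not
# enumerate them, so this set is illustrative only.
PSI = set("AFILMVWY")
BASIC = set("HRK")
TOKEN = re.compile(r"p[ST]|[A-Z]")  # keeps pS and pT as single residues

# Consensus patterns transcribed from Table 5.31, N-terminus to C-terminus.
CONSENSUS = {
    "ANKRD32":      {"I": "X X pS/pT X X psi",   "II": "X X pS/pT psi psi X"},
    "BARD1":        {"I": "X X pS/pT X X psi",   "II": "X X pS/pT X psi H"},
    "XRCC1":        {"I": "X X pS/pT X X psi",   "II": "X X pS/pT psi psi X"},
    "TopBP1 (1-2)": {"I": "X X pS/pT X HRK psi", "II": "X X pS/pT X psi HRK"},
    "TopBP1 (4-5)": {"I": "X X pS/pT X X psi",   "II": "X X pS/pT X psi HRK"},
    "TopBP1 (7-8)": {"I": "X X pS X psi psi",    "II": "X X pS/pT X psi HRK"},
}


def tokenize(peptide):
    """Split e.g. 'PHpSRRF' into ['P', 'H', 'pS', 'R', 'R', 'F']; drop a trailing '*'."""
    return TOKEN.findall(peptide.rstrip("*"))


def residue_fits(residue, symbol):
    """Check one residue against one consensus symbol."""
    if symbol == "X":
        return True
    if symbol == "pS/pT":
        return residue in ("pS", "pT")
    if symbol == "psi":
        return residue in PSI
    if symbol == "HRK":
        return residue in BASIC
    return residue == symbol  # literal residue such as 'pS' or 'H'


def classify(peptide, domain):
    """Return every group whose consensus the peptide satisfies, else ['III']."""
    residues = tokenize(peptide)
    groups = []
    for name, pattern in CONSENSUS[domain].items():
        symbols = pattern.split()
        if len(residues) == len(symbols) and all(
            residue_fits(r, s) for r, s in zip(residues, symbols)
        ):
            groups.append(name)
    return groups or ["III"]
```

A sequence may satisfy both consensus patterns; for example, `classify("DDpSIVF", "XRCC1")` returns both groups, consistent with that peptide appearing under Group I and Group II in Table 5.26.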


Table 5.32 Dissociation constants of selected peptides against the tandem BRCT domains of ANKRD32, BARD1, TopBP1, and XRCC1a

KD (μM)

Peptide   ANKRD32      BARD1   XRCC1   TopBP1 (1-2)   TopBP1 (4-5)   TopBP1 (7-8)
1         ND           > 50    NA      ND             NA             > 40
2         14.2 ± 1.4   > 50    NA      > 75           NA             ND
3         > 25         > 50    NA      > 75           NA             46 ± 17
4         NA           NA      NA      NA             NA             NA

a Peptide 1, FITC-BBKLpSQVY-COOH; Peptide 2, FITC-BBDNpSYHY-COOH; Peptide 3, FITC-BBFRpSLMH-COOH; Peptide 4, FITC-BBRRpSKQR-COOH. Peptides 1-4 were selected from screens of TopBP1 BRCT 7-8 (peptides 1 and 3), ANKRD32 BRCT2 (peptide 2), and XRCC1 (peptide 4) against the C-terminal pS/pT library and tested for binding by fluorescence polarization assay because they conformed to group I (peptides 1 and 2), group II (peptide 3), and group III (peptide 4). BARD1 BRCT2 and TopBP1 BRCT 7-8 contained N-terminal GST tags, while the rest of the BRCT repeats presented ybbR13 and (His)6 tags at their C-termini. FITC, fluorescein isothiocyanate; B, beta-alanine (spacer); M, L-norleucine; ND, peptide binding affinity not determined; NA, no significant binding affinity.


Table 5.33 Dissociation constants of selected peptides against the tandem BRCT domains of PTIP (BRCT 5-6) and the FHA-BRCT2 domains of Nibrina

                                        Dissociation Constant (KD, µM)
Peptide Sequence                        PTIP BRCT       Wild-Type Nibrin   R28A Nibrin
                                        (Repeat 5-6)    FHA-BRCT2          FHA-BRCT2

(1) Ac-ApSEpSFHAFNK-FITC                1.7 ± 0.2       ND                 ND
(2) Ac-ApSEEFHAFNK-FITC                 13.0 ± 2.0      ND                 ND
(3) Ac-AEEpSFHAFNK-FITC                 NA              ND                 ND
(4) FITC-BBApSEpSF-COOH                 4.3 ± 0.5       ND                 ND
(5) FITC-BBQApSQEY-COOH                 14.5 ± 1.9      ND                 ND
(6)* FITC-BBDMHpSSpSLTVE-NH2            12.7 ± 1.0      ND                 ND
(7)* FITC-BBDVMpSEpSMVET-NH2            17.3 ± 1.9      ND                 ND
(8) FITC-BBDEpTHLKI-NH2                 ND              9.7 ± 0.9          NA
(9) FITC-BBpSEpTHLKI-NH2                ND              7.2 ± 1.6          NA

a Peptides 1-3 were amidated at the C-terminus and labeled with FITC on the lysine side chain. The N-terminus was capped with acetic anhydride (Ac = acetyl group). Peptides 4-9 were labeled with FITC on the N-terminal B residue (B = β-alanine); peptides 4 and 5 contained a free C-terminus, while peptides 6-9 were amidated at the C-terminus. ND = peptide binding affinity not determined; NA = no significant binding affinity. Dissociation constants were determined by fluorescence polarization assays, where the polarization values were measured in triplicate to determine the KD values. *For peptides 6 and 7, M = L-norleucine. Both protein domains presented ybbR13 and (His)6 tags at their C-termini.


Figure 5.1 Scheme showing the structure of tandem BRCT domains cloned into a pET22-ybbR13 vector

[Schematic: BRCT1 - BRCT2 - DSLEFIASKLA (ybbR13 tag) - AALEHHHHHH-COOH (histidine affinity tag)]


Figure 5.2 Synthesis scheme of the tandem BRCT one-bead-one-compound peptide libraries

(A) C-terminal pS/pT Library Synthesis: (1) standard Fmoc/HBTU chemistry, (2) soak in water overnight then 0.5 equiv Nα-Boc-Glu(δ-N-hydroxysuccinimidyl)-O-CH2CH=CH2, (3) Fmoc-Gly-OH/HBTU, (4) TFA, (5) HMPA/HBTU, (6) 20% piperidine in DMF, (7) Fmoc-Arg(Pbf)-OH/HBTU, (8) 20% piperidine in DMF, (9) Fmoc-AA-OH/DIC for 1st random position, HBTU/HATU coupling for remaining random positions, (10) Pd(PPh3)4, (11) 20% piperidine in DMF, (12) PyBOP and HOBt, (13) modified Reagent K. B, β-alanine; X, random residues (X4 is either pS or pT, while the rest of the random positions are 20 different amino acids).

(B) X7 pS/pT Library Synthesis: (1) soak in water overnight then 0.45 equiv Boc-Met-OSu and 0.05 equiv Fmoc-Met-OSu plus 0.5 equiv DIPEA in 55:45 (v/v) DCM/diethyl ether, (2) Fmoc-Met-OH/HBTU, (3) TFA, (4) Ac2O/NMM/DMAP, (5) standard Fmoc-AA-OH/HBTU or Fmoc-AA-OH/HATU chemistry, (6) Alloc-OSu, (7) modified Reagent K. X5 is 25% pS, 25% pT, and the remaining 50% is one of 19 different amino acids. The rest of the random positions are 1% pS, 1% pT, and 98% one of the 19 other amino acids.
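
The X7 library composition stated in the caption implies that a sizeable share of library members carry no phosphoresidue at all. A minimal arithmetic sketch follows; it assumes that the seven random positions are filled independently at exactly the stated percentages, and the function names are hypothetical.

```python
# Composition of the X7 library as stated in the Figure 5.2 caption.
P_PHOSPHO_X5 = 0.25 + 0.25     # pS + pT at position X5
P_PHOSPHO_OTHER = 0.01 + 0.01  # pS + pT at each of the other random positions
N_OTHER = 6                    # seven random positions minus X5


def fraction_without_phospho():
    """Probability that a library member contains neither pS nor pT
    (assumes independent positions)."""
    return (1 - P_PHOSPHO_X5) * (1 - P_PHOSPHO_OTHER) ** N_OTHER


def expected_phospho_count():
    """Expected number of pS/pT residues per library member."""
    return P_PHOSPHO_X5 + N_OTHER * P_PHOSPHO_OTHER


print(f"no pS/pT: {fraction_without_phospho():.3f}")
print(f"mean pS/pT per peptide: {expected_phospho_count():.2f}")
```

Under these assumptions roughly 44% of the X7 library members are unphosphorylated (0.5 × 0.98^6), and the average member carries about 0.62 phosphoresidues.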


Figure 5.3 Representative fluorescence polarization binding curve for the BRCT domainsa

a PTIP BRCT 5-6 binds FITC-labeled Ac-ApSEpSFHAFNK (FITC added to the side chain of lysine; Ac = acetyl group; peptide is amidated at the C-terminus; PTIP BRCT 5-6 contains C-terminal ybbR13 and (His)6 tags).

Figure 5.4 Sequence specificity of BRCA1 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).
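
The bar heights described in this and the following captions are per-position percent abundances. As a hedged illustration of that calculation (not the script used to draw the figures), the sketch below tallies residues at each position of a set of equal-length hits. The function name is hypothetical, and skipping undetermined residues (X) when computing the denominator is an assumption.

```python
import re
from collections import Counter

TOKEN = re.compile(r"p[ST]|[A-Z]")  # keeps pS and pT as single residues


def percent_abundance(peptides, skip_undetermined=True):
    """Return {position: {residue: percent}} for equal-length peptides.

    Positions are numbered as in the captions: the C-terminal random
    position is 1 and the number increases toward the N-terminus.
    """
    if not peptides:
        return {}
    tokenized = [TOKEN.findall(p.rstrip("*")) for p in peptides]
    length = len(tokenized[0])
    if any(len(t) != length for t in tokenized):
        raise ValueError("all peptides must have the same number of residues")
    result = {}
    for i in range(length):
        position = length - i
        counts = Counter(
            t[i] for t in tokenized if not (skip_undetermined and t[i] == "X")
        )
        total = sum(counts.values())
        result[position] = {
            res: 100.0 * n / total for res, n in counts.items()
        } if total else {}
    return result


# Three Group II hits from Table 5.28, used only as example input.
profile = percent_abundance(["AYpSYYH", "YEpSYYH", "FYpSLFH"])
print(profile[4])  # every hit carries pS at position 4
```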


Figure 5.5 Sequence specificity of BRCA1 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.6 Sequence specificity of MDC1 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.7 Sequence specificity of MDC1 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).

Figure 5.8 Sequence specificity of MCPH1 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.9 Sequence specificity of MCPH1 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).

Figure 5.10 Sequence specificity of PTIP BRCT 3-4 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.11 Sequence specificity of PTIP BRCT 5-6 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.12 Sequence specificity of PTIP BRCT 5-6 to the X7 library (group I sequences)a

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement). Group I sequences conformed to the (R/Y)-R-pS-X-V-(F/M)-H consensus.


Figure 5.13 Sequence specificity of PTIP BRCT 5-6 to the X7 library (group II sequences)a

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement). Group II sequences conformed to the pS-X-pS-(F/L)-X-X-H consensus.

Figure 5.14 Sequence specificity of WT Nibrin FHA-BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).


Figure 5.15 Sequence specificity of WT Nibrin FHA-BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.16 Sequence specificity of R28A Nibrin FHA-BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).


Figure 5.17 Sequence specificity of R28A Nibrin FHA-BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.18 Sequence specificity of ANKRD32 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).


Figure 5.19 Sequence specificity of BARD1 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).

Figure 5.20 Sequence specificity of TopBP1 BRCT 1-2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).


Figure 5.21 Sequence specificity of TopBP1 BRCT 4-5 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).

Figure 5.22 Sequence specificity of XRCC1 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).


Figure 5.23 Sequence specificity of ANKRD32 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.24 Sequence specificity of BARD1 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.25 Sequence specificity of TopBP1 BRCT 1-2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.26 Sequence specificity of TopBP1 BRCT 4-5 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.27 Sequence specificity of TopBP1 BRCT 7-8 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.28 Sequence specificity of XRCC1 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.29 Sequence specificity of DNA Ligase IV BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).

Figure 5.30 Sequence specificity of TP53BP1 BRCT2 to the X7 librarya

a The amino acid selectivity is presented from the N-terminal position (7) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; M = L-norleucine (methionine replacement).


Figure 5.31 Sequence specificity of DNA Ligase IV BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.32 Sequence specificity of ECT2 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).


Figure 5.33 Sequence specificity of TP53BP1 BRCT2 to the C-terminal librarya

a The amino acid selectivity is presented from the N-terminal position (6) to the C-terminal position (1). The height of each bar corresponds to the percent abundance of a particular amino acid selected at a certain position relative to all other amino acids selected at that random position. pS = phosphoserine; pT = phosphothreonine; U = L-α-aminobutyrate (cysteine replacement); M = L-norleucine (methionine replacement).

Figure 5.34 Peptide pull-down of PTIP BRCT 5-6 by various phosphopeptidesa

a Cell lysate from HEK293 cells transfected with PTIP (residues 313-1069) was passed through micro-columns containing streptavidin-agarose resin with (1) no peptide, (2) biotin-(minipeg)2-BBQApSQEY-NH2, (3) biotin-(minipeg)2-BBQApSQEY-COOH, (4) biotin-(minipeg)2-YBBApSEpSFHAFNK-NH2, and (5) biotin-(minipeg)2-YBBDMHpSSpSLTVE-NH2. Captured protein was eluted off the resin and subjected to SDS-PAGE and Western blot analysis. Minipeg, 8-amino-3,6-dioxaoctanoic acid (hydrophilic spacer); B, beta-alanine (spacer); M, L-norleucine (methionine mimic); pS, phosphoserine. Tyrosine residues were added to peptides 4-5 to quantify peptide concentration by their absorbance at 280 nm.


BIBLIOGRAPHY

1. Pawson, T. and Nash, P. (2003) Assembly of cell regulatory systems through protein interaction domains. Science 300, 445-452.

2. Winder, S., et al. (2002) Normalization of nomenclature for peptide motifs as ligands of modular protein domains. FEBS Lett. 513, 141-144.

3. Chai, J., Du, C., Wu, J.-W., Kyin, S., Wang, X., and Shi, Y. (2000) Structural and biochemical basis of apoptotic activation by Smac/DIABLO. Nature 406, 855-862.

4. Liu, Z., Sun, C., Olejniczak, E. T., Meadows, R. P., Betz, S. F., Oost, T., Herrmann, J., Wu, J. C., and Fesik, S. W. (2000) Structural basis for binding of Smac/DIABLO to the XIAP BIR3 domain. Nature 408, 1004-1008.

5. Ivarsson, Y. (2012) Plasticity of PDZ domains in ligand recognition and signaling. FEBS Lett. 586, 2638-2647.

6. Nourry, C., Grant, S.G.N., and Borg, J.P. (2003) PDZ domain proteins: plug and play! Sci. STKE 179, re7.

7. Harris, B.Z. and Lim, W.A. (2001) Mechanism and role of PDZ domains in signaling complex assembly. J. Cell Sci. 114, 3219-3231.

8. Harrison, S.C. (1996) Peptide-surface association: the case of PDZ and PTB domains. Cell 86, 341-343.

9. Tonikian, R., et al. (2008) A specificity map for the PDZ domain family. PLOS Biol. 6, e239.

10. Møller, T.C., Wirth, V.F., Roberts, N., Bender, J., Bach, A., Jacky, B., Strømgaard, K., Deussing, J., Schwartz, T., and Martinez, K. (2013) PDZ domain-mediated interactions of G protein-coupled receptors with Postsynaptic Density Protein 95: quantitative characterization of interactions. PLOS ONE 8, e63352.

11. Xu, X.Z., Choudhury, A., Li, X., and Montell, C. (1998) Coordination of an array of signaling proteins through homo- and heteromeric interactions between PDZ domains and target proteins. J. Cell Biol. 142, 545-555.

12. Maudsley, S., Zamah, A., Rahman, N., Blitzer, J., Luttrell, L., Lefkowitz, R., and Hall, R. (2000) Platelet-derived growth factor receptor association with Na+/H+ exchanger regulatory factor potentiates receptor activity. Mol. Cell Biol. 20, 8352-8363.

13. Kaech, S., Whitfield, C., and Kim, S. (1998) The LIN-2/LIN-7/LIN-10 complex mediates basolateral localization of the C. elegans EGF receptor LET-23 in vulval epithelial cells. Cell 94, 761-771.

14. Gujral, T., Karp, E., Chan, M., Chang, B., and MacBeath, G. (2013) Family-wide investigation of PDZ domain-mediated protein-protein interactions implicates β-Catenin in maintaining the integrity of tight junctions. Chemistry & Biology 20, 816-827.

15. Poliak, S., Matlis, S., Ullmer, C., Scherer, S., and Peles, E. (2002) Distinct claudins and associated PDZ proteins form different autotypic tight junctions in myelinating Schwann cells. J. Cell Biol. 159, 361-371.

16. Nagafuchi, A. (2001) Molecular architecture of adherens junctions. Curr. Opin. Cell. Biol. 13, 600-603.

17. Kim, E. and Sheng, M. (2004) PDZ domain proteins of synapses. Nat. Rev. Neurosci. 5, 771-781.

18. Hall, R.A. et al. (1998) The beta2-adrenergic receptor interacts with the Na+/H+-exchanger regulatory factor to control Na+/H+ exchange. Nature 392, 626-630.

19. Cao, T., Deacon, H., Reczek, D., Bretscher, A., and Zastrow, M. (1999) A kinase-regulated PDZ-domain interaction controls endocytic sorting of the β2-adrenergic receptor. Nature 401, 286-290.

20. Fanning, A. and Anderson, J. (1999) PDZ domains: fundamental building blocks in the organization of protein complexes at the plasma membrane. J. Clin. Invest. 103, 767-772.

21. Wade, J.B., Liu, J., Coleman, R.A., Cunningham, R., Steplock, D.A., Lee-Kwon, W., Pallone, T.L., Shenolikar, S., and Weinman, E.J. (2003) Localization and interaction of NHERF isoforms in the renal proximal tubule of the mouse. Am. J. Physiol. Cell Physiol. 285, C1494-C1503.

22. Hillier, B.J., Christopherson, K.S., Prehoda, K.E., Bredt, D.S., and Lim, W.A. (1999) Unexpected modes of PDZ domain scaffolding revealed by structure of nNOS-syntrophin complex. Science 284, 812-815.

23. London, T.B., Lee, H.J., Shao, Y., and Zheng, J. (2004) Interaction between the internal motif KTXXXI of Idax and mDvl PDZ domain. Biochem. Biophys. Res. Commun. 322, 326-332.

24. Lenfant, N., Polanowska, J., Bamps, S., Omi, S., Borg, J.P., and Reboul, J. (2010) A genome-wide study of PDZ-domain interactions in C. elegans reveals a high frequency of non-canonical binding. BMC Genomics 11, 671.

25. Liu, W., Wen, W., Wei, Z., Yu, J., Ye, F., Liu, C.H., Hardie, R.C., and Zhang, M. (2011) The INAD scaffold is a dynamic, redox-regulated modulator of signaling in the Drosophila eye. Cell 145, 1088-1101.

26. Penkert, R., DiVittorio, M., and Prehoda, K. (2004) Internal recognition through PDZ domain plasticity in the Par-6-Pals1 complex. Nat. Struct. Mol. Biol. 11, 1122-7.

27. Zimmermann, P., Meerschaert, K., Reekmans, G., Leenaerts, I., Small, J., Vandekerckhove, J., David, G., Gettemans, J. (2002) PIP2-PDZ domain binding controls the association of syntenin with the plasma membrane. Mol. Cell 9, 1215-1225.

28. Ivarsson, Y., et al. (2013) Prevalence, specificity and determinants of lipid-interacting PDZ domains from an in-cell screen and in vitro binding experiments. PLOS ONE 8, e54581.

29. Ciechanover, A. (1998) The ubiquitin-proteasome pathway: on protein death and cell life. EMBO J. 17, 7151-7160.

30. Hoege, C., Pfander, B., Moldovan, G., Pyrowolakis, G., and Jentsch, S. (2002) Rad6-dependent DNA repair is linked to modification of PCNA by ubiquitin and SUMO. Nature 419, 135-141.

31. Stelter, P. and Ulrich, H. (2003) Control of spontaneous and damage-induced mutagenesis by SUMO and ubiquitin conjugation. Nature 425, 188-191.

32. Kaiser, P., Flick, K., Wittenberg, C., and Reed, S. (2000) Regulation of transcription by ubiquitination without proteolysis: Cdc34/SCFMet30-mediated inactivation of the transcription factor Met4. Cell 102, 303-314.

33. Conaway, R. (2002) Emerging roles of ubiquitin in transcriptional regulation. Science 296, 1254-1258.

34. Sun, L. and Chen, Z. (2004) The novel functions of ubiquitination in signaling. Curr. Opin. Cell Biol. 16, 119-126.

35. Wilkinson, K. (2003) Signal transduction: aspirin, ubiquitin, and cancer. Nature 424, 738-739.

36. Welchman, R., Gordon, C., and Mayer, R. (2005) Ubiquitin and ubiquitin-like proteins as multifunctional signals. Nat. Rev. Mol. Cell Biol. 6, 599-609.

37. Haglund, K., et al. (2003) Multiple monoubiquitylation of RTKs is sufficient for their endocytosis and degradation. Nat. Cell Biol. 5, 461-466.

38. Terrell, J., Shih, S., Dunn, R., and Hicke, L. (1998) A function for monoubiquitination in the internalization of a G protein-coupled receptor. Mol. Cell 1, 193-202.

39. Seigneurin-Berny, D. et al. (2001) Identification of components of the murine histone deacetylase 6 complex: link between acetylation and ubiquitination signaling pathways. Mol. Cell Biol. 21, 8035-8044.

40. Hicke, L., Schubert, H., and Hill, C. (2005) Ubiquitin-binding domains. Nat. Rev. Mol. Cell Biol. 6, 610-621.

41. Pai, M., Tzeng, S., Kovacs, J., Keaton, M., Li, S., Yao, T., and Zhou, P. (2007) Solution structure of the Ubp-M BUZ domain, a highly specific protein module that recognizes the C-terminal tail of free ubiquitin. J. Mol. Biol. 370, 290-302.

42. Reyes-Turcu, F., Horton, J., Mullally, J., Heroux, A., Cheng, X., and Wilkinson, K. (2006) The ubiquitin binding domain Znf UBP recognizes the C-terminal diglycine motif of unanchored ubiquitin. Cell 124, 1197-1208.

43. Allen, M.D. and Bycroft, M. (2007) The solution structure of the Znf UBP domain of USP33/VDU1. Protein Sci. 16, 2072-2075.

44. Bonnet, J., Romier, C., Tora, L., and Devys, D. (2008) Zinc-finger UBPs: regulators of deubiquitylation. Trends Biochem. Sci. 33, 369-375.

45. Nijman, S., et al. (2005) A genomic and functional inventory of deubiquitinating enzymes. Cell 123, 773-786.

46. Matheny, S.A., et al. (2004) Ras regulates assembly of mitogenic signaling complexes through the effector protein IMP. Nature 427, 256-260.

47. Hook, S.S., Orian, A., Cowley, S., and Eisenman, R. (2002) Histone Deacetylase 6 binds polyubiquitin through its Zinc Finger (PAZ) domain and co-purifies with deubiquitinating enzymes. PNAS 99, 13425-13430.

48. Boyault, C., et al. (2006) HDAC6-p97/VCP controlled polyubiquitin chain turnover. EMBO J. 25, 3357-3366.

49. Hard, R.L., Liu, J., Shen, J., Zhou, P., and Pei, D. (2010) HDAC6 and Ubp-M BUZ domains recognize specific C-terminal sequences of proteins. Biochemistry 49, 10737-10746.

50. Ouyang, H., et al. (2012) Protein aggregates are recruited to aggresome by Histone Deacetylase 6 via unanchored ubiquitin C-termini. J. Biol. Chem. 287, 2317-2327.

51. Dang, L.C., Stein, R., and Merli, F. (1998) Kinetic and mechanistic studies on the hydrolysis of Ubiquitin C-terminal 7-amido-4-methylcoumarin by deubiquitinating enzymes. Biochemistry 37, 1868-1879.

52. Stein, R.L., Chen, Z., and Melandri, F. (1995) Kinetic studies of Isopeptidase T: modulation of peptidase activity by Ubiquitin. Biochemistry 34, 12616-12623.

53. Nicassio, F., et al. (2007) Human USP3 is a chromatin modifier required for S phase progression and genome stability. Curr. Biol. 17, 1972-1977.

54. Joo, H.Y., et al. (2007) Regulation of cell cycle progression and gene expression by H2A deubiquitination. Nature 449, 1068-1072.

55. Finley, D. and Chau, V. (1991) Ubiquitination. Annu. Rev. Cell Biol. 7, 25-69.

56. Dompierre, J., Godin, J., Charrin, B., Cordelières, F., King, S., Humbert, S., and Saudou, F. (2007) Histone Deacetylase 6 inhibition compensates for the transport deficit in Huntington’s disease by increasing Tubulin acetylation. J. Neurosci. 27, 3571-3583.

57. Kawaguchi, Y., Kovacs, J., McLaurin, A., Vance, J., Ito, A., and Yao, T. (2003) The deacetylase HDAC6 regulates aggresome formation and cell viability in response to misfolded protein stress. Cell 115, 727-38.

58. Liu, B. and Nash, P. (2012) Evolution of SH2 domains and phosphotyrosine signaling networks. Phil. Trans. R. Soc. B. 367, 2556-2573.

59. Liu, X. and Ye, K. (2005) Src homology domains in phospholipase C-γ1 mediate its anti-apoptotic action through regulating the enzymatic activity. J. Neurochem. 93, 892-898.

60. Ravichandran, K., Lee, K., Songyang, Z., Cantley, L., Burn, P., and Burakoff, S. (1993) Interaction of Shc with the zeta chain of the T-cell receptor upon T-cell activation. Science 262, 902-905.

61. Liu, B., Jablonowski, K., Raina, M., Arcé, M., Pawson, T., and Nash, P. (2006) The human and mouse complement of SH2 domain proteins – establishing the boundaries of phosphotyrosine signaling. Mol. Cell 22, 851-868.

62. Waksman, G., Kumaran, S., Lubman, O. (2004) SH2 domains: role, structure, and implications for molecular medicine. Expert Rev. Mol. Med. 6, 1-18.

63. Waksman, G., et al. (1992) Crystal structure of the phosphotyrosine recognition domain SH2 of v-src complexed with tyrosine-phosphorylated peptides. Nature 358, 646-653.

64. Kaneko, T., et al. (2010) Loops govern SH2 domain specificity by controlling access to binding pockets. Sci. Signal. 3, ra34.

65. Huang, H., Li, L., Wu, C., Schibli, D., Colwill, K., Ma, S., Li, C., Roy, P., Ho, K., Songyang, Z., Pawson, T., Gao, Y., and Li, S.S. (2008) Defining the specificity space of the human SRC homology 2 domain. Mol. Cell. Proteomics 7, 768-784.

66. Chen, C., Martin, V., Gorenstein, N., Geahlen, R., and Post, C. (2011) Two closely spaced tyrosines regulate NFAT signaling in B cells via Syk association with Vav. Mol. Cell. Biol. 31, 2984-2996.

67. Donaldson, L., Gish, G., Pawson, T., Kay, L.E., and Forman-Kay, J. (2002) Structure of a regulatory complex involving the Abl SH3 domain, the Crk SH2 domain, and a Crk-derived phosphopeptide. PNAS 99, 14053-14058.

68. Yoh, S., Cho, H., Pickle, L., et al. (2007) The Spt6 SH2 domain binds Ser2-P RNAPII to direct Iws1-dependent mRNA splicing and export. Genes Dev. 21, 160-174.

69. Blasutig, I., et al. (2008) Phosphorylated YDXV motifs and Nck SH2/SH3 adaptors act cooperatively to induce actin reorganization. Mol. Cell. Biol. 28, 2035-2046.

70. Prasad, N., Werner, M., and Decker, S. (2009) Specific tyrosine phosphorylations mediate signal-dependent stimulation of SHIP2 inositol phosphatase activity, while the SH2 domain confers an inhibitory effect to maintain the basal activity. Biochemistry 48, 6285-6287.

71. Gonfloni, S., Weijland, A., Kretzschmar, J., and Superti-Furga, G. (2000) Crosstalk between the catalytic and regulatory domains allows bidirectional regulation of Src. Nat. Struct. Biol. 7, 281-286.

72. Filippakopoulos, P., et al. (2008) Structural coupling of SH2-kinase domains links Fes and Abl substrate recognition and kinase activation. Cell 134, 793-803.

73. Ottinger, E., Botfield, M., Shoelson, S. (1998) Tandem SH2 domains confer high specificity in tyrosine kinase signaling. J. Biol. Chem. 273, 729-735.

74. Woods, N.T., Mesquita, R.D., Sweet, M., Carvalho, M.A., Li, X., Liu, Y., Nguyen, H., Thomas, C.E., Iversen, E.S. Jr., Marsillac, S., Karchin, R., Koomen, J., and Monteiro, A.N. (2012) Charting the landscape of tandem BRCT domain-mediated protein interactions. Sci. Signal. 5, rs6.

75. Rappas, M., Oliver, A.W., and Pearl, L.H. (2011) Structure and function of the Rad9-binding region of the DNA-damage checkpoint adaptor TopBP1. Nucleic Acids Res. 39, 313-24.

76. Lloyd, J., Chapman, J.R., Clapperton, J.A., Haire, L.F., Hartsuiker, E., Li, J., Carr, A.M., Jackson, S.P., and Smerdon, S.J. (2009) A super-modular FHA/BRCT-repeat architecture mediates NBS1 adaptor function in response to DNA-damage. Cell 139, 100-111.

77. Koonin, E.V., Altschul, S.F., and Bork, P. (1996) BRCA1 protein products: functional motifs. Nat. Genet. 13, 266-268.

78. Callebaut, I., and Mornon, J.P. (1997) From BRCA1 to RAP1: a widespread BRCT module closely associated with DNA repair. FEBS Lett. 400, 25-30.

79. Bork, P., Hofmann, K., Bucher, P., Neuwald, A.F., Altschul, S.F., and Koonin, E.V. (1997) A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J. 11, 68-76.

80. Kobayashi, M., Figaroa, F., Meeuwenoord, N., Jansen, L.E., and Siegal, G. (2006) Characterization of the DNA binding and structural properties of the BRCT region of human replication factor C p140 subunit. J. Biol. Chem. 281, 4308-17.

81. Mesquita, R., Woods, N., Seabra-Junior, E., Monteiro, A. (2010) Tandem BRCT domains: DNA’s praetorian guard. Genes Cancer 1, 1140–1146.

82. Yu, X., Chini, C.C., He, M., Mer, G., and Chen, J. (2003) The BRCT domain is a phospho-protein binding domain. Science 302, 639-42.

83. Manke, I.A., Lowery, D.M., Nguyen, A., Yaffe, M. (2003) BRCT repeats as phosphopeptide-binding modules involved in protein targeting. Science 302, 636-639.

84. Botuyan, M.V., Nominé, Y., Yu, X., Juranic, N., Macura, S., Chen, J., and Mer, G. (2004) Structural basis of BACH1 phosphopeptide recognition by BRCA1 tandem BRCT domains. Structure 12, 1137-44.

85. Lee, M.S., Edwards, R.A., Thede, G.L., and Glover, J.N. (2005) Structure of the BRCT repeat domain of MDC1 and its specificity for the free COOH-terminal end of the Gamma-H2AX Histone tail. J. Biol. Chem. 280, 32053-6.

86. Shao, Z., Li, F., Sy, S., Yan, W., Zhang, Z., Gong, D., Wen, B., Huen, M., Gong, Q., Wu, J., and Shi, Y. (2012) Specific recognition of phosphorylated tail of H2AX by the tandem BRCT domains of MCPH1 revealed by complex structure. J. Struct. Biol. 177, 459-468.

87. Williams, R., Green, R., and Glover, J.N. (2001) Crystal structure of the BRCT repeat region from the breast cancer-associated protein BRCA1. Nat. Struct. Biol. 8, 838-842.

88. Derbyshire, D., Basu, B., Serpell, L., et al. (2002) Crystal structure of human 53BP1 BRCT domains bound to p53 tumour suppressor. EMBO J. 21, 3863-72.

89. Birrane, G., Varma, A., Soni, A., Ladias, J. (2007) Crystal structure of the BARD1 BRCT domains. Biochemistry 46, 7706-7712.

90. Yan, W., Shao, Z., Li, F., Niu, L., Shi, Y., Teng, M., Li, X. (2011) Structural basis of γH2AX recognition by human PTIP BRCT5-6 domains in the DNA damage response pathway. FEBS Lett. 585, 3874-3879.

91. Campbell, S.J., Edwards, R.A., Glover, J.N. (2010) Comparison of the structures and peptide binding specificities of the BRCT domains of MDC1 and BRCA1. Structure 18, 167-76.

92. Rodriguez, M., Yu, X., Chen, J., and Songyang, Z. (2003) Phosphopeptide binding specificities of BRCA1 COOH-Terminal (BRCT) domains. J. Biol. Chem. 278, 52914-52918.

93. Stucki, M., Clapperton, J.A., Mohammad, D., Yaffe, M.B., Smerdon, S.J., and Jackson, S.P. (2005) MDC1 directly binds phosphorylated histone H2AX to regulate cellular responses to DNA double-strand breaks. Cell 123, 1213-1226.

94. Singh, N., Basnet, H., Wiltshire, T.D., Mohammad, D.H., Thompson, J.R., Héroux, A., Botuyan, M.V., Yaffe, M.B., Couch, F.J., Rosenfeld, M.G., and Mer, G. (2012) Dual recognition of phosphoserine and phosphotyrosine in histone variant H2A.X by DNA damage response protein MCPH1. PNAS 109, 14381-86.

95. Leung, C., Gong, Z., Chen, J., and Glover, M. (2011) Molecular basis of BACH1/FANCJ recognition by TopBP1 in DNA replication checkpoint control. J. Biol. Chem. 286, 4292-4301.

96. Lloyd, J., Chapman, J.R., Clapperton, J.A., Haire, L.F., Hartsuiker, E., Li, J., Carr, A.M., Jackson, S.P., and Smerdon, S.J. (2009) A super-modular FHA/BRCT-repeat architecture mediates NBS1 adaptor function in response to DNA-damage. Cell 139, 100-111.

97. Hammet, A., et al. (2003) FHA domains as phospho-threonine binding modules in cell signaling. IUBMB Life 55, 23-27.

98. Leung, C. and Glover, M. (2011) BRCT domains: easy as one, two, three. Cell Cycle 10, 2461-2470.

99. Derbyshire, D.J., Basu, B.P., Serpell, L.C., Joo, W.S., Date, T., Iwabuchi, K., and Doherty, A.J. (2002) Crystal structure of human 53BP1 BRCT domains bound to p53 tumour suppressor. EMBO J. 21, 3863-3872.

100. Sibanda, B.L., Critchlow, S.E., Begun, J., Pei, X.Y., Jackson, S.P., Blundell, T.L., and Pellegrini, L. (2001) Crystal structure of an XRCC4-DNA Ligase IV complex. Nat. Struct. Biol. 8, 1015-19.

101. Sheng, Z., Zhao, Y., and Huang, J. (2011) Functional evolution of BRCT domains from binding DNA to protein. Evol. Bioinform. 7, 87-97.

102. Glover, M., Williams, R., and Lee, M. (2004) Interactions between BRCT repeats and phosphoproteins: tangled up in two. TIBS 29, 579-585.

103. Mohammad, D.H., and Yaffe, M.B. (2009) 14-3-3 proteins, FHA domains, and BRCT domains in the DNA damage response. DNA Repair (Amst.) 8, 1009-17.

104. Branzei, D. and Foiani, M. (2008) Regulation of DNA repair throughout the cell cycle. Nat. Rev. Mol. Cell Biol. 9, 297-308.

105. Cantor, S., et al. (2001) BACH1, a novel helicase-like protein, interacts directly with BRCA1 and contributes to its DNA repair function. Cell 105, 149-160.

106. Stewart, G., Wang, B., Bignell, C., Taylor, A., and Elledge, S. (2003) MDC1 is a mediator of the mammalian DNA damage checkpoint. Nature 421, 961-966.

107. Xie, A., et al. (2007) Distinct roles of chromatin-associated factors MDC1 and 53BP1 in mammalian double strand break repair. Mol. Cell 28, 1045-1057.

108. Lamarche, B., Orazio, N., and Weitzman, M. (2010) The MRN complex in double-strand break repair and telomere maintenance. FEBS Lett. 584, 3682-3695.

109. Wu, X., Mondal, G., Wang, X., Wu, J., Yang, L., Pankratz, V., Rowley, M., and Couch, F. (2009) Microcephalin regulates BRCA2 and RAD51-associated DNA double strand break repair. Cancer Res. 69, 5531-5536.

110. Kaufmann, W. and Paules, R. (1996) DNA damage and cell cycle checkpoints. FASEB J. 10, 238-247.

111. Yu, X. and Chen, J. (2004) DNA damage-induced cell cycle checkpoint control requires CtIP, a phosphorylation dependent binding partner of BRCA1 C-terminal domains. Mol. Cell. Biol. 24, 9478-9486.

112. Gong, Z., Kim, J.E., Yun, C.C., Leung, J.N., Glover, M., and Chen, J. (2010) BACH1/FANCJ acts with TopBP1 and participates early in DNA replication checkpoint control. Mol. Cell 37, 438-446.

113. Chai, Y., Cui, J., Shao, N., Shyam, E., Reddy, P., Rao, V. (1999) The second BRCT domain of BRCA1 proteins interacts with p53 and stimulates transcription from the p21WAF1/CIP1 promoter. Oncogene 18, 263-268.

114. Kim, J., Billadeau, D., and Chen, J. (2005) The tandem BRCT domains of ECT2 are required for both negative and positive regulation of ECT2 in cytokinesis. J. Biol. Chem. 280, 5733-9.

115. Wu, L., Luo, K., Lou, Z., Chen, J. (2008) MDC1 regulates intra-S-phase checkpoint by targeting NBS1 to DNA double-stranded breaks. PNAS 105, 11200-11205.

116. Masi, A., Gullotta, F., Cappadonna, V., Leboffe, L., and Ascenzi, P. (2011) Cancer predisposing mutations in BRCT domains. IUBMB Life 63, 503-512.

117. Cull, M., Miller, J., and Schatz, P. (1992) Screening for receptor ligands using large libraries of peptides linked to the C-terminus of the lac repressor. PNAS 89, 1865-1869.

118. Sidhu, S., Lowman, H., Cunningham, B., and Wells, J. (2000) Phage display for selection of novel binding peptides. Methods Enzymol. 328, 333-363.

119. Pei, D. and Wavreille, A. (2007) Reverse interactomics: decoding protein-protein interactions with combinatorial peptide libraries. Mol. Biosyst. 3, 536-541.

120. Lam, K., Liu, R., Miyamoto, S., Lehman, A., and Tuscano, J. (2003) Applications of one-bead-one-compound combinatorial libraries and chemical microarrays in signal transduction research. Acc. Chem. Res. 36, 370-377.

121. Fields, S. and Song, O. (1989) A novel genetic system to detect protein-protein interactions. Nature 340, 245-246.

122. Brückner, A., Polge, C., Lentze, N., Auerbach, D., and Schlattner, U. (2009) Yeast two-hybrid, a powerful tool for systems biology. Int. J. Mol. Sci. 10, 2763-2788.

123. Caufield, J., Sakhawalkar, N., and Uetz, P. (2012) A comparison and optimization of yeast two-hybrid systems. Methods 58, 317-324.

124. Lee, H., Ko, J., and Kim, E. (2006) Analysis of PDZ domain interactions using yeast two-hybrid and co-immunoprecipitation assays. Methods Mol. Biol. 332, 233-244.

125. Gisler, S., et al. (2008) Monitoring protein-protein interactions between the mammalian integral membrane transporters and PDZ-interacting partners using a modified split-ubiquitin membrane yeast two-hybrid system. Mol. Cell. Proteomics 7, 1362-1377.

126. Hook, S., Orian, A., Cowley, S., and Eisenman, R. (2002) Histone deacetylase 6 binds polyubiquitin through its zinc finger (PAZ domain) and copurifies with deubiquitinating enzymes. PNAS 99, 13425-13430.

127. Park, D. and Yun, Y. (2001) Tyrosine phosphorylation-dependent yeast two-hybrid system for the identification of the SH2 domain-binding proteins. Mol. Cells 12, 244-249.

128. Wang, B., Lemay, S., Tsai, S., and Veillette, A. (2001) SH2 domain-mediated interaction of inhibitory protein tyrosine kinase Csk with protein tyrosine phosphatase-HSCF. Mol. Cell. Biol. 21, 1077-1088.

129. Crouin, C., Arnaud, M., Gesbert, F., Camonis, J., and Bertoglio, J. (2001) A yeast two-hybrid study of human p97/Gab2 interactions with its SH2 domain-containing binding partners. FEBS Lett. 495, 148-153.

130. Choi, Y., Kim, C., and Yun, Y. (1999) Lad, an adaptor protein interacting with the SH2 domain of p56lck, is required for T cell activation. J. Immunol. 163, 5242-5249.

131. Dombrosky-Ferlan, P. and Corey, S. (1997) Yeast two-hybrid in vivo association of the Src kinase Lyn with the proto-oncogene product Cbl but not with the p85 subunit of PI 3-kinase. Oncogene 14, 2019-2024.

132. Keilhack, H., et al. (2001) Negative regulation of Ros receptor tyrosine kinase signaling: an epithelial function of the SH2 domain protein tyrosine phosphatase SHP-1. J. Cell Biol. 152, 325-334.

133. Riedel, H., Wang, J., Hansen, H., and Yousaf, Y. (1997) PSM, an insulin-dependent, pro-rich, PH, SH2 domain containing partner of the insulin receptor. J. Biochem. 122, 1105-1113.

134. Yu, X., Wu, L., Bowcock, A., Aronheim, A., and Baer, R. (1998) The C-terminal (BRCT) domains of BRCA1 interact in vivo with CtIP, a protein implicated in the CtBP pathway of transcriptional repression. J. Biol. Chem. 273, 25388-25392.

135. Li, S., Chen, P., Subramanian, T., Chinnadurai, G., Tomlinson, G., Osborne, G., Sharp, Z., and Lee, W. (1999) Binding of CtIP to the BRCT repeats of BRCA1 involved in the transcription regulation of p21 is disrupted upon DNA damage. J. Biol. Chem. 274, 11334-8.

136. Liu, Y., Woods, N., Kim, D., Sweet, M., Monteiro, A., and Karchin, R. (2011) Yeast two-hybrid junk sequences contain linear motifs. Nucleic Acids Res. 39, e128.

137. Mäkiniemi, M., et al. (2001) BRCT domain-containing protein TopBP1 functions in DNA replication and damage response. J. Biol. Chem. 276, 30399-30406.

138. Gilmore, P., McCabe, N., and Quinn, J. (2004) BRCA1 interacts with and is required for Paclitaxel-induced activation of mitogen-activated protein kinase kinase kinase 3. Cancer Res. 64, 4148-4154.

139. Aglipay, J., Martin, S., Hideuyuki, T., Lee, S., and Ouchi, T. (2006) ATM activation by ionizing radiation requires BRCA1-associated BAAT1. J. Biol. Chem. 281, 9710-9718.

140. Hunter, T. and Plowman, G. (1997) The protein kinases of budding yeast: six score and more. Trends Biochem. Sci. 22, 18-22.

141. Smith, G. (1985) Filamentous phage fusion: novel expression vectors that display cloned antigens on the surface of the virion. Science 228, 1315-1317.

142. Willats, W. (2002) Phage display: practicalities and prospects. Plant Mol. Biol. 50, 837-854.

143. Rodi, D., Soares, A., Makowski, L. (2002) Quantitative assessment of peptide sequence diversity in M13 combinatorial peptide phage display libraries. J. Mol. Biol. 322, 1039-1052.

144. Cwirla, S., Peters, E., Barrett, R., and Dower, W. (1990) Peptides on phage: A vast library of peptides for identifying ligands. PNAS 87, 6378-6382.

145. Gram, H., Schmitz, R., Zuber, J., and Baumann, G. (1997) Identification of phosphopeptide ligands for the Src-homology 2 (SH2) domain of Grb2 by phage display. Eur. J. Biochem. 246, 633-637.

146. Khati, M. and Pillay, T. (2003) Phosphotyrosine phosphoepitopes can be rapidly analyzed by coexpression of a tyrosine kinase in bacteria with a T7 bacteriophage display library. Anal. Biochem. 325, 164-167.

147. Cochrane, D., Webster, C., Masih, G., and McCafferty, J. (2000) Identification of natural ligands for SH2 domains from a phage display cDNA library. J. Mol. Biol. 297, 89-97.

148. Laura, P., Witt, A., Held, H., Gerstner, R., Deshayes, K., Koehler, M., Kosik, K., Sidhu, S., and Lasky, L. (2002) The Erbin PDZ domain binds with high affinity and specificity to the carboxyl termini of delta-catenin and ARVCF. J. Biol. Chem. 277, 12906-12914.

149. Sharma, S., Memic, A., Rupasinghe, C., Duc, A., and Spaller, M. (2009) T7 phage display as a method of peptide ligand discovery for PDZ domain proteins. Biopolymers 92, 183-93.

150. Cull, M., Miller, J., and Schatz, P. (1992) Screening for receptor ligands using large libraries of peptides linked to the C terminus of the lac repressor. PNAS 89, 1865-1869.

151. Strickler, N., Christopherson, K., Yi, B., Schatz, P., Raab, R., Dawes, G., Bassett, D., Bredt, D., and Li, M. (1997) PDZ domain of neuronal nitric oxide synthase recognizes novel C-terminal peptide sequences. Nat. Biotech. 15, 336-342.

152. Wang, S., Raab, R., Schatz, P., Guggino, W., and Li, M. (1998) Peptide binding consensus of the NHE-RF-PDZ1 domain matches the C-terminal sequence of cystic fibrosis transmembrane conductance regulator (CFTR). FEBS Lett. 427, 103-108.

153. Strickler, N., Schatz, P., and Li, M. (1999) Using the lac repressor system to identify interacting proteins. Methods Enzymol. 303, 451-468.

154. Miyawaki, A. and Tsien, R. (2000) Monitoring protein conformations and interactions by fluorescence energy transfer between mutants of green fluorescent protein. Methods Enzymol. 327, 472-500.

155. You, X., Nguyen, A., Jabaiah, A., Sheff, M., Thorn, K., and Daugherty, P. (2006) Intracellular protein interaction mapping with FRET hybrids. PNAS 103, 18458-18463.

156. Isono, E. and Schwechheimer, C. (2010) Co-immunoprecipitation and protein blots. Methods Mol. Biol. 655, 377-387.

157. Kool, J., Jonker, N., Irth, H., and Niessen, W. (2011) Studying protein–protein affinity and immobilized ligand–protein affinity interactions using MS-based methods. Anal. Bioanal. Chem. 401, 1109-1125.

158. Towbin, H., Staehelin, T., and Gordon, J. (1979) Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: Procedure and some applications. PNAS 76, 4350-4354.

159. Malmberg, E., Andersson, C., Gentzsch, M., Chen, J., Mengos, A., Cui, L., Hansson, G., and Riordan, J. (2004) Bcr (breakpoint cluster region) protein binds to PDZ domains of scaffold protein PDZK1 and vesicle coat protein Mint3. J. Cell Sci. 117, 5535-5541.

160. Zhou, Y., Fang, L., Du, D., Zhou, W., Feng, X., Chen, J., Zhang, Z., Chen, Z. (2008) Proteome identification of binding-partners interacting with cell polarity protein Par3 in Jurkat cells. Acta Biochim. Biophys. Sin. 40, 729-39.

161. Métais, J., Navarro, C., Santoni, M., Audebert, S., Borg, J. (2005) hScrib interacts with ZO-2 at the cell–cell junctions of epithelial cells. FEBS Lett. 579, 3725-3730.

162. Cai, H. and Reed, R. (1999) Cloning and characterization of Neuropilin-1-Interacting Protein: A PSD-95/Dlg/ZO-1 domain-containing protein that interacts with the cytoplasmic domain of Neuropilin-1. J. Neurosci. 19, 6519-6527.

163. Shima, T., Okumura, T., Takao, Y., Yagi, T., Okada, M., Nagai, K. (2001) Interaction of the SH2 domain of Fyn with a cytoskeletal protein, β-Adducin. J. Biol. Chem. 276, 42233-42240.

164. Lettau, M., et al. (2009) The adapter protein Nck: Role of individual SH3 and SH2 binding modules for protein interactions in T lymphocytes. Protein Sci. 19, 658-669.

165. Magnard, C., Bachelier, R., Vincent, A., Jaquinod, M., Kieffer, S., Lenoir, G., and Venezia, N. (2002) BRCA1 interacts with acetyl-CoA carboxylase through its tandem of BRCT domains. Oncogene 21, 6729-6739.

166. Songyang, Z., Shoelson, S., Chaudhuri, M., Gish, G., Pawson, T., Haser, W., King, F., Roberts, T., Ratnofsky, S., Lechleider, J., Neel, G., Birge, B., Fajardo, J., Chou, M., Hanafusa, H., Schaffhausen, B., and Cantley, L. (1993) SH2 domains recognize specific phosphopeptide sequences. Cell 72, 767-78.

167. Songyang, Z., Fanning, A., Fu, C., Xu, J., Marfatia, S., Chishti, A., Crompton, A., Chan, A., Anderson, J., and Cantley, L. (1997) Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. Science 275, 73-77.

168. Songyang, Z., Shoelson, S., McGlade, J., Olivier, P., Pawson, T., Bustelo, X., Barbacid, M., Sabe, H., Hanafusa, H., Yi, T., Ren, R., Baltimore, D., Ratnofsky, S., Feldman, R., and Cantley, L. (1994) Specific motifs recognized by the SH2 domains of Csk, 3BP2, fps/fes, GRB-2, HCP, SHC, Syk, and Vav. Mol. Cell. Biol. 14, 2777-85.

169. Frank, R. (1992) Spot synthesis: An easy technique for the positionally addressable, parallel chemical synthesis on a membrane support. Tetrahedron 48, 9217-9232.

170. Boisguerin, P., Leben, R., Ay, B., Radziwill, G., Moelling, K., Dong, L., and Volkmer-Engert, R. (2004) An improved method for the synthesis of cellulose membrane-bound peptides with free C termini is useful for PDZ domain binding studies. Chem. Biol. 11, 449-59.

171. Hoffmüller, U., Russwurm, M., Kleinjung, F., Ashurst, J., Oschkinat, H., Volkmer-Engert, R., Koesling, D., and Schneider-Mergener, J. (1999) Interaction of a PDZ protein domain with a synthetic library of all human protein C-termini. Angew. Chem., Int. Ed. 38, 2000-4.

172. Wiedemann, U., Boisguerin, P., Leben, R., Leitner, D., Krause, G., Moelling, K., Volkmer-Engert, R., and Oschkinat, H. (2004) Quantification of PDZ domain specificity, prediction of ligand affinity and rational design of super-binding peptides. J. Mol. Biol. 343, 703-18.

173. Liu, B., Jablonowski, K., Shah, E., Engelmann, B., Jones, R., and Nash, P. (2010) SH2 domains recognize contextual peptide sequence information to determine selectivity. Mol. Cell. Proteomics 9, 2391-2404.

174. Clapperton, J., Manke, I., Lowery, D., Ho, T., Haire, L., Yaffe, M., and Smerdon, S. (2004) Structure and mechanism of BRCA1 BRCT domain recognition of phosphorylated BACH1 with implications for cancer. Nat. Struct. Mol. Biol. 11, 512-518.

175. Rodriguez, M., Li, S., Harper, J., and Songyang, Z. (2004) An oriented peptide array library (OPAL) strategy to study protein-protein interactions. J. Biol. Chem. 279, 8802-8807.

176. Huang, H., Li, L., Wu, C., Schibli, D., Colwill, K., Ma, S., Li, C., Roy, P., Ho, K., Songyang, Z., Pawson, T., Gao, Y., and Li, S. (2008) Defining the specificity space of the human Src Homology 2 domain. Mol. Cell. Proteomics 7, 768-784.

177. MacBeath, G. and Schreiber, S. (2000) Printing proteins as microarrays for high-throughput function determination. Science 289, 1760-1763.

178. Stiffler, M., Grantcharova, V., Sevecka, M., and MacBeath, G. (2006) Uncovering quantitative protein interaction networks for mouse PDZ domains using protein microarrays. JACS 128, 5913-5922.

179. Stiffler, M., et al. (2007) PDZ domain binding selectivity is optimized across the mouse proteome. Science 317, 364-369.

180. Espejo, A., Côté, J., Bednarek, A., Richard, S., and Bedford, M. (2002) A protein-domain microarray identifies novel protein-protein interactions. Biochem. J. 367, 697-702.

181. Jones, R., Gordus, A., Krall, J., MacBeath, G. (2006) A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439, 168-174.

182. Lam, K., Salmon, S., Hersh, E., Hruby, V., Kazmierski, W., and Knapp, R. (1991) A new type of synthetic peptide library for identifying ligand-binding activity. Nature 354, 82-84.

183. Lam, K., Lebl, M., and Krchňák, V. (1997) The “one-bead-one-compound” combinatorial library method. Chem. Rev. 97, 411-448.

184. Amblard, M., Fehrentz, J., Martinez, J., and Subra, G. (2006) Methods and protocols of modern solid phase peptide synthesis. Mol. Biotech. 33, 239-254.

185. Liu, R., Marik, J., and Lam, K. (2002) A novel peptide-based encoding system for “one-bead one-compound” peptidomimetic and small molecule combinatorial libraries. JACS 124, 7678-7680.

186. Chen, X., Tan, P.H., Zhang, Y., and Pei, D. (2009) On-bead screening of combinatorial libraries: reduction of nonspecific binding by decreasing surface ligand density. J. Comb. Chem. 11, 604-611.

187. Joo, S. and Pei, D. (2008) Synthesis and screening of support-bound combinatorial peptide libraries with free C-termini: Determination of the sequence specificity of PDZ domains. Biochemistry 47, 3061-3072.

188. Furka, A., Sebestyen, F., Asgedom, M., Dibo, G. (1991) General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Peptide Protein Res. 37, 487-493.

189. Houghten, R., Pinilla, C., Blondelle, S., Appel, J., Dooley, C., and Cuervo, J. (1991) Generation and use of synthetic peptide combinatorial libraries for basic research and drug discovery. Nature 354, 84-86.

190. Biederman, K., Lee, H., Haney, C., Kaczmarek, M., and Buettner, J. (1999) Combinatorial peptide on resin analysis: optimization of static nanoelectrospray ionization technique for sequence determination. J. Pept. Res. 53, 234-243.

191. Wang, P., Arabaci, G., and Pei, D. (2001) Rapid sequencing of library-derived peptides by partial Edman degradation and mass spectrometry. J. Comb. Chem. 3, 251-254.

192. Sweeney, M. and Pei, D. (2003) An improved method for rapid sequencing of support-bound peptides by partial Edman degradation and mass spectrometry. J. Comb. Chem. 5, 218-222.

193. Thakkar, A., Wavreille, A., and Pei, D. (2006) A traceless capping agent for peptide sequencing by partial Edman degradation and mass spectrometry. Anal. Chem. 78, 5935-5939.

194. Sweeney, M., Wavreille, A., Park, J., Butchar, J., Tridandapani, S., and Pei, D. (2005) Decoding protein-protein interactions through combinatorial chemistry: Sequence specificity of SHP-1, SHP-2, and SHIP SH2 domains. Biochemistry 44, 14932-14947.

195. Qin, C., Wavreille, A., and Pei, D. (2005) Alternative mode of binding to phosphotyrosyl peptides by Src Homology-2 domains. Biochemistry 44, 12196-12202.

196. Imhof, D., Wavreille, A., May, A., Zacharias, M., Tridandapani, S., and Pei, D. (2006) Sequence specificity of SHP-1 and SHP-2 Src Homology 2 domains. J. Biol. Chem. 281, 20271-20282.

197. Zhang, Y., Zhou, S., Wavreille, A., DeWille, J., and Pei, D. (2008) Cyclic peptidyl inhibitors of Grb2 and Tensin SH2 domains identified from combinatorial libraries. J. Comb. Chem. 10, 247-255.

198. Zhang, Y., Wavreille, A., Kunys, A., and Pei, D. (2009) The SH2 Domains of Inositol Polyphosphate 5-phosphatases SHIP1 and SHIP2 have similar ligand specificity but different binding kinetics. Biochemistry 48, 11075-11083.

199. Zhao, B., Tan, P., Li, S., and Pei, D. (2013) Systematic characterization of the specificity of the SH2 domains of cytoplasmic tyrosine kinases. J. Proteomics 81, 56-69.

200. Hoedemaeker, F., Siegal, G., Roe, S., Driscoll, P., and Abrahams, J. (1999) Crystal structure of the C-terminal SH2 Domain of the p85a regulatory subunit of Phosphoinositide 3-Kinase: An SH2 domain mimicking its own substrate. J. Mol. Biol. 292, 763-770.

201. Spurkland, A., Brinchmann, J., Markussen, G., Pedeutour, F., Munthe, E., Lea, T., Vartdal, F. and Aasheim, H. (1998) Molecular cloning of a T cell specific adapter protein (TSAd) containing an Src homology (SH) 2 domain and putative SH3 and phosphotyrosine binding sites. J. Biol. Chem. 273, 4539-4546.

202. Sundvold, V., Torgersen, K., Post, N., Marti, F., King, P., et al. (2000) T cell specific adapter protein inhibits T cell activation by modulating Lck activity. J. Immunol. 165, 2927–2931.

203. Granum, S., Andersen, T., Sorlie, M., Jorgensen, M., Koll, L., Berge, T., Lea, T., Fleckenstein, B., Spurkland, A., and Gjerstad, V. (2008) Modulation of Lck function through multisite docking to T cell-Specific Adapter Protein. J Biol Chem 283, 21909–21919.

204. Berge, T., Gjerstad, V., Granum, S., Andersen, T., Holthe, G., Welsh, L., Andreotti, A., Inngjerdingen, M., and Spurkland, A. (2010) T Cell Specific Adapter Protein (TSAd) interacts with Tec kinase ITK to promote CXCL12 induced migration of human and murine T cells. PLoS One 5, e9761.

205. Park, D., Park, I., Lee, D., Choi, Y., Lee, H., et al. (2007) The adaptor protein Lad associates with the G protein beta subunit and mediates chemokine-dependent T-cell migration. Blood 109, 5122–5128.

206. Park, E., Choi, Y., Ahn, E., Park, I., and Yun, Y. (2009) The adaptor protein, LAD/TSAd mediates Laminin-dependent T cell migration via association with the 67 kDa Laminin Binding Protein (LBP). Exp. Mol. Med. 41, 728–736.

207. Matsumoto, T., Bohman, S., Dixelius, J., Berge, T., Dimberg, A., et al. (2005) VEGF receptor-2 Y951 signaling and a role for the adapter molecule TSAd in tumor angiogenesis. EMBO J. 24, 2342–2353.

208. Marti, F., Post, N., Chan, E., and King, P. (2001) A transcription function for the T Cell–Specific Adapter (TSAd) protein in T cells: Critical role of the TSAd Src Homology 2 domain. J. Exp. Med. 193, 1425-1430.

209. Katan, M. (1998) Families of phosphoinositide-specific phospholipase C: structure and function. Biochimica et Biophysica Acta 1436, 5-17.

210. Braiman, A., Barda-Saad, M., Sommers, C., and Samelson, L. (2006) Recruitment and activation of PLCγ1 in T cells: a new insight into old domains. EMBO J. 25, 774-784.

211. Poulin, B., Sekiya, F., and Rhee, S. (2005) Intramolecular interaction between phosphorylated tyrosine-783 and the c-terminal Src Homology 2 domain activates phospholipase C-gamma1. PNAS 102, 4276-4281.

212. Ebinu, J., Bottorff, D., Chan, E., Stang, S., Dunn, R., and Stone, J. (1998) RasGRP, a Ras guanyl nucleotide-releasing protein with calcium- and diacylglycerol-binding motifs. Science 280, 1082-1086.

213. Ebinu, J., Stang, S., Teixeira, C., Bottorff, D., Hooton, J., Blumberg, P., Barry, M., Bleakley, R., Ostergaard, H., Stone, J. (2000) RasGRP links T-cell receptor signaling to Ras. Blood 95, 3199-3203.

214. Weiss, A. and Littman, D. (1994) Signal transduction by lymphocyte antigen receptors. Cell 76, 263-274.

215. Wang, Y., Tomar, A., George, S., and Khurana, S. (2007) Obligatory role for phospholipase C-γ1 in villin-induced epithelial cell migration. Am. J. Physiol. Cell Physiol. 292, 1775-1786.

216. Rhee, S. and Bae, Y. (1997) Regulation of phosphoinositide-specific phospholipase C isozymes. J. Biol. Chem. 272, 15045-15048.

217. Rhee, S. and Choi, K. (1992) Regulation of Inositol Phospholipid-specific Phospholipase C isozymes. J. Biol. Chem. 267, 12393-12396.

218. Choi, J., Ryu, S., and Suh, P. (2007) On/off-regulation of phospholipase C-γ1-mediated signal transduction. Advanc. Enzyme Regul. 47, 104-116.

219. Wells, A. and Grandis, J. (2003) Phospholipase C-γ1 in tumor progression. Clin. Exp. Metastasis 20, 285–90.

220. Yamaguchi, H. and Condeelis, J. (2007) Regulation of the actin cytoskeleton in cancer cell migration and invasion. Biochim. Biophys. Acta 1773, 642–652.

221. Sala, G., et al. (2008) Phospholipase Cgamma1 is required for metastasis development and progression. Cancer Res. 68, 10187-96.

222. Chang, J., Noh, D., Park, I., Kim, M., Song, H., Ryu, S., et al. (1997) Overexpression of phospholipase C-γ1 in rat 3Y1 fibroblast cells leads to malignant transformation. Cancer Res. 57, 5465-8.

223. Pascal, S., Singer, A., Gish, G., Yamazaki, T., Shoelson, S., Pawson, T., Kay, L., and Kay, J. (1994) Nuclear magnetic resonance structure of an SH2 domain of Phospholipase C-γ1 complexed with a high affinity binding peptide. Cell 77, 461-472.

224. Li, L., Wu, C., Huang, H., Zhang, K., Gan, J., and Li, S. S. (2008) Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res. 36, 3263–3273.

225. Sun, et al. (2012) VEGFR2 induces c-Src signaling and vascular permeability in vivo via the adaptor protein TSAd. J. Exp. Med. 209, 1363-1377.

226. Sun, W., Wei, X., Kesavan, K., Garrington, T., Fan, R., Mei, J., Anderson, S., Gelfand, E., and Johnson, G. (2003) MEK Kinase 2 and the adaptor protein Lad regulate Extracellular Signal-Regulated Kinase 5 activation by Epidermal Growth Factor via Src. Mol. Cell. Biol. 23, 2298-2308.

227. DeBell, K., Graham, L., Reischl, I., Serrano, C., Bonvini, E., and Rellahan., B. (2007) Intramolecular regulation of Phospholipase C-γ1 by its C-terminal Src Homology 2 domain. Mol. Cell. Biol. 27, 854-863.

228. Chattopadhyay, A., Vecchi, M., Ji, Q., Mernaugh, R., and Carpenter, G. (1999) The role of individual SH2 domains in mediating association of Phospholipase C-γ1 with the activated EGF receptor. J. Biol. Chem. 274, 26091-26097.

229. Chiu, C. Y., Leng, S., Martin, K. A., Kim, E., Gorman, S., and Duhl, D. M. (1999) Cloning and characterization of T-cell lymphoma invasion and metastasis 2 (TIAM2), a novel guanine nucleotide exchange factor related to TIAM1. Genomics 61, 66–73.

230. Habets, G. G., van der Kammen, R. A., Stam, J. C., Michiels, F., and Collard, J. G. (1995) Sequence of the human invasion-inducing TIAM1 gene, its conservation in evolution and its expression in tumor cell lines of different tissue origin. Oncogene 10, 1371–1376.

231. Malliri, A., van Es, S., Huveneers, S., and Collard, J. G. (2004) The Rac exchange factor Tiam1 is required for the establishment and maintenance of cadherin-based adhesions. J. Biol. Chem. 279, 30092–30098.

232. Woodcock, S. A., Rooney, C., Liontos, M., Connolly, Y., Zoumpourlis, V., Whetton, A. D., Gorgoulis, V. G., and Malliri, A. (2009) SRC induced disassembly of adherens junctions requires localized phosphorylation and degradation of the rac activator tiam1. Mol. Cell 33, 639–653.

233. Mertens, A. E., Rygiel, T. P., Olivo, C., van der Kammen, R., and Collard, J. G. (2005) The Rac activator Tiam1 controls tight junction biogenesis in keratinocytes through binding to and activation of the Par polarity complex. J. Cell Biol. 170, 1029–1037.

234. Nishimura, T., Yamaguchi, T., Kato, K., Yoshizawa, M., Nabeshima, Y., Ohno, S., Hoshino, M., and Kaibuchi, K. (2005) PAR-6-PAR-3 mediates Cdc42-induced Rac activation through the Rac GEFs STEF/Tiam1. Nat. Cell Biol. 7, 270–277.

235. Sander, E. E., van Delft, S., ten Klooster, J. P., Reid, T., van der Kammen, R. A., Michiels, F., and Collard, J. G. (1998) Matrix dependent Tiam1/Rac signaling in epithelial cells promotes either cell-cell adhesion or cell migration and is regulated by phosphatidylinositol 3-kinase. J. Cell Biol. 143, 1385–1398.

236. Shepherd, T. R., Klaus, S. M., Liu, X., Ramaswamy, S., DeMali, K. A., and Fuentes, E. J. (2010) The Tiam1 PDZ domain couples to Syndecan1 and promotes cell-matrix adhesion. J. Mol. Biol. 398, 730–746.

237. Kunda, P., Paglini, G., Quiroga, S., Kosik, K., and Caceres, A. (2001) Evidence for the involvement of Tiam1 in axon formation. J. Neurosci. 21, 2361–2372.

238. Leeuwen, F. N., Kain, H. E., Kammen, R. A., Michiels, F., Kranenburg, O. W., and Collard, J. G. (1997) The guanine nucleotide exchange factor Tiam1 affects neuronal morphology; opposing roles for the small GTPases Rac and Rho. J. Cell Biol. 139, 797–807.

239. Tanaka, M., Ohashi, R., Nakamura, R., Shinmura, K., Kamo, T., Sakai, R., and Sugimura, H. (2004) Tiam1 mediates neurite outgrowth induced by ephrin-B1 and EphA2. EMBO J. 23, 1075–1088.

240. Tolias, K. F., Bikoff, J. B., Burette, A., Paradis, S., Harrar, D., Tavazoie, S., Weinberg, R. J., and Greenberg, M. E. (2005) The Rac1-GEF Tiam1 couples the NMDA receptor to the activity-dependent development of dendritic arbors and spines. Neuron 45, 525–538.

241. Tolias, K. F., Bikoff, J. B., Kane, C. G., Tolias, C. S., Hu, L., and Greenberg, M. E. (2007) The Rac1 guanine nucleotide exchange factor Tiam1 mediates EphB receptor-dependent dendritic spine development. Proc. Natl. Acad. Sci. U.S.A. 104, 7265–7270.

242. Hou, M., Tan, L., Wang, X., and Zhu, Y. S. (2004) Antisense Tiam1 down-regulates the invasiveness of 95D cells in vitro. Acta Biochim. Biophys. Sin. 36, 537–540.

243. Minard, M. E., Ellis, L. M., and Gallick, G. E. (2006) Tiam1 regulates cell adhesion, migration and apoptosis in colon tumor cells. Clin. Exp. Metastasis 23, 301–313.

244. Zhao, L., Liu, Y., Sun, X., He, M., and Ding, Y. (2010) Overexpression of T lymphoma invasion and metastasis 1 predict renal cell carcinoma metastasis and overall patient survival. J. Cancer Res. Clin. Oncol. 137, 393-8.

245. Engers, R., Mueller, M., Walter, A., Collard, J. G., Willers, R., and Gabbert, H. E. (2006) Prognostic relevance of Tiam1 protein expression in prostate carcinomas. Br. J. Cancer 95, 1081–1086.

246. Ding, Y., Chen, B., Wang, S., Zhao, L., Chen, J., Chen, L., and Luo, R. (2009) Overexpression of Tiam1 in hepatocellular carcinomas predicts poor prognosis of HCC patients. Int. J. Cancer 124, 653–658.

247. Rooney, C., White, G., Nazgiewicz, A., Woodcock, S. A., Anderson, K. I., Ballestrem, C., and Malliri, A. (2010) The Rac activator STEF (Tiam2) regulates cell migration by microtubule-mediated focal adhesion disassembly. EMBO Rep. 11, 292–298.

248. Yoshizawa, M., Hoshino, M., Sone, M., and Nabeshima, Y. (2002) Expression of stef, an activator of Rac1, correlates with the stages of neuronal morphological development in the mouse brain. Mech. Dev. 113, 65–68.

249. Matsuo, N., Hoshino, M., Yoshizawa, M., and Nabeshima, Y. (2002) Characterization of STEF, a guanine nucleotide exchange factor for Rac1, required for neurite growth. J. Biol. Chem. 277, 2860–2868.

250. Ponting, C. P. (1997) Evidence for PDZ domains in bacteria, yeast, and plants. Protein Sci. 6, 464–468.

251. Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., and Bork, P. (2000) SMART: A web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234.

252. Masuda, M., Maruyama, T., Ohta, T., Ito, A., Hayashi, T., Tsukasaki, K., Kamihira, S., Yamaoka, S., Hoshino, H., Yoshida, T., Watanabe, T., Stanbridge, E. J., and Murakami, Y. (2010) CADM1 interacts with Tiam1 and promotes invasive phenotype of human T-cell leukemia virus type I-transformed cells and adult T-cell leukemia cells. J. Biol. Chem. 285, 15511–15522.

253. Hoshino, M., Sone, M., Fukata, M., Kuroda, S., Kaibuchi, K., Nabeshima, Y., and Hama, C. (1999) Identification of the stef gene that encodes a novel guanine nucleotide exchange factor specific for Rac1. J. Biol. Chem. 274, 17837–17844.

254. Du, H., Fu, R. A., Li, J., Corkan, A., and Lindsey, J. S. (1998) PhotochemCAD: A computer-aided design and research tool in photochemistry. Photochem. Photobiol. 68, 141–142.

255. Bevington, P. R., and Robinson, D. K. (2003) Data reduction and error analysis, 3rd ed., McGraw-Hill, New York.

256. de Castro, E., Sigrist, C. J., Gattiker, A., Bulliard, V., Langendijk-Genevaux, P. S., Gasteiger, E., Bairoch, A., and Hulo, N. (2006) ScanProsite: Detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 34, W362–W365.

257. Beauvais, D. M., and Rapraeger, A. C. (2004) Syndecans in tumor cell adhesion and signaling. Reprod. Biol. Endocrinol. 2, 3.

258. Xian, X., Gopal, S., and Couchman, J. R. (2010) Syndecans as receptors and organizers of the extracellular matrix. Cell Tissue Res. 339, 31–46.

259. Spiegel, I., Salomon, D., Erne, B., Schaeren-Wiemers, N., and Peles, E. (2002) Caspr3 and caspr4, two novel members of the caspr family are expressed in the nervous system and interact with PDZ domains. Mol. Cell. Neurosci. 20, 283–297.

260. Craig, A. M., and Kang, Y. (2007) Neurexin-neuroligin signaling in synapse development. Curr. Opin. Neurobiol. 17, 43–52.

261. Lise, M. F., and El-Husseini, A. (2006) The neuroligin and neurexin families: From structure to function at the synapse. Cell. Mol. Life Sci. 63, 1833–1849.

262. Horovitz, A. (1996) Double-mutant cycles: A powerful tool for analyzing protein structure and function. Folding Des. 1, R121–R126.

263. Kawauchi, T., Chihama, K., Nabeshima, Y., and Hoshino, M. (2003) The in vivo roles of STEF/Tiam1, Rac1 and JNK in cortical neuronal migration. EMBO J. 22, 4190–4201.

264. Matsuo, N., Terao, M., Nabeshima, Y., and Hoshino, M. (2003) Roles of STEF/Tiam1, guanine nucleotide exchange factors for Rac1, in regulation of growth cone morphology. Mol. Cell. Neurosci. 24, 69–81.

265. Terawaki, S., Kitano, K., Mori, T., Zhai, Y., Higuchi, Y., Itoh, N., Watanabe, T., Kaibuchi, K., and Hakoshima, T. (2010) The PHCCEx domain of Tiam1/2 is a novel protein- and membrane-binding module. EMBO J. 29, 236–250.

266. Schneider, S., Buchert, M., Georgiev, O., Catimel, B., Halford, M., Stacker, S. A., Baechi, T., Moelling, K., and Hovens, C. M. (1999) Mutagenesis and selection of PDZ domains that bind new protein targets. Nat. Biotechnol. 17, 170–175.

267. Reina, J., Lacroix, E., Hobson, S. D., Fernandez-Ballester, G., Rybin, V., Schwab, M. S., Serrano, L., and Gonzalez, C. (2002) Computer-aided design of a PDZ domain to recognize new target sequences. Nat. Struct. Biol. 9, 621–627.

268. Wiedemann, U., Boisguerin, P., Leben, R., Leitner, D., Krause, G., Moelling, K., Volkmer-Engert, R., and Oschkinat, H. (2004) Quantification of PDZ domain specificity, prediction of ligand affinity and rational design of super-binding peptides. J. Mol. Biol. 343, 703–718.

269. Kurakin, A., Swistowski, A., Wu, S. C., and Bredesen, D. E. (2007) The PDZ domain as a complex adaptive system. PLoS One 2, e953.

270. Chen, J. R., Chang, B. H., Allen, J. E., Stiffler, M. A., and MacBeath, G. (2008) Predicting PDZ domain-peptide interactions from primary sequences. Nat. Biotechnol. 26, 1041–1045.

271. Sakarya, O., Conaco, C., Egecioglu, O., Solla, S. A., Oakley, T. H., and Kosik, K. S. (2010) Evolutionary expansion and specialization of the PDZ domains. Mol. Biol. Evol. 27, 1058–1069.

272. Sone, M. (1997) Still life, a protein in synaptic terminals of Drosophila homologous to GDP-GTP exchangers. Science 275, 1405.

273. Letunic, I., Doerks, T., and Bork, P. (2009) SMART 6: Recent updates and new developments. Nucleic Acids Res. 37, D229–D232.

274. Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. U.S.A. 95, 5857–5864.

275. Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J., and Higgins, D. G. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948.

276. Wickliffe, K., Williamson, A., Jin, L., and Rape, M. (2009) The Multiple Layers of Ubiquitin Dependent Cell Cycle Control. Chem. Rev. 109, 1537–1548.

277. Reyes-Turcu, F., Ventii, K., and Wilkinson, K. (2009) Regulation and Cellular Roles of Ubiquitin-Specific Deubiquitinating Enzymes. Annu. Rev. Biochem. 78, 363–397.

278. Bilodeau, P., Urbanowski, J., Winistorfer, S., and Piper, R. (2002) The Vps27p-Hse1p Complex Binds Ubiquitin and Mediates Endosomal Protein Sorting. Nat. Cell Biol. 4, 534–539.

279. Hofmann, K., and Bucher, P. (1996) The UBA Domain: A Sequence Motif Present in Multiple Enzyme Classes of the Ubiquitination Pathway. Trends Biochem. Sci. 21, 172–173.

280. Hurley, J., Lee, S., and Prag, G. (2006) Ubiquitin Binding Domains. Biochem. J. 399, 361–372.

281. Beal, R. E., Toscano-Cantaffa, D., Young, P., Rechsteiner, M., and Pickart, C. M. (1998) The Hydrophobic Effect Contributes to Polyubiquitin Chain Recognition. Biochemistry 37, 2925–2934.

282. Mueller, T. D., and Feigon, J. (2002) Solution Structures of UBA Domains Reveal a Conserved Hydrophobic Surface for Protein-Protein Interactions. J. Mol. Biol. 319, 1243–1255.

283. Luger, K., Rechsteiner, T., and Richmond, T. J. (1999) Expression and purification of recombinant histones, and nucleosome reconstitution. Methods Enzymol. 304, 3–19.

284. Wavreille, A.-S., Garaud, M., Zhang, Y., and Pei, D. (2007) Defining SH2 domain and PTP specificity by screening combinatorial peptide libraries. Methods 42, 207–219.

285. Garaud, M., and Pei, D. (2007) Substrate profiling of protein tyrosine phosphatase PTP1B by screening a combinatorial library. J. Am. Chem. Soc. 129, 5366–5367.

286. Lam, K. S., and Lebl, M. (1992) Streptavidin and avidin recognize peptide ligands with different motifs. ImmunoMethods 1, 11–15.

287. Barylko, B., Binns, D., Lin, K.-M., Atkinson, M., Jameson, D., Yin, H., and Albanesi, J. (1998) Synergistic Activation of Dynamin GTPase by Grb2 and Phosphoinositides. J. Biol. Chem. 273, 3791–3797.

288. Craig, K. L., and Tyers, M. (1999) The F-box: A new motif for ubiquitin dependent proteolysis in cell cycle regulation and signal transduction. Prog. Biophys. Mol. Biol. 72, 299–328.

289. Mimnaugh, E. G., Kayastha, G., McGovern, N. B., Hwang, S. G., Marcu, M. G., Trepel, J., Cai, S., Marchesi, V., and Neckers, L. (2001) Caspase-dependent deubiquitination of monoubiquitinated nucleosomal histone H2A induced by diverse apoptogenic stimuli. Cell Death Differ. 8, 1182–1196.

290. Benedit, P., Paciucci, R., Thompson, T., Valeri, M., Nadal, M., Càceres, C., Torres, I., Estivill, X., Lozano, J., Morote, J., and Reventós, J. (2001) PTOV1, a novel protein overexpressed in prostate cancer containing a new class of protein homology blocks. Oncogene 20, 1455–1464.

291. Kalveram, B., Schmidtke, G., and Groettrup, M. (2008) The ubiquitin-like modifier FAT10 interacts with HDAC6 and localizes to aggresomes under proteasome inhibition. J. Cell Sci. 121, 4079–4088.

292. Ingvarsdottir, K., Krogan, N., Emre, N., Wyce, A., Thompson, N., Emili, A., Hughes, T., Greenblatt, J., and Berger, S. (2005) H2B ubiquitin protease Ubp8 and Sgf11 constitute a discrete functional module within the Saccharomyces cerevisiae SAGA complex. Mol. Cell. Biol. 25, 1162–1172.

293. Kay, B., Adey, N., and Sparks, A. (1995) Reagents binding vinculin, dynein, and glutathione S-transferase from peptide libraries. PCT Int. Appl., pp 110 CODEN: PIXXD2 WO9520601 A1 19950803 CAN 123:250695 AN 1995:826768 CAPLUS.

294. Propheter, D. C., Hsu, K.-L., and Mahal, L. K. (2010) Fabrication of an oriented lectin microarray. ChemBioChem 11, 1–5.

295. Hoffmuller, U., Russwurm, M., Kleinjung, F., Ashurst, J., Oschkinat, H., Volkmer-Engert, R., Koesling, D., and Schneider-Mergener, J. (1999) Interaction of a PDZ protein domain with a synthetic library of all human protein C-termini. Angew. Chem., Int. Ed. 38, 2000–2004.

296. Ciccia, A., and Elledge, S.J. (2010) The DNA damage response: making it safe to play with knives. Mol. Cell 40, 179-204.

297. Futreal, P.A., Liu, Q., Shattuck-Eidens, D., Cochran, C., Harshman, K., Tavtigian, S., Bennett, L.M., Haugen-Strano, A., Swensen, J., Miki, Y., et al. (1994) BRCA1 mutations in primary breast and ovarian carcinomas. Science 266, 120-2.

298. Miki, Y., Swensen, J., Shattuck-Eidens, D., Futreal, P.A., Harshman, K., Tavtigian, S., Liu, Q., Cochran, C., Bennett, L.M., Ding, W., et al. (1994) A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 266, 66-71.

299. Thai, T.H., Du, F., Tsan, J.T., Jin, Y., Phung, A., Spillman, M.A., Massa, H.F., Muller, C.Y., Ashfaq, R., Mathis, J.M., Miller, D.S., Trask, B.J., Baer, R., and Bowcock, A.M. (1998) Mutations in the BRCA1-Associated RING Domain (BARD1) gene in primary breast, ovarian, and uterine cancers. Hum. Mol. Genet. 7, 195-202.

300. Wu, P., Frit, P., Meesala, S., Dauvillier, S., Modesti, M., Andes, S.N., Huang, Y., Sekiguchi, J., Calsou, P., Salles, B., and Junop, M.S. (2009) Structural and functional interaction between the human DNA repair proteins DNA Ligase IV and XRCC4. Mol. Cell Biol. 29, 3163-3172.

301. Feng, H., Parker, J.M., Lu, J., and Cao, W. (2004) Effects of deletion and site-directed mutations on ligation steps of NAD+-dependent DNA ligase: a biochemical analysis of BRCA1 c-terminal domain. Biochemistry 43, 12648-12659.

302. Leung, C.C., Kellogg, E., Kuhnert, A., Hänel, F., Baker, D., and Glover, J.N. (2010) Insights from the crystal structure of the sixth BRCT domain of Topoisomerase IIBeta Binding Protein 1. Protein Sci. 19, 162-7.

303. Ghosh, A., Shuman, S., Lima, C.D. (2008) The structure of FCP1, an essential RNA polymerase II CTD phosphatase. Mol. Cell 32, 478-490.

304. Huang, Y.M., and Chang, C.A. (2011) Mechanism of phosphothreonine/serine recognition and specificity for modular domains from all-atom molecular dynamics. BMC Biophysics 4:12.

305. Sweeney, M.C., Wang, X., Park, J., Liu, Y., and Pei, D. (2006) Determination of the sequence specificity of XIAP BIR domains by screening a combinatorial peptide library. Biochemistry 45, 14740-14748.

306. Shepherd, T.R., Hard, R.L., Murray, A.M., Pei, D., and Fuentes, E. (2011) Distinct ligand specificity of the Tiam1 and Tiam2 PDZ domains. Biochemistry 50, 1296-1308.

307. Munoz, I.M., Jowsey, P.A., Toth, R., and Rouse, J. (2007) Phospho-epitope binding by the BRCT domains of hPTIP controls multiple aspects of the cellular response to DNA damage. Nucleic Acids Research 35, 5312-5322.

308. Liu, T., Chen, H., Kim, H., Huen, M.S., Chen, J., and Huang, J. (2012) RAD18-BRCTx interaction is required for efficient repair of UV-induced DNA damage. DNA Repair (Amst). 11, 131-8.

309. Ueda, S., Takeishi, Y., Ohashi, E., and Tsurimoto, T. (2012) Two serine phosphorylation sites in the C-terminus of Rad9 are critical for 9-1-1 binding to TopBP1 and activation of the DNA damage checkpoint response in HeLa cells. Genes to Cells 17, 807-816.

310. Boos, D., Sanchez-Pulido, L., Rappas, M., Pearl, L.H., Oliver, A.W., Ponting, C.P., Diffley, J.F.X. (2011) Regulation of DNA replication through Sld3-Dpb11 interaction is conserved from yeast to humans. Current Biology 21, 1152-1157.

311. Edwards, R.A., Lee, M.S., Tsutakawa, S.E., Williams, R.S., Tainer, J.A., and Glover, J.N.M. (2008) The BARD1 C-terminal domain structure and interactions with polyadenylation factor CstF-50. Biochemistry 47, 11446-11456.

312. Yin, J., Straight, P.D., McLoughlin, S.M., Zhou, Z., Lin, A.J., Golan, D.E., Kelleher, N.L., Kolter, R., and Walsh, C.T. (2005) Genetically encoded short peptide tag for versatile protein labeling by SFP phosphopantetheinyl transferase. PNAS 102, 15815-15820.

313. Vojkovsky, T. (1995) Detection of secondary amines on solid phase. Pept. Res. 8, 236-37.

314. Jowsey, P.A., Doherty, A.J., Rouse, J. (2004) Human PTIP facilitates ATM-mediated activation of p53 and promotes cellular resistance to ionizing radiation. J. Biol. Chem. 279, 55562-55569.

315. Xu, C., Wu, L., Cui, G., Botuyan, M.V., Chen, J., Mer, G. (2008) Structure of a second BRCT domain identified in the Nijmegen Breakage Syndrome protein Nbs1 and its function in an MDC1-dependent localization of Nbs1 to DNA damage sites. J. Mol. Biol. 381, 361-372.

316. Williams, R.S., Dodson, G.E., Limbo, O., Yamada, Y., Williams, J.S., Guenther, G., Classen, S., Glover, J.N., Iwasaki, H., Russell, P., and Tainer, J.A. (2009) Nbs1 flexibly tethers Ctp1 and Mre11-Rad50 to coordinate DNA double-strand break processing and repair. Cell 139, 87-99.

317. Chapman, J.R., and Jackson, S.P. (2008) Phospho-dependent interactions between NBS1 and MDC1 mediate chromatin retention of the MRN complex at sites of DNA damage. EMBO Rep. 9, 795-801.

318. Wang, J., Gong, Z., and Chen, J. (2011) MDC1 collaborates with TopBP1 in DNA replication checkpoint control. J. Cell Biol. 193, 267-273.

319. Thanassoulas, A., Nomikos, M., Theodoridou, M., Yannoukakos, D., Mastellos, D., and Nounesis, G. (2010) Thermodynamic study of the BRCT domain of BARD1 and its interaction with the –pSER-X-X-Phe-motif-containing BRIP1 peptide. Biochim. Biophys. Acta 1804, 1908-16.

320. Yüce, Ö., Piekny, A., and Glotzer, M. (2005) An ECT2-Centralspindlin complex regulates the localization and function of RhoA. J. Cell. Biol. 170, 571-582.

321. Joo, W.S., Jeffrey, P.D., Cantor, S.B., Finnin, M.S., Livingston, D.M., Pavletich, N.P. (2002) Structure of the 53BP1 BRCT region bound to p53 and its comparison to the BRCA1 BRCT structure. Genes Dev. 16, 583-593.
