SYNTHETIC COMBINATORIAL PEPTIDE LIBRARIES AND THEIR APPLICATION IN DECODING BIOLOGICAL INTERACTIONS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

Michael Cameron Sweeney

The Ohio State University 2005

Dissertation Committee:

Professor Dehua Pei, Advisor

Professor Jill Rafael-Fortney

Professor Charles Bell

Professor Charles Brooks

Approved by:

Dehua Pei

The Ohio State Biochemistry Graduate Program

ABSTRACT

The synthesis of peptides was revolutionized by the adoption of solid-phase synthetic techniques. Subsequent improvement, evolution, and refinement of this chemical technique have allowed research into areas of biology not previously accessible with such speed and breadth. Because of the efficiency and flexibility of the chemistry involved in peptide synthesis, libraries representing millions of unique natural, modified, or unnatural peptides can be constructed rapidly and in purity high enough to obviate the need for purification. In this work, libraries were synthesized for screening against individual protein domains in an effort both to determine the preferred peptidyl binding partner types for each and to establish an optimized, broadly applicable methodology for screening other domains. One of the problems encountered during the development of the screening methodology was the low success rate of sequence determination for the peptides selected by each domain. Herein we report the successful modification of the peptide ladder mass spectrometry sequencing technique referred to as partial Edman degradation (PED). Success rates were improved to greater than 90% for full-length sequence determination of peptides up to 8-mers, even for the more difficult phosphotyrosine (pY)-containing peptides. As a result of this improvement, three pY-binding Src Homology 2 (SH2) domains and two N-terminus-binding Baculoviral Inhibitor-of-Apoptosis Repeat (BIR) domains were screened against their respective libraries and the preferred ligand types for each were determined. The advantage of sequencing by the PED method became especially clear in the case of the N-terminal SH2 (N-SH2) domain of Src Homology 2 Protein Tyrosine Phosphatase 2 (SHP-2), as previously unidentified sub-classes of binding consensus motifs were distinguishable owing to the discrete nature of the sequencing technique. This work demonstrates the usefulness and potential generality of peptide library screening by this method.

Dedicated to my parents and family

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Dehua Pei. I don’t know which was more important during my training: his broad scientific knowledge, surpassed in breadth only by its depth, or his nearly limitless patience. I benefited greatly from both and will be indebted throughout my career.

I am happy to have known and worked with my labmates, Dr. Kirk Beebe, Dr. Peng Wang, Dr. Kiet Nguyen, Dr. Xubo Hu, Dr. Grace Zhu, Junguk Park, and Anne-Sophie Wavreille. Thank you all for everything you have given me; I can only hope to have positively impacted you as much as you have me.

While at Ohio State, I was lucky enough to have had access to the professional support of the CCIC Mass Spectrometry Facility. The insight and timely assistance provided by Nan Kleinholz, Rhonda Pitsch, Ben Jones, and Josh Ellis under the direction of Dr. Kari Green-Church were indispensable. In addition to their technical expertise, their friendship made visits less about sample submission and more of a welcome reprieve from the daily grind.

Lastly, I am eternally grateful for the unwavering support I have received from my parents and family. Without you, I would never have tasted success in anything. If I can live up to your precedents in work and in life, I will know success in both.

VITA

1997 B.S. Chemistry, The Ohio State University

1997-1999 Completed Years 1 and 2 of Medical School, The Ohio State University

1999-present Graduate Research Assistant, The Ohio State University

July, 2005 Return to Medical School, The Ohio State University

PUBLICATIONS

Sweeney, M. C. and Pei, D. (2003) An improved method for rapid sequencing of support-bound peptides by partial Edman degradation and mass spectrometry. J. Comb. Chem. 5, 218-222.

Sweeney, M. C., Park, J., Wavreille, A-S., and Pei, D. (2005) Decoding protein-protein interactions through combinatorial chemistry: Sequence specificity of SHP-2 and SHIP SH2 domains. Biochemistry, under revision.

Sweeney, M. C., Park, J., Wavreille, A-S., and Pei, D. (2005) Determination of the binding specificities of the BIR domains of XIAP by combinatorial peptide library screening. Chem. Biol., under review.

FIELDS OF STUDY

Major Field: Ohio State Biochemistry Program

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita & Publications
List of Schemes
List of Tables
List of Figures
List of Abbreviations

Chapters:

1. General Introduction
1.1 Solid Phase Peptide Synthesis
1.2 Peptide Libraries
1.3 The Importance of Protein Domains to Signal Transduction

2. Peptide and Protein Sequencing by Partial Edman Degradation
2.1 Introduction
2.2 The Edman Degradation Methodology
2.3 Peptide Ladder Detection Scheme for Edman Degradation
2.4 Peptide Ladder Encoding of Libraries
2.5 Post-Screening Peptide Ladder Encoding of Library Peptides
2.6 Development of an Improved Partial Edman Degradation Technique
2.7 Experimental Designs and Techniques
2.7.1 General PIC-Based PED and MALDI-TOF MS
2.7.2 Modified PIC-Based Procedures
2.7.3 Replacement of PIC by OSu-Esters
2.7.4 General OSu-Ester-Based PED
2.7.5 Synthesis of Alternative OSu-Esters
2.7.6 Synthesis of Pro- and Trp-Containing Test Sequences
2.7.7 Activated-Disulfide Resin Synthesis for Use in Native Protein PED
2.7.8 Immobilization of Tryptic Digest of Co2+ E. coli Peptide Deformylase
2.8 Results and Discussion
2.9 Conclusion

3. Determination of the Phosphopeptide Ligand Specificities of SHP-2 and SHIP SH2 Domains by Combinatorial Peptide Library Screening
3.1 Introduction
3.2 Experimental Procedures
3.2.1 Vector Constructs
3.2.2 SHP-2 SH2 Domain Constructs
3.2.3 Control and SHP-1 Constructs
3.2.4 Purification and Biotinylation of His6-MBP-SH2 Proteins
3.2.5 Purification of His-tagged SH2 Domains
3.2.6 Purification of Full-Length SHP-2 for Stimulation Assays
3.2.7 Synthesis of pY Library
3.2.8 Colorimetric Library Screening
3.2.9 Partial Edman Degradation and Peptide Sequencing
3.2.10 Synthesis of Biotinylated pY Peptides
3.2.11 Determination of Dissociation Constants by BIAcore
3.3 Results
3.3.1 Library Design, Synthesis, and Screening
3.3.2 Peptide Sequencing by PED
3.3.3 Specificity of the C-SH2 Domain of SHP-2
3.3.4 Specificity of the N-SH2 Domain of SHP-2
3.3.5 Specificity of SHIP SH2 Domain
3.3.6 Affinity Measurements of Selected Sequences
3.3.7 Database Search of Potential SHP-1/SHP-2-Binding Proteins
3.4 Discussion

4. Determination of the Tetrapeptide Ligand Specificities of the BIR2 and BIR3 Domains of XIAP by Combinatorial Peptide Library Screening
4.1 Introduction
4.2 Experimental Techniques
4.2.1 Vector Constructs
4.2.2 XIAP BIR Domain and Full-Length Constructs
4.2.3 Purification and Labeling of His6-MBP-BIR Proteins
4.2.4 Purification of GST-BIR1-3, GST-XIAP, and GST Control Proteins
4.2.5 Synthesis of BIR Libraries
4.2.6 Colorimetric Library Screening
4.2.7 Fluorimetric Library Screening
4.2.8 Partial Edman Degradation and Peptide Sequencing
4.2.9 Synthesis of Biotinylated pY Peptides
4.2.10 Determination of Dissociation Constants by BIAcore
4.3 Results
4.3.1 Library Construction and Screening
4.3.2 Binding Specificity of the BIR2 Domain
4.3.3 Binding Specificity of the BIR3 Domain
4.3.4 Affinity Measurements of Selected Peptides
4.3.5 Database Search for Potential BIR2 and BIR3 Binding Partners
4.3.6 Probing BIR3-Caspase-10d Interactions
4.4 Discussion
4.5 Conclusion

5. Materials and General Methods
5.1 Materials
5.2 Buffers
5.3 General Biochemical and Biological Methods
5.3.1 Materials
5.3.2 Growth Media
5.3.3 Growth and Storage of Bacterial Strains
5.3.4 Preparation of Competent Cells
5.3.5 Quantitation of DNA and RNA
5.3.6 Protein Quantitation
5.4 Electrophoresis
5.4.1 Agarose Gel
5.4.2 Polyacrylamide Gels for Protein Separation
5.4.3 Urea-PAGE Gels for Oligonucleotide Purification
5.5 Recombinant DNA Techniques
5.5.1 Restriction Digestions
5.5.2 Filling Recessed 3’-Termini and Removing Protruding 3’-Termini
5.5.3 Removal of 5’ Phosphates
5.5.4 Ligation of DNA
5.5.5 Transformation
5.5.6 Small-Scale Preparation of Plasmid DNAs
5.5.7 Mutagenesis
5.5.8 Sequencing

Appendix: Supplementary Schemes, Tables, and Figures

Bibliography

LIST OF SCHEMES

Scheme

2.1 Traditional Edman degradation

2.2 Solution-phase peptide-ladder sequencing

2.3 Support-bound peptide-ladder sequencing

A1 Construction of pETMAL vector

A2 Construction of pET-PNPT vector

A3 Construction of pPPTmal vector

A4 Construction of pGFPmal vector

LIST OF TABLES

Table

2.1 Virtual EcPDF tryptic digest

3.1 SHP-2 C-SH2 domain selected peptides

3.2 SHP-2 N-SH2 domain selected peptides

3.3 SHIP SH2 domain selected peptides

3.4 Dissociation constants of SH2 domains toward selected pY peptides

3.5 Potential human SHP-1/SHP-2 interacting proteins from database search

4.1 XIAP BIR2 domain selected peptides

4.2 XIAP BIR3 domain selected peptides

4.3 Dissociation constants of BIR domains toward selected peptides

4.4 Potential human BIR2 interacting proteins from database search

4.5 Potential human BIR3 interacting proteins from database search

A1 Additional SHP-2 C-SH2 selected peptides

A2 Additional SHP-2 N-SH2 selected peptides

A3 Additional SHIP SH2 selected peptides

A4 Peptide sequences selected during BIR3 domain fluorimetric screening

LIST OF FIGURES

Figure

2.1 MALDI-MS sequence of [Glu1] fibrinopeptide B

2.2 MALDI-TOF MS of support-bound peptide

2.3 MALDI-TOF MS of support-bound pY peptide

2.4 MALDI-TOF MS of Pro- and Trp-containing peptides

2.5 MALDI-TOF MS of Trp oxidation products

2.6 MALDI-TOF MS of immobilized EcPDF tryptic digest

3.1 MALDI-TOF MS of C-SH2 domain selected pY peptide

3.2 Histograms of SHP-2 C-SH2 domain selected pY peptides

3.3 Histograms of SHP-2 N-SH2 domain selected pY peptides

3.4 Histograms of SHIP SH2 domain selected pY peptides

3.5 BIAcore sensorgram and secondary plot

4.1 Histograms of BIR2 domain selected peptides

4.2 Histograms of BIR3 domain selected peptides

A1 SDS-PAGE gel of full-length SHP-2

A2 Composite histogram of SHP-2 N-SH2 selected sequences

A3 Histogram of SHP-2 N-SH2 selected Class III and IV peptides

ABBREVIATIONS

AcCN Acetonitrile

BOC t-Butyloxycarbonyl

Biotin-OSu D-Biotin O-Succinimide

BSA Bovine Serum Albumin

β β-Alanine, β-Aminopropionic Acid

BCIP 5-Bromo-4-chloro-3-indolyl Phosphate

BME β-Mercaptoethanol

Bz-OSu Benzoic Acid O-Succinimide

CLEAR Cross-Linked Ethoxylate Acrylate Resin

CIP Calf Intestinal Alkaline Phosphatase

DCM Dichloromethane

DIPEA N,N-Diisopropylethylamine

DMF N,N-Dimethylformamide

DNA Deoxyribonucleic Acid

dNTPs Deoxyribonucleotide Triphosphates

DTNB 5,5’-Dithiobis(2-nitrobenzoic acid), also Ellman’s Reagent

DTT Dithiothreitol

EDTA N,N,N’,N’-Ethylenediamine Tetraacetate

ESI-MS Electrospray Ionization-Mass Spectrometry

FMOC, Fmoc 9-Fluorenylmethoxycarbonyl

FPLC Fast Protein Liquid Chromatography

GFP Green Fluorescent Protein

GST Glutathione S-Transferase

GSH Glutathione

HBTU N-[(1H-Benzotriazol-1-yl)(dimethylamino)-methylene]-N-methylmethanaminium hexafluorophosphate N-oxide

HPLC High Performance Liquid Chromatography, sometimes High Pressure Liquid Chromatography

HOBt 1-Hydroxybenzotriazole

iPrOH Isopropanol

IPTG Isopropyl-β-D-thiogalactoside

IMAC Immobilized Metal Affinity Chromatography

LBA/LBK Luria-Bertani media + Ampicillin or Kanamycin Antibiotic

MALDI-TOF Matrix Assisted Laser Desorption Ionization – Time Of Flight

MBP Maltose Binding Protein

MCS Multiple Cloning Sites

MeOH Methanol

Nic-OSu Nicotinic Acid O-Succinimide, also Nic-NHS

Nle Norleucine

NMM N-Methylmorpholine

PCR Polymerase Chain Reaction

PED Partial Edman Degradation

PEG Polyethylene Glycol

PEGA Polyethylene Glycol Dimethylacrylamide

PIC Phenylisocyanate

PITC Phenylisothiocyanate

pY Phosphotyrosine

SA-AP Streptavidin-Alkaline Phosphatase

SDM Site-Directed Mutagenesis

SDS-PAGE Sodium Dodecylsulfate-Polyacrylamide Gel Electrophoresis

SPPS Solid Phase Peptide Synthesis

SPR Surface Plasmon Resonance

TCEP Tris(2-carboxyethyl)phosphine

TEA Triethylamine

TFA Trifluoroacetic Acid

Standard one letter codes are used for deoxynucleotides occurring in PCR primers, while standard one and three letter abbreviations are used for genomically encoded amino acid residues.

CHAPTER 1

GENERAL INTRODUCTION

1.1 Solid Phase Peptide Synthesis

In 1963 R. Bruce Merrifield successfully demonstrated a technique for tethering amino acids to an insoluble polymer support followed by step-wise peptide bond formation and eventual cleavage from the support. His initial synthesis of a simple model tetrapeptide [1] marked a significant break from traditional synthetic methodology involving purification and characterization of products following each step. Moreover, the ease with which this biphasic system allowed removal of unreacted reagents meant that large excesses could be used in the coupling reactions to ensure completion at each step. As a result, in most cases truncation products are limited and desired products can be obtained in good yield following a single purification. The revolutionary concept of solid phase peptide synthesis (SPPS) has since been applied to the synthesis of other polymers, such as oligonucleotides [2] and oligosaccharides [3], and is the very basis for combinatorial library synthesis [4-6]. The ability to synthesize biologically relevant molecules, either individually or combinatorially, in a routine manner has had profound effects on the speed and types of biochemical research performed to date, and Dr. Merrifield was awarded the Nobel Prize in Chemistry for his groundbreaking technique in 1984.

In the forty-plus years since the introduction of SPPS, numerous modifications and improvements have been made in the strategies and reagents involved. Most significantly, the “BOC/HF” strategy, which evolved first [7-9] and relied on the relative acid stabilities of permanent side-chain versus temporary α-amino protecting groups between cycles, has been largely supplanted by the orthogonal “FMOC/piperidine” approach. In this latter scheme, temporary α-amino protection is afforded by the base-labile 9-fluorenylmethoxycarbonyl group (FMOC) developed by Carpino [10], whereas the side-chain protecting groups are unreactive under basic conditions and are removed by acid treatment. Additional improvements have been made in coupling reagents and the resin supports. The introduction of the phosphonium, aminium/uronium [11], and acid fluoride-based carboxylate activating reagents has improved yields and reduced racemization difficulties compared to carbodiimide reagents [reviewed in 12]. Meanwhile, incorporation of polyethyleneglycol (PEG) into resins, either as a graft onto polystyrene (e.g. TentaGel, Argogel) or as the scaffolding itself (e.g. PEGA, CLEAR), has improved the solvation and swelling properties of the resins for improved coupling efficiency, especially for difficult syntheses [13, 14]. Moreover, the amphipathic nature of the PEG linker is important in the current work as it allows for synthesis of peptide libraries in organic solvents and biological screening reactions in aqueous environments.

1.2 Peptide Libraries

In the early 1990s, the development of large peptide libraries generated from randomized DNA sequences expressed and presented in the coat proteins of bacteriophage, referred to as phage display, marked an important advance in biochemistry [15]. Shortly thereafter, Geysen’s earlier work [16] was expanded into large synthetic combinatorial peptide libraries using SPPS techniques [4-6]. More recently, other chimeric nucleotide-peptide screening systems have been demonstrated [17-20]. These powerful tools have allowed researchers to rapidly sample large regions of ligand space in the search for binding interactions and characterization of enzymatic reactions. Each technique has been extensively reviewed [21-23] and each has distinct advantages and disadvantages. In many ways these techniques serve to complement each other in terms of the types, sizes, and constitutions of the peptide libraries obtained and thus the types of binding or enzymatic screenings possible. For the sake of brevity, the discussion here will focus on phage display and the common SPPS library methods.

Peptide phage display library screening represents a remarkable collaboration between chemistry and biology. The randomized DNA library is chemically synthesized, whereas the peptide epitopes are biologically synthesized [15]. Paramount to the successful application of this relationship, however, is that each DNA message be physically linked to the epitope it encodes. Thus, each phage capsid serves not only to display peptides for screening, but also to segregate and record the identity of each selected peptide. Additionally, because of the infectious and reproductive nature of the viral particle, selected phages can be harvested and propagated for subsequent rounds of selection. After several rounds of competition and enrichment, very high affinity binding epitopes can be elucidated by this methodology. Moreover, because the record-carrying DNA can be amplified in the sequencing reaction, very little sample is needed and detection limits are not of concern. Such amplification is not possible during protein sequencing of synthetic libraries and marks a significant advantage of nucleotide encoding [24].

Phage display has been applied with success to the elucidation and refinement of peptide ligands for a wide range of targets [25-27]. However, there are drawbacks to systems reliant upon biology for library synthesis. First and foremost, until recently, these libraries were limited in their composition to the 20 naturally occurring L-amino acids. In the instances where non-natural amino acids have been incorporated by amber suppression, diversity remains relatively limited and separate screenings are necessary for each analogue (i.e. direct competition is not yet possible). Furthermore, at present biosynthetic incorporation of non-proteinogenic residues is by no means trivial [28-33]. Second, positional biases, albeit relatively mild, have been demonstrated in phage display libraries stemming from the selective pressures of host-phage interactions [34-36].

Synthetic peptide libraries can be designed for screening in two forms, solution phase and resin-bound. Both are generated by the split-pool method [4, 6], but solution phase libraries are cleaved from the resin before screening, whereas resin-bound libraries remain tethered to the support after de-protection such that many copies of one molecule (~100 pmol) are displayed per bead (the one-bead one-compound, OBOC, principle). In contrast to the biological synthesis methods above, nucleotide encoding of synthetic libraries is burdensome and rarely employed [37, 38], and so selected peptides must be analyzed by a means other than PCR. The lack of an amplification mechanism introduces detection-limit considerations for synthetic libraries, and strategies for encoding libraries and deciphering the selected targets have been creative and varied (reviewed in [39-41], discussed further in Chapter 2). Soluble peptide libraries have been successfully screened for ligand binding interactions, enzyme substrate preferences [42], and inhibitors. In most cases, selected peptides are sequenced by pooled Edman degradation or electrospray ionization-mass spectrometry (ESI-MS). In addition to potentially missing low-copy, high-affinity ligands, pooled sequencing methods cannot distinguish multiple, contextual binding/substrate motifs. Likewise, there are limits to ESI-MS techniques, such as library size and potential sample loss during de-salting. Solid-phase libraries offer an advantage by keeping each peptide separate and thus separately identifiable. This advantage can be especially helpful in identifying high-affinity ligands that may not conform to more highly represented motifs and in distinguishing among multiple motifs (see Chapter 3). The drawback lies in the need to sequence large numbers of positively selected sequences rapidly and inexpensively.

Overcoming the deconvolution difficulties associated with synthetic libraries, by whichever means demonstrated in the literature or as yet undescribed, opens doors to ligand space not attainable by genetically encoded systems. Although not as numerically diverse (usually 10⁶ to 10⁷ members) or as positionally representative (usually 5 or 6 fully randomized positions) as their transcribed counterparts, synthetic libraries offer far more direct access to a far more diverse set of unnatural monomers. Moreover, multiple non-encoded building blocks can be incorporated, allowing direct competition and selection between them. Indeed, libraries can be composed entirely of unnatural D-amino acids in the search for protease-resistant receptor agonists. Additionally, post-translationally modified (acetylated, formylated, methylated, phosphorylated, etc.) amino acids are difficult or impossible to include in biologically originated libraries in an unbiased manner, but are generally amenable to synthetic incorporation. Thus, there are types of screenings for which each library design is best suited.
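For a sense of scale, the theoretical diversity of a split-pool library is simply the number of building blocks raised to the number of randomized positions. The following is a minimal sketch, not a description of the libraries built in this work: the function names are hypothetical, and the 20-residue building-block set is an assumption chosen only for illustration. It also simulates the split, couple, and pool cycle to show why each bead ends up carrying a single sequence.

```python
import random

# Assumed building-block set: the 20 proteinogenic amino acids (one-letter codes).
BUILDING_BLOCKS = "ACDEFGHIKLMNPQRSTVWY"


def theoretical_diversity(n_building_blocks: int, n_positions: int) -> int:
    """Number of distinct sequences with n_positions fully randomized positions."""
    return n_building_blocks ** n_positions


def split_pool(building_blocks: str, n_positions: int, n_beads: int, seed: int = 0):
    """Simulate split-pool synthesis; returns one sequence per bead.

    The string records residues in order of coupling.
    """
    rng = random.Random(seed)
    beads = ["" for _ in range(n_beads)]
    for _ in range(n_positions):
        # Split: each bead lands in one reaction vessel at random.
        # Couple: the bead receives that vessel's single building block.
        # Pool: all beads are recombined before the next cycle.
        beads = [seq + rng.choice(building_blocks) for seq in beads]
    return beads


five_positions = theoretical_diversity(len(BUILDING_BLOCKS), 5)
six_positions = theoretical_diversity(len(BUILDING_BLOCKS), 6)
beads = split_pool(BUILDING_BLOCKS, n_positions=5, n_beads=10)
```

Because every bead passes through exactly one vessel per cycle, it accumulates one residue per position, which is the basis of the one-bead one-compound principle described above.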

1.3 The Importance of Protein Domains to Signal Transduction

The pioneering research led by Tony Pawson in the 1980s and 1990s, in which the Src Homology 2 (SH2) domain was functionally characterized as a conserved, self-folding, phosphotyrosine (pY) binding entity, elevated the role of recognition domains in signal transduction and cellular regulation [43-45]. The subsequent characterization of at least 50 conserved recognition domains, each binding substrates as chemically and spatially diverse as nucleic acids, lipids, and peptides, has helped bridge signaling pathways from the membrane to the nucleus and points in-between [46]. Diverse recognition events by diverse domains can be linked by the modular nature of the domains to create natural “fusion proteins” such as Grb2, a small adaptor protein composed of two SH3 domains and an SH2 domain. The result, in this example, is the linkage of a pY recognition event to two different Pro-based recognition events via the physically linked SH2 and SH3 domains, respectively. Thus, such an adaptor molecule can both amplify a signal and build specificity into a pathway and response via the inherent specificities of Grb2’s fused domains. For instance, the SH2 domain of Grb2 not only recognizes pY, it does so in a contextually specific manner based on the surrounding C-terminal residues; in this case a sequence of pYXN, where X represents any amino acid, is the preferred substrate motif. This is in contrast to the SH2 domain of Src, which prefers a pYEEI motif. Likewise, the two SH3 domains within Grb2 prefer slightly different PXXP or RXXP motifs. Considering there are 115 SH2 and 253 SH3 domains encoded by the human genome, each possessing unique binding preferences, the combinatorial signaling possibilities attainable through their fusion seem almost limitless.

One deconstructionist means of probing signaling pathways and directing their study is through the application of combinatorial library screening of the individual constituent domains. The use of phage display and other biologically derived libraries for the probing of domains which recognize unmodified peptides, e.g. SH3, is nicely complemented by SPPS-based libraries for understanding the specificity of domains requiring post-translational modifications, such as SH2, Bromo, Chromo, etc. Research involving pY peptide library screening led by Lewis Cantley [47] demonstrated the feasibility of such an approach, and in fact allowed the sub-classification of SH2 domains based on their recognition preferences. Subsequent refinement and expansion of this application has been ongoing, and we endeavored to add to the understanding of signal transduction while improving the methodology of library screening.
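Consensus motifs such as pYXN and pYEEI can be checked against candidate sequences mechanically. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, phosphotyrosine is written as the two-character token "pY", X is taken to mean any of the 20 proteinogenic residues, and the test peptides are invented strings rather than sequences from this work.

```python
import re

# Assumed alphabet for the wildcard position X.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a motif like 'pYXN' into a compiled regular expression.

    'pY' is phosphotyrosine, 'X' is any residue, and any other letter is literal.
    """
    pattern = ""
    i = 0
    while i < len(motif):
        if motif.startswith("pY", i):
            pattern += "pY"
            i += 2
        elif motif[i] == "X":
            pattern += f"[{AMINO_ACIDS}]"
            i += 1
        else:
            pattern += re.escape(motif[i])
            i += 1
    return re.compile(pattern)


grb2_sh2 = motif_to_regex("pYXN")   # motif quoted in the text for the Grb2 SH2 domain
src_sh2 = motif_to_regex("pYEEI")   # motif quoted in the text for the Src SH2 domain

# Hypothetical peptides, used only to exercise the two patterns.
peptide_a = "AEpYVNQS"
peptide_b = "DGpYEEIL"
matches_a = (bool(grb2_sh2.search(peptide_a)), bool(src_sh2.search(peptide_a)))
matches_b = (bool(grb2_sh2.search(peptide_b)), bool(src_sh2.search(peptide_b)))
```

A pattern of this kind captures only the consensus; it says nothing about affinity or about contextual effects between positions.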

CHAPTER 2

PEPTIDE AND PROTEIN SEQUENCING BY PARTIAL EDMAN DEGRADATION

2.1 Introduction

Equally important as designing and synthesizing a high quality peptide library for screening is “reading” the results of that screening. While the generally encountered chemical strategies by which to synthesize libraries are few (BOC/HF or FMOC/piperidine), the schemes by which to decode the selected sequences are many and are not limited to chemistry. The difficulty is finding a means of robustly encoding large numbers (generally ≥ 10⁶) of peptides with unique identifiers that, ideally, 1) won’t interfere with the binding or catalysis being studied; 2) won’t bias the library; 3) can be read quickly and cheaply; 4) can be incorporated quickly and cheaply; and 5) in chemically encoded cases, can be detected at fairly low limits with high accuracy. Attempts to devise an ideal sequence encoding method have been far-ranging, and while some schemes are more promising than others, each fails at least one criterion of ideality. A few examples of resin-bound non-chemical library encoding methods include spatial segregation (i.e. peptide arrays), radio frequency tags, and optical diversification, among others. Several of the exogenous chemical-tagging systems described involve adding haloaromatics, secondary amines, and nucleotides during library synthesis. The term exogenous is included in the previous descriptor because the peptides are themselves chemicals and can in fact chemically encode their own sequence, endogenously. This self-encoding nature can be exploited in several ways but, unlike with oligonucleotides, amplification of the message is not possible, and thus detection limits are always of concern.

2.2 The Edman Degradation Methodology

The chemical methodology of sequentially removing and identifying each N-terminal residue of a protein or peptide using phenylisothiocyanate (PITC) was originally described by Pehr Edman in 1950 [48]. Optimization of the degradation technique (Scheme 2.1) which bears his name has yielded an automated process by which high performance liquid chromatography (HPLC) identification of the degradation product, a phenylthiohydantoin (PTH) bearing the N-terminal residue, can routinely be achieved at limits of 5-10 pmol of peptide sample [4, 49]. In many ways, determining the sequence of selected peptides by way of the inherent peptide sequence seems ideal. However, with regard to solution phase library screenings, the selected peptides are necessarily a mixture, and thus Edman degradation can only yield the most preferred residues at each position. With regard to resin-bound libraries, sequencing each positive bead (often > 100 selected) is cost- and time-prohibitive. Therefore, selected beads are often pooled and sequenced as a mixture. Unfortunately, this act of pooling negates some of the power of the one-bead-one-compound (OBOC) principle since only the most preferred residue(s) at each position are observable. Contextual data, the contribution of one residue toward binding or enzymatic specificity in the context of other residues in the same peptide, can only be had by decoding each sequence individually. This topic will be revisited.
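The loss of contextual data on pooling can be made concrete with a toy comparison. This is a minimal sketch with hypothetical function names and invented three-residue sequences; it is not data from this work. A pooled profile reports residue counts position by position, whereas individually decoded sequences also reveal which residues occur together.

```python
from collections import Counter


def pooled_profile(peptides):
    """Per-position residue counts, roughly what sequencing a pooled mixture reports."""
    length = len(peptides[0])
    return [Counter(p[i] for p in peptides) for i in range(length)]


def pair_counts(peptides, i, j):
    """Co-occurrence of residues at positions i and j, available only per sequence."""
    return Counter((p[i], p[j]) for p in peptides)


# Hypothetical selection containing two distinct classes of binders.
selected = ["IAL", "IAL", "TVV", "TVV"]

profile = pooled_profile(selected)      # position 0: I and T appear equally often
pairs = pair_counts(selected, 0, 2)     # I pairs only with L, T pairs only with V
```

From the pooled profile alone, the sequences "IAV" or "TVL" would look just as plausible as the two classes actually selected; only the individually decoded sequences rule them out.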

2.3 Peptide Ladder Detection Scheme for Edman Degradation

Traditionally, after cleavage from the parent peptide the PITC derivative must be extracted, treated for rearrangement, and identified by its HPLC retention time. Chait and colleagues demonstrated a novel approach [50]. Instead of handling and analyzing the small PTH derivative produced after each degradation cycle by a time-consuming HPLC method, they generated a “peptide ladder” out of the parent peptide through competitive capping and degradation (Scheme 2.2). The N-terminal capping reagent, included as a small percentage of the PITC, was its oxygen analogue, phenylisocyanate (PIC), which formed a stable phenylcarbamyl derivative and blocked degradation of a small number of peptides in each cycle. The generated peptide ladder was analyzed quickly, and all at once, by mass spectrometry. In the mass spectrum, a peak was observed for each capping product, like the rungs of a ladder. Each rung differed from the next by the residue mass of the amino acid removed between them, and thus the ordered identity of most amino acids was obtained from a single spectrum (Fig. 2.1, taken from [50]). Their work, conducted in a solution phase spinning cup sequenator, demonstrated the feasibility of constructing and analyzing a peptide ladder through Edman chemistry and mass spectrometry.
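Reading a ladder amounts to taking the mass differences between adjacent rungs and matching each difference to a residue mass. The sketch below illustrates that idea only; it is not the analysis software used in this work. The function name and the 0.02 Da tolerance are assumptions, the table holds standard monoisotopic residue masses for unmodified amino acids, and every rung is assumed to carry the same charge state and end groups so that those contributions cancel in the differences.

```python
# Monoisotopic residue masses (Da) of the 20 unmodified proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.02):
    """Return residues from N-terminus inward, given the masses of the ladder rungs.

    The heaviest rung is the full-length peptide; each lighter rung has lost one
    more N-terminal residue. Ambiguous differences are joined with '/', and
    differences matching nothing within `tol` are reported as '?'.
    """
    ordered = sorted(peaks, reverse=True)
    sequence = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol]
        sequence.append("/".join(hits) if hits else "?")
    return sequence


# Synthetic ladder for a hypothetical N-terminal stretch G-L-F on an arbitrary core mass.
core = 500.0
ladder = [
    core,
    core + RESIDUE_MASS["F"],
    core + RESIDUE_MASS["F"] + RESIDUE_MASS["L"],
    core + RESIDUE_MASS["F"] + RESIDUE_MASS["L"] + RESIDUE_MASS["G"],
]
decoded = read_ladder(ladder)
```

Leu and Ile have identical masses and cannot be separated this way, and Lys and Gln differ by only about 0.036 Da, which is one reason the text speaks of identifying most, rather than all, amino acids from the spectrum.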

2.4 Peptide Ladder Encoding of Libraries

In a subsequent application of the peptide ladder-mass spectrometry sequencing concept to library encoding, Youngquist et al. described a method in which a peptide ladder was encoded during synthesis of an on-bead library [51]. This method allowed the peptides to act as their own endogenous chemical tags. In each cycle of peptide elongation, a small percentage of N-acetylated amino acid was incorporated along with the FMOC-protected amino acid, thus terminating further elongation for this small percentage by capping the N-terminus. After screening, the full-length and chain-terminated peptides, each differing from the next by one amino acid, were released from the resin into individual vessels and analyzed by mass spectrometry. The mass-identity of each residue was obtained in sequential order for each peptide individually, in marked contrast to the pooled sequencing technique. Moreover, this technique was exceedingly fast, cheap, and reliable, and had very low detection limits. However, because truncation products were present during screening, binding could not be unequivocally attributed to the full-length product. Moreover, the non-uniform reactivities of the amino acids during SPPS meant that the ratio of capping to elongation was not equal among amino acids, and thus a bias against the representation of certain residues in the library could not be ruled out, especially for longer sequences.
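The trade-off between ladder signal and full-length peptide can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that the same fraction of chains is capped in every cycle; the text notes that real capping ratios vary between amino acids, so this is an idealization. The function name and the 10% capping fraction are hypothetical.

```python
def ladder_fractions(cap_fraction: float, n_cycles: int):
    """Fraction of chains terminated at each cycle, plus the uncapped full-length fraction.

    Assumes an identical capping fraction in every cycle (an idealization).
    """
    terminated = []
    remaining = 1.0
    for _ in range(n_cycles):
        terminated.append(remaining * cap_fraction)   # chains capped in this cycle
        remaining *= 1.0 - cap_fraction               # chains that continue to elongate
    return terminated, remaining


# Example with an assumed 10% capping per cycle over 5 randomized positions.
rungs, full_length = ladder_fractions(0.10, 5)
```

Under these assumptions the full-length fraction falls as (1 - f)^n, so longer sequences or heavier capping leave proportionally less full-length peptide on the bead during screening.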

2.5 Post-Screening Peptide Ladder Encoding of Library Peptides

Chait’s technique, dubbed partial Edman degradation (PED), was subsequently utilized in our lab for sequencing peptides selected from an on-bead library (Scheme 2.3) [52]. In this way, the peptides presented during screening were nearly all full-length and unbiased, and yet an individual sequence was obtained for each binding peptide rapidly and at low cost. However, the reagent PIC proved difficult to control, and the reproducibility of the PED sequencing results was less than satisfactory. Thus we endeavored to modify the technique by altering the solvent, substituting for PITC, and finally replacing PIC. The eventual result was a more robust, forgiving, and easily optimized protocol based on the substitution of PIC by the less reactive O-succinimide (OSu) esters of benzoic and nicotinic acid.

2.6 Development of an Improved Partial Edman Degradation Technique

In the course of screening pY and N-terminal libraries (design, construction, and screening of libraries are described in Chapters 3 & 4), we, along with colleagues in a neighboring lab, found that the determination of full-length sequences for these short peptides was frustratingly infrequent and irreproducible. Optimization of conditions and the ratio of PITC:PIC reagents against one batch of test beads failed to yield similar results the next day, or even in parallel experiments. A systematic deconstruction of the degradation scheme ensued.

2.7 Experimental Designs and Techniques

2.7.1 General PIC-Based PED and MALDI-TOF MS

Fresh library beads (50-100), each containing ~100 pmol of covalently attached peptide, were placed in a glass-fritted vessel to allow filtering. After washing with dichloromethane (DCM) and methanol (MeOH), the beads were suspended in 250 µL of degradation solvent (1:1 H2O:pyridine). In an Eppendorf microcentrifuge tube, enough degradation solution was prepared for two cycles by adding 110 µL of PITC from a freshly opened ampoule and 5.5 µL of fresh PIC (5% v/v) to 435 µL of pyridine. The degradation solution (250 µL) was added to the suspended beads with rapid mixing by pipette. The mixture was incubated at room temperature for 10 min, drained, washed with MeOH, DCM, and anhydrous trifluoroacetic acid (TFA), and suspended in TFA for 10 min. After draining and washing, the beads were re-suspended in degradation solvent and the cycle was repeated until the final position of the library, at which point only PIC was added.

Submission for MALDI-TOF mass spectrometric analysis was performed virtually unchanged throughout the work presented here. Following any PED procedure, the beads were subjected to a reductive work-up in order to reduce any methionine sulfoxide that may have formed in the course of screening or degrading. Reduction was performed by suspending the beads in the degradation vessel in ~1 mL of TFA on ice. After 5 min on ice, dimethyl sulfide (20 µL) and ammonium iodide (10 mg) were added. The vessels were incubated on ice with intermittent mixing by hand for 20 min, drained, and washed with TFA. Following extensive washing with ddH2O, the beads were transferred to individual microcentrifuge tubes and allowed to dry for 1 hr. The peptide ladders were then cleaved from the resin by treatment overnight in the dark with 20 µL of 70% TFA containing CNBr (20 mg/mL). Upon evacuation to dryness in a SpeedVac, the peptide mixtures were re-suspended in 5 µL of 0.1% TFA, from which 1 µL of the peptide solution was mixed with 2 µL of 0.1% TFA in 50% acetonitrile saturated with 4-hydroxy-α-cyanocinnamic acid and spotted onto a 96-well MALDI-TOF sample plate. MALDI-TOF mass spectrometry was performed on a Bruker Reflex III instrument in an automated manner. Sequence determination from the mass spectra was performed manually.
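The volumes quoted for the PIC-based degradation solution can be checked with a short calculation. This is a sketch only: the function name and dictionary keys are hypothetical, and reading "5% v/v" as PIC relative to PITC is an interpretation inferred from the quoted volumes (5.5 µL against 110 µL), not something stated explicitly.

```python
def pic_degradation_mix(pitc_ul=110.0, pic_ul=5.5, pyridine_ul=435.0,
                        per_cycle_ul=250.0):
    """Summarize the PIC-based degradation solution from its component volumes."""
    total_ul = pitc_ul + pic_ul + pyridine_ul
    return {
        "total_ul": total_ul,
        # Number of whole 250 µL portions available from one batch.
        "full_cycles": int(total_ul // per_cycle_ul),
        # Assumed meaning of "5% v/v": PIC volume relative to PITC volume.
        "pic_vs_pitc_percent": 100.0 * pic_ul / pitc_ul,
        "pitc_percent_of_mix": 100.0 * pitc_ul / total_ul,
    }


if __name__ == "__main__":
    for key, value in pic_degradation_mix().items():
        print(f"{key}: {value:.2f}")
```

With the stated volumes the batch totals 550.5 µL, which covers two 250 µL portions, and PITC makes up about 20% of the solution before it is diluted with the bead suspension.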

2.7.2 Modified PIC-Based Procedures

Experiments were attempted in which 1) the ratio of PIC:PITC was varied from 0.5% to 20%; 2) the solvent ratio was altered from 3:1 to 1:3 H2O:pyridine; 3) the reaction temperature was increased to 42 ºC; and 4) combinations of these alterations were explored. Some conditions, notably the higher PIC concentrations, accentuated the N-terminal residues but left the later positions un-interpretable. Conversely, some conditions favored the C-terminus to the detriment of the earlier residues. Moreover, when promising conditions were found, they could not be reproduced consistently.

The next course of action was to experiment with a PITC analogue. It was expected that PIC was more electrophilic than PITC, and so an attempt was made to increase the latter’s reactivity towards the N-terminus during the coupling phase. However, it has long been appreciated that increased electrophilicity during the coupling phase results in decreased nucleophilicity of the thiocarbamoyl group during the acid-promoted cleavage phase. Nevertheless, p-nitrophenylisothiocyanate was substituted for PITC in three experiments. Competition reactions containing 0.9, 1.8, and 3.5% PIC (v/v) were performed. Because of the electron-withdrawing effect of the nitro substituent, the TFA phase of the cycle was doubled to 20 min to promote cyclization and cleavage.

2.7.3 Replacement of PIC by OSu-Esters

Following attempts to replace PITC, a more controllable capping versus degradation competition chemistry was still desirable. The remaining option was to replace PIC with an amine-reactive species that would deliver an acid-stable moiety. Sometimes referred to as N-hydroxysuccinimidyl esters, OSu derivatives of carboxylic acids have been used to good effect as relatively stable amine-reactive agents capable of protein modification in aqueous environments. In our own lab, we have used D-biotin-OSu to label proteins for screening (Chapters 3 & 4). A strategy was devised in which inexpensive, commercially available OSu derivatives would be substituted for PIC, beginning with benzoic acid-OSu (Bz-OSu). This reagent yielded significant improvements and promising results from the very first attempt, but eventually was itself replaced by nicotinic acid-OSu (Nic-OSu). Three additional OSu-esters were synthesized for use as novel capping reagents from (3-carboxypropyl)trimethylammonium chloride, p-bromobenzoic acid, and N-methylnicotinic acid.

2.7.4 General OSu-Ester-Based PED

The method is subtly modified from the PIC-based procedure. Approximately 50 beads were placed in the degradation vessel and washed as before. A stock solution of 500 mM Bz-OSu or 400 mM Nic-OSu [crystallized from ethyl acetate (EtOAc) prior to use] was made in pyridine. The degradation solution (400 µL) was prepared anew every two cycles to contain 5% PITC (20 µL, 0.13 mmol) and OSu ester at a 0.67 mol ratio relative to PITC (0.087 mmol) in pyridine. After suspending the beads in 160 µL of the degradation solvent (2:1 pyridine:H2O), an equal volume of degradation solution was added and the vessel was swirled by hand for mixing (instead of mixing by pipette) prior to placement on the hub of a rotary shaker. The coupling reaction proceeded for 6 min before draining and washing as before. Aside from altering the TFA cleavage time from 10 min to 2 x 6 min, all other aspects remained unchanged from the PIC-based procedures.
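As a worked example of the recipe above, the following sketch converts the stated amount of OSu ester (0.087 mmol) into a volume of the 500 mM or 400 mM stock and computes how much additional pyridine brings the mixture to 400 µL. The function and key names are hypothetical; the millimole figures are taken directly from the text rather than recomputed from densities.

```python
def osu_degradation_mix(stock_mM, osu_mmol=0.087, pitc_ul=20.0,
                        pitc_mmol=0.13, total_ul=400.0):
    """Volumes needed to assemble one 400 µL batch of OSu degradation solution."""
    # mmol divided by mmol/L gives litres; scale to microlitres.
    stock_ul = osu_mmol / stock_mM * 1e6
    return {
        "stock_ul": stock_ul,
        "extra_pyridine_ul": total_ul - pitc_ul - stock_ul,
        "osu_to_pitc_mol_ratio": osu_mmol / pitc_mmol,
        "pitc_percent_v_v": 100.0 * pitc_ul / total_ul,
    }


if __name__ == "__main__":
    for label, conc in (("Bz-OSu", 500.0), ("Nic-OSu", 400.0)):
        mix = osu_degradation_mix(conc)
        print(label, {k: round(v, 2) for k, v in mix.items()})
```

Under these assumptions a batch takes 174 µL of the Bz-OSu stock (plus 206 µL pyridine) or 217.5 µL of the Nic-OSu stock (plus 162.5 µL pyridine), with an OSu:PITC mole ratio of about 0.67.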

2.7.5 Synthesis of Alternative OSu-Esters

The OSu-esters of (3-carboxypropyl)trimethylammonium chloride, p-bromobenzoic acid, and N-methylnicotinic acid were synthesized for testing in the PED scheme. A trimethylammoniumbutyric acid-OSu chloride activated ester derived from (3-carboxypropyl)trimethylammonium chloride was synthesized by Anthony Simpson and tested in the PED scheme without purification. The p-bromobenzoic acid-OSu molecule was synthesized by dissolving the acid (4.0 g, 20 mmol) in anhydrous tetrahydrofuran (THF) (50 mL), followed by the addition of N-hydroxysuccinimide (NHS) (2.75 g, 24 mmol) and, lastly, 1,3-diisopropylcarbodiimide (DIC) (3.75 mL, 24 mmol). The reaction was left stirring overnight under argon. The cloudy white mixture was filtered and washed with DCM and THF to remove the diisopropylurea by-product. TLC and NMR showed the product to be very impure. Flash column chromatography was performed on a small amount (~0.5 g) using 2:1 hexanes:EtOAc as the mobile phase. NMR showed the purified product to be pure, but its near-complete insolubility in water-miscible solvents made this OSu-ester ill-suited to PED and it was not tested. Lastly, N-methylnicotinic acid-OSu iodide was synthesized from the Nic-OSu already present in the lab in a manner similar to Tadjamulia, et al. [53]. Nic-OSu (3.5 g, 16 mmol) was taken up in freshly prepared anhydrous acetone (65 mL). Iodomethane (2 mL, 32 mmol) was added, a condenser was fitted, and the reaction was heated to 45 ºC for 7 hr. The solid was collected by filtration, washed with acetone, and dried in vacuo. The measured melting point (202 ºC) was well below the literature value (222 ºC), indicating impurity. The product was tested in the PED method in crude form.

2.7.6 Synthesis of Pro- and Trp-Containing Test Sequences

To test the mass spectrometric patterns observed for Pro and Trp residues resulting from PED of the library, two known Pro- or Trp-containing sequences were synthesized. The sequences were chosen to match two putative library-derived sequences for comparison. The Pro and Trp test sequences, FRAPLNβRM-resin and DFWYLNβRM-resin, respectively, were synthesized on TentaGel S NH2 resin according to standard FMOC/piperidine protocols employing 4 equivalents of Fmoc-amino acids, HBTU, and HOBt and 8 eq. of N-methylmorpholine (NMM) as base in N,N-dimethylformamide (DMF) for 1 hr. Reaction completion was indicated by the absence of color change in the ninhydrin (Kaiser) test. Final de-protection was performed by treatment with Reagent K [7.5% phenol (w/v), 5% water (v/v), 5% thioanisole (v/v), 2.5% ethanedithiol (v/v), 1% anisole (v/v), and 1% triisopropylsilane (v/v) in TFA] for 1 hr at room temperature followed by extensive washing. Degradation and MALDI-TOF MS were as before.

2.7.7 Activated-Disulfide Resin Synthesis for Use in Native Protein PED

An attempt was made to apply the PED technique to the sequencing of full-length proteins reversibly immobilized on a solid-support. Disulfide chemistry was chosen as the means of immobilization, with reduction allowing facile release from the resin. Thus, 100 mg of TentaGel M NH2 (10 µm) (Rapp Polymere GmbH) resin was functionalized with a cysteine residue by reaction with 4 equivalents of Fmoc-amino acid as in 2.7.6 above. The N-terminus was acetylated by 8 eq. of acetic anhydride with 0.1% N,N-dimethylaminopyridine (DMAP) catalyst in 1:1 DCM:DMF for 1 hr at room temperature. This capping reaction was repeated once. Removal of the S-trityl protection of Cys was effected by Reagent K treatment for 1 hr followed by extensive washing.

Next, a resin-bound activated disulfide was prepared. The resin was suspended in 3 mL of Buffer 1 before the addition of 50 µL of 100 mM tris(2-carboxyethyl)phosphine (TCEP) for 15 min. Immediately after washing with water, the beads were suspended in 3 mL of 15 mg/mL NaHCO3 containing ~150 mg of 5,5’-dithiobis(2-nitrobenzoic acid) (DTNB) for 5 min. This charging of the Cys side-chain was repeated once before the resin was washed with water and MeOH. After drying, the resin was stored at -20 ºC until used.

An alternative solid support was synthesized by derivatizing cysteinyl-Amino BioMac1800 (Biosearch Technologies) macroporous resin with Aldrithiol-2 (2,2’-dithiodipyridine). Briefly, 100 mg of the BioMac resin was reacted with Cys and acetylated as above. After de-protection and reduction as before, the resin was suspended in 3 mL of ethanol containing 4 eq of Aldrithiol-2 for 2 hr. The resin was washed, dried, and stored at -20 ºC.

2.7.8 Immobilization of Tryptic Digest of Co2+ E. coli Peptide Deformylase

The Co2+ substituted C-terminally six-His-tagged peptide deformylase from E. coli (EcPDF-His6) was purified in our lab by Kiet Nguyen and Grace Zhou according to the reported protocol [54]. A glycerol stock of the protein was buffer-exchanged into Buffer 2 by passage through a Sephadex G-25 fast de-salting column and quantitated by the Bradford method (BioRad). Immobilization was performed in two ways: 1) the trypsin digest of EcPDF was performed under de-naturing conditions before immobilization, and 2) EcPDF was de-natured and immobilized, and then digested on the resin by trypsin. In the first case, to 47 µL containing 500 ng (~240 pmol) of EcPDF in Buffer 2 was added 33 µL of 3X Buffer 3. The solution was heated to 75 ºC for 2.5 min in a heating block before the addition of ice-cold ddH2O (19 µL) and incubation on ice for 5 min. After 5 min at room temperature, 1 µL of trypsin (1 mg/mL in Buffer 4) was added and the solution was placed in a 37 ºC oven for 9 min. The digest mixture was then transferred to 1.5 mg of a mixed disulfide resin suspended in Buffer 5 and incubated for 45 min. This mixture was transferred to a degradation vessel by several rinses with ddH2O. The resin was further washed with water and MeOH, followed by 5 cycles of PED with Nic-OSu. Subsequent reduction of the disulfide linkage was performed overnight in a microcentrifuge tube in the presence of 2 eq of TCEP in 45 µL water. Samples were de-salted prior to submission for MALDI-TOF analysis by performing two ZipTip (Millipore) C18 pipette tip purifications on each according to the manufacturer’s protocol.

The on-resin trypsin digestion was only performed using the DTNB-activated TentaGel resin. Two 10 mg aliquots of resin were suspended in 50 µL of 3X Buffer 3 at pH 8.4 or pH 7.8 in Micro BioSpin columns (0.8 mL, BioRad), to each of which was added 60 µL of EcPDF (4.2 mg/mL). The mixtures were heated to 75 ºC for 3 min and then placed at room temperature for 1 min before draining and washing with water. This disulfide coupling reaction was repeated once. The resin was washed with water and suspended in Buffer 6 containing 3 µL trypsin (1 mg/mL) before incubating at 37 ºC for 6 hr. The digest was repeated with fresh trypsin for another 6 hr. Extensive washing preceded TCEP treatment for disulfide release and ZipTip clean-up for MALDI-TOF submission.

A virtual trypsin digest with [M+H]+ fragment calculations (Table 2.1) was performed using the PeptideMass program found on the ExPASy Proteomics Server web site: www.expasy.org. Only two Cys residues are present in EcPDF, and thus only the two Cys-containing fragments are expected in the mass spectrum.
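The [M+H]+ values for the fragments of interest can be reproduced with a few lines of Python. This is an independent sketch, not the PeptideMass program: the fragment list is copied from Table 2.1 rather than generated by a cleavage rule, the residue masses are standard monoisotopic values entered by hand, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da), entered by hand.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per free peptide
PROTON = 1.007276   # charge carrier for [M+H]+

# Tryptic fragments as listed in Table 2.1 (not recomputed here).
TABLE_2_1_FRAGMENTS = (
    "MSVLQVLHIPDER", "LR", "K", "VAKPVEEVNAEIQR",
    "IVDDMFETMYAEEGIGLAATQVDIHQR", "IIVIDVSENR", "DER", "LVLINPELLEK",
    "SGETGIEEGCLSIPEQR", "ALVPR", "AEK", "VK", "IR",
    "ALDRDGKPFELEADGLLAICIQHEMDHLVGK", "LFMDYLSPLK", "QQR", "IR",
    "QK", "VEK", "LDR", "LK", "AR", "A",
)


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[res] for res in peptide) + WATER + PROTON


def cys_fragments(fragments=TABLE_2_1_FRAGMENTS):
    """Fragments that can be captured through a disulfide, with [M+H]+."""
    return [(frag, mh_plus(frag)) for frag in fragments if "C" in frag]


if __name__ == "__main__":
    for frag, mass in cys_fragments():
        print(f"{mass:10.4f}  {frag}")
```

The two Cys-containing entries come out near 1804.838 and 3433.724, matching the corresponding rows of Table 2.1.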

2.8 Results and Discussion

Despite its successful employment in our laboratory during the sequencing of a tetrapeptide library [52], as described earlier in the text, the PIC-based partial Edman degradation technique proved problematic and unreliable. Our endeavors to improve upon the procedure involved analyzing each component of the reaction including the solvent composition, the degradation and capping reagents, and the temperature and duration of the reaction. Reliability was finally achieved upon replacement of the highly reactive capping reagent PIC by the more controllable OSu-based reagents. The full-length sequencing success rate achieved with Bz-OSu averaged 95% in five trial degradations of ~50 beads from the pentapeptide NH2-XXXXLNββRM-resin library (example mass spectrum in Fig. 2.2).

Next, the library NH2-TAXXpYXXXLNββRM-resin was tested because it presented a much more daunting challenge. First, three constant residue positions must be partially degraded in addition to the five random ones (eight total cycles, plus a final cap) in order to achieve full-length sequences. This offered many more opportunities for error and tested the forgiving nature of the method. Second, the incorporation of phosphotyrosine (pY) in the library had repeatedly been demonstrated in our lab to decrease the detection of residues N-terminal to it. This is believed to be due to the strong double negative charge imparted by the phosphate moiety, which must be overcome during ionization in the positive mode. Thus, every peptide fragment that contains the pY residue (i.e. all truncation products N-terminal to pY) has a handicap toward ionization in the positive mode due to the inherent chemical nature of pY. The inclusion of Arg in the linker as a locus of positive charge acceptance is meant to overcome this handicap, but its compensation was incomplete, as demonstrated in Figure 2.3.

However, the inclusion of pY residues in libraries is necessary for screening SH2 or PTB domains, and thus sequencing problems associated with it must be overcome.

With this in mind, one degradation of the pY library was performed with Bz-OSu and one with Nic-OSu as capping reagent. Only 24 beads from each were sequenced, but the Nic-OSu reagent produced a higher percentage (92%) of full-length readable mass spectra compared to the benzoylating agent (75%). It seemed likely that the conditions could be improved for both reagents, but Nic-OSu was more soluble and easier to work with than the Bz analogue. Moreover, it was reasoned that the mildly basic nicotinoyl appendage might work to restore a measure of the basicity lost upon acylation of the N-terminus, and thus improve the sensitivity of the detection method during positive ion mode, especially for pY peptides. Thus, Bz-OSu was abandoned in favor of its nicotinoyl cousin.

Non-commercial alternatives to Nic-OSu were explored as well. In attempts to increase ionization efficiency, and thus lower detection limits, the incorporation of permanent positive charges at the N-terminus was explored via the trimethylammoniumbutyric acid and N-methylnicotinic acid capping reagents. These OSu-esters were tested in the PED method in crude form without purification and without success. If successful, these reagents would have been ideally suited for the native protein degradation technique explored later, since incorporation of a strongly basic ionizing site in the linker would not be possible in this situation (vide infra). Moreover, lowering detection limits would also be desirable in library sequencing applications, since lowered limits translate to smaller bead size, and thus, larger diversity and greater numbers of randomized positions. However, easy success with the Nic-OSu agent and difficulties encountered with solubilizing and purifying these reagents led to their abandonment after merely cursory examination in the application of library sequencing.

One of the limitations of the PED technique described here is the low rate of detection of Pro and Trp residues as their residue masses. Instead, these amino acids were initially detected in absentia (i.e. gaps in sequences that corresponded to the weight of either Pro or Trp plus some other amino acid, see top-most peak differences “P+A” and “W+F” in Figure 2.4). Upon inspection of these gaps, consistent patterns were recognizable. To test the authenticity and reproducibility of these gaps and patterns, two resin-bound test peptides of known sequence, FRAPNββRM* and DFWYNββRM* (where M* represents the homoserine lactone produced by CNBr cleavage), were synthesized corresponding to putative library-derived sequences. Comparison of mass spectra of the known and putative sequences revealed virtually identical patterns, confirming the identities of Pro and Trp residues in these sequences. Figure 2.4 illustrates the MALDI-TOF spectra of Bz-OSu degradations of the known Pro- and Trp-containing peptides.

For Pro, gaps reproducibly included a group of peaks at mass-differences 94, 112, 126, and 128 units from the preceding amino acid peak. Analysis of this pattern begins with the assumptions that the secondary amine of Pro 1) fails to react with the OSu-ester, but does react with PITC, and 2) forms a relatively stable phenylthiocarbamate (PTC). The peak at 618 corresponds to the benzoylated peptide fragment Bz-NββRM*. The assumed failure of the N-terminal Pro to react with Bz-OSu is confirmed by the lack of a 715 peak. However, the uncapped peptide NH2-PNββRM* should appear at a mass of 611 (715 minus 104, the mass of the benzoyl group less 1 proton), but is also absent. If NH2-PNββRM* reacted with PITC, the PTC-PNββRM* peptide would appear at 746 (611 plus 135), which is present and accounts for the 128 mass-difference. The slow cyclization and cleavage of PTC-Pro adducts in the Edman degradation reaction is well characterized, alternatively being referred to as “lag” or “carryover” since the residue appears in the HPLC chromatograms of subsequent amino acids during cycling [55]. Additionally, automated Edman sequencing is performed under highly optimized conditions that include an anoxic atmosphere to prevent the substitution of oxygen for sulfur in the PTC-adduct (-16 mass units), a known occurrence for degradation reactions performed in non-inert atmospheres. This mechanism accounts for the 730 peak (112 mass-difference). Lastly, a loss of water (-18 mass units) accounts for the 712 peak (94 mass-difference). Thus, all but the 126 mass-difference can be rationalized according to known aspects of Edman chemistry.
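The bookkeeping behind the Pro gap can be laid out explicitly. The sketch below uses only the nominal masses quoted in the text; the function and key names are hypothetical labels for the species discussed, not established nomenclature.

```python
def pro_gap_peaks(preceding_peak=618):
    """Nominal masses of the Pro-related species and their offsets from the
    preceding rung (Bz-NββRM* at 618)."""
    PRO = 97        # nominal residue mass of Pro
    BENZOYL = 104   # benzoyl cap less one proton, as quoted in the text
    PITC = 135      # nominal mass of PITC
    bz_pro = preceding_peak + PRO      # expected benzoylated rung (not observed)
    uncapped = bz_pro - BENZOYL        # free N-terminal Pro peptide (not observed)
    pitc_adduct = uncapped + PITC      # Pro peptide after reaction with PITC
    oxygen_analogue = pitc_adduct - 16 # oxygen substituted for sulfur
    dehydrated = oxygen_analogue - 18  # subsequent loss of water
    masses = {
        "bz_pro": bz_pro,
        "uncapped": uncapped,
        "pitc_adduct": pitc_adduct,
        "oxygen_analogue": oxygen_analogue,
        "dehydrated": dehydrated,
    }
    # Offset of each species from the preceding rung.
    return {name: (mass, mass - preceding_peak) for name, mass in masses.items()}


if __name__ == "__main__":
    for name, (mass, offset) in pro_gap_peaks().items():
        print(f"{name:16s} {mass:4d}  (+{offset})")
```

The three rationalized offsets (128, 112, and 94) fall out of this arithmetic; the 126 offset has no counterpart in it, consistent with the text.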

For Trp-containing gaps, peaks were consistently observed 183, 199, and 215 mass units from the preceding peak. Tryptophan’s historical failings in the Edman degradation methodology are related to its proclivity for oxidation [56]. Moreover, various oxidized and doubly-oxidized species have been observed for tryptophan in mass spectra [57]. These oxidized Trp species display a characteristic step-wise +16 mass unit pattern beginning from the native Trp peak (Fig. 2.5). This same incremental +16 pattern is present in the Trp gap of Figure 2.4B (964, 980, and 996), corresponding to the 1516, 1532, and 1548 peaks of Figure 2.5. However, the peak at 964 is three mass units shy of the expected mass of the Bz-WYNββRM* peptide fragment. By analogy to the situation involving Pro, in which the 94 mass-difference peak was also 3 mass units shy of the expected Bz-Pro fragment, one can easily extrapolate the situation to include two +16 side-chain oxidation events.
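The Trp pattern can be checked the same way. In this sketch the preceding rung at 781 is not quoted in the text; it is inferred here as 964 minus 183, and the function name is hypothetical.

```python
TRP = 186  # nominal residue mass of Trp


def trp_gap(preceding_peak=781, offsets=(183, 199, 215)):
    """Observed Trp-gap peaks, their spacing, and the shortfall of the first
    peak relative to the expected benzoylated Trp rung."""
    observed = [preceding_peak + d for d in offsets]
    steps = [b - a for a, b in zip(observed, observed[1:])]
    shortfall = (preceding_peak + TRP) - observed[0]
    return observed, steps, shortfall


if __name__ == "__main__":
    print(trp_gap())
```

This gives peaks at 964, 980, and 996, steps of +16 between them, and a first peak 3 units below the expected 967.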

As a result of the high success rate (generally >90%) of the Nic-OSu PED technique applied to synthetic library sequencing, we began considering its application to intact protein sequencing as a sensitive, inexpensive, and quick alternative to tandem mass spectrometric (MS/MS) and traditional Edman degradation methods. The real challenge was finding an immobilization strategy that would withstand repetitive treatments by acid, base, aqueous, and organic solvents without significant sample loss, while also releasing the peptides for analysis at the end. We decided covalent attachment was least likely to incur sample loss during chemical cycling, and a disulfide linkage would allow for facile release for mass spectrometric analysis. Moreover, the N-terminus required for degradation would be unaffected by this chemistry and the relative infrequency of cysteine residues in proteins would prevent the overcrowding of the spectrum.

To test this strategy, activated mixed-disulfide resins were synthesized using DTNB and dithiopyridyl derivatives of hydrophilic solid supports. The micro-spherical TentaGel support failed to capture analyte under all conditions attempted, but the macroporous BioMac resin employing the less reactive dithiopyridyl disulfide yielded some success. The Co2+-substituted peptide deformylase of E. coli (EcPDF) was selected as the trial protein because it contains two Cys residues and was available in abundance in our lab (Table 2.1). Conditions were found under which EcPDF was rapidly and efficiently de-natured and digested in solution by trypsin such that the cysteine thiols were not oxidized. After immobilization, several cycles of PED were performed and the peptide mixture was eluted from the resin by treatment with TCEP. An enhanced MALDI-TOF signal-to-noise ratio was obtained if the TCEP salt was removed by ZipTip mini-C18 purification prior to submission (Fig. 2.6). Under the optimal conditions tested, detection limits in the high picomolar-range were the lowest achieved for the cysteine-containing fragment terminated by Arg. The Cys-containing fragment terminating in Lys was not detectable in most experiments. The lack of detection of this peptide may have resulted from the faster oxidation rate of the Cys residue of this peptide or from the lowered ionization efficiency of Lys relative to Arg. As a result of the relatively high detection limit of the Arg peptide and the lack of detection for the other, this methodology was not pursued.

2.9 Conclusion

The application of PED to support-bound peptides offers a rapid, inexpensive, and highly successful means of sequencing the large numbers of positive beads expected from library screening. Moreover, the elucidation of discrete, individual sequences is an important aspect for the determination of multiple binding motifs and the discovery of contextual relationships among ligand residues. On nearly all levels then, this technique appears superior to pooled sequencing techniques.

An added bonus of the technique is the side-chain acylation of lysine residues with Nic-OSu. Because the side-chain modification shifts the Lys mass-difference from 128 to 233 units, Lys can easily be distinguished from Gln. Thus, libraries can be even more homogeneous in the display of full-length ligands during screening, since molecular weight degeneracy can be removed during sequencing rather than during synthesis. The replacement of chain-termination encoding during synthesis marks a potentially powerful improvement in library screening, since non-uniform coupling reactivities among amino acids can lead to disparities and biases in ligand presentation.
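The Lys/Gln degeneracy and its removal can be illustrated numerically. The values below are standard monoisotopic masses entered by hand (the nicotinoyl increment corresponds to C6H3NO); the function name is hypothetical.

```python
GLN = 128.05858         # Gln residue mass (Da)
LYS = 128.09496         # Lys residue mass (Da)
NICOTINOYL = 105.02146  # mass added by one nicotinoyl group (C6H3NO)


def lys_gln_spacings():
    """Rung spacings expected for Gln, unmodified Lys, and side-chain
    nicotinoylated Lys."""
    return {"Gln": GLN, "Lys": LYS, "Lys(Nic)": LYS + NICOTINOYL}


if __name__ == "__main__":
    for name, mass in lys_gln_spacings().items():
        print(f"{name:9s} {mass:9.4f}  (nominal {round(mass)})")
```

Unmodified Lys and Gln differ by only about 0.036 Da, whereas the acylated Lys spacing of roughly 233 is separated from Gln by more than 100 Da.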

A potential limiting facet of the technique, however, may be its difficulty with Pro and Trp residues. While both display reproducible patterns allowing their identification, sequences rich with multiples of either residue may become problematic. On the whole, post-screening peptide-ladder sequencing is an improvement over previous methods.

[Scheme: the peptide AA1-AA2-AA3-AAn is coupled with PITC to give PTC-AA1-AA2-AA3-AAn; acid (H+) releases PTH-AA1, which is identified by HPLC, and the shortened peptide AA2-AA3-AAn re-enters the cycle.]

Scheme 2.1. Traditional Edman degradation. PITC, phenylisothiocyanate. PTC, phenylthiocarbamate. PTH, phenylthiohydantoin.

[Scheme: in each cycle the peptide AA1-AA2-AA3-AAn is treated with PITC + 5% PIC, giving a mixture of PTC-AA1-AA2-AA3-AAn and PC-AA1-AA2-AA3-AAn; acid (H+) cleaves PTH-AA1 (discarded) from the PTC-peptide to leave AA2-AA3-AAn, while the PC-capped peptide is unchanged; after n cycles the mixture of PC-AA1-AA2-AA3-AAn, PC-AA2-AA3-AAn, PC-AA3-AAn, and PC-AAn is analyzed by mass spectrometry.]

Scheme 2.2. Peptide ladder generation in solution phase via partial Edman degradation. PITC, phenylisothiocyanate. PIC, phenylisocyanate. PTC, phenylthiocarbamate. PTH, phenylthiohydantoin. PC, phenylcarbamate.

Figure 2.1. MALDI-MS obtained by Chait et al. [50] detecting ~25 pmol total peptide (<~5 pmol per species) illustrating the protein ladder sequencing concept. The spectrum represents the eight N-terminal amino acids of the 14-mer [Glu1] fibrinopeptide B.

Scheme 2.3. Generation of the peptide ladder on the solid support. (a) 3:2 PITC/Bz-NHS in 4:1 pyridine/water; (b) TFA. Bz, benzoyl. Subsequent cleavage of the peptides from the resin by CNBr allows analysis by MALDI-TOF MS.

Figure 2.2. MALDI-TOF mass spectrum of a peptide and its truncation products after cleavage from the support by CNBr. The sequence of the peptide is NH2-NIEILNββRM*, where M* is homoserine lactone generated during CNBr treatment of Met.

[Spectrum: labeled peaks at m/z 732.58, 845.69, 992.79, 1063.85, 1306.94, 1421.00, 1534.09, 1605.04, and 1706.15; rung labels I, F, A, pY, N, I, A, T.]

Figure 2.3. MALDI-TOF MS of a pY peptide demonstrating the lowered ionization efficiency of the N-terminus complicating full-length sequencing. The sequence of the peptide is NH2-TAINpYAFILNββRM*, where M* is homoserine lactone generated during CNBr treatment of Met.

Figure 2.4. A. MALDI-TOF MS of the sequence FRAPNββRM*. B. MALDI-TOF MS of the sequence DFWYNββRM*. Although neither residue is often seen as its residue mass, both are easily recognized by their consistent, alternative patterns. M* is the homoserine lactone generated from Met during CNBr cleavage from the resin.

Figure 2.5. The step-wise +16 tryptophan oxidation pattern detected by MALDI-TOF MS by Bienvenut and colleagues [57].

A.

  1 MSVLQVLHIP DERLRKVAKP VEEVNAEIQR IVDDMFETMY AEEGIGLAAT QVDIHQRIIV
 61 IDVSENRDER LVLINPELLE KSGETGIEEG CLSIPEQRAL VPRAEKVKIR ALDRDGKPFE
121 LEADGLLAIC IQHEMDHLVG KLFMDYLSPL KQQRIRQKVE KLDRLKARA

B. Calculated Monoisotopic Mass: 19316.28

Mass        Position   Peptide Sequence
1536.8202   1-13       MSVLQVLHIPDER
 288.2030   14-15      LR
 147.1128   16-16      K
1581.8594   17-30      VAKPVEEVNAEIQR
3052.4390   31-57      IVDDMFETMYAEEGIGLAATQVDIHQR
1157.6524   58-67      IIVIDVSENR
 419.1885   68-70      DER
1280.7824   71-81      LVLINPELLEK
1804.8381   82-98      SGETGIEEGCLSIPEQR
 555.3613   99-103     ALVPR
 347.1925   104-106    AEK
 246.1812   107-108    VK
 288.2030   109-110    IR
3433.7242   111-141    ALDRDGKPFELEADGLLAICIQHEMDHLVGK
1226.6489   142-151    LFMDYLSPLK
 431.2361   152-154    QQR
 288.2030   155-156    IR
 275.1714   157-158    QK
 375.2238   159-161    VEK
 403.2299   162-164    LDR
 260.1968   165-166    LK
 246.1560   167-168    AR
  90.0549   169-169    A

Table 2.1. A. The amino acid sequence of EcPDF. B. Virtual tryptic digest of EcPDF with calculated [M+H]+ monoisotopic masses of the fragments. The two Cys-containing peptides are highlighted with the Cys residues in bold.

[Spectrum: labeled peaks at m/z 1535.53, 1636.57, 1765.61, 1822.62, and 1909.68; rung labels G, S, T, E; m/z axis 1500 to 2000.]

Figure 2.6. MALDI-TOF MS demonstrating the N-terminal sequencing of 500 ng (~240 pmol) of EcPDF by PED conducted by reversible disulfide immobilization. Note the masses of the peptides are 105 units larger than [M+H]+ calculated in Table 2.1 due to the mass of the nicotinoyl cap.
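The 105-unit offset noted in the caption, and the rungs beneath the full-length peak, can be reproduced from the Arg-terminated Cys fragment in Table 2.1. This is a sketch under stated assumptions: residue masses and the nicotinoyl increment (C6H3NO) are standard monoisotopic values entered by hand, the function names are hypothetical, and the observed values are the peak labels from the Figure 2.6 spectrum.

```python
# Standard monoisotopic residue masses (Da), entered by hand.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
NICOTINOYL = 105.02146  # mass added by the nicotinoyl cap

ARG_FRAGMENT = "SGETGIEEGCLSIPEQR"  # residues 82-98 in Table 2.1
FIG_2_6_PEAKS = [1909.68, 1822.62, 1765.61, 1636.57, 1535.53]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[res] for res in peptide) + WATER + PROTON


def nic_ladder(peptide, cycles):
    """[M+H]+ of the Nic-capped peptide after removing 0..cycles residues
    from the N-terminus."""
    return [mh_plus(peptide[k:]) + NICOTINOYL for k in range(cycles + 1)]


if __name__ == "__main__":
    for calc, obs in zip(nic_ladder(ARG_FRAGMENT, 4), FIG_2_6_PEAKS):
        print(f"calculated {calc:9.2f}   observed {obs:9.2f}")
```

Each calculated rung lies within about 0.2 Da of the corresponding labeled peak, and the successive spacings correspond to S, G, E, and T, the first four residues of the fragment.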

CHAPTER 3

DETERMINATION OF THE PHOSPHOPEPTIDE LIGAND SPECIFICITIES OF

SHP-2 AND SHIP SH2 DOMAINS BY COMBINATORIAL PEPTIDE

LIBRARY SCREENING

3.1 Introduction

Protein-protein interactions are an integral component of many cellular processes such as intracellular signaling. Frequently, the interactions are mediated by modular domains, which can recognize small, specific peptide motifs in their partner proteins. The Src homology 2 (SH2) domain, which binds to specific phosphotyrosyl (pY) peptides, was one of the first examples of such domains [43-45]. A large number of SH2 domains are now known and it has been estimated that the human genome encodes at least 115 SH2 domains [58]. Each SH2 domain interacts with a unique subset of pY peptides and the sequence specificity is primarily determined by the three amino acids immediately C-terminal to pY. Since the initial discovery of the SH2 domain, many additional types of modular recognition domains have been discovered (e.g., BIR, SH3, PDZ, FHA, etc.) [59]. However, for the majority of these domains, their sequence specificity or in vivo interaction partners are currently unknown.

One approach to sorting out the complex protein-protein interaction network is to determine the sequence specificity of these modular domains through the screening of 36 combinatorial peptide libraries and then use the consensus sequence(s) to search the protein databases. Several combinatorial methods have been reported. In their pioneering work with SH2 domains, Cantley and co-workers employed affinity columns containing an immobilized SH2 domain to enrich SH2-binding sequences from a pY peptide library [47], a technique later expanded upon by others [60]. Sequencing of the enriched peptide pool by conventional Edman degradation revealed the preferentially selected amino acid(s) at each position. A variation of this method involved screening support-bound libraries against a GST-SH2 domain and detecting with fluorescently lablelled αGST antibodies [61]. The positive beads with the bound SH2 were removed from the library using a fluorescence-activated bead sorter and all of the selected beads were pooled and sequenced by Edman degradation. As described in Chapter 2, this method of sequencing provides information on the most preferred amino acid(s) at each position but, importantly, does not give individual sequences. Since the method selects for both affinity and abundance of certain types of sequences, a high-affinity peptide of low abundance may not emerge from the single derived consensus sequence. A second method involves the iterative synthesis and screening of sub-libraries or “positional scanning” [5]. However, in addition to being highly labor intensivethis method suffers from the same drawbacks and lack of contextual sequence information as the first method. In a third phage display method, bacteriophage bearing short random peptide sequences were selected against an immobilized modular domain [15, 62, 63]. The sequences of the binding peptides were determined after iterative amplification and selection of the bound phage by DNA sequencing. 
This method is highly effective for modular domains that recognize unmodified peptides but generally does not work well with protein domains that recognize post-translationally modified peptides [63, 64]. Here we describe another method, in which resin-bound peptide libraries are selected against a

protein receptor and the positive beads are removed from the library and individually

sequenced by partial Edman degradation, a high-throughput technique recently developed

by this laboratory. Our method gives a statistically significant number of individual

sequences, from which a consensus sequence(s) can be derived. This method is applied

to determine the sequence specificity of three SH2 domains from phosphatases SHP-2

and SHIP.

3.2 Experimental Procedures

3.2.1 Vector Constructs

Originally, unmodified pMAL-c2 (New England Biolabs, NEB) and pGEX-2T

(Pharmacia) constructs were used for the expression of fusion proteins for library

screening. However, pMAL-c2 and several additional vectors were modified to enhance

some aspect of affinity tag purification, solubility, expression, etc, relative to the original.

First, two very similar pET-MAL vectors, pET-MAL I and II, were created by sub-cloning the malE gene by PCR from pMAL-c2 into pET-28a (Novagen) with retention of all pMAL multiple cloning sites (MCS) (Appendix Scheme A1). The fragment containing the malE gene and MCS was amplified with the following primers: sense, 5’-CCG AAC

TCC ATA TGA AAA TCG AAG AAG GTA AAC TGG-3’; anti-sense for pET-MAL I,

5’-GGA CGC TCG AGA ACG ACG GCC AGT GCC AAG CT-3’; anti-sense for pET-

MAL II, 5’-GGA CGC TCG AGT AAG CTT GCC TGC AGG TCG AC-3’. PCR was

performed simultaneously for both sets of primers in 50 µL reactions containing 80 ng of

MidiPrep-quality (Qiagen) pMAL-c2 template, 0.5 µM primers, 0.2 µM dNTPs (0.2 µM

of each nucleotide), 1X BSA, 1X ThermoPol Reaction Buffer, and 1 U of Deep VentR DNA polymerase (NEB). Thermocycling was programmed on an Applied Biosystems

PCR System 2400 to include 1 cycle of 94 ºC (2’), 59 ºC (1’), 72 ºC (1’), and 19 cycles of

94 ºC (1’), 62 ºC (45”), 72 ºC (45”). The PCR products were purified by spin column kit

(Qiagen), digested at the underlined primer sites with Nde I and Xho I (NEB) endonucleases according to the manufacturer’s protocols, and re-purified by spin column.

MiniPrep purified (Qiagen) pET-28a vector was similarly digested with Nde I and Xho I with the addition of Calf Intestinal Alkaline Phosphatase (CIP) (NEB) and purified by spin column. After quantitation of the digest products by agarose gel electrophoresis, ligation was performed overnight at 16 ºC in 15 µL reactions containing 140 ng of vector, 5- and

10-fold excesses of insert (malE), 1X T4 DNA Ligase Reaction Buffer, and 600 U of T4

DNA Ligase (NEB). After transformation of 5 µL of each reaction into 50 µL of chemically competent XL1 Blue E. coli (Stratagene) and screening, glycerol stocks of each construct were preserved at -80 ºC. The pET-MAL II construct was deemed preferable and was the only one of the two used for subsequent sub-cloning.

Hereafter, all references to pETMAL constructs refer to the pET-MAL II variant.

Furthermore, the control MBP, which contains N- and C-terminal six-His tags, was expressed from this variant (see below).
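The placement of the Nde I and Xho I sites within the pET-MAL primers can be checked with a few lines of code. The following is a minimal sketch, not part of the original procedure: the primer sequences are copied from the text above, the recognition sequences are the standard ones for the two enzymes, and the function name `find_sites` is a hypothetical helper introduced here for illustration.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

# Primers as printed in Section 3.2.1 (5'->3'); spaces are removed before searching.
PRIMERS = {
    "sense": "CCG AAC TCC ATA TGA AAA TCG AAG AAG GTA AAC TGG",
    "antisense_pET-MAL_I": "GGA CGC TCG AGA ACG ACG GCC AGT GCC AAG CT",
    "antisense_pET-MAL_II": "GGA CGC TCG AGT AAG CTT GCC TGC AGG TCG AC",
}


def find_sites(primer, sites=SITES):
    """Return {enzyme: 0-based start index} for every site present in the primer."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(motif) for name, motif in sites.items() if motif in seq}


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

Each primer carries a few extra bases 5’ of its restriction site, which is the usual design choice for allowing the endonuclease to cut efficiently near the end of a PCR product.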

The next modifications involved the PinPoint Xa-1 (pPNPT) vector from

Promega. This plasmid encodes a fragment of a biotin carrier domain (BCD) from

Propionibacterium freundii as an N-terminal fusion. This domain is efficiently biotinylated at a specific lysine residue in E. coli, thus allowing it to be used as an affinity purification tag as well as for library screening. However, it was decided that purification by immobilized metal affinity chromatography (IMAC) would be less expensive and of greater benefit. Therefore, 32-mer oligonucleotides encoding a six-His-tag and an Eco RI restriction site were designed such that, upon annealing, they formed Not I

complementary 5’-overhangs at both ends. However, upon ligation one of the ends

destroys the Not I site for future restriction. The oligonucleotides 5’-GGC CGC CAT

CAT CAT CAT CAT CAT TGA ATT CT-3’ and 5’-GGC CAG AAT TCA ATG ATG

ATG ATG ATG GC-3’ were phosphorylated by T4 Polynucleotide Kinase (NEB). After

heating 15 µL of 100 µM double stranded insert to 45 ºC for 5’ and cooling on ice,

phosphorylation was performed in 30 µL of 1X T4 DNA Ligase Buffer and 10 U of T4

Polynucleotide Kinase at 37 ºC for 1 hr. After heat denaturation of the kinase at 65 ºC for

20 min, the His6-tag was ligated as above into the PinPoint vector digested by Not I.

Following transformation into the XL1 Blue strain, a colony containing the additional

Eco RI site and the correctly oriented insert, pPNPT-6HIS, was detected by restriction

mapping and stored at -80 ºC as a glycerol stock.

No further characterization was performed on the pPNPT-6HIS construct because

a new scheme was undertaken. By transferring the biotin carrier domain to the pET-14b

vector (Novagen), it was possible to add an N-terminal His6-tag and remove lac operator control of induction for use with pBirA [65] co-expression in the BL21-AI E. coli strain

(Invitrogen). The procedure for sub-cloning by PCR into pET-14b was generally the same as that described for pET-MAL (Appendix Scheme A2). The pET vector was first altered to remove several restriction sites. After digesting the vector with Eco RI, the sticky-ends were filled in by the addition of dNTPs to 100 µM and 1 U of DNA

Polymerase I, Large (Klenow) Fragment (NEB) for 12’ at room temp. Subsequent to agarose gel purification and extraction by Gel Extraction Kit (Qiagen), digestion by Eco

RV allowed blunt-end ligation, which removed the Eco RI, Hind III, and Eco RV sites from pET-14b. The BCD and MCS were amplified from the pPNPT template by PCR using the primers 5’-GGA ACC ACA TAT GAA ACT GAA GGT AAC AGT CAA C-3’

and 5’-GGA ATT CAC TAT AGA ACC AGA TCG CG-3’. Thermocycling consisted of one cycle of 94 ºC (2’), 60 ºC (1’), 72 ºC (1’), and 19 cycles of 94 ºC (1’), 63 ºC (45”), 72

ºC (45”). The PCR product was purified, digested by Eco RI, blunt-ended as above, spin column purified, and digested by Nde I. The pET vector was digested with Bam HI, blunt-ended, and purified before treatment with Nde I and CIP. The single sticky-end ligation proceeded smoothly as above. Sequence confirmation of this pET-PNPT construct by the dideoxy chain-termination method was performed at the Plant Genome

Facility and a glycerol stock of XL1 Blue cells containing this construct was made for storage at -80 ºC.

Additionally, the hydrophilic linker and MCS from the pMAL vector were inserted C-terminally (Appendix Scheme A3). This required the removal of one of the

Xho I sites within the biotin carrier gene by incorporation of a silent mutation.

Introduction of a point mutation was accomplished using the QuikChange Site-Directed

Mutagenesis Kit (Stratagene) and the primer pair 5’-CCG TGC TCG TTC TTG AGG

CCA TGA AGA TGG-3’ plus its complement. A 50 µL reaction containing 10 ng of

pET-PNPT template, 0.2 µM dNTPs, 0.16 µM each primer, 1X reaction buffer, and 2.5 U

of Pfu Turbo polymerase was thermocycled once for 95 ºC (30”), followed by 14 cycles

of 95 ºC (30”), 52 ºC (1’), 68 ºC (12’), and held at 68 ºC for an extra 12’. The

methylation-specific endonuclease Dpn I was added for 1 hr. at 37 ºC before

transformation. Sequencing confirmed the single mutation. Next, the linker and MCS

were amplified from the pETMAL template with the primers 5’-GGA TAT CGC AGA

CTA ATT CGA GC-3’ and the T7-terminator primer 5’-GCT AGT TAT TGC TCA

GCG G-3’ and digested with Eco RV and Xho I prior to blunt-ending by Klenow treatment. The mutated vector was digested with Xho I and Not I before the Klenow fragment filled in the overhangs. Blunt-end ligation proceeded at room temperature for

90 min rather than 16 ºC overnight. Sequencing confirmed the authenticity of the final product, dubbed pPPTmal vector. A glycerol stock of XL1 Blue cells containing pPPTmal was stored at -80 ºC.

In an attempt to conduct fluorescence-based library screenings, a green fluorescent

protein (GFP) fusion system was sought. From the Gopalan lab, we received what was

believed to be the pGFPuv vector distributed by Clontech. However, it was determined

by sequencing that the plasmid contained the GFPuv gene, but as a six-His-tagged fusion

in pET-29. The tag was useful, but work was necessary to make the vector usable for

domain fusion (Appendix Scheme A4). First, the removal of a stop codon preceding the

MCS and destruction of a Bam HI site within the coding region were effected in one step

by PCR. The primer 5’-CCT GCA AGA TCT GTT CAA CTA GCA GAC CAT TAT C-

3’ hybridized just 3’ of the Bam HI site of the gene, but encoded a Bgl II restriction

sequence in its place. The sticky-ends of the 5’ overhangs of these two enzymes are

cohesive, but after ligation cannot be re-cut by either enzyme. The primer 5’-CCA CGA

ATT CAT CCA TGC CAT GTG TAA TCC-3’ contained an Eco RI site and hybridized a

few bases upstream of the vector’s own Eco RI site, excluding the stop codon. Thus, the

pET29-GFPuv plasmid served as the template for PCR amplification of a fragment

containing a Bam HI to Bgl II site switch and devoid of a stop codon. The fragment was

digested with Bgl II and Eco RI while pET29-GFPuv was digested with Bam HI and Eco

RI with de-phosphorylation by CIP. After ligation, sequencing confirmed restriction site destruction by a single silent C→T substitution and removal of the termination codon upstream of the MCS.

The GFP vector was now suitable for the construction of domain fusions for screening, but the GFPuv gene had been optimized for excitation by 354-nm light

sources, hence the “UV” moniker. Since our fluorescent microscope’s excitation

wavelength was ~488-nm, several mutations were made to increase excitation at this

wavelength in order to match our equipment. The following mutations were introduced

by QuikChange SDM Kit as above because they had been described to enhance the 488-nm induced fluorescence and/or the solubility/folding of GFP: S72A, I167T, F64L, S65T,

and V68L. In addition, one of the Nde I sites was destroyed by silent mutation. The

primers (along with their implied complementary oligos) used for these mutations were,

consecutively: Nde I 5’-GTT ATCNCGG ATC ACA TGC AAC GGC ATG-3’; S72A 5’-

GGT GTT CAA TGC TTT GCC CGT TAT CCG GAT C-3’; I167T 5’-GCT AAC TTC

AAA ACC CGC CAC AAC ATT GAA G-3’; F64L,S65T,V68L 5’-CCA ACA CTT

GTC ACT ACT TTG ACT TAT GGT CTT CAA TGC TTT GCC-3’. The long triple

mutation primer was purified by 12% urea-PAGE according to the protocol described in Chapter 5. Thermocycling for all reactions consisted of 95 ºC (30”), followed by 16 cycles of 95 ºC (30”), 51 ºC (1’), 68 ºC (14’), and a final hold at 68 ºC for an extra 14’; for the triple mutation reaction, 18 cycles were performed instead. Dpn I treatment preceded transformation and sequencing for each mutation.
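The substitutions encoded by a mutagenic primer can be read off by translating it in frame and comparing the result with the parent sequence. The sketch below does this for the long triple-mutation primer listed above. Two inputs are assumptions that do not come from this chapter: that the primer's first codon corresponds to residue 58 of GFP, and that the wild-type residues 58-72 are PTLVTTFSYGVQCFS (placing the chromophore at S65-Y66-G67). The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Triple-mutation primer exactly as printed in the text (5'->3').
TRIPLE_PRIMER = "CCA ACA CTT GTC ACT ACT TTG ACT TAT GGT CTT CAA TGC TTT GCC"

# ASSUMPTIONS (not stated in the text): reading frame and wild-type reference.
WILD_TYPE_58_72 = "PTLVTTFSYGVQCFS"
FIRST_RESIDUE = 58


def translate(dna):
    """Translate a DNA string in frame 1, ignoring spaces and any trailing partial codon."""
    seq = dna.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def list_mutations(primer, wild_type, first_residue):
    """Return substitutions such as 'F64L' where the primer differs from wild type."""
    encoded = translate(primer)
    return [
        f"{wt}{first_residue + i}{new}"
        for i, (wt, new) in enumerate(zip(wild_type, encoded))
        if wt != new
    ]


if __name__ == "__main__":
    print(translate(TRIPLE_PRIMER))
    print(list_mutations(TRIPLE_PRIMER, WILD_TYPE_58_72, FIRST_RESIDUE))
```

Under these assumptions the comparison against wild type also reports position 72, because the primer spans the codon that was changed in the earlier S72A step.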

Finally, in order to add the hydrophilic linker and MCS from pMAL C-terminal to

GFP, an Eco RV restriction site was introduced by QuikChange SDM one nucleotide 5’ of the Eco RI site of the endogenous MCS. The primer 5’- GGG ATT ACA CAT GGG

ATA TCT GAA TTC ACT ATG G-3’ and its complement were thermocycled as for the

triple mutation above in 50 µL reactions. After confirming the mutation by restriction

mapping, the pMAL linker was amplified by PCR exactly as before and ligated at the Eco RV and Hind III sites of the pGFP vector. After confirmation by dideoxy sequencing, the

XL1 cells harboring this pGFPmal construct were preserved at -80 ºC as a glycerol stock.

3.2.2 SHP-2 SH2 Domain Constructs

The DNA sequences coding for SHP-2 N-SH2 domain (aa 1-106), C-SH2 domain

(aa 108-220), and SHIP SH2 domain (aa 1-109) were isolated by PCR from pET22-SHP2

[66], pBS-SHP2 [67], and pGEX2T-SHIP(SH2) [68] plasmid templates, respectively.

The DNA primers used were: N-SH2, (T7 promoter primer) 5’-TAA TAC GAC TCA

CTA TAG GG-3’ and 5’-AGA TTA GAA GCT TTC AAT CTG CAC AGT TCA GAG

GAT ATT TAA GC-3’; C-SH2, 5’-ATA TAG AAT TCA TGA CCT CTG AAA GGT

GGT TTC ATG GAC A-3’ and 5’-AGA TTA GAA GCT TTC AAC GAG TCG TGT

TAA GGG GCT GCT-3’; SHIP SH2, 5’-GCG AAT TCA TGC CTG CCA TGG TCC

CTG G-3’ and 5’-CGT CCA AGC TTC ACT CCT CCT CCA GGG GCA C-3’. The

PCR products of the C-SH2 and SHIP domains were digested with the restriction endonucleases Eco RI and Hind III, and ligated into their corresponding sites in pMAL-

c2. However, because the primer 5'-ATA TAG AAT TCA TGA CAT CGC GGA GAT

GGT TTC A-3’ repeatedly failed in the amplification of the N-SH2 domain, and was shown to be incapable of amplification when paired with the proven anti-sense primer of the C-SH2 domain, an alternative strategy was pursued for the sub-cloning of this domain. The T7 promoter primer hybridized upstream of the SHP-2 gene in the pET22-

SHP2 construct and also incorporated an in-frame Nde I site from the vector.

Fortunately, an in-house pMAL-c2’-SHPTP vector contained an in-frame 5’ Nde I sequence. Unfortunately, the site is not unique within the plasmid and therefore, partial

digestion was required to isolate the desired recipient vector. After 90 min of Hind III linearization of 6.0 µg of pMAL-c2’-SHPTP vector in 100 µL, 20 µL aliquots were

removed and treated with Nde I for 15, 20, 25, or 30 min before heat denaturation (70 ºC for 20’). De-phosphorylation by CIP and agarose gel purification of the

~6.2 kb fragment delivered the correct recipient vector. The PCR product was double digested by Nde I and Hind III and ligated to pMAL-c2’. Each construct, pMAL-NSH2, pMAL-CSH2, and pMAL-SSH2, was confirmed by restriction mapping.

Subsequently, each domain was sub-cloned from the above pMAL templates to the pETMAL vector. Agarose gel purified Eco RI/Hind III digestion fragments from

each pMAL-domain fusion were ligated to complementary pETMAL sticky-ends, yielding

the respective His6-MBP-domain fusions. Furthermore, pPPTmal and pGFPmal fusions

were generated by PCR-based sub-cloning from the pETMAL-domain templates using

the malE sequencing primer (5’-GGT CGT CAG ACT GTC GAT GAA GCC-3’) and the

T7-terminator primer. Since all constructs had been designed to share the same reading frame and MCS, digestion and ligation at Eco RI/Hind III were universally applicable.

Lastly, simple N-terminal His6-tag fusions were introduced to each SH2 domain

by sub-cloning into the pET-28a vector. These smaller constructs were for use in surface

plasmon resonance (SPR) experiments. The same scheme as for the pPPTmal and

pGFPmal constructs was employed. Namely, the malE and T7-terminator sequencing

primers amplified each domain, which was subsequently ligated to the in-frame Eco

RI/Hind III sites of pET-28a. The His6-domain fusions were sequenced from this

construct, confirming the identities of their predecessor pETMAL constructs.

3.2.3 Control and SHP-1 Constructs

The pMAL-∆lacZ, pET28-NSH2(SHP-1), and pET28-CSH2(SHP-1) plasmids were generated by Kirk Beebe. The pMAL-NSH2(SHP-1) and pMAL-CSH2(SHP-1) fusions have been previously described [69, 70]. The ∆lacZ plasmid was the source of the MBP control lacking a domain fusion. The pETMAL vector encoded its own stop codon after the C-terminal His-tag, thus its MBP product contained two His-tags.

Control constructs for pPPTmal and pGFPmal were generated in a fashion similar to that of pMAL-∆lacZ: the linearized vector was treated with Klenow fragment and re-ligated in order to hasten the occurrence of a stop codon in the reading frame. For pPPTmal the restriction sites at which this was performed were Eco RI/Xho I, while for pGFPmal Hind III was used. In each case, the stop codon was reached within 7 amino acids.

3.2.4 Purification and Biotinylation of His6-MBP-SH2 Proteins

E. coli BL21(DE3) cells harboring the proper pETMAL-SH2 plasmid were grown in LBK medium + glucose (2 g/L) to the mid-log phase and induced by the addition of

300 µM isopropyl-β-D-thiogalactoside (IPTG) for 2.5 h at 30 ºC. The cells were harvested by centrifugation and lysed in Buffer 7 by passing through a French press.

Each MBP-SH2 protein was purified from the crude lysate on an amylose column according to manufacturer’s recommended procedures (NEB). The protein was eluted from the column in Buffer 8, concentrated in an Amicon stirred-cell concentrator to approximately 4 mg/mL, and treated with 2 equivalents of Biotin-OSU (dissolved in

DMF) at room temperature for 45 min. Excess biotin was removed by passing the solution through a Sephadex G-25 column equilibrated in Buffer 9. After concentration and addition of glycerol (final 40%), the protein was quickly frozen in a dry

ice/isopropanol bath and stored at -80 ºC.

For BIAcore work, the above procedure was performed without the addition of

the biotin label. Rather, the concentrated protein was polished and buffer exchanged into

Buffer 10 by passage in 4 mL aliquots through a size exclusion column (XK-16

Superdex-75) connected to an FPLC system (Pharmacia) at a flow rate of 1.1 mL/min.

After concentration by centrifugal ultrafiltration (Millipore), all proteins were flash

frozen without the addition of glycerol.

3.2.5 Purification of His-tagged SH2 Domains

N-terminally histidine-tagged SH2 domains were expressed in the Rosetta

CodonPlus strain of E. coli BL21(DE3) cells (Novagen). Protein expression was induced

by the addition of 300 µM IPTG to mid-log phase cells and incubation at 30 ºC for 3 h.

Cells were harvested by centrifugation and lysed in a French pressure cell in Buffer 11.

Crude lysate was loaded onto a Talon cobalt affinity column (10 mL). After washing

with 10 column volumes of Buffer 12, the SH2 protein was eluted with Buffer 13 and exchanged through a Superdex-75 column connected to an FPLC system into Buffer 10.

After concentration by centrifugal ultrafiltration, each protein was flash frozen in the

absence of glycerol.

3.2.6 Purification of Full-Length SHP-2 for Stimulation Assays

The full-length SHP-2 protein was expressed and purified in a manner very

similar to that described in the literature [66]. Specifically, E. coli Rosetta CodonPlus

BL21(DE3) cells harboring the pET22-SHP2 plasmid were grown in 4 L of LBA medium to the mid-log phase and induced by the addition of 300 µM IPTG for 3.5 hr at

30 ºC. The cells were harvested by centrifugation and lysed in Buffer 14 by passage

through a French press. The lysate was applied to a Q-Sepharose Fast Flow (Pharmacia) column (2.5 x 10 cm) equilibrated with Buffer 15 and washed with 2 column volumes of

Buffer 15 prior to the development of an elution gradient (100% Buffer 15 to 100%

Buffer 16) in 300 mL. Activity towards p-nitrophenylphosphate (pNPP) was detected in the flow-through, the wash, and the early part of the gradient. These fractions were pooled (~500 mL total volume) and the pH was adjusted by the slow addition of 75 mL of a 100 mM

MES, pH 5.6 solution. The entire volume was loaded onto an SP-Sepharose Fast Flow

(Pharmacia) column (2.5 x 15 cm) equilibrated with Buffer 17 and washed with 4 column volumes of the same. Elution was performed by the development of a gradient (100%

Buffer 17 to 100% Buffer 18) in 300 mL. Activity was detected in fractions near the middle-end of the gradient. These fractions were pooled and concentrated to ~35 mL in an Amicon stirred-cell pressure concentrator. After buffer exchanging ~10 mL aliquots into Buffer 17 by passage through G-25 columns, 10 mL aliquots were injected at 2.5 mL/min by Superloop onto a Mono S HR 10/10 column connected to an FPLC pump system (Pharmacia). Gradient elution between Buffers 17 and 18 in 250 mL adequately resolved SHP-2 (see Appendix Fig. A3.1), which was stored at -80 ºC in the presence of

~40% glycerol.

3.2.7 Synthesis of pY Library

The library was synthesized on 5 g of 90-µm TentaGel S NH2 resin using standard

Fmoc chemistry employing HBTU/HOBt/DIPEA as the coupling reagents. The invariant

positions (LNββRM, pY, and N-terminal TA) were synthesized with 4 equiv of Fmoc-amino acids and the coupling reaction was terminated after ninhydrin tests were negative.

The random positions were synthesized using the split-synthesis method [4, 6]. The

coupling reactions employed 5 equiv of Fmoc-amino acids and proceeded for 45 min, after

which time the coupling reaction was repeated to ensure complete reaction. To facilitate

sequence determination by mass spectrometry, 5% Ac-Gly was added to the coupling

reactions of Leu and Lys, whereas 5% Ac-Ala was added to the coupling reactions of Nle

[71]. After removal of the terminal Fmoc group, the resin-bound library was washed

with dichloromethane and deprotected using Reagent K (defined in Chapter 2) at room

temperature for 60 min. The resultant NH2-TAXXpYXXXLNββRM-resin library

(hereafter referred to as pY library) was washed with TFA, DCM, and MeOH before

drying for storage at –20 ºC.

3.2.8 Colorimetric Library Screening

In a micro BioSpin column (0.8 mL, BioRad), 100 mg of the pY library was

swollen in dichloromethane, washed extensively with methanol, ddH2O, and Buffer 19,

and blocked for 1 h with 800 µL of Buffer 19 containing 0.1% gelatin. The resin was drained and resuspended in 800 µL of a biotinylated MBP-SH2 domain of interest (10–50

nM final concentration) in Buffer 19 plus 0.1% gelatin. After overnight incubation at 4 ºC

with gentle mixing, the resin was drained and re-suspended in 800 µL of Buffer 20

containing 1 µL of streptavidin-alkaline phosphatase (SA-AP, ~1 mg/mL, Prozyme).

After 10 min of gentle mixing at 4 ºC, the resin was rapidly drained and washed with 400

µL of Buffer 20, 400 µL of Buffer 21, 400 µL of Buffer 19, and again with 400 µL of

Buffer 21. The resin was then transferred into a 35-mm Petri dish in 5 x 300 µL of Buffer

21. Upon addition of 80 µL of 5 mg/mL BCIP in Buffer 21, intense turquoise color developed on positive beads in ~45 min, at which point the staining reaction was quenched by the addition of 3 mL of 8 M guanidine-HCl, pH 8.0. The resin was transferred back into the BioSpin column, extensively washed with water, and re-plated in the Petri dish from which colored beads were picked manually using a pipette under a dissecting microscope. The positive beads were sorted by color intensity into “intense”,

“medium”, and “light” categories. Control experiments with biotinylated MBP produced no colored beads under identical conditions.

3.2.9 Partial Edman Degradation and Peptide Sequencing

Positive beads were pooled according to color intensity and subjected to partial

Edman degradation in a procedure similar to that of Chapter 2, but with a few optimizations for the pY library. The beads were suspended in 66% pyridine (aq) containing 0.1% Et3N, to which was added an equal volume of 5% PITC in pyridine containing a variable amount of Nic-OSU. After rapid mixing, the reaction was allowed to proceed for 6 min. The beads were washed with methanol, dichloromethane, and TFA and suspended in TFA (2 x 6 min). After extensive washing with dichloromethane and pyridine, the cycle was repeated. An optimized procedure was established for this library by trial and error using unselected beads which employed varying PITC/Nic-OSU mole ratios as follows: 6:1 for the N-terminal T and A; 4.5:1 for the N-terminal random positions; no Nic-OSU during pY degradation; and 5:1 for the C-terminal random positions. Finally, the linker sequence was capped by Nic-OSU in the absence of PITC.

The beads were then treated for 20 min with ~1 mL of TFA containing NH4I (10 mg) and

Me2S (20 µL) on ice to reduce any oxidized methionine. The beads were washed with ddH2O, placed in individual microcentrifuge tubes, and treated overnight in the dark with 20 µL of 70% TFA containing CNBr (20 mg/mL). After evaporating to dryness, the

peptides were dissolved in 5 µL of 0.1% TFA in water. One µL of the peptide solution

was mixed with 2 µL of 0.1% TFA in 50% acetonitrile saturated with 4-hydroxy-α-

cyanocinnamic acid and spotted onto a 96-well MALDI-TOF sample plate. MALDI-

TOF mass spectrometry was performed on a Bruker Reflex III instrument in an

automated manner. Sequence determination from the mass spectra was performed

manually.
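The effect of the PITC/Nic-OSU ratios on the composition of the peptide ladder can be illustrated with a simple bookkeeping model. The sketch below assumes, purely for illustration, that the fraction of free N-termini capped in a cycle equals the Nic-OSU mole fraction, 1/(1 + r) for a PITC/Nic-OSU ratio of r:1. This equal-reactivity assumption is not made in the text, and the true reactivities of the two reagents may well differ, which is presumably why the ratios were set by trial and error. The labels and function name are hypothetical.

```python
# One entry per degradation cycle: (N-terminal residue at that cycle, PITC:Nic-OSU ratio).
# Ratios are those quoted in the text; None means no Nic-OSU was present (pY cycle).
CYCLES = [
    ("T", 6.0),
    ("A", 6.0),
    ("X(-2)", 4.5),
    ("X(-1)", 4.5),
    ("pY", None),
    ("X(+1)", 5.0),
    ("X(+2)", 5.0),
    ("X(+3)", 5.0),
]


def ladder_fractions(cycles):
    """Return [(label, fraction of all chains capped at that cycle)].

    ASSUMPTION: capping probability per cycle = 1 / (1 + ratio), i.e. PITC and
    Nic-OSU are treated as equally reactive. Whatever remains uncapped after the
    last cycle is capped at the linker in the final Nic-OSU-only step.
    """
    remaining = 1.0
    fractions = []
    for label, ratio in cycles:
        capped = 0.0 if ratio is None else remaining / (1.0 + ratio)
        fractions.append((label, capped))
        remaining -= capped
    fractions.append(("linker", remaining))
    return fractions


if __name__ == "__main__":
    for label, frac in ladder_fractions(CYCLES):
        print(f"{label:>7s}  {frac:.3f}")
```

In this idealized model no capped species begins with pY, so the ladder has a gap at that position, and most of the material ends up in the final linker-capped species.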

3.2.10 Synthesis of Biotinylated pY Peptides

All pY peptides contained a common C-terminal linker, -LNBKR-NH2. Each

peptide was synthesized on ~65 mg of CLEAR-amide resin using standard

Fmoc/HBTU/HOBt chemistry. The N-terminus was acetylated by treatment with Ac2O.

Cleavage and de-protection were carried out as previously described. Approximately 3

mg of the crude peptide was dissolved in a minimal volume of DMSO (300–500 µL, with

sonication) and reacted with 1 equiv of NHS-PEG4-biotin (Quanta BioDesign, Ltd.) in 25

µL of DMSO. After 45 min at room temperature, the mixture was triturated twice with 20

volumes of Et2O. The precipitate was collected and dried under vacuum. The biotinylated pY peptide was purified by reversed-phase HPLC on a C18 column (Vydac 300 Å, 4.6 x

250 mm). The identity of each peptide was confirmed by MALDI-TOF mass spectrometric analysis. This procedure resulted in the addition of a 15-atom hydrophilic linker between the side chain of the C-terminal lysine and the carboxyl group of biotin.

3.2.11 Determination of Dissociation Constants by BIAcore

All measurements were made at room temperature on a BIAcore 3000 instrument.

A sensorchip containing immobilized streptavidin was conditioned with 1 M NaCl in 50

mM NaOH (aq) according to manufacturer’s instructions. The biotinylated pY peptides

were immobilized onto the sensorchip by flowing 6 µL of ~8 µM pY peptide solution in

HBS-EP Buffer purchased from BIAcore. Initial studies using the MBP-fusions proved unreliable, and thus, sensorgram data for the secondary plot analysis were acquired by passing increasing concentrations (0–5 µM) of a His6-SH2 protein in HBS-EP Buffer over the sensorchip for 2 min at a flow rate of 15 µL/min. A blank flow cell (no immobilized pY peptide) was used as control to correct for any signal due to the solvent

bulk and/or nonspecific binding interactions. In fact, neither significant bulk effect nor

nonspecific binding was observed. In between two runs, the sensorchip surface was

regenerated by treatment with Strip Buffer for 5–10 s at a flow rate of 100 µL/min. The

equilibrium response unit (RUeq) at a given SH2 protein concentration was obtained by subtracting the response of the blank flow cell from that of the sample flow cell. The dissociation constant (KD) was obtained by fitting the data to the equation,

RUeq = RUmax[SH2]/(KD + [SH2])

where RUeq is the measured response unit at a certain SH2 protein concentration and

RUmax is the maximum response unit.
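The fit to the equation above can be sketched without any specialized software. The example below is an illustration only, not the analysis actually used for this work: it exploits the fact that, for a fixed KD, the best RUmax follows from linear least squares, and then searches for the KD that minimizes the residual sum of squares. The function names, the search bounds, and the synthetic data are all assumptions introduced here; concentrations and KD share the same unit (µM in this chapter).

```python
import math


def binding(conc, ru_max, kd):
    """RUeq = RUmax*[SH2] / (KD + [SH2])."""
    return ru_max * conc / (kd + conc)


def best_ru_max(concs, responses, kd):
    """Least-squares RUmax for a fixed KD (the model is linear in RUmax)."""
    g = [c / (kd + c) for c in concs]
    return sum(y * x for y, x in zip(responses, g)) / sum(x * x for x in g)


def sse(concs, responses, kd):
    """Residual sum of squares at a given KD, with RUmax optimized out."""
    ru_max = best_ru_max(concs, responses, kd)
    return sum((y - binding(c, ru_max, kd)) ** 2 for c, y in zip(concs, responses))


def fit_kd(concs, responses, kd_low=1e-3, kd_high=1e3, grid=121, iterations=60):
    """Coarse log-spaced scan of KD, then golden-section refinement on log10(KD)."""
    lo, hi = math.log10(kd_low), math.log10(kd_high)
    logs = [lo + i * (hi - lo) / (grid - 1) for i in range(grid)]
    best = min(range(grid), key=lambda i: sse(concs, responses, 10 ** logs[i]))
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(concs, responses, 10 ** c) < sse(concs, responses, 10 ** d):
            b = d
        else:
            a = c
    kd = 10 ** ((a + b) / 2)
    return kd, best_ru_max(concs, responses, kd)


if __name__ == "__main__":
    # Synthetic, noise-free placeholder data (NOT measurements from this work).
    concs = [0.0, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
    responses = [binding(c, 200.0, 0.5) for c in concs]
    kd, ru_max = fit_kd(concs, responses)
    print(f"KD = {kd:.3f}, RUmax = {ru_max:.1f}")
```

Fitting the hyperbola directly, rather than a linearized double-reciprocal plot, avoids over-weighting the low-concentration points where RUeq is smallest.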

3.3 Results

3.3.1 Library Design, Synthesis, and Screening

To demonstrate the effectiveness of the combinatorial method, we chose to

determine the sequence specificity for the SH2 domains of protein tyrosine phosphatase

SHP-2 and inositol phosphatase SHIP. SHP-2 and a structurally similar phosphatase,

SHP-1, belong to a subfamily of PTPs which each contain two SH2 domains N-terminal

to their catalytic domain, whereas SHIP contains a single SH2 domain. All three proteins

are involved in a variety of signaling pathways [72]. Despite their sequence homology,

SHP-1 and SHP-2 have very different in vivo functions. For example, SHP-2 generally

acts as a positive regulator for the various signaling pathways, whereas SHP-1 primarily

acts as a negative regulator of signaling events [72]. Some studies show that SHP-1,

SHP-2, and SHIP recognize distinct pY motifs on various receptors via their SH2

domains, while others report that the three enzymes can compete for binding to a common receptor bearing one or more immunoreceptor tyrosine-based inhibition motifs

(ITIMs) [73]. These data suggest that the SH2 domains in SHP-1, SHP-2, and SHIP have distinctive but partially overlapping specificities. Therefore, a detailed study on their sequence specificities would be very helpful in identifying their physiological targets and determining their cellular functions.

The specificity of an SH2 domain is primarily determined by the pY residue and the three residues immediately C-terminal to pY [74-76], although it has been reported

that, for a few SH2 domains including those of SHP-1 and SHP-2, the -2 position (2

residues N-terminal to pY, which is position 0) is also important for high-affinity

interaction [77, 78]. Thus, we designed a pY library, H2N-TAXXpYXXXLNββRM-

resin, where X represents norleucine (Nle) or any of the 18 natural amino acids except for Met and Cys, and β is β-Ala. The N-terminal dipeptide TA helps reduce potential bias

caused by electrostatic interactions between an SH2 protein and the free N-terminus

(which is required for peptide sequencing). At the C-terminus, a methionine permits the

release of peptides from the resin by CNBr treatment prior to sequencing, while arginine

serves to increase peptide solubility and sensitivity during MALDI-MS sequencing by

providing a fixed positive charge. The two β-alanines add flexibility to the peptides,

making them more accessible to a protein target. The dipeptide LN is added to shift the

masses of the peptides to >600 Da, so that their mass spectral peaks do not overlap with

matrix signals (vide infra). Methionine is excluded from the randomized positions to

avoid internal cleavage during CNBr treatment, and is replaced by its isosteric residue

norleucine. The library was synthesized on TentaGel S NH2 resin (~90 µm in diameter and ~2.86 × 10⁶ beads/g) using the split-pool method [4, 6] with each bead carrying ~100

pmol of a unique sequence. This method ensures equal representation of all possible

sequences in the library.

The theoretical diversity of the above library is 19⁵, or 2.5 × 10⁶, and therefore, in principle, 1 g of resin-bound library covers the entire sequence space. A typical screening involved ~100 mg of resin, to which was added a small amount of an SH2 domain protein (10–50 nM final concentration), constructed as an MBP fusion protein and biotinylated on a surface lysine residue(s). Binding of the biotinylated SH2 domain to a resin-bound pY peptide recruits a streptavidin-alkaline phosphatase conjugate to the surface of that bead. Upon the addition of BCIP, the bound alkaline phosphatase cleaves

BCIP into an indole, which dimerizes to form a turquoise precipitate deposited on the bead surface. As a result of this reaction cascade, beads carrying high-affinity SH2

ligands become colored. The number of colored beads depends on the binding affinity

and specificity of the protein domain as well as the stringency of the screening conditions

(e.g., SH2 domain concentration, number of washings, and length of staining time). The screening reactions were controlled so that 10–100 colored beads were obtained from 100 mg of resin (~280,000 beads). The number of positive beads was quite reproducible for

all of the SH2 domains we have studied as well as between different screenings against

the same SH2 domain. Positive beads were manually removed from the library using a

micropipette with the aid of a dissecting microscope.
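The library size and the bead counts quoted above can be reproduced with a short calculation. The sketch below uses only numbers given in this section (19 building blocks, 5 randomized positions, ~2.86 × 10⁶ beads/g). The coverage estimate additionally assumes that every bead carries a sequence drawn uniformly at random, an idealization of split-pool synthesis that is introduced here and not stated in the text; the function names are hypothetical.

```python
BUILDING_BLOCKS = 19      # 18 natural amino acids (no Met, Cys) plus Nle
RANDOM_POSITIONS = 5      # the five X positions in TAXXpYXXX
BEADS_PER_GRAM = 2.86e6   # TentaGel S NH2, ~90 µm (value from the text)

DIVERSITY = BUILDING_BLOCKS ** RANDOM_POSITIONS  # 19^5 = 2,476,099 sequences


def beads_in(resin_grams):
    """Approximate number of beads in a given mass of resin."""
    return resin_grams * BEADS_PER_GRAM


def expected_coverage(n_beads, diversity=DIVERSITY):
    """Expected fraction of all sequences present on at least one bead.

    ASSUMPTION: each bead independently carries a uniformly random sequence.
    """
    return 1.0 - (1.0 - 1.0 / diversity) ** n_beads


if __name__ == "__main__":
    for grams in (0.1, 1.0, 5.0):
        n = beads_in(grams)
        print(f"{grams:>4} g: {n:,.0f} beads, "
              f"{n / DIVERSITY:.2f} beads per sequence, "
              f"expected coverage {expected_coverage(n):.2f}")
```

Under the random-sampling assumption, 1 g of resin holds slightly more beads than there are sequences but is expected to contain only about two thirds of them, while a 100 mg screen samples roughly one tenth of the sequence space. This is consistent with the text's statement that full coverage from 1 g holds "in principle".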

3.3.2 Peptide Sequencing by PED

The development of a generally applicable, highly successful sequencing

technique dubbed “partial Edman degradation” was described in Chapter 2. After a brief

optimization of the conditions for this specific library, that technique was applied with

little modification to the sequencing of the numerous beads obtained from screening the

minimally-encoded pY library above with a high rate of success (>90%). Figure 3.1

shows an example mass spectrum, derived from a single bead carrying the peptide

sequence TA(Nle)YpYATILNBBRM. Note that the isobaric residues Nle, Leu, and Ile

are unambiguously resolved in the spectra by their appearance as a singlet (Ile) vs doublet

peaks (Nle and Leu).
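The way a peptide ladder encodes a sequence can be illustrated with a small calculation: each round of partial Edman degradation removes one N-terminal residue from a fraction of the peptide, so adjacent ladder peaks differ by one residue mass. The sketch below is a minimal illustration, not the analysis procedure used in this work. It uses the peak values labeled in Figure 3.1, standard monoisotopic residue masses, and an assumed matching tolerance of 0.15 Da. The peak at m/z 1431.8 is left out because the figure legend describes it as the second member of a doublet, and the doublet-based discrimination of Nle, Leu, and Ile is not modeled.

```python
# Monoisotopic residue masses (Da). Leu, Ile, and Nle are isobaric, so a
# plain mass difference cannot tell them apart. Gln and Lys are also close;
# the closest match is reported without flagging that ambiguity.
RESIDUE_MASSES = {
    "G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068,
    "T": 101.048, "L/I/Nle": 113.084, "N": 114.043, "D": 115.027,
    "Q": 128.059, "K": 128.095, "E": 129.043, "H": 137.059,
    "F": 147.068, "R": 156.101, "Y": 163.063, "W": 186.079,
    "pY": 243.030,
}


def ladder_to_sequence(peaks, tolerance=0.15):
    """Read residues N- to C-terminus from the gaps between ladder peaks."""
    ordered = sorted(peaks, reverse=True)  # heaviest peak = full-length peptide
    residues = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        name, mass = min(RESIDUE_MASSES.items(),
                         key=lambda item: abs(item[1] - delta))
        if abs(mass - delta) > tolerance:
            raise ValueError(f"no residue matches a gap of {delta:.2f} Da")
        residues.append(name)
    return residues


# Peak list from Figure 3.1 (m/z 1431.8 omitted, see the note above).
figure_3_1_peaks = [732.5, 845.5, 946.6, 1017.6, 1260.6,
                    1423.7, 1536.8, 1607.8, 1708.9]
print(ladder_to_sequence(figure_3_1_peaks))
```

Read this way, the gaps give T, A, (L/I/Nle), Y, pY, A, T, (L/I/Nle), which agrees with the variable region of TA(Nle)YpYATI once the isobaric residues are resolved.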

3.3.3 Specificity of the C-SH2 Domain of SHP-2

Screening of the above library (100 mg) against 10 nM C-terminal SH2 domain of SHP-2, MBP-CSH2, resulted in 14 intensely colored beads, 12 lightly colored beads, and 53 beads of intermediate color intensity. Since all of the beads were treated in the same manner, the color intensity of a bead should correlate with the binding affinity of the pY peptide on the bead for the SH2 domain used in the screening. The three groups of beads (79 total) were placed in 3 separate vessels and subjected to partial Edman degradation followed by MALDI-TOF analysis. Out of the 79 samples, 77 produced high-quality spectra, allowing for unambiguous determination of their peptide sequences (Table 3.1). The mass spectra for the remaining two beads had one or more peaks missing, preventing complete sequence assignment. Construction of a histogram of the selected sequences reveals general trends, such as the strong preference for a nonpolar aliphatic residue at the +3 position, with isoleucine being the most preferred amino acid (present in 57 selected peptides), followed by valine (present in 15 peptides) and leucine (present in 5 peptides) (Figure 3.2). The +1 position has the second most stringent requirement, strongly preferring an alanine (present in 46 peptides) or another small amino acid such as serine (present in 18 peptides), threonine (present in 10 peptides), or valine (present in 3 peptides). The –2 position is also critical for binding to the C-SH2 domain of SHP-2, preferring a β-branched amino acid such as threonine, valine, or isoleucine, which are occasionally replaced by a tyrosine. There is a weak preference for a β-branched residue at the +2 position and virtually no selectivity at the –1 position.

To test whether the library screening result is reproducible, the above experiment was repeated with 50 nM MBP-CSH2 protein under otherwise identical conditions. Ninety intensely colored and ~150 less colored beads were obtained and sequenced (see Appendix Table A3.1 for sequences). The plot of the positional frequency of appearance for each amino acid (based on the 90 intensely colored beads) produced a pattern indistinguishable from that derived from the 10 nM screening (Figure 3.2). Inspection of the histograms in conjunction with the individual sequences allows us to draw the following conclusions. First, SHP-2 C-SH2 domain recognizes a single consensus sequence, (T/V/I/y)XpY(A/s/t/v)X(I/v/l), where lower case letters represent less frequently selected residues and X is any amino acid except for glycine and proline. Second, the screening method is highly reproducible and robust. Finally, one can unambiguously determine the sequence specificity of an SH2 domain by screening just a fraction of the complete library (~10% in this case), because not all of the randomized positions are crucial for SH2 binding. The same conclusion (the validity of using incomplete libraries) was also borne out by our earlier work with FHA domains [79]. This greatly reduces the cost and time required for the characterization of each SH2 domain.
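The positional histograms discussed in this section amount to counting, for each position relative to pY, how often each residue appears among the selected sequences. The following is a minimal sketch of that tally, not the script used to prepare Figure 3.2. It assumes sequences written in the two-residue, pY, three-residue form used in Table 3.1, and it is run on only eight sequences taken from that table, so the counts are illustrative rather than the full 77-sequence result.

```python
from collections import Counter

POSITIONS = ("-2", "-1", "+1", "+2", "+3")


def positional_counts(sequences):
    """Count residues at each position relative to pY (position 0)."""
    counts = {pos: Counter() for pos in POSITIONS}
    for seq in sequences:
        n_side, _, c_side = seq.partition("pY")
        residues = n_side[-2:] + c_side[:3]  # two residues before, three after
        for pos, residue in zip(POSITIONS, residues):
            counts[pos][residue] += 1
    return counts


# Eight sequences taken from Table 3.1 (a subset chosen for illustration).
subset = ["VIpYANI", "IHpYAVI", "ILpYSTI", "TIpYTII",
          "TVpYSIV", "TVpYASL", "TIpYATI", "NApYATI"]

counts = positional_counts(subset)
for pos in POSITIONS:
    print(pos, counts[pos].most_common())
```

Even in this small subset, isoleucine dominates the +3 position and alanine the +1 position, in line with the trends described above.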

3.3.4 Specificity of the N-SH2 Domain of SHP-2

Initial screening of 100 mg of the library against 10 nM SHP-2 N-SH2 domain gave rather surprising results; the N-SH2 domain appeared to bind pY peptides of several distinct classes. To obtain additional sequences for more reliable statistical analysis, the screening experiment was repeated twice, once at 10 nM and once at 50 nM N-SH2 protein. Again, the results were highly reproducible, with all three screenings producing the same types of sequences. Altogether, 150 intensely colored beads were selected from 300 mg of the library, and their sequences are listed in Table 3.2 (the most intensely colored beads from the 10 nM screenings are shown in boldface). Additional sequences from less colored beads are listed in Appendix Table A3.2. Clearly, the selected sequences fall into five distinct classes. The most abundant class (class I) has a consensus sequence of (I/L/V)XpY(T/V/A)X(I/V/L), which is similar to that of the C-SH2 domain, albeit with some subtle differences at the –2 and +1 positions (Figure 3.3A). First, although the C-SH2 domain most prefers threonine at the –2 position, threonine is rarely (twice) found in the N-SH2-binding peptides. Also, while leucine is seldom selected at the –2 position by the C-SH2 domain, it is the second most preferred residue at this position for the N-SH2 domain. Another difference is at the +1 position; while the C-SH2 domain strongly prefers alanine to serine, threonine, or valine, the N-SH2 domain selects threonine, valine, and alanine with approximately equal frequency (but not serine).

The second most abundant class of peptides (class II) has the consensus W(M/v/t/s)pYX(I/l/t/y)X, where the –2 residue is always a tryptophan and the –1 position is usually norleucine, valine, threonine, or serine (Figure 3.3B). Remarkably, while the +2 position is highly variable among class I peptides, it is the most invariant position on the C-terminal side of pY for class II peptides. The identity of the preferred residues (Ile, Leu, Thr, and Tyr) suggests that the +2 side chain is engaged in hydrophobic interactions with the SH2 domain. Consistent with this binding mode, the selected +1 and +3 residues contain predominantly hydrophilic (e.g., Arg, Gln, and Thr) or small side chains, suggesting that they face the solvent. The remaining three classes were less frequently selected and have the consensus sequences (I/V/L/T)XpY(L/M/T)Y(A/S/P/M) (class III), (I/V/L/T)XpY(F/M)XP (class IV), and (L/I/V/T)XpY(M/V)(I/L/V/S)F (class V) (Table 3.2, Appendix Figure A3.3). To the best of our knowledge, this is the first SH2 domain known to recognize multiple distinct consensus sequences. It is worth noting that all of the pY proteins currently known to bind to SHP-2 N-SH2 domain employ class I motifs (vide infra). It remains to be seen whether nature also uses the class II to V sequences as alternative SHP-2-binding motifs.

3.3.5 Specificity of SHIP SH2 Domain

The SH2 domain from SHIP was screened against the pY library at two different SH2 protein concentrations (10 and 50 nM), and the peptide sequences from the 158 most intensely colored beads were used in statistical analysis (Table 3.3). SHIP SH2 domain binds to pY peptides of the consensus pY(Y/S/T/v)(L/y/nle/f/v)(L/Nle/i/v) (Figure 3.4). Its specificity overlaps with those of SHP SH2 domains but also has a number of unique features. First, on the N-terminal side of pY, SHIP SH2 domain does not require specific residues for high-affinity binding, although among the selected sequences there appears to be a higher-than-expected number of small residues (e.g., Gly, Pro, and Ala) at the –2 and –1 positions. Second, high-affinity binding to SHIP SH2 domain requires a hydrophobic residue at the +2 position, with leucine being the most preferred, followed by tyrosine, norleucine, phenylalanine, and valine. The latter feature had previously been noted by other investigators [80]. Third, while alanine is among the most preferred amino acids at the +1 position for SHP SH2 domains, none of the SHIP SH2-binding sequences, including those derived from lightly colored beads (see Appendix Table A3.3), had an alanine at the +1 position.

3.3.6 Affinity Measurements of Selected Sequences

Representative peptides from each consensus group were re-synthesized individually and tested for binding to the five SH2 domains of SHP-1, SHP-2, and SHIP using the surface plasmon resonance technique (BIAcore) (Figure 3.5). All of the pY peptides tested bound to their cognate SH2 domains with high affinity (KD = 0.2–9.7 µM) (Table 3.4). As expected, a peptide generally binds with the highest affinity to the SH2 domain used in its selection. For example, peptide IHpYLYA was selected against the N-terminal SH2 domain of SHP-2 (class III). It binds to the N-SH2 domain with a KD value of 0.28 µM but interacts with the other four SH2 domains of SHP-1, SHP-2, and SHIP with 23–90-fold lower affinity. Likewise, peptide PFpYSLL binds to the SHIP SH2 domain (by which it was selected in the screening) with high affinity (KD = 0.20 µM) but much less tightly to the other four SH2 domains (KD = 5.6–14 µM). Some peptides (e.g., LVpYATI), however, can associate with all five SH2 domains with similar KD values (2–5 µM), consistent with the previous observation that the five SH2 domains have overlapping sequence specificities. Thus, all of the pY peptides tested by BIAcore bound to their cognate SH2 domains with high affinity. To test whether the color intensity of a bead during screening correlates with the binding affinity of the peptide it carries, we synthesized and tested peptides TIpYATI and NApYATI, which were both selected by SHP-2 C-SH2 domain but whose corresponding beads were intensely and lightly colored, respectively. The former has a 6.5-fold higher affinity for the C-SH2 domain. Note that the latter peptide contains an Asn at the –2 position, which is not among the preferred amino acids at this position.
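The fold differences quoted in this section follow directly from ratios of the dissociation constants in Table 3.4. As a brief worked example (a sketch only, using KD values copied from that table; the dictionary layout and function name are illustrative):

```python
# KD values (µM) copied from Table 3.4.
KD_UM = {
    ("TIpYATI", "SHP-2 C-SH2"): 0.6,
    ("NApYATI", "SHP-2 C-SH2"): 3.9,
    ("PFpYSLL", "SHIP SH2"): 0.20,
    ("PFpYSLL", "SHP-1 N-SH2"): 5.6,
    ("PFpYSLL", "SHP-1 C-SH2"): 14.0,
}


def fold_difference(kd_weaker, kd_tighter):
    """Ratio of two dissociation constants; larger KD means weaker binding."""
    return kd_weaker / kd_tighter


# Lightly colored bead (NApYATI) versus intensely colored bead (TIpYATI).
color_fold = fold_difference(KD_UM[("NApYATI", "SHP-2 C-SH2")],
                             KD_UM[("TIpYATI", "SHP-2 C-SH2")])

# PFpYSLL: tightest and weakest non-cognate KD relative to the SHIP SH2 domain.
ship_kd = KD_UM[("PFpYSLL", "SHIP SH2")]
min_fold = fold_difference(KD_UM[("PFpYSLL", "SHP-1 N-SH2")], ship_kd)
max_fold = fold_difference(KD_UM[("PFpYSLL", "SHP-1 C-SH2")], ship_kd)

print(f"TIpYATI binds {color_fold:.1f}-fold more tightly than NApYATI")
print(f"PFpYSLL selectivity for SHIP SH2: {min_fold:.0f}- to {max_fold:.0f}-fold")
```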

3.3.7 Database Search of Potential SHP-1/SHP-2-Binding Proteins

We performed a web-based search of human proteins that contain the tandem consensus sequence motifs, (VIL)XY(ASTVI)X(ILV)X1–50(TVIY)XY(ASTV)X(IVL), where X is any amino acid (web site: http://pir.georgetown.edu/). The two motifs were designed to encompass the N- (class I) and C-SH2 domain consensus sequences of both SHPs, and were separated by anywhere from 1 to 50 residues. This search resulted in 420 “hits”, representing ~100 unique human proteins (many proteins appeared multiple times under different names or as fragments). After discarding those proteins that are obvious “false positives” (e.g., secreted proteins or transmembrane proteins with the consensus motifs in the extracellular environment), we obtained 76 proteins as potential SHP-1/SHP-2 targets (Table 3.5). Out of the 76 candidate proteins, 25 have previously been shown to bind to SHP-1 and/or SHP-2 via their SH2 domains [81-111]. Since SHP-1 and SHP-2 can also bind pY proteins through a single SH2 domain, similar searches were conducted with single consensus motifs. Several additional proteins that are known to bind SHP-1 and/or SHP-2 were identified, such as PD-1 [112], death receptor [113], leptin receptor [114], insulin receptor substrate-1 [115], and Siglec-10 [116], among others. It remains to be determined whether the other predicted proteins in Table 3.5 are bona fide SHP-1 and SHP-2 binding partners in cellular systems.
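The tandem motif used in the database search translates naturally into a regular expression. The sketch below is an illustration under stated assumptions rather than the query submitted to the PIR web site: X is rendered as any uppercase one-letter residue code, the function name is hypothetical, and the test sequence is an invented toy built from the two motifs listed for biliary glycoprotein-1 in Table 3.5 with an arbitrary 10-residue spacer.

```python
import re

# Tandem motif from the text: (VIL)XY(ASTVI)X(ILV) X1-50 (TVIY)XY(ASTV)X(IVL).
# "X" is rendered as [A-Z], i.e. any one-letter residue code.
TANDEM_MOTIF = re.compile(
    r"[VIL][A-Z]Y[ASTVI][A-Z][ILV]"   # first motif (N-SH2 class I-like)
    r"[A-Z]{1,50}"                    # spacer of 1 to 50 residues
    r"[TVIY][A-Z]Y[ASTV][A-Z][IVL]"   # second motif (C-SH2-like)
)


def find_tandem_motifs(protein_sequence):
    """Return (start index, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group())
            for m in TANDEM_MOTIF.finditer(protein_sequence)]


# Hypothetical toy sequence: two motifs from Table 3.5 joined by an invented
# spacer and flanked by glycines. It is not a real protein sequence.
toy_sequence = "GG" + "VTYSTL" + "AAAAAAAAAA" + "IIYSEV" + "GG"
hits = find_tandem_motifs(toy_sequence)
print(hits)
```

A pattern of this kind matches unphosphorylated tyrosine in the primary sequence, so hits still require the manual filtering described above (for example, removing extracellular motifs).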

3.4 Discussion

The combinatorial library method reported in this work has for the first time provided a complete solution to the problem of identifying linear peptide motifs that interact with a given protein or non-protein receptor. Compared to the previously reported methods, our method has several significant advantages. First, our method identifies individual binding sequences; this feature is crucial for understanding the specificity of receptors that recognize multiple consensus sequences. For example, when the five classes of binding sequences of SHP-2 N-SH2 domain were combined and plotted in the same manner as in Figure 3.2 to give a composite histogram, no clear consensus emerged (see Appendix Figure A3.2). Even for a receptor that has a single consensus sequence, individual sequences are useful in revealing subtle covariance of sequences. For example, among the pY peptides that bind to SHP-2 C-SH2 domain, when the +3 residue is isoleucine, alanine is most frequently found at the +1 position; however, when valine is the +3 residue, a serine is most preferred at the +1 position (Table 3.1). Second, our method allows for “fair” competition among all library peptides, as each bead contains roughly the same amount of peptide (~100 pmol). This is not the case with pY peptide libraries displayed on phage surface, because such libraries are biased against sequences that are poor substrates of the tyrosine kinases used to phosphorylate the phage [63, 64]. Youngquist et al. reported another method in which the peptide sequence on each bead is encoded by generating a set of chain-termination products during library synthesis and the sequence of the full-length peptide is determined by mass spectrometric analysis of the set of chain-termination products [51]. Unfortunately, due to different reactivities of the 20 amino acids, the amount of chain termination varies with peptide sequence. As a result, the amount of full-length peptide on each bead also varies, biasing the screening against peptides containing slow-coupling amino acids (e.g., Ile and Thr). Third, because our method employs chemically synthesized libraries, modified (e.g., pY) and/or unnatural amino acids (e.g., D-amino acids) can be easily incorporated into the libraries. Fourth, our method is high-throughput and cost-effective. By employing partial Edman degradation, we can routinely sequence 20–30 beads in an hour, at a cost of ~US$0.50 per bead. Finally, as demonstrated with all three SH2 domains from SHP-2 and SHIP, our method is highly reproducible. Our method is readily applicable to other protein or non-protein receptors. We have subsequently applied this method to determine the sequence specificity of BIR domains (Chapter 4), WW domains, and chromodomains (unpublished results).

Our lab had previously determined the sequence specificities for the two SH2 domains of SHP-1 using the method of Youngquist et al. [78]. Comparison of the specificities of the five SH2 domains of SHP-1, SHP-2, and SHIP revealed that these SH2 domains have overlapping and yet distinctive specificities. Although all five domains are capable of binding to peptides containing the ITIM, (V/I/L/T)XpYXX(I/L/V), there are clear differences between SHP and SHIP SH2 domains. For example, SHP SH2 domains require a hydrophobic residue at the –2 position, whereas SHIP SH2 domain can tolerate most of the amino acids on the N-terminal side of pY. On the C-terminal side, SHIP SH2 strongly prefers a leucine at the +2 position, but SHP SH2 domains have no such requirement (except for class II peptides of SHP-2 N-SH2 domain). There are also more subtle yet significant differences at the +1 position; while SHP SH2 domains all prefer an alanine at this position, alanine is never found at this position among the SHIP SH2-binding sequences. Among the four SHP SH2 domains, the two N-terminal SH2 domains have similar specificities and the two C-SH2 domains are analogous to each other. Most of the SHP-2 N-SH2-binding peptides tested also bind to SHP-1 N-SH2 domain with similar affinities (Table 3.4). The only exception is the class III peptide, IHpYLYA, which has high affinity and selectivity for SHP-2 N-SH2 domain. The two C-SH2 domains have two major differences. The most preferred residues at the –2 position are valine, isoleucine, and leucine for SHP-1 C-SH2 domain, whereas for SHP-2 C-SH2 domain, they are threonine and valine (Appendix Figure A3.4). At the +3 position, the most preferred residue is leucine followed by valine and isoleucine for SHP-1 but is isoleucine followed by valine and leucine for SHP-2.

The SH2 specificity data are very useful in understanding the different functions of SHP-1, SHP-2, and SHIP in cell signaling. For example, immunoreceptor PD-1 has previously been reported to bind SHP-2 but not SHP-1 [112]. The pY motif responsible for SHP-2 binding has the sequence TEpYATIVF, which is a perfect match to the consensus sequence of SHP-2 C-SH2 domain, but not to that of SHP-1 [78]. Many receptors, however, contain multiple ITIM motifs that match the specificities of both proteins and are able to bind both SHP-1 and SHP-2. For example, the first ITIM motif of human Siglec-11 (LHpYASL) closely matches the consensus sequence of SHP-1 SH2 domains, whereas its second ITIM, TEpYSEI, resembles the consensus of SHP-2 C-SH2 domain [82]. Some receptors contain ITIM motifs whose sequences represent a compromise between the consensus of SHP-1 and SHP-2 SH2 domains. Biliary glycoprotein 1 (CD66), which is known to bind both SHP-1 and SHP-2, is such an example [83]. Its two ITIMs (VTpYSTL and IIpYSEV) match the overlapping specificities of SHP-1 and SHP-2 SH2 domains. SHIP has been reported as the main inhibitory molecule for the immunoglobulin G Fc receptor signaling pathway by binding to the pYSLL motif on the Fc receptor [80, 117]. Our data show that pYSLL matches the consensus sequence of SHIP SH2 domain and binds the latter with much greater affinity than the SH2 domains of SHP-1 or SHP-2 (Table 3.4). The specificity data can also be used to predict the interaction partners of the SH2 domain-containing proteins. As described above, a simple database search identified almost all of the known SHP-1 and SHP-2 interacting proteins (Table 3.5). It is highly probable that some of the other predicted proteins in Table 3.5 will prove to be bona fide SHP-1 and SHP-2 binding proteins.

In summary, a powerful combinatorial library method has now been developed

for the systematic determination of sequence specificities of protein interaction domains

such as SH2 domains. The specificity information generated by this method will be very

useful in understanding the cellular function of proteins that contain these interaction

domains and the design of specific inhibitors against such protein domains.

VIpYANI IHpYAVI ILpYSTI TIpYTII TVpYSIV VIpYANI PIpYAVI TTpYSTI TYpYTMI TIpYSMV IEpYAQI NMpYAVI VHpYSTI TIpYTEI TVpYSEV YTpYAQI VLpYAII TYpYSSI TYpYVEI TVpYTEV AIpYASI VYpYAII IVpYSQI IQpYVQI TVpYASL YSpYASI VYpYAII VTpYSQI TKpYVVI VYpYATL IWpYASI TQpYAII YTpYSQI TLpYAVV YLpYATL THpYASI SNpYAII VEpYSEI TRpYAVV IQpYAVL TIpYATI MYpYAII TYpYSMI TYpYAVV TApYAIL TSpYATI YQpYAII TFpYSRI VTpYAIV VTpYATI INpYAMI YYpYSRI IHpYATV VGpYATI THpYAMI TRpYTQI TVpYASV LYpYATI TMpYAMI VIpYTQI TRpYAKV NApYATI TTpYAAI VTpYTSI IIpYSQV VApYAVI YKpYARI VFpYTTI VIpYSSV VHpYAVI YMpYAHI HFpYTTI VIpYSVV IApYAVI YMpYAEI TIpYTVI TQpYSIV

Table 3.1. Selected SHP-2 C-SH2 domain-binding sequences (77 total). All sequences were obtained from a screening experiment performed with 10 nM SHP-2 C-SH2 domain. Boldface, peptides derive from the most intensely colored beads; plain text, peptides from beads of medium color intensity; italics, sequences from the lightly colored beads; M, norleucine.

Class I IRpYVEL VTpYTLI LNpYIVI WTpYSLQ ITpYTYI IApYVEL IVpYTLL LHpYAII WTpYYLF IRpYTYV INpYVQL IMpYTII VVpYAII WTpYVLY Class IV INpYVEI MNpYVTL VTpYALI WTpYYLI ILpYMIP INpYVQI MNpYVIV WTpYQIL Class II TEpYMVP IWpYVSI LRpYIQV WTpYQIT WMpYRII IQpYMVL IQpYVML LRpYMQL WTpYVIT WMpYKIY VLpYMQP IQpYVML LRpYVRV WTpYVTS WMpYNIG VMpYMQP IHpYVMI LRpYVSV WTpYSYT WMpYYIQ LVpYMGP IIpYVVI LHpYVSV WTpYQYV WMpYRLY LHpYMGP ISpYIEI MHpYVQV ITpYRLV WMpYRLI ALpYMIP INpYIEV LYpYLQI ITpYLIG WMpYQLS PMpYMIA ISpYIEV LYpYANI WSpYKIY WMpYYLT ILpYFIP ILpYTEV LYpYAQV WSpYVLV WMpYYLY VIpYFVP LVpYTEV LFpYAEI WMpYTLN Class III IVpYFVP ITpYTEV VMpYAEI WMpYMMP IHpYLYA IIpYFYP IFpYTAV LRpYAKL WMpYRMN TLpYLYA VQpYFIR IFpYTAI LYpYATI WMpYRYQ YTpYLVA IYpYTPV LVpYATI Class V WMpYYQV IFpYLYS IMpYTDI LTpYVTI QMpYYLY WIpYFIR VMpYLYS IYpYTDI LRpYVSI LYpYYQY WIpYTIG VVpYMYS IVpYADI LNpYMTI LRpYLVY IIpYTIG IVpYLYT IRpYAQI MYpYATI ITpYLVY WIpYYTR LNpYLYM IVpYAML MYpYATI LNpYMTF WVpYTIN VLpYLYP VApYVEL YApYATI MSpYMVF WVpYRID IKpYTYP VTpYVQL SYpYASI YNpYMVF WVpYYIG IMpYTYP VYpYTEI MYpYARI LNpYVIF WVpYYIR ITpYTYP VYpYTQI YIpYTTV LNpYVLF WVpYYTY VVpYMYT VIpYAQL YVpYTAI LYpYTSF WVpYRLE VTpYMYT VNpYTTL LNpYAVI LYpYATF WTpYSLA YVpYTYT VTpYTII LRpYAVI RApYIVM WTpYSLY ISpYTYI

Table 3.2. Selected SHP-2 N-SH2 domain-binding sequences (150 total). Boldfaced sequences derive from the most intensely colored beads during 10 nM N-SH2 domain screening. M, norleucine.

PFpYSLL TMpYSFL GGpYTMM YDpYVLM VGpYYFI PRpYSLV TLpYSLL AHpYTLM GTpYYLM IKpYYYL PApYSMI TYpYSVL ALpYTMM GIpYYYL LGpYYLL PPpYSMM AMpYSFM SApYTLM GIpYYYM LQpYYML PYpYSFI AYpYSYI STpYTYM GGpYYVI LLpYYYV PFpYSFI AVpYSIL SSpYTLL GGpYYFM MGpYYLL PPpYSFM NYpYSYL SPpYTLM GYpYYLM MKpYYMM PRpYSYM QGpYSMM NGpYTLL AYpYYLL MQpYYLM PKpYSYL RGpYSML NMpYTLL AYpYYLV MVpYYYV PKpYSYV VYpYTLL HPpYTLM ATpYYYM FNpYYLL PApYSYI VYpYTLM HYpYTLM AVpYYLM FYpYYLL PRpYSVM TGpYTLL QYpYTMM AHpYYLM FKpYYLM PFpYSVI TIpYTMM RWpYTLM AGpYYMI FApYYML PLpYSTL TYpYTFI WLpYTLM AGpYYFV FNpYYMI PMpYSTM IQpYTLL VGpYVLL PIpYYLL FMpYYYL PRpYSTM PYpYTLI LGpYVMM PGpYYLI YGpYYML PLpYSIL PFpYTLI SLpYVLL SApYYMM WYpYYLL FVpYSLM PMpYTLM AGpYVFM SApYYYV WVpYYLV FApYSYL PRpYTLM DGpYVLM SYpYYYV HPpYYLL YFpYSLM LPpYTLM EGpYVYM SFpYYYI HPpYYLM YQpYSIM YVpYTLM GApYVFL TGpYYLL NApYYML YGpYSMM YMpYTLL HKpYVLL TGpYYLL QIpYYLL YApYSYV YGpYTLM HPpYVLM TGpYYLI EGpYYFM YLpYSYV YYpYTYI MGpYVML TGpYYYM PFpYFLL HSpYSLL FTpYTLM PLpYVLM TTpYYYL TVpYFLL LYpYSLL FFpYTLL QGpYVLL TSpYYYI QTpYFLM VLpYSLL FQpYTMM QGpYVML TRpYYLM SGpYFLM VYpYSLL FTpYTYM TSpYVLL VGpYYLL YGpYFLM VYpYSLV MTpYTLM VSpYVLL VKpYYLL AGpYFYV VYpYSYI GYpYTLI VGpYVYM VGpYYMI TMpYAFI VYpYSYL GPpYTLI WGpYVML VTpYYMM KGpYQLL GGpYTMI WApYVMM VYpYYYL

Table 3.3. Selected SHIP SH2 domain-binding sequences (158 total). Boldfaced sequences were from the most intensely colored beads from 10 nM SH2 domain screening. M, norleucine.

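[Figure 3.1 plot: MALDI mass spectrum, m/z 800–1600, with labeled peaks at m/z 732.5, 845.5, 946.6, 1017.6, 1260.6, 1423.7, 1431.8, 1536.8, 1607.8, and 1708.9; residue labels between successive peaks, from low to high m/z: I, T, A, pY, Y, Nle, A, T.]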

Figure 3.1. MALDI mass spectrum from a C-SH2 domain-selected bead after nine rounds of partial Edman degradation. The doublet at m/z 1423.7 and 1431.8 indicates that the residue N-terminal to tyrosine is a norleucine. M*, methionine prior to CNBr cleavage and homoserine lactone after CNBr cleavage.

[Figure 3.2 plot: five histogram panels for positions –2, –1, +1, +2, and +3; x-axis, amino acid (D E N Q H K R W F Y M L I V T S A G P); y-axis, occurrence (0–70); series, 10 nM and 50 nM.]

Figure 3.2. Specificity of the C-SH2 domain of SHP-2. Amino acids are identified at each position from –2 to +3 relative to pY (position 0). Occurrence represents the number of selected sequences containing each amino acid at that position. Open bar, results from screening at 10 nM C-SH2 protein (77 sequences); closed bar, results from intensely colored beads from screening at 50 nM C-SH2 protein (90 sequences); M, norleucine.

[Figure 3.3 plot: two columns of histograms, (A) and (B), each with panels for positions –2, –1, +1, +2, and +3; x-axis, amino acid (D E N Q H K R W F Y M L I V T S A G P); y-axis, occurrence (0–40).]

Figure 3.3. Specificity of the N-SH2 domain of SHP-2. Amino acids are identified at each position from –2 to +3 relative to pY (position 0). Occurrence represents the number of selected sequences containing each amino acid at that position. (A) Class I sequences; (B) class II sequences. M, norleucine.

[Figure 3.4 plot: five histogram panels for positions –2, –1, +1, +2, and +3; x-axis, amino acid (D E N Q H K R W F Y M L I V T S A G P); y-axis, occurrence (0–80).]

Figure 3.4. Specificity of SHIP SH2 domain. Amino acids are identified at each position from –2 to +3 relative to pY (position 0). Occurrence represents the number of selected sequences containing each amino acid at that position. M, norleucine.

Peptide | SHP-2 N-SH2 | SHP-2 C-SH2 | SHP-1 N-SH2 | SHP-1 C-SH2 | SHIP SH2
1) IHpYLYA | 0.28 ± 0.04* | 12 ± 2.5 | 7.0 ± 0.43 | 27 ± 6.9 | 13 ± 0.95
2) LVpYTEV | 1.4 ± 0.12 | 8.5 ± 1.2 | 3.2 ± 0.17 | 8.8 ± 1.3 | 3.8 ± 0.26
3) VApYVEL | 3.6 ± 0.11* | 3.7 ± 0.10 | 4.9 ± 0.10 | 10 ± 0.92 | 5.2 ± 0.21
4) LVpYATI | 1.9 ± 0.14* | 2.0 ± 0.20 | 1.6 ± 0.51 | 5.2 ± 0.51 | 3.5 ± 0.15
5) ITpYTYP | 2.4 ± 0.22* | 9.7 ± 1.9 | 2.1 ± 0.07 | 11 ± 3.4 | 3.2 ± 0.24
6) WMpYRII | 3.0 ± 0.39* | 20 ± 6.2 | 8.5 ± 0.98 | 23 ± 5.0 | 6.3 ± 0.36
7) WIpYRII | 7.2 ± 0.73* | 11 ± 0.55 | 6.5 ± 1.2 | 30 ± 8.4 | 11 ± 2.1
8) WTpYQIL | 9.7 ± 0.32* | 10 ± 0.41 | 17 ± 1.0 | 59 ± 3.6 | 3.8 ± 0.32
9) TIpYATI | 3.9 ± 0.26 | 0.6 ± 0.07* | 6.4 ± 0.42 | 2.4 ± 0.11 | 2.2 ± 0.19
10) NApYATI | 34 ± 2.9 | 3.9 ± 0.42* | 28 ± 2.62 | 16 ± 2.2 | 10 ± 0.51
11) PFpYSLL | 9.7 ± 0.89 | 9.2 ± 1.5 | 5.6 ± 0.47 | 14 ± 4.5 | 0.20 ± 0.03*

Table 3.4. Dissociation constants (µM) of selected pY peptides toward SH2 domains. All peptides are N-terminally acetylated and contain the C-terminal linker, LNBKR-NH2. The lysine side chain was acylated with a PEG4-biotin moiety for immobilization. The SH2 domains were constructed as N-terminal six-histidine fusion proteins. M, norleucine. The asterisk indicates the SH2 domain by which each peptide was selected in screening.

[Figure 3.5 plot: (a) response units (0–2500) versus time (0–160 s) at N-SH2 concentrations of 5.2, 2.6, 1.3, 0.65, 0.33, 0.16, and 0.08 µM, with RUeq marked; (b) RUeq (0–2000) versus [N-SH2] (0–6 µM).]

Figure 3.5. SPR analysis of the binding of the SHP-2 N-SH2 domain to peptide IHpYLYA. (a) Overlaid BIAcore sensorgrams at the indicated concentrations of N-SH2 protein (0.08–5.2 µM). (b) Secondary plot of resonance signal under equilibrium binding conditions against SH2 concentration. The data were fitted to the equation RUeq = RUmax × [SH2]/(KD + [SH2]).
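To make the equilibrium binding equation in the legend concrete, the sketch below recovers KD and RUmax from a set of (concentration, RUeq) pairs. The fitting procedure used for Figure 3.5 is not described here, so this example substitutes a simple Hanes-Woolf linearization ([SH2]/RUeq plotted against [SH2]) solved by ordinary least squares; a nonlinear fit would normally be preferred for noisy data. The responses are synthetic, generated from the KD reported in Table 3.4 (0.28 µM) and an assumed RUmax of 2000, and the function names are illustrative.

```python
def binding_isotherm(conc, ru_max, kd):
    """RUeq = RUmax * [SH2] / (KD + [SH2])."""
    return ru_max * conc / (kd + conc)


def fit_hanes_woolf(concs, responses):
    """Estimate (RUmax, KD) from the line [S]/RUeq = [S]/RUmax + KD/RUmax."""
    xs = list(concs)
    ys = [c / r for c, r in zip(concs, responses)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / RUmax
    intercept = mean_y - slope * mean_x  # equals KD / RUmax
    return 1.0 / slope, intercept / slope


# Concentrations (µM) from Figure 3.5; responses are synthetic and noise-free.
concs_um = [0.08, 0.16, 0.33, 0.65, 1.3, 2.6, 5.2]
ASSUMED_RU_MAX = 2000.0  # hypothetical value chosen for illustration
TABLE_KD_UM = 0.28       # KD of IHpYLYA for SHP-2 N-SH2 (Table 3.4)
synthetic_ru = [binding_isotherm(c, ASSUMED_RU_MAX, TABLE_KD_UM)
                for c in concs_um]

ru_max_fit, kd_fit = fit_hanes_woolf(concs_um, synthetic_ru)
print(f"RUmax = {ru_max_fit:.1f}, KD = {kd_fit:.3f} µM")
```

Because the synthetic data are noise-free, the fit returns the input parameters; with experimental data the linearization weights the points unevenly, which is one reason to treat this only as an illustration of the equation.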

Table 3.5. Human proteins predicted to bind to SHP-1 and/or SHP-2 via SH2 domains.

These 76 candidate proteins remained after excluding repeated, fragmented, secreted, and extracellular proteins from the 420 possibilities returned from the database search. ^a Proteins that have previously been shown to bind to SHP-1 via its SH2 domains. ^b Proteins that have previously been shown to bind to SHP-2 via its SH2 domains.

Protein | Binding Motif(s) | Ref.
Activating NK cell receptor 2B4^b | TIYSMI, TLYSLI | [81]
adenylate cyclase, type VI | VSYVVL, IAYTLL |
adipocyte G protein-coupled receptor 175 | LVYSLV, YVYAGI |
alternative splicing factor-1 | VCYADV, TAYIRV |
alternative splicing factor-3 | VCYADV, VGYTRI |
B and T lymphocyte attenuator^a,b | LLYSLL, IVYASL, TEYASI | [84]
beta-hexosaminidase β-subunit | IEYARL, TTYSFL |
biliary glycoprotein-1 (CD66, CEACAM-1)^a,b | VTYSTL, IIYSEV | [83]
coagulation factor II (thrombin) receptor | VCYVSI, VHYSFL, YVYSIL |
dol-P-man dependent α(1-3)-mannosyltransferase | VAYTEI, YDYTQL |
Ewing’s sarcoma protein-1 | LVYTSI, YPYSVL |
exportin-7, ran-binding protein-16 | IGYSSV, TFYTAL, SYYSLL |
G protein-coupled receptor RDC1 | VLYSFI, TEYSAL |
G6b-B protein of MHC III^a,b | LLYADL, TIYAVV | [85]
H-rev107-like protein (HRLP-5) | VKYSRL, VQYSLI |
human germinal-center associated lymphoma protein | LCYTLI, TEYSLL |
immune receptor expressed on myeloid cells-1 (polymeric immunoglobulin receptor)^a | LCYADL, VEYVTM, ISYASL, TEYSTI | [86, 87]
immunoglobulin superfamily receptor translocation associated-1 (IFGP-2) | LVYSEI, VVYSEV |
immunoglobulin superfamily receptor translocation associated-2 | VVYSEV, IIYSEV |
immunoglobulin-like transcript 2, leukocyte immunoglobulin-like receptor-1 (MIR-7)^a | VTYAEV, VTYAQL | [88]
immunoglobulin-like transcript 3 (LIR-5)^a | VTYAKV, VTYAQL | [89]
immunoglobulin-like transcript 5 (LIR-3) | VTYAPV, VTYAQL |
inhibitory receptor protein 60 (IRC-1)^a,b | LHYANL, VEYSTV, LHYASV | [90]
interleukin 8 receptor α (CXCR-1) | IAYALV, ILYSRV, IIYAFI |
interleukin 8 receptor β (CXCR-2) | IIYALV, ILYSRV, LIYAFI |
killer cell Ig-like receptor 2DL1 (p58, NKAT-1)^a,b | VTYTQL, IVYTEL | [91, 92]
killer cell Ig-like receptor 2DL2 (NKAT-6) | VTYTQL, IVYAEL |
killer cell Ig-like receptor 2DL3 (NKAT-2) | VTYAQL, IVYTEL |
killer cell Ig-like receptor 3DL1 (p70, NKB-1, NKAT-3)^a,b | VTYAQL, ILYTEL | [93]
leucine rich neuronal protein (LRCH-4) | VFYVVL, VTYTRL |
leukocyte antigen (CD84)^a,b | TIYTYI, TVYSEV | [94]
leukocyte-associated immunoglobulin-like receptor-1^a | VTYAQL, ITYAAV | [95]
lipid phosphate phosphorylase-1 (phosphatidic acid phosphatase-2a) | LPYVAL, IPYALL |
metabotropic glutamate receptor-2 | LCYILL, VCYSAL |
metabotropic glutamate receptor-3 | LCYILL, ICYSAL |
metabotropic glutamate receptor-4 | LSYVLL, ISYAAL |
metabotropic glutamate receptor-7 | LSYVLL, ISYAAL |
multiple C2-domain and transmembrane region protein-2 | LRYIIL, VQYAEL |
natural killer inhibitory receptor NKG2-A^a,b | VIYSDL, ITYAEL | [96]
natural killer-, T-, B-cell antigen receptor^a,b | LEYVSV, TVYASV, TIYSTI | [97]
neuropeptides B/W receptor type 1 (GPR7) | VVYAVI, VLYVLL |
novel protein similar to PRAME | LSYVLL, IHYSQL |
olfactory receptor 1F1 | LFYSTI, VLYTVV |
olfactory receptor 8D1 | ILYSIL, VFYTTV |
olfactory receptor 12D2 | LRYTVI, LFYAPV, IMYTVV |
olfactory receptor 12D3 | ISYSSV, LRYTVI, IMYSAV |
olfactory receptor 51B5 | ISYVLI, VFYVTV |
olfactory receptor 51V1 | TVYTVL, LRYSSI |
osteoblast-specific factor-2 | IKYIQI, IKYTRI |
paired immunoglobin-like type 2 receptor alpha (FDF03)^a,b | IVYASL, TLYSVL | [98, 99]
phosphoribosyl transferase domain containing-1 | LEYVLI, IGYSDI |
PIG-M mannosyltransferase | VRYTDI, YRYTPL |
platelet endothelial cell adhesion molecule-1 (CD31)^a,b | VQYTEV, TVYSEV | [100]
polycystin-1, polycystic kidney disease-related protein-1 | VTYTPV, VQYVAL, LNYTLL |
protein KIAA0319 (contains polycystic kidney disease 1 domains) | IFYVTV, TKYTIL |
protein zero related^b | VIYAQL, VVYADI | [101, 102]
R3H domain protein-1 | IPYTSV, VYYSVI |
ran-binding protein-17 | VGYILL, TFYTAL, TSYTML, ICYSAL |
SH2 domain-containing phosphatase anchor protein-1^a | VVYSQV, VIYSSV | [103]
sialic acid binding Ig-like lectin-2 (CD22)^a | VTYSAL, IHYSEL, VDYSEL | [104, 105]
sialic acid binding Ig-like lectin-3 (CD33)^a,b | LHYASL, TEYSEV | [106]
sialic acid binding Ig-like lectin-5 (OBBP-2) | LHYASL, TEYSEI |
sialic acid binding Ig-like lectin-6 (OBBP-1) | LHYAVL, TEYSEI |
sialic acid binding Ig-like lectin-9 (FOAP-9)^a | LQYASL, TEYSEI | [107]
sialic acid binding Ig-like lectin-11^a,b | LHYASL, TEYSEI | [82]
sialic acid binding Ig-like lectin-12 (S2V)^a,b | IQYASL, YEYSEI | [108]
signal regulatory protein alpha-1 (SHPS-1, BIT, MyD-1, PTPNS-1)^a,b | ITYADL, TEYASI, LTYADL | [109, 110]
sodium channel type V alpha subunit (cardiac muscle alpha-subunit) | LNYTIV, IMYAAV, TTYIII, IEYSVL |
sodium channel type XI alpha subunit (peripheral nerve sodium channel 5, hNaN) | INYTII, IIYAAV, VSYIII, IKYSAL |
solute carrier family 19, member 3 (SLC19A3) | LNYVQI, VGYVKV |
somatostatin receptor 1^b | VIYVIL, VLYTFL, LCYVLI | [111]
spastic ataxia of Charlevoix-Saguenay | IHYTLL, YTYAII |
trace amine receptor-5 (GPR102) | LTYSGA, ILYSKI |
ubiquitin-specific protease-9, X chromosome (DFFRX) | VMYANL, YQYAEL |
ubiquitin-specific protease-9, Y chromosome (DFFRY) | VMYANL, YQYAEL |
zinc finger protein 521 | VGYTSV, VTYSCI |

CHAPTER 4

DETERMINATION OF THE TETRAPEPTIDE LIGAND SPECIFICITIES OF

THE BIR2 AND BIR3 DOMAINS OF XIAP BY COMBINATORIAL

LIBRARY SCREENING

4.1 Introduction

Apoptosis, the process of genetically programmed cell death, fulfills an essential requirement in normal physiology by eliminating unwanted cells during embryogenesis, immune cell education, viral infection, and after environmental insult [118, 119]. As might be expected for any process having such a final outcome, its application must be timely and appropriate, and thus regulation is paramount. Excessive or inappropriate apoptotic activity is by definition contrary to viability and associated with several disease states, whereas insufficient apoptotic activity results in pathological phenotypes such as autoimmune disorders and cancers [120]. Maintaining the proper balance requires that caspases, the cysteine proteases responsible for the initiation and execution of cellular dismemberment, be held in check until the appropriate time. Regulation of these powerful proteases is multi-layered, beginning with their synthesis as relatively inactive zymogens. Activation requires internal proteolytic processing, resulting in small and large subunits which dimerize in an α2β2 stoichiometry to form a catalytically competent protease [121, 122]. Further regulation through the direct binding of inhibitor-of-apoptosis proteins (IAPs) and blocking of dimerization or catalytic sites of processed caspases occurs before the caspase activities are released [121, 123-125]. These IAPs are in turn regulated by small soluble proteins which bind to the baculoviral IAP repeat (BIR) domains within IAPs, thereby sequestering the inhibitors and releasing the caspase activities [126]. Thus, the balance between cellular survival and demise is largely determined by a macromolecular titration of caspases by IAPs, and in turn, IAPs by their antagonists.

The central importance of IAPs in the regulation of the apoptotic process is demonstrated by their ubiquitous distribution in the animal kingdom and strong evolutionary conservation. Following the discovery of the first IAPs capable of blocking apoptosis in lepidopteran cells, the baculoviral proteins Op-IAP and Cp-IAP, homologues were soon identified in Drosophila, mouse, and human. Subsequently, IAP genes have been isolated from insect, avian, piscine, mammalian, and viral sources [118, 127]. As with other proteins associated with the death process, widespread identification has been greatly aided by the high degree of sequence similarity among IAPs [119]. This sequence similarity has been extended by functional complementation experiments.

Illustrating the degree of general conservation in the apoptotic machinery, IAPs from baculoviruses (which infect arthropod hosts) and Drosophila have been demonstrated to inhibit apoptosis in human cell lines, and human IAPs can reciprocate in Drosophila cells

[128-130]. At the root of this interspecies functional compensation is the conserved architecture of the IAPs and in particular the nearly superimposable structures of the BIR domains [118].

All IAPs characterized to date contain one to three BIR domains and many possess C-terminal RING (really interesting new gene) finger domains, although variation in the C-terminus is not uncommon. BIR domains are relatively small, ~70 amino acids, and

contain a zinc ion coordinated by invariant Cys and His residues. One of the most

characterized members of the IAP family is human X-linked IAP (XIAP). It includes

three N-terminal BIR domains (BIR1, BIR2, and BIR3) and a canonical C-terminal

RING finger, which has been demonstrated to possess ubiquitin ligase activity [131]. The

BIR2 and BIR3 domains of XIAP, which share 40% sequence identity [132], require free

N-termini in their binding partners for recognition. The tetrapeptide sequence NH2-ATPF, derived from the small subunit of processed caspase-9, has been demonstrated to bind the BIR3 domain and is necessary for the inhibition of caspase-9 activity [133].

Moreover, the peptide NH2-AVPIA, derived from the N-terminus of mature Smac, has

been shown to bind to both domains [134, 135]. Peptides similar to the Smac and caspase-9 N-termini are classified as IAP binding motifs (IBMs). Interestingly, the BIR1 domain, despite sharing 41% sequence identity with BIR2 [136], has never been observed in protein-protein interactions. Additionally, IAPs have been suggested to play roles in

processes outside of apoptosis, such as Survivin’s role in cytokinesis [137, 138]. Thus, it

seems likely that BIR domains may be able to recognize molecules other than caspases

and their antagonists.

It was therefore of interest to us to define the entire population of potential

binding candidates by individually screening each of the BIR domains of XIAP against a

synthetic tetrapeptide library. By synthesizing the library on the solid phase we could

ensure an unbiased free N-terminus. Additionally, this N-terminus was ideal for sequencing by the partial Edman degradation technique employed by our lab. Although the BIR1 domain failed to exhibit any affinity towards this library, the specificities and tolerances of the other two domains were quite different. The BIR3 domain generally selected for IBMs, but it was more willing to accept a Val at the N-terminus (P1 position) than BIR2. And while neither domain showed strong selectivity at

P2, the selectivity at P3 was much broader for BIR2 than BIR3. And finally, a discrete difference was observed at P4, with BIR2 selecting Val and small hydrophobes in contrast to BIR3’s strong preference for Phe and Ile residues. We present herein the results of library screenings performed under various stringencies for the BIR domains of

XIAP.

4.2 Experimental Techniques

4.2.1 Vector Constructs

Construction of the pETMAL, pPPTmal, and pGFPmal vectors was described in

Chapter 3. For mammalian cell culture co-immunoprecipitation experiments probing

XIAP and caspase-10d (casp10) interactions, the mammalian expression vectors pEBG-

SrfI and pCMV-SPORT6 were used, respectively. The pEBG plasmid accepted XIAP without modification as described in section 4.2.2, thus creating a GST-XIAP fusion. A pCMV-casp10 construct was obtained from the labs of Dr. Yusen Liu and was altered to include a C-terminal 3x-hemagglutinin (3xHA) tag. In order to insert the 3xHA tag sequence, QuikChange mutagenesis was performed to introduce a unique Bam HI restriction site 5’ to the casp10 stop codon. The primers for this mutation were 5’-CCC

TGG ATG CAC TTT CAT TAG GAT CCT AGC AGA GAG TTT TTG TTG G-3’ and its complement. The unusual length of these 46-mers resulted from the high A/T content of the 3’ end, and the primers required purification by 12% urea-PAGE before reaction. QuikChange was performed according to the protocol described in Chapter 3 with the thermocycling regimen 1 x 95 ºC (30”), followed by 18 x 95 ºC (30”), 52 ºC (1’), 68 ºC (15’), and an additional 15’ extension at 68 ºC. Digestion of the parent plasmid by Dpn I and

transformation into XL1 Blue yielded only 6 colonies, of which 2 appeared correct by

restriction mapping. The 3xHA tag was sub-cloned from a pSRα-3HA-Jnk1 construct

(also from Dr. Liu) [139] by PCR using the primers 5’-TTT GCA GAA GCT CAG AAT

AAA CGC-3’ and 5’-GGC ACT CGA GCT AGC TAG TCA CGC TTG CTC GCC AT-

3’. Despite not encoding a restriction site in the former primer, the PCR product contained a Bam HI site encoded by the pSRα vector 10 base-pairs downstream of primer hybridization. Thus, the PCR fragment was ligated at unique Bam HI and Xho I sites shared with the mutagenized pCMV plasmid. Overlapping dideoxy sequencings confirmed the authenticity and fidelity of the C-terminally fused 3xHA tagged caspase-

10d construct.
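As a quick sanity check on the mutagenic primer quoted above, the following minimal sketch counts its length, locates the engineered Bam HI site, and estimates the A/T content of the region 3’ to that site. The recognition sequence GGATCC for Bam HI is standard; the function names and the choice to examine the bases after the site are illustrative assumptions, not part of the original protocol.

```python
# Mutagenic primer as printed in the text; spaces are removed for analysis.
PRIMER = "CCC TGG ATG CAC TTT CAT TAG GAT CCT AGC AGA GAG TTT TTG TTG G".replace(" ", "")
BAMHI = "GGATCC"  # Bam HI recognition sequence (palindromic)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def at_fraction(seq):
    """Fraction of bases in seq that are A or T."""
    return sum(base in "AT" for base in seq) / len(seq)


length = len(PRIMER)                      # primer length in nucleotides
site_index = PRIMER.find(BAMHI)           # 0-based start of the Bam HI site
three_prime_arm = PRIMER[site_index + len(BAMHI):]
arm_at = at_fraction(three_prime_arm)     # A/T content 3' of the site
complement_primer = reverse_complement(PRIMER)

print(length, site_index, round(arm_at, 2))
```

The same helper can be pointed at any of the other primers listed in this chapter to confirm that an intended restriction site is actually present before ordering or purifying an oligonucleotide.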

Additional mutants of casp10 were constructed with the QuikChange kit in order to suppress the apoptotic phenotype. The following primers (plus complement) were used to make the indicated mutants: C401S, 5’-CAT CCA GGC CTC CCA AGG TGA AGA G-

3’; DA415,416GS, 5’-CGT ATC CAT CGA AGC AGG ATC CCT GAA CCC TGA

GCA GG-3’. The C401S mutant involves the active site Cys residue, whereas the

DA415,416GS mutation removes the inter-subunit cleavage site at Asp 415 while inserting an Eco RI site.

4.2.2 XIAP BIR Domain and Full-Length Constructs

The DNA sequences coding for the BIR1 (aa 1-123), BIR2 (aa 124-240), and

BIR3 (aa 241-356) domains were isolated by PCR from a pGEX-4T plasmid containing the BIR 1-3 domains of XIAP [140]. The DNA primers used were: BIR1, 5’- GGA ATT

CAT GAC TTT TAA CAG TTT TGA AGG-3’ and 5’-CTT GAA GCT TGT CCT CAG GAT CCC AGA TAG TTT TCA AG-3’; BIR2, 5’-GAG AAT TCA GAG ATC ATT

TTG CCT TAG ACA GG-3’ and 5’- CTG GAA AGC TTA TTC ACT TCG AAT ATT

AAG ATT CC-3’; and BIR3, 5’-CGA ATT CTC TGA TGC TGT GAG TTC TGA TAG-

3’ and 5’-GTA CGA AGC TTA AGT AGT TCT TAC CAG ACA CTC C-3’. PCR products were digested at the underlined sites with the restriction endonucleases Eco RI and Hind III, and ligated into their corresponding sites in pMAL-c2. The sub-cloning was repeated from the pGEX-BIR1-3 template to the pETMAL vector using the same sets of primers. Subsequently, each domain was further sub-cloned into pPPTmal, pGFPmal, and pET-28a vectors by PCR using the malE and T7-terminator sequencing primers and pETMAL templates as described for SH2 domains in Chapter 3.

Full length XIAP including the RING domain was cloned from a Marathon human spleen cDNA library (Clontech) by PCR according to the iQ Supermix protocol with the primer set: 5’-GAG GAT CCA TGA CTT TTA ACA GTT TTG-3’ and 5’-GGT

CTA GAT TAA GAC ATA AAA ATT TTT GC-3’. Specifically, a 50 µL reaction containing 5 µL Marathon library, 0.4 µM each primer, and 25 µL iQ Supermix

(BioRad’s proprietary mixture of dNTPs, buffer, and polymerase) was thermocycled 1 x

94ºC (3’), 33 x 94 ºC (35”), 57 ºC (35”), 72 ºC (1.5’), and allowed an extra extension for

4’ at 72 ºC. The cloned DNA was restriction digested with Xba I followed by treatment with Klenow fragment to achieve a blunt 3’ end. The vector pET-28a was similarly made blunt at the unique Hind III site. Both the vector and the gene were digested with Bam

HI, purified by spin column, and ligated at room temperature (45’) in the presence of Eco

RI before transformation into XL1 Blue E. coli. Dideoxy sequencing confirmed the ligation of human XIAP in-frame in the pET vector. In a nearly identical fashion, the full

length cloned XIAP was ligated into a pGEX-2T vector, except that the blunt end was

introduced into the vector at the Eco RI site and ligation took place in the presence of Sma I.

Lastly, a mammalian GST-fusion construct was generated in the pEBG vector.

The full length XIAP gene was amplified from the pET construct using the T7 promoter and terminator primer set. In this way the gene gained a unique 3’ Not I restriction site from the pET vector, while retaining the 5’ Bam HI site. A pEBG vector containing the rat MPK-1 gene was received from Dr. Liu. For unknown reasons, the native pEBG-Srf I vector without an inserted gene gave poor ligation results when digested by Bam HI and

Not I. The pEBG-MPK-1 construct facilitated agarose gel purification following complete Bam HI/Not I digestion, and following similar digestion of amplified XIAP gene, ligation at the complementary sites proceeded efficiently. Dideoxy sequencing confirmed the pEBG-XIAP construct’s authenticity. All pCMV and pEBG constructs for mammalian cell culture experiments were purified by Midi-Prep according to Qiagen’s protocols prior to transfection.

4.2.3 Purification and Labeling of His6-MBP-BIR Proteins

All His6-MBP-BIR domain fusion proteins were expressed, purified, and biotinylated precisely as described for the analogous SH2 domain constructs. Several aliquots of purified fusion protein were labeled by either 2 eq. or 4 eq. of fluorescein-OSu in a manner exactly like that for biotin-OSu. Furthermore, these fusion proteins were purified without any labeling for SPR binding affinity measurements as described in

Chapter 3.

4.2.4 Purification of GST-BIR1-3, GST-XIAP, and GST Control Proteins

E. coli BL21(DE3) Rosetta CodonPlus cells harboring the proper pGEX plasmid

were grown in LBA medium to the mid-log phase and induced by the addition of 300 µM

isopropyl-β-D-thiogalactoside (IPTG) for 4~6 hr at 31 ºC. The cells were harvested by

centrifugation and lysed in Buffer 22 by passing through a French press. Each GST

protein was purified from the crude lysate by immobilization on ~10-mL of GST-Bind

Resin (Novagen) according to manufacturer’s recommended procedures. After washing

with two column volumes of Buffer 22 minus protease inhibitors and 4 column volumes

of Buffer 23, ~1 mL of resin was removed and added to an equal volume of glycerol for

storage at -20 ºC, while another ~1 mL of resin was removed and stored at 4 ºC. The

remainder of the resin was eluted by 150 mL of Buffer 23 containing 10 mM reduced

glutathione (GSH). The eluted protein was concentrated, and quantitated by the Bradford

method for estimation of the resin-bound GST protein’s concentration. After addition of

glycerol to 40%, the protein was flash frozen in a dry ice/isopropanol bath and stored at -

80 ºC.

4.2.5 Synthesis of BIR Libraries

The first library was synthesized on 5 g of 130-µm TentaGel S NH2 resin using

standard Fmoc chemistry. All other aspects of library synthesis were identical to those

described for pY library synthesis, with exception of pY and constant N-terminal sequence incorporation. The de-protected library NH2-XXXXLNββRM-resin is referred

to as the BIR library. A second, related tetrapeptide library was synthesized differing in

its C-terminal linker. The same general FMOC chemistry and Leu/Lys/Nle capping

strategies were employed, but the library was synthesized on 90 µm beads carrying ~25% fewer copies of each peptide and a quaternary ammonium salt in place of the constant

Arg of the linker. Synthesis was begun by exhaustive derivatization of 1 g of TentaGel S

NH2 resin by Met as usual, but in place of Arg, Fmoc-Lys(BOC)-OH was next

incorporated. FMOC de-blocking was followed by exhaustive chain-termination by (3-

carboxypropyl)trimethylammonium chloride. Removal of the BOC side-chain protection of Lys was effected by treatment with TFA containing 1% H2O and 4%

triisopropylsilane (v/v) for 1 hr. After washing, the peptide was elongated by β-Ala and

Asn residues. During the coupling of the next residue, Fmoc-β-Ala, the chain-

terminating residue Ac-Gly (25 mol%) was included in order to reduce the density of

peptides on each bead by approximately that percentage. The random positions were

synthesized next as previously described. Thus, the resultant library NH2-

XXXXβNβ(+)KM-resin is referred to as the K+ library.

4.2.6 Colorimetric Library Screening

All aspects of the SA-AP based library screening were identical to the SH2

colorimetric library screening with two exceptions. Only 50 mg of resin were screened

per reaction and the concentrations of the domains screened were more varied. The BIR1

domain was screened versus the BIR library at concentrations up to 2 µM. The BIR2

domain was screened versus the BIR library at concentrations from 500 to 1 nM. The

BIR3 domain was screened versus the K+ library at concentrations from 500 nM to 2 µM.

4.2.7 Fluorimetric Library Screening

The screening of His6-MBP-BIR3 versus 50 mg of the K+ library was attempted

fluorimetrically using several procedures. In all cases, the preparation of the resin was the same as for any other screening, including organic to aqueous washes and blocking with

buffered gelatin solutions. The first screening was performed akin to an SA-AP procedure in which 1 µM of domain was incubated overnight with library before washing and re-suspending in a Petri dish for analysis. However, for bead examination a low-power Olympus SZX12 microscope fitted with fluorescence excitation and observation optics was necessary. Namely, a high-voltage, high-intensity Hg vapor lamp light source was coupled with an excitation filter (460-490 nm) and an emission filter

(510-550 nm) (Olympus). The fluorescent beads from the screening were picked manually and checked for loss of fluorescence in an 8 M guanidine-HCl solution. Beads which failed to stop fluorescing under such conditions were discarded. This test was necessitated by the inherent fluorescence observed for some native TentaGel S resin beads. A second screening approach was attempted in which larger volumes of lower domain concentration solutions were screened against the K+ library. In this procedure,

50 mg of resin was incubated overnight at 4ºC in 8 mL of a 500 nM BIR3 domain solution. Work-up was as before. The last form of this screening involved direct incubation in the Petri dish of 50 mg of resin with 3 mL of 1 nM BIR3 domain for 8 hr. at

4 ºC and fluorescent microscopic observation without washing.

4.2.8 Partial Edman Degradation and Peptide Sequencing

The positive beads from each color intensity category were pooled and subjected to partial Edman degradation as generally described in Chapters 2 and 3. Specifically for this library, the beads were suspended in 160 µL of 66% pyridine (aq) containing 0.1%

Et3N, to which an equal volume of 5% PITC in pyridine containing Nic-OSu (6:1 mol ratio PITC:Nic-OSu) was added. All other aspects of the degradation were as described.

4.2.9 Synthesis of biotinylated pY peptides

All test peptides contained a common C-terminal linker, -BBK-NH2. Each peptide

was synthesized on ~65 mg of CLEAR-amide resin using standard Fmoc/HBTU/HOBt

chemistry. The terminal FMOC protection was left in place during cleavage and

sidechain de-protection (previously described using Reagent K). Following Et2O

trituration, approximately 3 mg of the crude peptide was dissolved in a minimal volume

of DMSO (200–400 µL, with sonication) made basic with DIPEA and reacted with 1.1 equiv of NHS-PEG4-biotin (Quanta Biochem) dissolved in 25 µL of DMSO. After 45

min at room temperature, piperidine was added to 30% and allowed to mix for 20 min.

The mixture was acidified with TFA and re-triturated twice with 20 volumes of Et2O.

The precipitate was collected and dried under vacuum, and the biotinylated peptides were purified by reversed-phase HPLC on a C18 column (Vydac 300 Å, 4.6 x 250 mm). The identity and purity of each peptide were confirmed by MALDI-TOF mass spectrometric

analysis. This procedure resulted in the addition of a 15-atom hydrophilic linker between

the side chain of the C-terminal lysine and the carboxyl group of biotin.

In the instance of peptide VKTFLEABE(PEG-biotin)-NH2, which contained an

internal Lys residue, the above procedure was modified by the replacement of the linker

Lys with a Glu(PEG-biotin) moiety during synthesis. This substitution was accomplished

using the reagent FMOC-Glu(PEG-biotin)-OH from NovaBiochem. The terminal FMOC

was removed prior to cleavage/de-protection and HPLC performed as above.

4.2.10 Determination of Dissociation Constants by BIAcore

All measurements were made at room temperature on a BIAcore 3000 instrument

as described in Chapter 3 using the MBP-fusions for each domain.

4.3 Results

4.3.1 Library Construction and Screening

Having previously demonstrated the validity and usefulness of the synthetic library screening methodology with SH2 domains requiring phosphotyrosine residues, we adapted the techniques for application to the screening of BIR domains. The design of the library was similar, utilizing the same C-terminal linker. Briefly, the terminal Met allowed for the specific release of the peptide from the resin upon treatment with CNBr, and the neighboring Arg provided a locus for ionization of the released peptide during

MALDI-TOF analysis. Two β-alanines increased the distance between the randomized positions of interest and the bead surface, thereby providing the binding domains with better access to the ligand region. Finally, relatively inert Asn and Leu residues were incorporated to increase the mass of the linker above 600 Da in the mass spectrum for ease of interpretation. After synthesis of the random positions by the split-pool method

[4, 6], a library of the form of NH2-XXXXLNBBRM-resin was achieved, where X

represents norleucine (Nle) or any of the natural amino acids, excluding Met and Cys. In

theory, a complete library included slightly more than 1.3 x 10^5 (19^4) unique members

and was completely covered in less than 150 mg of 130 µm TentaGel resin (8.87 x 10^5 beads/g).
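The library-size arithmetic above can be reproduced with a short sketch. The residue count (19), the number of random positions (4), and the bead density (8.87 x 10^5 beads/g) come from the text; the final Poisson estimate is an added assumption (each bead carrying an independently and uniformly drawn sequence, as expected for split-pool synthesis) and is offered only as a rough guide to how many distinct sequences a given resin mass is likely to contain.

```python
import math

N_RESIDUES = 19          # Nle plus the natural amino acids, excluding Met and Cys
N_POSITIONS = 4          # randomized positions in the tetrapeptide library
BEADS_PER_GRAM = 8.87e5  # 130 µm TentaGel resin, value quoted in the text

# Theoretical number of unique tetrapeptides.
diversity = N_RESIDUES ** N_POSITIONS

# Resin mass (mg) that holds as many beads as there are library members.
resin_mg_for_one_copy = diversity / BEADS_PER_GRAM * 1000

# Number of beads in 150 mg of resin.
beads_in_150_mg = BEADS_PER_GRAM * 0.150

# Assumption: beads sample sequences at random, so the expected fraction of
# sequences present at least once follows 1 - exp(-beads / diversity).
expected_fraction = 1 - math.exp(-beads_in_150_mg / diversity)

print(diversity, round(resin_mg_for_one_copy, 1), round(beads_in_150_mg))
print(round(expected_fraction, 2))
```

Under that sampling assumption, 150 mg supplies roughly one bead per library member, while several-fold more resin would be needed before nearly every sequence is expected to appear at least once.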

Repeated fluorimetric screenings conducted by any of the methods described failed to yield reproducible results in terms of the number of positive beads observed. Furthermore, the few beads which were subsequently analyzed either yielded no signal

by MALDI-TOF MS or yielded sequences which bore no resemblance to those obtained

by the colorimetric method (Appendix Table A4).

A typical colorimetric screening experiment involved incubating a BIR domain,

which had been purified and biotinylated as the C-terminal fusion of maltose binding

protein (MBP), with 50 mg of library overnight at 4ºC with gentle mixing. After multiple

washings, domain selected peptides recruited a streptavidin-alkaline phosphatase

conjugate to the bead surface via biotin and became colorized by dye deposition upon

addition of the phosphatase substrate, 5-bromo-4-chloro-3-indolyl phosphate (BCIP).

The stringency of each screening was controlled by adjusting the concentration of the

BIR domain and was reflected in the number of positive beads obtained. The positive

beads were removed manually by pipette under a low-power dissecting microscope and

sequenced by the partial Edman degradation technique.

The BIR1 domain yielded no binding interactions when screened at a domain

concentration as high as 2 µM, whereas the BIR2 and BIR3 domains yielded 234 and 25

positive beads, respectively, at 500 nM concentrations. The BIR2 domain selected beads could be separated into two groups based on the color intensity of the positive beads;

there were 14 “dark” blue beads and 220 “light” blue beads. In the case of the BIR3

screening, no such lightly colored beads were discernable, and thus only the 25 were

collected. A sampling of 31 peptides from the BIR2 light population and the entirety of

other groups were sequenced and analyzed (Table 4.1). As expected, an alanine was

strongly selected for by both domains at the N-terminus (position 1, P1) and BIR3

strongly selected Phe and Ile residues at P4 in agreement with known IBM sequences.

Unexpected, however, were the numbers of Arg and Lys residues selected by both domains at positions P2 and P3, and the virtual absence of Phe and Ile residues at P4 for

BIR2. These initial data intrigued us and prompted further investigation.

4.3.2 Binding Specificity of the BIR2 Domain

After the initial screening performed at a domain concentration of 500 nM, it appeared likely that a non-Smac-like IBM would emerge for BIR2, and so it was decided that more stringent conditions should be tested using lower domain concentrations in order to precisely define preferred ligands. Thus, three additional screening experiments

were attempted at MBP-BIR2 concentrations of 50, 10, and 1 nM (Table 4.1, Fig. 1).

Correspondingly, the number of binding events decreased substantially with decreasing

domain concentration, yielding 67, 12, and 5 positive beads. As was the case in the 500

nM screening, BIR2 showed exquisite selectivity for alanine at P1. At P4 Val and small

hydrophobes such as Ala and Gly were the most strongly selected residues under the 1

and 10 nM conditions, but at 50 nM laxity at this position began to surface.

Progressively scanning the last position from the 1 to 500 nM screenings there is a

noticeable increase of Tyr, Pro, and hydrophilic residues (Ser, Asp, and Glu), but still a

dearth of Phe and Ile residues. The selectivity for the P2 and P3 residues showed a more

rapid increase in laxity with increasing domain concentration. Selection at P2 was

generally for β-branched residues (Val, Thr, Ile) under stringent conditions, however,

selectivity was quickly lost as nearly every amino acid was represented among the 500

nM sequences sampled. To a lesser degree, the same was true for the P3 position. While

P2 had become tolerant of acidic, basic, hydrophilic, and larger aliphatic residues by the

50 nM screening, the P3 position never accepted acidic residues and remained limited

mostly to smaller hydrophobes and basic residues. Based on the screening derived peptides we can define the consensus binding motif for the BIR2 domain of XIAP as

H2N-AX(A/+/V/y/p)(V/a/g), in which X represents any amino acid, + stands for a basic

amino acid, and lower case letters are indicative of lesser selected residues.
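The relationship between stringency and the number of positive beads can be put on a per-bead basis with a few lines of arithmetic. The positive-bead counts (234, 67, 12, and 5 at 500, 50, 10, and 1 nM MBP-BIR2) and the bead density are taken from the text; the assumption that every BIR2 screening used 50 mg of the 130 µm BIR library follows Section 4.2.6 but is not restated for each experiment, so the resulting hit rates should be read as estimates.

```python
BEADS_PER_GRAM = 8.87e5   # 130 µm TentaGel resin, value quoted in the text
RESIN_MG = 50             # resin screened per reaction (assumed for every run)

# MBP-BIR2 concentration (nM) -> positive beads reported in the text.
positives_by_nM = {500: 234, 50: 67, 10: 12, 1: 5}

# Approximate number of beads exposed to the domain in each screening.
beads_screened = BEADS_PER_GRAM * RESIN_MG / 1000

# Fraction of screened beads scored as positive at each concentration.
hit_rate = {conc: n / beads_screened for conc, n in positives_by_nM.items()}

for conc in sorted(hit_rate, reverse=True):
    print(f"{conc:>4} nM: {positives_by_nM[conc]:>3} positives, "
          f"hit rate {hit_rate[conc]:.2e}")
```

Expressed this way, even the least stringent BIR2 screening marks well under 1% of the beads as positive, which is consistent with manual picking under a dissecting microscope being practical.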

4.3.3 Binding Specificity of the BIR3 Domain

The situation following the initial 500 nM screening experiment involving the

BIR3 domain proved to be quite different from that for the BIR2 domain. The

expectations that Smac-like AVPI or caspase-9-like ATPF ligands would be found were

partially met in the initial screening. The observation of basic residues at the P2 and P3

sites was at first unexpected, and higher stringency screenings were planned as for the

BIR2 domain. However, it was realized that the consensus already pointed toward the

expected Smac-like motif and, in contrast to the BIR2 domain, higher stringency

conditions were most likely to produce an answer we already expected, thus not really

furthering our understanding of the BIR3 domain’s full ligand potential. Therefore, it

was decided that truly new ligands for this domain should be explored through further

screenings performed at equal or lower stringencies instead. Toward this end, additional

50 mg aliquots of library were screened at 500 nM, 1 µM, and 2 µM concentrations of

MBP-BIR3. The resulting peptide sequences are listed according to screening condition in Table 4.2 and plotted together in histogram form in Figure 4.2.

At all concentrations, the BIR3 domain strongly preferred an N-terminal Ala.

Unlike BIR2, however, BIR3 tolerated a small but significant population of sequences having Val and small residues at P1. The selectivity at P2 weakly favored β-branched and basic residues, while, in contrast to the BIR2 domain, acidic residues were almost completely absent from this position. The preference for Pro and Arg at P3 changed little with decreased screening stringency, although Ile and Ala residues began appearing in the 2 µM domain screening. And lastly, the preference at P4 for Phe over Ile and Tyr residues progressively lessened with decreasing stringency. From the above screening data, we defined a consensus binding motif for the BIR3 domain of XIAP as H2N-

(A/v)(+/β)(R/P)(F/I/y).
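A consensus of this kind is derived by tallying residues position by position across the sequenced beads. The sketch below shows one way such a tally might be automated; it is not the procedure used in this work, and the frequency thresholds (0.5 for upper case, 0.2 for lower case) are arbitrary assumptions. The three input peptides are simply sequences mentioned in this chapter (AVPI, ATPF, VKTF) used as placeholders, not the screening data of Tables 4.1 and 4.2.

```python
from collections import Counter


def positional_counts(peptides):
    """Return one Counter per position (P1..Pn) for equal-length peptides."""
    if not peptides:
        return []
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    return [Counter(p[i] for p in peptides) for i in range(length)]


def consensus_string(peptides, major=0.5, minor=0.2):
    """Summarize each position: upper case if frequency >= major,
    lower case if >= minor, and 'X' if no residue reaches minor."""
    total = len(peptides)
    parts = []
    for counts in positional_counts(peptides):
        # Most frequent first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        labels = []
        for residue, count in ranked:
            freq = count / total
            if freq >= major:
                labels.append(residue.upper())
            elif freq >= minor:
                labels.append(residue.lower())
        parts.append("(" + "/".join(labels) + ")" if labels else "X")
    return "".join(parts)


# Placeholder input only: sequences named in the text, not the screening hits.
example_peptides = ["AVPI", "ATPF", "VKTF"]
print(consensus_string(example_peptides))  # (A/v)(k/t/v)(P/t)(F/i)
```

Running the tally separately on the sequences from each stringency condition would make the progressive loss of selectivity at individual positions easy to compare.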

4.3.4 Affinity Measurements of Selected Peptides

In order to confirm the binding of selected peptides, representative peptides were re-synthesized and tested by surface plasmon resonance (BIAcore). All of the peptides tested bound to their cognate BIR domains with high affinity (KD = 0.46-3.8 µM) (Table

4.3). These values agreed well with KD values previously determined by alternative techniques for similar peptides [134, 141]. As expected from the screening results, peptides 5 and 6 were quite specific for BIR3 due to BIR3’s greater tolerance of N- terminal Val residues, while peptides 1 through 4 showed reasonable cross-reactivity despite containing P4 residues not generally preferred by BIR2. Among the BIR2- selected sequences, specificity is conferred by the increased tolerance of the BIR2 domain for P3 and P4 residues, while peptide 9’s dual specificity demonstrated BIR3’s willingness to accommodate a P4 Val approximately as well as a Tyr residue.

4.3.5 Database Search for Potential BIR2 and BIR3 Binding Partners

Because of the high degree of phylogenetic conservation among apoptotic machinery, we originally performed web-based searches for proteins containing consensus sequence motifs described for the individual BIR domains (web site: http://pir.georgetown.edu/) against all known genomes. However, the results returned were too numerous to be of practical use in this setting. We therefore limited our

database searches to human proteins. As a result of the requirement of a free N-terminus,

we performed the searches in several different ways. The first attempt was the most direct: searching for the N-terminally anchored sequences NH2-AX(RKPVAY)(VTYAG)

for BIR2 and NH2-(AV)X(RKP)(FIY) for BIR3 returned 347 and 73 candidate proteins,

respectively. However, the majority of these sequences were immunoglobulins or

fragments more likely representing incomplete DNA sequencing than proteolytic

processing, and therefore, the likelihood of a true free N-terminus was unknown. Thus,

modification of the searches was in order.

Numerous additional database searches were performed by modifying the N-

terminus. In one search scheme, an Asp residue or entire caspase cleavage sites (e.g.

IEAD) were added N-terminal to the Ala residue in order to resemble proteolytic

processing. However, the results were too numerous (≥1500 hits) to be of use, unless one

were looking for specific targets. Alternatively, several known IAP-antagonists are

processed to reveal their IBM motif by the removal of the N-terminal Met. Moreover,

since one of the intentions of the screening was to define a large ligand space to search

for less canonical potential interaction partners, it was decided to search for targets

possessing a free N-terminus as a result of methionine aminopeptidase (MAP)

proteolysis. For this reason, database searches were performed by adding an amino terminally anchored Met residue to the search pattern, assuming it could be lost during

MAP processing to expose the desired N-terminal residue. The BIR2 domain search

pattern was changed to NH2-MA(RKVTI)(RKPA)(VPAG) in order to reduce the number

of hits to 387, and similarly, the BIR3 search pattern NH2-M(AV)X(RKP)(FIY) returned

238 candidates. After removing redundant or fragment protein sequences, and unlikely candidates such as secreted or extracellular proteins, 127 and 65 potential targets remained for BIR2 and BIR3, respectively. For brevity’s sake, unnamed and hypothetical proteins were omitted from Tables 4.4 and 4.5.
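The Met-anchored search patterns described above map naturally onto regular expressions. The following sketch is an illustration only and does not reproduce the PIR web search that was actually used: it translates the two patterns quoted in the text into anchored regexes and reports the tetrapeptide that would be exposed if the initiator Met were removed. The protein names and sequences in `TOY_PROTEINS` are hypothetical placeholders invented for the example.

```python
import re

# Search patterns quoted in the text.
BIR2_PATTERN = "NH2-MA(RKVTI)(RKPA)(VPAG)"
BIR3_PATTERN = "NH2-M(AV)X(RKP)(FIY)"


def pattern_to_regex(pattern):
    """Translate a pattern such as 'NH2-M(AV)X(RKP)(FIY)' into a regex
    anchored at the protein N-terminus."""
    body = pattern[4:] if pattern.startswith("NH2-") else pattern
    body = re.sub(r"\(([A-Z]+)\)", r"[\1]", body)  # (RKP) -> [RKP]
    body = body.replace("X", "[A-Z]")              # X -> any residue
    return re.compile("^" + body)


def candidates(proteins, pattern):
    """Return {name: tetrapeptide exposed after Met removal} for matches."""
    regex = pattern_to_regex(pattern)
    hits = {}
    for name, sequence in proteins.items():
        if regex.match(sequence):
            hits[name] = sequence[1:5]  # residues following the initiator Met
    return hits


# Hypothetical sequences, for illustration only.
TOY_PROTEINS = {
    "toy_1": "MAVPIAQKSE",
    "toy_2": "MATAVLLKDE",
    "toy_3": "MSTRKFLLPE",
}

print(candidates(TOY_PROTEINS, BIR2_PATTERN))  # {'toy_2': 'ATAV'}
print(candidates(TOY_PROTEINS, BIR3_PATTERN))  # {'toy_1': 'AVPI'}
```

A local scan of this kind returns only pattern matches; removing fragments, redundant entries, and secreted or extracellular proteins, as described in the text, would still require annotation data.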

One additional database search was performed for BIR3 domain ligands. In this instance, instead of searching for a pattern, a specific peptide sequence, VKTF, was searched against the human genome. This sequence was deemed special because of the

N-terminal valine residue and its exact duplication in two separate screenings. Among the 330 hits returned was caspase-10, and the VKTF coincides with the inter sub-unit processing site.

4.3.6 Probing BIR3-Caspase-10d Interactions

After Midi-Prep purification of the pCMV and pEBG constructs described earlier, mammalian transfection experiments were performed in the laboratories of Dr. Yusen Liu.

These experiments have tentatively revealed a mild inhibitory relationship between the

BIR3 domain and caspase-10d. Further experiments are on-going.

4.4 Discussion

The combinatorial library method reported in this work provides a more complete picture of BIR domain binding motifs. Compared with previous domain screening methods, our technique has several significant advantages that yielded novel sequences and additional insight into BIR domain binding. First, as previously described, our technique yields individual peptide sequences determined subsequent to screening, which is a necessary improvement over the pooled sequencing and chain termination encoding methods employed with synthetic libraries in the past. Second, in contrast to phage display methods, the risk of positional bias due to enzymatic processing and secretion of displayed coat proteins is greatly diminished or absent from synthetic libraries [34-36].

Moreover, the occurrence of such biases in a pIII-fusion phage library has been

demonstrated to be most severe at positions +1 and +3 from the N-terminus, two

important recognition residues involved in BIR domain binding [142]. Third, despite

recent advances in the incorporation of several unnatural amino acids into phage display

libraries [143], the monomeric diversity of synthetic libraries is nearly limitless and

technically far less demanding, allowing future peptidomimetic inhibitor screenings. And

finally, in the absence of selection and amplification, many different stringency

conditions can be screened in order to discover alternative ligand types. Because the screenings are conducted under multiple stringencies without enrichment, qualitative assessments of positional contributions to ligand binding can be made by monitoring the relative laxity introduced by changing conditions.

The BIR2 and BIR3 domains of XIAP have been extensively characterized by

many different techniques, and several binding partners, caspases-3,-7, and -9, Smac,

HtrA2, GSPT1, and Chk1, have been described in vitro and in vivo [133, 135, 144-147].

Unlike the BIR containing IAPs survivin and BRUCE, the physiological function of

XIAP has to date been limited to apoptosis inhibition, and its defined binding partners all

possess standard N-terminal IBMs. However, by defining the entire peptidic ligand space

for these domains it is expected that new target proteins may emerge, possibly in

biological systems outside of apoptosis. Additionally, as a result of measuring the

binding affinities of the peptides derived from these screenings, the sequences themselves

may be of use in developing new, more selective domain-specific peptide-based

inhibitors for research use.

Previous library screenings have been described using phage display libraries and

various BIR domains, including XIAP’s [35, 148]. Comparison of these literature results

and those derived here reveals general agreement, with a few notable differences. Both

techniques agree there are marked contrasts between the BIR2 and BIR3 consensuses,

but, two main differences arise in the details of the individual consensuses. First, the P1

Val residue tolerated by the BIR3 domain had not been elucidated in the earlier

screenings. In fact, Franklin et al. reported stronger P1 Ala specificity for the BIR2

relative to BIR3 domain, whereas we observed the opposite trend even at 500 nM domain

concentration. Our BIAcore studies of peptides the N-terminal Val group suggest that

they are legitimate potential targets for the BIR3 domain based upon the observed KD,

and that they are quite specific for BIR3 relative to BIR2 (Table 4.3). Moreover, the

VKTF sequence found in two screenings coincides with the processed N-terminus of caspase-10. Second, as expected from the chosen selection conditions, the selectivity for

Pro at P3 was not nearly as strict as reported for BIR3. However, it is a bit surprising to observe the residue most often selected in its place is Arg, and comparison of the KD

values for peptides 1 and 3 shows only ~2-fold difference.

Although the individual sequences derived from this screening technique are potentially useful in their own right for future research endeavors, it has been our intention from the outset to apply these data groupwise in database searches for in vivo binding targets. Of the BIR2 and BIR3 domains, the latter recognizes ligands that conform more closely to the standard Smac-like IBM, except for the P1 valine-containing sub-group. It was this group that piqued our interest because of its novelty. The ligand search for VKTF returned an interaction candidate among apoptosis-related proteins, caspase-10. Subsequent mammalian cell lysate experiments conducted in the laboratory of our collaborator have demonstrated a weak inhibitory relationship between the BIR3 domain and caspase-10d activity. Further experiments are being pursued in order to confirm these tentative data.

4.5 Conclusion

The current work demonstrates the feasibility and potential usefulness of the resin-bound library screening technique. The ability to sequence large numbers of beads individually, rapidly, and inexpensively with minimal encoding marks a significant advancement in the use of modified peptide libraries. The three SH2 and two BIR domains screened in this work are from relatively well-characterized proteins. Our screening results both agree with the established literature and extend the possibilities for discovering unique interactions. While further improvement and refinement of the described methodology itself is anticipated, the application of the derived data to the design of future biological experiments is the true goal. Toward this end, successful collaborations are being cultivated and will undoubtedly yield insights into the complex workings of signal transduction.

1nM 50nM ATVT AVQA AMGY 500nM AVHA ALRG AVVV AYPV AKAT ARIG AISY AVKV ASYA AMRG ATPV ADVV AQGT AEVG AVSY ATKV ASVA ATRP AIKV AQAV AQST ASYG AYPS ASRV AQKA AIRP xYPY AKAV ASRT ARYG AAVS ANAV AQAA AYNP ASMV ATRT AYSG ARVS AWAV AQYA AKVP 10nM AYMV AKAI ADKG AQYS ATKT ARNA AEPF AVVV AVWV AEMI AEKG AYNS ATAT ATVG ATSS AYPV ANYV ATRI AVVP AVRS AVGT AQPG ATAN ASAV ARSV ATYA AVNP AIVD AVMT AQKG AEAD ARAT AVNV AMYA AVSF AIKD ANAT AEAG ARKE ATIA AIQV ARYA AKRF AYAE AVHI AEIG ATRH AINA AKRV AAIA AIPY AQGE AVKI ADAG ASAK ATKG ANRV ARIA ATPY AQPE AMKI ARQG AQKR AVKG AWRV ARVA AIAY AQAQ AQYG AYRV AQNA ARAY EFAS AIAK AHKV AYAA AEGY FVAG

Table 4.1. Peptides selected by the BIR2 domains during screenings at the indicated domain concentrations. Boldface indicates the “dark” beads selected by the BIR2 domain.

[Figure 4.1: two columns of histograms, (A) and (B), with one panel per position (P1-P4). x-axis: amino acid residue (D E N Q H K R W F Y M L I V T S A G P); y-axis: occurrence.]

Figure 4.1. Histogram of the BIR2 selected sequences. (A) Combined 1, 10, and 50 nM domain concentration screenings. (B) 500 nM domain concentration. The extreme discrimination of alanine residues is apparent from all screening conditions.

3 x 500 nM AVPY AIMF AIKF AIPI ARPA AFPF AAMF AIRF AYPI ARRS AYPY AKIF ATRF ARPI VKPF AHPF AKIF ASRF ARPI VRPF AHPY AKVF AMRF ARPV VRPI AMPF AKSF AMRF AIRI VKTF AAPY AMAF AMRF AKRI TKRF ASPF AIAF AYRF ARRI FDHI AGPF AIGF ARRY ARPS MQII ARPF AKAF ARRY ARPS RQTA AKGF QGRW 1 x 1 µM ATPF AMGF AGRF ATPI ATRS ATPY ARIF ARRF ASPI ARPQ AAPF AFKF AYRF AYKI VTGF AHPF AKKF AFRY AVRI VARF AYPY ARKF AKRY AVRI VQRF AKPF AKMF AKRY AIRI VKTF AIAF AARF ARRY AARI SRPF AKAF ASRY AVPI AKRI ESPW AKAF AVRF ATPI ARPG 1 x 2 µM ATPF AFRF ARPI ARRI ARPN AVPF AYRF AVII AKRI AVPE AAPF ALRY AVII ARAT ARRE ATAY AGRF AKII ALPT VAIF AAAF AKAY AKIV AFPT VRAF AAVF AKSF AQII AAPT VRRF AVIF AMTF ARVI ARPT VQRF AIMF AQRY ARTI ATRT TKRI AKMF AAPI ANAI AIPS IRTF AVKF AAPV AVRI AIRA IRRF AFRF AFPI AIRI AFPG NHGW

Table 4.2. BIR3 domain selected peptides from indicated screening conditions.

Boldface is used to highlight the duplication of the unusual VKTF sequence in two separate screenings.

[Figure 4.2: one histogram panel per position (P1-P4). x-axis: amino acid residue (D E N Q H K R W F Y M L I V T S A G P); y-axis: occurrence.]

Figure 4.2. The combined histogram of the BIR3 domain screenings performed at 500 nM, 1 µM, and 2 µM concentrations.
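The positional histograms in Figures 4.1 and 4.2 amount to counting, for each of P1-P4, how often each residue appears among the selected sequences. The following is a minimal sketch of that tally, not the procedure actually used to prepare the figures. The function name `positional_counts` is hypothetical, and the input is only a small subset of sequences copied from Table 4.2.

```python
from collections import Counter

# Residue order used on the histogram x-axis in Figures 4.1 and 4.2.
RESIDUE_ORDER = "DENQHKRWFYMLIVTSAGP"

def positional_counts(peptides, length=4):
    """Count how often each residue occurs at P1..P<length>."""
    counts = [Counter() for _ in range(length)]
    for pep in peptides:
        if len(pep) != length:
            continue  # skip sequences that are not full length
        for pos, residue in enumerate(pep):
            counts[pos][residue] += 1
    return counts

# A handful of BIR3-selected sequences copied from Table 4.2 (illustrative subset only).
subset = ["AVPY", "AIMF", "AIKF", "AIPI", "ARPA", "VKTF", "VRPF", "ATPF"]
counts = positional_counts(subset)
for pos, counter in enumerate(counts, start=1):
    row = " ".join(f"{aa}:{counter[aa]}" for aa in RESIDUE_ORDER if counter[aa])
    print(f"P{pos}  {row}")
```

Running the same tally separately on each screening condition would allow the stringency-dependent changes discussed in the text to be compared position by position.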


No.  Peptide     BIR2          BIR3
1    ATPF        2.3 ± 0.3     0.46 ± 0.03
2    ATPI        2.7 ± 0.2     1.0 ± 0.08
3    ATRF        3.4 ± 0.2     1.1 ± 0.2
4    ATRY        6.9 ± 0.9     1.4 ± 0.2
5    VRRF        N/B           0.85 ± 0.07
6    VKTFLEA‡    N/B           2.1 ± 0.2
7    AVVV        0.87 ± 0.3    N/B
8    ATRP        3.8 ± 0.2     N/B
9    AVRV        1.0 ± 0.1     1.5 ± 0.2

Table 4.3. Dissociation constants (µM) of library selected peptides toward BIR2 and BIR3 domains. All peptides, except (‡), contain the C-terminal linker -BBK-NH2, the lysine side chain of which was acylated with a PEG4-biotin moiety for immobilization on BIAcore streptavidin sensorchips. Each domain was prepared as a C-terminal fusion to MBP as described in Methods. N/B indicates no detectable binding at 30 µM domain concentration. ‡Peptide was derived from the N-terminus of the large subunit of processed caspase-10d; this peptide was synthesized with the C-terminal linker -BE(PEG-biotin)-NH2. The N-terminal tetrapeptide was selected in the 500 nM and 1 µM screenings.
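The fold differences discussed in the text can be read directly from Table 4.3 by taking ratios of dissociation constants. The sketch below is illustrative only. The KD values are copied from the table, the helper name `fold_difference` is hypothetical, and the reported uncertainties are ignored.

```python
# Dissociation constants (µM) copied from Table 4.3 as (BIR2, BIR3); None marks "N/B".
KD = {
    "ATPF": (2.3, 0.46),
    "ATPI": (2.7, 1.0),
    "ATRF": (3.4, 1.1),
    "ATRY": (6.9, 1.4),
    "VRRF": (None, 0.85),
    "AVVV": (0.87, None),
    "AVRV": (1.0, 1.5),
}

def fold_difference(kd_a, kd_b):
    """Return kd_a / kd_b, or None if either value is N/B."""
    if kd_a is None or kd_b is None:
        return None
    return kd_a / kd_b

# BIR3 affinity of ATRF (peptide 3) relative to ATPF (peptide 1).
p3_effect = fold_difference(KD["ATRF"][1], KD["ATPF"][1])
print(f"ATRF vs ATPF on BIR3: {p3_effect:.1f}-fold weaker")

# BIR2/BIR3 KD ratio per peptide: values above 1 indicate tighter binding to BIR3.
for peptide, (bir2, bir3) in KD.items():
    ratio = fold_difference(bir2, bir3)
    label = "N/B for one domain" if ratio is None else f"{ratio:.1f}"
    print(f"{peptide}: {label}")
```

Peptides with N/B toward one domain cannot be assigned a ratio; they are simply reported as selective at the 30 µM detection limit stated in the table legend.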


Table 4.4. Potential BIR2 mediated binding partners derived from the Protein Interaction Resource database search.


Motif   Protein
MATAV   Zinc Finger Protein 202
MATAV   Mitochondrial Ribosomal Protein S5
MATAA   Tripeptidyl Peptidase II
MATAA   LAG1 Longevity Assurance Homolog 5
MATAA   Squamous Cell Carcinoma Antigen Recognized by T-Cells 3 (SART3)
MATAA   Remodeling and Spacing Factor 1
MATAA   Procollagen (type III) N-Endopeptidase
MATAA   POU Domain, Class 3, Transcription Factor 3
MATAA   WS β-Transducin Repeats Protein
MATAA   Low-Density Lipoprotein B
MATAA   N-Oct 3
MATAG   Nicastrin
MATAG   Kallikrein 4
MATAP   RAB14
MATAP   Protein Kinase C, D2 type
MATAP   Nedd4 Binding Protein 3
MATPV   Katanin p80 Subunit B1
MATPA   Translation Initiation Factor 3, Subunit 5ε
MATPA   GRINL1A Downstream Protein Gdown 4
MATPA   BCL2-like 2 Protein
MATPA   ARNT2 Protein
MATPA   ATP-Dependent RNA Helicase #3
MATPG   Calponin Homology Domain Containing 1
MATPG   Phosphatidylinositol-4-phosphate 5-kinase type IIα
MATPP   PP13
MATPP   UXT Protein
MATPP   Deoxycytidine Kinase
MATRV   Zinc Finger Protein 268
MATRV   Gamma-Taxilin
MATRV   Zinc Finger Transcription Factor BTEB2
MATRG   P120
MATKA   SH3PX1 Protein
MATKA   Superoxide Dismutase
MAVAV   Kelch/Ankyrin Repeat Containing Cyclin A1 Interacting Protein (KARCA1)
MAVAV   Septin 11
MAVAV   RNA Helicase
MAVAV   SCAND2 Protein
MAVAV   Pinin
MAVAA   β1-Syntrophin
MAVAA   Mitochondrial 28S Ribosomal Protein S32
MAVAG   Thyroid Hormone Receptor Interactor 4
MAVAG   ABCD4
MAVAP   Ubiquitin Specific Protease 11
MAVPA   COM Domain Containing 10
MAVPA   Putative Tumor Suppressor Gene 26 Protein
MAVPA   THO Complex 3
MAVPA   Dachshund
MAVPG   Phosphodiesterase 3A
MAVPG   G Patch Domain Containing 3
MAVPG   PAP Associated Domain Containing 1
MAVPP   Dual-Specificity Tyrosine-Phosphorylation Regulated Kinase 1B (DYRK1B)
MAVPP   MICAL-Like Isoform 2
MAVPP   Ethanolamine Kinase 2
MAVRA   GROS1-L Protein
MAVRA   Integrin β4 Binding Protein
MAVRG   Bagpipe homeobox 1
MAVKV   Solute Carrier Family 25
MAIAG   Dedicator of Cytokinesis 6 (DOCK6)
MAIRA   Mitochondrial Serine Hydroxymethyltransferase
MARAV   L-Asparaginase
MARAG   Centrosomal Protein of 72 kDa
MARAG   Spire Homolog 2
MARPV   Leucine-Rich Repeats and Immunoglobulin-Like Domains 1
MARPV   Winged Helix Domain-Containing Protein
MARPV   Albumin D-Box Binding Protein
MARPG   Brain Acyl-CoA Hydrolase (BACH)
MARPG   ARPG863
MARRA   Neuromedin B
MARRP   Constitutive Androstane Receptor SV16
MARRP   SPRY Domain-Containing SOCS Box Protein SSB-3
MARRP   c-myb
MARRP   v-myb myeloblastosis viral oncogene homologue
MARKV   WD Repeat Domain 40A
MARKA   aarF Domain Containing Kinase 1
MAKAA   HSP70-2
MAKAG   RNA Processing Factor
MAKAG   NIR1
MAKAP   Ankyrin Repeat Domain 2
MAKPA   Annexin VI
MAKRA   Hexokinase 1
MAKRP   Zinc Finger Protein 479
MAKRP   Zinc Finger Protein 679
MAKKV   Flavin Containing Monooxygenase 4
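Every motif listed in Table 4.4 consists of an initiator Met followed by a tetrapeptide, so a search of this kind can be pictured as an N-terminal pattern match. The sketch below is only an illustration and should not be read as the actual database query. The pattern is an assumption assembled from the residues that happen to appear in the listed motifs, and the sequences in `toy_db` are invented.

```python
import re

# Hypothetical pattern: initiator Met followed by a tetrapeptide whose allowed
# residues are simply those that appear in the Table 4.4 motifs.
BIR2_PATTERN = re.compile(r"^MA[TVIRK][APRK][VAGP]")

def n_terminal_hits(proteins):
    """Return (name, motif) for sequences whose N-terminus matches the pattern."""
    hits = []
    for name, sequence in proteins.items():
        match = BIR2_PATTERN.match(sequence)
        if match:
            hits.append((name, match.group(0)))
    return hits

# Toy sequences (not real database entries) to exercise the filter.
toy_db = {
    "toy_1": "MATAVLLKE",
    "toy_2": "MAVPGSSTR",
    "toy_3": "MSTAVLLKE",   # Ser at position 2: no match
    "toy_4": "GGMATAVKK",   # motif not at the N-terminus: no match
}
print(n_terminal_hits(toy_db))
```

Anchoring the pattern at the start of the sequence reflects the N-terminal nature of the IBM; an internal occurrence of the same residues is deliberately not reported.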


Table 4.5. Potential BIR3 mediated binding partners derived from the Protein Interaction Resource database search.


Motif   Protein [Ref]
MAVPF   Checkpoint Kinase (Chk1) [147]
MAGPF   Creatine Kinase, Mitochondrial 1
MASPF   Nuclear Prelamin A Recognition Factor-Like Variant
MALPF   Tropomodulin 3
MAEPF   HMBA-Inducible Protein HEXIM1
MAQPF   Similar to Hephaestin
MAPKF   Translocase of Outer Mitochondrial Membrane 34
MAIKF   Sec61 Homolog
MASKF   Heart Alpha-Kinase
MAEKF   Mitogen-Activated Protein Kinase 6
MAEKF   Four and a Half LIM Domains 1 (FHL-1, SLIM1)
MAHRF   Aldolase B
MARRF   Purinergic Receptor P2X1
MAAPI   39S Ribosomal Protein L45, Mitochondrial Precursor
MAAPI   Microcephalin
MAQPI   RAB33A, RAS Oncogene Family Member
MAQPI   GTP-Binding Protein S10
MASKI   Interferon-Responsive Finger Protein 1, Short Form
MACKI   Ring Finger Protein 38
MAMKI   Similar to Schlafen 5
MAARI   Similar to 40S Ribosomal Protein S3
MASRI   F-Box Protein 47
MASRI   Transcription Factor EB
MALRI   Myeloma Overexpressed
MAKRI   Acyl-CoA Synthase 4
MATPY   Sorting Nexin 16
MAAPY   NPAS1 Protein
MAQPY   Hexaribonucleotide Binding Protein 1
MAKPY   Phospholipase C Beta 4 Isoform A
MAWPY   Carboxypeptidase Z Isoform 3
MASRY   Proteasome (prosome, macropain) Subunit, Alpha Type 8 (PMSA 8)
MASRY   Pheromone Receptor
MALRY   60S Ribosomal Protein L36
MAQRY   Meis2
MARRY   Myeloid Ecotropic Viral Integration Site 1 Homolog 2 Isoform 1 (Meis1)
MARRY   Meis1-Related Protein 2 (Meis3)
MVLKF   T-Cell Receptor Alpha Chain V Region
MVKKF   PNAS-124
MVKKF   PNAS-131
MVTRF   Ubiquinol-Cytochrome C Reductase
MVNKI   Actin-Filament Binding Protein Frabin
MVPRI   VPRI645
MVRPY   26RFa Precursor
MVRPY   Orexigenic Neuropeptide QRFP Precursor
MVQKY   SEC14L


CHAPTER 5

MATERIALS and GENERAL METHODS

5.1 Materials.

Escherichia coli peptide deformylase was overexpressed in the E. coli BL21(DE3) strain and purified to apparent homogeneity with Co2+ as the divalent metal [54]. All general laboratory chemicals, buffers, and solvents were obtained from Sigma-Aldrich (St. Louis, MO), VWR International (West Chester, PA), or Fisher Scientific International (Hampton, NH). All electrophoresis materials were purchased from BioRad Laboratories (Hercules, CA). All peptide synthesis reagents, including Fmoc-protected amino acids, 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU), 1-hydroxybenzotriazole (HOBT), Nic-OSu, and resins, were purchased from Advanced Chemtech (Louisville, KY), Peptides International (Louisville, KY), or Novabiochem (Darmstadt, Germany). Streptavidin-alkaline phosphatase was obtained from PROzyme (San Leandro, CA). All DNA restriction endonucleases and modifying enzymes were purchased from New England Biolabs (NEB, Beverly, MA), whereas DNA oligos for PCR were from Integrated DNA Technologies (IDT, Coralville, IA). DNA Clean-Up Kits and other spin-column-based DNA purification kits were from Qiagen (Valencia, CA). Other chemicals, including isopropyl-β-D-thiogalactopyranoside (IPTG), phenylmethanesulfonyl fluoride, kanamycin, ampicillin, and β-mercaptoethanol, were purchased from Aldrich.

5.2 Buffers

Buffer 1: 100 mM Bis-Tris, pH 6.7, 1 mM EDTA

Buffer 2: 10 mM HEPES, pH 8.0, 35 mM NaCl

3X Buffer 3: 300 mM Tris, pH 8.25, 15 mM EDTA, 6 M urea

Buffer 4: 10 mM Bis-Tris, pH 6.5, 0.5 mM EDTA, 50 mM NaCl

Buffer 5: 50 mM Tris, pH 7.6

Buffer 6: 50 mM Tris, pH 8.25, 2.5 mM EDTA, 1 M urea

Buffer 7: 30 mM HEPES, pH 7.5, 150 mM NaCl, 5 mM BME, 5 mM EDTA, 0.5 mM benzamidine, 20 µg/mL soybean trypsin inhibitor, 20 µg/mL leupeptin, 20 µg/mL pepstatin, 0.2% Triton X-100, 0.5 mg/mL protamine sulfate

Buffer 8: 20 mM HEPES, pH 8.15, 150 mM NaCl, 2 mM BME, 10 mM maltose

Buffer 9: 20 mM HEPES, pH 7.4, 150 mM NaCl, 2 mM BME

Buffer 10: 10 mM HEPES, pH 7.4, 150 mM NaCl, 3 mM EDTA, and 10 µM tris(carboxyethyl)phosphine

Buffer 11: 30 mM HEPES, pH 7.85, 500 mM NaCl, 0.5 mg/mL protamine sulfate, 0.5 mM benzamidine, 20 µg/mL soybean trypsin inhibitor, 20 µg/mL leupeptin, 20 µg/mL pepstatin, 0.1% Triton X-100 (optional)

Buffer 12: 10 mM HEPES, pH 7.85, 500 mM NaCl

Buffer 13: 10 mM HEPES, pH 7.4, 150 mM NaCl, 125 mM imidazole

Buffer 14: 10 mM Tris, pH 7.8, 5 mM NaCl, 1 mM EDTA, 10 mM BME, 0.5 mM benzamidine, 20 µg/mL soybean trypsin inhibitor, 20 µg/mL leupeptin, 20 µg/mL pepstatin, 0.3 mM PMSF, 0.5 mg/mL protamine sulfate

Buffer 15: 10 mM Tris, pH 7.8, 5 mM NaCl, 1 mM EDTA, 10 mM BME

Buffer 16: 10 mM Tris, pH 7.8, 500 mM NaCl, 1 mM EDTA, 10 mM BME

Buffer 17: 10 mM MES, pH 5.5, 0.5 mM EDTA, 10 mM BME

Buffer 18: 10 mM MES, pH 5.7, 500 mM NaCl, 0.5 mM EDTA, 10 mM BME

Buffer 19: 30 mM HEPES, pH 7.4, 150 mM NaCl, and 0.05% Tween 20

Buffer 20: 30 mM Tris, pH 7.6, 1 M NaCl, 10 mM MgCl2, 70 µM ZnCl2, 20 mM potassium phosphate

Buffer 21: 30 mM Tris, pH 8.5, 100 mM NaCl, 5 mM MgCl2, 20 µM ZnCl2

HBS-EP Buffer: 10 mM HEPES, pH 7.4, 150 mM NaCl, 3 mM EDTA, and 0.005% polysorbate 20

Strip Buffer: 10 mM NaCl, 2 mM NaOH, and 0.025% SDS

Buffer 22: 10 mM HEPES, pH 7.6, 150 mM NaCl, 2 mM DTT, 0.5 mM benzamidine, 20 µg/mL soybean trypsin inhibitor, 20 µg/mL leupeptin, 20 µg/mL pepstatin, 0.2% Triton X-100, 40 U DNase (Promega)

Buffer 23: 10 mM HEPES, pH 7.6, 150 mM NaCl, 8 mM DTT, 1 mM EDTA

TE Buffer: 10 mM Tris, pH 8.0, 1 mM EDTA

Gel Extraction Buffer: 500 mM ammonium acetate, pH 8.0, 1 mM EDTA

5.3 General Biochemical and Biological Methods

5.3.1 Materials

Phenol was purchased from Fisher Scientific, prepared as described [149] prior to use, and stored in 10 mM Tris·HCl, pH 8.0, 1 mM EDTA, 0.1% β-mercaptoethanol at 4 °C for up to 1 month. “Chloroform” refers to a 24:1 mixture of chloroform and 3-methyl-1-butanol; the chloroform was ACS certified. Deoxynucleotide mixture was from Stratagene. Ethidium bromide, bromophenol blue, xylene cyanol, ethylenediaminetetraacetic acid (EDTA), isopropyl-β-D-thiogalactopyranoside (IPTG), and DTT were from Sigma.

5.3.2 Growth Media

Dry bacterial growth media (bactotryptone, yeast extract, casamino acids, and bactoagar) were obtained from Difco. All growth media were prepared with ddH2O and sterilized by autoclaving for 20 min at 20 lb/sq. in. on liquid cycle. Antibiotics, inorganic salts, 20% (w/v) glucose, and 1 M MgSO4 were dissolved in ddH2O and sterilized by passing through sterile 0.45 µm filters (Acrodisc, Gelman Sciences). Antibiotics were added directly to the growth media after cooling to less than 50 °C.

Luria-Bertani Medium (LB): 10 g/L bactotryptone, 5 g/L yeast extract, 10 g/L NaCl.

LB Plates: LB medium plus 15 g/L bactoagar.

Escherichia coli Strains. The following strains were used in this work. Genotypes are taken from the 2005-2006 NEB catalogue. The description of the Rosetta strain is from the 2005 on-line Novagen catalogue.

BL21(DE3): F− ompT hsdSB (rB− mB−; an E. coli B strain) gal dcm (DE3). This strain carries a copy of the T7 RNA polymerase gene on its chromosome (λ lysogen). It has neither methylation nor restriction function, and therefore it is useful for preparing DNA free of methylation. It is also deficient in lon protease and is good for overproduction of foreign proteins.

BL21(DE3) (Rosetta): This strain is similar to BL21(DE3) except that it contains pRARE (CmR). The strain supplies tRNAs for the codons AUA, AGG, AGA, CUA, CCC, and GGA on a compatible chloramphenicol-resistant plasmid. This strain is used to express gene sequences that contain many codons infrequently used by E. coli (i.e., biased codons).

DH5αF’: F’/endA1 hsdR17 (rK− mK+) glnV44 thi-1 recA1 gyrA (Nalr) relA1 ∆(lacZYA-argF)U169 deoR (Φ80dlac∆(lacZ)M15). This strain is deficient in DNA recombination. Plasmid DNA prepared from this strain is usually of very high quality. It is also sometimes used to overproduce proteins.

XL1-Blue: F’::Tn10 proA+B+ lacIq ∆(lacZ)M15/ recA1 endA1 gyrA96 (Nalr) thi hsdR17 (rK− mK+) glnV44 relA1 lac.

5.3.3 Growth and Storage of Bacterial Strains

E. coli were stored on agar plates at 4 °C for periods up to one month or as frozen glycerol cultures at -80 °C indefinitely.

E. coli strains DH5αF’ and BL21(DE3) were streaked on LB plates, and strain BL21(DE3) (Rosetta) was streaked on LB plus chloramphenicol plates (35 mg/L) if no additional antibiotic resistance was conferred by plasmid. Cells from a frozen glycerol culture were streaked on an appropriate plate and then incubated at 37 °C until individual colonies were 1-2 mm in diameter.

To store cells as a frozen glycerol culture, 0.6 mL of liquid bacterial culture was added to 0.4 mL of glycerol in a sterile microcentrifuge tube (1.5 mL). The culture was mixed with the glycerol by pipette and transferred to -80 °C for long-term storage.

Overnight cultures were prepared by picking a single, well-isolated colony from an agar plate and using it to inoculate 1-10 mL of the appropriate sterile medium broth. Cells were allowed to grow at 37 °C for 8-16 hr.

5.3.4 Preparation of Competent Cells

Unless otherwise noted, all sterile pipettes, tubes, and solutions were pre-chilled to 4 °C, and all processes were performed on ice or in a cold room. LB medium (250 mL) was inoculated by the addition of 2.5 mL of the overnight culture of an appropriate strain. The cells were allowed to grow with aeration at 37 °C until the OD600 reached 0.4-0.6. The cells were transferred to a sterile GS-3 centrifuge tube, cooled on ice for 5-10 min, and centrifuged at 5,000 rpm for 5 min. The pellet was gently resuspended in 10 mL of ice-cold competent solution A (30 mM KOAc, pH 5.8, 100 mM RbCl, 10 mM CaCl2, 50 mM MnCl2, and 15% (v/v) glycerol) and kept on ice for 5 min. The cell suspension was centrifuged again (5,000 rpm, 5 min) and the resulting pellet was resuspended in 10 mL of ice-cold competent solution B (10 mM MOPS, pH 6.5, 75 mM CaCl2, 10 mM RbCl, and 15% (v/v) glycerol). Competent cells in solution B were stored at -80 °C for up to 6 months.

5.3.5 Quantitation of DNA and RNA

Pure samples of DNA and RNA (i.e., free of proteins, phenol, agarose, nucleotides, or other nucleic acids) were quantitated by measuring the absorbance at 260 nm using a UV/vis spectrophotometer as described. An OD260 of 1 corresponded to approximately 50 µg/mL for double-stranded DNA, 40 µg/mL for large single-stranded DNA, and 27 µg/mL for single-stranded oligonucleotides. The purity of a preparation of DNA or RNA was estimated from its OD260/OD280 ratio. Pure preparations of DNA and RNA should have OD260/OD280 values of 1.8 and 2.0, respectively. When the amount of a sample was small (<1 µg) or the sample was heavily contaminated with other substances that absorbed at 260 nm, the quantity of DNA (or RNA) in the sample was estimated by comparing the fluorescence yield of the sample with that of a series of standards.
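The absorbance-based quantitation described above is a simple proportionality, sketched below for illustration. The conversion factors are the ones quoted in this section; the function names, the dilution factor, and the example readings are hypothetical.

```python
# µg/mL per OD260 unit, as quoted in Section 5.3.5.
CONVERSION = {"dsDNA": 50.0, "ssDNA": 40.0, "oligo": 27.0}

def concentration_ug_per_ml(od260, kind="dsDNA", dilution=1.0):
    """Concentration of the undiluted stock from an OD260 reading."""
    return od260 * CONVERSION[kind] * dilution

def purity_ratio(od260, od280):
    """OD260/OD280; about 1.8 is expected for pure DNA and 2.0 for pure RNA."""
    return od260 / od280

# Hypothetical reading: a plasmid prep diluted 1:50 before measurement.
conc = concentration_ug_per_ml(0.25, "dsDNA", dilution=50)
print(f"{conc:.0f} µg/mL, OD260/OD280 = {purity_ratio(0.25, 0.137):.2f}")
```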

5.3.6 Protein Quantitation

Protein concentrations were determined by the Bradford method using bovine serum albumin (BSA, Sigma) as standard. The Bradford reagent was obtained from

BioRad as a 6X stock, which was diluted with ddH2O to 1X. Construction of a standard

curve was performed by plotting the observed OD595 for five solutions containing

Bradford reagent versus their known concentrations of BSA (0, 1, 2, 4, and 6 µg/mL).

After regression analysis of the standard curve, the OD measurements of unknown samples could be used to determine the concentrations of the original unknown stock.
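As an illustration of the regression step, the sketch below fits a straight line to a standard curve and inverts it for an unknown sample. Only the BSA concentrations come from the text. The OD595 readings, the dilution factor, and the function names are invented for the example, and a strictly linear response is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# BSA standards (µg/mL) from the text; the OD595 readings are invented for illustration.
bsa = [0, 1, 2, 4, 6]
od595 = [0.00, 0.05, 0.10, 0.20, 0.30]
slope, intercept = fit_line(bsa, od595)

def protein_conc(od, dilution=1.0):
    """Invert the standard curve and correct for sample dilution."""
    return (od - intercept) / slope * dilution

print(f"{protein_conc(0.15, dilution=100):.0f} µg/mL in the original stock")
```

Readings that fall outside the range of the standards should be re-measured at a different dilution, since the fitted line is only supported between 0 and 6 µg/mL.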

5.4 Electrophoresis

5.4.1 Agarose Gel

Agarose gel electrophoresis was used for separation of large DNAs (>150 bp). Agarose was added to 100 mL of 0.5x TBE buffer (44.5 mM Tris, 41.5 mM boric acid, 1 mM EDTA, pH 8.3) to a final concentration of 0.5-1.5% (w/v). The agarose was melted by heating in a microwave and 1-2 µL of an aqueous solution of ethidium bromide (10 mg/mL) was added. After mixing by gentle swirling, the agarose was poured into a horizontal slab gel tray (110 x 140 mm) and allowed to solidify at 4 °C.

DNA samples were mixed with 1/6 volume of a 6x loading buffer (30% (v/v) glycerol, 0.25% (w/v) bromophenol blue in ddH2O) and loaded into 5 x 2 (1) x 5-10 mm (length x width x depth) wells. Electrophoresis was carried out at constant voltage (100 V) with the gel submerged in 0.5x TBE buffer until the desired separation was achieved. The DNA-containing bands were visualized under either long-wavelength (preparative) or short-wavelength (analytical) UV light.

5.4.2 Polyacrylamide Gels for Protein Separation

Proteins were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) according to the method of Laemmli [150]. The gel consisted of a stacking layer (175 x 30 x 1.5 or 0.75 mm) and a separating layer (175 x 120 x 1.5 or 0.75 mm). The percentage (10-20%) of acrylamide in the separating layer depended on the sizes of the proteins in the sample. An appropriate volume of a 30% acrylamide stock solution (bis(acrylamide):acrylamide, 1:29) was diluted with ddH2O, separating gel buffer (325 mM Tris·HCl, pH 8.8, 0.1% (w/v) SDS), and 1.6 mg/mL ammonium persulfate. The resulting solution was degassed, activated by the addition of 8.0 µL of TEMED, and poured between two assembled glass plates. The surface of the separating gel was smoothed by adding a few drops of n-butanol saturated with water on top of the gel. The gel was allowed to polymerize at ambient temperature for 15-20 min and the n-butanol was removed by rinsing the gel with water. The stacking gel solution containing 4% acrylamide mixture, 1.6 mg/mL ammonium persulfate, and 8.0 µL of TEMED in 1x stacking gel buffer (125 mM Tris·HCl, pH 6.8, 0.1% SDS) was then added onto the top of the polymerized separating gel. The sample loading wells were formed by inserting combs of the appropriate width into the top of the stacking gel.
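The "appropriate volume" of 30% acrylamide stock follows from a simple dilution (C1 x V1 = C2 x V2). The sketch below illustrates the arithmetic. The gel volumes are hypothetical, since the text does not state the total volume of gel solution, and the function name is made up.

```python
STOCK_PERCENT = 30.0  # acrylamide stock concentration quoted in the text

def stock_volume_ml(final_percent, total_volume_ml):
    """Volume of 30% stock needed, from C1 * V1 = C2 * V2."""
    if not 0 < final_percent <= STOCK_PERCENT:
        raise ValueError("final percentage must be between 0 and the stock percentage")
    return final_percent / STOCK_PERCENT * total_volume_ml

# Hypothetical 30 mL separating gels at each end of the 10-20% range,
# plus a hypothetical 10 mL stacking gel at 4%.
for pct, vol in [(10, 30), (20, 30), (4, 10)]:
    print(f"{pct}% in {vol} mL: {stock_volume_ml(pct, vol):.2f} mL of stock")
```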

Protein samples were mixed with an equal volume of a 2x loading buffer (125 mM Tris·HCl, pH 6.8, 4% (w/v) SDS, 10% (v/v) β-mercaptoethanol, 20% (v/v) glycerol, 0.2% (w/v) bromophenol blue) and heated at 95-100 °C for 5 min prior to loading. The gels were electrophoresed in Tris-glycine buffer (25 mM Tris, 192 mM glycine, 0.1% SDS, pH 8.0) at 200 V until the bromophenol blue had migrated near the bottom of the gel. The gels were then removed from the glass plates, stained in 40% methanol, 10% acetic acid in water containing 0.25% (2 hr staining) or 0.05% (overnight staining) Coomassie Brilliant Blue R-250, and destained by soaking in a solution of 40% (v/v) methanol, 10% acetic acid in water.

5.4.3 Urea-PAGE Gels for Oligonucleotide Purification

Oligonucleotides were separated on PAGE gels using urea as the denaturant. The percentage of polyacrylamide used in the gel depended on the size of the oligo being purified (for ~19-mers, a 19% gel was used; for ~26-mers, a 15% gel was used; and for ~45-mers, a 12% gel was used). The ratio of acrylamide to bisacrylamide was always 29:1. Recipes are shown below.

Gel components         Thin gel (0.4 mm), in 100 mL     Thick gel, in 600 mL
                       conc.       mass (g)             conc.       mass (g)
urea                   7 M         42.042               7 M         252.252
tris free base         100 mM      1.2114               25 mM       1.817
boric acid             100 mM      0.6183               25 mM       0.927
Na2EDTA                1 mM        0.0372               0.25 mM     0.0566
29:1 acryl/bisacryl    12%         11.59/0.41           12%         69.52/2.48
                       15%         14.48/0.52           15%         86.89/3.103
                       19%         18.344/0.6552        19%         110.07/3.93

Buffer components      Thin gel, in 2 L                 Thick gel, in 2 L
                       conc.       mass (g)             conc.       mass (g)
tris free base         100 mM      24.228               25 mM       6.06
boric acid             100 mM      12.366               25 mM       3.09
Na2EDTA                1 mM        0.744                0.25 mM     0.186
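The masses in the recipe follow from mass = molarity x volume x formula weight. The sketch below recomputes the thin-gel column as an illustration. The formula weights are assumptions on our part (Na2EDTA is taken as the dihydrate), chosen because they reproduce the tabulated thin-gel masses; the function name is hypothetical.

```python
# Formula weights (g/mol); Na2EDTA is assumed to be the dihydrate.
FORMULA_WEIGHT = {
    "urea": 60.06,
    "tris free base": 121.14,
    "boric acid": 61.83,
    "Na2EDTA": 372.24,
}

def grams_needed(component, molar, volume_l):
    """Mass of solid required for a target molarity in a given volume."""
    return molar * volume_l * FORMULA_WEIGHT[component]

# Thin-gel recipe: 100 mL of gel solution, concentrations in mol/L.
thin_gel = {"urea": 7.0, "tris free base": 0.100, "boric acid": 0.100, "Na2EDTA": 0.001}
for component, molar in thin_gel.items():
    print(f"{component}: {grams_needed(component, molar, 0.100):.4f} g")
```

The same function can be applied to the other columns by changing the concentrations and the volume.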

When purifying oligos in the range of 45 nucleotides in length, it was necessary to use a “thick gel.” When all the gel components had dissolved, the volume was adjusted to that indicated and the solution was vacuum filtered through a 0.45 µm filter. After assembly of the 38 x 50 cm gel caster (BioRad), the gel was polymerized by the addition of 1 µL each of 25% ammonium persulfate and TEMED per mL of gel solution, and the solution was quickly injected into the caster. After polymerization, the comb was removed and the apparatus was warmed over the course of ~1 hr to ~50 °C by running the gel at 130 V (PowerPac 3000 equipped with temperature probe, BioRad). Once the gel had equilibrated at 45-50 °C, the wells were flushed of urea by syringe prior to loading the oligos as a 50% formamide solution at up to 2 nmol/lane (15 nmol/lane for thick gels). After resolution by PAGE, the oligos were visualized against the fluorescent backdrop of a TLC plate in a dark room using a 354 nm UV lamp (UV shadowing). The DNA was extracted from the excised band by incubation in 10 mL of Gel Extraction Buffer overnight. Desalting was performed by HPLC and the DNA was quantitated as described.

5.5 Recombinant DNA techniques

5.5.1 Restriction Digestions

All restriction digestions were performed in buffers obtained from NEB. Complete digestions were usually carried out at 37 °C for 1-8 hr using 1.5 units of restriction enzyme per 1 µg of plasmid DNA. More enzyme per microgram (0.4-1 unit/pmol DNA) was used for smaller DNA fragments (synthetic DNAs, PCR products, or restriction fragments). If required, the restriction enzymes were removed after digestion with Qiagen spin column kits according to the manufacturer’s protocol.

5.5.2 Filling Recessed 3’-Termini and Removing Protruding 3’-Termini

In order to ligate DNA fragments with incompatible termini, they were converted into compatible forms by partially or completely filling in the recessed 3’-termini or removing the protruding 3’-termini. To fill in a recessed 3’-terminus, 1 µL of a solution containing the desired dNTPs at 1 mM (the choice of dNTPs depended on the sequence of the 5’ overhang and on whether partial or complete fill-in was required) was added directly to 0.2-5 µg of DNA (20 µL in restriction enzyme buffer plus 5 mM MgCl2) digested with the appropriate restriction enzymes. After the addition of the Klenow fragment of DNA polymerase I (1 unit for every µg of DNA), the reaction mixture was incubated at room temperature for 15 min. Removal of protruding 3’-termini was carried out similarly except that T4 DNA polymerase was used instead of the Klenow fragment, higher concentrations of dNTPs were added (1 µL of 2 mM each), and the reaction was incubated at 14 °C for 15 min. After the reaction, enzymes and dNTPs were removed by either agarose gel electrophoresis or a Qiagen kit (Qiaquick mini column).

5.5.3 Removal of 5’ Phosphates

In order to prevent undesired self-ligation of a DNA fragment, the 5’ phosphate groups were removed with calf intestinal alkaline phosphatase (CIP). Dephosphorylation was carried out in a 50 µL reaction containing digested DNA (up to 10 µg) and CIP (0.01 unit/pmol of protruding 5’ termini, 1 unit/pmol of blunt or recessed 5’ termini) in 10 mM Tris·HCl, pH 8.3, 1 mM MgCl2, 1 mM ZnCl2 buffer. The reaction mixture was diluted to 100 µL in TE buffer and the CIP was removed by extraction with a 1:1 mixture of phenol/chloroform (3x) and then chloroform (2x), followed by ethanol precipitation. Alternatively, the CIP was removed with a Qiaquick spin column.
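Because the CIP amounts are specified per pmol of 5’ termini, the mass of DNA must first be converted into pmol of ends. The sketch below shows one way to do this, assuming an average of about 660 g/mol per base pair and two 5’ termini per linear molecule. The function names and the example fragment are hypothetical.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA

def pmol_of_ends(mass_ug, length_bp):
    """pmol of 5' termini in a linear dsDNA sample (two ends per molecule)."""
    pmol_molecules = mass_ug * 1e6 / (length_bp * AVG_BP_MASS)
    return 2 * pmol_molecules

def cip_units(mass_ug, length_bp, protruding=True):
    """CIP units using the per-pmol amounts given in Section 5.5.3."""
    units_per_pmol = 0.01 if protruding else 1.0
    return pmol_of_ends(mass_ug, length_bp) * units_per_pmol

# Hypothetical example: 5 µg of a 5,000 bp linearized vector.
print(f"{pmol_of_ends(5, 5000):.2f} pmol ends")
print(f"{cip_units(5, 5000):.3f} U (protruding), "
      f"{cip_units(5, 5000, protruding=False):.2f} U (blunt/recessed)")
```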

5.5.4 Ligation of DNA

Ligation reactions were generally carried out in 1X T4 DNA ligase buffer supplied with T4 DNA ligase at ~16 °C for 4-20 hr. Ligations involving blunt ends were generally performed at room temperature. The concentrations of DNA varied from 2 µg/mL (sticky-end ligation) to 30 µg/mL (blunt-end ligation). The ratio of insert DNA/vector DNA varied from 3:1 (restriction fragment to dephosphorylated vector DNA) to 10:1. The concentrations of T4 DNA ligase varied from 5 (sticky-end ligation) to 100 (blunt-end ligation) Weiss units/mL. The reaction volume was 20-40 µL. The resulting solution was stored at -20 °C until used for transformation.
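If the insert/vector ratio is taken to be a molar ratio (an assumption; the text does not say whether the ratio is by mass or by moles), the mass of insert required scales with the relative lengths of the two fragments. A minimal sketch with hypothetical fragment sizes:

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio):
    """Insert mass giving the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio

# Hypothetical example: 50 ng of a 6,000 bp vector and a 600 bp insert.
for ratio in (3, 10):
    print(f"{ratio}:1 -> {insert_mass_ng(50, 6000, 600, ratio):.0f} ng insert")
```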

5.5.5 Transformation

Competent cells in competent solution B were prepared as described previously and dispensed into 100 µL aliquots on ice. For transformations with purified supercoiled plasmid DNA, three dilutions of plasmid DNA (approx. 100 ng, 10 ng, and 1 ng of plasmid in 1-10 µL of TE buffer) were each added to an aliquot of competent cells in a sterile microcentrifuge tube. For transformations with plasmid DNA from mutagenesis synthesis reactions or ligation reactions, 10 µL, 1 µL, and 1 µL of a 10-fold dilution of the reaction mixture (usually containing 1-10 µg/mL DNA) were added to aliquots of competent cells. The DNA/cell suspensions were gently mixed and kept on ice for 30-35 min, after which the tubes were placed in a 37-42 °C heat block for 3 min. The tubes were centrifuged for 15 sec in a microcentrifuge and the supernatants were carefully withdrawn with a pipette. The cell pellet was gently resuspended in 1 mL of LB medium and incubated at 37 °C for 30-60 min. The cells were pelleted again by centrifugation for 15 sec and resuspended in 100 µL of LB medium. This culture was evenly spread onto an LB plate impregnated with the appropriate antibiotic(s). The plates were inverted and incubated at 37 °C for 10-16 hr.

5.5.6 Small-Scale Preparation of Plasmid DNAs

Plasmid DNA was typically prepared by isolating a single E. coli colony from the transformation LB plate and inoculating 5 mL of LB medium containing the appropriate antibiotic(s). The method was adapted from that of Holmes and Quigley [151]. After incubation with aeration at 37 °C for 16-24 hr, cells were pelleted by centrifugation (5000 rpm, Sorvall SS-34 rotor, 5 min). The supernatant was removed and the cell pellet was resuspended by vigorous vortexing in 100 µL of ice-cold Solution I (50 mM glucose, 25 mM Tris·HCl, pH 8.0, 10 mM EDTA). Freshly prepared Solution II (200 µL; 0.2 N NaOH, 1% SDS) was added to the suspension, which was mixed by inverting 5-6 times and stored on ice for 2-5 min. Ice-cold Solution III (150 µL; 60 mL 5 M KOAc, 11.5 mL glacial acetic acid, 28.5 mL ddH2O) was then added, and the tube was vortexed in an inverted position for 10 sec and returned to ice for 3-5 min. The lysed cell suspension was centrifuged for 5 min in a microcentrifuge at 13,000 rpm. The supernatant was transferred into a fresh tube and extracted with 400 µL of a 1:1 phenol:chloroform mixture. The solution was centrifuged at 13,000 rpm for 2 min. The top aqueous layer was transferred to another fresh tube. The DNA was precipitated by adding 2 volumes of ethanol and centrifuged at 4 °C for 5 min. The pelleted DNA was dried, redissolved in 50 µL of TE, and stored at -20 °C.

5.5.7 Mutagenesis

Site-directed mutagenesis was carried out with the QuikChange Site-Directed Mutagenesis Kit (Stratagene). Reactions were carried out strictly as directed by the manufacturer's protocol. Mutagenesis primers (35-40 nt) were synthesized by Integrated DNA Technologies (IDT). Pfu DNA polymerase was from Stratagene. Reactions were typically carried out in a 50 µL volume in thin-walled PCR tubes containing 5-50 ng of dsDNA vector template, 125 ng of primers, 0.5 mM dNTP mix, and 2.5 units of PfuTurbo DNA polymerase, with ddH2O added to bring the volume to 50 µL. Thermal cycling was performed as follows: 16-18 cycles of 94 °C for 30 s, 55 °C for 1 min, and 72 °C for 2 min per 1 kbp of plasmid. Plasmids typically contained about 6 kbp and therefore required 12 min at the 72 °C step. After cycling was complete, 10 units of Dpn I was added to the mutagenesis mixture, which was then incubated at 37 °C for 1 hr. Dpn I treatment was required to remove the parental (wild-type) plasmid. The reaction mixture was then transformed into E. coli as described previously.
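
The extension-time rule described in this section (2 min at 72 °C per kbp of plasmid) can be written as a small calculation. The sketch below is a minimal illustration, not part of the original protocol. The function names are hypothetical, and the total counts hold times only, ignoring temperature ramping and any initial or final steps.

```python
# Extension time scales with plasmid size: 2 min per kbp at 72 °C.
EXTENSION_MIN_PER_KBP = 2.0
DENATURE_MIN = 0.5  # 94 °C for 30 s
ANNEAL_MIN = 1.0    # 55 °C for 1 min


def extension_minutes(plasmid_kbp):
    """Minutes at 72 °C per cycle for a plasmid of the given size."""
    return EXTENSION_MIN_PER_KBP * plasmid_kbp


def total_cycling_minutes(plasmid_kbp, cycles):
    """Sum of hold times over all cycles (ramping is ignored)."""
    per_cycle = DENATURE_MIN + ANNEAL_MIN + extension_minutes(plasmid_kbp)
    return cycles * per_cycle


print(extension_minutes(6))  # 6 kbp plasmid -> 12.0 min per cycle
for cycles in (16, 18):
    print(cycles, total_cycling_minutes(6, cycles))
```

For a 6 kbp plasmid this gives 13.5 min of hold time per cycle, or roughly 3.6 to 4 hr for 16-18 cycles.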

5.5.8 Sequencing

DNA sequencing was performed by the dideoxy chain-termination method

derived from Sanger [152]. Sequencing was performed at the Plant Genomic Facility at

The Ohio State University.

APPENDIX: SUPPLEMENTARY SCHEMES, TABLES, AND FIGURES

[Scheme A1 (vector map diagram). Steps shown: pET-28a (kan; 6x His): Nde I/Xho I. pMAL-c2 (amp; malE): 1. PCR; 2. Nde I/Xho I. The products were joined with T4 DNA ligase to give pETMAL (kan; 6x His, malE).]

Scheme A1. Construction of the pETMAL vector.

[Scheme A2 (vector map diagram). Steps shown: pET-14b (amp; 6x His): 1. Eco RI; 2. Klenow; 3. Eco RV; 4. T4 ligase, giving pET-14∆RIV (amp). PinPoint Xa-1 (amp; BCD): 1. PCR; 2. Eco RI; 3. Klenow; 4. Nde I. pET-14∆RIV: 1. Bam HI; 2. Klenow; 3. Nde I. The products were joined with T4 ligase to give pET-PNPT (amp; 6x His, BCD).]

Scheme A2. Construction of the pET-PNPT vector.

[Scheme A3 (vector map diagram). Steps shown: pET-PNPT (amp; 6x His, BCD): QuikChange SDM, giving pET-PNPT Linker/MCS (amp; linker = poly Asn), then 1. Xho I/Not I; 2. Klenow. pETMAL (kan; 6x His, malE): 1. PCR; 2. Eco RV/Xho I; 3. Klenow. T4 ligase; product pPPTmal (amp; 6x His, BCD, linker).]

Scheme A3. Construction of the pPPTmal vector.

[Scheme A4 (vector map diagram). Labels shown: pET29-GFPuv (kan; no Bam HI or stop); 1. PCR; 2. Bgl II/Eco RI; Bam HI/Eco RI; T4 DNA ligase; SDM: 1. -Nde I, 2. S72A, 3. I167T, 4. F64L, S65T, V68L, 5. +Eco RV; Eco RV/Hind III; Linker/MCS (similar to pPPTmal scheme); T4 ligase; product pGFPmal (kan; 6x His, GFP, linker).]

Scheme A4. Construction of the pGFPmal vector.

Figure A1. Coomassie-stained 4-15% SDS-PAGE gel (BioRad) of purified SHP-2.

The indicated molecular weights are listed in units of kDa. Lane X is the molecular weight marker (BioRad), lane 1 contains ~2.5 µg, and lane 3 contains ~6 µg of total protein.

MApYAVI TRpYSII MKpYASI YEpYSMI VRpYSFV IEpYAII TYpYSII LLpYAEI YHpYSVI VIpYSRV ILpYAII TVpYSNI LNpYALI HYpYSMI TMpYSAV IKpYAKI TFpYSSI LNpYALI QMpYSTI TRpYSIV IIpYAQI TNpYSVI LSpYALI LVpYTEI TYpYSLV IQpYASI AYpYSTI IApYALI TMpYTDI TFpYSPV IRpYASI VYpYTAI IGpYARI TMpYTGI TMpYSSV IVpYASI TIpYTMI VNpYADI TNpYTHI TNpYSYV IYpYASI TApYTQI VRpYADI TQpYTYI SYpYSTV IEpYAVI TYpYTQI VKpYAFI SVpYTTI YSpYSEV INpYAYI TFpYTRI VMpYAMI YIpYTLI VIpYTHV VQpYAFI TIpYTSI VNpYAVI YEpYTQI TFpYTDV VTpYAII TTpYTSI TSpYAAI YYpYTTI YTpYTRV VApYALI TSpYTTI TFpYAFI HFpYTQI YNpYTYV VEpYALI YFpYTTI TApYAII MDpYVQI TEpYVTV VFpYANI HFpYTTI TFpYAKI LHpYVEI YEpYVFV VGpYAQI HYpYTTI TEpYAVI LHpYVSI YMpYTPT VMpYAQI VSpYVQI TKpYAYI ITpYVSI VQpYAAL VApYAVI THpYVII SMpYAQI TDpYVII VSpYARL VHpYAVI TNpYVQI AEpYAEI THpYVQI YYpYAHL VLpYAVI TDpYVVI AMpYAFI PWpYVTI YSpYAIL TTpYANI TEpYVVI AQpYAVI AQpYVHI NRpYAIL TSpYAQI LFpYATV GLpYAHI ATpYVVI VLpYSQL TVpYATI IApYAIV WHpYAEI YNpYVLI IHpYTTL TKpYAVI VSpYATV WNpYAYI YEpYVMI TIpYTML SGpYAII TMpYAFV YApYADI NQpYVQI TIpYTQL SVpYANI IIpYSQV FHpYAQI DApYVSI TMpYTTL SYpYARI IQpYSTV FLpYAVI EApYVAI TRpYTVL SNpYAVI IMpYSVV HSpYAQI RApYVVI TLpYVTL SWpYAVI VTpYSRV QPpYATI YNpYLTI YTpYVYL PFpYAII TFpYSVV NVpYAAI LLpYATV VTpYATM PYpYAII VVpYTAV NYpYADI VIpYADV ITpYTMM AMpYAII TVpYTQV NMpYAFI VLpYAHV VMpYTAM GHpYAII IKpYAIL DKpYAII VYpYAIV TQpYVYF GQpYATI VTpYAQL DGpYAKI VIpYALV TKpYIYF WNpYAII VIpYATL MVpYSQI THpYAQV IVpYMTF YFpYAHI TKpYALL IYpYSPI TRpYAVV IHpYMYF YQpYAII YSpYARL IEpYSTI TRpYAVV SYpYVYY YQpYAKI TApYSIL IApYSYI SMpYAVV YNpYVFY YLpYATI TMpYTTL IEpYSYI AMpYAIV YApYVYY YEpYAVI ITpYTTM TYpYSII YRpYAQV IHpYLTY YHpYAVI VMpYTQM TYpYSQI NYpYATV YRpYMTY YLpYAVI TYpYAWM TPpYSVI IVpYSEV YDpYMYY VIpYSMI TVpYLTY TYpYSVI IFpYSQV IVpYITA VFpYSQI MEpYAEI SFpYSMI IIpYSTV PYpYIFA VVpYSTI MMpYAEI ANpYSTI VIpYSDV TVpYLYA YApYSEI

Table A1. Additional sequences selected against 50 nM SHP-2 C-SH2 domain. Boldface indicates sequences derived from intensely colored beads. Normal typeface sequences are from medium-colored beads.

Class I Class II Class V IHpYVEI TLpYFTL WMpYYIQ WMpYRTV ISpYALF IVpYALI LQpYIIL WMpYYIR WTpYSTV RMpYKLF IYpYATI LHpYLVL WMpYTLE WMpYQYV LSpYLVF IIpYAIL ISpYMVL WVpYYTT WTpYVIY LTpYMSF AYpYAVI IRpYTIL WTpYTLY WMpYYQY LQpYMVF IIpYAAI LHpYTEL WTpYQIM WNpYMVY TApYMVF IYpYADI VVpYTIL WSpYKIY WMpYNYY VNpYMYF IQpYADI IWpYVAL WMpYRGA YVpYYIA MHpYVVF IVpYAII LNpYVEL WIpYQIA YIpYYID YWpYLTF IFpYATI VNpYVIL WVpYTIA RMpYYYH IRpYAYI LYpYAEM WMpYSIA Class III ILpYHIY LRpYAHI LRpYASM WVpYYLA IMpYTYA ITpYITY LLpYAII TLpYLVM WSpYQLA VMpYLYS IVpYLTY LNpYAKI ITpYMAM WMpYNLA LMpYMTS LSpYMYY LNpYAMI ITpYRIM WMpYYID MVpYMYS ITpYTYY LKpYATI ITpYSYM WMpYYID TMpYMYS LHpYTTY MVpYAEI ITpYILN WMpYHMD PMpYLYT LKpYVYY MRpYAEI IMpYQIN WTpYQIE QMpYMYT LTpYYYY VFpYAQI ITpYVIN WMpYKTE IHpYLYP YTpYLIY VYpYAQI ITpYANT WSpYTMF FMpYTYP WMpYYGY VRpYAVI ITpYITT WMpYFIG ILpYFQI LHpYMET WVpYTIH Class IV Other IQpYIAI VIpYMST WMpYRLI ILpYFFP LTpYRIV ITpYIDI IWpYATV WSpYYLI LYpYFSP MVpYIVA INpYINI ISpYAVV WTpYTTI VLpYFAP MNpYIYA LHpYLVI TLpYAIV WMpYRTI VVpYFIP VWpYIVA VQpYLII VIpYAEV WTpYTTI LYpYMPP ITpYYIE LHpYSTI IKpYISV WTpYSVK LRpYMVP LVpYTAI LApYIQV WSpYELL MMpYMTP LTpYTLI LRpYLQV WMpYTSL QMpYMIP LVpYTQI IEpYTAV WTpYHIM QWpYIVP VQpYTEI IQpYTHV WSpYENN LMpYLSP VKpYTEI IFpYTLV WTpYSTR AVpYFIA VMpYTQI ITpYTLV WTpYSYR VWpYMVA VFpYTQI VVpYTEV WIpYTMS VMpYTSI VKpYTPV WMpYNTS VVpYTVI LRpYVFV WTpYTIT LHpYVLI VNpYVEV WVpYTIT LQpYVQI VVpYVQV WMpYQIT TRpYVEI VTpYVQV WMpYFTT VNpYVEI YQpYAII WSpYYTT IMpYYGI YPpYAMI WVpYYTT FPpYAVL YLpYTQI WVpYRYT LMpYANL YApYAQI WSpYTIV YFpYTSI WTpYSLV

Table A2. Additional sequences selected against 50 nM SHP-2 N-SH2 domain.

AVpYSLL QLpYTYM MGpYYFM DGpYSLL VVpYTYI QApYYFL YFpYSLV ARpYVLL FRpYYFI SYpYSII YApYVML GGpYYFI FQpYSVM MGpYVYM DGpYYFV AMpYSVM KGpYVYM MYpYYYM PYpYSFA PFpYYLL MQpYYYM WYpYSFI QGpYYLL QApYYYI PTpYSFL WApYYLV ATpYYYI LNpYTLL GTpYYLV YLpYYYI VLpYTLL YFpYYLA PMpYYYI SSpYTLL LTpYYMM QVpYYYV ANpYTLM PQpYYMI FYpYFYA NLpYTLM PFpYYMI YTpYFYV PFpYTML KGpYYVM TVpYALM

Table A3. Additional sequences from lightly colored beads selected against 10 nM SHIP

SH2 domain.

AHPV IGNW KTRI REPR HHWQ MSSV RITV PYTW

Table A4. Peptide sequences from beads selected by BIR3 domain fluorimetric screening.

[Figure A2 (histogram panels). One panel per position relative to pY (-2, -1, +1, +2, +3). x-axis: amino acid (D E N Q H K R W F Y M L I V T S A G P); y-axis: occurrence (0-40).]

Figure A2. Composite histogram of sequences selected by SHP-2 N-SH2 domain.
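
Per-position histograms of this kind can be tallied directly from sequences written in the XXpYXXX notation used in Tables A1-A3. The following is a minimal sketch, not the procedure used in this work. The function names are hypothetical, and the five sample sequences are taken from Table A2 purely for illustration.

```python
from collections import Counter

POSITIONS = (-2, -1, 1, 2, 3)  # positions relative to pY


def split_around_py(seq):
    """Map e.g. 'IHpYVEI' to {-2: 'I', -1: 'H', 1: 'V', 2: 'E', 3: 'I'}."""
    n_term, sep, c_term = seq.partition("pY")
    if sep != "pY" or len(n_term) != 2 or len(c_term) != 3:
        raise ValueError(f"unexpected sequence format: {seq!r}")
    return dict(zip(POSITIONS, n_term + c_term))


def tally(sequences):
    """Count how often each residue occurs at each position."""
    counts = {pos: Counter() for pos in POSITIONS}
    for seq in sequences:
        for pos, residue in split_around_py(seq).items():
            counts[pos][residue] += 1
    return counts


sample = ["IHpYVEI", "TLpYFTL", "WMpYYIQ", "IYpYATI", "LHpYLVL"]
counts = tally(sample)
for pos in POSITIONS:
    print(f"{pos:+d}", counts[pos].most_common())
```

Each `Counter` holds the occurrence values for one panel, which could then be plotted against the amino acid axis.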

[Figure A3 (histogram panels A and B). For each, one panel per position relative to pY (-2, -1, +1, +2, +3). x-axis: amino acid (D E N Q H K R W F Y M L I V T S A G P); y-axis: occurrence (0-20).]

Figure A3. Additional histograms of less-represented ligands selected by the N-SH2 domain of SHP-2. (A) Class III and (B) Class IV ligands.

BIBLIOGRAPHY

1. Merrifield, R.B. (1963). Solid phase peptide synthesis. I. The synthesis of a

tetrapeptide. J. Am. Chem. Soc. 85, 2149-2154.

2. Letsinger, R.L., and Mahadevan, V. (1965). Oligonucleotide synthesis on a

polymer support. J. Am. Chem. Soc. 87, 3526-3527.

3. Frechet, J.M., and Schuerch, C. (1971). Solid-phase synthesis of oligosaccharides.

I. Preparation of the solid support. J. Am. Chem. Soc. 93, 492-496.

4. Lam, K.S., et al. (1991). A new type of synthetic peptide library for identifying

ligand-binding activity. Nature 354, 82-84.

5. Houghten, R.A., et al. (1991). Generation and use of synthetic peptide

combinatorial libraries for basic research and drug discovery. Nature 354, 84-86.

6. Furka, A., Sebestyen, F., Asgedom, M., and Dibo, G. (1991). General method for

rapid synthesis of multicomponent peptide mixtures. Int. J. Peptide Protein Res.

37, 487-493.

7. Merrifield, R.B. (1964). Solid phase peptide synthesis. II. The synthesis of

Bradykinin. J. Am. Chem. Soc. 86, 304-305.

8. Merrifield, R.B. (1964). Solid-phase peptide synthesis. III. An improved synthesis

of Bradykinin. Biochemistry 3, 1385-1389.

9. Lenard, J., and Robinson, A.B. (1967). Use of hydrogen fluoride in Merrifield

solid-phase peptide synthesis. J. Am. Chem. Soc. 89, 181-182.

10. Carpino, L.A., and Han, G.Y. (1972). 9-Fluorenylmethoxycarbonyl amine-protecting

group. J. Org. Chem. 37, 3404-3409.

11. Carpino, L.A., et al. (2002). The uronium/guanidinium peptide coupling reagents:

finally the true uronium salts. Angew. Chem. Int. Ed. 41, 441-445.

12. Albericio, F. (2004). Developments in peptide and amide synthesis. Curr. Opin.

Chem. Biol. 8, 211-221.

13. betterresin1.

14. betterresin2.

15. Scott, J.K., and Smith, G.P. (1990). Searching for peptide ligands with an epitope

library. Science 249, 386-390.

16. Geysen, H.M., Meloen, R.H., and Barteling, S.J. (1984). Use of peptide synthesis

to probe viral antigens for epitopes to a resolution of a single amino acid. Proc.

Natl. Acad. Sci. USA 81, 3998-4002.

17. Mattheakis, L.C., Bhatt, R.R., and Dower, W.J. (1994). An in vitro polysome

display system for identifying ligands from very large peptide libraries. Proc.

Natl. Acad. Sci. USA 91, 9022-9026.

18. Roberts, R.W., and Szostak, J.W. (1997). RNA-peptide fusions for the in vitro

selection of peptides and proteins. Proc. Natl. Acad. Sci. USA 94, 12297-12302.

19. Nemoto, N., Miyamoto-Sato, E., Husimi, Y., and Yanagawa, H. (1997). In vitro

virus: Bonding of mRNA bearing puromycin at the 3’-terminal end to the C-

terminal end of its encoded protein on the ribosome in vitro. FEBS Lett. 414, 405-

408.

20. Merryman, C., Weinstein, E., Wnuk, S.F., and Bartel, D.P. (2002). A bifunctional

tRNA for in vitro selection. Chem. Biol. 9, 741-746.

21. Liu, R., Barrick, J.E., Szostak, J.W., and Roberts, R.W. (2000). Optimized

synthesis of RNA-protein fusions for in vitro protein selection. Methods Enzymol.

318, 268-293.

22. Liu, R., Marik, J., and Lam, K.S. (2003). Design, synthesis, screening, and

decoding of encoded one-bead one-compound peptidomimetic and small

molecule libraries. Methods Enzymol. 369, 271-287.

23. Smith, G.P., and Petrenko, V.A. (1997). Phage Display. Chem. Rev. 97, 391-410.

24. Kurz, M., Kuang, G., and Lohse, P.A. (2000). An efficient synthetic strategy for

the preparation of nucleic acid-peptide and protein libraries for in vitro evolution

protocols. Molecules 5, 1259-1264.

25. Matthews, D.J., and Wells, J.A. (1993). Substrate phage: selection of protease

substrates by monovalent phage display. Science 260, 1113-1117.

26. Nixon, A.E. (2002). Phage display as a tool for protease ligand discovery. Curr.

Pharm. Biotechnol. 3, 1-12.

27. Zwick, M.B., Shen, J., and Scott, J.K. (1998). Phage-displayed peptide libraries.

Curr. Opin. Biotechnol. 9, 427-436.

28. Bjorklund, M., and Koivunen, E. (2004). Steps towards phage display libraries

with an extended amino acid repertoire. Lett. Drug Design Dis. 1, 163-167.

29. Li, S., Millward, S., and Roberts, R. (2002). In vitro selection of mRNA display

libraries containing an unnatural amino acid. J. Am. Chem. Soc. 124, 9972-9973.

30. Noren, C.J., Anthony-Cahill, S.J., Griffith, M.C., and Schultz, P.G. (1989). A

general method for the site-specific incorporation of unnatural amino acids into

protein. Science 244, 182-188.

31. Bain, J.D., Diala, E.S., Glabe, C.G., Dix, T.A., and Chamberlin, A.R. (1989).

Biosynthetic site-specific incorporation of a non-natural amino acid into a

polypeptide. J. Am. Chem. Soc. 111, 8013-8014.

32. Frankel, A., Li, S., Starck, S.R., and Roberts, R.W. (2003). Unnatural RNA

display libraries. Curr. Opin. Struct. Biol. 13, 506-512.

33. Tian, F., Tsao, M.-L., and Schultz, P.G. (2004). A phage display system with

unnatural amino acids. J. Am. Chem. Soc. 126, 15962-15963.

34. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997). Identification

of prokaryotic and eukaryotic signal peptides and prediction of their cleavage

sites. Protein Eng. 10, 1-6.

35. Franklin, M.C., et al. (2003). Structure and function analysis of peptide

antagonists of melanoma inhibitor of apoptosis (ML-IAP). Biochemistry 42,

8223-8231.

36. Peters, E.A., Schatz, P.J., Johnson, S.S., and Dower, W.J. (1994). Membrane

insertion defects caused by positive charges in the early mature region of protein

pIII of filamentous phage fd can be corrected by prlA suppressors. J. Bacteriol.

176, 4296-4305.

37. Brenner, S., and Lerner, R.A. (1992). Encoded combinatorial chemistry. Proc.

Natl. Acad. Sci. USA 89, 5381-5383.

38. Needels, M.C., et al. (1993). Generation and screening of an oligonucleotide-

encoded synthetic peptide library. Proc. Natl. Acad. Sci. USA 90, 10700-10704.

39. Seneci, P. (2001). Direct deconvolution techniques for pool libraries of small

organic molecules. J. Rec. Sig. Trans. Res. 21, 377-408.

40. Seneci, P. (2001). Encoding techniques for pool libraries of small organic

molecules. J. Rec. Sig. Trans. Res. 21, 409-445.

41. Mitsopoulos, G., Walsh, D.P., and Chang, Y.T. (2004). Tagged library approach

to chemical genomics and proteomics. Curr. Opin. Chem. Biol. 8, 26-32.

42. Wang, P., Fu, H., Snavely, D.F., Freitas, M.A., and Pei, D. (2002). Screening

combinatorial libraries by mass spectrometry. 2. Identification of optimal

substrates of protein tyrosine phosphatase SHP-1. Biochemistry 41, 6202-6210.

43. Sadowski, I., Stone, J.C., and Pawson, T. (1986). A noncatalytic domain

conserved among cytoplasmic protein-tyrosine kinases modifies the kinase

function and transforming activity of Fujinami sarcoma virus p130gag-fps. Mol.

Cell Biol. 6, 4396-4408.

44. DeClue, J.E., Sadowski, I., Martin, G.S., and Pawson, T. (1987). A conserved

domain regulates interactions of the v-fps protein-tyrosine kinase with the host

cell. Proc. Natl. Acad. Sci. USA 84, 9064-9068.

45. Moran, M.F., Koch, C.A., Anderson, D., Ellis, C., England, L., Martin, G.S., and

Pawson, T. (1990). Src homology region 2 domains direct protein-protein

interactions in signal transduction. Proc. Natl. Acad. Sci. USA 87, 8622-8626.

46. Pawson, T., Raina, M., and Nash, P. (2002). Interaction domains: from simple

binding events to complex cellular behavior. FEBS Lett. 513, 2-10.

47. Songyang, Z., et al. (1993). SH2 domains recognize specific phosphopeptide

sequences. Cell 72, 767-778.

48. Edman, P. (1950). Method for determination of the amino acid sequence in

peptides. Acta Chem. Scand. 4, 283-293.

49. James, P. (1997). Protein identification in the post-genome era: the rapid rise of

proteomics. Quart. Rev. Biophys. 30, 279-331.

50. Chait, B.T., Wang, R., Beavis, R.C., and Kent, S.B.H. (1993). Protein ladder

sequencing. Science 262, 89-92.

51. Youngquist, R.S., Fuentes, G.R., Lacey, M.P., and Keough, T. (1995). Generation

and screening of combinatorial peptide libraries designed for rapid sequencing by

mass spectrometry. J. Am. Chem. Soc. 117, 3900-3906.

52. Wang, P., Arabaci, G., and Pei, D. (2001). Rapid sequencing of library-derived

peptides by partial Edman degradation and mass spectrometry. J. Comb. Chem. 3,

251-254.

53. Tadjamulia, M.L., Srivastava, P.C., and Knapp, F.F. (1985). Evaluation of the

brain-specific delivery of radioiodinated (iodophenyl)alkyl-substituted amines

coupled to a dihydropyridine carrier. J. Med. Chem. 28, 1574-1580.

54. Rajagopalan, P.T.R., Grimme, S., and Pei, D. (2000). Characterization of

cobalt(II)-substituted peptide deformylase: function of the metal ion and the

catalytic residue Glu-133. Biochemistry 39, 779-790.

55. Pham, V., Tropea, J., Wong, S., Quach, J., and Henzel, W.J. (2003). High-

throughput protein sequencing. Anal. Chem. 75, 875-882.

56. Mahoney, W.C. (1985). An amino-terminal tryptophan derivative which is

refractory to Edman degradation. Anal. Biochem. 147, 331-335.

57. Bienvenut, W.V., et al. (2002). Matrix-assisted laser desorption/ionization-

tandem mass spectrometry with high resolution and sensitivity for identification

and characterization of proteins. Proteomics 2, 868-876.

58. Pawson, T., and Nash, P. (2003). Assembly of cell regulatory systems through

protein interaction domains. Science 300, 445-452.

59. Bork, P., Schultz, J., and Ponting, C.P. (1997). Cytoplasmic signaling domains:

the next generation. Trends Biochem. Sci. 22, 296-298.

60. De Souza, D., et al. (2002). SH2 domains from suppressor of cytokine signaling-3

and protein tyrosine phosphatase SHP-2 have similar binding specificities.

Biochemistry 41, 9229-9236.

61. Muller, K., et al. (1996). Rapid identification of phosphopeptide ligands for SH2

domains: screening of peptide libraries by fluorescence-activated bead sorting. J.

Biol. Chem. 271, 16500-16505.

62. Rickles, R.J., et al. (1994). Identification of Src, Fyn, Lyn, PI3K, and Abl SH3

domain ligands using phage display libraries. EMBO J. 13, 5598-5604.

63. Gram, H., Schmitz, R., Zuber, J.F., and Baumann, G. (1997). Identification of

phosphopeptide ligands for the Src-homology 2 (SH2) domain of Grb2 by phage

display. Eur. J. Biochem. 246, 633-637.

64. King, T.R., Fang, Y., Mahon, E.S., and Anderson, D.H. (2000). Using a phage

display library to identify basic residues in A-Raf required to mediate binding to

the Src homology 2 domains of the p85 subunit of phosphatidylinositol 3’-kinase.

J. Biol. Chem. 275, 36450-36456.

65. Sibler, A.-P., Kempf, E., Glacet, A., Orfanoudakis, G., Bourel, D., and Weiss, E.

(1999). In vivo biotinylated recombinant antibodies: high efficiency of labelling

and application to the cloning of active anti-human IgG1 Fab fragments. J.

Immunol. Methods 224, 129-140.

66. Sugimoto, S., Lechleider, R.J., Shoelson, S.E., Neel, B.G., and Walsh, C.T.

(1993). Expression, purification, and characterization of SH2-containing protein

tyrosine phosphatase, SH-PTP2. J. Biol. Chem. 268, 22771-22776.

67. Freeman, R.M., Jr., Plutzky, J., and Neel, B.G. (1992). Identification of a human

src homology 2-containing protein-tyrosine phosphatase: putative homologue of

Drosophila Corkscrew. Proc. Natl. Acad. Sci. USA 89, 11239-11243.

68. Tridandapani, S., et al. (1997). Recruitment and phosphorylation of SH2-

containing inositol phosphatase and Shc to the B-cell Fcγ immunoreceptor

tyrosine-based inhibitory motif peptide motif. Mol. Cell Biol. 17, 4305-4311.

69. Pei, D., Lorenz, U., Klingmuller, U., Neel, B.G., and Walsh, C.T. (1994).

Intramolecular regulation of protein tyrosine phosphatase SH-PTP1: a new

function for Src Homology 2 domains. Biochemistry 33, 15483-15493.

70. Pei, D., Wang, J., and Walsh, C.T. (1996). Differential functions of the two Src

homology 2 domains in protein tyrosine phosphatase SH-PTP1. Proc. Natl. Acad.

Sci. USA 93, 1141-1145.

71. Sweeney, M.C., and Pei, D. (2003). An improved method for rapid sequencing of

support-bound peptides by partial Edman degradation and mass spectrometry. J.

Comb. Chem. 5, 218-222.

72. Neel, B.G., Gu, H., and Pao, L. (2003). The ‘Shp’ing news: SH2 domain-

containing tyrosine phosphatases in cell signaling. Trends Biochem. Sci. 28, 284-

293.

73. Ravetch, J.V., and Lanier, L.L. (2000). Immune inhibitory receptors. Science 290,

84-89.

74. Waksman, G., et al. (1992). Crystal structure of the phosphotyrosine recognition

domain SH2 of v-src complexed with tyrosine-phosphorylated peptides. Nature

358, 646-653.

75. Eck, M.J., Atwell, S.K., Shoelson, S.E., and Harrison, S.C. (1994). Structure of

the Regulatory Domains of the Src-Family Tyrosine Kinase Lck. Nature 368,

764-769.

76. Lee, C.-H., et al. (1994). Crystal structures of peptide complexes of the amino-

terminal SH2 domain of the Syp tyrosine phosphatase. Structure 2, 423-438.

77. Burshtyn, D.N., Yang, W., Yi, T., and Long, E.O. (1997). A novel

phosphotyrosine motif with a critical amino acid at position -2 for the SH2

domain-mediated activation of tyrosine phosphatase SHP-1. J. Biol. Chem. 272,

13066-13072.

78. Beebe, K.D., Wang, P., Arabaci, G., and Pei, D. (2000). Determination of binding

specificity of the SH2 domains of protein tyrosine phosphatase SHP-1 through the

screening of a combinatorial phosphotyrosyl peptide library. Biochemistry 39,

13251-13260.

79. Liao, H., et al. (2000). Structure of the FHA1 domain of yeast Rad53 and

identification of binding sites for both FHA1 and its target protein Rad9. J. Mol.

Biol. 304, 941-951.

80. Bruhns, P., et al. (2000). Molecular basis of the recruitment of the SH2 domain-

containing inositol 5-phosphatases SHIP1 and SHIP2 by FcγRIIB. J. Biol. Chem.

275, 37357-37364.

81. Tangye, S.G., et al. (1999). Cutting edge: human 2B4, an activating NK cell

receptor, recruits the protein tyrosine phosphatase SHP-2 and the adaptor

signaling protein SAP. J. Immunol. 162, 6981-6985.

82. Angata, T., et al. (2002). Cloning and characterization of human Siglec-11. J.

Biol. Chem. 277, 24466-24474.

83. Huber, M., et al. (1999). The carboxyl-terminal region of biliary glycoprotein

controls its tyrosine phosphorylation and association with protein-tyrosine

phosphatases SHP-1 and SHP-2 in epithelial cells. J. Biol. Chem. 274, 335-344.

84. Gavrieli, M., Watanabe, N., Loftin, S.K., Murphy, T.L., and Murphy, K.M.

(2003). Characterization of phosphotyrosine binding motifs in the cytoplasmic

domain of B and T lymphocyte attenuator required for association with protein

tyrosine phosphatases SHP-1 and SHP-2. Biochem. Biophys. Res. Commun. 312,

1236-1243.

85. De Vet, E.C.J.M., Aguado, B., and Campbell, R.D. (2001). G6b, a novel

immunoglobulin superfamily member encoded in the human major

histocompatibility complex, interacts with SHP-1 and SHP-2. J. Biol. Chem. 276,

42070-42076.

86. Sui, L., et al. (2004). IgSF13, a novel human inhibitory receptor of the

immunoglobulin superfamily, is preferentially expressed in dendritic cells and

monocytes. Biochem. Biophys. Res. Commun. 319, 920-928.

87. Alvarez-Errico, D., et al. (2004). IREM-1 is a novel inhibitory receptor expressed

by myeloid cells. Eur. J. Immunol. 34, 3690-3701.

88. Fanger, N.A., et al. (1998). The MHC class I binding proteins LIR-1 and LIR-2

inhibit Fc receptor-mediated signaling in monocytes. Eur. J. Immunol. 28, 3423-

3434.

89. Cella, M., et al. (1997). A novel inhibitory receptor (ILT3) expressed on

monocytes, macrophages, and dendritic cells involved in antigen processing. J.

Exp. Med. 185, 1743-1751.

90. Cantoni, C., et al. (1999). Molecular and functional characterization of IRp60, a

member of the immunoglobulin superfamily that functions as an inhibitory

receptor in human NK cells. Eur. J. Immunol. 29, 3148-3159.

91. Burshtyn, D.N., et al. (1996). Recruitment of tyrosine phosphatase HCP by the

killer cell inhibitory receptor. Immunity 4, 77-85.

92. Olcese, L., et al. (1996). Human and mouse killer-cell inhibitory receptors recruit

PTP1C and PTP1D protein tyrosine phosphatases. J. Immunol. 156, 4531-4534.

93. Fry, A.M., Lanier, L.L., and Weiss, A. (1996). Phosphotyrosines in the killer cell

inhibitory receptor motif of NKB1 are required for negative signaling and for

association with protein tyrosine phosphatase 1C. J. Exp. Med. 184, 295-300.

94. Lewis, J., et al. (2001). Distinct interactions of the X-linked lymphoproliferative

syndrome product SAP with cytoplasmic domains of members of the CD2

receptor family. Clin. Immunol. 100, 15-23.

95. Xu, M.J., Zhao, R., and Zhao, Z.J. (2000). Identification and characterization of

leukocyte-associated Ig-like receptor-1 as a major anchor protein of tyrosine

phosphatase SHP-1 in hematopoietic cells. J. Biol. Chem. 275, 17440-17446.

96. Carretero, M., et al. (1998). Specific engagement of the CD94/NKG2-A killer

inhibitory receptor by the HLA-E class Ib molecule induces SHP-1 phosphatase recruitment to tyrosine-phosphorylated NKG2-A: evidence for receptor function

in heterologous transfectants. Eur. J. Immunol. 28, 1280-1291.

97. Bottino, C., et al. (2001). NTB-A, a novel SH2D1A-associated surface molecule

contributing to the inability of natural killer cells to kill Epstein-Barr virus

infected B cells in X-linked lymphoproliferative disease. J. Exp. Med. 194, 235-

246.

98. Mousseau, D.D., Banville, D., L'Abbe, D., Bouchard, P., and Shen, S.H. (2000).

PILRα, a novel immunoreceptor tyrosine-based inhibitory motif-bearing protein,

recruits SHP-1 upon tyrosine phosphorylation and is paired with the truncated

counterpart PILRβ. J. Biol. Chem. 275, 4467-4474.

99. Fournier, N., et al. (2000). FDF03, a novel inhibitory receptor of the

immunoglobulin superfamily, is expressed by human dendritic and myeloid cells.

J. Immunol. 165, 1197-1209.

100. Wong, M.X., and Jackson, D.E. (2004). Regulation of B cell activation by

PECAM-1: Implications for the development of autoimmune disorders. Curr.

Pharm. Design 10, 155-161.

101. Zhao, Z.J., and Zhao, R. (1998). Purification and cloning of PZR, a binding

protein and putative physiological substrate of tyrosine phosphatase SHP-2. J.

Biol. Chem. 273, 29367-29372.

102. Zhao, R., and Zhao, Z.J. (2000). Dissecting the interaction of SHP-2 with PZR, an

immunoglobulin family protein containing immunoreceptor tyrosine-based

inhibitory motifs. J. Biol. Chem. 275, 5453-5459.

103. Xu, M.J., Zhao, R., and Zhao, Z.J. (2001). Molecular cloning and characterization

of SPAP1, an inhibitory receptor. Biochem. Biophys. Res. Commun. 280.

104. Doody, G.M., et al. (1995). A role in B cell activation for CD22 and the protein

tyrosine phosphatase SHP. Science 269, 242-244.

105. Law, C.-L., et al. (1996). CD22 associates with protein tyrosine phosphatase 1C,

Syk, and Phospholipase C-γ1 upon B cell activation. J. Exp. Med. 183, 547-560.

106. Taylor, V.C., et al. (1999). The myeloid-specific sialic acid-binding receptor,

CD33, associates with the protein-tyrosine phosphatases, SHP-1 and SHP-2. J.

Biol. Chem. 274, 11505-11512.

107. Ikehara, Y., Ikehara, S.K., and Paulson, J.C. (2004). Negative regulation of T Cell

receptor signaling by Siglec-7 (p70/AIRM) and Siglec-9. J. Biol. Chem. 279,

43117-43125.

108. Yu, Z., Lai, C.M., Maoui, M., Banville, D., and Shen, S.H. (2001). Identification

and characterization of S2V, a novel putative siglec that contains two V set Ig-like

domains and recruits protein-tyrosine phosphatase SHPs. J. Biol. Chem. 276,

23816-23824.

109. Fujioka, Y., et al. (1996). A novel membrane glycoprotein, SHPS-1, that binds

the SH2-domain-containing protein tyrosine phosphatase SHP-2 in response to

mitogens and cell adhesion. Mol. Cell Biol. 16, 6887-6899.

110. Veillette, A., Thibaudeau, E., and Latour, S. (1998). High expression of inhibitory

receptor SHPS-1 and its association with protein-tyrosine phosphatase SHP-1 in

macrophages. J. Biol. Chem. 273, 22719-22728.

111. Florio, T., et al. (2000). Somatostatin receptor 1 (SSTR-1)-mediated inhibition of

cell proliferation correlates with the activation of the MAP kinase cascade: role of

the phosphatase SHP-2. J. Physiol. 94, 239-250.

112. Okazaki, T., Maeda, A., Nishimura, H., Kurosaki, T., and Honjo, T. (2001). PD-1

immunoreceptor inhibits B cell receptor-mediated signaling by recruiting src

homology 2-domain-containing tyrosine phosphatase 2 to phosphotyrosine. Proc.

Natl. Acad. Sci. USA 98, 13866-13871.

113. Daigle, I., Yousefi, S., Colonna, M., Green, D.R., and Simon, H.-U. (2002). Death

receptors bind SHP-1 and block cytokine-induced anti-apoptotic signaling in

neutrophils. Nature Med. 8, 61-67.

114. Li, C., and Friedman, J.M. (1999). Leptin receptor activation of SH2 domain

containing protein tyrosine phosphatase 2 modulates Ob receptor signal

transduction. Proc. Natl. Acad. Sci. USA 96, 9677-9682.

115. Myers, M.G., Jr., et al. (1998). The COOH-terminal tyrosine phosphorylation

sites on IRS-1 bind SHP-2 and negatively regulate insulin signaling. J. Biol.

Chem. 273, 26908-26914.

116. Kitzig, F., Martinez-Barriocanal, A., Lopez-Botet, M., and Sayos, J. (2002).

Cloning of two new splice variants of Siglec-10 and mapping of the interaction

between Siglec-10 and SHP-1. Biochem. Biophys. Res. Commun. 296, 355-362.

117. Kiener, P.A., et al. (1997). Co-ligation of the antigen and Fc receptors gives rise

to the selective modulation of intracellular signaling in B cells. J. Biol. Chem.

272, 3838-3844.

118. Luque, L.E., Grape, K.P., and Junker, M. (2002). A highly conserved arginine is

critical for the functional folding of inhibitor of apoptosis (IAP) BIR domains.

Biochemistry 41, 13663-13671.

119. Reed, J.C., et al. (2003). Comparative analysis of apoptosis and inflammation

genes of mice and humans. Genome Res. 13, 1376-1388.

120. Liston, P., Young, S.S., Mackenzie, A.E., and Korneluk, R.G. (1997). Life and

death decisions: the role of the IAPs in modulating programmed cell death.

Apoptosis 2, 423-441.

121. Huang, Y., et al. (2001). Structural basis of caspase inhibition by XIAP:

Differential roles of the linker versus the BIR domain. Cell 104, 781-790.

122. Renatus, M., Stennicke, H.R., Scott, F.L., Liddington, R.C., and Salvesen, G.S.

(2001). Dimer formation drives the activation of the cell death protease caspase 9.

Proc. Natl. Acad. Sci. USA 98, 14250-14255.

123. Shiozaki, E.N., et al. (2003). Mechanism of XIAP-mediated inhibition of caspase-

9. Mol. Cell 11, 519-527.

124. Chai, J., et al. (2001). Structural basis of caspase-7 inhibition by XIAP. Cell 104,

769-780.

125. Riedl, S.J., et al. (2001). Structural basis for the inhibition of caspase-3 by XIAP.

Cell 104, 791-800.

126. Martins, L.M., et al. (2002). The serine protease Omi/HtrA2 regulates apoptosis

by binding XIAP through a Reaper-like motif. J. Biol. Chem. 277, 439-444.

127. LaCasse, E.C., Baird, S., Korneluk, R.G., and MacKenzie, A.E. (1998). The

inhibitors of apoptosis (IAPs) and their emerging role in cancer. Oncogene 17,

3247-3259.

128. Huang, Q., et al. (2000). Evolutionary conservation of apoptosis mechanisms:

lepidopteran and baculoviral inhibitor of apoptosis proteins are inhibitors of

mammalian caspase-9. Proc. Natl. Acad. Sci. USA 97, 1427-1432.

129. Hawkins, C.J., Ekert, P.G., Uren, A.G., Holmgren, S.P., and Vaux, D.L. (1998).

Anti-apoptotic potential of insect cellular and viral IAPs in mammalian cells. Cell

Death Differ. 5, 569-576.

130. Hay, B.A., Wassarman, D.A., and Rubin, G.M. (1995). Drosophila homologs of

baculovirus inhibitor of apoptosis proteins function to block cell death. Cell 83,

1253-1262.

131. Yang, Y., Fang, S., Jensen, J.P., Weissman, A.M., and Ashwell, J.D. (2000). Ubiquitin

protein ligase activity of IAPs and their degradation in proteasomes in response to

apoptotic stimuli. Science 288, 874-877.

132. Scott, F.L., et al. (2005). XIAP inhibits caspase-3 and -7 using two binding sites:

evolutionarily conserved mechanism of IAPs. EMBO J. 24, 645-655.

133. Srinivasula, S.M., et al. (2001). A conserved XIAP-interaction motif in caspase-9

and Smac/DIABLO regulates caspase activity and apoptosis. Nature 410, 112-

116.

134. Liu, Z., et al. (2000). Structural basis for binding of Smac/DIABLO to the XIAP

BIR3 domain. Nature 408, 1004-1008.

135. Verhagen, A.M., et al. (2000). Identification of DIABLO, a mammalian protein

that promotes apoptosis by binding to and antagonizing IAP proteins. Cell 102,

43-53.

136. ClustalW http://www.ebi.ac.uk/clustalw/.

137. Yang, D., Welm, A., and Bishop, J.M. (2004). Cell division and cell survival in

the absence of survivin. Proc. Natl. Acad. Sci. USA 101, 15100-15105.

138. Reed, J.C., and Bischoff, J.R. (2000). BIRinging chromosomes through cell

division—and survivin’ the experience. Cell 102, 545-548.

139. Derijard, B., et al. (1994). JNK1: a protein kinase stimulated by UV light and Ha-

Ras that binds and phosphorylates the c-Jun activation domain. Cell 76, 1025-

1037.

140. Takahashi, R., Deveraux, Q., Tamm, I., Welsh, K., Assa-Munt, N., Salvesen,

G.S., and Reed, J.C. (1998). A single BIR domain of XIAP sufficient for

inhibiting caspases. J. Biol. Chem. 273, 7787-7790.

141. Sun, C., Nettesheim, D., Liu, Z., and Olejniczak, E.T. (2005). Solution structure

of human Survivin and its binding interface with Smac/DIABLO. Biochemistry

44, 11-17.

142. Rodi, D.J., Soares, A.S., and Makowski, L. (2002). Quantitative assessment of

peptide sequence diversity in M13 combinatorial peptide phage display libraries.

J. Mol. Biol. 322, 1039-1052.

143. Tian, F., Tsao, M.-L., and Schultz, P.G. (2004). A phage display system with

unnatural amino acids. J. Am. Chem. Soc. 126, 15962-15963.

144. Deveraux, Q.L., Takahashi, R., Salvesen, G.S., and Reed, J.C. (1997). X-linked

IAP is a direct inhibitor of cell-death proteases. Nature 388, 300-304.

145. Hegde, R., et al. (2003). The polypeptide chain-releasing factor GSPT1/eRF3 is

proteolytically processed into an IAP-binding protein. J. Biol. Chem. 278, 38699-

38706.

146. Verhagen, A.M., et al. (2002). HtrA2 promotes cell death through its serine

protease activity and its ability to antagonize inhibitor of apoptosis proteins. J.

Biol. Chem. 277, 445-454.

147. Galvan, V., Kurakin, A.V., and Bredesen, D.E. (2004). Interaction of checkpoint

kinase 1 and the X-linked inhibitor of apoptosis during mitosis. FEBS Lett. 558,

57-62.

148. Li, Q., Liston, P., Schokman, N., Ho, J.M., and Moyer, R.W. (2005). Amsacta

moorei entomopoxvirus inhibitor of apoptosis suppresses cell death by binding

Grim and Hid. J. Virol. 79, 767-778.

149. Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989). Molecular Cloning, 2nd

Edition (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press).

150. Laemmli, U. (1970). Cleavage of structural proteins during the assembly of the

head of bacteriophage T4. Nature 227, 680-685.

151. Holmes, D.S., and Quigley, M. (1981). A rapid boiling method for the preparation

of bacterial plasmids. Anal. Biochem. 114, 193-197.

152. Sanger, F., Nicklen, S., and Coulson, A. (1977). DNA sequencing with chain-

terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
