Generating peptide probes against cancer-related peptide recognition domains using phage display

by

Yogesh Hooda

A thesis submitted in conformity with the requirements for the degree of Master of Science Graduate Department of Molecular Genetics University of Toronto

© Copyright by Yogesh Hooda 2012

Abstract

Peptide recognition domains (PRDs) bind to short linear motifs on their biological partners and are found in several cellular pathways, including those critical in tumorigenesis. In this study, I aimed to generate peptide probes against PRDs present on proteins involved in ovarian cancer. Using bioinformatics, I identified 66 potential PRDs present on these proteins. I then used peptide phage display to successfully generate peptides against 27 of the 66 domains.

To validate my results, I performed an extensive literature review and structural analysis. In several cases, the binding preferences derived from phage display are similar to those reported in previous studies. However, for a subset of domains, I identified non-canonical binding preferences that have not previously been reported in the literature. The binding preferences obtained in this study can be used to design intracellular probes for studying the role of these PRDs in biological pathways important in ovarian cancer.

Acknowledgments

It is hard to imagine that it has already been two years since I started my graduate studies. Working at the Sidhu and the Kim labs has been a wonderful experience and I would like to take this opportunity to thank all the people who helped me through this part of my life.

First and foremost, I would like to thank my supervisors Dev Sidhu and Philip Kim who gave me the opportunity to work in their labs and guided me throughout my stay here. They both have been an immense source of inspiration. I would also like to thank my committee members, Frank Sicheri and Tim Hughes, for their constructive criticism and suggestions.

During my stay, I came across an awesome set of people at both the Sidhu and the Kim labs. I would especially like to thank Joan for all his discussions and guidance during the latter part of my project. In the Sidhu lab, I would like to give special thanks to Maruti, Andreas, Megan, Haiming, Gang and Linda for their kind help and support. I would also like to thank Mark, Recep, Simon, Roland, Clare, Kurt and Ylva in the Kim lab.

I also would like to thank all my friends here in Toronto, around the world and back home in India for sharing with me their adventures or misadventures and listening to mine. Their friendships made Toronto a great city to stay in. I would especially like to thank Senjuti for her incredible love and encouragement. Her companionship has kept me going through all the ups and downs of my project.

Lastly, I am grateful to my family for their constant love and support. They have always been a tremendous source of strength and inspiration for me.

Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Appendices
1 Introduction
1.1 Overview
1.2 Peptide recognition domains
1.2.1 Properties of domain-peptide interactions
1.2.2 Role in biological pathways
1.3 Peptide-recognition domains as therapeutic targets
1.3.1 Bcl-2
1.4 Studying peptide recognition domains using peptide probes
1.4.1 Understanding structure and binding properties
1.4.2 Elucidating biological role
1.4.3 Validating drug targets
1.4.4 Drug Discovery
1.5 Goal of the project
2 Identification of peptide recognition domains essential in ovarian cancer
2.1 Introduction
2.1.1 Whole Genome RNAi screen
2.1.2 Computational methods to identify peptide recognition domains
2.2 Methods
2.2.1 Identification of peptide recognition domains
2.2.2 Manual filtering and literature review of potential domains from PepX
2.3 Results and Discussion
2.3.1 Analysis of 1695 obtained from whole genome RNAi screens
2.3.2 Literature review of domain list obtained from the computational pipeline
2.4 Summary
3 Identification of peptide binders using phage display
3.1 Introduction
3.1.1 Displaying peptide on phage particles
3.1.2 Site-directed mutagenesis and phage library design
3.1.3 Selection strategy
3.1.4 Selection of tight-binding peptides and identification of binding specificities
3.2 Methods
3.2.1 Strains
3.2.1 expression and purification
3.2.2 Library construction and design
3.2.3 Phage Display selections
3.2.4 Calculation of enrichment ratio and pool ELISA
3.2.5 Clonal ELISA and sequencing of peptides
3.2.6 Structural modeling of phage-display results
3.3 Results and Discussion
3.3.1 Selection of peptide binders using phage display
3.3.2 Validation of tight binder using clonal ELISA
3.3.3 Identification of binding preferences and literature validation
3.3.4 Cellular signaling
3.3.4.1 SH3
3.3.4.2 PDZ
3.3.4.3 G-alpha
3.3.4.4 14-3-3
3.3.4.5 Penta-EF hand
a Calpain small regulatory subunit
b Programmed Cell Death Protein 6
3.3.5 Cytoskeleton regulation
3.3.5.1 Dynein light chain
3.3.5.2 CAP/Gly
3.3.5.3 Alpha-vinculin head domain
3.3.6 Intracellular transport
3.3.6.1 Importin beta
3.3.6.2 UBA
3.3.6.3 Bro1
3.3.6.4 Clathrin heavy chain
3.3.7 Genome Regulation
3.3.7.1 PCNA
3.3.7.2 OB-fold
3.3.7.3 Ligand binding domain of nuclear receptors
3.3.7.4 WD40 domains
3.3.7.5 TRF homology domain
3.3.8 Miscellaneous
3.3.8.1 SWIB/MDM2
3.3.8.2 HORMA domain
3.3.8.3 eIF4E
3.3.8.4 Ubiquitin
3.4 Summary
4 Conclusions
4.1 Summary of work
4.2 Future experiments
4.3 Potential avenues for research
4.4 Application of phage-derived peptides
4.5 Final remarks
5 References

List of Tables

Table 1 List of all PRDs that have been investigated as targets for cancer therapies
Table 2 Summary of results obtained from DOMINO and PepX
Table 3 List of 66 domains selected for phage display experiments
Table 4 Summary of phage display results for 66 domains

List of Figures

Figure 1 Representative structures of PRDs present in the human genome
Figure 2 Peptide and small-molecule inhibitors of Bcl-2
Figure 3 Combinatorial methods for determining binding preferences of PRDs
Figure 4 Generating intracellular Dvl2-PDZ inhibitors using phage display
Figure 5 Fluorescence polarization assays for discovery of small-molecule inhibitors
Figure 6 Whole genome RNAi screen for identifying essential genes in ovarian cancer
Figure 7 Computational strategy for identifying potential peptide binding domains
Figure 8 Schematic diagram of M13 bacteriophage
Figure 9 Oligonucleotide-directed mutagenesis with an ssDNA template
Figure 10 Phage display selection for PRDs
Figure 11 Strategy for validating phage display results
Figure 12 Overview of phage display results
Figure 13 Structural and literature analysis of SH3 domains
Figure 14 Structural and literature analysis of PDZ domains
Figure 15 Structural and literature analysis of Gα subunits
Figure 16 Structural and literature analysis of 14-3-3
Figure 17 Structural and literature analysis of Penta-EF hand of CAPNS1
Figure 18 Structural and literature analysis of Penta-EF hand of PDCD6
Figure 19 Structural and literature analysis of Dynein light chains
Figure 20 Structural and literature analysis of CAP/Gly domain of p150glued
Figure 21 Structural and literature analysis of Alpha-catenin/vinculin head domain
Figure 22 Structural and literature analysis of Importin beta
Figure 23 Structural and literature analysis of NXF1-UBA domain
Figure 24 Structural and literature analysis of Alix-Bro1 domain
Figure 25 Structural and literature analysis of Clathrin terminal domain
Figure 26 Structural and literature analysis of PCNA
Figure 27 Structural and literature analysis of RPA 70N OB-fold domain
Figure 28 Structural and literature analysis of NR1H4 ligand binding domain
Figure 29 Structural and literature analysis of WDR5
Figure 30 Structural and literature analysis of TRFH domain of TERF1
Figure 31 Structural and literature analysis of SWIB/MDM2
Figure 32 Structural and literature analysis of HORMA domain
Figure 33 Structural and literature analysis of eIF4E
Figure 34 Structural and literature analysis of ubiquitin

List of Appendices

Appendix A List of ovarian cancer lines
Appendix B Protein sequences of 66 domains
Appendix C Vector sequences

1. INTRODUCTION

1.1 Overview

Protein-protein interactions form the molecular basis of key regulatory and signalling pathways inside cells [1]. They help in the assembly of macromolecular complexes and the formation of modular interaction networks that regulate key biological processes such as the cell cycle, signal transduction and embryogenesis. Protein-protein interactions can be roughly categorized into two types: i) domain-domain interactions, where two domains bind to each other, and ii) domain-peptide interactions, where a domain binds to an unfolded linear motif on its partner [1]. Domain-peptide interactions are mediated by peptide recognition domains (PRDs), which bind to short linear motifs that often lie in disordered regions of their interaction partners [2]. PRDs are ubiquitous; they assemble transient regulatory networks, identify post-translational marks, regulate signalling molecules and provide specificity to enzymatic complexes.

Given the important role of domain-peptide interactions in key cellular processes, these interactions are frequently targeted by toxins or by somatic mutations found in diseases including cancer. In cancer, amplified and exogenous domain-peptide interactions often lead to rewiring of cellular networks, thereby promoting tumour growth, invasion and metastasis [3]. A number of such interactions, e.g. p53/MDM2, IAP/caspase and Bcl-2/BH3, have been targeted using small molecules and peptide-based drugs [4]. The PRDs mediating these interactions form an emerging class of cancer drug targets.

PRDs have been extensively studied using peptide-based probes. These probes can be derived from known natural binding partners or generated using combinatorial methods such as phage display and peptide microarrays [5]. Peptide-based probes have been used to elucidate the biochemical and structural properties of interactions mediated by PRDs. They have also been used to design intracellular reagents that target these interactions and help to better understand cellular pathways [5]. Such probes may also be used to identify PRDs that may serve as potential cancer drug targets [5]. Peptide-based probes against PRDs have also led to the development of small-molecule therapeutics against various cancers (e.g. ABT-737 against Bcl-2 and Nutlins against MDM2) [4].

However, the number of PRDs whose role in cancer-related pathways is well understood is limited. This is largely due to the lack of high-affinity, specific probes to study these domains. To address this issue, I propose to use phage display to systematically generate peptide probes against different families of PRDs. The main focus of this study is to develop peptide probes against the PRDs present on proteins involved in ovarian cancer that were identified by our collaborators. Peptide probes developed here may serve as valuable tools to understand the role of PRDs in ovarian cancer-related biological pathways.

In the following sections, I will discuss progress made in the study of peptide recognition domains. First, I will discuss structural properties of interactions mediated by peptide recognition domains and describe some of their biological roles. Second, I will highlight examples of peptide recognition domains that have been identified as drug targets for specific types of cancer. Third, I will present studies that have used peptide probes against PRDs and that demonstrate their utility as intracellular probes. Finally, I will elaborate on the specific aims of the current study.

Figure 1: Representative structures of PRDs present in the human genome. Peptide recognition domains are structurally diverse and use different binding surfaces to bind to peptides. Domains as defined by CATH are shown in grey; peptide ligands are shown in green.

1.2 Peptide recognition domains

As discussed above, PRDs bind to specific linear motifs on their interaction partners. Since the discovery of the first PRDs, a large number of such domains have been identified in the human proteome. This progress can be attributed to the development of high-throughput experimental methods that allow the identification of a large number of protein-protein interactions [6]. Such studies have established that a significant proportion of protein-protein interactions within a cell are domain-peptide interactions mediated by dedicated peptide recognition domains [7]. Analysis of structures of peptide recognition domains in complex with their natural or synthetic partners has led to the elucidation of their mode of function [8]. Further experimental and computational studies have highlighted the roles played by PRDs inside cells.

1.2.1 Properties of domain-peptide interactions: Peptide recognition domains are found in structurally diverse protein families (Figure 1) that are catalogued in databases such as DOMINO [9], ADAN [10] and PepX [11]. Domain-peptide interactions are often mediated by a groove-like binding interface present on peptide recognition domains. The binding interface of domain-peptide interactions is ~500-1000 Å², which is smaller than that of domain-domain interactions [12]. Domain-peptide interactions are often transient and exhibit binding affinities in the low-micromolar to nanomolar range. Structurally, the binding interface on the domain is often the largest pocket on the surface of the PRD [12]. The binding surface is more hydrophobic than the overall surface of the protein but less hydrophobic than the protein core. A small subset of the residues present on the binding surface contributes most of the binding energy. These residues, known as “hotspot residues”, are essential for binding, and changing any of them can severely affect the domain-peptide interaction [13].

The interaction between the PRD and the peptide may cause conformational changes in either the PRD or its interaction partner [12]. Many peptide recognition domains, such as the G-alpha subunits, also possess enzymatic function. The binding of peptide partners to the switch II/alpha III groove on the G-alpha subunit increases its GTPase activity [14]. Furthermore, in the case of the ligand binding domains of nuclear receptors, peptide binding is often dependent on the binding of a small molecule or hormone to the ligand-binding pocket [15]. Ligand binding produces a conformational change that allows the peptide to bind to the hydrophobic pocket. In this way, PRDs often couple peptide binding to enzymatic or ligand-binding functions present on the same domain.

The binding preferences of PRDs are highly diverse. While PRDs such as the SH3, WW and EVH1 domains bind to motifs rich in proline residues, other domains such as the PDZ and CAP-Gly domains specifically recognize hydrophobic C-terminal residues of peptides [5]. The binding sites of PRDs are often present in disordered regions of the interacting proteins. These peptide motifs undergo a disorder-to-order transition upon binding [12]. For example, the binding of co-activators to the ligand binding domains of nuclear receptors induces a helical conformation in the co-activator [15]. This conformational change in the co-activator molecule favors the assembly of an active transcriptional complex [15].

A class of PRDs specifically recognizes post-translational modifications such as phosphorylation (SH2, 14-3-3, FHA), acetylation (bromodomains) and methylation (chromodomains) [16]. Such domains act as readers of post-translational modifications and link these modifications to downstream cellular pathways. For example, the SH2 domains of scaffold proteins such as Grb2 and Vav link phosphorylation of receptor tyrosine kinases to activation of intracellular signalling proteins (Ras, Raf and Erk) [1].

1.2.2 Role in biological pathways: Proteins that regulate key cellular processes, such as signal transduction, the cell cycle, protein trafficking, cytoskeleton organization and gene expression, are composed of catalytic and interaction domains [1]. Catalytic domains such as kinases, GTPases and proteases catalyze specific molecular reactions (e.g. phosphorylation and peptide bond cleavage) that help in the propagation of cellular signals. However, these domains often have limited inherent specificity, i.e. they can bind to a large set of binding partners. Interaction domains regulate the specificity of catalytic domains either directly, by recruiting substrates of catalytic domains, or indirectly, by controlling their spatio-temporal localization [17]. As previously mentioned, a large number of interaction domains are PRDs that bind to specific peptide motifs present on their interacting partners. Thus, PRDs recruit and confine signalling proteins to an appropriate sub-cellular location and determine the specificity with which enzymes interact with their targets, analogous to the association of protein kinases with their substrates.

PRDs provide several evolutionary and mechanistic advantages to cellular networks. Firstly, domain-peptide interactions often evolve faster than domain-domain interactions, allowing cellular pathways to be rewired with minimal changes [17]. Secondly, PRDs that act as scaffolds increase the speed of signal transduction by increasing the local concentration of enzymes and substrates [17]. Thirdly and most importantly, PRDs provide specificity to the information flow in intracellular networks [17]. This allows cells to accurately process the diverse range of signals they receive and produce the appropriate biochemical responses.

A key function of PRDs is to identify specific post-translational modifications (PTMs). Protein function and localization are often regulated by a vast and dynamic array of PTMs. By recognizing specific PTMs, PRDs link PTMs to cellular organization, thereby sensing “the state of the proteome” [16]. PRDs are also involved in cellular protein trafficking. Specific peptide tags on a protein determine its transport. PRDs such as importin-beta and clathrin recognize specific peptide tags and transport cellular proteins to their desired sub-cellular location [1].

1.3 Peptide-recognition domains as therapeutic targets

Given their central role in biological pathways, peptide recognition domains are often targeted by pathogenic proteins and somatic mutations observed in various diseases including cancer [3]. Hence, PRDs are an emerging class of therapeutic targets. Small-molecule and peptide-based drugs have been developed against a handful of PRD families. These drugs are currently in various stages of pre-clinical and clinical drug development. In this section, I will describe the work done on an important family of PRDs that has been extensively studied as a cancer drug target: B-cell lymphoma-2 or Bcl-2. I will discuss the functions of this family of domains inside cells and how these functions are often mis-regulated in cancer. I will also briefly discuss the various techniques that were used to develop potential therapeutic agents against these domains.

Figure 2: Peptide and small-molecule inhibitors of Bcl-2. A) The structure of a 16-mer peptide derived from Bad in complex with Bcl-xl. B) The interaction surface of Bcl-xl and the Bad peptide. The interaction is mediated by a hydrophobic pocket on Bcl-xl. C) The structure of the small molecule ABT-737 in complex with Bcl-xl. D) The interaction surface of Bcl-xl and ABT-737. ABT-737 binds to the same hydrophobic pocket on Bcl-xl and competes with its natural interaction with Bak and Bax.

1.3.1 Bcl-2: The B-cell lymphoma-2 (Bcl-2) family of proteins are important regulators of mitochondrial outer membrane permeabilization (MOMP), an important step in the apoptotic pathway inside cells [18]. They regulate the release of cytochrome-c from the mitochondria and the activation of caspases, which are the proteases responsible for the breakdown of key cellular components during apoptosis. Bcl-2 family members adopt an alpha-helical structure containing conserved segments called Bcl-2 homology domains (BH domains). This protein family can be divided according to the positive or negative effect of its members on apoptosis. While family members such as Bax and Bak initiate apoptosis, members such as Bcl-2, Bcl-xl and Mcl-1 inhibit apoptosis. Under normal conditions, the interplay of these proteins regulates the apoptotic pathway. However, upon the induction of stress conditions or DNA damage, pro-apoptotic members of the Bcl-2 family are activated. The pro-apoptotic Bcl-2 family members form pores in the outer membrane of the mitochondria, allowing cytochrome-c and other proteins to initiate apoptosis [18].

In various cancers, somatic mutations cause over-expression of anti-apoptotic members of the Bcl-2 family, leading to abrogation of apoptosis [19]. The anti-apoptotic members of the Bcl-2 family interact with the pro-apoptotic members and inhibit their ability to form pores in the mitochondrial outer membrane. This interaction is mediated by a linear alpha-helical peptide (BH3) on pro-apoptotic members binding to the hydrophobic pocket on the anti-apoptotic members (Figure 2). This interaction is critical for the abrogation of apoptosis, and its inhibition leads to activation of apoptosis [20]. Synthetic peptides that mimic the BH3 peptides were shown to successfully induce apoptosis in different cancer cell lines and mouse models [21]. Later, small molecules identified using structure-activity relationship (SAR) analysis were found to be efficacious in promoting apoptosis, confirming the observations made with the synthetic peptides (Figure 2). These small molecules bound to anti-apoptotic members of the Bcl-2 family with nanomolar affinity and showed good pharmacokinetic properties [22]. Small-molecule inhibitors of Bcl-2 are currently in various stages of clinical or pre-clinical investigation.

Several key observations can be derived from the Bcl-2 example. Firstly, PRDs that are involved in critical cellular processes (such as apoptosis in the case of Bcl-2) are often mis-regulated in a wide spectrum of cancers. Secondly, somatic mutations are often sufficient to amplify the cellular levels of PRDs, thereby modulating the cellular processes they are involved in. This also provides an opportunity for drug development because, in theory, these perturbations can be reversed by specifically blocking the interactions mediated by these PRDs. Thirdly, small molecules developed against Bcl-2 bind with an affinity comparable to that of the native partner protein or peptide by binding to a small subset of residues on the interaction surface. These residues often, but not always, correspond to the “hotspot residues”. The Bcl-2 example highlights the possibility of identifying small compounds that can inhibit interactions mediated by PRDs with desirable affinity and specificity.

A number of PRDs have been identified as drug targets (reviewed in Table 1). These domains share the characteristics described above, i.e. amplification in cancer, involvement in key cellular pathways and presence of hotspot residues. These characteristics have made PRDs good targets for anti-cancer drug development.

| Drug target | Interaction partner | Role | Remarks |
| MDM2 | p53 | Negative regulation of p53 protein | Mdm2 down-regulates tumor suppressor protein p53 in cancer; targeted using small molecules and peptides |
| IAP | Caspase | Inhibition of caspase | IAPs negatively regulate caspases; targeted by peptides and peptidomimetics |
| Dvl2 PDZ | Fzd-7 | Involved in Wnt signalling | Dvl2 PDZ domain binds to internal peptide; targeted using peptides and small molecules |
| N-Cadherin | N-cadherin, E-cadherin | Cell adhesion | N-cadherin binds to HAV sequence at EC1 domain of different cadherin |
| Plk1-PBD | CDC25C, Chk2, PDBIP1 | G2/M checkpoint regulation | Polo-box domain of Plk1 binds to phospho-peptides; targeted using small molecules and peptidomimetics |
| ICN-1/CSL | MAML1 | Involved in Notch signalling | MAML-1 binds to hydrophobic groove on ICN1-CSL; targeted by peptidomimetic (stapled peptide) |
| eIF4E | eIF4G, 4E-BP1 | Translation initiation factor | eIF4E binds to 16-mer segment within eIF4G and 4E-BP1; targeted using peptides and small molecules |
| Menin | MLL | Histone modification | Menin-MLL fusion leads to over-expression of Hox genes; targeted using small molecules |

Table 1: List of all PRDs that are currently being investigated as targets for cancer therapies

1.4 Studying peptide recognition domains using peptide probes

Peptides against PRDs can be derived from natural partners or generated by combinatorial methods such as phage display and SPOT microarrays. These peptides have been used as valuable tools for studying the biological roles of PRDs.

1.4.1 Understanding structure and binding properties: To obtain a detailed understanding of the interaction between a PRD and its biological partner, it is important to characterize the structural and molecular aspects of the interaction in depth. Peptides derived from interacting partners can be used to study these biophysical binding properties. Further insights can be obtained by using combinatorial methods such as SPOT microarrays and phage display.

SPOT microarrays are generated by synthesizing peptides on a cellulose membrane [23]. A single membrane can carry many different peptides, which together can sample all the amino acids at each position of the peptide. The domain is incubated with the microarray, and fluorometric or colorimetric methods are used to measure binding of the domain at each spot. By analyzing the intensity of each spot on the microarray, we can obtain the binding preference of a given domain, which is often represented as a position weight matrix (PWM) and visualized as a sequence logo (Figure 3). The height of an amino acid in the logo is indicative of its relative frequency at that position. One of the first applications of this method was the study of SH3 domains [24]. A key advantage of SPOT/peptide microarrays is the ability to study PRDs that bind to modified peptides, such as phosphorylated or acetylated peptides [25,26].

Phage display is a powerful technique that can be used to obtain binding preferences of PRDs. In phage display, peptides are fused to a coat protein of a filamentous bacteriophage such that the peptides are displayed on the surface of the bacteriophage [27]. Using site-directed mutagenesis, large libraries of 10^10 phages can be generated in which each phage displays a unique peptide. These libraries can then be panned against immobilized PRDs to capture phages that bind specifically to the domain of interest. The peptides displayed by these tightly bound phages can be identified by sequencing the phage DNA (Figure 3). Phage display has several advantages over other approaches, including cost effectiveness and the ability to re-use libraries to probe a large set of PRDs. Previous studies in the Sidhu lab have used peptide phage display to understand the binding preferences of well-studied domains such as the PDZ and SH3 domains [28,29].

Both phage display and peptide microarrays are extremely effective in determining the binding preferences of domains and can be complemented with biophysical methods such as isothermal titration calorimetry (ITC), surface plasmon resonance (SPR) and fluorescence polarization to obtain binding affinities of the peptide recognition domains. Computational methods (machine learning and structure-based methods) have also been developed to predict the binding preferences of PRDs [28,29].
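To put a phage library of 10^10 members in context, the sketch below compares that number with the count of possible peptides of a given length. The peptide lengths, the function names and the assumption of 20 equally likely amino acids at every position are illustrative choices, not details taken from this study. Real libraries also contain duplicate and non-displaying clones, so the fractions are upper bounds.

```python
def sequence_space(length, alphabet_size=20):
    """Number of distinct peptides of a given length (20 amino acids assumed)."""
    return alphabet_size ** length


def max_fraction_sampled(library_size, length):
    """Upper bound on the fraction of sequence space a library can cover."""
    return min(1.0, library_size / sequence_space(length))


LIBRARY_SIZE = 10**10  # library size quoted in the text

# Peptide lengths below are hypothetical examples, not the lengths used here.
for length in (7, 8, 12, 16):
    space = sequence_space(length)
    fraction = max_fraction_sampled(LIBRARY_SIZE, length)
    print(f"{length}-mer: {space:.2e} sequences, at most {fraction:.2%} sampled")
```

Under these assumptions, a library of this size could in principle contain every 7-mer but only a fraction of longer peptides, so a selection samples sequence space rather than enumerating it.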

Figure 3: Combinatorial methods for determining binding preferences of peptide recognition domains. Combinatorial methods such as phage display and peptide microarrays have been extensively used to determine the binding preferences of a diverse set of domains. These preferences are often represented as position weight matrices (PWMs) that are based on the occurrence frequency of a given amino acid at each position.
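As a minimal sketch of how a frequency-based PWM can be built from a set of selected peptides, the example below counts amino acids at each position of pre-aligned, equal-length peptides. The peptide sequences are made up for illustration, and the function names are hypothetical; published analyses typically add steps such as alignment, pseudocounts and background correction that are omitted here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequency_matrix(peptides):
    """Return one {amino acid: relative frequency} dict per peptide position."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must be aligned to the same length")
    matrix = []
    for position in range(length):
        counts = Counter(p[position] for p in peptides)
        matrix.append({aa: counts.get(aa, 0) / len(peptides) for aa in AMINO_ACIDS})
    return matrix


def consensus(matrix):
    """Most frequent amino acid at each position."""
    return "".join(max(column, key=column.get) for column in matrix)


# Made-up peptides, loosely resembling C-terminal ligands (illustration only).
example_peptides = ["ETSV", "ETDV", "QTSV", "ETSL"]
pwm = position_frequency_matrix(example_peptides)
print(consensus(pwm))
print({aa: freq for aa, freq in pwm[0].items() if freq > 0})
```

Each column of the matrix sums to one, and the relative frequencies are the quantities that a sequence logo displays as letter heights.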

1.4.2 Elucidating the biological role: Peptides have been very useful in elucidating the biological roles of PRDs. Peptide motifs obtained from combinatorial screens can be used to scan the proteome to identify potential binding partners. These partners can then be confirmed using yeast two-hybrid and/or pull-down assays [30]. A number of such peptide motifs are available in databases such as ELM [7].

Peptides that bind specifically and with high affinity can be used as intracellular probes against PRDs. Linear peptides are often unstable and cannot cross the cellular membrane. However, recent developments in molecular biology and peptide chemistry have significantly increased the stability and cellular permeability of peptides. Chemical modifications can greatly increase the stability and affinity of peptides that bind to a target domain [31,32]. Fluorescent labels and probes can be attached to the peptides to track their localization inside cells and model organisms [33]. Further, peptide probes can be readily fused to cell-penetrating peptides (CPPs) to increase their cellular permeability in various mammalian cell lines. Other entities (such as an NLS for nuclear localization) can be attached to the peptides to deliver them to specific cellular organelles [34]. Finally, transduction methods can be used to express peptides inside mammalian cell lines. These methods include lentiviral expression systems that allow effective delivery of peptides to different cell lines and model organisms [34]. The key advantage of lentiviral expression vectors is that the DNA encoding the peptide is incorporated into the genome, which allows stable expression of peptides in dividing and non-dividing cell lines [34].

One of the central advantages of using peptides as probes for biology is their ability to modulate protein function in various ways. Peptides bind to epitopes on proteins that are often distinct from the enzymatic pocket [35]. This allows peptides to modulate domain function as either antagonists or agonists.
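The proteome scan described at the start of this section can be sketched as a simple pattern search. The example below is a minimal illustration, assuming the binding preference has been reduced to a regular expression. The toy sequences, the protein names and the function find_motif_sites are hypothetical, and the PxxP pattern is used only as a familiar example of the proline-rich motifs recognized by SH3 domains.

```python
import re


def find_motif_sites(proteins, motif_regex):
    """Return (protein name, 1-based start, matched text) for every motif hit.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(f"(?=({motif_regex}))")
    hits = []
    for name, sequence in proteins.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Hypothetical toy sequences standing in for a proteome.
toy_proteome = {
    "protA": "MSTAPPLPPRNRPRLGG",
    "protB": "MKKLLDEESTV",
    "protC": "MGGSAAPSLPTRQW",
}

PXXP = r"P..P"  # proline, any two residues, proline

for name, start, site in find_motif_sites(toy_proteome, PXXP):
    print(name, start, site)
```

A hit from such a scan is only a candidate site. As noted above, candidate partners still need experimental confirmation, for example by yeast two-hybrid or pull-down assays.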

1.4.3 Validating drug targets: One of the central motivations of modern biology is to identify therapeutic targets for diseases. Numerous methods are available to perturb activity of a particular gene, e.g. gene knockouts, RNAi and small molecule drugs. Drugs act at the protein- level and perturb the natural biological function of a given protein. Drugs whose perturbations result in resolution of pathogenic phenotype are ideal candidates for therapy. Drugs are often small organic molecules that can be identified using structure-based approaches or high- throughput screens. However, development of high specificity and affinity small molecules often requires large monetary and time investment. These costs make the development of small molecule drugs against all known PRDs prohibitive. By prioritizing PRD to those that play a role in the onset of a given disease, we can greatly increase the efficiency of drug discovery. To this end, peptides may act as probes for identification of drug targets for diseases. As described previously, peptides can be generated against a large number of PRDs and introduced into mammalian cells. Peptides modulate their targets by various methods and can produce distinct phenotypes. In some disease models, peptide modulators may lead to alleviation of the disease. This has been

previously used to identify various domains such as MDM2 and Bcl-2 as drug targets for cancer [4]. Previously in the Sidhu lab, phage display was used to generate high affinity and specificity peptides against the PDZ domain of Dishevelled-2 (Dvl-2) (Figure 4) [36]. The interaction between Dvl-2 and the Frizzled-7 (Fzd-7) receptor is mediated by the PDZ domain of Dvl-2 and an internal peptide on Fzd-7. This interaction is critical for the activation of Wnt signalling, a critical step for tumorigenesis in different cancers, and deletion of the Dvl-2 PDZ domain or the Fzd-7 peptide motif leads to the abrogation of Wnt signalling [36]. Reasoning that inhibition of the Dvl-2 PDZ domain may disrupt Wnt signalling, Zhang et al. introduced phage-derived peptides into cells and observed that the peptides specifically targeted the PDZ domain of Dvl2 inside cells and down-regulated β-catenin signalling stimulated by Wnt [36]. Thus, by using peptide probes against Dvl2-PDZ, Zhang et al. were able to demonstrate that targeting Dvl2-PDZ may be a viable means of attenuating the growth of cancer cells that depend on Wnt-mediated signalling pathways, and established Dvl2-PDZ as a valid drug target for cancer. Similar studies may be used to identify potential drug targets for diseases including cancer.

Figure 4: Generating intracellular Dvl2-PDZ inhibitors using phage display. (A) Phage display was done against Dvl2 PDZ using an internal peptide library. The phage-derived binding preference was then used to design a peptide inhibitor: pep-N3. (B) The structure of pep-N3 in complex with Dvl-2 PDZ confirms the binding mode of the peptide. (C) For intracellular uptake, pep-N3 was fused to antennapedia and introduced into Wnt3a-responsive human embryonic kidney (HEK) 293S cell lines. Real-time cellular uptake of pep-N3 was observed using time-lapse microscopy. (D) Normalized TOPglow reporter activity measured in Wnt3a-stimulated HEK293S cells after 18 h of treatment with pen-N3 shows inhibition of Wnt/TCF-dependent signalling. Pen-N3 does not inhibit the TCF response signal in the control APC-mutant HCT-15 colon cell line. Western blots show that pen-N3 inhibits Wnt signalling by inhibiting the accumulation of beta-catenin in HEK293S cells treated with Wnt3a (right side panel). (Figures from Zhang et al 2008)


1.4.4 Drug discovery: Recent studies have suggested that peptide probes may themselves serve as a starting point for drug discovery against peptide recognition domains. In their most direct application, peptides themselves may serve as modulators of peptide recognition domains [37, 38]. Modifications such as stapling or cyclization may be performed to improve their pharmacokinetic properties [31, 32]. Another popular method of drug discovery is to develop peptidomimetics, which are organic molecules that mimic peptides. Peptidomimetics can be generated by replacing natural amino acids with amino-acid derivatives that make the peptide molecule less susceptible to degradation and increase its stability [39]. Finally, peptide probes may be used to design fluorescent detection assays that can then be used to screen large libraries of compounds (Figure 5). Often these screens identify compounds that can displace the natural peptide from the binding site [40].

Figure 5: Fluorescence polarization assays for the discovery of small-molecule inhibitors of domain-peptide interactions. The chief method for identifying small-molecule compounds against a domain-peptide interaction is the fluorescence polarization assay. A natural or synthetic peptide binder is fluorescently tagged and incubated with the target domain. A library of small-molecule drugs is then screened to identify molecules that compete with the fluorescent peptide for binding to the target domain. This allows rapid screening of large small-molecule libraries.
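The readout of such a competition assay can be illustrated with a minimal sketch. It uses the standard definition of polarization from the parallel and perpendicular emission intensities, and scales each sample between a fully bound and a fully free control. The function names and the example readings are hypothetical, and the instrument correction (G-factor) is assumed to be 1 for simplicity.

```python
def polarization_mP(i_parallel, i_perpendicular):
    """Fluorescence polarization in millipolarization units (mP).

    Assumes a G-factor of 1 (no instrument correction).
    """
    return 1000.0 * (i_parallel - i_perpendicular) / (i_parallel + i_perpendicular)


def percent_displacement(mp_sample, mp_bound, mp_free):
    """Percentage of labelled peptide displaced by a test compound.

    mp_bound: polarization of peptide plus domain, no competitor.
    mp_free:  polarization of the labelled peptide alone.
    """
    return 100.0 * (mp_bound - mp_sample) / (mp_bound - mp_free)


# Hypothetical intensity readings, for illustration only.
mp = polarization_mP(300.0, 100.0)
displaced = percent_displacement(mp_sample=150.0, mp_bound=250.0, mp_free=50.0)
```

A compound that displaces the labelled peptide lowers the polarization toward the free-peptide value, so hits appear as wells with high percentage displacement.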

1.5 Goal of the project

Motivated by recent developments, the long-term goal of this project is to identify PRDs that may act as novel cancer targets. To do this, we have focussed on a shortlist of protein targets in ovarian cancer provided by our collaborators, Dr. Rob Rottapel and Dr. Jason Moffat. Ovarian cancer is the second most common gynaecological cancer in women and currently has

only one approved therapy. The 5-year survival rate for this cancer is only 47%, highlighting the need to develop targeted therapies against ovarian cancer. To assist in the development of novel therapies, our collaborators used whole genome RNAi screens to knock down ~16000 human genes in 15 different ovarian cancer cell lines [41]. Using this screen, they identified 1695 genes whose knockdown severely affected proliferation of ovarian cancer cells. Based on the current literature, we hypothesized that PRDs present on these ovarian cancer essential genes play an essential role in tumorigenesis and may serve as drug targets for further investigation.

The study has two key aims: 1) Identify peptide recognition domains present on these 1695 gene targets in ovarian cancer using computational methods; and 2) Generate peptide binders against these domains using peptide phage display.

The peptide binders generated here can then be used to design intracellular probes to specifically modulate interactions mediated by these PRDs and study the effect of these perturbations on cellular pathways in specific ovarian cancer cell lines. Such peptides can also be used to identify PRDs that may serve as drug targets for ovarian cancer. Finally, peptide inhibitors can be used to design assays to identify small molecules that target interactions mediated by these PRDs.


2 Identification of peptide recognition domains essential in ovarian cancer


2.1 Introduction

The first goal of the project was to identify potential peptide recognition domains present on a shortlisted group of proteins involved in ovarian cancer. The shortlisted candidates were based on whole genome RNAi screens performed by our collaborators Dr. Jason Moffat and Dr. Rob Rottapel and represent genes that are essential for cancer growth. The domains present on these proteins were matched to known PRDs present in existing databases (PepX and DOMINO) in order to identify potential PRDs.

Figure 6: Whole genome RNAi screen for identifying essential genes in ovarian cancer. A library of ~80,000 lenti-virus encoded shRNAs is used to selectively knock down 16,000 human genes in different cancer cell lines. Each shRNA is identified using a single barcode. The genomic DNA is harvested at multiple time points. Genomic DNA from all the time points is hybridized on a microarray chip to study the specific growth rate of each unique cell type. shRNAs that knock down genes essential for cancer proliferation significantly affect the growth rate and can be detected by microarray analysis. Using this approach, our collaborators generated a list of 1695 human genes that affect the growth of 15 different ovarian cancer cell lines.

2.1.1 Whole Genome RNAi screen: RNAi is a powerful technique to knock down specific genes and study their effect on biological pathways. RNAi studies have illuminated the roles of various genes and helped to obtain a better understanding of their functions. Developments in cell biology and molecular genetics techniques have made it possible to perform genome-wide RNAi screens, where a large number of the genes in the human genome can be targeted in a single experiment. These screens are performed using a library of short hairpin RNA


(shRNA) targeting many human genes, where each shRNA is encoded inside a lenti-viral expression vector. Lenti-viral expression vectors allow specific shRNAs to be incorporated into cells. The shRNA library is incubated with cancer cell lines to allow incorporation of a unique shRNA inside a given cell in the population. Upon infection, the cells are allowed to proliferate for 3-4 weeks, after which shRNAs that have been selectively depleted or enriched are identified using microarrays, deep sequencing or high-content screening. Such pooled screens can be used to define genes necessary for cancer cell proliferation/survival in cell culture [42]. In this study, we focused on screens done on ovarian cancer by our collaborators Dr. Jason Moffat and Dr. Rob Rottapel (Figure 6) [41]. Using a library of 78,432 shRNAs, Marcotte et al. targeted 16,056 genes in 15 different ovarian cancer cell lines. The cancer cell lines used in their analysis are listed in Appendix A. To select genes that are essential for ovarian cancer, Marcotte et al. followed the dropout rate of each shRNA. These dropout rates were derived by calculating the slope of the measured microarray expression intensity at each time point relative to the initial time point. The dropout rates were then used to define the GARP (Gene Activity Ranking Profile) score for each gene. Genes with a negative GARP score are critical for cancer proliferation. Using a cut-off to select highly essential genes, Marcotte et al. identified 1695 genes that were essential across all ovarian cancer cell lines. In this study, I used these 1695 genes as the input to my computational pipeline.
There are specific reasons for focussing on genes obtained from whole genome RNAi screens: 1) the whole genome RNAi screens provide an unbiased list of genes that are important for ovarian cancer growth and hence allow us to focus on a much reduced set of genes; and 2) given that knockdown of these genes hampers cancer growth, the screen provides evidence that peptide inhibitors of peptide recognition domains present on these genes may also negatively affect cancer growth, which can be rapidly tested by delivering peptides inside ovarian cancer cell lines. Peptides that successfully recapitulate the results obtained from whole genome RNAi screens can serve as templates for the development of cancer therapeutics.
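The dropout-rate calculation described above can be sketched as follows. This is only an illustration and not the code used by Marcotte et al.: the per-hairpin slope is taken as a least-squares fit of intensity (relative to the initial time point) against time, and the gene-level score is assumed here to be the mean of the two most negative hairpin slopes. The function names and example values are hypothetical.

```python
from statistics import mean


def dropout_slope(times, intensities):
    """Least-squares slope of intensity relative to the first time point."""
    rel = [y - intensities[0] for y in intensities]
    t_bar, r_bar = mean(times), mean(rel)
    num = sum((t - t_bar) * (r - r_bar) for t, r in zip(times, rel))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def gene_score(hairpin_slopes, n_lowest=2):
    """Gene-level score: mean of the n most negative hairpin slopes.

    Using the two lowest hairpins is an assumption made for this sketch.
    """
    return mean(sorted(hairpin_slopes)[:n_lowest])


# Hypothetical intensities for one shRNA at three time points.
slope = dropout_slope([0, 1, 2], [10.0, 8.0, 6.0])
score = gene_score([slope, -0.5, 0.1])
```

A strongly negative score indicates that hairpins against the gene are lost from the population over time, which is the behaviour expected for a gene required for proliferation.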

2.1.2 Computational methods to identify peptide-recognition domains: Recent developments in high throughput experimental methods for identifying protein-protein interactions have led to rapid identification of protein interaction partners. Experimentally known domain-peptide pairs are documented in databases such as DOMINO [9] (a database of known

domain-peptide interactions), PepX [11] (a database of domain-peptide interactions where the co-crystal structures are available) and ADAN [10] (a database of selected domain-peptide interactions with known motifs). Other sources include ELM [7], which is a database of peptide-like motifs but also includes information about domains that bind to such peptide motifs. Computational methods have also been developed to identify novel PRDs [43]. These approaches use sequence or structural similarity to known peptide binding domains present in the databases mentioned above as a metric to identify novel PRDs. In this study, we focussed on peptide recognition domains present on protein targets provided by our collaborators. To develop a computational method for the identification of PRDs, we focused on domains that share high sequence similarity to known PRDs present in the PepX and DOMINO databases (Figure 7).

Figure 7: Computational strategy for identifying potential peptide binding domains. Using databases such as DOMINO and PepX, I obtained a high-confidence list of known peptide recognition domains. Using this list, I searched for domains present on the target gene list that share high sequence similarity to known peptide recognition domains. The final domain list was then optimized by including information regarding the domain boundaries and expression conditions. For phage display, I selected domains that can be readily expressed in a bacterial system and have crystal structures in complex with a known peptide. Using this approach, I was able to identify 66 domains from the list of 1695 genes provided by our collaborators.


2.2 Methods

Identification of peptide recognition domains: The proteins encoded by each of the 1695 genes were obtained using UniProt annotations. Using BLAST, all the full-length proteins were searched against domains present in PepX and DOMINO. Sequences with greater than 70% sequence identity were retained, while the other sequences were discarded. The sequence cut-off was chosen based on previous studies showing that accurate structural models (<2 Å rmsd) can be built at this level of sequence identity. The domain boundaries were then annotated based on the closest available domain structure in the Protein Data Bank (PDB). Figure 7 shows the entire computational pipeline used for this method.
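The identity filter in this step can be sketched as below. The sketch assumes the BLAST results were saved in the standard 12-column tabular format (query, subject, percent identity, alignment length, and so on); the thesis does not state which output format was used, and the sequence identifiers in the example are hypothetical.

```python
def filter_hits(blast_lines, min_identity=70.0):
    """Keep (query, subject, percent identity) for hits above the cut-off."""
    kept = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        if pident > min_identity:
            kept.append((query, subject, pident))
    return kept


# Hypothetical tabular BLAST lines, for illustration only.
example = [
    "protA\tpepx_dom1\t85.0\t90\t13\t0\t224\t313\t1\t90\t1e-40\t160",
    "protB\tpepx_dom2\t65.2\t80\t28\t0\t10\t89\t1\t80\t1e-20\t95",
]
hits = filter_hits(example)
```

In this example only the first hit passes the 70% cut-off; the query start and end columns of a retained hit could then be used as a first estimate of the domain boundary before it is refined against the PDB structure.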

2.2.1 Manual filtering and literature review of potential domains from PepX: Based on the results obtained from the computational pipeline, only domains present in PepX were selected for further investigation. Each domain in PepX has a crystal structure bound to a peptide ligand present in the database. This gave us confidence that such a domain: 1) binds to peptides; and 2) can be expressed in bacterial cells. Further, the structures of the domain-peptide complexes can be used to validate the results obtained from phage display. The domains obtained from PepX were manually analyzed to remove false positives. These included domains that do not make direct contacts with the peptide or those that share the interaction surface with another domain. Further, domains that could not be expressed in bacterial cells for crystallization were also removed from the list. Finally, a literature review was done to analyze the domains obtained from our computational pipeline.

| Database used | Description | No. of domains obtained |
|---|---|---|
| PepX | Database of domains with known peptide recognition for which structures are available | 86 |
| DOMINO | Database of domains known to bind to peptides | 390 |
| PDB | Database of all known structures | 885 |
| Pfam | Total no. of domains in our dataset | 5567 |

Table 2: Summary of results obtained from DOMINO and PepX.

2.3 Results and Discussion

2.3.1 Analysis of 1695 genes obtained from whole genome RNAi screens: 390 domains were

obtained from DOMINO and 86 from PepX (Table 2). As reasoned above, the 86 domains from PepX were selected for further study. Upon manual analysis and filtration, 66 domains were selected as targets for phage display.

2.3.2 Literature review of domain list obtained from the computational pipeline: The list of 66 domains represents a good initial set for the study. To get a better understanding of the kind of domains present in this list, a thorough literature review was performed and structural information for each of these domains was annotated (Table 3). Structurally, these 66 domains represent 42 domain families. These include well-characterized peptide binding domains such as the SH3 and the PDZ domains, which have been extensively studied for their peptide binding potential by structural, biochemical and combinatorial studies. I decided to keep these domains in our list as they can be used as positive controls for future experiments.

| Domain name | Protein name | Domain boundary | Related PDB structure | Comment |
|---|---|---|---|---|
| PDZ (2.30.42.10) | | | | |
| 1 | Disc-large homolog 1 #1 | 224-310 | 2I0L (98.8) | First PDZ domain of DLG1; plays a role in planar cell polarity |
| 2 | Disc-large homolog 1 #2 | 319-405 | 1TP5 (82.9) | Second PDZ domain of DLG1; plays a role in planar cell polarity |
| 3 | Disc-large homolog 2 #2 | 193-279 | 2I0L (84) | Second PDZ domain of DLG2; plays a role in planar cell polarity |
| 4 | Disc-large homolog 2 #3 | 421-501 | 1TP5 (83.1) | Third PDZ domain of DLG2; plays a role in planar cell polarity |
| 5 | Disc-large homolog 4 #2 | 160-246 | 2I0L (89.2) | Second PDZ domain of DLG4; plays a role in planar cell polarity |
| 6 | Disc-large homolog 4 #3 | 313-393 | 1TP5 (97.1) | Third PDZ domain of DLG4; plays a role in planar cell polarity |
| SH3 (2.30.30.40) | | | | |
| 7 | Growth-factor receptor bound protein 2 | 1-58 | 2VWF (93.3) | N-terminal SH3 of Grb2, adaptor protein in RTK signalling |
| 8 | Grb2-related protein 2 | 271-330 | 2W10 (93.3) | C-terminal SH3 of GRAP2, adaptor protein in RTK signalling |
| 9 | Phospholipase gamma | 791-851 | 1YWO (93.4) | Involved in RTK signalling |
| 10 | Sorbin and SH3 containing protein 2 | 938-999 | 2O9V (70) | Second SH3 domain of SORBS2, interacts with Abl kinase |
| Protein Kinase (3.30.200.20+1.10.510.10) | | | | |
| 11 | Mitogen-activated kinase 3 | 42-330 | 2FYS (87.8) | Protein kinase signal cascade |
| 12 | PKA | 44-298 | 2VO7 (99.4) | cAMP signalling |
| 13 | PKB | 44-298 | 2VO7 (92.9) | cAMP signalling |
| 14 | Serine/threonine kinase 2 | 81-684 | 1WBP (84.6) | Regulates p53, cell cycle |
| 15 | Aurora kinase B | 2-344 | 2BFY (80.2) | Involved in chromosomal segregation |
| G-alpha subunit (3.40.50.300+1.10.400.10) | | | | |
| 16 | G-alpha (i) 1 | 2-354 | 1Y3A (100) | Involved in G-protein signalling |
| 17 | G-alpha (i) 3 | 2-354 | 1Y3A (93.5) | Involved in G-protein signalling |
| 18 | G-alpha (o) 1 | 2-354 | 1Y3A (72.1) | Involved in G-protein signalling |
| Ligand binding domain of (1.10.565.10) | | | | |
| 19 | Bile acid Receptor | 256-474 | 3BEJ (100) | Binds to bile acid, hormone receptor signalling |
| 20 | - gamma | 261-463 | 3E94 (86.2) | Binds to retinoic acid, hormone receptor signalling |
| 21 | | 528-777 | 1M2Z (99.6) | Binds to cortisol, hormone receptor signalling |
| 22 | | 204-434 | 1NRL (100) | Orphan nuclear receptor, hormone receptor signalling |
| Dynein light chain (3.30.740.10) | | | | |
| 23 | Dynein light chain 1 | 1-89 | 1CMI (100) | Part of dynein motor complex |
| 24 | Dynein light chain 2 | 1-89 | 3E2B (96.6) | Part of dynein motor complex |
| RNA recognition module (3.30.70.330) | | | | |
| 25 | Splicing factor U2AF1 | 65-147 | 1JMT (99) | mRNA splicing |
| 26 | Splicing factor 45 | 306-385 | 2PEH (100) | mRNA splicing |
| Profilin (3.30.450.30) | | | | |
| 27 | Profilin 1 | 1-140 | 2PAV (100) | Regulates cytoskeleton |
| 28 | Profilin 2 | 1-140 | 2V8C (99.3) | Regulates cytoskeleton |
| Penta-EF hand (1.10.238.10) | | | | |
| 29 | Programmed cell death receptor 6 | 23-191 | 2ZNE (100) | Intracellular Ca2+ signalling |
| 30 | Calpain small regulatory subunit 1 | 1-268 | 1NX0 (97.1) | Regulates Ca2+ dependent calpain protease complex |
| (3.30.420.40+3.90.640.10) | | | | |
| 31 | Actin-gamma 1 | 1-375 | 3CHW (95.2) | Highly conserved in eukaryotes; plays a role in cytoskeleton |
| 32 | Actin-gamma 2 | 3-374 | 2V52 (98.1) | Highly conserved in eukaryotes; plays a role in cytoskeleton |
| Beta-propeller (2.130.10.10) | | | | |
| 33 | Clathrin heavy chain 1 | 1-363 | 1UTC (100) | Involved in endocytosis |
| 34 | WDR5 | 20-334 | 3EMH (100) | Involved in histone modifications |
| PH/PTB domain (2.30.29.30) | | | | |
| 35 | Dynamin 2 | 2-301 | 2AKA (87) | Microtubule-associated protein |
| 36 | Disabled homolog 2 | 45-196 | (97.5) | Involved in endocytosis |
| P-loop containing nucleotide triphosphate hydrolase (3.40.50.300) | | | | |
| 37 | RAC3 | 1-189 | 2QME (95.4) | Intracellular G-protein signalling |
| 38 | RAD51 | 97-339 | 1N0W (100) | DNA damage response |
| Trypsin-like serine protease (2.40.10.10) | | | | |
| 39 | Tissue-type plasminogen activator | 311-562 | 1RTF (100) | Extracellular protease |
| 40 | Acrosin | 43-343 | 1FIW (71.1) | Extracellular protease |
| 14-3-3 (1.20.190.20) | | | | |
| 41 | 14-3-3 eta | 2-246 | 2O02 (75.5) | Adaptor protein in signalling pathways |
| AP50 domain (2.60.40.1170) | | | | |
| 42 | AP-2 subunit mu | 122-435 | 1I31 (100) | Involved in endocytosis |
| Bcl-2 (1.10.437.10) | | | | |
| 43 | Bcl-2 like protein 1 | 1-233 | 3FDL (100) | Regulates apoptosis |
| Bro1 (1.25.40.280) | | | | |
| 44 | Alix | 3-392 | 3C3R (100) | Intracellular protein transport |
| CAP/Gly (2.30.30.190) | | | | |
| 45 | Dynactin subunit 1 | 26-97 | 2HQH (100) | Microtubule associated protein |
| Caspase-like (3.40.50.1460) | | | | |
| 46 | Caspase 2 | 155-452 | 1PYO (100) | Intracellular protease, apoptosis |
| DNAse1-like (3.60.10.10) | | | | |
| 47 | DNAse 1 | 23-282 | 2D1K (78.8) | DNAse involved in apoptosis |
| eIF4E (3.30.760.10) | | | | |
| 48 | eIF4E | 2-217 | 2V8Y (100) | Translation initiation factor |
| FERM (1.20.80.10) | | | | |
| 49 | Band 4.1-like protein 3 | 110-391 | 3BIN (100) | Negative growth regulator |
| Ig domain (2.60.40.10) | | | | |
| 50 | Immunoglobulin lambda-like polypeptide 1 | 38-213 | 1W72 (84.1) | B-cell surface receptor |
| Importin-beta (1.25.10.10) | | | | |
| 51 | Importin beta-1 | 1-876 | 1QGR (99.8) | Nuclear import |
| Mad2A (3.30.900.10) | | | | |
| 52 | MAD2-like protein 1 | 2-205 | 2QYF (99.5) | Anaphase cell cycle checkpoint |
| MHC II (3.10.320.10) | | | | |
| 53 | DRB1 beta | 30-227 | 1T5X (100) | Antigen recognition |
| PCNA (3.70.10.10) | | | | |
| 54 | Proliferating cell nuclear antigen | 1-261 | 2ZVM (100) | DNA replication |
| OB-fold domain (2.40.50.140) | | | | |
| 55 | Replication factor A 70 | 2-121 | 2B3G (100) | Formation of replication fork |
| Serpin (3.30.497.10+2.30.39.10) | | | | |
| 56 | Plasma serine protease inhibitor | 20-406 | 1LQ8 (99.7) | Inhibitor of serine protease |
| SH3-type barrels (3.40.50.300/2.30.30.40) | | | | |
| 57 | Voltage-dependent L-type calcium channel subunit beta | 65-411 | 1T3L (76.6) | Ca2+ channel, G-protein signalling |
| SWIB/MDM2 (1.10.245.10) | | | | |
| 58 | MDM4 | 26-106 | 3DAB (100) | Regulator of p53, apoptosis |
| TAP-UBA (1.10.8.10) | | | | |
| 59 | Nuclear export factor 1 | 565-619 | (100) | Nuclear export of mRNA |
| Winged helix repressor DNA binding domain (1.10.10.10) | | | | |
| 60 | Transcription factor IIF | 449-517 | 1J2X (100) | General transcription |
| TRFH (1.25.40.201) | | | | |
| 61 | Telomeric repeat-binding factor 1 | 58-268 | 3BQO (100) | Regulates telomeric length |
| Factor Xa inhibitor (4.10.410.10) | | | | |
| 62 | Amyloid-like protein 2 | 306-364 | 1CA0 (71.2) | Regulation of homeostasis |
| Ubiquitin-like (3.10.20.90) | | | | |
| 63 | Ubiquitin-60S ribosomal protein L40 | 1-76 | 2D3G (100) | Post-translation modification, regulates protein function |
| Vinculin (1.20.1490.10) | | | | |
| 64 | Vinculin | 1-259 | 1YDI (100) | Actin-filament binding protein |
| XRCC4 (2.170.210.10+1.20.5.370) | | | | |
| 65 | XRCC4 | 1-213 | 1IK9 (98.1) | Double stranded DNA break repair |
| Tyrosine phosphatase (3.90.190.10) | | | | |
| 66 | Tyrosine-protein phosphatase non-receptor type 22 | 1-310 | 3BRH (99.4) | Regulator of tyrosine kinase SRC family of kinases |

Table 3: List of the 66 domains selected for the phage display experiments. The table shows the different protein families and their structural classification code (CATH) found by our computational method. The table also shows the boundary of the domain (defined by PDB structure) and the reference PDB structure. The sequence similarity between the reference structure and domain is shown in brackets.

Apart from these well-characterized domains, I also obtained 52 domains for which this study represents the first combinatorial study to identify their binding preferences. These 52 domains are part of 39 unique protein families. If we are able to obtain peptides against these domains using phage display, we can in principle extend phage display to other members of each family. These 52 domains are involved in diverse cellular pathways including hormone receptor signalling (nuclear receptors), endocytosis (clathrin, AP2), cytoskeleton modelling (actin, vinculin, dynactin), receptor tyrosine kinase signalling (Grap2, Grb2, PLCG1, 14-3-3) and apoptosis (Mdm4, Bcl-xl), to name a few. The 66 domains identified by our computational pipeline also include some known cancer targets. These include Bcl-xl, which has been extensively targeted for its anti-apoptotic role and was discussed previously in detail [18]. We also identified eIF4E, a translation initiation factor that is responsible for binding to mRNA caps and loading them onto ribosomes. In different cancers, eIF4E is over-expressed, which leads to expression of mRNA with unstable 5' UTR [40]. The presence of previously known cancer targets in our data set gives us confidence that our computational pipeline has identified PRDs that are important in cancer.

2.4 Summary

In this chapter, I have discussed the computational pipeline we used to identify potential peptide recognition domains on shortlisted protein candidates involved in ovarian cancer. I utilized a sequence similarity-based approach to identify domains present on each protein that are similar to known PRDs in the database PepX. Using this approach, I identified 66 domains that will serve as targets for my phage display experiments. For this study, I used essential genes obtained from whole genome RNAi screens as a surrogate for genes involved in ovarian cancer. A number of these genes are involved in key regulatory pathways that are conserved between normal cells and cancer cell lines and hence

may not represent viable cancer targets. To overcome these limitations, recent studies have integrated data from other functional genomics screens such as mRNA expression data, copy-number variations and exome sequencing to accurately predict proteins important for carcinogenesis [43]. While integration of data from multiple sources may help in generating a more refined list of cancer-related proteins, such an analysis is beyond the scope of the current study. For identifying PRDs, I selected two databases, PepX and DOMINO. Both these databases provided me with a large number of potential peptide recognition domains. For further analysis, I focused on domains obtained from PepX. The domains obtained from PepX were then manually filtered to remove false positive hits. This provided me with a shortlist of 66 domains. This list included domains from distinct structural folds and biological pathways. Some of them have previously been studied in the context of cancer; in some cases these domains have themselves been established as drug targets. It is important to interpret the results obtained from the computational pipeline in the context of future experiments. Using a simple analysis, I was able to obtain a diverse set of potential PRDs that can be used as targets for phage display. The conclusions made from this study can be extended to other studies of similar origin.


3 Identification of peptide binders using phage display


3.1 Introduction

After selecting the potential targets for phage display, my next aim was to generate peptides against each of these domain targets. To do this, I used phage display technology to screen large peptide libraries. As described previously, phage display is a directed evolution approach in which peptides can be displayed on the surface of the filamentous bacteriophage M13 using specialized vectors known as phagemids. Site-directed mutagenesis can then be used to generate large peptide libraries where each phage member displays a unique peptide on its surface. Phage display has been used extensively to generate high affinity and specificity peptides against different protein targets. In the Sidhu lab, peptide phage display has been used to identify binding preferences of a large number of human and yeast SH3 and PDZ domains [28, 29]. In the case of the Dvl2-PDZ domain, phage-derived peptides were used as intracellular inhibitors of Fzd7-Dvl2 interactions, thereby knocking down Wnt signalling, an important signalling pathway [36]. In this section I will describe the results obtained from the phage display screens.

3.1.1 Displaying peptides on phage particles: In the Sidhu lab, we use M13, a single-stranded DNA virus from the Inoviridae family, for the display of peptides. M13 viruses infect gram-negative bacteria such as E. coli. There are several advantages to using M13 for phage display experiments. First, it follows a non-lytic life cycle, which makes it easier to grow and propagate in the lab. Second, its DNA is present in single-stranded form, which makes it possible to genetically display proteins on the surface of M13 using site-directed mutagenesis. The coat of M13 is made up of five proteins as shown in Figure 8. Two of these proteins, p8 and p3, have been used previously for displaying proteins. The p3 protein is present in 5 copies on the phage and is required for infection. Various proteins such as antibodies and fibronectin have been successfully displayed on the surface using p3 fusions without affecting infection of bacteria. The other protein that is regularly used for phage display is p8, the major coat protein, which is present all over the surface. Small peptides (<10 amino acids in length) can be fused to p8 without affecting the assembly and secretion of the phage particle. In this study, we use the p8 coat protein as it allows multiple copies of a peptide to be present on the phage particle, which permits the selection of lower-affinity peptides [44]. To display peptides on the M13 phage surface, specialized vectors called phagemids are required. A phagemid contains a single copy of the p8 phage protein under the influence of an


IPTG-inducible PTac promoter, an antibiotic-resistance cassette and a single and a double stranded origin of replication. The peptide is fused to the N-terminal end of the p8 coat protein such that it is expressed along with p8 inside the bacterial host [27]. Once the phagemid is introduced into the cell, it is replicated into multiple ssDNA copies inside the bacterial host. The infected cells can be selected using the resistance marker. To initiate formation of new virus particles, the cells are super-infected with modified M13 phage that acts as a “helper”. Helper phage leads to production of single-stranded phagemid DNA that can be effectively packaged into virion particles. The packaging unit also introduces the mutant coat protein produced by the phagemid. The abundance of mutant coat proteins is dictated by the IPTG concentration in the culture media and may help in optimizing the number of peptides displayed by the bacteriophage [27].

Figure 8: Schematic diagram of M13 bacteriophage. M13 filamentous phage is made up of 5 proteins. P8, the major coat protein is the most abundant coat protein that forms the cylinder around the phage ssDNA. The distal end of M13 assembles first and contains approximately three to four copies of p7 and p9. The proximal end is formed by five copies each of p6 and p3. The p3 coat protein is required for infection of the bacterial host. P8 and p3 coat proteins are used extensively for phage display.

3.1.2 Site-directed mutagenesis and phage library design: Once a protein is successfully displayed on the M13 phage coat, mutations can be introduced into its encoding DNA in order to generate vast numbers of variants. The ease of manipulating M13 ssDNA makes this phage an ideal system for the synthetic construction of libraries of up to 10^11 unique clones. Changes to the phagemid DNA are performed in a series of reactions known as Kunkel mutagenesis. In brief, E. coli cells deficient in dUTPase (dut) and uracil-DNA glycosylase (ung) are used to synthesise a uracil-rich version of the ssDNA phagemid (dU-ssDNA) that serves as the template for the mutagenesis reaction. Synthetic oligonucleotides that introduce mutations to the region of interest anneal to the dU-ssDNA template and serve as

primers for synthesis of the complementary strand. This reaction is completed in the absence of uridine to form covalently-closed circular double-stranded DNA (CCC-dsDNA) with an original uracil-rich DNA strand and a mutagenic DNA strand (Figure 9). Transformation of the CCC-dsDNA into a dut+/ung+ bacterial host results in the degradation of the uracil-rich strand and retention of the mutagenic strand. The CCC-dsDNA is then electroporated into a bacterial host infected with M13 helper phage to synthesize the phage library. Kunkel site-directed mutagenesis is ideal for phage display applications because it allows for complete control over library construction, starting from the design of the mutagenic oligonucleotides themselves to the annealing and synthesis conditions [44].

Figure 9: Oligonucleotide-directed mutagenesis with an ssDNA template. (A) A synthetic oligonucleotide (red arrow) is annealed to the template (dU-ssDNA). The oligonucleotide contains a region with the desired mutations flanked by perfectly complementary sequences. (B) Covalently-closed circular dsDNA (CCC-dsDNA) is enzymatically synthesized by T7 DNA polymerase and T4 DNA ligase. (C) CCC-dsDNA is introduced into an E. coli host using electroporation.

Different peptide libraries can be generated using different sets of mutagenic oligonucleotides. The library used for this project was obtained from Dr. Gang Chen, a post-doctoral fellow in the Sidhu lab. The length of the peptides is 16 amino acids and each position can accommodate any of 19 amino acids (all except cysteine). Cysteine is excluded because it may lead to cyclization and disruption of the linear structure of peptides. The oligonucleotides used in designing the library were obtained from TriLink Biotechnologies. These oligonucleotides were synthesized three nucleotides at a time instead of single nucleotide as used

by other vendors. This allows one codon per amino acid removing codon bias that is generally observed in oligonucleotides that use the NNK codons for randomization (where 32 codons code for 20 amino acids).
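The codon bias of NNK randomization, and the size of the sequence space for this library design, can be checked with a short sketch. It enumerates the 32 NNK codons (N = A, C, G or T; K = G or T), translates them with the standard genetic code, and counts how many codons encode each residue. The variable names are illustrative, and the sketch is not part of the library construction protocol.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + k for a in "ACGT" for b in "ACGT" for k in "GT"]
nnk_counts = Counter(CODON_TABLE[codon] for codon in nnk_codons)

# Theoretical number of 16-mer peptides with 19 residues allowed per position.
theoretical_diversity = 19 ** 16

for residue, count in sorted(nnk_counts.items()):
    print(residue, count)
```

Under NNK, leucine, serine and arginine are each encoded by three codons while most residues have one, and one of the 32 codons is a stop codon; trimer-based synthesis avoids both effects. The theoretical diversity of 19^16 (roughly 2.9 × 10^20) also far exceeds the 10^11 clones achievable in practice, so any physical library samples only a small fraction of the possible sequences.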

3.1.3 Selection strategy: The peptide library can be used to screen for peptide binders against a target protein. After incubation of the library with the immobilized target, non-specific phage particles are removed through a series of washes. The remaining bound clones are eluted and amplified in a bacterial host, allowing for further rounds of screening to enrich for clones displaying peptides with the desired binding properties. Figure 10 shows the entire phage display selection pipeline used in this study. The success of the selection depends on both the quality of the phage display library and the quality of the protein targets [44].

Figure 10: Phage display selection for peptide recognition domains. The peptide library was incubated with the immobilized target. Non-binders were washed away while binders remained attached to the plate. The bound phage were eluted and amplified in a bacterial host. The process was repeated five times to obtain an enriched set of binders. The phage pools from Round 5 were introduced into a bacterial host and plated on LB plates. 96 colonies were picked for each domain and grown overnight to obtain phage clones. Each of the 96 clones was tested for binding in phage ELISA. The clones that showed a high enrichment ratio were sequenced. The DNA sequences obtained were processed and translated to obtain the peptide sequences. The peptides were aligned manually or using multiple sequence alignment tools to obtain peptide logos.


From the library standpoint, quality can be affected by library construction or display levels. Inefficient completion of the site-directed mutagenesis reaction may result in a large proportion of phage particles that display the wild-type p8 coat protein. Further, if the number of peptide copies on each phage particle is low, binding may be weak. In both cases, such libraries would offer a reduced chance of identifying peptides against their targets. The diversity of the peptide library obtained from Dr. Gang Chen had previously been tested using phage titrations. The IPTG concentration required for adequate display was also known and well-documented. The library, however, was amplified for use in this study, so phage titrations were repeated to estimate the diversity and size of the peptide library before it was used for further experiments.

From the immobilized target side, the quality and stability of the target are both important factors in the success of a selection. For example, the presence of contaminants in impure protein samples or denaturation of the samples themselves can result in the enrichment of unwanted phage clones. Furthermore, the use of unstable constructs may result in a heterogeneous and inconsistent interface that differs between rounds and is not amenable to enrichment of binders against the intended target conformation. Consequently, SDS-PAGE and spectrophotometry were used to test the purity and quantity of the protein target. In light of these considerations, it is important not only to monitor the behaviour of the phage population throughout the selection, but also to optimize the selection conditions and reagents where required for a successful outcome.

One important consideration in designing the selection strategy for this study was the presence of a GST tag on each of the domains. To remove any peptides that bind to the GST tag, the peptide library was pre-incubated in a well containing only GST. Selections were also done in the presence of high GST concentrations to further remove any weak GST binders.

3.1.4 Selection of tight-binding peptides and identification of binding specificities: The progress of the selection screen is monitored through Enzyme-Linked Immunosorbent Assays (ELISA) and phage titrations. In a phage titration, the phage obtained at the end of each round of selection are used to infect an exponentially growing bacterial culture. Upon infection, the bacterial culture is serially diluted and plated on a plate containing the selectable marker, which selects for the cells that were successfully infected by the virus. The number of viruses (colony forming units or cfu) obtained after each round of selection is calculated by counting the colonies obtained in the serial dilutions. The enrichment ratio is defined as the ratio of the number of colony forming units (cfu/ml) obtained from the target well to that obtained from the negative control well (BSA). In a successful phage display experiment, the enrichment ratio increases after each round of selection.

In ELISA, the phage population obtained at each round is incubated with the immobilized target and with control proteins (GST and BSA) in parallel. Unbound phage particles are removed from the wells through a series of washes and the remaining phage are then probed with anti-M13 antibodies conjugated to horseradish peroxidase. Addition of the substrate results in the synthesis of a blue pigment, and the reaction is stopped with phosphoric acid to allow for a spectrophotometric reading at 450 nm. The enrichment ratio is determined by comparing the signal intensity of the target well to that of the negative control well (BSA). As with phage titrations, in a successful phage display experiment the enrichment ratio should increase after each round of selection. ELISA can also be used to determine the strength of binding of individual phage clones obtained after all the rounds of selection are done. Depending on the stringency of the experiment, tight binders can be defined as clones with a target to control ratio of five or greater.
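The titration-based enrichment ratio defined above amounts to a short calculation: back-calculate cfu/ml from a countable dilution, then divide the target-well titer by the control-well titer. The sketch below is illustrative only; the function names, the spotted volume and the colony counts are assumptions, not values from this study.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Back-calculate the titer (cfu/ml) of the undiluted sample.

    colonies: colonies counted on one spot of the dilution series
    dilution_factor: total fold dilution of that spot (e.g. 1e4)
    plated_volume_ml: volume spotted on the plate, in ml (assumed here)
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def enrichment_ratio(target_cfu_per_ml, control_cfu_per_ml):
    """Ratio of phage recovered from the target well to the control well."""
    if control_cfu_per_ml <= 0:
        raise ValueError("control titer must be positive to form a ratio")
    return target_cfu_per_ml / control_cfu_per_ml


# Hypothetical counts for one round of selection (not data from this study).
target = cfu_per_ml(colonies=42, dilution_factor=1e4, plated_volume_ml=0.01)
control = cfu_per_ml(colonies=7, dilution_factor=1e2, plated_volume_ml=0.01)
print(f"target: {target:.2e} cfu/ml, control: {control:.2e} cfu/ml")
print(f"enrichment ratio: {enrichment_ratio(target, control):.0f}")
```

Tracking this ratio round by round gives the trend described in the text: a rising ratio indicates that target-binding clones are being enriched over background.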

3.2 Methods

3.2.1 Strains: E. coli strain XL1 Blue was used for expression of GST-fusion proteins. Peptide phage display libraries were re-amplified in the T1-resistant E. coli strain SR320, which was generated by mating strain XL1 Blue to strain MC1061. All phage amplifications during selection experiments were done in XL1 Blue.

3.2.2 Protein expression and purification: The DNA encoding the 66 shortlisted domains was chemically synthesized (Genscript) and cloned into an IPTG-inducible expression vector (pGEX) with a Ptac promoter and N-terminal 6xHis and Glutathione-S-transferase (GST) tags, available in the Sidhu lab (pHH0103, Appendix C). The protein sequences for each of the 66 domains are attached in Appendix B. The plasmids containing these domains were transformed into chemically competent XL1 Blue. Single colonies were propagated in 2YT + 100 ug/ml carbenicillin and stored as glycerol stocks (10% glycerol v/v) at -80 °C.

For protein expression, 5 ml starter cultures were inoculated from glycerol stocks and grown overnight at 37 °C, 200 rpm. The following day, 2-L baffled flasks containing 500 ml of 2YT + 100 ug/ml carbenicillin were inoculated with the starter culture. The cells were grown to logarithmic phase (OD600 = 0.6) at 37 °C, 200 rpm and induced with 0.4 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for protein expression. The cells were then grown for 16 hrs at 16 °C, 200 rpm, harvested by centrifugation (17,600 x g) at 4 °C for 20 min and frozen at -20 °C.

Frozen cell pellets were re-suspended in 12.5 ml of 1x Phosphate Buffered Saline (PBS) with 1 mM EDTA, 1 mM DTT, 0.5% Triton X-100 (v/v) and protease inhibitors (1 tablet per 50 ml of buffer, Roche). Sonication (three 2-min cycles of 5 sec "ON", 5 sec "OFF", amplitude 25%) was used for cell lysis. The cell debris was removed by centrifugation (26,700 x g) at 4 °C for 20 min. The cell lysate obtained was then incubated with equilibrated glutathione-sepharose 4B resin (GE Healthcare) at 4 °C for 2 hrs. The resin and cell lysate mixture was then applied to a gravity flow column. The column was washed with buffers (first with 3 ml PBS, second with 3 ml PBS + 150 mM NaCl and finally with 3 ml PBS). The column was then blocked and the resin was incubated with 1 ml elution buffer (100 mM glutathione in 50 mM Tris-Cl, pH 8, 1 mM PMSF, 1 mM EDTA) for 20 minutes. The eluate was collected and kept for further analysis. SDS-PAGE and spectrophotometry were used to validate the purity and size of the protein and to estimate its concentration. The proteins obtained were immediately aliquoted into smaller volumes, frozen in liquid nitrogen and stored at -80 °C. For troubleshooting the protein purification pipeline, samples were collected after overnight incubation, upon lysis and from the column flow-through, and tested using SDS-PAGE.

3.2.3 Library construction and design: The peptide library used for the selections was 16 amino acids in length, and each of the 16 positions can harbour any of 19 amino acids (Cys is not included). The library has a theoretical diversity of 2.88 × 10^20 (19^16). The primary library had a diversity of 4 × 10^10 and a titer of 10^12 cfu/ml. The phagemid used to design the library is listed in Appendix C (pR4STOP).
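The theoretical diversity quoted above is simply the number of allowed residues raised to the peptide length, and comparing it with the reported library size shows how sparsely sequence space is sampled. A minimal sketch of the arithmetic follows; the function name is hypothetical.

```python
def theoretical_diversity(n_residue_types, peptide_length):
    """Number of distinct peptides for a fully randomized library."""
    return n_residue_types ** peptide_length


# 19 allowed amino acids (no cysteine) at each of 16 positions.
theoretical = theoretical_diversity(19, 16)

# Diversity of the primary library as reported in the text.
library_diversity = 4e10

# Fraction of the theoretical sequence space covered by the physical library.
fraction_sampled = library_diversity / theoretical

print(f"theoretical diversity: {theoretical:.3e}")
print(f"fraction of sequence space sampled: {fraction_sampled:.2e}")
```

The physical library therefore covers only a tiny fraction of all possible 16-mers, which is why binding preferences are read from enriched motifs rather than from exhaustive sampling.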

The library was re-amplified by infecting actively growing SR320 (OD600 = 0.8) containing 5 × 10^12 cells in a 250 ml culture flask. The library was added to the culture such that the ratio of phage to cells was 1:1. The culture was incubated for 30 minutes at 37 °C, 200 rpm. Helper phage (M13KO7) was then added to the culture (at a helper phage to bacteria ratio of 10:1) to initiate the packaging of viral particles. After an hour of incubation at 37 °C, 200 rpm, the culture was added to 5 L of 2YT media and grown for 19 hrs at 37 °C, 200 rpm. The cells were harvested by centrifugation (17,600 x g) at 4 °C for 20 min. The supernatant containing the bacteriophage particles was incubated with 20% v/v PEG/NaCl (20% PEG-8000 (w/v), 2.5 M NaCl) at 4 °C for 20 min. The supernatant was then centrifuged (26,700 x g) at 4 °C for 20 min to obtain the white phage pellet. The remaining supernatant was removed by pipetting. The phage pellets were re-centrifuged (26,700 x g) at 4 °C for 2 min to concentrate the pellet and then re-suspended in 20 ml PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The final library was stored at -80 °C with 10% glycerol (v/v). Phage titrations were performed to estimate the purity and titer of the phage library.

3.2.4 Phage display selections: Phage display was done using the previously established protocol described by Tonikian et al [45].

First round: The target proteins were immobilized on a microtiter plate (NUNC Maxisorp 96-well plate) by incubating the proteins overnight at 4 °C. For each protein, five wells were used, with three wells for the protein and two for the negative control (PBS). Each well was incubated with 100 ul of 10 ug/ml purified protein. The overnight coated wells were blocked with 200 ul of PBT buffer (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The blocked wells were washed three times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). The phage library was re-suspended to a final concentration of 5 × 10^12 phages/ml in PBT buffer, added to each well and incubated for 2 hrs at room temperature. The unbound phages were removed and the wells were washed eight times with PT buffer. Bound phages were then eluted by incubating with 0.1 N HCl for 5 minutes at room temperature. The eluted phages were neutralized using Tris-Cl, pH 11, and incubated in 10 volumes of actively growing XL1 Blue cells (OD600 = 0.6) at 37 °C, 200 rpm for 30 minutes. Helper phage was then added to a final concentration of 10^10 phages per ml to initiate the formation of viral particles. The cells were grown at 37 °C, 200 rpm for 60 minutes. Kanamycin at 50 ug/ml was used to select for cells that had been super-infected with helper phage and the culture was grown overnight at 37 °C, 200 rpm.

Rounds 2, 3, 4 and 5: The target protein was immobilized on a microtiter plate (NUNC Maxisorp 96-well plate) by incubating the protein overnight at 4 °C. For each protein, five wells were used, with three wells for the protein and two for the negative control (BSA). Each well was incubated with 100 ul of 10 ug/ml purified protein. The overnight coated wells were blocked with 200 ul of PBT and washed three times with PT buffer. Phages from the previous round of selection were collected from the overnight cultures. The cells were removed by centrifugation (26,700 x g) at 4 °C for 20 min. The virus particles present in the supernatant were incubated with 20% v/v PEG/NaCl (20% PEG-8000 (w/v), 2.5 M NaCl) at 4 °C for 20 min. The supernatant was then centrifuged (26,700 x g) at 4 °C for 20 min to obtain the white phage pellet. The remaining supernatant was removed by pipetting. The phage pellets were re-centrifuged (26,700 x g) at 4 °C for 2 min to concentrate the pellet and then re-suspended in 1 ml PBT. 100 ul of phages were added to the respective wells and incubated for 2 hrs at room temperature. The unbound phages were removed and the wells were washed eight times with PT buffer. Bound phages were then eluted by incubating with 0.1 N HCl for 5 minutes at room temperature. The eluted phages were neutralized using Tris-Cl, pH 11, and incubated in 10 volumes of actively growing XL1 Blue cells (OD600 = 0.6) at 37 °C, 200 rpm for 30 minutes. Helper phage was then added to a final concentration of 10^10 phages per ml to initiate the formation of viral particles. The cells were grown at 37 °C, 200 rpm for 60 minutes. Kanamycin at 50 ug/ml was used to select for cells that had been super-infected with helper phage and the culture was grown overnight at 37 °C, 200 rpm.

Removal of GST binders: In rounds 1 and 2, pre-selection was done for 60 minutes at room temperature on wells coated with 10 ug/ml GST to remove phage clones that bind to the GST tag (present in each purified protein). In rounds 3, 4 and 5, a 10-20 fold excess of GST was added to each well coated with the target domain during incubation of the library, to further remove clones that preferentially bind to the GST tag.

3.2.5 Calculation of enrichment ratio: Phage titrations (to determine the number of phages obtained from the protein well and the control well and to calculate the enrichment ratio) were performed at each round of selection. Briefly, 50 ul of the phage obtained from each day of selections was added to 450 ul of XL1 Blue (OD600 = 0.6) and incubated for 30 minutes at 37 °C, 200 rpm. The cells were then serially diluted (10-fold dilution series) in 2YT. The various dilutions were spotted on an LB agar plate with 100 ug/ml carbenicillin. The plates were incubated overnight at 37 °C. The same procedure was performed for the phages obtained from the control well (GST/BSA). The next day, colonies were counted on the protein and the control plates to calculate the number of phage present after a round of selection. The enrichment ratio was calculated as the ratio of colonies from the protein well to colonies from the control well, and was calculated for each round of selection.

3.2.6 Clonal ELISA and sequencing of peptides: 50 ul of the eluted phages from Rounds 3, 4 and 5 were added to 450 ul of XL1 Blue (OD600 = 0.6) and incubated for 30 minutes at 37 °C, 200 rpm. The cells were then serially diluted (10-fold dilution series) in 2YT. The various dilutions were spotted on an LB agar plate with 100 ug/ml carbenicillin. The plates were incubated overnight at 37 °C. For each protein, 96 colonies were picked and grown overnight at 37 °C, 200 rpm in a 96-well block, each in 450 ul of 2YT containing 100 ug/ml carbenicillin and 10^10 phages/ml M13KO7. The overnight cultures were centrifuged at 3400 x g for 15 minutes the next day.

Phage clones were tested for binding to protein, GST and BSA in an ELISA assay. For each protein, 96 clones were tested in a single microtiter plate (384-well Maxisorp plate, Nunc). The 384-well plate was divided into 96 sections with four wells each. In each section, two wells were coated overnight at 4 °C with 30 ul of 10 ug/ml protein, one well was coated with 10 ug/ml GST and one well was left empty. The plate was then blocked with 50 ul of PBT buffer (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)) for 2 hrs at room temperature. 30 ul of phage supernatant was added to all four wells in each section and incubated for 60 minutes at room temperature. Wells were washed four times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Anti-M13 HRP-conjugated antibody was diluted 1:5000 in PBT buffer and 30 ul was added to each well. The antibody was incubated for 45 minutes at room temperature and then discarded. The wells were washed eight times with PT buffer. Colorimetric HRP substrate reagents (TMB substrate, Pierce) were mixed in equal volumes, and 25 ul was added to each well and incubated at room temperature for 5-10 minutes with gentle shaking. The reaction was stopped by adding 30 ul of 1 M H3PO4. Absorbance at 450 nm was measured for each well using an ELISA plate reader.

The enrichment ratio was calculated by comparing the signal intensity in the protein wells with that in the GST and BSA wells. Clones with an enrichment ratio greater than five and a GST background signal of 0.1 or less were selected as true binders. For these binders, the DNA encoding the peptide displayed by the phage clone was amplified using PCR and identified by DNA sequencing. The DNA sequences obtained were processed and translated to obtain the peptide sequences. The peptides against each domain were aligned manually or using multiple sequence alignment tools (Geneious) to obtain peptide logos. To date, I have obtained phage clones that bind specifically to 27 of the 44 purified domains (61% of the purified domains, 41% of all 66 domains).
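The hit-calling rule described above (enrichment ratio greater than five and GST background of 0.1 or less) can be expressed as a small filter over the four wells read for each clone. This is a hedged sketch: the class and function names are hypothetical, the readings are invented, and taking the larger of the two control wells as the denominator is an assumption, since the text does not specify how the two controls are combined.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CloneReading:
    clone_id: str
    target_a450: tuple  # absorbance of the two target-coated wells
    gst_a450: float     # absorbance of the GST-coated well
    blank_a450: float   # absorbance of the uncoated (blocked) well


def enrichment(reading):
    """Mean target signal divided by the larger control signal (assumption)."""
    control = max(reading.gst_a450, reading.blank_a450)
    if control <= 0:
        raise ValueError("control absorbance must be positive")
    return mean(reading.target_a450) / control


def is_binder(reading, min_ratio=5.0, max_gst=0.1):
    """Apply both criteria from the text: ratio > 5 and GST signal <= 0.1."""
    return enrichment(reading) > min_ratio and reading.gst_a450 <= max_gst


# Hypothetical plate readings, for illustration only.
readings = [
    CloneReading("A1", (1.20, 1.10), 0.08, 0.06),  # strong, specific
    CloneReading("B7", (0.90, 1.00), 0.45, 0.05),  # binds GST as well
    CloneReading("C3", (0.62, 0.58), 0.09, 0.07),  # weaker but specific
    CloneReading("D4", (1.50, 1.40), 0.25, 0.05),  # ratio ok, GST too high
]
hits = [r.clone_id for r in readings if is_binder(r)]
print(hits)
```

Clones passing both criteria would then go forward to PCR amplification and sequencing.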

3.2.7 Structural modeling of phage-display obtained results: All structural models were obtained with Modeller. Modeller was installed on a Linux machine and run from the command line. Discovery Studio Visualizer was used to analyze the results obtained from Modeller. Energy minimization of Modeller structures was performed with the Molecular Dynamics (MD) plug-in available in the licensed version of Discovery Studio.

3.3 Results

3.3.1 Selection of peptide binders using phage display: Each of the 66 domains was purified using the GST purification protocol described in the methods section. Purified proteins were obtained for 44 out of 66 domains (67%). Table 5 contains all the results obtained from protein purification. For most domains, I was able to obtain sufficient protein for performing phage display. SDS-PAGE gels were run to check the identity and purity of the proteins, and also to diagnose the entire expression and purification process. The 22 domains that could not be purified showed high expression of protein. However, in all such cases the expressed protein was insoluble and went into the cell debris upon lysis. Further optimization may be required to obtain these proteins in soluble form. In this study we continued with the 44 purified domains and used them as targets for phage display.


The original 16-aa peptide library had a diversity of 4 × 10^10 unique peptides and a phage titer of 5 × 10^12 cfu/ml. Upon re-amplification, a phage titer of 2.5 × 10^11 cfu/ml was obtained post-infection, providing a 10-fold coverage of the original library diversity. Upon amplification, a library titer of 2 × 10^13 cfu/ml was obtained.

The selections were performed using a protocol modified from the one previously described by Tonikian et al [45]. A subset of phage clones present in the library may bind more tightly to the GST tag than to the target domain, which may lead to spurious or false positive results. To remove such phage clones, pre-selection was done on GST-coated wells. Further negative selection was performed by adding 10-fold excess GST to the target domain-coated well during selection. For most proteins, strong enrichment ratios were obtained after negative selection. This suggests that the negative selections were effective in removing strong GST-binding phage clones from our library. Table 5 shows the enrichment ratio obtained after each round for all 38 targets against which phage selections were done. For 27 targets, I obtained enrichment in selections.

3.3.2 Validation of tight and specific binders using clonal ELISA: For each protein with a significant pool ELISA signal, I picked 96 clones for clonal ELISA. The ELISAs were done in a 384-well plate. For each phage clone, four wells were used: two wells were coated with the target domain while the other two wells were used for the negative controls, GST and BSA. The clones that gave an enrichment ratio greater than five were selected. The DNA sequence encoding the peptide for each of the selected clones was amplified by PCR and sent for sequencing.

3.3.3 Identification of binding preferences and literature validation: Peptide binders were obtained for 27 domains. The Geneious toolkit was used to align all peptide sequences obtained for each of the selected domains. No gaps were allowed in the alignment. Alignments obtained from Geneious were improved manually. For 22 of these 27 domains, sufficient numbers of peptides were available to generate a position weight matrix (PWM) that represents the binding preferences of the domain. The 27 domains for which phage display peptides were obtained belong to 20 different domain families and exhibit distinct binding preferences. The divergence in peptide binding preferences highlights the power of phage display in generating specific peptide binders.
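A position weight matrix of the kind referred to above can be built from a gapless alignment by counting residue frequencies column by column. The sketch below illustrates the idea only; it is not the Geneious or WebLogo procedure used in this study, the function names are hypothetical, and the example peptides are invented rather than taken from Table 4.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(peptides, pseudocount=0.0):
    """Per-column residue frequencies for equal-length, gapless peptides."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("peptides must be non-empty and of equal length")
    n = len(peptides)
    denom = n + pseudocount * len(ALPHABET)
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)
        matrix.append(
            {aa: (counts[aa] + pseudocount) / denom for aa in ALPHABET}
        )
    return matrix


def consensus(matrix):
    """Most frequent residue at each position (ties broken alphabetically)."""
    return "".join(max(ALPHABET, key=lambda aa: col[aa]) for col in matrix)


# Invented, pre-aligned peptides (illustration only).
aligned = ["APPLPPRN", "RPPLPSKP", "TPPLPPKS"]
pwm = position_weight_matrix(aligned)
print(consensus(pwm))
print({aa: round(f, 2) for aa, f in pwm[5].items() if f > 0})
```

Columns dominated by a single residue correspond to the tall letters in a sequence logo, while columns with many residues at similar frequency carry little information about binding preference.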

| Name | Protein name | Protein expression | Protein yield (mg/ml) | Pool ELISA | Clonal ELISA | Sequence logo | Comment |
|---|---|---|---|---|---|---|---|
| PDZ (2.30.42.10) | | | | | | | |
| 1 | Disc-large homolog 1 PDZ 1 | Yes | 2.82 | 1.5 | - | - | Non-specific binders |
| 2 | Disc-large homolog 1 PDZ 2 | Yes | 4.2 | 5 | - | - | Non-specific binders |
| 3 | Disc-large homolog 2 PDZ 2 | Yes | 3.74 | 23 | 19(18) | (logo) | - |
| 4 | Disc-large homolog 2 PDZ 3 | Yes | - | - | - | - | No protein in lysate |
| 5 | Disc-large homolog 4 PDZ 2 | Yes | 1.9 | 12 | 19(5) | (logo) | - |
| 6 | Disc-large homolog 4 PDZ 3 | Yes | - | - | - | - | No protein in lysate |
| SH3 (2.30.30.40) | | | | | | | |
| 7 | Growth-factor receptor bound protein 2 | Yes | 0.9 | 56 | 82(30) | (logo) | - |
| 8 | Grb2-related protein 2 | Yes | 1.44 | 444 | 82(77) | (logo) | - |
| 9 | Phospholipase gamma | Yes | 1.21 | 205 | 17(14) | (logo) | - |
| 10 | Sorbin and SH3 containing protein 2 | Yes | 1.32 | 277 | 82(54) | (logo) | - |
| Protein kinase (3.30.200.20+1.10.510.10) | | | | | | | |
| 11 | Mitogen-activated kinase 3 | Yes | - | - | - | - | No protein in lysate |
| 12 | PKA | Yes | - | - | - | - | No protein in lysate |
| 13 | PKB | Yes | - | - | - | - | No protein in lysate |
| 14 | Serine/threonine kinase 2 | Yes | - | - | - | - | No protein in lysate |
| 15 | Aurora kinase B | Yes | 0.30 | 3 | - | - | Non-specific binders |
| G-alpha subunit (3.40.50.300+1.10.400.10) | | | | | | | |
| 16 | G-alpha (i) 1 | Yes | 1.33 | 14.3 | 43(12) | (logo) | |
| 17 | G-alpha (i) 3 | Yes | 0.85 | 18.7 | - | - | Non-specific binders |
| 18 | G-alpha (o) 1 | Yes | 1.12 | 25 | - | - | Non-specific binders |
| Ligand binding domain of nuclear receptor (1.10.565.10) | | | | | | | |
| 19 | Bile acid Receptor | Yes | 0.60 | 13.3 | 22(12) | (logo) | - |
| 20 | Retinoic acid receptor-gamma | Yes | - | - | - | - | No protein in lysate |
| 21 | Glucocorticoid receptor | Yes | - | - | - | - | No protein in lysate |
| 22 | Pregnane X Receptor | Yes | - | - | - | - | No protein in lysate |
| Dynein light chain (3.30.740.10) | | | | | | | |
| 23 | Dynein light chain 1 | Yes | 1.18 | 233 | 69(51) | (logo) | - |
| 24 | Dynein light chain 2 | Yes | 1.21 | 63 | 25(25) | (logo) | - |
| RNA recognition module (3.30.70.330) | | | | | | | |
| 25 | Splicing factor U2AF1 | Yes | 1.28 | 0.5 | - | - | No enrichment |
| 26 | Splicing factor 45 | Yes | - | - | - | - | No protein in lysate |
| Profilin (3.30.450.30) | | | | | | | |
| 27 | Profilin 1 | Yes | 4.51 | 15 | - | - | Non-specific binders |
| 28 | Profilin 2 | Yes | 1.13 | 12 | - | - | Non-specific binders |
| Penta-EF hand (1.10.238.10) | | | | | | | |
| 29 | Programmed cell death receptor 6 | Yes | 0.55 | 100 | 55(55) | (logo) | - |
| 30 | Calpain small regulatory subunit 1 | Yes | 1.91 | 63 | 85(12) | (logo) | - |
| Actin (3.30.420.40+3.90.640.10) | | | | | | | |
| 31 | Actin-gamma 1 | Yes | - | - | - | - | No protein in lysate |
| 32 | Actin-gamma 2 | Yes | - | - | - | - | No protein in lysate |
| Beta-propeller (2.130.10.10) | | | | | | | |
| 33 | Clathrin heavy chain 1 | Yes | 1.63 | 250 | 47(22) | (logo) | - |
| 34 | WDR5 | Yes | 1.24 | 80 | | (logo) | - |
| PH/PTB domain (2.30.29.30) | | | | | | | |
| 35 | Dynamin 2 | Yes | 0.44 | - | - | - | Phage display not done |
| 36 | Disabled homolog 2 | Yes | - | - | - | - | No protein in lysate |
| P-loop containing nucleotide triphosphate hydrolase (3.40.50.300) | | | | | | | |
| 37 | RAC3 | Yes | 0.84 | 5 | - | - | Non-specific binders |
| 38 | RAD51 | Yes | 0.20 | 15 | - | - | Non-specific binders |
| Trypsin-like serine protease (2.40.10.10) | | | | | | | |
| 39 | Tissue-type plasminogen activator | Yes | 0.55 | - | - | - | Phage display not done |
| 40 | Acrosin | Yes | 0.61 | - | - | - | Phage display not done |
| 14-3-3 (1.20.190.20) | | | | | | | |
| 41 | 14-3-3 eta | Yes | 1.80 | 33.3 | 56(9) | (logo) | - |
| AP50 domain (2.60.40.1170) | | | | | | | |
| 42 | AP-2 subunit mu | Yes | - | - | - | - | No protein in lysate |
| Bcl-2 (1.10.437.10) | | | | | | | |
| 43 | Bcl-2 like protein 1 | Yes | - | - | - | - | No protein in lysate |
| Bro1 (1.25.40.280) | | | | | | | |
| 44 | Alix | Yes | 0.80 | 22.2 | 15(7) | (logo) | - |
| CAP/Gly (2.30.30.190) | | | | | | | |
| 45 | Dynactin subunit 1 | Yes | 2.22 | 18 | 2(1) | GQDEWVPWQLWSWQESI | No sequence logo |
| Caspase-like (3.40.50.1460) | | | | | | | |
| 46 | Caspase 2 | Yes | - | - | - | - | No protein in lysate |
| DNAse1-like (3.60.10.10) | | | | | | | |
| 47 | DNAse 1 | Yes | - | - | - | - | No protein in lysate |
| eIF4E (3.30.760.10) | | | | | | | |
| 48 | eIF4E | Yes | 0.44 | 30 | 6(6) | FLYYYGLSHNWFGDQT, LVPWWWRVEQTMDPVI, SVWWFGQTPYVLWEAS, RVMIWWWLTQGIPFSF, NLYYNNMYWQWYEWLN, PWSWFTYREQLETENV | No sequence logo |
| FERM (1.20.80.10) | | | | | | | |
| 49 | Band 4.1-like protein 3 | Yes | - | - | - | - | No protein in lysate |
| Ig domain (2.60.40.10) | | | | | | | |
| 50 | Immunoglobulin lambda-like polypeptide 1 | Yes | - | - | - | - | No protein in lysate |
| Importin-beta (1.25.10.10) | | | | | | | |
| 51 | Importin beta-1 | Yes | 0.36 | 33 | 10(10) | (logo) | - |
| Mad2A (3.30.900.10) | | | | | | | |
| 52 | MAD2-like protein 1 | Yes | 0.49 | 75 | 16(13) | (logo) | - |
| MHC II (3.10.320.10) | | | | | | | |
| 53 | DRB1 beta | Yes | - | - | - | - | No protein in lysate |
| PCNA (3.70.10.10) | | | | | | | |
| 54 | Proliferating cell nuclear antigen | Yes | 0.34 | 40 | 2(1) | GARQTLITDWLMVSSD | No sequence logo |
| OB-fold domain (2.40.50.140) | | | | | | | |
| 55 | Replication factor A 70 | Yes | 4.41 | 94 | 66(11) | (logo) | - |
| Serpin (3.30.497.10+2.30.39.10) | | | | | | | |
| 56 | Plasma serine protease inhibitor | Yes | - | - | - | - | No protein in lysate |
| SH3-type barrels (3.40.50.300/2.30.30.40) | | | | | | | |
| 57 | Voltage-dependent L-type calcium channel subunit beta | Yes | - | - | - | - | No protein in lysate |
| SWIB/MDM2 (1.10.245.10) | | | | | | | |
| 58 | MDM4 | Yes | 3.40 | 150 | 26(24) | (logo) | - |
| TAP-UBA (1.10.8.10) | | | | | | | |
| 59 | Nuclear export factor 1 | Yes | 2.57 | 45 | 24(9) | (logo) | - |
| Winged helix repressor DNA binding domain (1.10.10.10) | | | | | | | |
| 60 | Transcription factor IIF | Yes | 1.99 | 7.5 | - | - | Non-specific binders |
| TRFH (1.25.40.201) | | | | | | | |
| 61 | Telomeric repeat-binding factor 1 | Yes | 1.42 | 15 | 8(4) | LGHTTAEMIDYMELQW, SFPLEFTTDYMYNLMA, MLFDDEAMYNWQWHLM, EHSFLFEDWMWEGKDH | No sequence logo |
| Factor Xa inhibitor (4.10.410.10) | | | | | | | |
| 62 | Amyloid-like protein 2 | Yes | 0.79 | - | - | - | Phage display not done |
| Ubiquitin-like (3.10.20.90) | | | | | | | |
| 63 | Ubiquitin-60S ribosomal protein L40 | Yes | 3.90 | 22 | 5(4) | EHMWDAQMWEWSWWDL, EMWVFTPAEWFQIYLN, MTVVEWWTDAQIAEWM, DLHYDWSLEYWTSLLQ | No sequence logo |
| Vinculin (1.20.1490.10) | | | | | | | |
| 64 | Vinculin | Yes | 1.64 | 65 | 33(26) | (logo) | - |
| XRCC4 (2.170.210.10+1.20.5.370) | | | | | | | |
| 65 | XRCC4 | Yes | 0.88 | - | - | - | Phage display not done |
| Tyrosine phosphatase (3.90.190.10) | | | | | | | |
| 66 | Tyrosine-protein phosphatase non-receptor type 22 | Yes | 0.28 | - | - | - | Phage display not done |

Table 4: Summary of phage display results for 66 domains. SDS-PAGE gels were run to determine protein expression in the whole cell lysate. OD at 280 nm was used to determine the protein yield (shown in mg/ml). The enrichment ratio for pool ELISA is the maximum ratio of colony forming units per ml obtained from the protein and the empty plate for a given domain. The clonal ELISA column shows the number of sequences with enrichment ratio > 5 and background signal < 0.1; the number of unique sequences obtained from sequencing is given in brackets. The sequence logo obtained from phage display is also included. For domains for which the number of sequences was insufficient to generate a sequence logo, the peptide sequences are listed instead (key residues are highlighted in bold). Rows for proteins that were not purified are shown in blue, those for which phage display was not done in orange, those for which no peptide sequences were obtained in green, and those for which a sequence logo could not be obtained in purple.

To rationalize the binding preferences obtained from phage display, an extensive structural analysis was done using the available structures of the protein domains in complex with their known peptide ligands (Figure 11). Below, I present an in-depth analysis of all the domains for which phage display results were obtained. These domains belong to different cellular pathways, and hence I have divided the 27 domains into five sections based on their biological function (Figure 12). This helps in understanding the potential uses of the peptide probes generated by phage display and the various cellular processes that can be targeted using these peptides.

Figure 11: Strategy for validating phage display results. To interpret the results from phage display, an extensive literature review and structural analysis was done. For each domain, the peptide sequences were aligned using the Alignment tool in Geneious with a high gap penalty. The sequences were then visualized as a position weight matrix using Weblogo. Alignments that showed no consensus were improved manually. Structural analyses of existing structures of domains in complex with peptides were used to further improve the alignments obtained. This analysis was used to generate a model for the binding of phage-derived peptides to the target domain.

3.3.4 Cellular signalling: Based on the literature analysis, ten out of the 27 domains were present on proteins involved in signalling networks, including kinase and G-protein networks. These included previously well-studied domains such as the SH3, PDZ and 14-3-3 domains.


Figure 12: Overview of phage results. Based on the literature review, the 27 domains were divided into four distinct biological functions: cellular signalling, cytoskeleton regulation, intracellular transport and genome regulation. Domains that did not fit into any of the four categories are shown as miscellaneous.


3.3.4.1 SH3: Src Homology 3 (SH3) protein interaction domains participate in a diverse set of signalling pathways by binding to linear motifs [5]. These domains preferentially bind to proline-rich motifs (PxxP) with affinities ranging from Kd = 1 to 200 uM. The selectivity of SH3 domains has been studied in detail and consensus motifs have been predicted using yeast two-hybrid, phage display, alanine scanning and structure determination [5]. The SH3 domain is a 60 amino acid domain with a beta-barrel fold, which consists of 5 or 6 β-strands arranged as two tightly packed anti-parallel β-sheets (Figure 13) [24]. The interaction surface (between the RT and N-src loops) is relatively flat and hydrophobic, with three shallow grooves defined by conserved aromatic residues. The peptide adopts an extended, left-handed conformation (polyproline-2 or PPII helix). Sequences lacking the PxxP motif are also known to bind to SH3 domains. The GRAP2 SH3 domain in our study is an example of a domain that prefers the RxxK motif [46]. Crystal structures have confirmed that the RxxK motif binds to a different binding region on the SH3 domain (Figure 13).
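The PxxP and RxxK motifs described above are simple enough to scan for with regular expressions, which is a quick way to check whether a phage-derived peptide contains either motif. The sketch below is illustrative only: the function name is hypothetical, the peptides are invented, and a match says nothing about actual binding.

```python
import re

# "x" in PxxP / RxxK is any residue, so each motif is a 4-residue pattern.
# The lookahead lets overlapping occurrences be reported as well.
MOTIFS = {
    "PxxP": re.compile(r"(?=(P..P))"),
    "RxxK": re.compile(r"(?=(R..K))"),
}


def find_motifs(peptide):
    """Return {motif name: [(0-based start, matched residues), ...]}."""
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(peptide)]
        for name, pattern in MOTIFS.items()
    }


# Invented peptides (illustration only).
for peptide in ("APPLPPRNRPRL", "QPRSTKAA"):
    print(peptide, find_motifs(peptide))
```

In practice such a scan is only a first pass; the position weight matrices and structural analysis described in this chapter capture the residues flanking the core motif, which determine specificity between SH3 domains.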

Figure 13: Structural and literature analysis of SH3 domains. (A) SH3 domains are known to bind to peptides using two distinct binding surfaces: binding surface 1, which binds to PxxP motifs, and binding surface 2, which binds to RxxK motifs. Some SH3 domains (such as the C-terminal SH3 domain of Grb2 (PDB ID: 2VWF), shown in the figure) bind to their interaction partners using both binding surfaces. (B) The binding preferences from phage display in comparison to previous results. In each panel, the logo at the top shows the binding preferences obtained in our study. The sequence logos at the bottom of three panels (GRAP2, PLCG1 and SORBS2) were obtained from a large-scale phage display screen performed by Dr. Haiming Huang in the Sidhu lab (results unpublished). The phage display logo for the N-terminal SH3 domain of Grb2 was obtained from the study by Sparks et al [47]. The binding preferences obtained in this study match those from previous phage display experiments.

In this study, four SH3 domains were targeted: the N-terminal SH3 domain of Grb2, the C-terminal SH3 domain of GRAP2, the SH3 domain from PLCG1 and the second SH3 domain from SORBS2. For each of the SH3 domains, high levels of enrichment were obtained in pool ELISA. Furthermore, for all four SH3 domains, we obtained a large number of unique sequences that showed a high enrichment ratio in clonal ELISA. The binding preferences of these SH3 domains have previously been elucidated using phage display by the Sidhu lab and other groups. The SH3 domains were therefore selected to serve as positive controls for our experimental pipeline and to validate our screening method and peptide library. As expected, the phage display results match previously generated binding preferences (Figure 13). The positive results obtained for the SH3 domains indicate that the diversity and display levels of the peptide library are sufficient for elucidating the binding preferences of PRDs.
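
The two SH3 motif classes discussed above (PxxP and RxxK) can be located in peptide sequences with simple pattern matching. The following is a minimal sketch, not the analysis pipeline used in this study: the function name `find_motifs` and the example peptides are hypothetical and serve only as illustration, and "x" is interpreted as any single residue.

```python
import re

# Motif patterns written as regular expressions; "x" (any residue) becomes ".".
# A lookahead is used so that overlapping occurrences are all reported.
SH3_MOTIFS = {
    "PxxP": re.compile(r"(?=P..P)"),
    "RxxK": re.compile(r"(?=R..K)"),
}


def find_motifs(peptide, motifs=SH3_MOTIFS):
    """Return {motif name: [0-based start positions]} for one peptide."""
    peptide = peptide.upper()
    return {
        name: [match.start() for match in pattern.finditer(peptide)]
        for name, pattern in motifs.items()
    }


if __name__ == "__main__":
    # Hypothetical peptides, not sequences obtained in this study.
    for seq in ["APPLPPRNRPRL", "PSIDRSTKP"]:
        print(seq, find_motifs(seq))
```

Reporting start positions rather than a yes/no answer makes it easier to see whether a peptide carries one motif, both motifs, or several overlapping copies.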

3.3.4.2 PDZ: PDZ domains are peptide binding domains that bind to hydrophobic C-terminal motifs of proteins. They regulate multiple cellular processes, acting as scaffolds involved in protein-protein interactions. PDZ domains are ~90 aa in length and have a conserved fold consisting of 5-6 β-strands and 2-3 α-helices [28]. These domains have a single binding site in a groove between the α2 and β2 structural elements, with a highly conserved carboxylate-binding loop ([R/K]xxxGΦGΦ motif, where x: any amino acid residue and Φ: hydrophobic residue) located before the β2 strand that typically recognizes the extreme C-termini of target proteins (Figure 14). PDZ domains have well-defined binding preferences, and previous work by Tonikian et al [28] has identified the C-terminal binding preferences of 72 human PDZ domains. A subset of PDZ domains, such as those of syntrophin and Par6, are also known to bind to internal motifs (Figure 14) [48]. In the Sidhu lab, an internal peptide phage library has been used previously to identify the internal peptide binding mode of Dvl2-PDZ [36].

In the current study, six PDZ domains were selected: the first and second PDZ domains of DLG1, the second and third PDZ domains of DLG2 and the second and third PDZ domains of DLG4. Of these six PDZ domains, four were successfully purified. All four domains were used as targets for phage display, and two of them yielded phage clones with a high enrichment ratio. The peptide sequences obtained from the phage selections were aligned to obtain the results shown in Figure 14. The sequence logos obtained for these PDZ domains were similar to the internal binding mode observed for the Par6-PDZ domain, suggesting that the two PDZ domains may also bind to internal ligands. These observations have to be validated by further experiments.

Figure 14: Structural and literature analysis of PDZ domains: A. PDZ domains are known to bind to C-terminal peptides, where the free COOH group binds to the carboxylate binding pocket. In selected PDZ domains, peptide binding can occur via a beta-hairpin motif (syntrophin PDZ) or a Par6 internal binding motif, where a negatively charged residue at site +1 compensates for the COOH group [48]. B. The sequence logos obtained for the second PDZ domains of DLG2 and DLG4. The binding preference is similar to the internal binding motif observed for the Par6-PDZ domain. C. The structure of the Par6-PDZ domain in complex with a peptide derived from Pals1 (PDB ID: 1RZX). The DLG2-PDZ2 motif shows conservation of Glu and Thr at positions -2 and -3 (Glu and Met in yellow); Ile/Leu at position 0 (Val in yellow); Asp at position +1 (Asp in yellow) and Pro at position +3 (Pro in yellow) of the Pals1 internal ligand. The DLG4-PDZ2 logo shows a similar but much weaker pattern (owing to the smaller number of unique sequences obtained). These results predict that the phage-derived peptides bind to DLG2-PDZ2 and DLG4-PDZ2 at the peptide binding pocket in an internal binding mode.

3.3.4.3 G-alpha subunit of hetero-trimeric G-proteins: Guanine nucleotide-binding proteins are an important family of cell-signalling molecules that regulate key cellular pathways [14]. The alpha subunit of G-proteins binds to G protein-coupled receptors (GPCRs) and acts as a GTPase. Upon binding of a ligand to the GPCR, GDP is exchanged for GTP in the Gα subunit. The active Gα then dissociates from the inactive G-protein complex and acts on its downstream effectors via its GTPase activity. Structurally, Gα consists of two domains: a GTPase domain and an alpha-helical domain. The GTPase domain is similar in structure to p21ras and other members of the GTPase super-family of proteins and contains five helices surrounding a six-stranded beta-sheet, with five strands running parallel and one strand running anti-parallel to the others. The second of the five helices is a 3(10) helix rather than an alpha helix. The alpha-helical domain is unique to the Gα subunits and has a long central helix surrounded by five shorter helices. The alpha-helical domain is joined to the GTPase domain by two extended strands, linker 1 (res 54-58) and linker 2 (res 173-179). Between these two linking segments lies a deep cleft within which the nucleotide (GTP or GDP) is tightly bound. Phage display and mRNA display have been used to obtain peptide antagonists/agonists of the GTPase activity of Gα [14, 54]. Both methods generated peptides that bind to the hydrophobic pocket between the α3 helix and the switch II helix. The switch II/α3 pocket is also the binding site of the RGS14 GoLoco motif, an important regulator of Gα activity.

In this study, I targeted three G-alpha domains: Gαi1, Gαi3 and Gαo1. All the G-alpha domains were successfully expressed and purified in sufficient quality and quantity to perform phage display experiments. However, selections yielded peptides against only one G-alpha domain, Gαi1, which has been targeted in previous studies. This may be due to the absence of GTP/GDP during selection, which may be required for stabilizing the structure of Gαi3 and Gαo1. For Gαi1, 12 peptides were obtained and aligned to generate a consensus motif: “ΦWexeWV” (where Φ: hydrophobic residue; e: negatively charged residue). The consensus motif is distinct from the peptides obtained in previous combinatorial library studies (Figure 15). Structural analysis of Gαi1 with the KB-752 peptide suggests that the phage-derived peptides may bind to the same site as KB-752, albeit in a different mode (Figure 15).
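
To make the step from aligned peptides to a consensus motif concrete, the sketch below computes per-column residue frequencies and writes a residue into the consensus only when it dominates its column. It is an illustration under stated assumptions, not the procedure used to generate the logos in this thesis: the function names, the 0.6 threshold and the four example peptides are all hypothetical, and the input is assumed to be already aligned and of equal length.

```python
from collections import Counter


def column_frequencies(aligned):
    """Return one {residue: frequency} dict per alignment column."""
    if not aligned:
        raise ValueError("no peptides given")
    length = len(aligned[0])
    if any(len(peptide) != length for peptide in aligned):
        raise ValueError("peptides must be aligned to the same length")
    freqs = []
    for i in range(length):
        counts = Counter(peptide[i] for peptide in aligned)
        freqs.append({aa: n / len(aligned) for aa, n in counts.items()})
    return freqs


def consensus(aligned, threshold=0.6):
    """Write the most frequent residue per column, or 'x' below the threshold."""
    motif = []
    for column in column_frequencies(aligned):
        residue, freq = max(column.items(), key=lambda item: item[1])
        motif.append(residue if freq >= threshold else "x")
    return "".join(motif)


if __name__ == "__main__":
    # Hypothetical aligned peptides, not sequences obtained in this study.
    peptides = ["LWEAEWV", "IWDSEWV", "LWEGDWV", "FWETEWV"]
    print(consensus(peptides))
```

A simple threshold of this kind does not group residues by physicochemical class, so positions written as Φ or e in the text would need an additional grouping step (for example, pooling Asp and Glu before counting).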

Figure 15: Structure and literature analysis of Gα subunits. (A) The structure of Gαi1 in complex with GDP (red) and the KB-752 peptide (PDB ID: 1Y3A). (B) The binding logo obtained for Gαi1. The residues that are conserved in the sequence correspond to Trp10, Trp14 and Phe15. (C) The interaction surface of KB-752 and Gαi1. The residues on KB-752 that are important for the interaction are Trp, Phe, Asp and Leu (shown in yellow). A similar pattern is observed in the phage display binding motif, albeit with a spacing of two residues between Trp and the negative charge compared to one in KB-752, and a preference for Trp and Phe at positions 14 and 15 instead of the Phe and Leu found in KB-752.

G-protein signalling is one of the major signalling pathways used by cells and has been implicated in a number of disorders. This is highlighted by the fact that 25% of marketed pharmaceuticals target GPCRs [55]. Gαs oncogenes have been shown to increase carcinogenicity and metastasis, and the recent identification of Gαs-hyperactivating mutations in kidney cancer indicates that the subunit could be a therapeutic target in developed tumours [55]. The peptide probes developed here may be used to modulate the activity of Gα domains.

3.3.4.4 14-3-3: The 14-3-3 family of proteins plays a key role as scaffolding proteins in a number of cellular signalling pathways [56]. Seven family members have been reported in humans; they are expressed in all human tissues except for 14-3-3 sigma, which is specific to epithelial cells. Structurally, 14-3-3 proteins contain a single domain that forms an alpha-alpha super-helical structure harbouring a conserved amphipathic groove that forms the binding pocket. 14-3-3 proteins are generally known to bind to phosphorylated serine residues on their binding partners with sub-micromolar affinity [57]. Non-phosphorylated ligands have also been identified. Exoenzyme S (ExoS), a toxin produced by Pseudomonas aeruginosa, displays high affinity towards 14-3-3 zeta/delta and binds to the same amphipathic groove that is responsible for binding to phosphorylated peptides [58]. Phage display has also been used to identify cyclic peptides that bind to 14-3-3 zeta/delta. The peptide with high affinity contained a “WLDLE” motif that was essential for binding [59].

Figure 16: Structural and literature analysis of 14-3-3. (A) The structure of 14-3-3 zeta/delta in complex with the ExoS peptide (PDB ID: 2O02). (B) The binding logo obtained for 14-3-3 eta. (C) The interaction surface of 14-3-3 zeta/delta and the ExoS peptide. The sequence logo shows conservation for residues Glu(8)-Trp(9)-Leu(10)-Asp(11)-Leu(12)-Ala(13). These correspond to the Asp-Ala-Leu-Asp-Leu-Ala residues present in the ExoS peptide (shown in yellow).

Of the seven 14-3-3 isoforms, 14-3-3 eta was found in the list obtained from whole genome RNAi screens. The peptides I obtained from phage display are similar to the ExoS-derived peptide and to peptides from previous phage display experiments (Figure 16). To further understand the binding preference from phage display, I used the structure of 14-3-3 zeta/delta with ExoS [58]. The amino acids forming the interaction surface are identical in 14-3-3 zeta/delta and 14-3-3 eta, and hence both domains should have similar (if not identical) binding preferences. The ExoS peptide and the cyclic phage-derived peptide have been shown to bind to 14-3-3 eta. Hence we predict that the peptides against 14-3-3 eta will bind to the same interaction surface in a mode that is similar to the binding of the ExoS peptide (Figure 16).

3.3.4.5 Penta-EF hand domains: Penta EF-hand (PEF) domains are a family of Ca2+ binding domains that are composed of five EF-hand motifs [83]. The EF-hand is a helix-loop-helix structure characterized by a conserved 12-residue inter-helical sequence that co-ordinates a Ca2+ ion. EF-hand motifs are present in a multitude of proteins, usually in multiple copies. Penta-EF hand domains comprise eight alpha helices that form the five EF-hands: α1-α2 (EF1), α3-α4 (EF2), α4-α5 (EF3), α6-α7 (EF4) and α7-α8 (EF5). Based on sequence similarity, penta-EF hand domains can be divided into two groups: Group I PEF domains (PDCD6 and peflin) and Group II PEF domains (calpain sub-family members, sorcin and grancalcin). In this study, I targeted two PEF domains: Calpain small regulatory subunit 1 and Programmed cell death protein 6. PEF domains have not previously been studied using phage display, and hence this study is the first to elucidate penta-EF hand binding preferences.

a. Calpain small regulatory subunit 1: Calpains are intracellular Ca2+ dependent cysteine proteases that have been implicated in a number of cellular processes such as signal transduction, apoptosis and cytoskeleton remodelling [84]. The calpain proteolytic system consists of a small subunit, which acts as a Ca2+ dependent adaptor, a large subunit that contains the catalytic site, and an endogenous calpain-specific inhibitor, calpastatin. Calpastatin is ubiquitously expressed and blocks the protease activity of calpain by binding to three sites on the calpain protease: the active site and domain V on the large subunit, and the penta-EF hand domain on the small regulatory subunit [84]. In this study, I focussed on the small regulatory subunit of calpain. The small subunit is required for proper functioning of the calpain large subunit and acts as a chaperone to stabilize the calpain protease system. The calpain small regulatory subunit harbours a hydrophobic binding surface that binds to the peptide derived from calpastatin: DAIDALSSDFT. The binding preference obtained from phage display, with the motif DLxxWLxxDM, is similar to that of the calpastatin-derived peptide (Figure 17). Hence, these peptides should competitively inhibit the interaction between the calpain small regulatory subunit and calpastatin.

Calpains have previously been implicated as drug targets in a number of disorders including cancer and neurodegenerative diseases. There are large efforts to design small molecule and peptide-based drugs that target calpain. Almost all of these drugs target the active site of calpain to inhibit calpain activity. However, given the similarity between the active site of calpain and those of other cysteine proteases, most of these compounds are non-specific. Calpastatin is the endogenous and most specific inhibitor of calpain, but it is not stable inside cells and hence cannot be used for therapeutic applications. Thus there is a need to develop high affinity and high specificity inhibitors of calpain. The peptides identified here by phage display may serve as templates for developing peptide inhibitors against the calpain protease system.

Figure 17: Structure and literature analysis of Penta-EF hand of CAPNS1. (A) The structure of the CAPNS1 penta-EF hand domain in complex with the calpastatin-derived peptide (PDB ID: 1NX1). (B) The binding motif for the CAPNS1 penta-EF hand domain obtained from phage display, aligned with the peptide derived from calpastatin. (C) The interaction surface between the CAPNS1 penta-EF hand and the calpastatin-derived peptide. The calpastatin peptide forms an alpha-helical structure that binds to the binding pocket on the penta-EF hand. The key interactions are made by two Asp, two Ala, one Leu and one Phe residues on the peptide (highlighted in yellow). The nature and position of these residues are conserved in the binding motif. Hydrophobic residues are replaced by aromatic residues in the binding motif, indicating that the hydrophobic surface is more flexible and can accommodate bulkier residues.

b. Programmed cell death protein 6: Programmed cell death protein 6 or PDCD6 (Alg-2) functions as a Ca2+-dependent adaptor protein in the ESCRT and ER-to-Golgi transport systems. Alg-2 interacting proteins commonly contain Pro-rich regions, and Alg-2 recognizes at least two distinct Pro-containing motifs: PPYP(x)nYP (Alix, PLSCR3) and PxPGF (Sec31A, ABM-2) [85]. The binding of the PPYP(x)nYP peptide occurs at a groove that contains two peptide-binding hydrophobic pockets [86]. The structural basis for the binding of Alg-2 to PxPGF peptides remains to be established. Mutational and competitive binding analyses have shown that the PxPGF peptides bind to a pocket (or pockets) different from that of the PPYP(x)nYP peptides. From phage display, I obtained 55 unique peptides against PDCD6. Interestingly, all these peptides contained a prominent GWxxWV motif (Figure 18). This motif is distinct from the PPYP(x)nYP motif, which binds to a structurally defined binding surface. However, it does partially overlap with the PxPGF motif, suggesting that these peptides may bind to the same surface. The binding surface of this motif is not known structurally, and the phage-derived peptides could be used to identify it. Knock-down of Alg-2 has been shown to lead to growth defects in cancer cell lines via cell cycle arrest at the G2/M checkpoint [87]. However, the molecular mechanism by which Alg-2 is involved in cancer-related pathways is still unclear. The peptide probes developed in this study can be used to investigate the biological role of Alg-2 in specific cancer cell lines.

Figure 18: Structure and literature review of Penta-EF hand of PDCD6. (A) The structure of the PDCD6 penta-EF hand domain in complex with the Alix-derived peptide (PDB ID: 2ZNE). (B) The binding motif for the PDCD6 penta-EF hand domain obtained from phage display. (C) The interaction surface between the PDCD6 penta-EF hand domain and the Alix-derived peptide, showing pocket 1 and pocket 2, which bind to the two PYP motifs on the Alix peptide (highlighted in yellow). The phage-derived motif is distinct from the Pro-rich motif found in Alix and hence is predicted to bind to a distinct binding site.

3.3.5 Cytoskeleton regulation: Four domains in our study were found to be present on proteins involved in regulation of cytoskeleton structure. These include two domains of the dynein light chain family, one domain from the CAP/Gly domain family and one domain from the alpha-catenin/vinculin head domain family.

3.3.5.1 Dynein light chain: Dynein light chains (DLC) are domains found on mono-domain proteins: the light chains of the cytoplasmic motor protein dynein (DYL1 and DYL2). Structurally, DLC domains contain three beta-sheets and three alpha helices in a two-layer alpha-beta core structure. DLC are peptide recognition domains that bind to a large range of proteins such as Pak1-kinase, Bim (pro-apoptotic) and many viral proteins [49]. Previous studies have found two motifs that bind to DYL1, GIQVD and KxTQT, both of which bind in an anti-parallel conformation to the binding groove (Figure 19). Recent studies have also used phage display to determine the binding preference of the DLC from DYL1 (Figure 13) [50]. The peptide profile obtained via phage display looks similar to the motifs obtained from natural peptides and previous phage display studies. However, some features of the motif obtained by phage display appear to differ, for example a surprising preference for M/W (φ) instead of K/R at position -3. DYL proteins act as essential hub proteins that are involved in a range of cellular processes including cytoskeleton regulation, intracellular transport, autophagy and apoptosis. Over-expression of DYL1 has been shown in a number of tumour types [49]. However, the mechanism by which DYL1 and DYL2 are involved in carcinogenesis is poorly understood. The peptide-based inhibitors developed in this study may help in critically evaluating the role of DYL1 in various cancer-related pathways.

Figure 19: Structural and literature analysis of Dynein light chains. (A) The structure of Dynein light chain 1 in complex with the peptide derived from Swallow (PDB ID: 3E2B). (B) The sequence logos obtained from phage display for Dynein light chain 1 (DYL1) and Dynein light chain 2 (DYL2). DYL1 and DYL2 share high sequence similarity and hence are predicted to have similar binding preferences, which is observed in the phage display results. A few differences exist between the two logos, including a stronger preference for Gly and Ala at position 4 and for Asp and Glu at position 9 in DYL1 compared to DYL2. (C) Binding interaction of Dynein light chain 1 and the Swallow peptide. The key interactions are mediated by the Lys(-3)-Ala(-2)-Thr(-1)-Gln(0)-Thr(1)-Asp(2) residues present on the Swallow peptide. The central Thr(-1)-Gln(0)-Thr(1) residues are conserved in the sequence logo obtained for DYL1. A few differences are also observed, namely: Met and Trp are possible at Lys(-3); Ser and Thr are possible at Thr(2); Asp and Glu are possible at Asp(5) on the Swallow peptide.

3.3.5.2 CAP/Gly domain: CAP/Gly domains are Gly-rich domains found in a number of cytoskeleton-associated proteins (CAP) that bind to C-terminal peptides with the motif EEY/F-COOH [98]. CAP/Gly domains are extensively involved in cellular processes including segregation, establishment and maintenance of cell polarity, intracellular organelle and vesicle transport, cell migration, intracellular signalling and tumorigenesis. CAP-Gly domains are found in single or multiple copies and are primarily involved in protein interactions and the formation of protein networks. In this study, I targeted the CAP-Gly domain of the large subunit of dynactin, p150glued [99]. The dynactin complex is required for targeting dynein to its cargo and for dynein motor processivity. CAP-Gly domains are characterized by a highly conserved motif with glycine and hydrophobic residues. Structurally, CAP-Gly domains form a globular fold with a highly twisted, five-stranded antiparallel β-sheet flanked by a small β-hairpin. A unique cluster of conserved aromatic residues forms a solvent-exposed hydrophobic cavity bordered by the highly conserved GKNDG motif. This hydrophobic cavity serves as the binding site for the C-terminal Glu-Glu-Tyr/Phe (EEY/F)-COOH sequence motifs (Figure 20). The C-terminal binding preference could not be tested in this study as the C-terminal peptide library was not available.

From phage display, I obtained a single peptide that bound to the CAP/Gly domain of DCTN1, suggesting an internal binding mode for this CAP/Gly domain. A sequence alignment of the internal peptide with the C-terminal peptide showed similarity between the N-terminal ends of the two peptides. To gain further insight into peptide binding, I modelled the phage-derived peptide on the CAP/Gly domain of dynactin using Modeller. The structural model obtained was analyzed by visual inspection. Based on the results, I predict that the binding surface of CAP/Gly can accommodate an internal peptide (Figure 20). Briefly, the Asp and Glu residues are conserved between the natural peptide and the internal peptide. The Thr residue on the natural peptide is replaced by a Trp residue, which interacts with a hydrophobic pocket present on the CAP/Gly domain. The C-terminal Phe residue is compensated for by the Val and Pro residues, which also provide structural flexibility to the internal peptide. The backbone CO group between Val and Pro in the internal peptide forms hydrogen bonds with positively charged residues on the CAP/Gly domain, mimicking the terminal COO- group of the natural peptide. Finally, the movement of the highly mobile β4/β5 loop allows the Trp residue on the internal peptide to interact with a hydrophobic pocket. This pocket is covered by Asn (69) in the presence of the natural peptide. The movement of the β4/β5 loop also allows the internal peptide to move out of the peptide binding pocket.
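
The distinction between a C-terminal and an internal ligand can be expressed as a positional constraint on the motif. The sketch below is only an illustration: it checks whether the EEY/F tripeptide described above sits at the free C-terminus of a peptide or is followed by further residues. The function name and the example sequences are hypothetical, and the phage-derived peptide from this study does not necessarily contain this tripeptide.

```python
import re

# EE[YF] anchored at the end models the free C-terminus (-COOH) of the ligand.
C_TERMINAL = re.compile(r"EE[YF]$")
# The same tripeptide followed by at least one residue is an internal occurrence.
INTERNAL = re.compile(r"EE[YF](?=.)")


def classify_cap_gly_motif(peptide):
    """Return 'C-terminal', 'internal' or 'none' for the EEY/F tripeptide."""
    peptide = peptide.upper()
    if C_TERMINAL.search(peptide):
        return "C-terminal"
    if INTERNAL.search(peptide):
        return "internal"
    return "none"


if __name__ == "__main__":
    # Hypothetical sequences used only to exercise the three outcomes.
    for seq in ["GSDEEF", "DEEYGS", "WVPWQ"]:
        print(seq, classify_cap_gly_motif(seq))
```

A check of this kind only reports where the tripeptide lies in the sequence; whether an internal peptide can actually occupy the binding pocket still has to be assessed structurally, as described above.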

Figure 20: Structure and literature analysis of the CAP-Gly domain of p150glued. (A) The structure of the p150glued CAP/Gly domain in complex with CLIP-170 zinc-knuckle 2 (PDB ID: 3E2U). (B) The binding surface of CAP/Gly-peptide complex. (C) The sequence alignment of the natural C-terminal and phage-derived internal peptide partner of p150glued CAP/Gly domain. (D) The modelled structure of the p150glued CAP/Gly domain in complex with the phage-derived peptide. The structure was generated using Modeller by mutating the CLIP-170 peptide to Trp-Val-Pro-Trp-Gln. (E) The binding surface of CAP/Gly-internal peptide complex showing the potentially novel internal binding mode.

Biophysical assays such as isothermal titration calorimetry (ITC) are required to confirm the binding between the phage-derived peptide and the DCTN1 CAP/Gly domain. Structure determination of the phage-derived peptide in complex with the CAP/Gly domain is required to confirm the structural model of the internal binding mode. If confirmed, the phage-derived peptides may help in identifying novel protein partners and biological roles of CAP/Gly domains.

3.3.5.3 Alpha-catenin/vinculin head domain: The alpha-catenin/vinculin head domain is found in proteins involved in cytoskeletal organization, such as vinculin and alpha-catenin. Structurally, the alpha-catenin/vinculin head domain comprises seven amphipathic helices arranged as two four-helical bundles [66]. In this study, we focused on the vinculin head domain. The vinculin head domain mediates interactions between vinculin and its interaction partners: alpha-actinin, talin and the Shigella toxin IpaA. The interaction between the vinculin head domain and talin has been studied extensively and has been shown to occur via a “helix addition” mechanism, in which a 26-amino acid amphipathic peptide from talin inserts into the first four helices of the vinculin head domain to form a compact five-helix structure. The interaction exhibits high affinity, and the peptides from alpha-actinin and IpaA bind in a similar mode [66].

A number of studies have aimed at identifying the binding preference of the vinculin head domain. One of the first studies used phage display to identify peptides against the vinculin head domain. Adey et al identified five peptides that specifically bound to the talin binding region on the vinculin head domain with high affinity [67]. However, these peptides did not yield a consensus motif for vinculin binding. Gingras et al provided a consensus motif (LXXAAXXVAXXVXXLIXXA) for vinculin binding based on the structure of talin bound to vinculin and SPOT microarray analysis [68]. In this study, I obtained 26 unique peptides against the vinculin head domain. The sequences produce a low-resolution alignment (shown in Figure 21) and do not align well to the vinculin binding motif provided by previous combinatorial studies. This can be attributed to the shorter length of these peptides. At 16 amino acids in length, the peptides obtained in this study are the shortest peptide sequences known to bind to the vinculin head domain and hence may serve as inhibitors of vinculin-talin binding.

Figure 21: Structural and literature analysis of Alpha-catenin/vinculin head domain. (A) The structure of the vinculin head domain in complex with talin VBS1 (PDB ID: 1SYQ). (B) The binding profile for the vinculin head domain obtained from phage display. (C) The interaction surface of VBS1 and the vinculin head domain. VBS1 binds to the vinculin head domain via an amphipathic helix, with hydrophobic residues facing the vinculin core and charged residues facing the solution. No clear similarity was observed between the phage-derived peptides and known natural peptide binders of the vinculin head domain.

3.3.6 Intracellular Transport: Intracellular transport is often mediated by peptide tags that guide the spatial localization of a protein. Often, specialized PRDs are involved in the recognition of these peptide-based tags. Four of the domains in this study are involved in intracellular transport.

3.3.6.1 Importin beta: Importin beta is a member of the family of nuclear transport receptors that are responsible for importing large macromolecules into the nucleus [69]. Structurally, importin beta contains a single domain with a superhelical structure containing 12 helical repeats known as HEAT repeats, connected via flexible linkers. Each repeat contains two alpha helices, A and B, connected by a flexible linker. The A helices face the outer, convex surface of importin beta while the B helices lie on the inner, concave surface. Importin beta is known to bind to NLS and importin alpha via its C-terminal section, to RAN-GTP via its N-terminal domain, and to other proteins, including members of the nuclear pore complex (FG nucleoporins), via its central region. The interaction between importin beta and the FG nucleoporins is mediated by an FxFG motif present on this class of nucleoporins.

Figure 22: Structural and literature analysis of Importin beta. (A) The structure of importin-beta in complex with the FxFG peptide (PDB ID: 1F59). (B) Binding motif of importin-beta obtained from phage display. (C) The interaction surface between importin beta and the FxFG peptide. The key residues in this interaction are highlighted in yellow. The three key residues: two Phe and Gly are conserved in the binding motif obtained from phage display.

From phage display, I obtained 10 unique peptides against importin beta. All of these contain the FxFG motif (except one that contains Y instead of F at position 14), which is known to bind to the central region of importin beta (Figure 22). The binding motif obtained from phage display agrees well with the structure of importin beta in complex with the FxFG peptide from nucleoporins (Figure 22). Different groups have previously developed specific modulators of nuclear import [70]. The peptides generated in this study may serve as inhibitors of the interaction between importin beta and FG nucleoporins and hence inhibit importin beta-mediated transport across the nuclear pore. Such peptide inhibitors may act as useful tools for studying the role of importin beta in cells.

3.3.6.2 UBA: The ubiquitin-associated (UBA) domain is a domain of approximately 40 amino acids that was first recognized in proteins associated with ubiquitin but is also found in proteins involved in nucleotide excision-repair and nuclear transport [75]. UBA domains form three-helix bundles with a hydrophobic core that stabilizes the protein, and possess a conserved surface patch of hydrophobic amino acids that interacts with hydrophobic regions of ubiquitin and other target proteins. In this study, I targeted the UBA domain of nuclear export factor 1 (NXF1). NXF1 is a member of the family of proteins involved in mRNA export from the nucleus. The NXF1-UBA domain is present at the C-terminal end of NXF1 and has been shown to be sufficient for nucleo-cytoplasmic shuttling and localization to the nuclear pore complexes (NPCs) in vivo. NXF1 is also essential for the export of many viral RNAs bearing the constitutive transport element (CTE) [76].

Figure 23: Structure and literature analysis of NXF1-UBA domain. (A) Structure of NXF1-UBA with the FxFG peptide (PDB ID: 1OAI). (B) The binding logo for NXF1-UBA obtained from phage display. (C) Binding surface of NXF1-UBA with the FxFG peptide. The phage display logo shows high conservation of Phe (pos 12) and Trp (pos 13). While the conservation of two aromatic residues is similar to the two Phe residues (highlighted in yellow) in the FxFG peptide, the lack of conservation of Gly residues (before and after the central hydrophobic residues) shows that the binding mode of the phage-derived peptides may be different from that of the FxFG motif. Other differences include a preference for Trp(13) instead of Phe, and for a hydrophobic residue (Phe/Leu/Ile) at position 9 instead of the Asp residue in the FxFG motif. Based on structural analysis of the crystal structure, I predict that the Trp residue can fit into the hydrophobic binding pocket.

From phage display, I obtained a preference for FxW for NXF1-UBA (Figure 23). To obtain a deeper understanding of the binding mode of the phage-derived peptides to NXF1-UBA, I analyzed the known structure of the FxFG peptide (from nucleoporins) bound to NXF1-UBA (Figure 23) [77]. The sequence pattern of the phage-derived binding preference is similar to that of the FxFG peptide. While the hydrophobic residues are conserved between the phage-derived motif and the FxFG peptide, the results also suggest that the hydrophobic surface of NXF1-UBA may have more flexibility than previously reported. This indicates that the NXF1-UBA domain may bind to other proteins that contain a hydrophobic motif, which can be predicted using the binding motif obtained from this study. To my knowledge, this is the first study to report the binding preference of NXF1-UBA. The peptides obtained in this study can be used to design peptide-based or small-molecule inhibitors of NXF1-mediated nuclear export. Existing evidence suggests that blocking this interaction should be sufficient to inhibit NXF1-mediated transport. Inhibitors against NXF1 may be used to further probe the mechanisms of NXF1-mediated nuclear transport.

3.3.6.3 Bro1: The Bro1 domain is found in different eukaryotic proteins such as Alix (PDCD6IP), Brox and HD-PTP [78]. Structurally, the Bro1 domain has a banana-shaped structure that is organized around a core of tetratricopeptide helical hairpins. In this study, I targeted the Alix Bro1 domain. Alix plays an important role in intracellular transport as an adaptor protein that recruits CHMP4/ESCRT-III complexes (via its Bro1 domain) to function at distinct biological membranes. Other functions include lysobisphosphatidic acid (LBPA) binding, endophilin binding, receptor trafficking, endosome distribution, cell motility/adhesion, apoptosis, actin and microtubule binding and regulation of JNK signalling. Alix has also been implicated in the release of several other classes of enveloped viruses, including hepatitis B virus, dengue virus, yellow fever virus, HCV, SIV, RSV, human para-influenza virus and Sendai virus. The interaction between CHMP4 and the Alix-Bro1 domain is mediated by an amphipathic helix in CHMP4 that binds to helices 5-7 of the Bro1 domain [78]. A second protein interaction site has been reported within the first half of the Bro1 domain, which interacts with the p6-adjacent nucleocapsid (NC) domain of Gag, an HIV protein. While the exact interaction surface between the Alix-Bro1 domain and the NC domain has not been identified, residue substitutions in NC or within the first 200 residues of the Alix-Bro1 domain compromised HIV-1 release, emphasizing the critical role of the NC-Bro1 domain interaction in this process [79].

In this study, I obtained 15 peptides (7 unique peptides) against the Alix-Bro1 domain. Upon sequence alignment, a “Mxx[L/M]xx[W/L]” motif was obtained that resembles the amphipathic helix derived from CHMP4C (Figure 24). Comparison with the available structure of the Alix-Bro1 domain in complex with the CHMP4C peptide shows that the binding preferences obtained from phage display resemble the binding observed for CHMP4C peptides. Hence, I predict that the phage-derived peptides will block the interaction between the Alix Bro1 domain and CHMP4 proteins, thereby blocking the recruitment of the ESCRT-III complex. These peptides may therefore help in discerning the role of the Alix-Bro1 domain in a range of cellular pathways.

Figure 24: Structure and literature analysis of Alix-Bro1 domain. (A) The structure of Alix-Bro1 domain with the CHMP4C peptide (PDB ID: 3C3R). (B) The binding motif of the Alix-Bro1 domain. (C) The binding surface of Bro1 domain and CHMP4C peptide showing the amphipathic CHMP4A peptide with the hydrophobic surface towards the Bro1 domain. The pattern obtained from phage display shows two components: a negatively charged patch at the N-terminal end and a hydrophobic helical component with a triad of hydrophobic residues: Met (pos 10); Leu/Met (pos 13) and Trp/Phe (pos 16). The conserved hydrophobic triad in phage display corresponds to Ile; Leu and Trp (highlighted in yellow) in the CHMP4C peptide.
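As a rough illustration of how a motif written in the notation used here can be checked against peptide sequences, the following Python sketch converts a motif such as Mxx[L/M]xx[W/L] into a regular expression and reports where it first occurs in a peptide. The function names and the two example peptides are hypothetical and were invented for illustration; they are not sequences obtained in this study.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate thesis-style motif notation into a regular expression.

    'x' stands for any residue, '[A/B]' for A or B, and any other
    capital letter for a fixed residue.
    """
    pattern = motif.replace("/", "")   # [L/M] becomes the character class [LM]
    return pattern.replace("x", ".")   # x becomes the regex wildcard


def find_motif(peptide: str, motif: str):
    """Return the 0-based start of the first motif match, or None."""
    match = re.search(motif_to_regex(motif), peptide)
    return match.start() if match else None


BRO1_MOTIF = "Mxx[L/M]xx[W/L]"

# Hypothetical 16-residue peptides, invented for illustration only.
examples = ["AAAMAALAAWAAAAAA", "AAAMAAAAAWAAAAAA"]

for peptide in examples:
    print(peptide, find_motif(peptide, BRO1_MOTIF))
```

The same helper could in principle be applied to the other motifs reported in this chapter, provided that any residue-class symbols are first expanded into explicit character classes.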

3.3.6.4 Clathrin heavy chain: Clathrin forms the outer coat of vesicles involved in cellular transport between different membrane locations [93]. Structurally, clathrin contains the N-terminal adaptor domain (CTD) and the alpha-helical repeats that form the large part of the clathrin heavy chain. The CTD is a 7-bladed beta propeller that binds to compartment-specific adaptor proteins such as beta-arrestin and the adaptor protein complexes (AP-1 & AP-2). The interaction between the CTD and adaptor proteins is mediated by peptide-like motifs. The first linear motif obtained was the ‘clathrin-box’ consensus LΦxΦ[D/E] that was confirmed to bind between the first two blades of the beta-propeller [94]. In recent years, other peptide motif variants have been shown to bind to the CTD, such as the W-box motif (PWxxW), which binds to the top of the CTD [95], and the [L/I][L/I]GxL motif, which binds between blade 4 and blade 5 of the CTD [96] (Figure 25). A fourth binding site has also been predicted between blade 6 and blade 7 using multiple sequence alignments; however, the binding preference of this site has not yet been determined [97] (Figure 25). In this study, 47 peptides (22 unique peptides) were obtained against the clathrin terminal domain. Based on the sequence alignments, the sequences could be split into 3 groups: Set 1, sequences with a DΦxWΦ motif that resembled the clathrin-box peptide, albeit in the reverse order; Set 2, sequences with a DxxDW motif that does not match any known clathrin binding motif; and Set 3, sequences with no consensus among themselves or with Set 1 and Set 2 (Figure 25). It is surprising that no sequences were obtained that resembled the peptide motifs reported in literature. Previous studies have suggested that the beta-propeller structure of the CTD changes conformation upon binding to different peptides and small-molecule ligands. This flexibility may allow peptides with distinct sequences to bind to the same binding surface and explain the diversity in peptide sequences that bind to the CTD.

Figure 25: The structure and literature analysis of the Clathrin terminal domain. (A) The structure of Clathrin terminal domain showing the four peptide binding sites: Site 1 with preference for LΦxΦ[D/E] peptides (PDB ID: 1C9I); Site 2 with preference for PWxxW peptides (PDB ID: 1UTC); Site 3 with preference for [L/I][L/I]GxL peptides (PDB ID: 3GD1) and Site 4 with an unknown binding preference. (B) The binding motifs obtained from phage display for the Clathrin terminal domain showing the two binding motifs.
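The grouping of CTD binders into Set 1, Set 2 and Set 3 can be sketched as a simple rule-based classifier. The example below is only an illustration: it assumes that Φ stands for one of the residues A, V, I, L, M, F, W or Y (the exact definition is not given in this section), it checks Set 1 before Set 2, and the three peptides are invented placeholders rather than sequences from this study.

```python
import re

# Assumption: residues counted as hydrophobic (Φ) for this sketch.
HYDROPHOBIC = "AVILMFWY"

# DΦxWΦ (Set 1) and DxxDW (Set 2), written as regular expressions.
SET1 = re.compile(f"D[{HYDROPHOBIC}].W[{HYDROPHOBIC}]")
SET2 = re.compile("D..DW")


def classify(peptide: str) -> str:
    """Assign a peptide to Set 1, Set 2 or Set 3 (no recognizable motif)."""
    if SET1.search(peptide):
        return "Set 1"
    if SET2.search(peptide):
        return "Set 2"
    return "Set 3"


# Hypothetical peptides, invented for illustration only.
for peptide in ["GGGDLSWLGGGGGGGG", "GGGDSSDWGGGGGGGG", "GGGGGGGGGGGGGGGG"]:
    print(peptide, classify(peptide))
```

A peptide that happens to match both patterns would be reported as Set 1 under this ordering, which is an arbitrary choice made for the sketch.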

A recent study by von Kleist et al. identified a novel family of small molecules that bind to the clathrin terminal domain [98]. On structural analysis, it was observed that these molecules specifically bind to the “clathrin-box” interaction surface and block clathrin-mediated endocytosis. This observation is important because it suggests that peptides derived from phage display may also act as specific inhibitors of clathrin-mediated endocytosis. Phage-derived peptides often show higher affinity and specificity towards the target protein than small-molecule inhibitors and hence may serve as probes for elucidating the role of the clathrin terminal domain in clathrin-mediated endocytosis. Further, the different sets of peptides obtained from phage display can be combined to generate bivalent peptides to develop high-affinity inhibitors of clathrin-mediated endocytosis.

3.3.7 Genome Regulation: A host of proteins interact with different parts of the genome. These include transcription factors that activate specific genes, histone modification enzymes that introduce post-translational modifications on histone tails, and the RNA polymerase complex that initiates transcription of genes. In this study, I successfully generated peptides against five PRDs that are involved in the regulation of the genome.

3.3.7.1 PCNA: Proliferating cell nuclear antigen (PCNA) is a single-domain protein that acts as a co-factor for DNA polymerase δ in eukaryotic cells [62]. Functional PCNA is a homotrimer forming a ring structure, in which three monomers are joined together in an anti-parallel head-to-tail interaction. Numerous protein partners interact with PCNA, including DNA polymerase δ and DNA polymerase ε for DNA replication; DNMT1, HDAC1 and p300 involved in chromatin assembly and gene regulation; the DNA mismatch repair proteins Msh3/Msh6 for DNA repair; p21(CIP1/WAF1) for cell cycle control; and ESCO1/2 for sister-chromatid cohesion. Most of the interactions are mediated by a PCNA interaction peptide or PIP-box (QXXhXXaa; where h is a hydrophobic residue and a is an aromatic residue) that binds to a specific interaction surface on PCNA (Figure 26) [62]. Other interaction motifs are also present, such as the non-canonical PIP box and other non-PIP box binding motifs ([KR]-[FYW]-[LIVA]-[LIVA]-[KR]) [63]. Bacterial display has been performed against PCNA to identify a number of canonical and non-canonical PCNA binding peptides, including two more major classes: YxxxY/TxxxxW and KA-box peptides [64]. All these peptides bind to the same binding surface as the PIP box, albeit in different binding modes, to recruit PCNA in diverse cellular pathways [65]. In this study, I identified two phage clones against PCNA, both of which showed a high enrichment ratio compared to GST (Figure 26). Both clones correspond to the same sequence, shown in Figure 26. The sequence matches the PCNA-interacting motif (PIP-box), suggesting that the clonal ELISA results are accurate. The structure of PCNA in complex with the PIP-box motif from FEN1 confirms that the key residues required for the PCNA-PIP interaction are found in the phage-derived peptide (Figure 26). More sequences may be required to generate a more accurate binding motif for PCNA.

Figure 26: Structural and literature analysis of PCNA. (A) The structure of PCNA in complex with the FEN1 peptide. (B) The single peptide sequence obtained from phage display showing binding to PCNA. (C) The interaction surface of PCNA and the FEN1-derived peptide. The key interactions are mediated by Gln; Leu and two consecutive Phe residues forming the canonical PIP box motif: Qxxhxxaa where h – hydrophobic and a – aromatic amino acids. These residues are conserved in the phage-derived peptide.

3.3.7.2 OB-fold: Oligonucleotide-oligosaccharide binding (OB fold) domains are ssDNA binding domains found in several proteins including the replication protein A 1 (RPA1), the primary eukaryotic ssDNA binding protein [71]. Structurally, OB fold domains form a five-stranded beta-barrel with one end of the barrel capped by an alpha-helix. The ssDNA binding is mediated by one face of the beta-barrel and is conserved amongst all OB-fold members. The N-terminal OB-fold domain of the RPA 70 kDa subunit (RPA70N, targeted in this study) is a unique example of an OB-fold that is known to be a protein interaction module [72]. It plays a key role in central cellular processes such as DNA replication, damage response and repair by interacting with proteins such as phospho-RPA2, p53, RAD9, BID, MRE11, NBS1, Rad17, RAD52, BRCA2 and ATRIP. The interaction between RPA70N and its interaction partners is mediated by the binding surface involved in ssDNA binding in other OB-fold domains. From phage display, I obtained 66 peptide sequences (11 unique sequences) for the RPA70N domain. These contain a consensus motif shown in Figure 27. The structure of RPA70N with a phospho-mimic peptide from p53 indicates that the peptides bind to the canonical peptide binding surface. RPA has already been proven to be a valid target for cancer therapy. Small-molecule inhibitors that target the central OB-folds of RPA70, RPA70A and RPA70B, have been shown to induce cytotoxicity and increase the efficacy of genotoxic chemotherapeutics [73]. Peptide-based inhibitors of RPA70N should inhibit the binding of RPA to multiple checkpoint proteins (at least ATRIP, RAD9, MRE11, and p53) and hence significantly impair the replication stress response. Cancer cells are more dependent on the replication stress response than normal cells to complete replication and retain viability. Thus, inhibitors of RPA70N may amplify the levels of DNA damage caused in cancer cells by a wide variety of genotoxic agents [74]. Apart from potentially acting as anti-cancer agents, peptide inhibitors of RPA70N may also be used to further understand the role of RPA in the cell cycle and DNA repair and may serve as valuable tools for studying RPA biology.

Figure 27: Structural and literature analysis of the RPA70N OB-fold domain. (A) Structure of RPA70N in complex with the p53-derived peptide. (B) Binding logo of RPA70N obtained from phage display. (C) The interaction surface between RPA70N and the p53 peptide. The key interactions are mediated by Asp, Leu and Met residues (highlighted in yellow). These residues are conserved in the binding motif obtained from phage display with an additional possibility of Glu instead of Leu at position 12 of the motif. The phage-derived binding motif also contains an aromatic residue (Trp, Tyr) at position 17 which is not present in the p53 peptide.
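To make the idea of a binding logo more concrete, the sketch below computes per-position residue frequencies from a set of aligned, unique peptides and derives a simple consensus string. It is a minimal illustration, not the procedure used to generate the logos in this thesis: the 0.5 frequency threshold, the function names and the four short aligned sequences are all assumptions introduced here.

```python
from collections import Counter


def position_frequencies(aligned):
    """Return one {residue: frequency} dictionary per alignment column."""
    length = len(aligned[0])
    if any(len(peptide) != length for peptide in aligned):
        raise ValueError("all aligned peptides must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(peptide[i] for peptide in aligned)
        matrix.append({aa: n / len(aligned) for aa, n in counts.items()})
    return matrix


def consensus(matrix, threshold=0.5):
    """Report the most frequent residue per column, or 'x' below threshold."""
    letters = []
    for column in matrix:
        residue, frequency = max(column.items(), key=lambda item: item[1])
        letters.append(residue if frequency >= threshold else "x")
    return "".join(letters)


# Hypothetical aligned fragments, invented for illustration only.
aligned_unique = ["DALM", "DSLM", "DTEM", "DGLW"]

matrix = position_frequencies(aligned_unique)
print(consensus(matrix))
```

Using only unique sequences, as assumed here, weights every distinct peptide equally; counting every sequenced clone instead would weight peptides by how often they were recovered.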

3.3.7.3 Ligand binding domain of nuclear receptors: Nuclear receptors (NR) are a group of transcription factors that regulate the expression of target genes in response to binding of small molecules such as steroids, hormones and metabolites [80]. They are an integral part of several key cellular signalling pathways regulating processes such as homeostasis and proliferation. Nuclear receptors contain two structured domains: a DNA binding domain (DBD) that specifically recognizes hormone response elements (HRE) on the DNA and a ligand binding domain (LBD) that binds to a small-molecule ligand and interacts with co-regulators. Structurally, the LBD forms a globular structure composed of a three-layered α-helical sandwich that contains 12 alpha helices. The C-terminal helix 12 is highly mobile, and is stabilized upon ligand binding into a position that completes the recognition surface for binding of co-regulators. Upon binding to the LBD, co-regulators may recruit RNA polymerase to initiate transcription of downstream genes (co-activators) or the HDAC complex to silence gene expression (co-repressors) [15]. Most of the co-activators and co-repressors identified to date interact primarily with nuclear receptor activation function 2 (AF2), which is located on the LBD itself. These include p160 steroid co-activator family members (SRC), p300 and related integrator proteins, the TRAP mediator complex, and various other co-activators. The region responsible for binding in co-activators (called the NR box) consists of a short alpha-helical LxxLL motif. Although this motif is necessary to mediate the binding of these proteins to liganded NRs, amino acids flanking the core motif dictate the specificity of interaction. The three conserved leucines align on the face of the α-helix that packs against the hydrophobic channel of the LBD surface. The binding of the co-repressors occurs at the same pocket as the co-activators but in a different mode. In the unliganded state, the C-terminal helix 12 arranges itself along helix 3 to form the peptide binding pocket. The NR box responsible for binding to co-repressors such as NCoR and SMRT consists of the Lxx[I/H]IxxxL motif. In the co-repressor motif, the hydrophobic residues pack against the hydrophobic channel of the LBD surface [81].
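The statement that the three leucines of LxxLL lie on one face of the helix can be checked with a simple helical-wheel calculation. The sketch below assumes an ideal α-helix with 3.6 residues per turn, that is, a rotation of 100° per residue, and reports the smallest arc on the wheel that contains all conserved positions. The function names are introduced here for illustration only.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues


def wheel_angles(offsets):
    """Angular position on the helical wheel for each residue offset."""
    return [(offset * DEGREES_PER_RESIDUE) % 360.0 for offset in offsets]


def angular_spread(offsets):
    """Smallest arc (in degrees) that contains all the given positions."""
    angles = sorted(wheel_angles(offsets))
    if len(angles) < 2:
        return 0.0
    n = len(angles)
    # Gap from each position to the next one around the circle.
    gaps = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
    # The occupied arc is the full circle minus the largest empty gap.
    return 360.0 - max(gaps)


# LxxLL: leucines at relative positions 0, 3 and 4.
print(wheel_angles((0, 3, 4)), angular_spread((0, 3, 4)))
```

Under this idealized geometry the three leucines fall within an arc of 100°, which is consistent with their forming a single hydrophobic face. The same function can be applied to other helical motifs discussed in this chapter, such as the positions 0, 4 and 7 of FxxxWxxL.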

Out of the four NR LBDs present in the initial dataset, only the LBD of the bile acid receptor (NR1H4) could be successfully expressed and purified. The bile acid receptor binds to its natural ligand, bile acid, and is involved in the metabolism of bile acid in liver tissue [82]. The phage selections were done in the absence of the bile acid ligand, and Figure 28 shows the peptides obtained from the phage display experiments. To rationalize the consensus motif obtained from phage display, I analyzed the structure of a nuclear receptor in complex with a co-repressor peptide. Since no structure of the bile acid receptor bound to its co-repressor (SMRT or NCoR) is available, the structure of the closest nuclear receptor, PPARα, in complex with the SMRT peptide was used for analysis [81]. Based on sequence alignment of the SMRT and phage-display peptides, the key hydrophobic residues responsible for binding of the SMRT peptide are replaced by bulkier hydrophobic and aromatic residues in the phage-derived peptides. This has not been observed previously and may represent a novel mode of peptide binding to the LBD of NR1H4. Further biochemical and structural analyses are required to confirm the binding of these peptides. Nonetheless, these might represent novel binding partners for the bile acid receptor.

Figure 28: Structure and literature analysis of the NR1H4 ligand binding domain. (A) The structure of PPARα LBD in complex with the SMRT peptide (PDB ID: 1KKQ). (B) The binding motif for the NR1H4-LBD obtained from phage display. (C) The interaction surface between PPARα and the SMRT peptide. The key interactions between PPARα and the SMRT peptide are mediated by two Leu and Ile residues (highlighted in yellow) that stack against the hydrophobic surface of PPARα. Based on sequence alignment to the SMRT peptide, the hydrophobic and aromatic residues conserved in the phage display results correspond to the key residues in the interaction.

Nuclear receptors are amongst the most important classes of drug targets for various diseases including breast and prostate cancer. Often, small-molecule drugs bind to the ligand binding pocket and de-activate the NR-LBD. Other peptide or small-molecule inhibitors affect binding of the NR-LBD to its co-regulators. NR-LBD/co-regulator binding is critical for the downstream activity of nuclear receptors, and inhibition of this interaction results in disruption of nuclear receptor action [15]. The peptides identified in this study bind to the bile acid receptor in its unliganded form. Once confirmed, these peptides may be used to modulate the activity of this important class of drug targets.

3.3.7.4 WD40 domains: WD40 domains consist of sequence repeats of 44-60 residues, each forming a four-stranded anti-parallel beta sheet, which come together to form a beta-propeller fold [88, 89]. The most common of these domains are seven-bladed beta-propellers that contain seven WD40 repeats. WD40 domains are among the most common domain types across eukaryotic proteomes and act as scaffolds for several key cellular pathways. In this study, I targeted the WD40-repeat containing protein 5 (WDR5). WDR5 is an adaptor protein that forms part of the Set1 family of methyltransferase complexes. It binds to unmodified histone H3 and members of the Set1 family of methyltransferases via the top of the beta-propeller [90]. Residues from a small region located close to the MLL catalytic site (called the Win-motif) bind in a 3₁₀-helical conformation within the central depression of the beta-propeller, and the residues that follow extend away from this cleft along the surface of WDR5. From phage display, I obtained a clear consensus motif “R[T/W]xxW” with a strong preference for Arg at the central position. The structure of WDR5 in complex with the MLL4-derived peptide provides me with confidence that the phage-derived peptides bind to the interaction surface on top of the beta propeller (Figure 29) [91]. WDR5 is an important target for regulating histone modifications, specifically histone methylation. The interaction of WDR5 and histone H3 is critical for this event, and its inhibition may potentially block the methylation of histones by the Set1 family of methyltransferases [92]. Hence the peptides obtained in this study may be used to inhibit the function of the Set1 family of methyltransferases and may serve as intracellular probes for studying the biology of this important class of enzymes.

Figure 29: The structure and literature analysis of WDR5. (A) The structure of WDR5 with the MLL4 peptide (PDB ID: 3UVM). (B) The binding motif of WDR5 domain obtained from phage display. (C) The binding surface of WDR5 and the MLL4 peptide with the key residues involved in WDR5 interaction highlighted in yellow. These residues are conserved in the phage-derived binding motif.

3.3.7.5 TRF homology domain: The TRF homology (TRFH) domain is found in proteins such as TERF1 and TERF2, which are part of the shelterin complex [106]. TERF1 and TERF2 are homologous proteins that associate with the full length of the double-stranded portion of the telomere. The centrally located TERF1 TRFH domain is a known protein interaction domain that mediates the recruitment of telomere binding proteins such as tankyrase and TIN2. Structurally, the TRFH domain consists of nine α-helices forming an elongated structure. TERF1 recognizes TIN2 and PinX1 using a conserved interaction surface on its TRFH domain. The N terminus of the TIN2 peptide adopts an extended conformation stabilized by an extensive intermolecular hydrogen-bonding network, with key interactions made by Leu, Phe and Pro residues. The binding motif proposed based on structural studies is F/YxLxP (Figure 30) [107]. In this study, I obtained a limited number of peptides against the TRFH domain of TERF1 (Figure 30). These sequences did not show any strong similarity to known natural peptide ligands. More sequences are required to identify the binding preference of the TRFH domain, and the peptides obtained here should be investigated further to define this preference in detail.

Figure 30: Structural and literature review of the TRFH domain of TERF1. (A) The structure of the TERF1-TRFH domain with the TIN2 peptide (PDB ID: 3BQO). (B) The peptide sequences obtained from phage display against the TERF1-TRFH domain. (C) The binding surface of TERF1-TRFH domain and the TIN2 peptide. The key interactions are mediated by Leu and Phe residues on TIN2 (highlighted in yellow). The number of sequences obtained from phage display is limited and hence it is difficult to obtain a clear consensus motif for the TERF1-TRFH domain. However, the peptides obtained in this study do not contain the F/YxLxP motif described by Chen et al. 2008.

3.3.8 Miscellaneous: Apart from the aforementioned examples, I was also able to generate peptides against a handful of domain families that are involved in different cellular pathways. These include the SWIB/MDM2 domain family involved in apoptosis, eIF4E involved in translation initiation, the HORMA domain involved in cell cycle regulation and ubiquitin, which is involved in the ubiquitin-proteasomal degradation system.

3.3.8.1 SWIB/MDM2: The SWIB/MDM2 family of domains is found in the Mdm2 family of oncoproteins, which are known regulators of p53, and in the SWI/SNF family of ATP-dependent chromatin-remodelling proteins [51]. In MDM2 proteins, the SWIB/MDM2 domain binds to the transactivation domain of p53, allowing the degradation of p53. Structurally, the SWIB/MDM2 domain contains six beta sheets and four alpha helices. The binding surface is a hydrophobic cleft formed by two alpha helices where the alpha-helical transactivation domain of p53 binds. The key interaction between SWIB/MDM2 and p53 is mediated by a triad of residues on p53, Phe, Trp and Leu, that insert into the hydrophobic pocket on MDM2 [52]. In this study, I identified peptides against the SWIB/MDM2 domain of MDM4 (MDMX), a member of the MDM2 family of proteins that is essential for regulating p53. A high enrichment ratio was obtained for the SWIB/MDM2 domain during phage selections. The binding preference obtained showed a clear “FxxxWxxL” motif, which correlates with previous structural and biochemical analyses. This suggests that the peptides generated in this study bind to the same interaction surface that binds the p53 peptide (Figure 31).

Figure 31: Structure and literature analysis of MDM4. (A) The structure of MDM4 in complex with the peptide from p53 (PDB ID: 3DAB). (B) The binding motif obtained for MDM4. (C) The interaction surface of MDM4 with p53 peptide. The key interactions between the two domains are mediated by triad of residues: Phe, Trp and Leu on the binding surface (shown in yellow). These residues are conserved in the consensus motif. Interestingly, the binding motif suggests that Leu at the third triad position can also accommodate a Met residue.

Disruption of the p53 tumor suppressor pathway due to mutations in the p53 gene is found in approximately 50% of all cancers. Several genome-wide functional genomics studies have revealed an increased MDM4 copy number in 65% of human retinoblastomas [53]. Ectopic expression of MDM4 in mouse Rb-null p107-null retinal progenitor cells leads to a reduction in p53-mediated apoptosis and a clonal expansion of tumor cells. Conversely, colony assays have shown that knocking down MDMX blocks proliferation of MCF-7 cells unless p53 levels are simultaneously decreased. Nutlin-3, a dual inhibitor of MDM2 and MDM4, reduces the MDM2/4-p53 interaction and efficiently kills retinoblastoma cells [53]. Other small-molecule and peptide-based MDM4 inhibitors have been identified and tested for anti-cancer activities. The peptides generated in this study are predicted to bind to the same p53 binding site on MDM4 and hence may have a similar effect on the p53-mediated apoptosis pathway.

3.3.8.2 HORMA domain: The HORMA family of domains is found in several eukaryotic proteins such as Mad2, a protein that is involved in the mitotic checkpoint [60]. The core of the HORMA fold contains three α-helices sandwiched between a six-stranded β-sheet and an irregular β-hairpin. In the Mad2 protein, the domain exists in two conformationally distinct forms: an open form (O-Mad2) and a closed form (C-Mad2). Both recombinant and endogenous Mad2 are predominantly folded as O-Mad2, while C-Mad2 forms upon binding to its interaction partners: Mad1, p31comet and Cdc20.

Figure 32: Structure and literature analysis of HORMA domain. (A) The structure of HORMA domain of Mad2A in complex with the phage-derived Mad2 binding peptide (MBP1; PDB ID: 1KLQ). (B) The binding logo obtained from phage display done previously (bottom; Luo et al. 2002 [61]) and in this study (top). The binding motifs are similar to each other with similar conserved positions. (C) The interaction surface of MBP1 and Mad2A. The key residues involved in the interaction (highlighted in yellow) are Trp(2); Tyr(3); Pro(7); Pro(8); Gln(9) and Arg(10). In the binding logo from this study, Gly is conserved before the highly conserved hydrophobic region and Lys/Arg after the poly-proline region. This can be attributed to the longer library used in this study, which may have produced a longer binding motif.

Phage display has been used previously to generate peptides against Mad2 [61]. The binding preference obtained is shown in Figure 32. The core of the motif consists of two hydrophobic residues, a basic residue, and a third hydrophobic residue (Figure 32). This core motif is generally followed by a proline-rich sequence. Known interaction partners of Mad2A, Cdc20 and Mad1, contain a consensus similar to that found in the phage display results. In this study, I obtained a binding preference similar to that observed previously, suggesting that the peptides obtained here will bind to the same binding site as described earlier.

3.3.8.3 eIF4E: The eukaryotic translation initiation factor eIF4E is a protein that exists as part of the translation pre-initiation complex (eIF4F) involved in directing ribosomes to the cap structure of mRNAs [100]. eIF4E has a curved eight-stranded antiparallel β-sheet with three helices forming the convex face and three smaller helices inserted in connecting loops, and binds directly to the mRNA cap. The m7G of the mRNA cap binds to a stack of tryptophan residues on the concave face. eIF4E is recruited to the eIF4F complex via its interaction partner eIF4G by binding to a conserved binding site. The eIF4E peptide binding site is located in a region encompassing one edge of the β-sheet, the adjacent helix α2 and several regions of non-regular secondary structure on the convex surface of eIF4E. The peptide from eIF4G forms a helical structure and its consensus motif is YxxxxLΦ [101]. eIF4E also binds to the RING domain of the Z protein found in arenavirus and of the tumor suppressor protein PML. A recent study has reported a structural and biochemical characterization of this interaction, which is mediated by a site that is distinct from the binding site of the known YxxxxLΦ peptide motif [102].

Figure 33: The structure and literature analysis of eIF4E. (A) The structure of eIF4E with the eIF4G peptide (PDB ID: 3UVM). (B) The sequences of peptides obtained from phage display against eIF4E. (C) The binding surface of eIF4E and the eIF4G peptide.

In this study, I obtained a limited set of peptides for eIF4E (Figure 33). A handful of these peptides contained sequences similar to the YxxxxLΦ motif. However, a subset of phage-derived eIF4E binders do not show the YxxxxLΦ motif (Figure 33). This observation is critical as these peptides may bind to eIF4E at a site different from the canonical peptide binding site. However, more sequences are required to confirm the binding of the phage-derived peptides to eIF4E. eIF4E is an important target for cancer therapy. eIF4E is up-regulated in a number of malignancies and over-expression of eIF4E leads to tumorigenesis in mouse models. Inhibition of eIF4E in various cancer models leads to apoptosis and a reduction in the level of the oncogenic protein Ras. Herbert et al. first reported that peptides derived from its natural binder 4E-BP1 bind to the initiation factor eIF4E and induce apoptosis in MRC-5 cells [103]. Peptides derived from eIF4G have been used to identify a small molecule (4EGI-1) that binds to eIF4E and inhibits its recruitment to the eIF4F complex [40]. When introduced inside cells, these molecules lead to the inhibition of growth of multiple cancer cell lines. Phage-derived peptides may join the growing list of inhibitors of eIF4E.

3.3.8.4 Ubiquitin: Ubiquitin is a small protein that can be attached to a range of proteins to affect their cellular fate. Ubiquitination is an important post-translational modification that directs protein recycling. A number of proteins recognize the ubiquitin tag on other proteins. One of the most common modules to bind to ubiquitin is the ubiquitin-interacting motif (UIM) [103]. The UIM is found in a number of proteins involved in endocytosis and vacuolar protein sorting including Hrs, Vps27p, Stam1, and Eps15. The UIM consists of an amphipathic α-helical structure with hydrophobic core sequences composed of alternating large and small residues (Leu-Ala-Leu-Ala-Leu) that are flanked on both sides by patches of acidic residues. Sequence analyses of known UIMs have been used to define a more general 15-residue UIM motif: eeexΦxxAΦx[e/Φ]Saxe; where x is a helix-favoring residue, a is a bulky hydrophobic or polar residue with considerable aliphatic content, e is a negatively charged residue and Φ is a hydrophobic residue [104]. Structurally, ubiquitin contains three and one-half turns of α-helix, a short 3₁₀-helix and a five-stranded β-sheet. The five-stranded β-sheet of ubiquitin constitutes the principal interaction surface for the Vps27 UIM helix. The helix binds in an antiparallel orientation relative to the C-terminal β-sheet, and interacts with β-sheet 4 & 5 and the loops between β-sheet 1 & 2 and β-sheet 4 & 5. The UIM forms a left-handed, antiparallel helix with the hydrophobic face of the amphipathic helix facing the ubiquitin molecule [103].

In this study, I obtained five peptides (four unique peptides) against ubiquitin (Figure 34). No consensus motif was obtained as the number of sequences was low. I also observed an abundance of tryptophan residues at different positions. The clonal ELISA results confirm that the peptides obtained here are specific for GST-tagged ubiquitin compared to GST alone. More sequences may be required to obtain a clear binding preference. Biochemical analyses are also required to confirm the binding of the phage-derived peptides. Nevertheless, if confirmed, these peptides may serve as potential reagents to recognize ubiquitin. In recent years, there have been advances in using UIM chains to develop intracellular reagents to detect specific ubiquitin chains on proteins [105]. The peptides obtained here, if effective, may potentially help in developing better intracellular reagents.
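The two summary statistics used above, the number of unique peptides among the sequenced clones and the abundance of tryptophan, can be computed with a few lines of Python. The sketch below is illustrative only: the function name is introduced here, tryptophan is counted over unique peptides so that repeated clones are not counted twice (an assumption), and the short clone sequences are invented placeholders rather than the peptides shown in Figure 34.

```python
from collections import Counter


def summarize_clones(clones):
    """Count total and unique clones and the Trp fraction in unique peptides."""
    unique = Counter(clones)               # sequence -> number of clones
    residues = Counter("".join(unique))    # each unique peptide counted once
    total_residues = sum(residues.values())
    return {
        "clones": len(clones),
        "unique": len(unique),
        "trp_fraction": residues["W"] / total_residues,
    }


# Hypothetical clone sequences, invented for illustration only.
clones = ["WAWG", "WAWG", "GGWA", "AAAA", "WGGG"]
print(summarize_clones(clones))
```

Whether a given tryptophan fraction counts as an enrichment depends on the composition of the naive library, which is not modelled in this sketch.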

Figure 34: Structure and literature analysis of ubiquitin. (A) The structure of ubiquitin with the Vps27p UIM (PDB ID: 1Q0W). (B) The peptides obtained from phage display against ubiquitin. (C) The binding surface of ubiquitin and Vps27p shows the key residues required for interaction with ubiquitin. The ubiquitin interaction motif can be divided into two regions: a central amphipathic helix with conserved Ala and Ser residues (highlighted in yellow) and an N-terminal negatively charged helix (highlighted in yellow). The phage-derived peptides do not fall into the canonical UIM and may represent a novel mode of binding to ubiquitin.

3.4 Summary

The second aim of this study was to generate peptide binders against the shortlisted proteins using phage display. To this end, I selected all 66 domains that were identified by my computational analysis. These domains were cloned into a pGEX expression vector and expressed for purification. 44 of the 66 domains were successfully purified using an established GST-purification protocol. SDS-PAGE gels were used to identify the step at which purification failed for the remaining 22 domains. All 22 of these domains were found to be insoluble and hence could not be recovered from the cell lysate. Different growth and lysis conditions may be required to solubilise these domains and increase protein yield.

For each of the purified domains, I used a 16-amino-acid random library to screen for peptide binders. The selection procedure was optimized by incorporating pre-selection in GST wells and in-solution GST negative selection. These modifications helped remove non-specific peptide binders and significantly reduced the background noise in phage selections.

From my phage display screen, I obtained specific peptides against 27 of the 44 domains. These domains belonged to different structural families and exhibited divergent binding preferences. This highlights the power of phage display to identify distinct binding preferences using a single random library. Further, I was able to generate position weight matrices (PWMs) for 22 of the 27 domains.

Extensive structural analysis and a literature survey were performed to examine the peptides obtained from phage display. For 20 of the 27 domains, I obtained peptides that resemble the natural peptide partners of these domains. This gives me confidence that these peptides may block the binding of the endogenous partners of these domains, thereby perturbing their cellular function. For these cases, I have predicted the phenotypic effects of peptide-mediated inhibition and listed the potential uses of these peptides.

For the remaining seven domains, the peptides obtained did not resemble the known natural peptide partner. In such cases, based on current evidence, I have predicted the potential binding surface and binding mode. For the CAP/Gly domain, I generated a structural model for binding of the phage-derived peptide. The structural model is currently being tested by Dr. Yufeng Tong, the principal investigator at the Structural Genomics Consortium (SGC), Toronto. For domains where I was not able to build structure-based models, I have suggested potential binding modes. These domains include the PDCD6 penta-EF hand domain and the clathrin terminal domain. For seven domains, the peptide sequences obtained were insufficient to generate a high-confidence binding motif. These include eIF4E, the TRFH domain of TERF1 and ubiquitin. Further experiments are required to predict the binding mode of these peptides. Nonetheless, these results do suggest a non-canonical peptide binding mode for these known peptide-binding domains. Once verified, these binding modes may uncover novel protein partners and biological roles for these domains.
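The position weight matrices mentioned in this summary can be illustrated with a minimal sketch. It is not the procedure used in this thesis: it assumes the peptides are already aligned to a common length, uses a uniform background over the 20 standard amino acids, and adds a pseudocount of one. The peptide sequences and the function names are hypothetical and serve only as an example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(peptides, pseudocount=1.0):
    """Return one dict per position mapping residue -> log2-odds score.

    Assumes pre-aligned peptides of equal length and a uniform background.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def consensus(pwm):
    """Return the highest-scoring residue at each position."""
    return "".join(max(column, key=column.get) for column in pwm)


# Hypothetical aligned peptides, invented for illustration only.
peptides = ["RSFSEP", "RSYSEP", "RTFSAP", "KSFSEP"]
pwm = position_weight_matrix(peptides)
print(consensus(pwm))
```

With only a handful of clones per domain, the pseudocount keeps unobserved residues from receiving a score of minus infinity, which is one reason a larger set of sequences gives a more reliable matrix.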


4 CONCLUSIONS


4.1 Summary of work: Peptide recognition domains (PRDs) play key roles in cellular pathways regulating homeostasis and cellular signalling. Such domains are frequently misregulated in diseases including cancer. Considerable progress has been made in developing specific small-molecule and peptide-based reagents against a limited number of PRD families. The central aim of this study was to use phage display to generate peptide probes against a diverse set of cancer-related PRDs.

In Chapter 2, I covered my work in identifying cancer-relevant peptide recognition domains. To this end, I focused on a list of proteins related to ovarian cancer. These candidate genes were identified by our collaborators using whole-genome RNAi screens on 15 different ovarian cancer cell lines. I developed a computational methodology to identify target domains present on these candidate genes that share high sequence similarity with known PRDs. A set of known PRDs was obtained from online databases such as PepX and DOMINO. The list of potential PRDs identified by my computational pipeline was manually curated and analyzed. Based on my analysis, I selected 66 domains as targets for further work.

In Chapter 3, I described the phage display pipeline used to identify peptides against each of the 66 target domains. First, using the standard GST purification method, I successfully purified 44 of the 66 domains. Second, I used a 16-amino-acid random library to obtain peptides against 27 of the 44 purified domains. Third, I validated the phage-derived peptides using an extensive structural analysis and literature review. Based on this analysis, I was able to predict the peptide binding mode for a large proportion of the 27 domains. For the domains where accurate models could not be generated, I have listed the future experiments required to fully elucidate the mechanism of binding. For each of the 27 domains, I have also included potential applications of the phage-derived peptides.

Based on the results obtained thus far, I have been able to successfully generate binding preferences for 22 known PRDs that belong to 15 different protein families. For 11 of these protein families, this represents the first phage display study done to elucidate their binding preferences.
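As a rough illustration of the attrition summarized above, the stage-wise yields can be computed from the counts reported in this thesis. This is a minimal sketch; the stage labels and the function name are illustrative.

```python
# Counts reported in this thesis for each stage of the pipeline.
STAGES = [
    ("target domains selected", 66),
    ("domains purified", 44),
    ("domains with specific peptides", 27),
    ("domains with a position weight matrix", 22),
]


def stage_yields(stages):
    """Return (stage, count, fraction of previous stage, fraction of first stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count, count / previous, count / first))
        previous = count
    return rows


yields = stage_yields(STAGES)
for name, count, step, overall in yields:
    print(f"{name}: {count} ({step:.1%} of previous stage, {overall:.1%} overall)")
```

Roughly two thirds of the targets were purified, and about 41% of all targets (27 of 66) yielded specific peptides.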

4.2 Future experiments: The current study has yielded several promising results including novel peptide binding modes for known PRDs. However, further experiments are required to validate the results obtained from phage display.

One of the first follow-up experiments required is to perform deep sequencing on all the phage pools obtained from phage display. For each domain on which selections were done, 96 phage clones were picked for DNA sequencing and for generating peptide motifs. While for a large set of domains the sequences obtained were sufficient to accurately map the binding preference, for others, such as ubiquitin, the TRFH domain of TERF1 and eIF4E, we could not accurately predict the binding preferences. Deep sequencing may help by providing a large set of peptide sequences for each of these domains. Previous studies in the Sidhu lab have successfully used deep sequencing to map the binding preferences of a large set of synthetic PDZ domains [108]. Our group was also the first to report multiple peptide binding preferences exhibited by a subset of known PDZ domains based on results obtained from deep sequencing [109]. Such an analysis can be extended to all domains that were screened in this study.

In this study, I have identified peptides with potentially novel binding motifs. One of the most promising results was obtained for the CAP/Gly domain of DCTN1. CAP/Gly domains have been studied in the context of binding to C-terminal peptides. I was able to obtain an internal peptide against this domain. Further experimental evidence is required to elucidate the mechanism of binding of this peptide. First, I need to perform biochemical assays such as isothermal titration calorimetry (ITC) to confirm the interaction between the domain and the peptide. This would also provide an accurate measure of the binding affinity between the internal peptide and the CAP/Gly domain. Once peptide binding is confirmed, I will also require a structure of the domain in complex with the phage-derived peptide to validate my structural model. We are currently collaborating with Dr. Yufeng Tong at the Structural Genomics Consortium, Toronto, to obtain a crystal structure of the CAP/Gly domain in complex with the phage-derived peptide. The work done for CAP/Gly can be extended to other domains, such as the penta-EF hand domain of PDCD6 and the clathrin terminal domain, for which we were not able to generate structural models to explain the results obtained from phage display experiments.
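To illustrate how a deep-sequenced phage pool might be reduced to a list of candidate binders, the following minimal sketch collapses identical peptide reads and keeps those observed repeatedly. It is not the analysis used in [108] or [109]; the reads, the `min_count` threshold and the function name are hypothetical, and the reads are assumed to be already translated into peptide sequences.

```python
from collections import Counter


def collapse_reads(reads, min_count=2):
    """Count identical peptide reads and keep those seen at least min_count times.

    Returns (peptide, count) pairs sorted from most to least frequent.
    """
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    return [(pep, n) for pep, n in counts.most_common() if n >= min_count]


# Hypothetical translated reads, invented for illustration only.
reads = [
    "WDFTRLAA",
    "WDFTRLAA",
    "WEFSRLGA",
    "WDFTRLAA",
    "QQQQQQQQ",
    "WEFSRLGA",
]
enriched = collapse_reads(reads)
for peptide, n in enriched:
    print(peptide, n)
```

Peptides seen only once are dropped here as likely background. With 96 clones per domain such a filter would leave very few sequences, which is the practical argument for deeper sequencing of pools such as those for ubiquitin, TERF1 and eIF4E.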

4.3 Potential avenues for research: The current study provides several potential avenues for further investigation. One of the first steps is to extend the current study to the family members of domains for which binding preferences were successfully obtained. In the Sidhu lab, we have generated high-resolution binding preference maps of PDZ, SH3 and WW domains. As previously mentioned, we were able to generate binding preferences for 15 different protein families, for 12 of which this represents the first phage display study done. We can now potentially extend our study to include all members of these 12 protein families. These families include the ligand-binding domains of nuclear receptors, 14-3-3, Gα subunits and WD40-repeat-containing proteins. These protein families play important roles in key cellular pathways, and some (such as the ligand-binding domains of nuclear receptors and 14-3-3) have been established as important drug targets in cancer. The phage-derived binding preferences can be used to obtain high-specificity, high-affinity peptide probes against these families.

Phage display can also be used to probe other cancer-related targets. In this study, we focussed on candidate genes identified by our collaborator using whole-genome RNAi screens in ovarian cancer. Other functional genomics screens, including exome sequencing and cDNA hybridization microarrays, have been used to predict potential cancer-relevant genes. In principle, the current study can be extended to candidates obtained by such screens. The approach can also be applied to other diseases, including other cancer types. In the Sidhu lab, we have developed a high-throughput phage display methodology to screen 96 targets in a single experiment [110]. This methodology can be readily used to target members of a diverse set of protein families.

4.4 Applications of phage-derived peptides: The peptides generated here may serve as valuable tools for the scientific community. Peptide-based probes have been routinely used to structurally characterize the interactions mediated by PRDs and to identify novel interaction partners. The peptides derived here may be used to design intracellular probes for studying specific biological pathways. Finally, the phage-derived peptides may assist in the identification of small-molecule drugs against specific domains.

4.5 Final Remarks: I have presented here a systematic application of a phage display pipeline to rapidly identify peptides against a diverse set of domains. To my knowledge, this is the first successful application of phage display against such a diverse class of domains.

5 REFERENCES


[1] Pawson, T., Nash, P., Assembly of Cell Regulatory Systems Through Protein Interaction Domains. Science 2003, 300, 445–452.
[2] Kim, P.M., Sboner, A., Xia, Y., Gerstein, M., The role of disorder in interaction networks: a structural analysis. Mol Syst Biol n.d., 4, 179–179.
[3] Pawson, T., Warner, N., Oncogenic re-wiring of cellular signaling pathways. Oncogene n.d., 26, 1268–1275.
[4] Wells, J.A., McClendon, C.L., Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature 2007, 450, 1001–1009.
[5] Teyra, J., Sidhu, S.S., Kim, P.M., Elucidation of the binding preferences of peptide recognition modules: SH3 and PDZ domains. FEBS Letters n.d.
[6] Rual, J.-F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., et al., Towards a proteome-scale map of the human protein–protein interaction network. Nature 2005, 437, 1173–1178.
[7] Puntervoll, P., Linding, R., Gemünd, C., Chabanis-Davidson, S., et al., ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003, 31, 3625–3630.
[8] Jones, S., Thornton, J.M., Principles of protein-protein interactions. Proceedings of the National Academy of Sciences 1996, 93, 13–20.
[9] Ceol, A., Chatr-aryamontri, A., Santonico, E., Sacco, R., et al., DOMINO: a database of domain–peptide interactions. Nucleic Acids Res 2007, 35, D557–D560.
[10] Encinar, J.A., Fernandez-Ballester, G., Sánchez, I.E., Hurtado-Gomez, E., et al., ADAN: a database for prediction of protein–protein interaction of modular domains mediated by linear motifs. Bioinformatics 2009, 25, 2418–2424.
[11] Vanhee, P., Reumers, J., Stricher, F., Baeten, L., et al., PepX: a structural database of non-redundant protein–peptide complexes. Nucleic Acids Res 2010, 38, D545–D551.
[12] London, N., Movshovitz-Attias, D., Schueler-Furman, O., The Structural Basis of Peptide-Protein Binding Strategies. Structure 2010, 18, 188–199.
[13] Clackson, T., Wells, J., A hot spot of binding energy in a hormone-receptor interface. Science 1995, 267, 383–386.
[14] Johnston, C.A., Willard, F.S., Jezyk, M.R., Fredericks, Z., et al., Structure of Galpha(i1) bound to a GDP-selective peptide provides insight into guanine nucleotide exchange. Structure 2005, 13, 1069–1080.
[15] McKenna, N.J., Lanz, R.B., O’Malley, B.W., Nuclear Receptor Coregulators: Cellular and Molecular Biology. Endocrine Reviews 1999, 20, 321–344.
[16] Seet, B.T., Dikic, I., Zhou, M.-M., Pawson, T., Reading protein modifications with interaction domains. Nature Reviews Molecular Cell Biology 2006, 7, 473–483.
[17] Good, M.C., Zalatan, J.G., Lim, W.A., Scaffold Proteins: Hubs for Controlling the Flow of Cellular Information. Science 2011, 332, 680–686.
[18] Youle, R.J., Strasser, A., The BCL-2 protein family: opposing activities that mediate cell death. Nature Reviews Molecular Cell Biology 2008, 9, 47–59.
[19] Tanaka, S., Louie, D.C., Kant, J.A., Reed, J.C., Frequent incidence of somatic mutations in translocated BCL2 oncogenes of non-Hodgkin’s lymphomas. Blood 1992, 79, 229–237.
[20] Chittenden, T., Harrington, E.A., O’Connor, R., Remington, C., et al., Induction of apoptosis by the Bcl-2 homologue Bak. Nature 1995, 374, 733–736.
[21] Wang, J.-L., Zhang, Z.-J., Choksi, S., Shan, S., et al., Cell Permeable Bcl-2 Binding Peptides: A Chemical Approach to Apoptosis Induction in Tumor Cells. Cancer Res 2000, 60, 1498–1502.
[22] Oltersdorf, T., Elmore, S.W., Shoemaker, A.R., Armstrong, R.C., et al., An inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature 2005, 435, 677.
[23] Frank, R., The SPOT-synthesis technique: Synthetic peptide arrays on membrane supports—principles and applications. Journal of Immunological Methods 2002, 267, 13–26.

[24] Yu, H., Chen, J.K., Feng, S., Dalgarno, D.C., et al., Structural basis for the binding of proline-rich peptides to SH3 domains. Cell 1994, 76, 933–945.
[25] Filippakopoulos, P., Picaud, S., Mangos, M., Keates, T., et al., Histone Recognition and Large-Scale Structural Analysis of the Human Bromodomain Family. Cell 2012, 149, 214–231.
[26] Mok, J., Kim, P.M., Lam, H.Y.K., Piccirillo, S., et al., Deciphering Protein Kinase Specificity through Large-Scale Analysis of Yeast Phosphorylation Site Motifs. Sci Signal n.d., 3, ra12–ra12.
[27] Sidhu, S.S., Lowman, H.B., Cunningham, B.C., Wells, J.A., Phage display for selection of novel binding peptides. Meth. Enzymol. 2000, 328, 333–363.
[28] Tonikian, R., Zhang, Y., Sazinsky, S.L., Currell, B., et al., A specificity map for the PDZ domain family. PLoS Biol. 2008, 6, e239.
[29] Tonikian, R., Xin, X., Toret, C.P., Gfeller, D., et al., Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins. PLoS Biol. 2009, 7, e1000218.
[30] Tong, A.H.Y., Drees, B., Nardelli, G., Bader, G.D., et al., A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules. Science 2002, 295, 321–324.
[31] Heinis, C., Rutherford, T., Freund, S., Winter, G., Phage-encoded combinatorial chemical libraries based on bicyclic peptides. Nature Chemical Biology 2009, 5, 502–507.
[32] Bernal, F., Wade, M., Godes, M., Davis, T.N., et al., A stapled p53 helix overcomes HDMX-mediated suppression of p53. Cancer Cell 2010, 18, 411–422.
[33] Abedi, M.R., Caponigro, G., Kamb, A., Green fluorescent protein as a scaffold for intracellular presentation of peptides. Nucl. Acids Res. 1998, 26, 623–630.
[34] van de Wijngaart, D.J., Dubbink, H.J., Molier, M., de Vos, C., et al., Inhibition of functions by gelsolin FxxFF peptide delivered by transfection, cell-penetrating peptides, and lentiviral infection. Prostate 2011, 71, 241–253.
[35] Stanger, K., Steffek, M., Zhou, L., Pozniak, C.D., et al., Allosteric peptides bind a caspase zymogen and mediate caspase tetramerization. Nature Chemical Biology 2012, 8, 655–660.
[36] Zhang, Y., Appleton, B.A., Wiesmann, C., Lau, T., et al., Inhibition of Wnt signaling by Dishevelled PDZ peptides. Nat. Chem. Biol. 2009, 5, 217–219.
[37] Cook, D.J., Teves, L., Tymianski, M., Treatment of stroke with a PSD-95 inhibitor in the gyrencephalic primate brain. Nature 2012, 483, 213–217.
[38] Bach, A., Clausen, B.H., Møller, M., Vestergaard, B., et al., A high-affinity, dimeric inhibitor of PSD-95 bivalently interacts with PDZ1-2 and protects against ischemic brain damage. PNAS 2012, 109, 3317–3322.
[39] Li, L., Thomas, R.M., Suzuki, H., De Brabander, J.K., et al., A Small Molecule Smac Mimic Potentiates TRAIL- and TNFα-Mediated Cell Death. Science 2004, 305, 1471–1474.
[40] Moerke, N.J., Aktas, H., Chen, H., Cantel, S., et al., Small-Molecule Inhibition of the Interaction between the Translation Initiation Factors eIF4E and eIF4G. Cell 2007, 128, 257–267.
[41] Marcotte, R., Brown, K.R., Suarez, F., Sayad, A., et al., Essential Gene Profiles in Breast, Pancreatic, and Ovarian Cancer Cells. Cancer Discovery 2012, 2, 172–189.
[42] Luo, B., Cheung, H.W., Subramanian, A., Sharifnia, T., et al., Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A 2008, 105, 20380–20385.
[43] Hooda, Y., Kim, P.M., Computational structural analysis of protein interactions and networks. PROTEOMICS 2012, 12, 1697–1705.
[44] Sidhu, S.S., Phage Display In Biotechnology And Drug Discovery, CRC Press, 2005.
[45] Tonikian, R., Zhang, Y., Boone, C., Sidhu, S.S., Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries. Nat. Protocols 2007, 2, 1368–1386.
[46] Liu, Q., Berry, D., Nash, P., Pawson, T., et al., Structural Basis for Specific Binding of the Gads SH3 Domain to an RxxK Motif-Containing SLP-76 Peptide: A Novel Mode of Peptide Recognition. Molecular Cell 2003, 11, 471–481.

[47] Sparks, A.B., Rider, J.E., Hoffman, N.G., Fowlkes, D.M., et al., Distinct ligand preferences of Src homology 3 domains from Src, Yes, Abl, Cortactin, p53bp2, PLCgamma, Crk, and Grb2. PNAS 1996, 93, 1540–1544.
[48] Penkert, R.R., DiVittorio, H.M., Prehoda, K.E., Internal Recognition Through PDZ Domain Plasticity in the Par-6 - Pals1 Complex. Nat Struct Mol Biol 2004, 11, 1122–1127.
[49] Rapali, P., Szenes, Á., Radnai, L., Bakos, A., et al., DYNLL/LC8: a light chain subunit of the dynein motor complex and beyond. FEBS Journal 2011, 278, 2980–2996.
[50] Rapali, P., Radnai, L., Süveges, D., Harmat, V., et al., Directed Evolution Reveals the Binding Motif Preference of the LC8/DYNLL Hub Protein and Predicts Large Numbers of Novel Binders in the Human Proteome. PLoS ONE 2011, 6, e18818.
[51] Bennett-Lovsey, R., Hart, S.E., Shirai, H., Mizuguchi, K., The SWIB and the MDM2 domains are homologous and share a common fold. Bioinformatics 2002, 18, 626–630.
[52] Pazgier, M., Liu, M., Zou, G., Yuan, W., et al., Structural Basis for High-Affinity Peptide Inhibition of P53 Interactions with MDM2 and MDMX. PNAS 2009, 106, 4665–4670.
[53] Hu, B., Gilkes, D.M., Chen, J., Efficient P53 Activation and Apoptosis by Simultaneous Disruption of Binding to MDM2 and MDMX. Cancer Res 2007, 67, 8810–8817.
[54] Ja, W.W., Adhikari, A., Austin, R.J., Sprang, S.R., Roberts, R.W., A peptide core motif for binding to heterotrimeric G protein alpha subunits. J. Biol. Chem. 2005, 280, 32057–32060.
[55] Prévost, G.P., Lonchampt, M.O., Holbeck, S., Attoub, S., et al., Anticancer Activity of BIM-46174, a New Inhibitor of the Heterotrimeric Gα/Gβγ Protein Complex. Cancer Res 2006, 66, 9227–9234.
[56] Hermeking, H., The 14-3-3 cancer connection. Nat Rev Cancer 2003, 3, 931–943.
[57] Muslin, A.J., Tanner, J.W., Allen, P.M., Shaw, A.S., Interaction of 14-3-3 with Signaling Proteins Is Mediated by the Recognition of Phosphoserine. Cell 1996, 84, 889–897.
[58] Masters, S.C., Pederson, K.J., Zhang, L., Barbieri, J.T., Fu, H., Interaction of 14-3-3 with a Nonphosphorylated Protein Ligand, Exoenzyme S of Pseudomonas aeruginosa. Biochemistry 1999, 38, 5216–5221.
[59] Wang, B., Yang, H., Liu, Y.-C., Jelinek, T., et al., Isolation of High-Affinity Peptide Antagonists of 14-3-3 Proteins by Phage Display. Biochemistry 1999, 38, 12499–12504.
[60] Mapelli, M., Massimiliano, L., Santaguida, S., Musacchio, A., The Mad2 Conformational Dimer: Structure and Implications for the Spindle Assembly Checkpoint. Cell 2007, 131, 730–743.
[61] Luo, X., Tang, Z., Rizo, J., Yu, H., The Mad2 Spindle Checkpoint Protein Undergoes Similar Major Conformational Changes Upon Binding to Either Mad1 or Cdc20. Molecular Cell 2002, 9, 59–71.
[62] Gulbis, J.M., Kelman, Z., Hurwitz, J., O’Donnell, M., Kuriyan, J., Structure of the C-terminal region of p21(WAF1/CIP1) complexed with human PCNA. Cell 1996, 87, 297–306.
[63] Meslet-Cladiére, L., Norais, C., Kuhn, J., Briffotaux, J., et al., A Novel Proteomic Approach Identifies New Interaction Partners for Proliferating Cell Nuclear Antigen. Journal of Molecular Biology 2007, 372, 1137–1148.
[64] Xu, H., Zhang, P., Liu, L., Lee, M.Y.W.T., A Novel PCNA-Binding Motif Identified by the Panning of a Random Peptide Display Library. Biochemistry 2001, 40, 4512–4520.
[65] Hishiki, A., Hashimoto, H., Hanafusa, T., Kamei, K., et al., Structural Basis for Novel Interactions between Human Translesion Synthesis Polymerases and Proliferating Cell Nuclear Antigen. J. Biol. Chem. 2009, 284, 10552–10560.
[66] Izard, T., Evans, G., Borgon, R.A., Rush, C.L., et al., Vinculin activation by talin through helical bundle conversion. Nature 2003, 427, 171–175.
[67] Adey, N.B., Kay, B.K., Isolation of peptides from phage-displayed random peptide libraries that interact with the talin-binding domain of vinculin. Biochem J 1997, 324, 523–528.
[68] Gingras, A.R., Ziegler, W.H., Frank, R., Barsukov, I.L., et al., Mapping and Consensus Sequence Identification for Multiple Vinculin Binding Sites within the Talin Rod. J. Biol. Chem. 2005, 280, 37217–37224.
[69] Bayliss, R., Littlewood, T., Stewart, M., Structural basis for the interaction between FxFG nucleoporin repeats and importin-beta in nuclear trafficking. Cell 2000, 102, 99–108.

[70] Ambrus, G., Whitby, L.R., Singer, E.L., Trott, O., et al., Small molecule peptidomimetic inhibitors of importin α/β mediated nuclear transport. Bioorg. Med. Chem. 2010, 18, 7611–7620.
[71] Arcus, V., OB-fold domains: a snapshot of the evolution of sequence, structure and function. Current Opinion in Structural Biology 2002, 12, 794–801.
[72] Bochkareva, E., Kaustov, L., Ayed, A., Yi, G.-S., et al., Single-stranded DNA mimicry in the p53 transactivation domain interaction with replication protein A. PNAS 2005, 102, 15412–15417.
[73] Anciano Granadillo, V.J., Earley, J.N., Shuck, S.C., Georgiadis, M.M., et al., Targeting the OB-Folds of Replication Protein A with Small Molecules. Journal of Nucleic Acids 2010, 2010, 1–11.
[74] Glanzer, J.G., Liu, S., Oakley, G.G., Small molecule inhibitor of the RPA70 N-terminal protein interaction domain discovered using in silico and in vitro methods. Bioorganic & Medicinal Chemistry 2011, 19, 2589–2595.
[75] Suyama, M., Doerks, T., Braun, I.C., Sattler, M., et al., Prediction of structural domains of TAP reveals details of its interaction with p15 and nucleoporins. EMBO reports 2000, 1, 53–58.
[76] Zolotukhin, A.S., Michalowski, D., Smulevitch, S., Felber, B.K., Retroviral constitutive transport element evolved from cellular TAP(NXF1)-binding sequences. J. Virol. 2001, 75, 5567–5575.
[77] Grant, R.P., Neuhaus, D., Stewart, M., Structural basis for the interaction between the Tap/NXF1 UBA domain and FG nucleoporins at 1 Å resolution. J. Mol. Biol. 2003, 326, 849–858.
[78] McCullough, J., Fisher, R.D., Whitby, F.G., Sundquist, W.I., Hill, C.P., ALIX-CHMP4 interactions in the human ESCRT pathway. Proc Natl Acad Sci U S A 2008, 105, 7687–7691.
[79] Sette, P., Mu, R., Dussupt, V., Jiang, J., et al., The Phe105 loop of Alix Bro1 domain plays a key role in HIV-1 release. Structure 2011, 19, 1485–1495.
[80] Aranda, A., Pascual, A., Nuclear Hormone Receptors and Gene Expression. Physiol Rev 2001, 81, 1269–1304.
[81] Xu, H.E., Stanley, T.B., Montana, V.G., Lambert, M.H., et al., Structural basis for antagonist-mediated recruitment of nuclear co-repressors by PPARα. Nature 2002, 415, 813–817.
[82] Makishima, M., Okamoto, A.Y., Repa, J.J., Tu, H., et al., Identification of a nuclear receptor for bile acids. Science 1999, 284, 1362–1365.
[83] Maki, M., Kitaura, Y., Satoh, H., Ohkouchi, S., Shibata, H., Structures, functions and molecular evolution of the penta-EF-hand Ca2+-binding proteins. Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics 2002, 1600, 51–60.
[84] Todd, B., Moore, D., Deivanayagam, C.C., Lin, G., et al., A Structural Model for the Inhibition of Calpain by Calpastatin: Crystal Structures of the Native Domain VI of Calpain and its Complexes with Calpastatin Peptide and a Small Molecule Inhibitor. Journal of Molecular Biology 2003, 328, 131–146.
[85] Shibata, H., Suzuki, H., Kakiuchi, T., Inuzuka, T., et al., Identification of Alix-Type and Non-Alix-Type ALG-2-Binding Sites in Human 3 DIFFERENTIAL BINDING TO AN ALTERNATIVELY SPLICED ISOFORM AND AMINO ACID-SUBSTITUTED MUTANTS. J. Biol. Chem. 2008, 283, 9623–9632.
[86] Suzuki, H., Kawasaki, M., Inuzuka, T., Okumura, M., et al., Structural basis for Ca2+-dependent formation of ALG-2/Alix peptide complex: Ca2+/EF3-driven arginine switch mechanism. Structure 2008, 16, 1562–1573.
[87] Høj, B.R., la Cour, J.M., Mollerup, J., Berchtold, M.W., ALG-2 knockdown in HeLa cells results in G2/M cell cycle phase accumulation and cell death. Biochemical and Biophysical Research Communications 2009, 378, 145–148.
[88] Stirnimann, C.U., Petsalaki, E., Russell, R.B., Müller, C.W., WD40 proteins propel cellular networks. Trends Biochem. Sci. 2010, 35, 565–574.
[89] Xu, C., Min, J., Structure and function of WD40 domain proteins. Protein Cell 2011, 2, 202–214.
[90] Patel, A., Dharmarajan, V., Cosgrove, M.S., Structure of WDR5 bound to mixed lineage leukemia protein-1 peptide. J. Biol. Chem. 2008, 283, 32158–32161.

[91] Zhang, P., Lee, H., Brunzelle, J.S., Couture, J.-F., The plasticity of WDR5 peptide-binding cleft enables the binding of the SET1 family of histone methyltransferases. Nucleic Acids Res 2012, 40, 4237–4246.
[92] Karatas, H., Townsend, E.C., Bernard, D., Dou, Y., Wang, S., Analysis of the Binding of Mixed Lineage Leukemia 1 (MLL1) and Histone 3 Peptides to WD Repeat Domain 5 (WDR5) for the Design of Inhibitors of the MLL1−WDR5 Interaction. J. Med. Chem. 2010, 53, 5179–5185.
[93] Lemmon, S.K., Traub, L.M., Getting in Touch with the Clathrin Terminal Domain. Traffic 2012, 13, 511–519.
[94] Haar, E. ter, Harrison, S.C., Kirchhausen, T., Peptide-in-groove interactions link target proteins to the β-propeller of clathrin. PNAS 2000, 97, 1096–1100.
[95] Miele, A.E., Watson, P.J., Evans, P.R., Traub, L.M., Owen, D.J., Two distinct interaction motifs in amphiphysin bind two independent sites on the clathrin terminal domain beta-propeller. Nat. Struct. Mol. Biol. 2004, 11, 242–248.
[96] Kang, D.S., Kern, R.C., Puthenveedu, M.A., Zastrow, M. von, et al., Structure of an Arrestin2-Clathrin Complex Reveals a Novel Clathrin Binding Domain That Modulates Receptor Trafficking. J. Biol. Chem. 2009, 284, 29860–29872.
[97] Willox, A.K., Royle, S.J., Functional Analysis of Interaction Sites on the N-Terminal Domain of Clathrin Heavy Chain. Traffic 2012, 13, 70–81.
[98] Weisbrich, A., Honnappa, S., Jaussi, R., Okhrimenko, O., et al., Structure-function relationship of CAP-Gly domains. Nat. Struct. Mol. Biol. 2007, 14, 959–967.
[99] Steinmetz, M.O., Akhmanova, A., Capturing protein tails by CAP-Gly domains. Trends Biochem. Sci. 2008, 33, 535–545.
[100] Matsuo, H., Li, H., McGuire, A.M., Fletcher, C.M., et al., Structure of translation factor eIF4E bound to m7GDP and interaction with 4E-binding protein. Nature Structural & Molecular Biology 1997, 4, 717–724.
[101] Marcotrigiano, J., Gingras, A.-C., Sonenberg, N., Burley, S.K., Cap-Dependent Translation Initiation in Eukaryotes Is Regulated by a Molecular Mimic of eIF4G. Molecular Cell 1999, 3, 707–716.
[102] Volpon, L., Osborne, M.J., Capul, A.A., Torre, J.C. de la, Borden, K.L.B., Structural characterization of the Z RING-eIF4E complex reveals a distinct mode of control for eIF4E. PNAS 2010, 107, 5441–5446.
[103] Swanson, K.A., Kang, R.S., Stamenova, S.D., Hicke, L., Radhakrishnan, I., Solution structure of Vps27 UIM–ubiquitin complex important for endosomal sorting and receptor downregulation. EMBO J 2003, 22, 4597–4606.
[104] Hofmann, K., Falquet, L., A ubiquitin-interacting motif conserved in components of the proteasomal and lysosomal protein degradation systems. Trends in Biochemical Sciences 2001, 26, 347–350.
[105] Sims, J.J., Scavone, F., Cooper, E.M., Kane, L.A., et al., Polyubiquitin-sensor proteins reveal localization and linkage-type dependence of cellular ubiquitin signaling. Nature Methods 2012, 9, 303–309.
[106] Fairall, L., Chapman, L., Moss, H., de Lange, T., Rhodes, D., Structure of the TRFH dimerization domain of the human telomeric proteins TRF1 and TRF2. Mol. Cell 2001, 8, 351–361.
[107] Chen, Y., Yang, Y., Overbeek, M. van, Donigian, J.R., et al., A Shared Docking Motif in TRF1 and TRF2 Used for Differential Recruitment of Telomeric Proteins. Science 2008, 319, 1092–1096.
[108] Ernst, A., Gfeller, D., Kan, Z., Seshagiri, S., et al., Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst 2010, 6, 1782–1790.
[109] Gfeller, D., Butty, F., Wierzbicka, M., Verschueren, E., et al., The multiple-specificity landscape of modular peptide recognition domains. Mol Syst Biol 2011, 7.
[110] Huang, H., Sidhu, S.S., Studying binding specificities of peptide recognition modules by high-throughput phage display selections. Methods Mol. Biol. 2011, 781, 87–97.

APPENDIX


Appendix A: List of ovarian cancer cell lines

Cancer cell line     Cancer type   Species        Development stage   Morphology
609050M              Ovarian       Homo sapiens   Adult               Epithelial
A2780                Ovarian       Homo sapiens   Adult               Epithelial
A2780_CIS            Ovarian       Homo sapiens   Adult               Epithelial
MM_OVCAR432_Bast_1   Ovarian       Homo sapiens   Adult               Epithelial
OV-1946              Ovarian       Homo sapiens   Adult               Epithelial
OV-90                Ovarian       Homo sapiens   Adult               Epithelial
OVCA1369_TR          Ovarian       Homo sapiens   Adult               Epithelial
OVCA433_Bast         Ovarian       Homo sapiens   Adult               Epithelial
OVCA5                Ovarian       Homo sapiens   Adult               Epithelial
OVCA8                Ovarian       Homo sapiens   Adult               Epithelial
OVCAR-3              Ovarian       Homo sapiens   Adult               Epithelial
SK-OV-3              Ovarian       Homo sapiens   Adult               Epithelial
TOV-1946             Ovarian       Homo sapiens   Adult               Epithelial
TOV-2223G            Ovarian       Homo sapiens   Adult               Epithelial
TOV-3133G            Ovarian       Homo sapiens   Adult               Epithelial

* Information obtained from the COLT-cancer database at the Moffat lab at Terrence Donnelly CCBR, University of Toronto.


Appendix B: Protein sequences of 66 domains

1433F GDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSWRVISSIEQKTMADGNEKKLEKVKAYRE KIEKELETVCNDVLSLLDKFLIKNCNDFQYESKVFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRL GLALNFSVFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDEEAGEGN

ACTG2 MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIVTNWDD MEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYE GYALPHAILRLDLAGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSYELPDGQVITIGNE RFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSV WIGGSILASLSTFQQMWISKQEYDESGPSIVHRKCF

ACTH EEETTALVCDNGSGLCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITNWDDMEKIWH HSFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPH AIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPE TLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSI LASLSTFQQMWISKPEYDEAGPSIVHRKCF

IGLL1 LLRPTAASQSRALGPGAPGGSSRSSLRSRWGRFLLQRGSWTGPRCWPRGFQSKHNSVTHVFGSGTQLTVLSQPKATPSVTLFPPSS EELQANKATLVCLMNDFYPGILTVTWKADGTPITQGVEMTTPSKQSNNKYAASSYLSLTPEQWRSRRSYSCQVMHEGSTVEKTVAP AECS

AP2M1 KYRRNELFLDVLESVNLLMSPQGQVLSAHVSGRVVMKSYLSGMPECKFGMNDKIVIEKQGKGTADETSKSGKQSIAIDDCTFHQCV RLSKFDSERSISFIPPDGEFELMRYRTTKDIILPFRVIPLVREVGRTKLEVKVVIKSNFKPSLLAQKIEVRIPTPLNTSGVQVICM KGKAKYKASENAIVWKIKRMAGMKESQISAEIELLPTNDKKKWARPPISMNFEVPFAPSGLKVRYLKVFEPKLNYSDHDVIKWVRY IGRSGIYETRC

B2CL1 MSQSNRELVVDFLSYKLSQKGYSWSQFSDVEENRTEAPEGTESEMETPSAINGNPSWHLADSPAVNGATGHSSSLDAREVIPMAAV KQALREAGDEFELRYRRAFSDLTSQLHITPGTAYQSFEQVVNELFRDGVNWGRIVAFFSFGGALCVESVDKEMQVLVSRIAAWMAT YLNDHLEPWIQENGGWDTFVELYGNNAAAESRKGQERFNRWFLTGMTVAGVVLLGSLFSRK

PDC6I TFISVQLKKTSEVDLAKPLVKFIQQTYPSGGEEQAQYCRAAEELSKLRRAAVGRPLDKHEGALETLLRYYDQICSIEPKFPFSENQ ICLTFTWKDAFDKGSLFGGSVKLALASLGYEKSCVLFNCAALASQIAAEQNLDNDEGLKIAAKHYQFASGAFLHIKETVLSALSRE PTVDISPDTVGTLSLIMLAQAQEVFFLKATRDKMKDAIIAKLANQAADYFGDAFKQCQYKDTLPKEVFPVLAAKHCIMQANAEYHQ SILAKQQKKFGEEIARLQHAAELIKTVASRYDEYVNVKDFSDKINRALAAAKKDNDFIYHDRVPDLKDLDPIGKATLVKSTPVNVP ISQKFTDLFEKMVPVSVQQSLAAYNQRKADLVNRSIAQMREATTLA

CPNS1 MFLVNSFLKGGGGGGGGGGGLGGGLGNVLGGLISGAGGGGGGGGGGGGGGGGGGGGTAMRILGGVISAISEAAAQYNPEPPPPRTH YSNIEANESEEVRQFRRLFAQLAGDDMEVSATELMNILNKVVTRHPDLKTDGFGIDTCRSMVAVMDSDTTGKLGFEEFKYLWNNIK RWQAIYKQFDTDRSGTICSSELPGAFEAAGFHLNEHLYNMIIRRYSDESGNMDFDNFISCLVRLDAMFRAFKSLDKDGTGQIQVNI QEWLQLTMYS

DCTN1 PLRVGSRVEVIGKGHRGTVAYVGATLFATGKWVGVILDEAKGKNDGTVQGRKYFTCDEGHGIFVRQSQIQVF

CASP2 RLSTDTVEHSLDNKDGPVCLQVKPCTPEFYQTHFQLAYRLQSRPRGLALVLSNVHFTGEKELEFRSGGDVDHSTLVTLFKLLGYDV HVLCDQTAQEMQEKLQNFAQLPAHRVTDSCIVALLSHGVEGAIYGVDGKLLQLQEVFQLFDNANCPSLQNKPKMFFIQACRGDETD RGVDQQDGKNHAGSPGCEESDAGKEKLPKMRLPTRSDMICGYACLKGTAAMRNTKRGSWYIEALAQVFSERACDMHVADMLVKVNA LIKDREGYAPGTEFHRCKEMSEYCSTLCRHLYLFPGHPPT

CLH1 MAQILPIRFQEHLQLQNLGINPANIGFSTLTMESDKFICIREKVGEQAQVVIIDMNDPSNPIRRPISADSAIMNPASKVIALKAGK TLQIFNIEMKSKMKAHTMTDDVTFWKWISLNTVALVTDNAVYHWSMEGESQPVKMFDRHSSLAGCQIINYRTDAKQKWLLLTGISA QQNRVVGAMQLYSVDRKVSQPIEGHAASFAQFKMEGNAEESTLFCFAVRGQAGGKLHIIEVGTPPTGNQPFPKKAVDVFFPPEAQN DFPVAMQISEKHDVVFLITKYGYIHLYDLETGTCIYMNRISGETIFVTAPHEATAGIIGVNRKGQVLSVCVEEENIIPYITNVLQN PDLALRMAVRNNLAGAEEL

DLG1-1 EITLERGNSGLGFSIAGGTDNPHIGDDSSIFITKIITGGAAAQDGRLRVNDCILRVNEVDVRDVTHSKAVEALKEAGSIVRLYVKR R

DLG1-2 EIKLIKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIEGGAAHKDGKLQIGDKLLAVNNVCLEEVTHEEAVTALKNTSDFVYLKVAK P

DLG2-2 EIKLFKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIDGGAAQKDGRLQVGDRLLMVNNYSLEEVTHEEAVAILKNTSEVVYLKVGK P

DLG2-3 KVVLHKGSTGLGFNIVGGEDGEGIFVSFILAGGPADLSGELQRGDQILSVNGIDLRGASHEQAAAALKGAGQTVTIIAQYQ

DLG4-2 EIKLIKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIEGGAAHKDGRLQIGDKILAVNSVGLEDVMHEDAVAALKNTYDVVYLKVAK P

DLG4-3 RIVIHRGSTGLGFNIVGGEDGEGIFISFILAGGPADLSGELRKGDQILSVNGVDLRNASHEQAAIALKNAGQTVTIIAQYK

DNAS1 LKIAAFNIQTFGETKMSNATLVSYIVQILSRYDIALVQEVRDSHLTAVGKLLDNLNQDAPDTYHYVVSEPLGRNSYKERYLFVYRP DQVSAVDSYYYDDGCEPCGNDTFNREPAIVRFFSRFTEVREFAIVPLHAAPGDAVAEIDALYDVYLDVQEKWGLEDVMLMGDFNAG CSYVRPSQWSSIRLWTSPTFQWLIPDSADTTATPTHCAYDRIVVAGMLLRGAVVPDSALPFNFQAAYGLSDQLAQAISDHYPVEVM LK

DYL2 MSDRKAVIKNADMSEDMQQDAVDCATQAMEKYNIEKDIAAYIKKEFDKKYNPTWHCIVGRNFGSYVTHETKHFIYFYLGQVAILLF KSG

DYL1 MCDRKAVIKNADMSEEMQQDSVECATQALEKYNIEKDIAAHIKKEFDKKYNPTWHCIVGRNFGSYVTHETKHFIYFYLGQVAILLF KSG

PROF2 MAGWQSYVDNLMCDGCCQEAAIVGYCDAKYVWAATAGGVFQSITPIEIDMIVGKDREGFFTNGLTLGAKKCSVIRDSLYVDGDCTM DIRTKSQGGEPTYNVAVGRAGRVLVFVMGKEGVHGGGLNKKAYSMAKYLRDSGF

DYN2 MEELIPLVNKLQDAFSSIGQSCHLDLPQIAVVGGQSAGKSSVLENFVGRDFLPRGSGIVTRRPLILQLIFSKTEHAEFLHCKSKKF TDFDEVRQEIEAETDRVTGTNKGISPVPINLRVYSPHVLNLTLIDLPGITKVPVGDQPPDIEYQIKDMILQFISRESSLILAVTPA NMDLANSDALKLAKEVDPQGLRTIGVITKLDLMDEGTDARDVLENKLLPLRRGYIGVVNRSQKDIEGKKDIRAALAAERKFFLSHP AYRHMADRMGTPHLQKTLNQQLTNHIRESLPALRSKLQSQL

PDCD6 PDQSFLWNVFQRVDKDRSGVISDTELQQALSNGTWTPFNPVTVRSIISMFDRENKAGVNFSEFTGVWKYITDWQNVFRTYDRDNSG MIDKNELKQALSGFGYRLSDQFHDILIRKFDRQGRGQIAFDDFIQGCIVLQRLTDIFRRYDTDQDGWIQVSYEQYLSMVFSIV

IF4E ATVEPETTPTPNPPTTEEEKTESNQEVANPEHYIKHPLQNRWALWFFKNDKSKTWQANLRLISKFDTVEDFWALYNHIQLSSNLMP GCDYSLFKDGIEPMWEDEKNKRGGRWLITLNKQQRRSDLDRFWLETLLCLIGESFDDYSDDVCGAVVNVRAKGDKIAIWTTECENR EAVTHIGRVYKERLGLPPKIVIGYQSHADTATKSGSTTKNRFVV

E41L3 MQCKVILLDGSEYTCDVEKRSRGQVLFDKVCEHLNLLEKDYFGLTYRDAENQKNWLDPAKEIKKQVRSGAWHFSFNVKFYPPDPAQ LSEDITRYYLCLQLRDDIVSGRLPCSFVTLALLGSYTVQSELGDYDPDECGSDYISEFRFAPNHTKELEDKVIELHKSHRGMTPAE AEMHFLENAKKLSMYGVDLHHAKDSEGVEIMLGVCASGLLIYRDRLRINRFAWPKVLKISYKRNNFYIKIRPGEFEQFESTIGFKL PNHRAAKRLWKVCVEHHTFFRLLL

GNAI1 GCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYKAVVYSNTIQSIIAIIRA MGRLKIDFGDSARADDARQLFVLAGAAEEGFMTAELAGVIKRLWKDSGVQACFNRSREYQLNDSAAYYLNDLDRIAQPNYIPTQQD VLRTRVKTTGIVETHFTFKDLHFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWF TDTSIILFLNKKDLFEEKIKKSPLTICYPEYAGSNTYEEAAAYIQCQFEDLNKRKDTKEIYTHFTCATDTKNVQFVFDAVTDVIIK NNLKDCGLF

GNAO GCTLSAEERAALERSKAIEKNLKEDGISAAKDVKLLLLGAGESGKSTIVKQMKIIHEDGFSGEDVKQYKPVVYSNTIQSLAAIVRA MDTLGIEYGDKERKADAKMVCDVVSRMEDTEPFSAELLSAMMRLWGDSGIQECFNRSREYQLNDSAKYYLDSLDRIGAADYQPTEQ DILRTRVKTTGIVETHFTFKNLHFRLFDVGGQRSERKKWIHCFEDVTAIIFCVALSGYDQVLHEDETTNRMHESLMLFDSICNNKF FIDTSIILFLNKKDLFGEKIKKSPLTICFPEYTGPNTYEDAAAYIQAQFESKNRSPNKEIYCHMTCATDTNNIQVVFDAVTDIIIA NNLRGCGLY

GNAI3 GCTLSAEDKAAVERSKMIDRNLREDGEKAAKEVKLLLLGAGESGKSTIVKQMKIIHEDGYSEDECKQYKVVVYSNTIQSIIAIIRA MGRLKIDFGEAARADDARQLFVLAGSAEEGVMTPELAGVIKRLWRDGGVQACFSRSREYQLNDSASYYLNDLDRISQSNYIPTQQD VLRTRVKTTGIVETHFTFKDLYFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWF TETSIILFLNKKDLFEEKIKRSPLTICYPEYTGSNTYEEAAAYIQCQFEDLNRRKDTKEIYTHFTCATDTKNVQFVFDAVTDVIIK NNLKECGLY

GRB2 MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGKDGFIPKNYIEMKPH

GRAP2 GRVRWARALYDFEALEDDELGFHSGEVVEVLDSSNPSWWTGRLHNKLGLFPANYVAPMTR

IMB1 MELITILEKTVSPDRLELEAAQKFLERAAVENLPTFLVELSRVLANPGNSQVARVAAGLQIKNSLTSKDPDIKAQYQQRWLAIDAN ARREVKNYVLQTLGTETYRPSSASQCVAGIACAEIPVNQWPELIPQLVANVTNPNSTEHMKESTLEAIGYICQDIDPEQLQDKSNE ILTAIIQGMRKEEPSNNVKLAATNALLNSLEFTKANFDKESERHFIMQVVCEATQCPDTRVRVAALQNLVKIMSLYYQYMETYMGP ALFAITIEAMKSDIDEVALQGIEFWSNVCDEEMDLAIEASEAAEQGRPPEHTSKFYAKGALQYLVPILTQTLTKQDENDDDDDWNP CKAAGVCLMLLATCCEDDIVPHVLPFIKEHIKNPDWRYRDAAVMAFGCILEGPEPSQLKPLVIQAMPTLIELMKDPSVVVRDTAAW TVGRICELLPEAAINDVYLAPLLQCLIEGLSAEPRVASNVCWAFSSLAEAAYEAADVADDQEEPATYCLSSSFELIVQKLLETTDR PDGHQNNLRSSAYESLMEIVKNSAKDCYPAVQKTTLVIMERLQQVLQMESHIQSTSDRIQFNDLQSLLCATLQNVLRKVQHQDALQ ISDVVMASLLRMFQSTAGSGGVQEDALMAVSTLVEVLGGEFLKYMEAFKPFLGIGLKNYAEYQVCLAAVGLVGDLCRALQSNIIPF CDEVMQLLLENLGNENVHRSVKPQILSVFGDIALAIGGEFKKYLEVVLNTLQQASQAQVDKSDYDMVDYLNELRESCLEAYTGIVQ GLKGDQENVHPDVMLVQPRVEFILSFIDHIAGDEDHTDGVVACAAGLIGDLCTAFGKDVLKLVEARPMIHELLTEGRRSKTNKAKT LATWATKELRKLKNQA

NR1H4 KTELTPDQQTLLHFIMDSYNKQRMPQEITNKILKEEFSAEENFLILTEMATNHVQVLVEFTKKLPGFQTLDHEDQIALLKGSAVEA MFLRSAEIFNKKLPSGHSDLLEERIRNSGISDEYITPMFSFYKSIGELKMTQEEYALLTAIVILSPDRQYIKDREAVEKLQEPLLD VLQKLCKIHQPENPQHFACLLGRLTELRTFNHHHAEMLMSWRVNDHK

NR1I2 DLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLLPHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNT VFNAETGTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHRVVDQLQEQFAITLKSY IECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPFATPLMQELFGITGS

GCR LTPTLVSLLEVIEPEVLYAGYDSSVPDSTWRIMTTLNMLGGRQVIAAVKWAKAIPGFRNLHLDDQMTLLQYSWMFLMAFALGWRSY RQSSANLLCFAPDLIINEQRMTLPCMYDQCKHMLYVSSELHRLQVSYEEYLCMKTLLLLSSVPKDGLKSQELFDEIRMTYIKELGK AIVKREGNSSQNWQRFYQLTKLLDSMHEVVENLLNYCFQTFLDKTMSIEFPEMLAEIITNQIPKYSNGNIKKLLFHQK

RXRG STNDPVTNICHAADKQLFTLVEWAKRIPHFSDLTLEDQVILLRAGWNELLIASFSHRSVSVQDGILLATGLHVHRSSAHSAGVGSI FDRVLTELVSKMKDMQMDKSELGCLRAIVLFNPDAKGLSNPSEVETLREKVYATLEAYTKQKYPEQPGRFAKLLLRLPALRSIGLK CLEHLFFFKLIGDTPIDTFLMEMLETPLQIT

MD2L1 ALQLSREQGITLRGSAEIVAEFFSFGINSILYQRGIYPSETFTRVQKYGLTLLVTTDLELIKYLNNVVEQLKDWLYKCSVQKLVVV ISNIESGEVLERWQFDIECDKTAKDDSAPREKSQKAIQDEIRSVIRQITATVTFLPLLEVSCSFDLLIYTDKDLVVPEKWEESGPQ FITNSEEVRLRSFTTTIHKVNSMVAYKIPVND

2B11 GDTRPRFLWQLKFECHFFNGTERVRLLERCIYNQEESVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVG ESFTVQRRVEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYT CQVEHPSVTSPLTVEWRARSESAQSK

PCNA MFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQLTLRSEGFDTYRCDRNLAMGVNLTSMSKILKCAGNED IITLRAEDNADTLALVFEAPNQEKVSDYEMKLMDLDVEQLGIPEQEYSCVVKMPSGEFARICRDLSHIGDAVVISCAKDGVKFSAS GELGNGNIKLSQTSNVDKEEEAVTIEMNEPVQLTFALRYLNFFTKATPLSSTVTLSMSADVPLVVEYKIADMGHLKYYLAPKIEDE EGS

MK03 YTQLQYIGEGAYGMVSSAYDHVRKTRVAIKKISPFEHQTYCQRTLREIQILLRFRHENVIGIRDILRASTLEAMRDVYIVQDLMET DLYKLLKSQQLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLINTTCDLKICDFGLARIADPEHDHTGFLTEYVATRWYRAP EIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINMKARNYLQSLPSKTKVAWAKLFPKS DSKALDLLDRMLTFNPNKRITVEEALAHPYL

KAPCB FERKKTLGTGSFGRVMLVKHKATEQYYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVRLEYAFKDNSNLYMVMEYVPGGEM FSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDHQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKG YNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVSDIKTHKWF

SRPK2 YHVIRKLGWGHFSTVWLCWDMQGKRFVAMKVVKSAQHYTETALDEIKLLKCVRESDPSDPNKDMVVQLIDDFKISGMNGIHVCMVF EVLGHHLLKWIIKSNYQGLPVRCVKSIIRQVLQGLDYLHSKCKIIHTDIKPENILMCVDDAYVRRMAAEATEWQKAGAPPPSGSAV STAPQQKPIGKISKNKKKKLKKKQKRQAELLEKRLQEIEELEREAERKIIEENITSAAPSNDQDGEYCPEVKLKTTGLEEAAEAET AKDNGEAEDQEEKEDAEKENIEKDEDDVDQELANIDPTWIESPKTNGHIENGPFSLEQQLDDEDDDEEDCPNPEEYNLDEPNAESD YTYSSSYEQFNGELPNGRHKIPESQFPEFSTSLFSGSLEPVACGSVLSEGSPLTEQEESSPSHDRSRTVSASSTGDLPKAKTRAAD LLVNPLDPRNADKIRVKIADLGNACWVHKHFTEDIQTRQYRSIEVLIGAGYSTPADIWSTACMAFELATGDYLFEPHSGEDYSRDE DHIAHIIELLGSIPRHFALSGKYSREFFNRRGELRHITKLKPWSLFDVLVEKYGWPHEDAAQFTDFLIPMLEMVPEKRASAGECLR HP

AURKB AQKENSYPWPYGRQTAPSGLSTLPQRVLRKEPVTPSALVLMSRSNVQPTAAPGQKVMENSSGTPDILTRHFTIDDFEIGRPLGKGK FGNVYLAREKKSHFIVALKVLFKSQIEKEGVEHQLRREIEIQAHLHHPNILRLYNYFYDRRRIYLILEYAPRGELYKELQKSCTFD EQRTATIMEELADALMYCHGKKVIHRDIKPENLLLGLKGELKIADFGWSVHAPSLRRKTMCGTLDYLPPEMIEGRMHNEKVDLWCI GVLCYELLVGNPPFESASHNETYRRIVKVDLKFPASVPMGAQDLISKLLRHNPSERLPLAQVSAHPWVRANSRRVLPPSALQSVA

KAPCA FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEM FSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKG YNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF

PROF1 AGWNAYIDNLMADGTCQDAAIVGYKDSPSVWAAVPGKTFVNITPAEVGVLVGKDRSSFYVNGLTLGGQKCSVIRDSLLQDGEFSMD LRTKSTGGAPTFNVTVTKTDKTLVLLMGKEGVHGGLINKKCYEMASHLRRSQY

DAB2 GDGVKYKAKLIGIDDVPDARGDKMSQDSMMKLKGMAAAGRSQGQHKQRIWVNISLSGIKIIDEKTGVIEHEHPVNKISFIARDVTD NRAFGYVCGGEGQHQFFAIKTGQQAEPLVVDLKDLFQVIYNVKKKEEEKKKIEEASKAVENGSEAL

RAC1 MQAIKCVVVGDGAVGKTCLLISYTTNAFPGEYIPTVFDNYSANVMVDGKPVNLGLWDTAGQEDYDRLRPLSYPQTDVFLICFSLVS PASFENVRAKWYPEVRHHCPNTPIILVGTKLDLRDDKDTIEKLKEKKLTPITYPQGLAMAKEIGAVKYLECSALTQRGLKTVFDEA IRAVLCPPPVKKRKRKC

RAD51 SEIIQITTGSKELDKLLQGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGGGEGKAMYIDTEGTFRPERLLAVAERYGLSG SDVLDNVAYARAFNTDHQTQLLYQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLRLADEFGVAVVITNQ VVAQVDGAAMFAADPKKPIGGNIIAHASTTRLYLRKGRGETRICKIYDSPCLPEAEAMFAINADGVGDAKD

RFA1 VGQLSEGAIAAIMQKGDTNIKPILQVINIRPITTGNSPPRYRLLMSDGLNTLSSFMLATQLNPLVEEEQLSSNCVCQIHRFIVNTL KDGRRVVILMELEVLKSAEAVGVKIGNPVPYNEG

U2AF1 LRCAVSDVEMQEHYDEFFEEVFTEMEEKYGEVEEMNVCDNLGDHLVGNVYVKFRREEDAEKAVIDLNNRWFNGQPIHAELSPV

IPSP HRHHPREMKKRVEDLHVGATVAPSSRRDFTFDLYRALASAAPSQSIFFSPVSISMSLAMLSLGAGSSTKMQILEGLGLNLQKSSEK ELHRGFQQLLQELNQPRDGFQLSLGNALFTDLVVDLQDTFVSAMKTLYLADTFPTNFRDSAGAMKQINDYVAKQTKGKIVDLLKNL DSNAVVIMVNYIFFKAKWETSFNHKGTQEQDFYVTSETVVRVPMMSREDQYHYLLDRNLSCRVVGVPYQGNATALFILPSEGKMQQ VENGLSEKTLRKWLKMFKKRQLELYLPKFSIEGSYQLEKVLPSLGISNVFTSHADLSGISNHSNIQVSEMVHKAVVEVDESGTRAA AATGTIFTFRSARLNSQRLVFNRPFLMFIVDNNILFLGKVNRP

PLCG1 TFKCAVKALFDYKAQREDELTFIKSAIIQNVEKQEGGWWRGDYGGKKQLWFPSNYVEEMVN

CACB1 RPSDSDVSLEEDREALRKEAERQALAQLEKAKTKPVAFAVRTNVGYNPSPGDEVPVQGVAITFEPKDFLHIKEKYNNDWWIGRLVK EGCEVGFIPSPVKLDSLRLLQEQKLRQNRLGSSKSGDNSSSSLGDVVTGTRRPTPPASAKQKQKSTEHVPPYDVVPSMRPIILVGP SLKGYEVTDMMQKALFDFLKHRFDGRISITRVTADISLAKRSVLNNPSKHIIIERSNTRSSLAEVQSEIERIFELARTLQLVALDA DTINHPAQLSKTSLAPIIVYIKITSPKVLQRLIKSRGKSQSKHLNVQIAASEKLAQCPPEMFDIILDENQLEDACEHLAEYLEAYW KA

MDM4 QVRPKLPLLKILHAAGAQGEMFTVKEVMHYLGQYIMVKQLYDQQEQHMVYCGGDLLGELLGRQSFSVKDPSPLYDMLRKNL

NXF1 PEQQEMLQAFSTQSGMNLEWSQKCLQDNNWDYTRSAQAFTHLKAKGEIPEVAFMK

T2FA SGDVQVTEDAVRRYLTRKPMTTKDLLKKFQTKKTGLSSEQTVNVLAQILKRLNPERKMINDKMHFSLKE

TPA IKGGLFADIASHPWQAAIFAKHRRSPGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVHK EFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSPFYSERLKEAHVRLYPSSRCTSQHLL NRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP

TERF1 EEEEEDAGLVAEAEAVAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFE NDERITPLESALMIWGSIEKEHDKLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIFGDPNSHMPFKSKLLMIISQKDTFHSFFQ HFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKR

APLP2 VKAVCSQEAMTGPCRAVMPRWYFDLSKGKCVRFIYGGCGGNRNNFESEDYCMAVCKAMI

ACRO IVGGKAAQHGAWPWMVSLQIFTYNSHRYHTCGGSLLNSRWVLTAAHCFVGKNNVHDWRLVFGAKEITYGNNKPVKAPLQERYVEKI IIHEKYNSATEGNDIALVEITPPISCGRFIGPGCLPHFKAGLPRGSQSCWVAGWGYIEEKAPRPSSILMEARVDLIDLDLCNSTQW YNGRVQPTNVCAGYPVGKIDTCQGDSGGPLMCKDSKESAYVVVGITSWGVGCARAKRPGIYTATWPYLNWIASKIGSNALRMIQSA TPPPPTTRPPPIRPPFSHPISAHLPWYFQPPPRPLPPRPPAAQ

RL40 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

SORBS GEIGEAIAKYNFNADTNVELSLRKGDRVILLKRVDQNWYEGKIPGTNRQGIFPVSYVEVVKK

SPF45 VVLLRNMVGAGEVDEDLEVETKEECEKYGKVGKCVIFEIPGAPDDEAVRIFLEFERVESAIKAVVDLNGRYFGGRVVKAC

VINC PVFHTRTIESILEPVAQQISHLVIMHEEGEVDGKAIPDLTAPVAAVQAAVSNLVRVGKETVQTTEDQILKRDMPPAFIKVENACTK LVQAAQMLQSDPYSVPARDYLIDGSRGILSGTSDLLLTFDEAEVRKIIRVCKGILEYLTVAEVVETMEDLVTYTKNLGPGMTKMAK MIDERQQELTHQEHRVMLVNSMNTVKELLPVLISAMKIFVTTKNSKNQGIEEALKNRNFTVEKMSAEINEIIRVLQLTSWDEDAWA

WDR5 SSSATQSKPTPVKPNYALKFTLAGHTKAVSSVKFSPNGEWLASSSADKLIKIWGAYDGKFEKTISGHKLGISDVAWSSDSNLLVSA SDDKTLKIWDVSSGKCLKTLKGHSNYVFCCNFNPQSNLIVSGSFDESVRIWDVKTGKCLKTLPAHSDPVSAVHFNRDGSLIVSSSY DGLCRIWDTASGQCLKTLIDDDNPPVSFVKFSPNGKYILAATLDNTLKLWDYSKGKCLKTYTGHKNEKYCIFANFSVTGGKWIVSG SEDNLVYIWNLQTKEIVQKLQGHTDVVISTACHPTENIIASAALENDKTIKLWKSDC

XRCC4 MERKISRIHLVSEPSITHFLQVSWEKTLESGFVITLTDGHSAWTGTVSESEISQEADDMAMEKGKYVGELRKALLSGAGPADVYTF NFSKESCYFFFEKNLKDVSFRLGSFNLEKVENPAEVIRELICYCLDTIAENQAKNEHLQKENERLLRDWNDVQGRFEKCVSAKEAL ETDLYKRFILVLNEKKTKIRSLHNKLLNAAQEREKDIKQEG

PTN22 MDQREILQKFLDEAQSKKITKEEFANEFLKLKRQSTKYKADKTYPTTVAEKPKNIKKNRYKILPYDYSRVELSLITSDEDSSYINA NFIKGVYGPKAYIATQGPLSTTLLDFWRMIWEYSVLIIVMACMEYEMGKKKCERYWAEPGEMQLEFGPFSVSCEAEKRKSDYIIRT LKVKFNSETRTIYQFHYKNWPDHDVPSSIDPILELIWDVRCYQEDDSVPICIHCSAGCGRTGVICAIDYTWMLLKDGIIPENFSVF SLIREMRTQRPSLVQTQEQYELVYNAVLELFKRQMDVIRDKHSGTESQAKH

Appendix C: Vector sequences

The vector sequences are given below.

pR4STOP: The vector for peptide phage display
Length: 6203 bp
Legend: Ptac, Signal peptide, 4 Stop codons, P3, P8

GAATTCCCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAAT GTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCA CGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGG GCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGC GCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCT CGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTC CGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAG CATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTG GCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCA TGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACC GAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAAC CACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCA ATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGAT TCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATT AGGCACAATTCTCATGTTTGACAGCTTATCATCGACTGCACGGTGCACCAATGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGG TATGGCTGTGCAGGTCGTAAATCACTGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACAT CATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAAC AATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAAAGAATATC GCATTTCTTCTTGCATCTATGTTCGTTTTTTCTATTGCTACAAATGCCTATGCAGCCTCTTCATCTGGCTAATAATGATGAGGTGG AGGATCCGGAGGAGGCGCCGAGGGTGACGATCCCGCAAAAGCGGCCTTTAACTCCCTGCAAGCCTCAGCGACCGAATATATCGGTT ATGCGTGGGCGATGGTTGTTGTCATTGTCGGCGCAACTATCGGTATCAAGCTGTTTAAGAAATTCACCTCGAAAGCAAGCTGATAA ACCGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTT 
GTTCCTTTCTATTCTCACTCCGCTGAAACTGTTGAAAGTTGTTTAGCAAAACCCCATACAGAAAATTCATTTACTAACGTCTGGAA AGACGACAAAACTTTAGATCGTTACGCTAACTATGAGGGTTGTCTGTGGAATGCTACAGGCGTTGTAGTTTGTACTGGTGACGAAA CTCAGTGTCTAGCTAGAGTGGCGGTGGCTCTGGTTCCGGTGATTTTGATTATGAAAAGATGGCAAACGCTAATAAGGGGGCTATGA CCGAAAATGCCGATGAAAACGCGCTACAGTCTGACGCTAAAGGCAAACTTGATTCTGTCGCTACTGATTACGGTGCTGCTATCGAT GGTTTCATTGGTGACGTTTCCGGCCTTGCTAATGGTAATGGTGCTACTGGTGATTTTGCTGGCTCTAATTCCCAAATGGCTCAAGT CGGTGACGGTGATAATTCACCTTTAATGAATAATTTCCGTCAATATTTACCTTCCCTCCCTCAATCGGTTGAATGTCGCCCTTTTG TCTTTAGCGCTGGTAAACCATATGAATTTTCTATTGATTGTGACAAAATAAACTTATTCCGTGGTGTCTTTGCGTTTCTTTTATAT GTTGCCACCTTTATGTATGTATTTTCTACGTTTGCTAACATACTGCGTAATAAGGAGTCTTAATCATGCCAGTTCTTTTGGCTAGC GCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCAC CTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAA

CATATCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCCACGGGTGCGCATGATCG TGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATACGCGAGCGAACGTGA

AGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAACGCG GAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACACCTACATCTGTATTAA CGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCCTCACAACGTTCCAGT AACCGGGCATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCCCATGAACAGAAATTC CCCCTTACACGGAGGCATCAAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAGCCAGACATTAACGCT TCTGGAGAAACTCAACGAGCTGGACGCGGATGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTGATGAGCTTTACCGCA GGATCCGGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCG AAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTA AAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCTATGGCCCACTACGTGAACCATCACCCTAATCAAGTTT TTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACG TGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACA CCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCGGATCCTGCCTCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACAT GCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCG GGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCATCAGAGCAGATTGTAC

TGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTC ACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGG GGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATA GGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCG TTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAG CGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCC CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCA GCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAG AAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCA CCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCT ACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCT TTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGG CACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTA CCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAG GGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGC CAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCC GGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAG AAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTT CTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAACACGGGAT AATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCT GTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAA AAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTAT 
TGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCAC ATTTCCCCGAAAAGTGCCACCTGACGTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCT TTCGTCTTCAA

Primers for Sequencing:
YH005 (M13 forward): TGT AAA ACG ACG GCC AGT CGA GCA CTT CAC CAA CAA
YH006 (M13 reverse): CAG GAA ACA GCT ATG ACC GAC AAC AAC CAT CGC CCA

pHH0103: The vector for protein expression
Length: 6734 bp
Legend: GST tag, His Tag, cleavage site, Stop codon, PTac promoter, Protein insertion site

GAATTCCCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAAT GTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCA CGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGG GCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGC GCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCT CGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTC CGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAG CATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTG GCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCA TGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACC GAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAAC CACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCA ATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGAT TCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATT AGGCACAATTCTCATGTTTGACAGCTTATCATCGACTGCACGGTGCACCAATGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGG TATGGCTGTGCAGGTCGTAAATCACTGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACAT CATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAAC AATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAATCGAAGAA CACCATCACCATCACCATTCCAGCGGTAAGCTTATGTCCCCTATACTAGGTTATTGGAAAATTAAGGGCCTTGTGCAACCCACTCG ACTTCTTTTGGAATATCTTGAAGAAAAATATGAAGAGCATTTGTATGAGCGCGATGAAGGTGATAAATGGCGAAACAAAAAGTTTG

AATTGGGTTTGGAGTTTCCCAATCTTCCTTATTATATTGATGGTGATGTTAAATTAACACAGTCTATGGCCATCATACGTTATATA GCTGACAAGCACAACATGTTGGGTGGTTGTCCAAAAGAGCGTGCAGAGATTTCAATGCTTGAAGGAGCGGTTTTGGATATTAGATA CGGTGTTTCGAGAATTGCATATAGTAAAGACTTTGAAACTCTCAAAGTTGATTTTCTTAGCAAGCTACCTGAAATGCTGAAAATGT TCGAAGATCGTTTATGTCATAAAACATATTTAAATGGTGATCATGTAACCCATCCTGACTTCATGTTGTATGACGCTCTTGATGTT GTTTTATACATGGACCCAATGTGCCTGGATGCGTTCCCAAAATTAGTTTGTTTTAAAAAACGTATTGAAGCTATCCCACAAATTGA TAAGTACTTGAAATCCAGCAAGTATATAGCATGGCCTTTGCAGGGCTGGCAAGCCACGTTTGGTGGTGGCGACCATCCTCCAAAAT CGGATCTAGAAGTTCTGTTCCAGGGGCCCCTGTCCAGCGGTCTGGTTCCGCGTGGTTCCGGTACCGCGGCCCAGCCGGCCTTTTTT GCGGCCGCATAATAAACCGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTC GCAATTCCTTTAGTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTTGAAAGTTGTTTAGCAAAACCCCATACAGAAAATTCATT TACTAACGTCTGGAAAGACGACAAAACTTTAGATCGTTACGCTAACTATGAGGGTTGTCTGTGGAATGCTACAGGCGTTGTAGTTT GTACTGGTGACGAAACTCAGTGTCTAGCTAGAGTGGCGGTGGCTCTGGTTCCGGTGATTTTGATTATGAAAAGATGGCAAACGCTA ATAAGGGGGCTATGACCGAAAATGCCGATGAAAACGCGCTACAGTCTGACGCTAAAGGCAAACTTGATTCTGTCGCTACTGATTAC GGTGCTGCTATCGATGGTTTCATTGGTGACGTTTCCGGCCTTGCTAATGGTAATGGTGCTACTGGTGATTTTGCTGGCTCTAATTC CCAAATGGCTCAAGTCGGTGACGGTGATAATTCACCTTTAATGAATAATTTCCGTCAATATTTACCTTCCCTCCCTCAATCGGTTG AATGTCGCCCTTTTGTCTTTAGCGCTGGTAAACCATATGAATTTTCTATTGATTGTGACAAAATAAACTTATTCCGTGGTGTCTTT GCGTTTCTTTTATATGTTGCCACCTTTATGTATGTATTTTCTACGTTTGCTAACATACTGCGTAATAAGGAGTCTTAATCATGCCA GTTCTTTTGGCTAGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAAT GGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAAC CAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCCAC GGGTGCGCATGATCGTGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATA CGCGAGCGAACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTA AAGTCTGGAAACGCGGAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACAC 
CTACATCTGTATTAACGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCC TCACAACGTTCCAGTAACCGGGCATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCC CATGAACAGAAATTCCCCCTTACACGGAGGCATCAAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAG CCAGACATTAACGCTTCTGGAGAAACTCAACGAGCTGGACGCGGATGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTG ATGAGCTTTACCGCAGGATCCGGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTT TTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAAC AAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCTATGGCCCACTACGTGAACCATC ACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGG GAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTG CGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCGGATCCTGCCTCGCGCGTTTCGGTGATGACGGTGA AAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGT CAGCGGGTGTTGGCGGGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCAT CAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCT TCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTT ATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGC TGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTA TAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTT TCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCT GTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTA TCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAA CTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGAT CCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGAT 
CCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGAT CTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAAT GCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACG

ATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAA

CCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTA GAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATG GCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCC TCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCAT CCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCG GCGTCAACACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTC AAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCG TTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTC CTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAAT AGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGC GTATCACGAGGCCCTTTCGTCTTCAA
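
The sequencing primers listed in this appendix (YH005 and YH006) can be checked against a vector sequence with a short script. The following is a minimal sketch, not part of the original work. It assumes that the first 18 nt of each primer is the M13 tag and that the remaining 18 nt anneals to the vector. The function names, the constant TAG_LENGTH and the excerpt variable are illustrative, and the excerpt assumes that the space-separated blocks of the pR4STOP listing are contiguous.

```python
# Sketch: locate where the annealing part of a primer matches a template.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove whitespace and upper-case a nucleotide string."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def find_annealing_site(primer_part: str, template: str):
    """Return (strand, 0-based start) of the first exact match, or None.

    '+' means the primer reads in the same direction as the template
    listing; '-' means it matches the reverse complement.
    """
    primer_part = clean(primer_part)
    template = clean(template)
    pos = template.find(primer_part)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer_part))
    if pos != -1:
        return ("-", pos)
    return None


# Primers as listed in this appendix.
YH005 = "TGT AAA ACG ACG GCC AGT CGA GCA CTT CAC CAA CAA"
YH006 = "CAG GAA ACA GCT ATG ACC GAC AAC AAC CAT CGC CCA"

# Assumption: the first 18 nt form the M13 tag, the rest anneals to the vector.
TAG_LENGTH = 18

# Excerpt of the pR4STOP listing (four consecutive blocks, assumed contiguous).
PR4STOP_EXCERPT = (
    "AATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAAAGAATATC"
    "GCATTTCTTCTTGCATCTATGTTCGTTTTTTCTATTGCTACAAATGCCTATGCAGCCTCTTCATCTGGCTAATAATGATGAGGTGG"
    "AGGATCCGGAGGAGGCGCCGAGGGTGACGATCCCGCAAAAGCGGCCTTTAACTCCCTGCAAGCCTCAGCGACCGAATATATCGGTT"
    "ATGCGTGGGCGATGGTTGTTGTCATTGTCGGCGCAACTATCGGTATCAAGCTGTTTAAGAAATTCACCTCGAAAGCAAGCTGATAA"
)

for name, primer in (("YH005", YH005), ("YH006", YH006)):
    annealing_part = clean(primer)[TAG_LENGTH:]
    site = find_annealing_site(annealing_part, PR4STOP_EXCERPT)
    print(name, annealing_part, site)
```

If the two primers are found on opposite strands, the region between the two sites is the stretch that the primer pair would be expected to flank. The same functions can be applied to a full vector sequence once it has been joined into a single string.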