Ubiquitin Engineering: Understanding Recognition and Generating Affinity Reagents

by

Isabel Leung

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department of Molecular Genetics University of Toronto

© Copyright by Isabel Leung 2017

Abstract

Ubiquitin Engineering: Understanding Ubiquitin Recognition and Generating Affinity Reagents

Isabel Leung

Doctor of Philosophy

Department of Molecular Genetics University of Toronto

2016

Protein- interactions are necessary for virtually all biological processes. There have been tremendous efforts to document the diversity of molecular recognition, and to understand how molecular recognition occurs. The understanding of molecular interaction has also served as the foundation for designing novel protein interactions for use in therapeutics, diagnostics and basic sciences. An attractive system for studying protein-protein interactions is the ubiquitin (Ub) system. Ub is a protein modifier that is combinatorially ligated onto substrate to influence substrate turnover and function. Ub uses a common surface to interact with more than

1000 proteins and plays pivotal roles in cell physiology. Despite the substantial structural information on Ub mediated interactions, there is no clear understanding of how individual Ub residues contribute to Ub’s broad scope of interactions. To address this question, I used affinity enhanced Ub variants (Ubvs) as proxies of native Ub in saturation scanning. Using saturation scanning, I studied the interactions between Ubvs and two Ub specific (USP), USP2 and USP21, and elucidated a common functional that is critical for USP recognition.

The functional epitope recognizes USP residues that are conserved among the human USP family, suggesting it may make functional contributions in many other USP interactions. While

ii

the functional for Ub binding to USP2 and USP21 share many identical residues, there exist distinct clusters of residues that are responsible for gaining high affinity and specificity to either USP2 or USP21. In this thesis, I also describe the usage of Ub as an alternative scaffold for generating affinity reagents. I developed a novel Ub phage library for the generation of

Ubvs to targets of interest. I present validation data on Ubvs selected to bind mammalian cell surface receptor, Her3, and an intracellular protein, Grb2, showing that Ubvs can be used like conventional in cell biological experiments and can be utilized to antagonize within a cell. My work demonstrates that protein-engineering approaches can reveal novel aspects of Ub biology and generate novel affinity reagents to probe intracellular signaling pathways.

iii

Acknowledgments

First and foremost, I thank my parents for leaving behind their established lives in Hong Kong and immigrating to Canada so that my brother and I can have a better future. Without their sacrifices, this PhD degree would likely have been impossible.

I sincerely thank my supervisor, Sachdev Sidhu, for taking a chance on me. Not only have I learnt the art of doing science from him, but I also learnt from him the meaning of hard work, determination and most of all what it means to have a vision. I thank my committee members Jason Moffat and James Ellis for their scientific insights and support as well.

This degree would have been far less enjoyable without the help from the fun loving, past and present colleagues of the Sidhu lab: Andreas Ernst, Rachel Hanna, Amandeep Gakhal, Nish Patel, Linda Beatty, Esther Lau, Rashida Williams, Joan Teyra, Maryna Gorelik, Lia Cardarelli, Jaspal Singh, Max London, Kristin Kantautus, Natasha Pascoe, Valencio Salema, Moshe Ben-David, Johan Nilvebrant, Jan Tykvart, Gianluca Veggiani, Wei Ye, Alia Pavlenco, Lori Moffat and Danielle Carranza.

Finally, I owe a big part of this degree to my fiancé and soon to be husband, Mike Eastwood, for pushing me to the finish line. I will forever be grateful for his, sometimes feeble, attempts to put a smile on my face when times were tough. Thanks for putting up with me. Finally, special shout outs to my friends - Leena Baker, Praan Misir, Tina Shum, and Paul Lee.

Thank you.

iv

Table of Contents

Abstract ...... ii

Acknowledgments ...... iv

Table of Contents ...... v

List of Tables ...... viii

List of Figures ...... ix

List of Abbreviations ...... xi

1 Introduction ...... 1 1.1 Protein-Protein Interactions ...... 1 1.1.1 Affinity and Specificity ...... 1 1.1.2 Binding mechanisms and their effects on affinity and specificity ...... 3 1.2 Characteristics of protein binding surfaces ...... 6 1.2.1.1 Composition ...... 6 1.2.1.2 Hot spots ...... 7 1.2.2 Studying PPIs ...... 9 1.2.2.1 ...... 9 1.2.2.2 Applications of phage display in PPI studies ...... 11 1.3 of Affinity Reagents ...... 13 1.3.1 Antibodies ...... 14 1.3.1.1 Limitations ...... 15 1.3.2 Alternative Scaffolds ...... 16 1.3.3 In vitro display systems for affinity reagent engineering ...... 17 1.3.4 Examples of Successful Alternative Scaffolds ...... 19 1.3.4.1 Affibodies ...... 19 1.3.4.2 ...... 20 1.3.4.3 ...... 20 1.3.4.4 ...... 21 1.3.5 Practical considerations for scaffold engineering ...... 22 1.4 Ubiquitin ...... 23

v

1.4.1 Ub in biology and disease ...... 23 1.4.2 The Ub fold ...... 26 1.4.3 Ub Interactions ...... 27 1.5 Thesis Overview ...... 29

2 Saturation Scanning of Ubiquitin Variants reveals a common hot spot for binding to USP2 and USP21 ...... 31 2.1 Statement of Contribution ...... 31 2.2 Introduction ...... 31 2.3 Methods and Materials ...... 33 2.3.1 Library construction and analysis ...... 33 2.3.2 Statistical Analysis...... 35 2.3.3 activity assays ...... 35 2.3.4 Computational saturation mutagenesis ...... 36 2.4 Results ...... 36 2.4.1 Saturation Scanning of Ubv.2.1 and Ubv.21.4 ...... 36 2.4.2 The Ub-binding site of USPs ...... 43 2.4.3 Interactions at the USP:Ubv interface ...... 48 2.5 Discussion ...... 52

3 Ubiquitin as an alternative scaffold for the development of affinity reagents ...... 55 3.1 Statement of contribution ...... 55 3.2 Introduction ...... 55 3.3 Methods and Materials ...... 58 3.3.1 Library construction ...... 58 3.3.2 Mutation tolerance scan ...... 60 3.3.3 Ag binding selections ...... 60 3.3.4 Protein expression and purification ...... 61 3.3.5 Phage and protein ELISAs ...... 61 3.3.6 ...... 62 3.3.7 Cell fractionation ...... 63 3.3.8 Western blotting ...... 63 3.3.9 EGFR signaling assays ...... 64 3.3.10 SPR kinetic binding analysis ...... 64 3.4 Results ...... 64

vi

3.4.1 Mutation Tolerance Scan ...... 64 3.4.2 Library Design ...... 67 3.4.3 Binding selections ...... 70 3.4.4 Characterization of Ubvs targeting Her3 ...... 73 3.4.5 Characterization of Ubvs targeting the Grb2 SH2 domain...... 77 3.5 Discussion ...... 84

4 Conclusions and Future Perspectives ...... 87 4.1 Ubvs in saturation scanning ...... 87 4.1.1 Technical improvements to saturation scanning ...... 89 4.2 Naïve Ub library ...... 89 4.2.1 Ubvs targeting transcription factors ...... 90 4.2.2 Validation of TFs as drug targets ...... 91 4.3 Future of affinity reagents based on alternative scaffolds ...... 92

5 References ...... 95

vii

List of Tables

Table 2.1 Sequences of mutagenic oligonucleotides used to construct saturation scanning libraries ...... 34

Table 3.1 Degenerate codons used in the mutation tolerance scan ...... 59

Table 3.2 Oligonucleotides used to construct naïve Ub libraries...... 60

Table 3.3 Kinetic analysis of Ubv binding to corresponding targets...... 76

viii

List of Figures

Figure 1.1 Models for protein-binding mechanisms ...... 6

Figure 1.2 Phage Display workflow ...... 11

Figure 1.3 Antibody structure ...... 15

Figure 1.4 Structures of binding domains ...... 21

Figure 1.5 The β-grasp fold ...... 26

Figure 1.6 The structure of human Ub ...... 28

Figure 1.7 Interactions of Ub with partner proteins ...... 28

Figure 2.1 Ubv saturation scanning library designs ...... 37

Figure 2.2 The saturation scan database ...... 38

Figure 2.3 Saturation scans of Ubvs ...... 39

Figure 2.4 Dose response curves for inhibition of USP2 by Ubvs ...... 40

Figure 2.5 Computational saturation scanning of Ubvs ...... 42

Figure 2.6 Sequence alignment of the Ub-binding sites of the human USP family...... 44

Figure 2.7 Sequence conservation across the Ub-binding sites of human USPs...... 45

Figure 2.8 Alanine-scanning of USP2 site-2 ...... 47

Figure 2.9 Superposition of USPs in complex with Ub.wt or Ubvs...... 49

Figure 2.10 Interactions between the Ubv core functional epitopes and USPs...... 50

Figure 3.1 Phage ELISA of Ub library ...... 65

ix

Figure 3.2 Library designs for mutational tolerance scan...... 66

Figure 3.3 Ub positions that exhibit low tolerance to mutations...... 68

Figure 3.4 Amino acid frequencies in libraries KHT and NNK...... 69

Figure 3.5 Binding selection outcomes for a panel of protein ...... 71

Figure 3.6 Sequences of binding Ubvs ...... 73

Figure 3.7 Competitive phage ELISAs for Her3 binding Ubvs...... 74

Figure 3.8 Analysis of purified Her-binding Ubvs...... 75

Figure 3.9 Kinetic analysis of Ubv-Her3.2 binding to Her3 ...... 75

Figure 3.10 Ubv-Her3.2 as a tool to immunoprecipitate Her3 and co-immunoprecipitate Her3 interacting Her members from cell lysates ...... 77

Figure 3.11 Competition phage ELISA for Grb2 selected Ubvs...... 78

Figure 3.12 Analysis of purified Grb2-binding Ubvs ...... 79

Figure 3.13 Grb2 by Ubvs ...... 80

Figure 3.14 Kinetic analysis of Ubvs binding to Grb2-SH2 ...... 81

Figure 3.15 Inhibition of the interaction of Grb2-SH2 and p-Tyr peptide ligand by Ubvs...... 82

Figure 3.16 Effects of Ubv-L12 on Grb2-mediated EGFR signaling...... 83

x

List of Abbreviations

Ab - antibody Ag - DBD - DNA binding domain DUB - deubiquitinase ECD - extracellular domain EGFR - epidermal growth factor receptor ErbB - erythroblastosis oncogene B Fab - Fragment antigen-binding Fc - Fragment crystalizable Fv - Fragment variable G - Gibbs free energy Grb2 - Growth factor receptor-bound protein 2 Her - human epidermal growth factor receptor hGH - human growth hormone hGHR - human growth hormone receptor

KD - dissociation constant

Koff - dissociation rate

Kon - association rate M - Molar nM - Nanomolar PDZ - postsynaptic density protein 95; discs large; and zonula occludens-1 pM - picomolar PPI - protein-protein interaction PPIC - phosphatase inhibitor cocktail PRM - peptide recognition module PTI - pancreatic trypsin inhibitor s - second scFV - single chain fragment variable SH - Src Homology TF - transcription factor Ub - ubiquitin UBA - ubiquitin association domain UBD - ubiquitin binding domain UIM - ubiquitin interacting motif wt – wildtype α - alpha β - Beta β-GF - beta grasp fold μM - micromolar

xi

1 Introduction

1.1 Protein-Protein Interactions

Proteins are the workhorses of the cell. They maintain homeostasis and cell survival by performing functions such as catalysis, activation or inhibition of other proteins, translocation of biological molecules between compartments, immune recognition to destroy foreign entities, and assembly of macromolecular architectures. All of these functions are rooted in protein- protein interactions (PPIs), the act of one protein contacting another. Not only are PPIs critical for cell survival, they are also fascinating in their own right. Far from the textbook view, PPIs are not blocks that merely complement each other in shape and exist only in either bound or unbound form. Rather, PPIs are extremely dynamic and vary in affinity, specificity and binding mechanisms.

1.1.1 Affinity and Specificity

At the molecular level, protein interactions occur because there exists complementarity between surfaces of two proteins. Such complementarity include the shape of the two molecules, charge, electrostatics, dipole-dipole forces (e.g. hydrogen bonds), van der Waals forces and hydrophobic packing to shield non-polar groups from solvent or other polar environments. PPIs can be characterized by two parameters: affinity and specificity. Affinity is the strength of the interaction between the two proteins, and is quantified by the equilibrium dissociation constant,

Kd, (units in mol/L or M), which also equates to the concentration required to reach 50% binding. As such, the amount of protein complexes formed within a cell is influenced by local concentrations. Kd is also equivalent to the ratio of association and dissociation rates (Kd=koff/kon) of an interaction. In physical terms, affinity is related to the Gibbs free energy of association

(∆G=RTlnKd), which must be negative for a complex to form. Gibbs free energy can also be represented in terms of changes in enthalpy (∆H) and entropy (∆S), as follows: ∆G= ∆H-T∆S. The concept of entropy is relevant to protein binding mechanisms discussed in the next section.

Affinity is an important parameter in PPIs, and contributes to the proper execution of biological processes. An example that highlights the importance of affinity is the barnase-barstar

1

complex. Barnase is a secreted ribonuclease first discovered in the bacterium, Lactobacillus amyloliquefaciens. Barnase is lethal to when expressed in the absence of its inhibitor Barstar, as the enzyme readily cleaves RNA within the cell. Due to the inhibitors critical role in preventing cell death, the affinity of the interaction is extremely high with an association rate

9 -1 -1 -14 (kon) of > 5 x 10 M s and a Kd of 1.3 x 10 M (Schreiber and Fersht 1993). Thus, binding is practically irreversible at the concentrations present in the cell, and this ensures that barnase activity is potently inhibited.

Yet in other scenarios, low affinity interactions are optimal. PPIs in signaling are generally much more transient. In signaling pathways, interactions mediated by many peptide recognition modules (PRMs), such as Src homology-2 (SH2) and SH3 domains, often recognize their binding partners with affinities in the 10-6 M to 10-3 M range, and this is thought to enable rapid response to external stimuli (Pawson and Nash 2000). Therefore, in the context of biological efficacy, differing PPI affinities are necessary to accurately execute and regulate cellular processes.

The other parameter of PPIs is specificity. A specific interaction is defined as a functional biologically relevant recognition event, while non-specific interactions are viewed as off target binding, which are usually several orders of magnitude lower in Kd and do not contribute to purposeful biological output (Kastritis and Bonvin 2013). Specificity can also be used to describe the number of functional binding partners a protein has, a protein can be mono- specific or multispecific in its interactions. For example, some proteins recognize few partners that are usually similar in structure, while others mediate many interactions with diverse structural partners. In an extreme case of binding promiscuity, the transcription factor p53 contains numerous binding sites and is known to interact with at least 84 proteins (Schreiber and Keating 2011), which enable responses myriad of cellular insults to initiate processes such as apoptosis, DNA damage response, cell cycle arrest and autophagy modulation (Zilfou and Lowe 2009). As such, protein-binding sites can be plastic to accommodate different binding partners, and it is hypothesized that extreme multispecificity is a hallmark of “hub” proteins for mediating cross talk between different pathways (Peleg, Choi et al. 2014).

PPI specificity is especially important when viewed within the context of the cell. When proteins reach high enough concentrations, they will inevitably interact. Thus, in a crowded cellular environment, where total protein concentrations have been estimated to reach mM range 2

(Milo 2013), protein-binding specificity plays an important role in avoiding miscommunication. Furthermore, the chances of binding within a cell are compounded by other cellular processes such as expression levels, compartmentalization, post-translational modification and pH. Thus, affinity and specificity measurements in vitro might not necessarily translate to the cellular context. However, by comparing specificity and affinity of proteins measured under identical in vitro conditions, one can gain invaluable insights into molecular determinants that dictate binding and biology.

From the above examples, it is clear that affinity and specificity of PPIs are integral to the proper execution of biological processes. Due the importance of affinity and specificity in shaping biological processes, it is important to understand the molecular underpinnings by which the two parameters are derived.

1.1.2 Binding mechanisms and their effects on affinity and specificity

Understanding how proteins establish binding is pivotal to understanding how affinity and specificity are derived. The processes by which two proteins come together in the thermodynamic sense can be very different, and can explain why two proteins have high affinity for each other or why a protein is specific or promiscuous in its interactions.

Three dominant binding mechanisms have been proposed overtime, each offering a different proposition on how proteins associate. The first mechanism, the lock and key model (Fig 1.1A), was proposed by Emil Fischer more than 100 years ago (Fischer 1894). According to this model, proteins possess shape complementarity like pieces of a puzzle, and binding depends on the degree of complementarity in the pre-bound state. This description was founded on observations of enzyme-substrate complexes (Ruhlmann, Kukla et al. 1973), which tend to have rigid binding pockets. Interactions that follow the lock and key model have excellent superposition of unbound structures to the complex. According to this theory, the inflexibility of the binding interface disallows the protein from binding others that are very different from the shape of preferred partner, and thus binding is usually limited to a few partners that are structurally similar. Lock and key interactions are usually high in affinity because they tend to incur a low binding entropic penalty compared to interactions that follow other binding mechanisms, because the starting pre-bound proteins have already restricted motion. Furthermore, a protein that has a preformed binding configuration usually has a faster

3

association rate, kon, to its cognate partner because it does not require time to reorient into a binding competent molecule. This can be seen in the comparison between trypsinogen and trypsin. At the sequence level, the only difference between trysin and trysinogen is that trypsinogen has a 15 residue N-terminal extension compared to trypsin. However, at the structure level, trypsinogen possess a disordered binding site for the pancreatic trypsin inhibitor (PTI), which becomes ordered when bound to PTI, while trypsin has an ordered PTI binding site prior to PTI binding. (Kossiakoff, Chambers et al. 1977, Bode, Schwager et al. 1978).

Comparison of binding kinetics revealed that trypsinogen has a kon for PTI that is several orders of magnitude lower than that of trypsin and PTI, likely reflecting the transition from the disordered to the ordered state.

The second theory, the induced-fit model, was proposed more than 50 years after the Lock and Key model, by Daniel Koshland (Koshland 1959). He proposed that rather than rigid bodies, proteins structures are dynamic and the binding shapes are only induced or formed when two proteins come together (Fig 1.1B). In other words, proteins adjust to each other to adopt complementing shapes. At the temporal scale, induced fit binding requires reorientation of the proteins and thus retards the association rate, resulting in interactions that can be lower in affinity.

A third model, conformation selection, was developed in 1965 and describes binding as a process that selects one of many pre-existing conformations in equilibrium (Fig 1.1C) (Monod, Wyman et al. 1965, Rubin and Changeux 1966). Upon binding, the ligand stabilizes one conformation of the partner and thus shifts the equilibrium to favour that particular conformation. Evidence for this mechanism was seen in NMR studies of unbound ubiquitin (Ub), which showed an ensemble of solution conformations that covered a span of Ub conformations observed in previously solved crystal structures of Ub complexes (Lange, Lakomek et al. 2008). This theory has been used to explain binding promiscuity, as one protein can bind to different partners using different conformations within its naturally occurring ensemble. With regards to affinity, such a theory would result in interactions that have a lower association rate than the lock and key model, because time is required for a protein to search for its binding conducive conformation and accompanies a general lost in entropy as proteins will become restricted in their bound conformations. Evidence can be seen in antibodies (Abs), which are known to bind through the conformation selection model (Berger, Weber-Bornhauser

4

et al. 1999, James, Roversi et al. 2003). However, affinity matured Abs that exhibit tighter affinity and specificity towards their cognate antigens, often have much reduced flexibility compared to germline Abs, and it is believed that the rigidification of the complementarity determining region (CDR) loops changes the binding process to more closely resemble the lock and key model (Manivel, Sahoo et al. 2000, Jimenez, Salazar et al. 2003, Sagawa, Oda et al. 2003).

More recently, a combination of the induced fit model and conformation selection model have been observed to apply simultaneously to a binding interaction, albeit at differing segments of a protein. Many examples have been observed where the binding conformation at the global scale already exists in solution, but the local binding site rearranges only in the presence of ligand (Wlodarski and Zagrovic 2009, Csermely, Palotai et al. 2010, Kurisaki, Takayanagi et al. 2014).

A

B

C

D

5

Figure 1.1 Models for protein-binding mechanisms

(A) Lock and key. (B) Induced-fit. (C) Conformational selection. (D) Conformation selection and induced-fit.

1.2 Characteristics of protein binding surfaces

A holy grail in protein science has been to accurately predict where PPIs will occur and in turn decipher the connectivity of PPIs within a cell. To narrow down the search of potential binding events, much research has been dedicated to answering the question: what makes a protein binding site? The question is particularly interesting to researchers within the field of , where we hope to harness some of the basic biochemical principles of PPIs to create customized or de novo binding proteins. The characteristics of ligand binding sites on are easy to describe because they are usually structurally obvious, forming deep clefts, pockets or crevices. However, protein binding sites are much more difficult to differentiate from the rest of the exposed surface, in terms of structure and character.

1.2.1.1 Amino Acid Composition

It is widely accepted that PPI are mainly driven by packing of hydrophobic residues at the binding interface (Jones and Thornton 1996, Keskin, Gursoy et al. 2008), with aromatic residues (His, Tyr, Trp, and Phe) more prevalent than aliphatic residues (Met, Leu, Ile, and Val) when compared to the average protein surface (Lo Conte, Chothia et al. 1999). However, hydrogen bonding and ion pair interactions mediated by hydrophilic and charged residues also play important roles in binding (Keskin, Gursoy et al. 2008). In addition, binding sites do not have to be the largest exposed hydrophobic surface on a protein (Keskin, Gursoy et al. 2008). Therefore, while hydrophobicity is an important parameter for binding, it is only a loose criterion for defining PPIs. In addition, amino acid propensities also vary depending on the type of PPI. Obligatory protein complexes (proteins that must oligomerize to function) have an even higher preference for hydrophobic residues.

Many insights in molecular recognition emerged from studies of Ab-antigen (Ag) interactions. The six CDRs or hypervariable loops of Abs form the antigen-binding site, which is also known as the paratope (Wu and Kabat 1970). In the immune system, the diverse sequence variation in CDRs can create large Ab repertoires that can theoretically recognize any 6

foreign molecule (discussed further in section 1.2.1). Interestingly, sequence analysis of natural paratopes revealed that Tyr residues are prevalent in CDRs (Janin and Chothia 1990, Mian, Bradwell et al. 1991), and if considering only functional contacts to Ag, Tyr abundance is even higher at ~25% (Mian, Bradwell et al. 1991). It was hypothesized that the aromatic nature of Tyr in conjunction with a hydroxyl group on the side chain makes it a versatile amino acid for molecular recognition. The aromatic ring can take part in π-stacking and large van der Waals interactions, while the hydroxyl group can form a hydrogen bond with partner proteins.

Furthermore, the functionality of Tyr in Ab-Ag interactions was highlighted in synthetic Ab studies, where one can precisely control the sequence composition of CDRs and in turn test hypotheses regarding the functionality of particular amino acids. Minimalist antibody libraries with only Tyr and Ser residues in the CDRs have generated binding Abs to diverse Ags with high affinity and specificity (Fellouse, Wiesmann et al. 2004, Fellouse, Li et al. 2005, Fellouse, Barthelemy et al. 2006, Fellouse, Esaki et al. 2007). Ser is a small polar amino acid that can play an auxiliary role by providing flexibility to position Tyr for optimal antigen contacts. However, Tyr dominated CDRs are not optimal for all Ags, and additional chemical diversity has been shown to improve binding to some antigens (Birtalan, Fisher et al. 2010). Another notable observation was that while Arg is favourable for affinity, it is deleterious for specificity (Birtalan, Zhang et al. 2008). Many of these insights were applied to protein engineering efforts and will be discussed in Chapter 3.

1.2.1.2 Hot spots

Another method to describe the binding interface surface relies on the architectural organization of the surface rather than the amino acid composition.

When proteins interact, previously exposed protein surfaces become buried. Analyses of many protein complexes have shown that about 1000-2000 Å2 of surface area on each protein is buried (Hwang, Vreven et al. 2010). Interestingly, despite the large area that participates in binding, the binding energy released upon association of a pair of proteins is not equally distributed amongst the binding surface, and this discovery has had profound implications on drug design for perturbing PPIs (Arkin, Tang et al. 2014).

Wells and colleagues pioneered alanine-scanning mutagenesis as a methodology to assess energetic contributions by individual sides chains within the binding surface area. 7

Substitution with Ala removes all side chain atoms past the β-carbon of the residue without altering main chain conformations or introducing electrostatics, thus allowing one to infer consequential binding effects of individual side chains (Clackson and Wells 1995, Wells 1996, Wells and de Vos 1996). By individually replacing all residues within the binding surface area, one can scan the entire buried surface and determine the residues that functionally contribute to binding. When Wells and colleagues applied alanine-scanning to human growth hormone receptor (hGHR), and measured resulting binding to human growth hormone (hGH), they discovered that of the 33 residues that became buried upon complexation, only nine residues had substantial negative effects on binding to HGH (ΔΔG ≥2 kcal/mol). Hence within the buried surface area, also termed the ‘structural epitope’, most residues were actually energetically neutral for binding, and only a subset of residues within the structural epitope are responsible for the affinity of the complex. The residues that are responsible for the affinity of a PPI constitute the “hot spot” or functional epitope within the structural epitope, and together often form a contiguous patch.

Much work has been done to characterize the anatomy of hot spots. In a study where 2325 Ala mutants of protein complexes were analyzed, Tyr, Trp and Arg dominated hot spots (Bogan and Thorn 1998), which further reinforced the findings derived from Ab studies that showed that Tyr is important for molecular recognition. Another intriguing aspect of hot spots is that hot spot residues are often more dynamic then the remaining binding surface, and the hot spot residues accommodate diverse rotameric conformations (DeLano, Ultsch et al. 2000, Sundberg and Mariuzza 2000). Moreover, peptide phage display studies have shown that protein binding peptides that exhibit no homology to cognate ligands also bind to hot spots of protein, and thus suggest that hot spots are better adapted for binding, and possibly possess a unique molecular anatomy (Fairbrother, Christinger et al. 1998, DeLano, Ultsch et al. 2000, Schaffer, Deshayes et al. 2003).

From a therapeutic standpoint, PPIs were traditionally assumed to be intractable to inhibition by small molecules. Unlike enzymes and receptors that have deep binding pockets for small ligands, PPI surfaces cover expansive flat areas that were thought to be too shallow to accommodate small molecules. However, the understanding of hot spots and their anatomy led to the realization that targeting PPI for therapeutic intervention is feasible, as one only needs to target the hot spot when designing drugs. The hot spot area covers a much smaller area relative

8

to the structural epitope, and has been observed to possess dynamic behavior that can accommodate small molecules. In fact, while unbound structures might seem to be devoid of binding clefts or pockets, the few available solved structures of small molecules bound to hot spots of protein-protein interfaces have revealed unexpected cavities that were only visible when bound to the small molecules (Arkin, Randal et al. 2003, Grasberger, Lu et al. 2005, Bruncko, Oost et al. 2007). As of 2014, 26 PPI inhibitors were reported (Rognan 2015), which further reinforces the notion that PPIs are tractable for small molecular targeting.

1.2.2 Studying PPIs

While structural elucidation by x-ray crystallography, NMR, and more recently high-resolution electron microscopy enable the visualization of protein complexes in three dimensions, these techniques only provide limited insights into relationships between structure and function. Structures of static protein complexes do not give information on the plasticity of the binding site nor on the functional epitope that is energetically responsible for interaction. Site-directed mutagenesis is an invaluable technique for elucidating structure-function relationships. It endowed biochemists with the ability to alter protein structure by making specific changes at the primary sequence level, and subsequently testing for consequential functional changes (Flavell, Sabo et al. 1975, Hutchison, Phillips et al. 1978). By identifying and mapping functional epitopes onto three-dimensional structures, one can understand the organization and composition of functional epitopes. This approach is not only important for answering the fundamental question of molecular recognition, but in more practical terms, it provides clues for protein and drug design.

As described above, alanine-scanning mutagenesis is a powerful method for probing the functional contributions of individual side chains in PPIs. However, the technique is labourious as it requires cloning, purification and binding analysis of individual Ala point mutants that together cover the entire binding surface. To expedite the process, a combinatorial approach called shotgun alanine-scanning was developed to produce combinatorial phage-displayed libraries of mutants and collectively assess binding in a pooled format.

1.2.2.1 Phage Display

Phage display is a powerful technology that is now routinely used to study PPIs and for protein engineering (see also section 1.2.3). In 1985, George Smith demonstrated that peptides can be 9

genetically fused to the phage outer coat protein, which resulted in the external display of a heterologous peptide on phage (Fig. 1.2)(Smith 1985). In 1990, the Wells and Winter groups demonstrated that large proteins can also be functionally displayed on phage (Bass, Greene et al. 1990, McCafferty, Griffiths et al. 1990), and this accelerated the field of protein engineering. The power of phage display and other in vitro display technologies (Lipovsek and Pluckthun 2004, Pepper, Cho et al. 2008, Lofblom 2011) stems from the fact that there exists a physical linkage between genotype (gene coding the displayed protein) and phenotype (the displayed protein), permitting for in vitro evolution to occur. By exploiting the life cycle of phage, one can amplify and enrich for ‘fitter’ molecules on phages that exhibit desirable qualities in parameters such as affinity, specificity and function.

The most common species exploited for phage display is the non-lytic filamentous bacteriophage M13, which infects a variety of gram-negative bacteria including the laboratory workhorse (Smith and Scott 1993). The workflow of phage display is simple and cost effective, and thus can be easily adopted in a common laboratory setting. The concept relies on the production of a massive library of diversified protein variants on phage (routinely >1010)(Fellouse 2007), and subjecting the library to a panning process to select for binding variants, which can then be amplified in the E.coli host (Fig 1.2). By iteratively performing the selection, one can enrich for binding molecules. Since the selection process is performed independently of a living cell, one can employ different in vitro conditions to select for displayed molecules exhibiting desired properties such as stability (Jung, Honegger et al. 1999) or slow off rates (Levin and Weiss 2006). Another benefit is that one can easily identify the binding proteins by sequencing the encoding gene encapsulated in the phage, which allows for binder characterization independent of phage or further engineering efforts.

10

Binding Phages amplification in bacterial host

Phage 9 Immobilized Binding Library 10 antigen Phages

Washing Step

Non-binding phages

Figure 1.2 Phage Display workflow A Phage pool representing a displayed combinatorial library of protein variants is added to immobilized antigen to allow binding phages to interact with antigen. Non-binding phages are washed away and bound phages are subsequently amplified in a bacterial host. The selection process can be repeated with the amplified phage pool until desirable binding phages are identified.

Other display systems have emerged over the years, such as bacterial (Lofblom 2011), yeast (Pepper, Cho et al. 2008) and ribosomal display (Lipovsek and Pluckthun 2004), each aimed to add a different functionality to the display technology at the expense of other limitations.

1.2.2.2 Applications of phage display in PPI studies

The advent of phage display provided a combinatorial mutagenesis approach to tackle the structure-function problem in a high-throughput (HTP) manner. It enabled the facile production of libraries of mutant proteins that subsequently can be rapidly tested for binding. In shotgun alanine-scanning, one can produce a combinatorial library of Ala mutants and enrich for binding mutants by panning the library against the cognate binding partner. Subsequent sequence analysis of binding variants can reveal positions that are depleted in Ala, and thus one can deduce (wild-type) wt residues that are critical for binding. Under the assumption that the wt occurrence to Ala occurrence ratio (wt/Ala) is proportional to equilibrium binding constants of the wt or Ala mutants, one can perform statistical analyses of the wt/Ala ratios (after proper

11

correction for display biases) to calculate ΔΔGmut-wt at each tested position. Combinatorial Ala scanning by phage display was validated by Sidhu and colleagues when they benchmarked alanine-scanning data produced from the combinatorial method to to that produced by conventional alanine-scanning (Weiss, Watanabe et al. 2000). Thus, shotgun alanine-scanning circumvented the need to make kinetic measurements on individual mutants, and expedited the process of mapping binding hotspots.

In addition to alanine-scanning, other combinatorial scanning strategies have also emerged to unveil other relationships between protein structure and function. Recognizing that substitution by an apolar Ala side chains may overstate the binding contributions of hydrophilic side chains, serine-scanning was devised to assess if binding contributions will differ when scanned with a small polar side chain (Pal, Fong et al. 2005). Results from serine-scanning of hGH were in good accord with results from alanine-scanning, indicating that both apolar and polar residues were good substitutes for assessing binding contributions in this case. Homolog- scanning is an alternative method in which side chains are substituted with similar side chains, and this method was designed to assess binding effects introduced by subtle structural changes at the binding surface (Pal, Fong et al. 2005). Interestingly, the homolog scan of hGH showed little effect on binding to the hGHR, indicating that affinity cannot be improved substantially with only subtle changes to the binding surface.

An even more powerful method is the combinatorial saturation scan by phage display, which involves making libraries where the entire binding surface is substituted by all 20 amino acids (Pál, Kouadio et al. 2006). Such a scan provides a comprehensive assessment of the relationship between function and all the possible structural diversity afforded by the 20 amino acids. Results generated from such a scan offer tremendous insight into the adaptability of binding surfaces. Such a scan is practically inconceivable if not for display technologies, and will be discussed further in chapter 2.

Others have applied phage display to uncover relationships between protein structure and dynamics. Rather than strictly exploring binding surface structure-function relationships, Zhang and colleagues designed libraries to study protein interior core substitutions and consequential functional effects. Interestingly, protein core substitutions altered protein structural dynamics and stabilized some conformations over others, which resulted in biasing certain interactions at the expense of others in cells (Zhang, Zhou et al. 2013). 12

Yet another application of phage display in exploring PPIs involved tuning the affinity or specificity of naturally low affinity binding sites. These applications are highlighted in studies of SH3 and PDZ domains (Ernst, Gfeller et al. 2010, Teyra, Sidhu et al. 2012), which in nature often recognize many linear peptide ligands with low affinity. These studies proved to be powerful in elucidating the molecular determinants of affinity and specificity. Furthermore, by expressing engineered PRM variants in cells, one can study the consequential changes in PPI networks and signaling effects of a monospecific PRM (Findlay, Smith et al. 2013).

1.3 Protein Engineering of Affinity Reagents

The discipline of protein engineering emerged in 1980s, initially out of the necessity to improve industrial enzyme properties such as efficiency, robustness or specificity (Ulmer 1983) . The idea of improving enzymes was posited upon the observation that nature had similar enzymes that exhibited wide variations in traits such as thermo stability and catalytic rate constants. In addition, natural enzymes harbouring genetic variations showed distinct properties (Scazzocchio and Sealy-Lewis 1978). Thus, it was logical to propose that one could use site-directed mutagenesis to change structural features and endow an enzyme with desirable traits.

In addition to enzyme enhancements, it became apparent that the idea of changing structure and thus function of a molecule could be applied to another branch of biotechnology, the development of tailored affinity reagents. Affinity reagents are valuable tools in detecting analytes in basic research and in medicine. Not only can affinity reagents be used as diagnostics, but they can also be used as therapeutics that function to neutralize infectious agents or harmful overexpressed soluble proinflammatory molecules such as TNF-α (Feldmann and Maini 2001), to activate beneficial surface receptors such as CD40 (Beatty, Chiorean et al. 2011) or to antagonize unwanted immunosuppressive signals or disease causing receptors such as EGFR, (Van Cutsem, Kohne et al. 2009). In addition, small molecules or other toxins can be ligated to an affinity binder to deliver payloads to diseased cells. In biotechnology, affinity proteins are conjugated to resin for use in affinity purification of recombinant proteins. Traditionally, affinity regents are antibody based. However, over the years, new classes of affinity reagents based on alternative frameworks have been engineered to bind targets of interest.

13

1.3.1 Antibodies

Abs are naturally occurring proteins in the immune system, evolved as part of the humoral immune response to recognize Ags, which are usually foreign immunogenic molecular entities that can induce an immune response. The ability of Abs to recognize diverse molecular patterns lies in the molecule’s unique structure. Each Ab is a “Y” shaped assembly of two heavy and two light chains (Fig 1.3). The stem of the Y structure is a constant “fragment crystallizable” (Fc) domain that is responsible for extending in vivo half-life and for mediating immune effector functions. Each arm of the Y structure is a ‘fragment antigen binding’ (Fab) domain that is responsible for antigen recognition,

The Fab contains a fragment variable (Fv) domain that is composed of two juxtaposed immunoglobulin (Ig) folds. The Ig fold is a beta sandwich made of two β-sheets held together by a bond, and the loops at one end of the fold are the CDR or hypervariable loops. In total, each Fv has six CDR loops that are responsible for engaging antigens. The surface area on the CDRs that becomes buried upon binding to the antigen is known as the paratope. The B cells that produce Abs undergo various genetic shuffling events known as VDJ recombination to produce sequence diversity within the CDR loops. Usually the naïve Ab repertoire will only produce low affinity antibodies. However, under antigen stimulation, B cells producing low affinity Abs will undergo an affinity maturation process to produce higher affinity Abs. In addition to natural repertoires (diversity encoded by B cells), synthetic repertoires, where CDR diversities are man-made, have also proven to be useful for generating binding molecules to diverse antigens (see Section 1.2.3).

14

Figure 1.3 Antibody structure (A) Antibody schematic. (B) Fab structure. Light and heavy chain frameworks are colored tan and blue, respectively. Light and heavy chain CDRs are colored magenta and green, respectively (PDB entry 1ZTI).

1.3.1.1 Antibody Limitations

The relationship between Ab CDR sequence variation and the ability to form novel binding interfaces to any molecular entity with high affinity and specificity have long been exploited for the development of binding reagents for basic research and medicine. However, the continuous development of novel affinity reagent applications has demanded more than what is possible with natural Abs.

As previously mentioned, Abs are large (150 kDa) tetrameric proteins. They are heavily disulfide bonded and glycosylated assemblies that were evolved to exist in the oxidizing extracellular environment. Not only does the complex structure result in cost-intensive tissue culture production methods, the complexity limitations are not ideal for certain applications. From an in vivo diagnostic standpoint, Abs face two challenges. First, due to their large size, Abs cannot penetrate tissues easily. Second, Abs have a slow circulation clearance rate, because Abs are above the renal clearance size of ~70kDa and they possess an Fc domain that mediates recycling by binding to the neonatal Fc receptor (Chaudhury, Mehnaz et al. 2003, Montoyo, Vaccaro et al. 2009). As such, Abs are not ideal for diagnostic applications where clearance rate

15

needs to be rapid so that imaging tools such as radioisotope conjugated Abs are excreted as soon as possible post imaging to reduce toxic effects. While smaller Ab fragments such as Fabs or scFv domains, which lack the Fc region and are smaller in size (refer to Fig 1.3) can theoretically be substituted for full size Ab for diagnostic purposes, they remain disulfide bonded, which require production methods that are usually more expensive and/or result in lower production yields (Frenzel, Hust et al. 2013, Schlegel, Rujas et al. 2013). In addition, since these fragments lack the Fc portion, they are often less structurally stable and hence more prone to aggregation (Wörn and Plückthun 2001, Demarest, Chen et al. 2006, Perchiacca and Tessier 2012, Gil and Schrum 2013).

Another application limitation is that Ab tools are not suitable for intracellular applications. In basic research, Abs are used to tease out surface receptor function. Since natural ligands of receptors are often multispecific and can stimulate different receptors simultaneously (Yarden and Sliwkowski 2001, MacEwan 2002, Lisabeth, Falivelli et al. 2013), Abs are especially useful in deciphering receptor biology as they can be designed to specifically activate or inhibit a single receptor with high affinity. In addition, Abs are valuable in teasing out receptor structure and function relationships by trapping receptors in particular conformations or linking targeted epitopes on receptors to downstream functional consequences. However, Abs are less effective tools for probing intracellular protein function, because Abs are not cell membrane permeable. Moreover, even with gene delivery methods coupled with intracellular expression, Abs are structurally reliant on disulfide bonds, and thus do not fold and function well in the reducing intracellular environment. While small molecules exist to dissect some protein functions, a large fraction of these only target enzymes, which have deep binding pockets. Since many protein functions are a consequence of PPIs and PPIs generally lack deep pockets, it has been especially challenging to develop small molecule based tools to probe intracellular PPIs. Thus, there remains a severe paucity of tools for investigating intracellular protein functions.

1.3.2 Alternative Scaffolds

Alternative scaffolds were first conceptualized in the early 1990s. The goal was to create antibody mimetics to overcome some of the limitations of Ab reagents. The idea revolved around constructing synthetic antigen-binding sites on other protein scaffolds. The invention of alternative scaffolds was spurred by the success of libraries and display 16

technologies, which enabled HTP protein engineering. Like Abs, alternative scaffolds have a stable three-dimensional structure and are structurally robust so that they can tolerate extensive sequence variation in the designed binding site without compromising structural integrity. Additionally, these alternative scaffolds must be able to form high affinity and specific interactions to antigens of interest. However, to overcome Ab limitations, these alternative scaffolds should ideally be structurally simple, single domain, small in size (<15kDa) and lack disulfides so they can be produced in high yields using recombinant bacterial methods, have broad applicability, and can be easily linked in modular fashion for novel effector functions.

Over the last 25 years, more than 50 alternative scaffolds have been reported (Gebauer and Skerra 2009). The proteins chosen to be alternative scaffolds originate from diverse species and perform a wide array of native functions. Few groups have taken advantage of extremophile bacteria, which naturally express proteins evolved to withstand extreme conditions, and therefore are naturally structurally robust (Gera, Hussain et al. 2011, Behar, Bellinzoni et al. 2013). Other scaffolds are protease inhibitors (Dennis and Lazarus 1994, Smith, Patel et al. 1998, Williams and Baird 2003) or toxins (Pierret, Virelizier et al. 1995, Li, Dowd et al. 2001), which have small compact robust folds, most probably evolved to be exceptionally stable to escape proteolytic cleavage while inhibiting protease active sites. While others chose proteins that resemble the immunoglobulin fold (Fig 1.4A), such as (Fig 1.4B), which also has a beta sandwich fold with loops connecting two beta sheets. Based on the concept of Ab CDRs, scaffolds such as Anticalins (Beste, Schmidt et al. 1999), Knottins (Moore and Cochran 2012), and Kunitz domains (Hosse, Rothe et al. 2006) also utilize combinatorial loops to create novel binding sites. Yet others have co-opted other secondary structures that are entirely comprised of helices or a beta sheet to support a universal binding site. Thus, while nature has utilized flexible loops for Ab paratopes, the ability to form a complementarity region is not restricted to loop based architectures and can be achieved with other structures.

1.3.3 In vitro display systems for affinity reagent engineering

The discipline of structural biology has provided much insight into molecular recognition, and the knowledge governing PPIs have been in turn used to develop algorithms for interface design. Although computational protein designs have made remarkable achievements in designing de novo proteins or re-designing enzymes (Koga, Tatsumi-Koga et al. 2012, Siegel, Smith et al. 2015, Boyken, Chen et al. 2016), current methods are only applicable for designing 17

binding molecules to certain proteins that have existing high-resolution structural information and predicted candidate sequences are only partially accurate. Furthermore, binding sequences still require additional experimental optimization to achieve high affinity and specificity. As such, in vitro evolution systems still remain the dominant choice for affinity reagent engineering, especially for scaffold-based binders.

Molecular biologists have long taken advantage of natural immune systems to produce high affinity and specificity Abs. However, many practical drawbacks exist in the immune system when it comes to antibody production. First, antigens used to raise antibodies cannot be toxic or non-immunogenic. Second, it is difficult to produce antibodies against antigens that are either identical or have high similarities to host molecules because the immune system goes through a negative selection process to rid cells that produce antibodies that recognize self- epitopes. Third, the conditions in which an immune system develops antibodies cannot be controlled, and as such one cannot tailor Abs against a particular epitope of a domain or tweak antibody biochemical properties such as solubility. Fourth, the sequence of antibodies raised in an animal cannot be easily identified, which makes it difficult to streamline antibody production in recombinant systems, add functional modifications, or humanize for therapeutic development. In fact, the lack of sequence information on research grade Abs has been criticized as a major contributor to published data irreproducibility issues (Baker 2015, Bradbury and Plückthun 2015). Since Ab sequences are unknown, Ab production is tied to the immunized animals, which can produce different Ab clones when immunized with the same Ag. Even when Abs are produced using hybridoma technology, which in theory should produce monoclonal Abs, variability can be introduced in the produced Abs because biological changes can occur in hybridoma cell lines over long-term propagation (Couture and Heath 1995, Barnes, Bentley et al. 2003). Thus, it is difficult to produce the same Abs consistently using animals or hybridoma methods, and the batch-to-batch Ab variations have led to inconsistent experimental outcomes. The inability to define Ab based on its sequence poses another problem for users, as there is no way to confirm which Abs were actually used in published literature, so it can be difficult to match experimental setups to reproduce published data.

In vitro evolution using display methods has served as an attractive alternative method to the natural immune system for Ab engineering, and has been a powerful HTP platform for scaffold engineering. Combinatorial scaffold libraries, synthetic Ab libraries, or natural immune

18

repertoires can be cloned and displayed on various vectors such as phage, RNA or yeast. Subsequently, the libraries are subjected to selection pressure in in vitro settings to select for affinity molecules to targets of interest. As discussed above in section 1.1.4.1, in display systems such as phage display, selection settings are tunable to develop binders with desirable biochemical properties and competing target binding molecules can be included in the selection procedure to enrich for binders to other epitopes of the target. The ability to bias binder development against particular protein epitopes is important since binder engagement of different epitopes can result in different functional outcomes. For example, affinity reagent binding to the ligand-binding site on a receptor may mimic ligand activity and stimulate the receptor, where as an affinity reagent binding to the dimerization interface of a receptor may inhibit receptor activity. Another benefit of display systems is that the DNA coding sequence of the binder can be easily identified, which enables the binding clone to be indefinitely produced by recombinant methods, and may alleviate the data irreproducibility issues surrounding the use of affinity reagents lacking sequence information (Bradbury and Plückthun 2015, Bradbury and Plückthun 2015). The known binder sequence will also allow for further engineering, if necessary. In addition, in display systems one can precisely define the diversity of the binding sites so that residues that are known to be advantageous or deleterious for molecular recognition can be biased or depleted, respectively.

1.3.4 Examples of Successful Alternative Scaffolds

To date, numerous small biotechnology companies have been built based on alternative scaffolds. The following four scaffolds stand out as achieving much success in broad applications.

1.3.4.1 Affibodies

Affibodies, which are based on the Z domain of Staphylococcus Protein A (Fig. 4C), is a 58 residue scaffold comprised of three helices (Nord, Nilsson et al. 1995, Lofblom, Feldwisch et al. 2010), and to date over 100 papers has been published regarding this scaffold. Thirteen residues along the face formed by two helices were chosen as the binding site. Phage-displayed Affibody libraries have been successful for selecting μM to pM affinity binders to diverse antigens including protein A, CD-28 (a T-cell activating co-receptor) (Sandstrom, Xu et al. 2003), and Her2 (a cell-surface receptor) (Orlova, Magnusson et al. 2006). Currently, over 20 Affibody-

19

based research reagents have been marketed, and at least two affibodies are in clinical trials (Affibody 2016). In addition, Gallium conjugated anti-Her2 affibodies have been tested in clinical studies and proved to provide better imaging contrasts than Abs (Sorensen, Velikyan et al. 2016).

1.3.4.2 DARPins

Designed ankyrin repeat proteins (DARPins) are based on the natural ankyrin repeat proteins (Fig 4D), which typically consist of four to six consecutively linked 33-residue motifs containing two alpha helices connected by a loop (Plückthun 2015). The idea of adopting repeat proteins for building synthetic binding proteins is based on the observation that in nature, repeat proteins are abundantly present in PPIs and can vary in length to create larger or smaller binding surfaces. Prior to library construction, a consensus strategy based on sequence analysis of hundreds of ankryin repeats was adopted to design a framework with superb biochemical properties (Binz, Stumpp et al. 2003). On a single motif, six residues located on the loop and helix were randomized and a larger complementarity region was formed by assembling multiple units together. Phage-displayed and ribosome-displayed ankryin libraries have yielded binders to hundreds of molecules and pM affinities have been reported in some cases (Plückthun 2015). Currently more than 200 publications are available on DARPins, and four DARPin molecules are in clinical trials (Molecular_Partners 2016).

1.3.4.3 Monobodies

Monobodies are based on the type III domain (FN3) (Fig. 4B), which has been estimated to account for 2% of animal proteins. It is 96 residues long and has a fold similar to that of an Ig domain. The binding site was developed by randomizing 21 residues on the loops situated between the two beta sheets (Koide, Bailey et al. 1998). Recombinant libraries based on phage, yeast and mRNA display have been used to select for binders to diverse protein folds including post-translational modification, PRMs, surface receptors and kinases, with nM to pM affinities (Lipovsek 2011, Koide, Koide et al. 2012). Currently, there are more than 30 papers published on the monobody scaffold with another 20 papers published by the company Adnexus, which has commericialized a FN3 library, and developed binders called Adnectins. An Adnectin raised against VEGFR2 was advanced through a Phase 2 clinical trial to treat recurrent glioblastoma but did not meet efficacy requirements (Schiff, Kesari et al.

20

2015). Many monobodies have demonstrated success in intervening with intracellular protein function when expressed within the cell (Wojcik, Hantschel et al. 2010, Grebien, Hantschel et al. 2011, Sha, Gencer et al. 2013). Of particular note, a monobody targeting specifically an epitope at the interface between an SH2 and kinase domain within the oncogenic fusion protein BCR-Abl has validated this region as important for oncogenesis and could be a target for therapeutic intervention (Grebien, Hantschel et al. 2011).

1.3.4.4 Anticalins

Anticalins are based on members of the lipocalin protein family (Fig. 4E), which naturally exist in animal blood as carriers to store or transport small ligands (Beste, Schmidt et al. 1999). Lipocalins possess a beta-barrel fold with a cup-like structure that supports four loop extensions, and it is within the cup where natural ligands are known to bind. Exploiting the natural function of lipocalins, synthetic binding proteins have been isolated from phage or bacterial display libraries in which the loops and inner beta barrel residues were randomized (Gebauer and Skerra 2012). Nanomolar to pM affinity binders have been selected to bind small molecules such as steroids and digoxigenin (Schlehuber, Beste et al. 2000), and disease relevant proteins such as VEGF and CTLA4 (Richter, Eggenstein et al. 2014). Currently, there are more than 20 papers published regarding the scaffold and two molecules are in clinical trials (Pieris 2016).

A B C

D E

Figure 1.4 Structures of binding domains 21

(A) Antibody light chain variable domain (PDB entry 1ZTI). (B) 10th human fibronectin type III domain (PDB entry 1FNF). (C) Z domain of Staphylococci Protein A (PDB entry 1Q2N) (D) Ankryin repeats (PDB entry 1MJ0). (E) lipocalin domain of Bilin binding protein (PDB entry 1BBP). β-sheets and α-helices are colored blue, loops are colored gray, and diversified positions are represented as red spheres.

1.3.5 Practical considerations for scaffold engineering

As mentioned before, computational approaches have made great strides in binding interface design. However, it continues to face many challenges, and thus, there is still a great need for carefully designed scaffold libraries for affinity reagent development.

As mentioned before, it is of paramount importance for binders to have excellent biochemical properties as this correlates with longer shelf life, higher yields, lower off target binding, and also proposed to have lower antigenicity for therapeutic applications (Pluckthun 2009). Thus, a scaffold should be naturally stable, so that it can afford to incur some instability attributed by sequence variation. As a result, some current scaffolds are natural proteins that have gone through framework engineering to further improve structural stability (Binz, Amstutz et al. 2004, Lorey, Fiedler et al. 2014). Typically, scaffold choice also takes into consideration the type of surface for binding site construction. The surface to be diversified can be loops or secondary structures, and each has advantages and disadvantages. Loop flexibility adds another dimension of diversity, as it can adopt different conformations, and can increase the chances of arriving at a binding configuration. However, once bound, flexible loops are fixed spatially, which can result in unfavorable entropic costs, and this is contrary to complementarity regions based on fixed beta sheets or helices. Regardless of surface chosen, diversified positions should be surface exposed to prevent disturbing the hydrophobic core and the overall fold of the protein.

Since every display method has a practical limit in the achievable library size, one must carefully consider the diversification strategy to maximize coverage of the theoretical diversity and to minimize the number of non-functional members due to instability issues. The amino acids used in scaffold diversification should be devoid of Cys, to prevent formation of , which can complicate folding. Gly and Pro should also be avoided on secondary structure based complementarity regions, as they can disrupt helices and beta sheets. Arg has been shown to be beneficial for affinity but deleterious for specificity, thus Arg should also be

22

excluded or used sparingly (Birtalan, Zhang et al. 2008). As mentioned in section 1.1.3.1, Ab studies have revealed Tyr and Ser to be important in Ab-Ag interactions and this also holds true for monobody engineering (Koide and Sidhu 2009). Thus, to maximize contacting residues one can bias diversification to favour Tyr and Ser residues. For scaffold libraries produced with oligonucleotide-directed mutagenesis strategies, precise amino compositions can be achieved with oligonucleotides synthesized with trinucleotide phosphoramidite codons (Sondek and Shortle 1992, Persson, Ye et al. 2013).

Finally, it is important to note that library design is an iterative process. Binder characterization will often provide valuable lessons for improved library designs. For example, observations have been made that Abs rich in negative charges are more soluble and less prone to aggregation (Lee, Perchiacca et al. 2013), thus future Ab library design might benefit from incorporation of additional negative charges. Structural determination of binders can also give valuable insight into randomized positions that may not be suitable for randomization or vice versa. For example, second generation monobody libraries included randomization of the “framework” because it was observed in a structure that the framework residues can also make contact (Koide, Wojcik et al. 2012).

1.4 Ubiquitin

1.4.1 Ub in biology and disease

Ub is a 76-residue post-translational modifier that is covalently attached to substrate lysines or N termini, either singly or in chains of differing topologies. These modifications serve to target substrate for degradation or to alter PPIs for orchestrating a variety of cellular processes such as gene silencing, receptor internalization (Terrell, Shih et al. 1998, Haglund, Sigismund et al. 2003), DNA repair (Ulrich and Walden 2010) and cell cycle progression (Spence, Gali et al. 2000, Jin, Williamson et al. 2008). Ub attachment to substrates is catalyzed by the sequential action of three classes of enzymes: E1, E2 and E3. (Schulman and Harper 2009, Ye and Rape 2009, Berndsen and Wolberger 2014). Ubiquitination can also be reversed by deubiquitinases (DUBs), which are proteases that cleave Ub moieties from substrates (Komander, Clague et al. 2009). In addition to the plethora of enzymes that recognize Ub, there are at least 20 structural families of Ub-binding domains (UBDs), which recognize Ub non-covalently and serve to

23

translate Ub modifications into signaling cascades (Hurley, Lee et al. 2006, Husnjak and Dikic 2012, Harrison, Jacobs et al. 2015). Thus, Ub function is the product of finely balanced interactions between Ub and diverse enzymes and binding domains.

Given that Ub’s pivotal role in maintaining cell physiology is dependent on the proper integration of diverse Ub-mediated interactions, deregulation of Ub interactions can have broad consequences on the cell. In fact, mutations or alterations affecting expression in many Ub associated genes are strongly implicated in diseases such as cancer, immune pathologies, and neurodegeneration (reviewed in (Popovic, Vucic et al. 2014).

Due to the role of ubiquitination in targeting proteins for proteasomal degradation, it directly controls the abundance of oncogenic proteins or tumor suppressors, as such genetic alterations of E2s, E3s and DUBs are strongly implicated in malignancies. Typical examples of E3 ligases in cancer include MDM2 (Wade, Li et al. 2013) and the SCF ligase complex (Frescas and Pagano 2008), which are responsible for regulating levels of tumor suppressor p53 or cylin- dependent cell cycle kinases, respectively. As for DUBs, a number of them are tumor suppressors, including A20, CYLD and BAP1, which all have been found to be genetically mutated in many immune cancers such as B cell lymphomas and lymphoblastic leukemia (Popovic, Vucic et al. 2014). Furthermore, ubiquitination has been shown to directly effect oncogenes or suppressors activity, and thus offers another mechanism in which misregulation of ubiquitination can lead to malignancy (Baker, Lewis et al. 2013).

In immune pathologies, misregulation of ubiquitination is often described in the context of the NFκB pathway, which has critical roles in coordinating cellular response to stress and immune response to infection (Kanarek, London et al. 2010, Chen and Chen 2013). The canonical NFκB pathway can be activated through many or pathogen sensing receptors, and once activated, many downstream components are ubiquitinated in different topologies, which serve as a signal for degradation or as a scaffold for complex formation. Ubiquitination is critical for proper NFκB signaling, and this is reflected by the many genetic alternations in ubiquitination along the NFκB pathway in immune disorders such as immunodeficiencies, chronic auto-inflammation, rheumatoid arthritis, psoriasis, systemic lupus erythematosus and celiac disease(Popovic, Vucic et al. 2014).

Neurodegenerative diseases such as Parkinson’s or Alzheimer’s are characterized by

24

toxic accumulation of protein aggregates. Not surprisingly, both diseases are linked to defects in Ub mediated degradation or autophagic pathways that are normally responsible for removing aggregates (Lam, Pickart et al. 2000, Hara, Nakamura et al. 2006). Interestingly, ubiquitination at specific sites of α-synuclein, the main aggregate forming protein in Parkinson’s, seems to promote toxic fibril formation, while ubiquitination at other sites strongly inhibits fibril formation (Meier, Abeywardana et al. 2012). This suggests that differential Ub interactions are responsible for pathogenesis.

All of the above diseases highlight the importance of regulating ubiquitination and speak to the importance of Ub-mediated interactions.

25

1.4.2 The Ub fold

Ub is the prototype of the β-grasp fold (β-GF). The β-GF superfamily is widespread amongst the animal kingdom, and in eukaryotes there are at least 70 subfamilies identified (Burroughs, Iyer et al. 2012). According to Burroughs et al., the common β-GF topology is a four or five- stranded beta-sheet and a helical segment. The first and last β-strands are adjacent to each other and located in parallel in the center of the domain. The two remaining strands are found flanking the middle two, in anti-parallel fashion. The single helix is packed against one face of the sheet, and it appears as if the β-sheet is grasping the helical segment (Fig. 1.5A).

A B 3 C C 2 4 6 1 2 7 8 5 1 3 6 9 10

5 4 N N

C 1 3 6 C 1 9 2 8 1 5 7 4 10 6 3 4 N 5 N

Figure 1.5 The β-grasp fold Schematics (upper panel) and structures (lower panel) are shown for (A) the typical five- stranded β-GF fold, for which Ub (PDB entry: 1UBQ) is an example, and (B) the β-GF fold of the Fasciclin subfamily, for which the Fasciclin domain from Drosophila fasciclin I (PDB entry 1O70) is an example. Arrows and cylinders indicate β-strands and α-helices, respectively. The orange strand is absent in the four-stranded β-GF fold. Insertions in the basic β-GF fold are indicated by green arrows (β-strands) and magenta cylinders (α-helices).

The β-GF is thought to be already present in the last common organismal ancestor. In prokaryotes, the fold went through much structural diversification. Many extensions and modifications are found in numerous present prokaryotic β-GF domains, some with elaborate

26

modular insertions of β-sheets and helices such as in the Fasiciclin superfamily (Fig. 1.5B) (Burroughs, Iyer et al. 2012). The structural diversification enabled the prokaryotic β-GF to adopt a wide array of biological and biochemical niches such as iron-sulphur transfer (e.g. ThiS/MoaD) (Lake, Wuebbens et al. 2001), RNA recognition (e.g. threonyl-tRNA synthetase) (Sankaranarayanan, Dock-Bregeon et al. 1999), toxin function (e.g. Staphylococci enterotoxin A) (Schad, Zaitseva et al. 1995), and catalysis (e.g. fibrinolytic enzymes) (Rabijns, De Bondt et al. 1997). However, evolution of the β-GF in eukaryotes did not mirror that in prokaryotes, and instead of evolving new biochemical functions, expanded the β-GF members to impart function by mainly altering PPIs (Burroughs, Balaji et al. 2007). It is clear that the rich evolutionary history of the β-GF structure and hence function, demonstrates that this fold is a multifunctional scaffold that is highly versatile and robust to structural perturbations. As such, the Ub fold is an attractive candidate for an alternative scaffold, as discussed further in Chapter 3.

1.4.3 Ub Interactions

To date, two E1 enzymes (Schulman and Harper 2009), ~45 E2 conjugation enzymes (Valimberti, Tiberti et al. 2015), ~600 E3 ligases (Li, Bengtson et al. 2008), ~80 DUBs (Komander, Clague et al. 2009) and more than 20 structural families of UBDs have been identified in humans. Analysis of the ~250 solved Ub complexes revealed that Ub contacts are predominately found on the flat beta sheet surface (Fig 1.5 and 1.6), with very rare contacts on the Ub backside encompassing segments α1 and β2 (Lange, Lakomek et al. 2008, Harrison, Jacobs et al. 2015). This observation corroborates with a functional study that showed Ub substitution on the beta sheet was deleterious for yeast growth (Roscoe, Thayer et al. 2013).

Remarkably, evolution has resulted in the dedication of one side of Ub for interactions to accommodate distinct structural binding features. For example, Ub interacting motifs (UIMs) and Ub-associated domains (UBAs) recognize Ub with single or double helices, respectively, while zinc fingers utilize loops for interaction. E2s, which all contain a UBC domain, recognize Ub with a mix of helices and loops, but in HECT E3s, a mix of helices and a distinct β-hairpin is used for Ub binding (Maspero, Valentini et al. 2013). In USPs, the Ub-binding site consists of three subdomains that form a hand-like structure (Fig. 1.7F).

27

180o

Figure 1.6 The structure of human Ub The left panel shows the β-sheet on the helix unobstructed side, which is the surface used predominately in Ub non-covalent interactions. The right panel shows a 180o rotation, and this “back” side rarely participates in interactions.

A B C D

E F

Figure 1.7 Interactions of Ub with partner proteins Ub (blue) bound to partner proteins (gray): (A) UIM of RAP80 (PDB entry 3A1Q), (B) UBA of Dsk2p (PDB entry 1WR1), (C) Zinc finger of Npl4 (PDB entry 1Q5W), (D) UBC domain of UbcH5 (PDB entry 5FER), (E) HECT domain of NEDD4 (PDB entry 4BBN), (F) USP domain of USP2 (PDB entry 3V6C).

28

While the binding partners have been categorized according to their structural domains, many of them remain functionally elusive and only a fraction of these complexes have been solved up to date. To aid the elucidation of functions for Ub-associated proteins, our group developed a phage display strategy to rapidly develop Ub variants (Ubvs) that can act as tight binding reagents for specific Ub system proteins (Ernst, Avvakumov et al. 2013). This strategy emerged from the observations that Ub interactions predominately took place on the β-sheet and that the binding interfaces are not complementarily optimized. With a combinatorial Ub library with low levels of β-sheet substitutions, Ub variants (Ubvs) were selected with more optimal contacts to Ub system proteins, and hence the Ubvs had improved affinity and specificity over Ub.wt. Most Ubvs occluded natural Ub-binding sites and thus acted as inhibitors of protein function. Most interestingly, some Ubvs occupied natural Ub-binding regulatory sites on enzymes, and thus acted as modulators or activators inside cells, and both of these functions are not achievable with knockdown techniques (Zhang, Wu et al. 2016). With these tools in hand, Ub-binding protein function can be probed inside cells.

Although many structures have unveiled insights into Ub recognition at the molecular level, and this information has been successfully translated into engineering recombinant Ubvs for tools, many questions regarding Ub recognition remain unanswered. For example, within the large buried surface areas, what is the functional epitope for Ub interactions, and are there distinct functional epitopes for different proteins that utilize the same fold to bind Ub? How do the functional epitopes differ between different Ub binding features? How are the binding determinants spatially organized within the large binding surface? All these questions have important implications for drug design targeting specific enzymes and future development of computational protein design. Some of these questions will be addressed in Chapter 2.

1.5 Thesis Overview

A detailed understanding of the molecular underpinnings of how Ub recognizes enzymes in the Ub proteasome system is crucial for understanding Ub biological function. Many structures of Ub complexes have been solved and, in most cases, reveal a large structural epitope on a common face of the Ub molecule. However, due to the generally weak nature of these interactions, it has been difficult to map in detail the functional contributions of individual

29

Ub side chains to affinity and specificity. In chapter 2, I describe how I took advantage of Ubvs that bind tightly to two particular Ub specific proteases (USPs), USP2 and USP21, and employed saturation scanning mutagenesis by phage display to comprehensively map functional epitopes within the structural epitopes. I found that Ubvs that bind to USP2 or USP21 contain a remarkably similar core functional epitope, or hot spot, consisting mainly of positions that are conserved as the wt sequence but also some positions that preferred mutant sequences. The Ubv core functional epitope contacted residues that are conserved in the human USP family, and thus it is likely to be important for the interactions across many family members. In addition, despite the high identity between USP2 and USP21, we discovered distinct molecular preferences on each Ubv that enhanced binding to its cognate USP, and these molecular differences could be exploited for specific drug design.

Given the high adaptability of the β-GF in evolutionary history, the diverse interactions Ub can make and its unusual biochemical characteristics, we speculated that Ub could serve as a robust alternative scaffold for the development of affinity reagents. In chapter 3, I present my work on a novel combinatorial phage-displayed Ub library that is based on randomization of the β-sheet. In an attempt to reduce structural destabilization, prior to library construction, I performed a mutation tolerance scan to identify positions along the β-sheet that were important for stability, and these were fixed as wt in the final library design. In the end, I co-opted an expansive surface on the β -sheet for the construction of a binding site by sequence diversification. The resultant library was successful for generating binders to 30 of 43 diverse proteins. Validation data are presented for Ubvs selected to bind a cell surface receptor, human epidermal growth factor receptor (Her3), and Ubvs selected to bind the SH2 domain of the intracellular adaptor, growth factor receptor bound-2 (Grb2). My results show that Ubvs are specific to the proteins that they were raised against and have affinities that are comparable to antibodies. In particular, I show that Her3 binding Ubv can be used similarly to antibodies in immunoprecipitation experiments and that expression of Grb2 binding Ubv in mammalian cells has inhibitory effects on Grb2 mediated signaling.

30

2 Saturation Scanning of Ubiquitin Variants reveals a common hot spot for binding to USP2 and USP21

2.1 Statement of Contribution

This project was a collaboration with Dr. Julia Shifman from the Hebrew University of Jerusalem, and it began when she was a visiting scientist at the Sidhu lab. Dr. Shifman and I designed and constructed the phage libraries and performed selections. Computational scanning (Fig. 2.5) was done by Dr. Shifman and her student, Ayelet Dekel. Sequence analysis was done by me and Dr. Shifman. All other experiments were performed by me.

Sections of text and figures in this chapter are adapted and modified from Leung, I, Dekel, A., Shifman, J.M., Sidhu, S.S. (2016). Saturation Scanning of Ubiquitin Variants reveals a common hot spot for binding to USP2 and USP21. Proceedings of the National Academy of Sciences. DOI 10.1073/pnas.1524648113.

2.2 Introduction

Ub binds to thousands of proteins in the human proteome, but typically with low affinities in the 10-5-10-3 M range. Structural studies have delineated that most interactions occur on the beta-sheet face (Lange, Lakomek et al. 2008, Harrison, Jacobs et al. 2015), which is thus highly versatile in recognizing a wide variety of domain folds and surface topologies. Despite low affinity, many Ub interactions bury substantial surface areas that can exceed 2000 Å2 (Ernst, Avvakumov et al. 2013), defying the conventional view that buried surface area positively correlates with binding affinity of protein-protein interactions (Chen, Sawyer et al. 2013).

We have shown that the affinity of particular Ub-protein interactions can be enhanced by several orders of magnitude by mutating the surface to produce Ubvs that function as potent and specific inhibitors of the enzymes they target (Ernst, Avvakumov et al. 2013). Although this 31

work demonstrated that Ub interactions are not optimal and can be improved by mutations, we still do not understand which residues contribute the most to binding energetics and which residues are sub-optimal for binding. Many questions remain to be resolved. For example: what is the functional epitope, how are the binding determinants spatially organized within the large interface, and how do binding energetics differ depending on the interacting partner?

To understand molecular recognition by Ub, we need detailed molecular binding landscapes of Ub-protein interactions. Such binding landscapes can be obtained by introducing all possible single mutations in the binding site and assessing the consequent effects on affinities (Wells 1996, Wells and de Vos 1996, Pál, Kouadio et al. 2006, Moreira, Fernandes et al. 2007, Sharabi, Erijman et al. 2013, Aizner, Sharabi et al. 2014). However, the low affinities of typical Ub-protein interactions (Hurley, Lee et al. 2006) make it difficult to accurately measure the effects of mutations, and thus, detailed empirical binding landscapes of Ub-protein interactions have not been reported.

Here I used two Ubvs, which were evolved as high affinity inhibitors of different Ub specific proteases (USPs), as proxies to explore the functional details of Ub-protein interactions (Ernst, Avvakumov et al. 2013). USPs constitute the largest class of DUBs and several have been implicated as potential therapeutic targets (Yang, Kitagaki et al. 2009, Sippl, Collura et al. 2011). USPs share a structurally similar catalytic domain, which binds Ub and positions its C- terminal tail into the conserved catalytic cleft. I analyzed Ubv.2.1 and Ubv.21.4, distinct triple mutants with high affinity for USP2 or USP21, respectively (Ernst, Avvakumov et al. 2013). Comparisons of crystal structures of each Ubv or Ub.wt in complex with USP2 or USP21 have revealed that each Ubv binds its cognate enzyme in a manner that is virtually identical to the binding mode of Ub.wt (Ernst, Avvakumov et al. 2013). Thus, the fact that only three mutations yield Ubvs that bind in a similar manner but with dramatically enhanced affinities relative to Ub.wt makes these Ubvs ideal candidates for mutational studies to shed light on both the molecular bases for enhanced affinity and on the molecular details of natural Ub-USP interactions.

To obtain a comprehensive assessment of the mutational tolerance of the binding surface of each Ubv, without having to make and test thousands of point mutants, I employed combinatorial saturation scanning by phage display. This method uses combinatorial libraries of protein variants designed for the rapid and facile assessment of the effects of all possible point 32

mutations across a large protein binding surface and has been used previously to survey the binding energy landscape of human growth hormone (hGH) for its receptor (hGHR) (Pál, Kouadio et al. 2006). Saturation scanning analysis of the USP-binding sites of Ubv.2.1 and Ubv.21.4 reveals many common elements and some differences between the two interactions. Moreover, interpretation of the results in the context of the human USP family shows that many conserved functional interactions are likely generalizable to most family members. Our group and collaborators have reported structures of additional Ubvs in complex with E3 ligases of the Homologous-to-E6AP-C-terminus (HECT), Skp1-Cul1-F-box (SCF) and Really-Interesting- New-Gene (RING) families (Gorelik, Orlicky et al. 2016, Zhang, Wu et al. 2016), and the extension of the methods described here to these and other types of Ub-protein interactions may identify commonalities and differences in how Ub recognizes diverse structural folds to mediate biological effects.

2.3 Methods and Materials

2.3.1 Library construction and analysis

Each Ubv was displayed on , as described (Ernst, Avvakumov et al. 2013). Phage-displayed Ubv libraries were constructed, sorted and analyzed, as described (Pál, Kouadio et al. 2006). Briefly, a Ubv-displaying phagemid vector containing TAA stop codons in the regions to be mutagenized was used as a template for oligonucleotide-directed mutagenesis (Kunkel 1985). Mutagenic oligonucleotides were designed to replace positions to be scanned with degenerate NNK codons (N = A/G/C/T, K = G/C) that collectively code for all 20 natural amino acids. Five libraries were designed for each Ubv, with each combining three mutagenic oligonucleotides to introduce the mutations into the Ubv sequence (Table 1). Each mutagenesis reaction was electroporated separately into E.coli SS320 and each yielded a library of >109 unique members. The libraries were handled separately and each was sorted for four rounds to select for binding to immobilized USP. Individual clones from the fourth round of selection were grown in 96-well format, and the culture supernatants were used directly in phage ELISAs (Sidhu, Lowman et al. 2000) to detect clonal Ubv-phage that bound to USP. 50- 100 positive binding clones from each library were subjected to DNA sequencing and statistical analysis.

33

Table 2.1 Sequences of mutagenic oligonucleotides used to construct saturation scanning libraries.

C C C

C

C C C C C

C

CC CC CC C AA C C C CC

CC CC

C C CC C CC CC

G C C

GG GG AA AA AA GG

GG GG AA

AA AA

T

GG GG GG

T AA AA AA

T TT

T

T G G G G GG GG G G

G

G G G G

G G G

T

G G G G

T T

T T G CG G CG

T T T

CG G CG CG TT TT TT

G

G

TT T

G

G G T TT TT T

TT TT TT

A

G G G

A G G G T A

A

A A

G G T

T G G G T T

G G G

T T C T

T

G C C G G G T C C

G GG G A G C C GG C C and G G G

GG A GG A

GG A A A

A A

C

C A

A

A A A A A T A A A T

G

A G G G T T T G T G G G

G G G G G A

A G G G T

f

G G A A G C A T

T T T T C C C

G G o C T T G C G T C

G

G C C C G

G T T T G

T G G

G

T

T x

T

G G T T G T i

C C C C

G CC C C

C C C C C C AA

C C C AA

)

m A C C AA AA

C

A AA ) A

A A A A

C G r K G A A C A K

G G G

C C

CC CC C A C

C C a

l

CC

CC C C C CC CC CC G T

A A G T

C T G T o G T

G T

T ) A NN A

) T A A A NN T T

( A T

T T (

C

K T C C m K

C i ) C C CG

C C C C C T C T T

CG C CG

CG CG

CG

C

T K

CG T T T ) A A A T G )

) G NN T CG

NN

G G A (

A A A ( K equ TT

A A K

K A

TT

A K

A TT

C A TT NN

TT C C C ( C e C C

A

C C TT CC C C

t

C

TT

A NN C TT NN A NN CC CC A

a ) C ( A NN CC

C G ( A A ( G A

)

A C G

) C C

c A A C

K

i A G

CC C G K

)

A G K G G G G A )

A A A AA a A AA A G

K

AA A CC

G A

nd

G K

G

)

G NN ) 2 2 2 2 2 s G ) CC i A AA NN AA AA ( A NN A CC " K

( A

G K G ( A

K AA A

AA

NN

on on on on on NN A K

( i i i i i G ) )

( G "

G G

G G GGC TT TT NN GGG . K K

NN GGC

NN

(

) ( eg eg eg eg eg ) ( ) s

TT TT

GGC GGC GGG TT TT GGG GGG

TT TT

T

K

K

R R R R R K

T G

) ) NN NN GGG G de G T T CG GGG G

( ( GGG G G T 1 2 3 4 5 G G

GGG K

ti K T

T

T A GC T T NN NN NN T T CG C

y y y y y GC ( ( G

(

) G C

equence r r r r r

GC GC

eo G G A CG

C C

l

CG NN

NN

CG CG a a a a a CG K CG ( ( ) S c A r r r r r

TT

TT A

A

A A TT TT

A TT b b b b b

TT K TTT

i i i i i

TT TT C TT TT

C C TTT TTT nu C NN

C

)

C TTT C

TTT L L L L L

TT TT

( TT ) ) C C

C C

r TT

de TT TT

TT K C C T TT NN i C C T 1 1 1 1 1 K C K C CC C C C (

C T T

. . . . .

C C t T C C C T

A

C ou

A

T T

A A CC A A NN C TT

TT

C A A NN NN (

ll f TT TT C C

CC

A CC

CC

( ( eo bv2 bv2 bv2 bv2 bv2

CC

CC

CC CC

G CC a G l CC A A A G G G A U U U U U

A

f C

T TT TT A A T A TT

TT

A

) )

T T TT TT TT TT T

o TT TT C C C C C

C C K K

C

C C C C C C C

C

as as as as as x C C AA ) )

)

i

A T T AAA AA K K T T A T T K A e e e e e A AAA T T

AAA m AA G AA NN NN

G C C AAA

C C C C ( (

G G G C m m m m m C r T G G

A A A G gonuc G

A A G A A

NN NN G A a NN A A A G A T T

l

G

( ( G G T (

li sa sa sa sa sa T T

T T

T

G T o T T T T G G T T T T G G G G

T T

G C C C G A G

G O G ) C C )

C C m

C C A G G A

A i T T C C

A C C T T T T

K

K C T C T

C

C

C C TT C T T

)

C C

C

) ) G

G G TT K equ TT G G TT G G K TT K NN

A NN TT GG

G

A A TT ( TT

GG GG ( TT A A

A A

e T GG G A t T T

A G G

NN TT

C C

) C T TT ) G NN TT NN TT (

TT C

)

( (

TT TT TT TT A

K ca

G G A

K A G G A T

T i T A

K

T G

G A A A ) T A

G

G A A A A G G K G A G nd AA AA AA AA NN NN G G G G G G

G

AA A NN ( AA i A A G ( A

G

(

AA AA " A A A

A

C T T C C T C NN T

T

N C C C ( AA AA C AA AA G

"

AA AA AA AA G G G G C C AA T AA C C . CC CC

G G G G CC G

C C C T CC C T

T

T s CC

A

C C

T T T T T

A T T A A T A T

T TT TT

TT TT A A A A A on

TT

TT TT TT A A A A TT TT iti

A CC A CC A CC A CC

CC

A A AAA T T

T T

AAA T AAA T AAA T AAA T

T T AAA AAA AAA AAA AAA pos C TT TT

TT

TT AA AA C C TT AA AA C C

A A

A AA AA A AA C C C C AA C A

A A

A

A

AA A AA

C C

G C C A A A A A A A

A C C C G G zed C A G G A A A i

A A C C

G G G G G T A A A A

T T T T T T T A T A m

A

T T T T T T T T T GG GG GG

A A

T T GG A A GG C C

C C G A A A A A C

C C C G G G G A A A A

C

C

G G G G G A A A C A

A G G ando A C C G G C C r A G G G G C C C C C

A A G G A A G e A A A A A t G G G G G G G G a G c i

nd i s on t

i

1 1 1 1 3 3 3 3 3 1 1 1 2 2 2 2 2 2 2 2 2 2 1 1 1 3 3 3 3 3 g e acke r R b

n

i y r

a r 4 2 3 5 2 3 1 1 4 5 osed l b

i . s L enc

de 4

ti .

1 . v

eo l odons b c bv2 C

U bv21 U a nu U

34

2.3.2 Statistical Analysis.

The binding sequences from each library were aligned. The occurrence of each amino acid at each scanned position was corrected for bias by dividing the counts by the number of codons for that amino acid contained within the NNK degenerate codon (Bond, Wiesmann et al. 2005). Normalized sequences were used to produce an alignment in LOGOS consensus format (http://weblogo.berkeley.edu/logo.cgi) (Schneider and Stephens 1990). Bits scores at position i were calculated according to the formula: Ri=log2 20 – (Hi – en), where Hi, Shannon Entropy at position i, =–Σƒa,i x log2ƒa,I, and fa,i, is the relative frequency of amino acid, a, at position i.

Small sample correction, en, was assigned as 0, so that bits scores ranged from 0 to 4.32 (log220). The statistical analysis of the Ub-binding site of USPs was performed similarly, using an alignment of the catalytic domains of 48 human USPs (Fig. 2.6).

2.3.3 Enzyme activity assays

Enzymatic assays were performed in buffer (50 mM HEPES, pH 7.5, 0.01% Tween 20, 10 mM DTT). Serial dilutions were performed in 96-well plates and then transferred to 384-well plates for measurements. USP2 inhibition assays were performed by measuring the release of fluorogenic amido-4-methylcoumarin (AMC) from the substrate, Ub-AMC (Boston Biochem, Boston, MA), as described (Ernst, Avvakumov et al. 2013), with 1 mM Ub-AMC substrate, 7.5 nM USP2 and serial dilutions of Ubvs. USP2 and Ubv were mixed in assay buffer and incubated at room temperature for 2 min prior to the addition of Ub-AMC. Proteolytic activity was measured by monitoring the increase of fluorescence emission at 460 nm (excitation at 360 nm) for 30 min using a Synergy2 plate reader (BioTek Instruments, Winooski, VT). IC50 was defined as the concentration of Ubv that inhibits 50% of USP2 activity, and was fitted using the sigmoidal 4PL equation in GraphPad Prism software.

For determination of kinetic parameters, a fixed concentration of USP2 was incubated with serial dilutions of Ub-AMC. Duplicates of initial reaction velocity (nM/s) were determined at each substrate concentration by determining the linear slope from plotting fluorescence signal versus time and converting to molarity by interpolating from a standard curve of known AMC concentrations. Velocity versus substrate concentration was plotted to determine the KM and Vmax using Graphpad Prism with the Michaelis-Menten equation V0 = Vmax*[S]/(KM + [S]). kcat was

35

obtained from the equation kcat = Vmax/[E]o, where [E]o is the total enzyme concentration. For velocity versus substrate plots that could not be fitted using the Michaelis Menten equation but could be fitted using a linear line, I determined an estimate of the kcat/KM value by dividing the -1 -1 linear slope (nMs M ) by [E]o.

2.3.4 Computational saturation mutagenesis

Computational scanning of Ubv.2.1 or Ubv.21.4 in complex with USP2 or USP21 (PDB file 3V6E or 3MTN), respectively, was performed as described (21). Each substitution was introduced into the Ubv, and the intermolecular energy DEinter was calculated by subtracting the energies of the unbound chains from that of the complex. DDGbind was calculated as the difference between DEinter of the Ubv and Ub.wt complexes. Backrub algorithm was used to model backbone flexibility including movements of -15, -10, -5, +5, +10, +15 degrees relative to the original conformation. The lowest DDGbind from all backrub moves was assigned to each substitution, and a substitution was considered allowed if it exhibited a negative DDGbind value. The position was categorized as high, medium or low conservation if <4, 4-7, or >7 substitutions were allowed, respectively.

2.4 Results

2.4.1 Saturation Scanning of Ubv.2.1 and Ubv.21.4

In saturation scanning, precisely designed combinatorial libraries of phage-displayed protein variants are used to assess the functional importance of individual residues for binding to a partner protein (Clackson and Wells 1995). The basic principle of the method is that the frequency of a particular amino acid at a particular position within a protein interaction site reflects how favourable it is for the function of the site. By allowing all 20 genetically encoded amino acids to compete against each other, the relative frequencies can be used to simultaneously compare the effects of all possible substitutions across the interface.

I defined the USP-binding site on Ub as 25 residues across the β-sheet face, and for each Ubv, I divided these residues into five groups of five for randomization in phage-displayed libraries (Fig. 2.1). To minimize cooperative effects between mutated positions, each library was designed to randomize positions that are not in direct contact with each other (Pál, Kouadio et

36

al. 2006). Ubv.2.1 or Ubv.21.4 libraries, displayed on M13 bacteriophage as fusions to the N- terminus of the gene-3 minor coat protein, were subjected to selections for binding to the catalytic domain of USP2 or USP21, respectively. Individual clones were tested for cognate USP binding by phage enzyme-linked immunosorbent assays (ELISAs) and 50-100 clones with specific binding signals were sequenced from each library. Binding sequences were used to create a position weight matrix (PWM) consisting of frequencies of each amino acid type at each position (Fig. 2.2). Each matrix column depicts the amino acid preference at each position as a probability distribution. From this PWM, the distribution at each position was visualized as a sequence logo (Fig. 2.3). The importance of each Ubv position to USP binding was quantified as a bits score plotted along the y-axis of the sequence logo (Fig. 2.3). The bits score measures the amount of information that is required to describe the amino acid frequency pattern at a particular position and is related to Shannon Entropy, which measures the degree of randomness within a population (Schneider and Stephens 1990) and is routinely used to identify and predict DNA (D'Haeseleer 2006, Badis, Berger et al. 2009), RNA (Ray, Kazan et al. 2013) and protein binding motifs (Tonikian, Xin et al. 2009, Gfeller, Butty et al. 2011, Andreatta and Nielsen 2012). For a frequency distribution of 20 amino acids, the bits score varies between zero and 4.32 for positions that are completely random or completely conserved, respectively.

A B Region 1 Region 2 Region 3 74 73 2 4 6 8 9 10 11 12 14 40 42 44 46 47 48 49 62 63 64 66 68 70 71 72 73 74 40 72 Ub.wt Q F K L T G K T T Q R I A G K Q Q K E T H V L R L R 71 Ubv.2.1 - - N - - - T H ------Ubv.21.4 ------W - F L - - - - 9 8 7042 49 10 44 48 11 6 68 Library 1 X X X X X 12 47 Library 2 X X X X X 14 4 66 46 Library 3 X X X X X 2 Library 4 X X X X X 64 63 Library 5 X X X X X

Figure 2.1 Ubv saturation scanning library designs

(A) The USP-binding site mapped onto the structure of Ub (PDB entry 1UBQ). The site contains 25 residues distributed across three regions: region 1 (residues 2-14), region 2 (residues 40-49) and region 3 (residues 63-74). Five libraries were designed to scan five residues each, in a manner that maximized the distances between residues in any one library. Scanned residues are shown as spheres and are color-coded according to the library they share. (B) Ubv sequences subjected to saturation scanning. Dashes (-) indicate residues that are identical to Ub.wt and positions that differ are indicated by the amino acid substitution. Five libraries were designed for each Ubv, and the randomized positions in each library are denoted by (X).

37

A Bits 4.3 4.3 2.8 2.6 1.4 4.3 1.9 4.2 2.2 1.4 3.7 3.8 2.6 4.3 1.9 3.3 1.0 2.1 4.3 3.5 3.0 1.6 2.9 3.2 1.9 Residue Q2 F4 N6 L8 T9 G10 T11 H12 T14 Q40 R42 I44 A46 G47 K48 Q49 K63 E64 T66 H68 V70 L71 R72 L73 R74 P 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 2 0 G 0 0 0 0 1 100 1 0 0 23 84 0 17 100 0 0 6 0 0 0 0 10 4 0 3 A 0 0 1 0 1 0 0 0 0 10 0 0 58 0 0 0 1 0 0 0 0 12 0 0 0 C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 V 0 0 0 18 6 0 6 0 0 0 0 10 3 0 1 1 2 0 0 0 4 0 0 0 1 L 0 0 0 54 0 0 0 0 29 0 0 0 0 0 6 0 0 2 0 0 1 23 0 11 1 I 0 0 0 21 0 0 2 0 0 0 0 90 0 0 9 1 0 0 0 0 0 1 0 0 1 M 0 0 49 0 1 0 0 0 8 0 0 0 0 0 0 0 5 2 0 0 12 7 0 0 3 F 0 100 1 7 3 0 2 0 0 0 0 0 0 0 0 0 2 46 0 78 0 22 0 10 38 Y 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 16 12 0 1 0 21 0 0 6 W 0 0 0 0 0 0 0 0 31 0 0 0 7 0 6 0 0 7 0 0 0 1 0 77 37 S 0 0 2 0 3 0 19 1 0 5 0 0 15 0 2 1 3 0 0 0 6 0 20 1 2 T 0 0 1 0 1 0 5 0 1 0 0 0 0 0 1 1 2 0 100 0 2 0 0 0 0 N 0 0 44 0 24 0 5 0 0 0 0 0 0 0 0 79 11 0 0 0 74 0 9 0 0 Q 0 0 0 0 1 0 0 0 0 5 0 0 0 0 3 16 7 4 0 0 0 0 0 0 3 D 100 0 0 0 37 0 0 0 0 5 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 E 0 0 0 0 3 0 2 0 0 26 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 H 0 0 1 0 4 0 4 99 3 10 0 0 0 0 3 0 0 25 0 22 0 1 0 0 3 K 0 0 0 0 6 0 2 0 0 10 0 0 0 0 35 0 14 0 0 0 0 0 0 0 0 R 0 0 0 0 6 0 51 0 27 3 16 0 0 0 35 0 26 0 0 0 0 0 67 0 3 B Bits 1.8 4.3 2.9 2.8 1.1 3.0 0.6 2.7 1.7 1.1 4.1 3.2 1.7 2.0 2.3 2.7 1.1 1.0 3.4 3.6 1.5 1.3 2.4 2.8 1.2 Residue Q2 F4 K6 L8 T9 G10 K11 T12 T14 Q40 R42 I44 A46 G47 K48 Q49 K63 W64 T66 F68 L70 L71 R72 L73 R74 P 3 0 0 4 5 4 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 G 46 0 0 74 0 71 25 0 11 8 3 1 23 32 1 2 1 0 0 0 8 0 0 0 5 A 11 0 0 0 2 13 4 0 1 0 0 0 11 34 0 0 0 1 0 0 21 6 0 1 1 C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 V 0 0 1 0 8 0 6 0 0 1 0 53 0 0 0 0 4 1 0 1 0 16 0 4 3 L 0 0 0 0 4 0 2 0 26 9 0 0 0 1 1 0 2 4 1 0 28 8 0 6 1 I 0 0 0 0 18 0 1 0 0 0 0 46 0 11 3 0 0 9 0 2 0 18 0 7 5 M 0 0 0 0 0 0 6 0 28 0 0 0 0 0 2 0 2 7 0 3 12 3 0 2 3 F 0 100 5 6 0 0 6 3 0 25 0 0 0 7 41 0 4 20 0 90 0 22 0 7 10 Y 0 0 2 0 3 0 3 18 3 8 0 0 11 0 5 0 15 23 0 2 0 16 0 0 37 W 3 0 0 3 0 0 0 0 8 17 0 0 29 11 42 0 15 9 0 0 0 5 0 73 10 S 8 0 1 6 6 12 8 25 3 3 0 0 17 2 0 1 1 2 0 0 1 2 10 0 5 T 1 0 0 1 8 0 5 53 16 1 0 0 2 2 1 2 0 2 80 0 2 1 0 0 0 N 0 0 60 0 27 0 4 0 0 14 0 0 0 0 1 22 0 8 4 1 12 0 6 0 8 Q 8 0 0 0 3 0 3 0 0 0 0 0 4 0 0 62 11 2 0 1 0 0 3 0 5 D 13 0 0 3 3 0 3 0 0 0 0 0 4 0 0 0 13 0 0 0 0 0 0 0 0 E 5 0 0 0 12 0 1 0 0 3 0 0 0 0 0 2 4 0 0 0 0 0 0 0 0 H 3 0 0 3 3 0 3 0 0 6 0 0 0 0 0 11 0 7 15 0 0 3 3 0 3 K 0 0 32 0 0 0 3 0 0 3 0 0 0 0 1 0 19 3 0 0 8 0 45 0 3 R 0 0 0 0 0 0 13 0 3 1 97 0 0 0 3 0 9 1 0 0 7 0 34 0 1

Figure 2.2 The saturation scan database

Data shown for (A) Ubv.2.1 and (B) Ubv.21.4. Following selction for binding to cognate USP, the percent occurrence of each amino acid was calculated after normalization for codon bias at each scanned position. The wt occurrences are boxed and values are colored as follows: yellow ≥10%, blue ≤2%. Bits score were calculated as described in “Materials and Methods.

38

Figure 2.3 Saturation scans of Ubvs (A and B) The results of saturation scanning mapped on the structure of (A) Ubv.2.1 (PDB entry 3V6C) or (B) Ubv.21.4 (PDB entry 3MTN). The Ubv is shown as a surface colored according to bit values derived from amino acid frequencies (see Fig. S2), as follows: green ≤ 2, 2 < yellow <2.7, red ≥ 2.7, grey = unscanned. The white dashed lines demarcate the core functional epitope. (C and D) The results of saturation scanning presented as sequence logos for (C) Ubv.2.1 or (D) Ubv.21.4. The sequence of each Ubv is shown at the bottom and boxes signify sequences that differ from Ub.wt. Residues marked by blue or red boxes indicate sequence changes in Ubv.2.1 predicted to enhance inhibition of USP2 for which enzyme inhibition assays did or did not validate predictions, respectively (see Fig. 2.4).

To assess the accuracy of the saturation scanning data, I focused on inhibitors of USP2 and measured the effects of Ub.wt and a panel of Ubvs on the hydrolysis of the model substrate Ub-

39

AMC (Fig. 2.4). Ub.wt exhibited an IC50 value of ~60 μM and Ubv.2.1 (K6N/K11T/T12H) was

~1000-fold more potent (IC50 = 56 nM). Amongst the three mutated positions in Ubv.2.1, we noted that position 11 exhibited a low bits score and that Thr was relatively uncommon (Fig 1D). Thus, we predicted that Thr11 does not contribute significantly to the enhanced affinity of Ubv.2.1 for USP2 and confirmed this by showing that a variant containing two substitutions

(K6N/T12H) was virtually equipotent (IC50 = 58 nM) to Ubv.2.1. I assessed the effects of individual substitutions at positions 6 and 12 and found that, relative to Ub.wt, the substitution

6 12 Asn or His enhanced potency by ~300-fold (IC50 = 200 nM) or ~2-fold (IC50 = 24 μM), respectively. Taken together, these results show that the two mutations K6N/T12H are necessary and sufficient to confer high potency inhibition of USP2.

IC50,K6N/T12H. / IC50 (nM) IC50 wt 60000 ± 20000 0.001 K6N 200 ± 10 0.3 100 T12H 24000 ± 1000 0.002 K6N/T12H 58 ± 7 1.0 K6N/K11T/T12H 56 ± 2 1.0 K6N/T12H/Q2D 22 ± 4 2.5 K6N/T12H/N6M 70 ± 20 0.8 50 K6N/T12H/K11R 12 ± 5 4.7 % USP2 Activity % USP2 K6N/T12H/T14L 21 ± 5 2.7 K6N/T12H/E64F 32 ± 3 1.8 K6N/T12H/H68F 44 ± 7 1.3 K6N/T12H/V70N 52 ± 6 1.1 0 0.001 0.1 10 1000 K6N/T12H/R42G 1400 ± 200 0.04 K6N/T12H/L73W 2300 ± 800 0.02 [Ubv] µM

Figure 2.4 Dose response curves for inhibition of USP2 by Ubvs The IC50 value was determined as the concentration of Ubv that reduced USP2 activity by 50% (n = 2).

Next, in the background of the double-mutant K6N/T12H, I tested the effects of substitutions that were abundant in the selected sequences (Fig. 2.4). Substitutions at seven positions (Q2D, N6M, KllR, T14L, E64F, H68F, V70N) either enhanced or did not affect inhibitor potency, confirming the predictions of the saturation scan analysis. However, substitutions at two positions (R42G, L73W) reduced inhibitor potency, and we speculate that these substitutions may have been selected due to favorable effects on protein stability or levels of protein display on phage, factors that were not corrected for in the analysis. Taken together, these results generally confirm the accuracy of predictions derived from the saturation scan data, but the 40

method was inaccurate for two of nine positions at which substitutions to the Ubv.2.1 sequence were predicted to enhance binding to USP2.

To visualize the saturation scanning data in a structural context, I assigned the positions on the basis of bits scores into high, moderate or low conservation groups and mapped the data onto the structure of Ubv.2.1 (Fig. 2.3A). The high conservation positions form a large, contiguous functional epitope including the C-terminal tail region and a band of residues across the center of the interface. Strikingly, the functional epitope of Ubv.21.4 (Fig. 2.3B) is very similar to that of Ubv.2.1, as both exhibit high sequence conservation at nine positions across the central band (4, 6, 10, 12, 42, 44, 49, 66, 68), which I henceforth refer to as the core functional epitope.

To further validate and understand the saturation scanning data, we used an in silico saturation mutagenesis protocol (Sharabi, Erijman et al. 2013, Aizner, Sharabi et al. 2014) to predict tolerance of each Ub position to mutation. In this protocol, we substituted all Ub residues randomized in this study with the nineteen other amino acids and calculated changes in free energy of Ub binding (∆∆Gbind) to USP2 or USP21 (Fig. 2.5 C and D). The positions were then classified according to the number of substitutions that produced negative ∆∆Gbind values. Our computational results largely confirm the conservation patterns observed by saturation scanning, producing a similar Ub functional epitope for binding to USP2 or USP21 (Fig. 2.5A and B). A few Ub positions exhibited slightly higher variability in the computational analysis relative to the empirical analysis, and this may be due to the inability to model certain experimental phage selection pressures such as higher stability and expression.

41

A B 74 74 73 73 40 40 72 72 71 71 42 42 8 70 8 70 49 49 9 9 44 44 47 47 48 10 6 68 48 11 10 6 68 11 46 46 12 12 66 66 14 4 14 4 2 2 64 64 63 63

C Residue Q2 F4 N6 L8 T9 G10 T11 H12 T14 Q40 R42 I44 A46 G47 K48 Q49 K63 E64 T66 H68 V70 L71 R72 L73 P - >6 >6 0.7 0.8 >6 0.8 3.3 - 0.3 - >6 - - - >6 1.0 - 2.3 >6 - 0.3 - >6 G >6 0.8 >6 >6 1.6 0.0 1.4 >6 1.7 >6 >6 >6 0.9 0.0 2.9 >6 1.6 >6 1.9 2.3 1.5 >6 >6 5.2 A >6 0.8 0.6 >6 1.1 1.1 1.0 2.1 1.2 0.4 >6 1.0 0.0 >6 2.9 >6 1.0 >6 1.5 1.8 0.8 >6 3.4 3.7 C 2.9 0.5 0.3 0.6 0.5 3.6 0.6 1.8 0.8 0.2 >6 0.7 -0.2 >6 2.9 1.9 0.5 >6 1.4 1.2 0.3 0.6 3.2 2.3 V 2.8 0.5 0.3 0.4 0.3 >6 0.5 2.1 1.2 0.1 >6 0.5 -0.1 - 2.5 1.3 0.3 >6 1.6 1.0 0.0 0.6 3.0 1.8 L 3.2 0.5 0.3 0.0 -0.6 >6 -0.1 3.1 0.5 -0.3 >6 0.4 -0.7 >6 1.7 1.6 -0.1 3.4 >6 0.2 1.2 0.0 3.1 0.0 I 2.6 0.9 >6 0.0 0.0 >6 0.0 1.5 3.5 -0.2 >6 0.0 -0.1 - 2.2 1.4 -0.1 - 1.4 1.1 0.2 0.2 3.0 2.2 M 3.4 0.5 0.4 -0.2 -0.8 >6 -0.7 >6 0.2 -0.5 >6 -0.1 -0.8 >6 2.0 1.4 -0.1 >6 >6 -0.2 -0.1 -0.3 3.2 0.7 F 3.2 0.0 >6 0.0 -1.1 >6 -0.4 2.0 1.9 -0.6 >6 0.7 -0.5 >6 1.0 1.6 -0.6 3.1 >6 -0.2 >6 -0.5 3.7 >6 Y 3.3 >6 >6 1.2 -0.8 >6 -0.2 >6 >6 -0.2 >6 >6 0.6 - 1.2 1.7 -0.1 2.4 >6 2.3 >6 -0.6 2.5 >6 W 3.0 >6 >6 1.4 -1.2 >6 -0.9 >6 >6 -1.3 >6 -0.5 -1.2 >6 0.9 >6 -1.4 3.0 >6 0.1 >6 >6 3.8 >6 S 2.9 0.7 -0.1 >6 0.9 0.4 0.5 0.7 1.0 0.6 >6 1.1 0.1 >6 1.6 >6 1.1 >6 0.6 1.8 0.5 0.4 3.1 3.1 T 2.7 0.4 0.2 0.1 0.0 >6 0.0 1.1 0.0 0.3 >6 0.5 0.0 >6 1.4 1.9 0.6 >6 0.0 1.2 0.3 0.4 2.8 1.7 N 2.7 0.0 0.0 -0.1 -0.3 >6 0.2 2.2 1.0 0.0 >6 -0.3 0.4 >6 1.7 1.8 1.4 2.0 1.1 0.8 -0.5 -0.1 2.8 3.3 Q 0.0 0.4 0.5 -0.5 0.2 0.4 0.9 2.6 0.9 0.0 >6 0.1 0.2 >6 1.4 0.0 0.6 1.0 >6 -0.2 -0.1 0.1 2.8 2.2 D 2.3 -0.9 0.2 1.4 1.5 >6 1.4 2.3 1.2 0.1 >6 0.0 0.5 >6 3.0 2.2 1.1 2.0 0.0 1.8 0.3 0.3 3.7 2.5 E 1.5 0.1 -0.1 0.3 1.0 >6 0.8 2.6 0.9 -0.6 >6 0.8 0.1 >6 2.9 1.9 0.6 0.0 1.4 -0.5 0.0 0.4 3.5 2.6 H 2.5 0.3 -0.1 0.0 -0.3 >6 -0.8 0.0 0.6 0.1 >6 0.3 0.0 >6 0.9 1.5 0.2 3.7 >6 0.0 >6 0.1 2.9 1.3 K 3.1 1.3 0.9 -1.3 -1.8 0.3 -2.1 1.8 4.2 -1.7 >6 -0.7 -0.3 >6 0.0 0.2 0.0 1.7 >6 -0.6 1.0 -0.6 2.1 1.9 R 2.0 2.4 1.3 -0.2 -1.8 >6 -1.8 >6 0.9 0.1 0.0 0.2 -0.1 >6 -1.0 0.6 -0.8 3.3 >6 -0.6 >6 -0.2 0.0 >6 D Residue Q2 F4 K6 L8 T9 G10 K11 T12 T14 Q40 R42 I44 A46 G47 K48 Q49 K63W64T66 F68 L70 L71 R72 L73 P - >6 >6 0.2 -0.3 - >6 1.1 >6 >6 >6 - -1.0 - >6 >6 0.3 >6 1.2 - >6 - 3.2 - G >6 >6 2.7 >6 0.4 0.0 >6 1.2 1.5 >6 >6 >6 -1.6 0.0 2.1 >6 >6 1.8 1.7 >6 1.5 >6 3.3 5.2 A 0.9 >6 2.2 0.4 -0.2 -0.6 >6 0.6 1.1 >6 >6 0.8 0.0 -0.3 1.9 >6 0.4 1.5 1.2 2.3 1.2 0.7 3.4 3.8 C 0.9 1.2 1.9 -0.3 -0.3 - 0.6 0.3 0.8 0.0 >6 0.5 -0.3 >6 1.8 1.1 0.3 0.6 0.7 1.8 0.6 0.3 3.3 2.4 V 0.9 1.7 1.7 -0.2 -0.8 - 0.5 0.3 0.8 0.0 >6 0.4 -2.3 - 1.7 0.5 0.3 0.8 -0.3 1.6 0.5 0.5 2.8 >6 L 0.5 0.9 1.1 0.0 -1.5 - 0.1 0.7 -0.5 -0.5 >6 0.1 -1.6 - 1.0 0.5 0.0 -0.4 0.4 0.9 0.0 0.0 2.6 0.0 I >6 0.8 1.3 -0.7 -1.5 - 0.0 0.6 0.5 -0.3 >6 0.0 -1.6 - 1.2 0.3 0.2 0.2 -0.3 1.0 0.2 >6 2.8 3.2 M 0.3 0.2 0.8 -0.2 -1.9 - -0.1 0.6 -1.0 -0.9 >6 0.2 -0.7 >6 0.6 0.4 -0.1 -1.3 1.1 0.4 -0.1 0.1 2.1 0.9 F 0.5 0.0 1.0 -0.2 -1.7 - -0.4 -0.9 >6 -0.8 >6 1.2 -0.3 >6 0.2 0.5 -0.2 -0.1 0.2 0.0 0.8 1.3 2.7 >6 Y -0.2 -0.1 >6 -0.3 -2.3 - -0.1 -0.4 >6 -0.5 >6 >6 0.0 >6 0.6 0.6 -0.9 0.0 1.4 0.6 1.0 1.3 2.7 >6 W 0.5 0.4 0.5 -1.3 -2.8 - >6 -0.2 >6 -1.0 >6 >6 -0.8 >6 0.3 0.1 -0.7 0.0 >6 -0.1 0.4 >6 2.7 >6 S 0.8 >6 2.1 -0.2 -0.1 - 0.8 -0.4 1.1 0.6 2.9 -0.2 0.0 0.2 1.9 >6 0.5 1.0 -0.5 2.2 0.6 0.4 3.2 2.9 T 0.7 0.9 1.8 -0.7 0.0 - 0.0 0.0 0.0 0.3 >6 0.1 -0.7 >6 1.8 1.1 0.3 0.3 0.0 1.8 0.5 0.5 2.6 3.3 N 0.6 0.5 2.0 -1.1 0.0 - 0.0 0.7 0.2 0.6 3.5 0.4 - >6 1.6 1.4 0.5 0.0 -0.7 1.3 -0.5 0.2 1.9 3.2 Q 0.0 0.2 1.0 -1.2 -0.7 - -0.3 0.2 -0.1 0.0 >6 0.5 -2.0 - 1.7 0.0 0.4 -0.4 0.1 1.1 -0.8 0.1 2.1 2.2 D 0.5 0.9 2.0 -1.1 -0.1 - 0.8 0.1 1.3 0.7 >6 0.2 -0.6 >6 2.3 1.4 0.5 0.1 -0.5 2.0 -0.2 0.4 3.0 2.5 E -0.4 0.5 2.0 -1.7 -1.0 - 0.5 0.0 -0.4 0.2 3.0 0.4 -1.8 >6 2.2 1.2 0.3 -0.3 -0.5 1.0 -1.7 0.4 2.9 2.1 H 0.5 0.1 1.4 -0.5 -1.8 - -0.6 -0.2 0.9 -0.1 >6 0.6 - >6 0.7 0.9 0.2 -1.5 0.8 1.3 0.6 0.1 1.9 3.0 K 0.4 0.2 0.0 0.1 -2.1 - 0.0 -0.1 -0.9 -0.5 3.1 -0.7 -1.1 - 0.0 0.9 0.0 -1.2 0.8 0.3 -0.4 0.6 0.3 1.9 R 0.1 0.8 -0.1 1.4 -2.2 - -0.2 1.3 -0.9 -0.9 0.0 -0.8 -0.5 - 0.0 -0.6 -1.6 -1.6 >6 -0.1 -0.6 0.3 0.0 >6

Figure 2.5 Computational saturation scanning of Ubvs Computational saturation scanning data mapped onto the structures of (A) Ubv.2.1 and (B) Ubv.21.4. Residues shaded in red indicate positions that tolerate less than or equal to 3 substitutions, yellow indicates positions that tolerate between 4- 7 substitutions and green indicate positions that tolerate 8 or more substitutions. (C and D) Database of ∆∆Gbind values calculated from in silico substitution of the 19 non-wt amino acids at each indicated position for 42

(C) Ubv.2.1 binding to USP2 and (D) Ubv21.4 binding to USP21. The wt ∆∆Gbind values are boxed, and negative ∆∆Gbind values are highlighted in red. Dash indicates calculations that were not completed most probably due to largely unfavorable energy upon introduction of the particular mutation.

2.4.2 The Ub-binding site of USPs

I next examined the interactions of Ubv.2.1 and Ubv.21.4 with their cognate USPs. I examined the structure of the USP2:Ubv.2.1 complex and defined the Ub-binding site on USP2 as 52 residues that are within 4.5 Å of Ubv.2.1 (Fig. 2.6). In a manner analogous to the analysis of the saturation scan data for the Ubvs, an alignment of these 52 positions across 48 human USPs (Fig. 2.6), and the data were used to calculate bits scores to quantify sequence conservation within the human USP family. Based on the same cut-offs used for the saturation scan analysis of the Ubvs, the Ub-binding site contains 18 residues that exhibit high conservation and 12 that exhibit moderate conservation, and are shown as a sequence logo (Fig. 2.7B). The sequence conservation data were visualized by mapping the bits scores onto the USP2 structure (Fig. 2.7A).

43

Position Numbering According to USP2 sequence 349 355 356 357 358 359 360 363 364 367 371 403 406 419 421 423 432 434 436 437 438 439 440 442 444 461 462 466 468 471 472 473 474 475 477 487 489 501 503 504 505 507 509 514 549 552 553 554 555 556 558 592 USP2 R Q Q D A Q E R F D N Y R L S L S V D P F W D S P L F D L D E K P T C K F H K R F E R K H T T M G G Y Y USP21 S Q Q D A Q E K L E L Y R L S L S T E V F C D S P L F E L E N A P V D K L H N R F A R K H S V H Y G Y Y USP1 M Q H D A Q E Q C G E E I L L T T R E D F Q D S P Q F E I E D K Y F E R L H K C F A G K H T I S S G Y Y USP3 N Q Q D A H E R Y D L C G L N V S K D P F L D S D S F E L T E L Y M H K F H K R F W A K H G V G S G Y Y USP4 Q Q Q D S Q E A F D E H R F S L S T D P F C Y T P L F E L H D P W Y P K F H K R F Y R K H A M G V G Y Y USP5 E Q Q D A Q E L H N R . . V E I K T R V D Y I Q P A Y E V . . D F W T K T Q K K F F D K H S T M C G Y Y USP6 K Q Q D S Q E A F D E H R L S V S R D P F N F S P A F E L S E M Y Y S K L H K R F F N K H I L S G G Y Y USP7 S Q H D V Q E R V D N . C M S I S R E D Y Y D Q S D Y E L D N K Y D G K V Q M R F Y T K H D N H G G Y Y USP8 Q Q Q D S Q E L F D E H L F S V S T E A F M Y S P L F E L N N R F Y S K I H K R F Y G K H G L D G G Y Y USP9X P Q H D A L E N S D E . G F D K Y C E S F T T N D Q Y D L A N A Y H E K L Q K R F Y R K H Q A S G G Y Y USP9Y P Q H D A L E N S D E . G F D K F C E S F T T N D Q Y D L A N A Y H E K L Q K R F Y R K H Q A S G G Y Y USP10 S Q E D A E E G F N E Q F I S V A T Q P F F T Q D S L E V . . . G Y T R V H K R F Y T K H S A T G G Y Y USP11 Q Q H D S Q E S F D E H R F S L S T D P F C Y S P L F E L E N P W Y P K L H K R F Y K K H G M R D G Y Y USP12 L Q Q D A H E N Y N D N P L N T S K E D F L D S D G F E L E Y K Y Y E K M H K R F Y L K H G P N R G Y Y USP13 E Q Q D A Q E L H N R . . V E I R T R V D Y L Q P A F E V . . D F W S K S Q K K F F D K H S T M S G Y Y USP14 Q Q Q D A N E I Q R Q T K F T M E T G K E N Q Q S L R E T . . . . Q P K S Q V R F Y E K H S S S S G Y Y USP15 Q Q Q D C Q E A F D E H R F S L S T D P F C Y T P L F E L E D P W Y P K L H K R F Y R K H G M G G G Y Y USP16 R Q Q D S Q E R Y D A K S L S I S V E S F L D S P Q F E L A N K L L E K M H K R F Q G K H T M R S G Y Y USP17 G Q E D A H E M F D K H K W S I S T D P Y L D A D Q L E L E N A Y H G K L V K R F D . K H S C H N G Y Y USP19 Q Q H D A Q E A F D E H R Y S L S T D P F L Y P P L F E L E E A W Y P K L Q K R F F F K H G M I G G Y Y USP20 M Q Q D T Q E R C D E K R I S V S T E T F Q D S P A F D L D N M Y S E K C H K R F H V K H T A G S G Y Y USP22 H Q Q D A H E I A D R N H L S V S T D P F W D S D R F E L S A K I K S K L H K R F H A K H T L E S G Y Y USP24 E Q Q D A Y E T S D E . G Y D K Y R E A F M A N G Q F E L S N A Y Y E K T H M R F F S K H Q A H A G Y Y USP25 S Q Q D V S E H K D D D K F A G N . E M F G Q P Q A A G I . . . . . L Q H E S R F F L K H Q A N A G Y Y USP26 I Q N D A H E A H D D D G L H I I K E L N N Y S N L F E L . . . Y K A G H H K R Y L F K H T L K S G Y Y USP27 H Q Q D A H E I A D R N H L S V S T D P C W D S D R F E L S A K I K G K L H K R F H A K H T L E S G Y Y USP28 S Q Q D V S E H K D D P K F E R E . . T F G Q P Q G A G V . . . . L P Q R E S R F F L K H Q A N A G Y Y USP29 I Q N D A H E G Q D E A V L L L V K E P N N Y S N L F E L . . . Y N Q A H H K R Y F A K H S P N S G Y Y USP30 Q E Q D A H E H V S D H S L S M S R D T F D S S S H F E V . . D V V D K L H Q R L W H K H D M H S G F Y USP31 Q Q H D A Q E L W D E S V Y S L S T D P F L C S P L Y E L D D A W R P I L H K R F Q G K H T M Q G G Y Y USP32 R Q Q D S Q E A F D E H R L S V S R D P F N F S P A F E L N E M Y Y S K L H K R F F N K H I L G G G Y Y USP33 T Q Q D A Q E R C D E Q K I S V S T E T F Q D S P A F D L D N M Y S E K C H K R F H L K H T A S S G Y Y USP34 P Q K D M T E T D T E . E I N V S T E E F Y T R Q E V D L D N M Y T S K A N M R Y F T K H T A D G G Y Y USP35 W Q Q D C S E K Y D E P P I T I S R E A F T D S A Y F E L E N R Y Y E K V T L R F F T K H S S E S G Y Y USP36 H Q E D A H E R Y D K R Q L S V S T D P Y L D A E L F D L E N A Y M A K F S K R F N . K H S C H A G Y Y USP37 R Q N D A H E S Q D E A A V H I I K E Q F N D S D L F E L . . . Y S E V H H K R Y F L K H T S S S G Y Y USP38 W Q Q D C S E R F D E T G L T I S K E A F T D S A Y F E L D N Q Y Y E K M T L R F Y Y K H S S E S G Y Y USP40 E Q H D V Q E R I S T . S I N I S R E D F L D T A N Y E F D N L Y H G K A S L R F F K K H G C Y G G Y Y USP41 P Q H D A A Q L K N D . H M D L S R S S M L T R S C F R L K S K C F E Q L H M R F I N K H M A D S G Y Y USP42 H Q E D A H E Q Y D K R Q L S V S T D P Y L D T E Q F E L E N S Y K S K F S K R F N . K H N C H A G Y Y USP43 Q Q H D A L E L W D E Q L Y S L S T D P F L C S P F Y E L D D A W K P V L H K R F Q G K H N L Q G G Y Y USP44 A Q Q D A Q E C E D R L Q L S V S T E P F W D S E K F E L G K I Y V D K L H K R F W R K H G F G S G Y Y USP45 R Q Q D S Q E H Y D T K V L S V S V D P F I D S P Q F E L N N K L L E K L H K R F Q G K H S M R E G Y Y USP46 L Q Q D A H E N Y N D N P L N T S K E D F L D S D D F E L E Q K Y Y E K M H K R F Y L K H G P N R G Y Y USP47 E Q H D V Q E R V D Q . E L D V G R D T Y L D P V A F E L P N Q Y F E K L Q K R F F T K H S A A G G Y Y USP48 G Q Q D A Q E K L S D K D Y Y T S L S K F Y E E N E F E L D N R Y F E R I Q M R F F T K H S A Y S G Y Y USP49 A Q Q D A Q E C E H Q L Q L S V S T E P F W D S E K F E L G R I Y A D K L H K R F W R K H G F G S G Y Y USP51 H Q Q D A H E I A D R N C L S V S T D P C W D S D W F E L S A K I K N K L H K R F H G K H T L E S G Y Y Bits 0.7 4.2 2.7 4.3 2.7 2.2 4.2 0.9 0.8 2.8 1.8 1.3 0.8 2.1 2.0 1.9 2.5 2.4 2.7 1.6 2.8 1.1 1.9 2.2 1.7 1.0 3.0 3.3 3.3 1.9 2.3 1.6 2.2 0.9 1.4 3.0 1.5 2.6 2.9 3.9 3.8 1.5 1.0 4.3 4.3 1.7 1.2 0.8 2.0 4.3 4.2 4.3

Figure 2.6 Sequence alignment of the Ub-binding sites of the human USP family. The Ub-binding site was defined as the 52 residues in USP2 that are within 4.5 A ̊ of Ubv.2.1 in the structure of the USP2:Ubv.2.1 complex (PDB entry 3V6C), and the alignment for these residues is shown for 48 human USPs. Residues are shaded based on bits score: green ≤2 (non- conserved), 2< yellow <2.7 (conserved), red ≥ 2.7 (highly conserved).

44

A

42 42 49 180 o 44 49 68 10 44 6 10 68 6 12 66 12 4 4 66 !"

B 4 3

Bits 2 1 0 355 356 357 358 359 360 367 419 421 432 434 436 438 442 462 466 468 472 474 487 501 503 504 505 514 549 555 556 558 592 USP2 Q Q D A Q E D L S S V D F S F D L E P K H K R F K H G G Y Y USP21 Q Q D A Q E E L S S T E F S F D L N P K H N R F K H Y G Y Y * * ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ * * * * * * * ^

Figure 2.7 Sequence conservation across the Ub-binding sites of human USPs. (A) The structure of USP2 in complex with Ubv.2.1 (PDB entry 3V6C). USP2 is shown as a grey surface with residues in the Ub-binding site (residues within 4.5 Å of any residue on Ubv.2.1) colored according to bits scores derived from the alignment of 48 human USPs, as follows: green ≤ 2, 2 < yellow < 2.7, red ≥ 2.7. The main-chain of Ubv.2.1 is shown as a wheat tube and nine residues that show high conservation in the saturation scan of both Ubv.2.1 and Ubv.21.4 (core functional epitope) are shown as ruby spheres. White or black dashed lines demarcate site-1 or site-2, respectively The structure on the right is Ubv.2.1 rotated 180° relative to Ubv.2.1 in complex. (B) The sequence logo for highly conserved (bits ≥ 2.7) and moderately conserved positions (2 < bits < 2.7) within the Ub-binding sites of human USPs. Positions are numbered according to the sequence of USP2, and sequences of USP2 and USP21 are shown. Asterisks (*) or arrowheads (^) indicate residues that interact with the Ub C-terminal tail or the core functional epitope, respectively.

Viewed in a structural context, the conserved residues form two sites that interact with Ubv.2.1. Eight highly conserved residues and one moderately conserved residue form the active site cleft (site-1) that interacts with the C-terminal tail of Ubv.2.1 (residues 70-74). Ten of the remaining highly conserved residues and the remaining 11 moderately conserved residues form

45

a second site (site-2) adjacent to site-1 (Fig. 2.7). Site-2 interacts with the flat beta-sheet face of Ubv.2.1, and notably, 14 of the residues in site-2 interact with the nine residues that form the core functional epitope shared by Ubv.2.1 and Ubv.21.4.

I speculated that the USP2 site-2 revealed to be important for binding to Ubv.2.1 may also be involved in the recognition of native Ub during catalysis. I explored this possibility using alanine-scanning mutagenesis (Morrison and Weiss 2001) by constructing a panel of USP2 variants containing single-site alanine substitutions at each of the 14 conserved site-2 residues that make contact with the core functional epitope of Ubv.2.1 and measuring their effects on the catalytic efficiency for the hydrolysis of Ub-AMC substrate. These assays showed that 10 of the 14 alanine substitutions reduced catalytic efficiency by at least two-fold relative to wt USP2 (Fig. 2.8), thus verifying that USP2 uses site-2 for the recognition of Ub substrates.

46

A -1 -1 -1 USP2 kcat (s ) KM (M) kcat/KM (M s ) Fold Effect** wt (1.4 ± 0.1) x 10-1 (1.9 ± 0.3) x 10-6 (8.0 ± 1) x104 1.0 F462A (2.9 ± 0.2) x 10-1 (7.0 ± 0.7) x 10-6 (4.1 ± 0.4) x 104 2.0 D466A (2.0 ± 0.1) x10-1 (6.0 ± 0.5) x 10-6 (3.3 ± 0.3) x104 2.4 P474A (6.0 ± 0.2) x10-2 (2.5 ± 0.3) x 10-6 (2.4 ± 0.3) x 104 3.3 K503A (5.7 ±0.2) x10-1 (5.1 ± 0.5) x 10-6 (1.2 ± 0.1) x 105 0.7 Y592A (4.8 ± 0.4) x10-1 (1.3 ± 0.2) x10-5 (3.9 ± 0.7) x104 2.0 D357A* (5.9 ± 0.2) x 102 135

4 Q359A* (3.7 ± 0.1) x 10 2.2 E360A* (1.3 ± 0.1) x103 63

L419A* (7.3 ± 0.2) x104 1.1

S421A* (1.2 ± 0.1) x104 6.9

D436A* (9.7 ± 0.5) x103 8.3

L468A* (3 ± 0.1) x 103 27

K487A* (2.7 ± 0.1) x 104 3.0

H501A* (2.9 ± 0.1) x 104 2.8 B D357 ! E360 Y592 Q359 H501 42 49 K503 44 68 10 F462 6 L419 12 66 D466 D436 4 L468 S421K487 P474

Figure 2.8 Alanine-scanning of USP2 site-2

(A) Kinetic values for Ub-AMC cleavage by USP2 variants. Asterisk (*) Only kcat/KM values could be determined. See “Materials and Methods” for details. Double asterisks (**) Fold effect was calculated by dividing kcat/KM of wt USP2 by the kcat/KM of the variant. (B) Alanine- scanning data mapped on to the structure of USP2 bound to Ubv.2.1 (PDB entry: 3V6C). USP2 is shown as a surface colored according to fold reduction in catalytic efficiency relative to wt caused by alanine substitutions at the labeled positions, as follows: green < 2, 2 ≤ yellow < 10, red > 10, grey = unscanned. The Ubv.2.1 main chain is shown as a tube and residues in the core functional epitope are shown as numbered spheres.

47

2.4.3 Interactions at the USP:Ubv interface

Saturation scanning enabled elucidation of the core functional epitope of the two Ubvs for binding to their cognate USPs and examination of the USP:Ubv complex structures identified the USP residues interacting with this epitope. Notably, the core functional epitopes of Ubv.2.1 and Ubv.21.4 differ from the sequence of Ub.wt at only two or one positions, respectively, suggesting that each Ubv likely interacts with its cognate USP in a manner very similar to that of Ub.wt. Indeed, comparison of the USP2:Ubv.2.1 and USP2:Ub.wt (Fig. 2.9A and B, respectively) or the USP21:Ubv.21.4 and USP21:Ub.wt complex structures (Fig. 2.9 C and D, respectively) revealed excellent superposition of the C-α atoms, with a root-mean-square deviation (rmsd) less than 0.28 Å or 0.32 Å, respectively. Moreover, there was excellent superposition of the six or seven side-chains of the core functional epitope that are identical between Ub.wt and Ubv.2.1 or Ubv.21.4, respectively (rmsd less than 0.13 Å or 0.20 Å, respectively) and for the 19 USP2 side-chains or 18 USP21 side-chains that make contact with any of the eight side-chains of the Ubv.2.1 or Ubv.21.4 core functional epitope, respectively (rmsd less than 0.18 Å or 0.36 Å, respectively).

48

D357 A E360 B F364 Y592 R42 H501 R363 Q49 Q359 K503 H68 I44 F462 K6N

L419 F489 D466 T66 T12H D436 S421 L468 F4 K487 L423 P474 C D357 Y592 D E360 R42 H501 Q49 L364 Q359 K363 H68F I44 F462 K6 L419 L489 E466 K487 T66 E436 L468 T12 S421 L423 P474 F4

Figure 2.9 Superposition of USPs in complex with Ub.wt or Ubvs. Ribbon representations of superposed (A) USP2 from USP2:Ub.wt (PDB entry 2HD5) and USP2 from USP2:Ubv2.1 (PDB entry 3V6C), (B) Ub.wt from USP2:Ub.wt (PDB entry 2HD5) and Ubv2.1 from USP2:Ubv2.1 (PDB entry 3V6C), (C) USP21 from USP21:Ub.wt (PDB entry 3I3T) and USP21 from USP21:Ubv21.4 (PDB entry 3MTN), and (D) Ub.wt from USP21:Ub.wt (PDB entry 3I3T) and Ubv21.4 from USP21:Ubv21.4 (PDB entry 3MTN). USP and Ub.wt in complex are colored green or light green, respectively. USP and Ubv in complex are colored purple or pink, respectively. Side-chains are shown as sticks for the Ubv core functional epitope and for USP side-chains that contact the core functional epitope (USP side-chains that are within 4.5 Å of side-chains in the Ubv core functional epitope). For residues that differ between Ub.wt and Ubv, the Ubv sequence is shown after the residue number. Both USP2 and USP21 are numbered according to the sequence of USP2, based on a sequence alignment (Fig. 2.6).

I next compared the USP2:Ubv.2.1 and USP21:Ubv.21.4 complex structures to understand how the core functional epitopes on the Ubvs interact with USPs (Fig. 2.10A). Superposition of the two complexes showed the overall structures to be highly similar (rmsd less than 0.55 Å for C-α atoms). Notably, six of nine residues in the core functional epitopes of the Ubvs are identical and 12 of 19 USP residues that contact this epitope are identical. Thus, both in terms of the overall fold and in terms of side-chain interactions, the two Ubvs bind to their cognate USPs in a very similar manner.

49

A B R42 Q49

o D357 50 E360 Y592 o 50 F364L I44 Patch A Q359 H501 R363K C K503N H12T N6K D466E

o 30 H68F o 55 K487 F462 Patch B F489L L419 D T66 D436E F4 USP2 USP21 L468 o Thumb Thumb 70 Palm Palm F489L P474 K487 Fingers Fingers S421 non wt-conserved L423 wt-conserved

Figure 2.10 Interactions between the Ubv core functional epitopes and USPs. (A) Superposition of USP2 (purple, pink and magenta) and USP21 (lime, sand and yellow) in complex with Ubv.2.1 (green) or Ubv.21.4 (blue), respectively. The USP main-chains are shown as ribbons, the catalytic Cys is shown as a purple sphere and side-chains are shown as sticks for residues that contact the Ubv core functional epitope (within 4.5 Å). For clarity, the Ubv main- chains are not shown and only the side-chains of the core functional epitopes are shown. (B, C, and D) Interactions between USP side-chains and Ubv side-chains of (B) patch A residues 42, 44 and 49, (C) non-wt patch B residues 6, 12 and 68, and (D) wt patch B residues 4 and 66. USP residues are labeled in italics and numbered according to the sequence of USP2. For residues that differ between USP2 and USP21, the USP21 sequence is shown after the residue number. Ubv residues are labeled in bold, and for residues that differ between Ubv.2.1 and Ubv.21.4, the Ubv.21.4 sequence is shown after the residue number.

The USP fold is comprised of three sub-domains, likened to a palm, thumb and fingers, with the catalytic site located between the palm and the thumb (Hu, Li et al. 2002). The eight side-chains in the Ubv core functional epitope form two patches, and one patch (patch A) docks on to the palm while the other (patch B) docks on to the fingers (Fig. 2.10A). Patch A contains three residues (Arg42, Ile44, Gln49) and these are all conserved as the Ub.wt sequence in both Ubvs (Fig. 2.10B). Patch B contains five residues, which can in turn be grouped into two clusters, with one cluster containing two residues (Phe4, Thr66) that are conserved as the Ub.wt

50

sequence in both Ubvs (Fig. 2.10D) and the second cluster containing three positions (6, 12, 68) that differ between the Ubvs and show preference for particular non-wt sequences in the saturation scans (Fig. 2.10C).

Each patch is anchored by a highly conserved hydrophobic residue: Ile44 or Phe4 in patch A or B, respectively. In patch A of Ubv.2.1, the hydrophobic side-chain of Ile442.1 resides in a predominantly hydrophilic environment formed by the side-chains of Gln359USP2 and Arg363USP2, which are either identical (position 359) or conservatively substituted (position 363) in USP21 (Fig. 2.10B). The other two residues in patch A that are conserved as the Ub.wt sequence (Arg42 and Gln49) also reside in a hydrophilic environment that is conserved in the two USPs. In particular, the Arg422.1 side-chain forms hydrogen bonds with the side-chains of Gln359USP2 and Gln492.1 and makes favorable electrostatic interactions with the side-chain of Asp357USP2, and the Gln492.1 side-chain also forms a hydrogen bond with the Glu360USP2 side- chain. Notably, all of these side-chains are identical in USP2 and USP21 and are highly conserved within the USP family (Fig. 2.6 and 2.7B), and consequently, the interactions between Ubv.21.4 and USP21 are virtually identical in this region of the interface. In patch B of Ubv.2.1, the Phe42.1 side-chain interacts with Pro474USP2, the side-chains of Leu423USP2 and Leu468USP2, and the aliphatic portion of the Lys487USP2 side-chain. The amino group of the Lys487USP2 side-chain makes a hydrogen bond with the side-chain of Thr662.1, the second residue in patch B that is conserved as the Ub.wt sequence (Fig. 2.10D). In addition, the aliphatic portion of the Thr662.1 side-chain packs against the side-chains of Asp436USP2 and Phe489USP2. Notably, aside from positions 436 and 489 that have conservative substitutions, these residues are identical and make very similar interactions in the USP21:Ubv.21.4 complex structure.

Between the conserved hydrophobic anchor residues (Ile44 and Phe4) and the associated conserved hydrophilic residues (Arg42, Gln49, Thr66), resides a cluster of three residues (6, 12, 68) that differ between Ubv.2.1 and Ubv.21.4 and tend to vary from the sequence of Ub.wt (Fig. 2.10C). At position 68, Ubv.2.1 contains a wt His residue while Ubv.21.4 contains a Phe substitution, but the saturation scan shows that Ubv.2.1 also prefers a Phe at this position (Fig. 2.3C). Consistent with a preference for a hydrophobic Phe, the side-chains at position 68 of both Ubvs interact with three hydrophobic residues in their cognate USPs that are either identical (positions 419 and 462) or conserved (position 489). At position 6 of Ubv.2.1, the wt Lys

51

residue is replaced by an Asn and Met is also highly prevalent in the saturation scan dataset (Fig. 2.3C). The Asn62.1 side-chain forms a hydrogen bond with the side-chain of Lys503USP2. The side-chain at position 6 is close to the hydrophobic side-chains of Phe462USP2 and Phe489USP2, and this likely explains why a hydrophobic Met can effectively substitute for Asn at this position (Fig. 2.4). Notably, although Ubv.21.4 contains a wt Lys at position 6, the saturation scan also shows a preference for Asn, which could be explained by a putative hydrogen bond with Gln503USP21 (Fig. 2.10B). At position 12 of Ubv.2.1, His dominates over the wt Thr and all other substitutions in the saturation scan (Fig. 2.10C), suggesting that this side- chain contributes to recognition of USP2. This preference can be explained by a hydrogen bond between the His122.1 side-chain and the side-chain of Asp466USP2. In contrast, in the saturation scan data for Ubv.21.4, His is completely absent at position 12, which instead prefers the wt Thr and also tolerates Ser and Tyr. In this case, the structure shows that a longer Glu466USP21 side- chain makes a hydrogen bond with Thr1221.4, and it is likely that a larger His residue at position 12 could not make the same favorable interactions.

2.5 Discussion

By applying phage display and saturation scanning analyses, we have obtained detailed views of how Ubv.2.1 and Ubv.21.4 bind with high affinity to their respective targets USP2 and USP21. Moreover, the availability of structures of each USP in complex with either its cognate Ubv or Ub.wt enabled us to extend our insights to the interactions between USPs and native Ub. Our structural comparison shows remarkable similarity at both the main-chain and side-chain level between key residues on both sides of the interface (Fig. 3), suggesting that our conclusions regarding the Ubvs are likely also applicable to native Ub.

Overall, the saturation scans reveal that a large number of residues in each Ubv exhibit significant sequence conservation, and thus, a large surface area at the binding interface is recruited for productive binding contacts (Fig. 1). Notably, nine positions exhibit high conservation in both Ubvs, and this core functional epitope is involved in interactions with many other Ub-binding proteins (Roscoe, Thayer et al. 2013, Harrison, Jacobs et al. 2015). Moreover, six of the nine positions (4, 6, 10, 42, 44, 68) have been shown to be highly intolerant

52

to substitutions in a study that probed the effects of point mutations on the biological function of Ub in yeast (Roscoe, Thayer et al. 2013).

Interestingly, the core functional epitope docks on a region of the USP fold that is conserved within the family and is distinct from the active site (site-2, Fig. 2.7). Six of the nine residues within the core functional epitope are conserved as the wt sequence and interact with USP residues that are also highly conserved within the USP family, suggesting that these interactions may make similar energetic contributions to the recognition of Ub substrates by many USPs (Fig. 2.6 and 2.7). The importance of site-2 for the recognition of Ub substrates was confirmed by alanine-scanning mutagenesis of USP2, which showed that alanine substitutions at many positions in site-2 reduced catalytic efficiency for hydrolysis of Ub-AMC (Fig. 2.8). The remaining three residues are also highly conserved but favor non-wt sequences. Notably, these three residues cluster together (Fig. 2.10C) and include residues that were mutated in the Ubvs to gain high affinity and specificity for their cognate USPs, and thus, they represent a critical cluster that can be exploited for specific inhibitor design.

Taken together, our results show that site-2 is important for the recognition of Ubvs and Ub substrates, and thus, this region may be an attractive alternative target for the design of small molecules that inhibit USP function with higher specificity than has been possible with inhibitors targeting the highly conserved active site (Colland, Formstecher et al. 2009, Colland 2010, Pal, Young et al. 2014). Indeed, site-2 on the USPs and the core functional epitope on the Ubvs conform to the hot spot model of protein-protein interactions, in which a subset of side- chains on each side of an interface form contiguous patches that interact with each other and contribute a large fraction of the binding energy (Clackson and Wells 1995, Wells 1996, Moreira, Fernandes et al. 2007). While inhibition of protein-protein interactions with small molecules remains challenging, significant progress has been made in targeting sites other than classical active site clefts (Arkin and Wells 2004, Arkin, Tang et al. 2014, Guo, Wisniewski et al. 2014).

Our phage-displayed technology for developing high affinity Ubvs has proven successful against not only USPs, but also against other DUB families, E3 ligases, E2 conjugating enzymes and Ub-binding domains (Ernst, Avvakumov et al. 2013). Indeed, we have recently reported structures of Ubvs that target E3 ligases of the HECT (Zhang, Wu et al. 2016), SCF (Gorelik, Orlicky et al. 2016) and RING families (Brown, VanderLinden et al. 2016). 53

Thus, the methods described here can be applied to dissect the functional epitopes for Ubvs targeting many other components of the Ub proteasome system. These studies should prove useful for guiding new inhibitor design strategies, and also, for understanding the molecular basis for the myriad of protein interactions that mediate Ub biology.

54

3 Ubiquitin as an alternative scaffold for the development of affinity reagents

3.1 Statement of contribution

I constructed all libraries described in this chapter. I performed all selections, except for Fab library selections on the following targets: Grb2, Her3, ROR-alpha, MBP, and BATF, and selections on targets: ASH1L, BRD1, BRD2, BRD3, and BRD4.

HEK293T cell lines stably overexpressing ErbB family member(s) were engineered by Dr. Nish Patel, a former post-doc in the Sidhu/Moffat lab.

All surface plasmon resonance (SPR) experiments were designed by Nick Jarvik and myself and performed by Nick Jarvik.

3.2 Introduction

Affinity reagents, such as Abs, are powerful binding tools that are used to identify and detect the presence of virtually any analyte in a sample. As such, they are the cornerstone of biological research and biomedicine. The demand for affinity reagents has accelerated over the past decade due to the expansion of proteomic approaches aimed at uncovering the diverse functions of gene products in biology, health and disease

Although, the global affinity reagent market in 2014 has been estimated at a worth of $2.2 billion, with a staggering 2 million products available (Fan 2015), there still exists an immense need for additional specific affinity reagents, because tools useful for identifying many proteins, sequence variants, and post-translational modifications still do not exist (Taussig, Stoevesandt et al. 2007, Colwill, Renewable Protein Binder Working et al. 2011, Edwards, Isserlin et al. 2011). The direness of the situation has been highlighted in a number of recent opinion pieces (Edwards, Isserlin et al. 2011, Anonymous 2015, Bradbury and Plückthun 2015,

55

Marcon, Jain et al. 2015) and the establishment of numerous national large-scale projects to expand the catalogue of affinity reagents (http://commonfund.nih.gov/proteincapture/index# , https://antibodies.cancer.gov/apps/site/default , Taussig, Stoevesandt et al. 2007, Ponten, Jirstrom et al. 2008, Colwill, Renewable Protein Binder Working et al. 2011).

For decades, antibodies have been the gold standard for affinity tools, due to their unique structure that permits a high tolerance for sequence variation to make diverse binding surfaces. However, a number of structural complexities inherent to the antibody fold have limited their utility. First, the complex, heterotetrameric structure of conventional IgGs makes production technically challenging, which has been linked to batch-to-batch variations and problems with data reproducibility (Bradbury and Plückthun 2015). Secondly, the large number of disulfides in the native structure prevents antibody usage in live, intracellular protein studies because they do not readily fold within the reducing environment of the . Another class of affinity reagents known as alternative scaffolds has been emerging for the past decade. These reagents, which lack disulfides and are typically less than 10 KDa, have provided relief from antibody limitations, and have been successfully used in diverse applications in research, diagnostics, and therapeutics (Lofblom, Feldwisch et al. 2010, Wurch, Pierre et al. 2012, Hassanzadeh- Ghassabeh, Devoogdt et al. 2013, Tanaka, Takahashi et al. 2015). The lack of disulfides in these affinity reagents opens up an immense potential for intracellular applications, including protein localization studies, the discovery of unknown gene functions, and the validation of potential drug targets.

Alternative scaffolds are typically much less structurally complex when compared to antibodies. Like other successful scaffolds such as Anticalins, Affibodies, Monobodies, and DARPins (Fig. 1.4), Ub also possesses favourable biochemical properties that make it suitable as a scaffold. Ub is remarkably stable at high temperatures in a wide spectrum of pH and denaturation agents (Wintrode, Makhatadze et al. 1994, Ibarra-Molero, Loladze et al. 1999, Kony, Hunenberger et al. 2007), which enables it to tolerate extensive sequence variation for forming new binding surfaces. It is highly soluble with no cysteines, thus allowing for less complex bacterial production methods. The lack of disulfide bonds also permits Ub based reagents for usage in the reducing intracellular milieu, which is currently intractable for traditional antibody formats. Furthermore, the small size of the 76-residue Ub protein also supports modular design for novel capabilities.

56

Previously, our group and a company, Scil Proteins, have already validated Ub as a viable scaffold option. Our lab has demonstrated Ub as a scaffold to generate activators or inhibitors of enzymes, such as E3s, DUBs, and UBDs in the ubiquitin proteasome system (UPS) (Ernst, Avvakumov et al. 2013, Gorelik, Orlicky et al. 2016, Zhang, Wu et al. 2016). Ub phage libraries were produced by randomizing the natural protein-binding site situated on the flat β- sheet of Ub, in a manner that biased the occurrence of wt sequence, Through panning, Ubvs exhibiting enhanced affinity and specificity for targets were selected. The Ubvs that inhibited or activated ubiquitination were active inside cells. Thus, our previous work validated Ub as a scaffold for making extracellular and intracellular functional reagents.

At Scil Proteins, while the Ub β-sheet was also chosen as the designed binding site, the randomization approach differed substantially, with the effort focused on producing binders to targets that do not naturally bind Ub. Two Ub phage libraries were produced, one of a mono-Ub and another of a di-Ub joined by a linker. The former library had six positions “hard randomized”, which allowed for all 20 amino acids (Hoffmann, Kovermann et al. 2012). The resulting library thus had a restricted complementarity region and a limited library diversity of 6.4 x 107, which likely explained why only low affinity binders (μM range) were isolated from this library. The di-Ub library expanded the engineered binding site to 15 residues spread over the two Ub molecules (Lorey, Fiedler et al. 2014), and was successful at isolating nM affinity binders to targets of interest.

Here, I present a phage-displayed Ubv library that differs substantially from previously reported libraries, and applied it to select for tight and selective Ubvs binding to diverse proteins of interest. Like our group’s previous libraries and the libraries reported by Scil Proteins, my naïve Ubv library uses the β-sheet of Ub as the binding surface. However, unlike the previous libraries, I diversified a surface comprising an expansive 24 positions, and thus increased the likelihood for making contacts with antigens. I also biased the binding surface for Tyr and Ser residues, which have proven to contribute functionally to binding (Fellouse, Wiesmann et al. 2004, Fellouse, Li et al. 2005, Fellouse, Barthelemy et al. 2006). In addition, I incorporated numerous quality control measures into the library design and production processes to maximize Ubv stability and display, which are critical parameters that contribute to the success of a library. Finally, I demonstrated the feasibility of making affinity reagents from the Ub library by validating affinity reagents for two proteins of interest: the cell-surface receptor Her3 and the

57

intracellular adaptor Grb2. Finally, I applied a novel anti-Grb2 Ubv as a genetically encoded intracellular inhibitor of receptor signaling.

3.3 Methods and Materials

3.3.1 Library construction

For mutation tolerance scan libraries, FLAG-Ub was cloned into a previously described phagemid for gene-3 minor coat protein display (Ernst, Avvakumov et al. 2013). Three stop templates were constructed, where each had a different region replaced with a stop codon, and were subsequently used for library randomization. For library design, each position was randomized using degenerate codons shown in Table 3.1, and in total three libraries were produced following procedures described previously (Fellouse 2007). For naïve Ub libraries, a dimerization sequence was added between the FLAG-Ub and the phage protein coding sequence for bivalent display, as described previously for Fab display (Lee, Sidhu et al. 2004). A stop template in which region 1 and 2 of the wt Ub sequence were replaced by stop codons was constructed and used to first construct a library where only region 1 and 2 were randomized using the KHT (K= G/C, H=A/C/T) degenerate codon. An equal mix of two mutagenic oligonucleotides was used to randomize region 2 (Table 3.2), and the phage library was produced as described previously (Fellouse 2007). The resultant phage library was subjected to two rounds of selection for binding anti-FLAG Ab (Sigma F3165) immobilized on 48 wells of a 96-well NUNC Maxisorp plate (Thermo Scientific 12565135) to enrich for variants that were stably displayed. Subsequently, the phage output was amplified in 200 ml of E.coli CJ236 culture, and the isolated ssDNA were used as template for a mutagenesis reaction that randomized region 3. Two Ub naïve libraries were produced that differed in how region three was randomized. In library KHT, KHT degenerate codons were used, while in library NNK, NNK (N=A/C/G/T) degenerate codons were used. The libraries were made by electroporating separately into E. coli SS320 cells. For amino acid frequency determination, 201 and 58 sequences were analyzed from the naïve library KHT or NNK, respectively. Phage pools from the libraries were concentrated to 1013 CFU/ml, aliquoted and flash frozen in 50% glycerol.

58

Table 3.1 Degenerate codons used in the mutation tolerance scan

wt Degenerate Position Residue Codon^ Encoded Residues 2 Q HMK K N T T Q H P P * Y S S 4 F KHT D V A Y F S 6 K HMK K N T T Q H P P * Y S S 8 L HHT N I T Y F S H L P 9 T HHT N I T Y F S H L P 10 G DVT G D A C Y S S N T Region 1 11 K HMK K N T T Q H P P * Y S S 12 T HHT N I T Y F S H L P 14 T HHT N I T Y F S H L P

42 R YNT H L P Y F S R C 44 I HHT N I T Y F S H L P 46 A KHT D V A Y F S 47 G DVT G D A C Y S S N T

Region 2 48 K HMK K N T T Q H P P Y S S 49 Q HMK K N T T Q H P P Y S S

62 Q HMK K N T T Q H P P * Y S S 63 K HMK K N T T Q H P P * Y S S 64 E KVK G G A A E D W C S S * Y 66 T YHT H L P Y F S 68 H YHT H L P Y F S 70 V KHT D V A Y F S 71 L YHT H L P Y F S Region 3 72 R YNT H L P Y F S R C 73 L YHT H L P Y F S 74 R YNT H L P Y F S R C 75 G KVT G D A C Y S V 76 G KVT G D A C Y S V * indicates a stop ^ all letters follow the International Union of Biochemistry nucleotide code.

59

Table 3.2 Oligonucleotides used to construct naïve Ub libraries.

Region ID randomized Oligonucleotide Sequence* IL_Reg1 1 AAA GCA GGC TCC ATG KHT ATT KHT GTG KHT ACC KHT KHT GGG KHT KHT ATC KHT CTC GAG GTT GAA CCC

a IL_Reg2.1 2 CCT CCT GAT CAG CAG KHT CTG KHT TTT GCT GGC KHT YMW CTG GAA GAT GGA CGT

b IL_Reg2.2 2 CCT CCT GAT CAG CAG KHT CTG KHT TTT GCT GGC AAG CAG CTG GAA GAT GGA CGT

c IL_Reg3.1 3 TCT GAC TAC AAT ATT KHT KHT KHT TCT KHT CTT KHT CTT KHT KHT KHT KHT KHT KHT KHT TAC CCA GCT TTC TTG

d IL_Reg3.2 3 TCT GAC TAC AAT ATT NNK NNK NNK TCT NNK CTT NNK CTT NNK NNK NNK NNK NNK NNK NNK TAC CCA GCT TTC TTG

* all letters follow the International Union of Biochemistry nucleotide code. a Oligonucleotide randomizing positions 42, 44, 48 and 49. b Oligonucleotide randomizing positions 42, 44, and 48. c Oligonucleotide used to construct KHT Ub library. d Oligonucleotide used to construct NNK Ub library.

3.3.2 Mutation tolerance scan

The three scanning libraries were handled separately, and each was subjected to two rounds of panning on anti-FLAG antibody immobilized on eight wells of a 96-well NUNC Maxisorp plate. Individual clones that bound to the anti-FLAG Ab were identified by phage ELISA(Cunningham, Lowe et al. 1994), and 100-200 binding clones were sequenced from the round 2-selection pool of each library. The frequency for each amino acid occurrence was normalized by adjusting for codon bias, which was done by dividing the occurrence of a particular amino acid by the number of times that amino acid was encoded in the degenerate codon. The mutant/wt ratio was calculated by dividing the normalized frequency of an amino acid substitution by the frequency of the wt.

3.3.3 Ag binding selections

Glutathione S transferase (GST)-Ag fusion proteins were immobilized onto 96-well NUNC Maxisorp plates by incubating 100 μ of protein (2-10 ng/ml) in each well overnight at 4 °C, and blocked with PBS/0.5% BSA for 1 hour at room temperature on the next day. Four rounds of selections were performed as described (Fellouse 2007). All Ags except for Her3, Grb2, MBP, BAFT, ROR-α, and FOXP1 were subjected to HTP selection. The protocol for HTP binding selections differed at the phage output amplification step between selection rounds. Instead of amplifying binding phages in 30 ml volumes of culture in baffled shaker flasks overnight at 37 °C, for the HTP method phage were amplified in 1.4 ml of 2YT broth supplemented with 50 μg/ml carbenicillin, 25 μg/ml kanamycin, and 1010 K07 helper phage in 96-deep-well plates.

60

Subsequently, the amplified phage supernatant was adjusted to neutral pH by adding 155 μl of 10x PBS and used directly in the next round of selection. For round one, a mix of phage from libraries KHT and NNK were incubated with immobilized Ags. To eliminate non-specific binding phages in rounds two through four, phage pools were preincubated in wells coated with GST for 30 minutes at room temperature and then transferred to selection plates.

3.3.4 Protein expression and purification

MBP, and Grb2-SH2 used for SPR were cloned into the pET15 vector (Invitrogen) using Gateway methods as described in (Liang, Peng et al. 2013) for fusion to the C-terminus of a 6xHIS tag. For FLAG-Grb2-SH2 protein used for peptide-binding ELISAs, a FLAG tag was cloned N-terminally to Grb2-SH2 in the pET15 Grb2-SH2 vector using transfer PCR method (Erijman, Dantes et al. 2011). All TFs were cloned into a custom-made isopropyl β-D-1- thiogalactopyranoside (IPTG) expression vector as described in (Huang, Economopoulos et al. 2015) and Ubvs were cloned into pDEST15 vector (Invitrogen) using Gateway methods, which all produced fusions to the C-terminus of GST. Her3-Fc fusion protein was purchased from R&D Systems (348-RB-050). Plasmids designed for protein expression were transformed into E.coli BL21 DE3 pLysS cells. Cultures were grown from single colonies and protein expression was induced with 500μM of IPTG when bacterial growth reached mid-log phase. Induced cultures were grown overnight at 18 °C, collected by centrifugation and lysed using standard methods (Gasteiger 2005). 6xHIS or GST tagged proteins were purified using Ni- NTA affinity resin (Qiagen 30230) or glutathione resin (GE healthcare 17-0756-05), respectively, according to manufacturer’s instructions. The purity of eluted proteins was determined by sodium dodecyl sulfate polyacrylamide (SDS-PAGE). Protein concentrations were determined by measuring the absorption at 280 nm with Nanodrop 1000 (Thermo Scientific) and converted to molarity using the extinction-coefficient determined by Protparam (http://web.expasy.org/protparam/).

3.3.5 Phage and protein ELISAs

All ELISAs were performed in 384-well Maxisorp plates (Thermoscientific 12665347). Assay volumes were 25 μl except for blocking, which was done with 50 μl volumes. For clonal phage ELISAs, 5 ng/ml of Ags or 5ng/ml of BSA (Bioshop ALB001.500) were immobilized as in section 3.3.3. Binding to immobilized anti-FLAG Ab (2 ng/ml) was used to assess display

61

levels. Clonal phage cultures were grown by picking single colonies into 400 μl of 2YT broth supplemented with 50 μg/ml carbenicillin and 1010 CFU/ml of K07 helper phage and incubated overnight at 37 °C. Supernatants containing phage were used directly for ELISAs, as described (Tonikian, Zhang et al. 2007) except for the following modifications: phages were incubated with immobilized Ag for 15 min and four washes were conducted to remove non-binding phages. The procedure for specificity ELISAs conducted with purified GST-Ubv proteins was the same as that for clonal phage ELISAs, except that binding to Ubv was detected with anti- GST-HRP Ab (1:8000, Sigma A7340).

For EC50 determination by ELISA, phages or Ubvs were serially diluted in 96-well plates, transferred to plates that were coated with immobilized Ag, and incubated for 15 min.

EC50 values were calculated by fitting binding curves with a four parameter logistic nonlinear regression model using Graphpad Prism software and were defined as the concentration required to achieve 50% of the maximum binding signal.

For competitive phage ELISAs, phage supernatants were first diluted 3 fold in PBS and then incubated with serial dilutions of soluble Ag in 96 well plates for 1 hour at room temperature. Subsequently, the solutions were transferred to 384 well ELISA plates, and the assay was completed using same protocol as described for clonal phage ELISAs.

For IC50 assays, ELISA plates were prepared by first immobilizing 25 μl of 5ng/ml of streptavidin overnight. Next day, plates were blocked with 50 μl of PBS/0.5% BSA and then 25 μl of 5 μM biotinylated p-Tyr SHC peptide (sequence: DHQpYYNDFPG, GenScript) or SHC peptide (sequence: DHQYYNDFPG, GenScript) were added. In a 96 well plate, serial dilutions of Ubv protein were incubated with 25 nM His-FLAG-Grb2-SH2 for 1 hour at room temperature before transfer to the ELISA plate. After 15 min of binding, the plate was washed four times with PBS and Grb2-SH2 binding was detected by anti-FLAG-HRP Ab (Sigma A8592).

3.3.6 Immunoprecipitations

For immunoprecipitation (IP) of Her3, all HEK293T cell lines expressing Her receptors were stable expression lines generated by Lentiviral transduction, except for HEK293T expressing EGFR. HEK293T cells overexpressing EGFR were made by transfecting a 15-cm dish of HEK293T cells at ~70 confluency with 5 μg of pCDNA3.1 EGFR plasmid using X- 62

tremeGENE9 transfection reagent (Roche 06365809001) and according to manufacturer’s instructions. Unless stated otherwise, all DNA transfections into mammalian cells were performed with X-tremeGENE 9 transfection reagent and used according to manufacturer’s instructions. Stable cell lines or transfected cells were grown to ~95% confluency on 15 cm plates and were lysed directly on plates by adding lysis buffer (Sigma S8830) supplemented with protease and phosphatase inhibitor cocktail (PPIC) (ThermoScientific 78440). Collected lysates were incubated on ice for 20 minutes and centrifuged at 21,130 g for 10 minutes. Protein concentrations were determined by bicinchoninic acid assay (ThermoScientific 23225). For each sample, 500 μg of lysate protein in 1 ml of lysis buffer were precleared by incubating with 50 ml glutathione sepharose slurry (GE #17-0756-05) by nutating at 4°C for 1 hour. Subsequently, 3 mg of Ubv or Ub.wt (negative control) were added to each sample and incubated at 4 °C from 4 hours or overnight. IPs were performed using 40 μl glutathione sepharose slurry according to manufacturer’s instructions. The resin was washed three times with 500 ml of lysis buffer and proteins were eluted by adding 50 μl SDS-PAGE loading buffer (Invitrogen NP0007).

For IP of Grb2, HEK293T cells were grown to ~70% confluency in 6-well plates and were transfected with 1.5 μg of pCAGGS GST-Ubv plasmid alone or in combination with 0.3 μg pCDNA3 3xFLAG-Grb2 plasmid. Two days after transfection, cells were lysed, IPs were performed using 20 μl glutathione sepharose slurry and proteins were eluted by adding 20 μl SDS-PAGE loading buffer, as described above.

3.3.7 Cell fractionation

HEK293T cells were grown to ~70% confluency in 6-well plates and were transfected with 1.5 μg of pCAGGS GST-Ubv plasmid. Two days post transfection, cells were washed with ice-cold PBS. Cytosolic and membrane fractions were isolated with a Subcellular Protein Fractionation Kit (ThermoScientific 78840) used according to manufacturer’s instructions, with the exception that an additional wash with 500 μl PBS supplemented with 1xPPIC was performed after isolation of cytosolic fraction and prior to membrane fraction isolation.

3.3.8 Western blotting

The following sample volumes were loaded for SDS-PAGE: 10 μl for Her3 IP experiment and EGFR signaling experiment, 15 μl for Grb2 IP experiment, and 25 μl for fractionation experiment. Proteins were transferred to PVDF member using standard methods. Instead of 63

stripping and reprobing blots, multiple fresh blots were used to probe for different proteins of the same size in each experiment to avoid possible residual signals. Blots were probed with antibodies against EGFR (Invitrogen 44798G), p-EGFR (Invitrogen 44794ZG), Her2 (LS Biosciences N12), Her3 (ThermoScientific 2F12), GST (Sigma A7340), HSP90 (SantaCruz sc- 7947), Na+K+ATPase (Abcam ab7671), and Abs targeting the following proteins were purchased from Cell Signaling Technology: Her4 (4795), GAPDH (2118), Grb2 (3972), ERK (9101), p-ERK (4370), MEK (4694), and p-MEK (9154).

3.3.9 EGFR signaling assay

HEK293T cells in 6-well plates were grown to ~70% confluency and transfected with 1.5 μg of pCAGGS GST-Ubv plasmid. After 6-8 hours, cells were transfected with siRNA designed to knock down Grb2 (ThermoScientific S6121) or mock transfected with Lipofectamine RNAiMAX (Invitrogen 13778-075). After 12-16 hours, cells were washed with PBS and starved by switching to media without FBS supplement. The next day, cells were stimulated with EGF (1 ng/ml), and after 10 minutes, cells were harvested by washing with ice-cold PBS. Cells were scraped off and each sample was split into two Eppendorf tubes. Cells in one tube were lysed for total protein analysis and those in the other tube were processed by fractionation for Grb2 localization analysis (see Section 3.3.7).

3.3.10 SPR kinetic binding analysis

SPR measurements were performed at 25 °C using a ProteOn XPR36 instrument (Bio-Rad). Proteins were immobilized by amine coupling to GLC sensor chip surfaces. GST-Ubv proteins were diluted with PBS buffer and injected for 360 s at a flow rate of 50 μl/min. For binding kinetics, sensorgrams were fitted to 1:1 Langmuir model using ProteOn Manager Software (Bio-

Rad).

3.4 Results

3.4.1 Mutation Tolerance Scan

To produce a combinatorial phage-displayed Ubv library to generate high affinity and selective binders to targets of interest, I aimed to use a large surface on Ub as the binding site. Previously, we identified a β-sheet surface on Ub consisting of 27 residues that was amenable for 64

engineering by phage display (Ernst, Avvakumov et al. 2013). By mutating a few residues on this surface, we were able to develop Ubv inhibitors and activators of enzymes in the Ub proteasome system. To extend the utility of the Ub scaffold and to develop Ubvs capable of binding to proteins that do not naturally interact with Ub, I decided to diversify all 27 residues with a subset of amino acids (Tyr, Ala, Asp, Ser, Phe, and Val) that have been proven to function in the binding sites of synthetic antibody and alternative scaffolds (Fellouse, Wiesmann et al. 2004, Gilbreth, Esaki et al. 2008, Mandal, Uppalapati et al. 2012). However, this resulted in a non-optimal library in which >95% of phage clones did not display Ubvs at detectable levels in phage ELISAs (Fig. 3.1), thus severely jeopardizing the probabilities of deriving functional Ubvs. We attributed low display levels to the high mutation frequency resulting in unstable variants incorporating destabilizing mutations, which would then be proteolytically cleaved in the E.coli during phage assembly (Betton, Sassoon et al. 1998, Ernst, Sazinsky et al. 2009). As a first step towards designing a library of more stable Ubvs, I designed an experiment to investigate which of the 27-targeted positions were intolerant to mutation.

2.5

2.0 m n 450

1.5 ce n a b

r 1.0 o s b A 0.5

0.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Ub.w Random phage clones t

Figure 3.1 Phage ELISA of Ub library Twenty-four randomly selected clones from the library constructed by randomizing 27 residues Ub were subjected to anti-FLAG ELISA to assess display. Ub.wt denotes Ub.wt displayed on phage.

Since detection of an epitope tag, such as FLAG, fused to the N-terminus of a displayed protein can be used as a proxy for stability (Betton, Sassoon et al. 1998, Ernst, Sazinsky et al. 2009), I designed a phage display approach called mutation tolerance scan to systematically assess and compare FLAG tag detection when wt residues were substituted by other residues.

65

First, I designed three phage libraries that together covered the 27 residues (Fig. 3.1). Each position was randomized with a degenerate codon that only encoded six to nine residues including the wt, Tyr and Ser (Table S1), and this allowed me to completely sample the theoretical diversities and enabled the quantitative assessment of the relative frequency of each substitution compared with the wt. The wt residue was important for comparing the occurrence of mutations relative to wt. Tolerance to substitution by Tyr or Ser was of interest because of their importance in mediating interaction in antibodies and scaffold based complementarity regions (Fellouse, Wiesmann et al. 2004, Fellouse, Barthelemy et al. 2006).

B A 1 2 3 76 9 75 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 1110 74 M Q V F I K T L T G K T I T L E V E P S D T I E N 8 73 71 Region 1 12 72 70 1 4 5 14 6 42 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 4 68 2 66 V K A K I Q D K E G I P P D Q Q R L I F A G K Q L 44 Region 2 64 49 6 63 46 47 48 62 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 E D G R T L S D Y N I Q K E S T L H L V L R L R G G Region 3

Figure 3.2 Library designs for mutational tolerance scan. Residues randomized in each of the three scanning libraries are colored differently and mapped onto (A) the Ub primary sequence and (B) the Ub structure (PDB entry: IUBQ). Secondary structure elements, β-strands (blue arrows) and α-helices (red zigzags), are indicated above sequence.

Following selections for binding to anti-FLAG Ab, many clones that were positive for anti- FLAG Ab binding by phage ELISA were sequenced. The sequences were used to calculate a mutant/wt ratio at each scanned position, and based on this ratio, positions were categorized as exhibiting, low, medium or high tolerance to mutations (Fig. 3.2). The analysis showed that of the 27 scanned positions, 23 positions exhibited either medium tolerance, defined as 0.5 > mutant/wt ratio < 1.5, or high tolerance, defined as mutant/wt ratio ≥ 1.5, for at least half of the encoded mutations. Only four positions exhibited low tolerance, defined as mutant/wt ratio < 0.5 for all or all but one tested substitutions (Fig. 3.2). Surprisingly, many positions tolerated side- chains that were substantially different than wt and in some cases multiple mutations with different chemical properties were preferred over wt. For example, position 62 preferred Pro or Tyr over the wt Gln (Fig. 3.2). Our data showed that Ub is structurally robust, and this is in 66

agreement with literature showing that Ub is very stable protein (Ibarra-Molero, Loladze et al. 1999, Lange, Lakomek et al. 2008).

Region 1 Region 2 Region 3 2 4 6 8 9 10 11 12 14 42 44 46 47 48 49 62 63 64 66 68 70 71 72 73 74 75 76 wt: Q F K L T G K T T R I A G K Q Q K E T H V L R L R G G W ------0.4 ------F - 1 - 1.5 1 - - 1.1 0.5 0.8 0.9 0.3 ------1.8 0.7 1.2 2.4 0.5 1 0.9 - - Y 0.8 1.1 1.2 1.3 0.4 0.1 0.4 0.8 0.6 0.7 0.6 0.2 0.2 1.1 0.4 7 1 2 1.2 0.8 1 1.8 0.8 1 0.6 0.7 1 M ------L - - - 1 0.4 - - 0.9 0.8 0.4 0.8 ------0.8 - 1 0.7 1 0.9 - - I - - - 1.1 0.9 - - 0.9 0.7 - 1 ------1.4 ------V - 3.6 ------0.6 ------1 ------A - 2.1 - - - 0.3 - - - - - 1 0.2 - - - - 1.1 - - 0.7 - - - - 0.8 0.8 G - - - - - 1 ------1 - - - - 0.9 ------1 1 S 0.9 2.0 1.9 2.4 0.6 0.2 0.7 0.5 0.9 0.4 1.1 0.3 0.2 1.0 0.4 5.5 1.1 1.4 1.5 0.9 1.4 2 0.7 0.6 0.5 0.6 1 T 0.6 - 0.8 1.6 1 0.1 0.4 1 1 - 0.5 - 0.2 0.7 0.4 2.5 0.7 - 1 ------D - 1.7 - - - 0.4 - - - - - 0.3 0.2 - - - - 1.2 - - 1.2 - - - - 0.9 0.8 E ------1 ------R ------1 ------1 - 1 - - K 0.2 - 1 - - - 1 ------1 0.6 2 1 ------H 0.6 - 0.7 1.0 0.9 - 0.3 0.6 0.8 0.3 0.6 - - 0.8 0.1 4 0.6 - - 1 - 1.4 0.7 0.8 0.9 - - N 0.7 - 2.5 1.8 1.2 0.3 0.6 0.5 0.9 - 0.6 - 0.1 0.6 0.2 3.7 0.7 - 1.9 ------Q 1 - 1.8 - - - 0.7 ------1.3 1 1 0.3 ------P 0.5 - 2.3 1.3 0.9 - 0.4 1.1 0.6 0.6 0.2 - - 2.7 0.5 3.5 0.5 - - 4.5 - 2.5 0.3 0.8 0.8 - - C - - - - - 0.4 - - - 1.1 - - 0.2 - - - - 1 - - - - 0.7 - 0.8 0.9 0.7

Figure 3.2 Mutant/wt ratios derived from mutation tolerance scanning data. The scan sampled 6-9 amino acids at each position, and amino acids not sampled are omitted. Ratios are shaded as follows: red ≤ 0.5, 0.5 > grey < 1.5, 1.5 ≥ green, and blue indicates the wt. Ub sequences shaded in yellow represent residues that were intolerant to mutation and were retained as wt in the final library design.

3.4.2 Library Design

The mutation tolerance scan identified four positions, Gly10, Ala46, Gly47 and Gln49 that exhibited low tolerance to mutation (Fig. 3.2). Within the Ub structure, G10, A46, and G47 are found in β-turns (Fig. 3.3), hence it is likely that the small residues were required to maintain the loop conformation Thus, we fixed positions 10, 46, and 47 as wt in the library design. Gln49 forms a a two-residue β strand with Lys48 (Fig. 3.3), which is highly tolerant to mutation. Since Gln49 and Lys48 together form a distinct surface, but one is highly tolerant of mutation and one is intolerant to mutation, we decided to use a partial randomization strategy at these two positions, where wt was allowed at least 50% of the time. At position 48, wt Lys was allowed at 50%, while 50% of the library was diversified with the KHT degenerate codon, which codes for Tyr, Ala, Asp, Ser, Phe, and Val. At position 49, while wt Gln was allowed at 50%, the other 50% of the library was diversified with the degenerate codon YMW. This codon was chosen to

67

further increase the variants stability, as it coded for the wt Gln in addition to Tyr, Ser. Due to codon degeneracy, His and Pro were also included.

10

49 47 46 48

Figure 3.3 Ub positions that exhibit low tolerance to mutations. The main-chain of the Ub structure (PDB entry: IUBQ) is shown as a ribbon, and spheres indicate positions that exhibit low tolerance to mutations in the mutation tolerance scan, except for position 48, which is tolerant to mutations. Scanned positions are colored as in Fig. 3.1.

Phage-displayed protein libraries that contain a high proportion of non-displaying phages can severely jeopardize selection outcomes (Kramer, Cox et al. 2003), and because display is linked to stability (Betton, Sassoon et al. 1998, Ernst, Sazinsky et al. 2009), we sought to further increase the structural stability of displayed Ubvs by integrating a stability selection step in the library construction process. The mutation tolerance scan revealed regions one and two to be more structurally sensitive to mutation than region 3 (Fig. 3.2). Thus, we first constructed a library in which regions 1 and 2 were randomized with the degenerate codon KHT (encoding for Tyr, Ala, Asp, Ser, Val, and Phe) and subjected this library to a selection for binding to an anti- FLAG Ab to enrich for Ubvs that were well-displayed. The resulting pool of binding phage were used make a template in which region 3 was randomized. For region 3, in addition to using the KHT degenerate codon for randomization (referred to as Library KHT), a second library was constructed where positions were randomized with the NNK degenerate codon, which encodes for all 20 amino acids (referred to as Library NNK), because I wanted to study whether additional diversity may provide better binding Ubvs.

In total, 24 of the 76 positions in the Ub sequence were diversified, and we constructed libraries that contained 6.8x1010 or 4x109 unique Ubvs in Libraries KHT or NNK, respectively. Sequencing of the naïve libraries showed that region one had an enrichment for Phe at the 68

expense of Ala, and regions two and three showed unbiased distributions of each allowed residue (Fig. 3.4). Libraries KHT and NNK had some wt sequence in region 3 due to incomplete oligonucleotide incorporation, at a frequency of 8% or 14%, respectively. Notably, 15% of the library NNK sequences had at least one stop codon incorporated in region 3.

A 100 * F G Y T 80 * V R A E 60 * S D Frequency(%) 40 * K H 20 Q P * * * * * * * * * * * 0 L 1 2 4 2 4 2 4 6 8 9 1 1 1 4 4 48 49 62 63 64 66 68 70 71 72 73 74 75 76 Position B 100 * F G * Y T 80 V R * * * * A E 60 * * S N * D C * *

Frequency(%) 40 * K M * * H I * 20 Q W P stop 0 L 2 4 6 8 9 11 12 14 42 47 48 49 62 63 64 66 68 70 71 72 73 74 75 76 Position

Figure 3.4 Amino acid frequencies in libraries KHT and NNK. Bar graphs show the amino acid frequencies for (A) library KHT and (B) library NNK. Asterisks (*) indicate the wt residues.

69

3.4.3 Binding selections

I pooled phage from libraries KHT and NNK and assessed the ability to derive binders to a panel of 40 diverse proteins (Fig. 3.5). For comparison, 37 of the 40 proteins were also subjected to binding selections with the highly validated Fab library F (Persson, Ye et al. 2013). I successfully isolated specific binding Ubv-phage clones for 30 of the 40 Ags (Fig. 3.5), and I determined the sequences of a total of 99 unique binding clones for 14 Ags (Fig. 3.6). Sequences of many clones that bound to the same target showed conservation at particular positions, suggesting that the conserved positions may be critical for binding to a common epitope. Of the 37 proteins that were subjected to selection with both Ubv and Fab libraries, the Ubv libraries yielded binders to 28 antigens, the Fab library yielded binders to 26 antigens, and 21 targets yielded binders with both libraries. Neither libraries were successful against four targets (ZFP691, EOMES, SFP1, and ORS2), suggesting that these targets may be unstable or misfolded and thus poorly antigenic. Taken together, these results show that Ubv-phage library KHT is robust and as functional as the highly validated Fab library F in terms of generating binders against a large and diverse panel of proteins.

70

Antigen Domain Ub Library Fab Library Her3 Extracellular 47/48 + Grb2 SH2 46/48 + BATF Basic Leucine Zipper 13/48 + MBP Full Length 41/48 + ROR-alpha zf-C4 6/48 + ASH1L Bromo 21/24 10/48 BRD1 Bromo 16/24 33/48 BRD2 Bromo 19/24 43/48 BRD3 Bromo 17/24 45/48 BRD4 Bromo 19/24 37/48 FOXC2 (mouse) Forkhead 1/24 2/24 FOXO3 Forkhead 6/24 16/24 GCN5 Bromo 24/24 31/48 Goosecoid2 Homeobox 9/24 4/24 MTA1 SANT + Zinc Finger 2/24 1/24 MTA2 SANT + Zinc Finger 1/24 3/24 NF B1 Rel Homology 1/24 2/24 Snail 1 zf-C2H2 14/24 19/24 Snail 2 zf-C2H2 16/24 10/24 ZEB1 Homeobox 13/24 1/24 ZEB2 Homeobox 16/24 8/24 FOXC1 Forkhead 10/24 - FOXC2 (human) Forkhead 22/24 - FOXJ3 Forkhead 15/24 - FOXP1* Forkhead 101/124 - Goosecoid Homeobox 19/24 - TWIST1&2 HLH-Basic 9/24 - Zscan4c zf-C2H2 10/24 - SOX4* homeobox 1/48 NA TCF3* HLH-Basic 11/48 NA HIF-1 Basic Motif + HLH - 3/24 MYB Myb DNA-binding - 15/24 PB1 Bromo - 36/48 PITX1 Homeobox - 20/24 ZFP281 zf-C2H2 - 20/24 ESRRB zf-C4 - NA EOMES T-box - - ORS2 zf-C2H2 - - SFPI1 ETS - - ZFP691 zf-C2H2 - - Figure 3.5 Binding selection outcomes for a panel of protein antigens. For each antigen, the indicated domain was used as a target for binding selections. A plus sign (+) indicates that library F yielded binding Fabs but percent positives were not determined. Results with library F were taken from previous studies for Grb2 (Colwill, Renewable Protein Binder Working et al. 2011) and MBP (Persson, Ye et al. 2013). “NA” indicates that the selection was not attempted. Green and red shading denote successful or failed selections, respectively. A selection was considered to be successful if it yielded at least some clones exhibiting at least five-fold higher binding signal by phage ELISA against antigen compared with an irrelevant protein, and numbers in cells show the number of positive clones followed by the number of clones screened.

71

Region 1 Region 2 Region 3 Signal:Noise Antigen Ubv Occurrence 2 4 6 8 9 11 12 14 42 44 48 49 62 63 64 66 68 70 71 72 73 74 75 76 Ratio Her3.1 Y Y Y D S F F D F F K Q A D D D Y D A V D A A V 33 1 Her3.2 F Y Y V A F F D Y S K Q V D V D Y A F D D V Y Y 23 1 Her3.3 Y F Y A D F Y D V A K Q Y D D D F Y D V D S V S 28 1 Her3.4 A F F F D Y F D S Y K Q D D V Y D V V D A A A V 15 1 Y F Y D V F F D V S K Q V D V D D D V V A S D D 12 1 V F F S F Y F D F S K Q V D V S D A V V F D A Y 24 2 F F Y V V F F D S Y K Q D F F S I N L N I G G A 14 1 F F F S Y F F D S F F Q T E L D T E Y Q P D R G 13 1 V F Y A D F F D Y Y K Q S D V D Y D D D D D V V 23 1 Y Y F D F Y F D V S K Q D D A F F V V A D D D F 15 1 Y F F D F F V D V Y K Q A D V D F A A A D D D D 26 1 V F Y D S Y F D Y Y Y P V D V D D Y V D A V D D 20 1 Y Y Y V D F F D S F K Q A D V D F S S D V D V V 32 1 Y Y Y A D F F D V F A P D D Y D A D F D A A V S 11 1 Her3 Y Y Y V D F F D S F K Q V D D D V A D V Y A A D 30 1 Y Y F S S F Y D F F K Q D D A D V A A D V F D A 14 1 A Y F D D Y F D F S K Q D D V Y D Y S Y D S Y V 27 1 V F F V D F F D S F A Q A D D D F V V A D V F F 23 1 Y Y F A F F F D F Y K Q D D A A F D V V F S D S 25 1 F Y F S V F F D Y F K Q A D V F D D Y S V D F A 5 1 F F Y A F F F D A Y K Q F D S D Y V D D F V V D 33 1 V F S F D F F D V F K Q D D V D V V V D V S D A 33 3 V F F D D Y F D S Y K Q F D V D D Y A Y A D V V 25 1 A Y F Y D Y F D Y F K Q D D V Y D D Y V Y D D A 30 1 F Y F S V F A D S F K Q V D Y D Y D S S D A F V 19 1 V F F A S Y F D Y Y V Y A D V D Y D D D V D A A 22 1 Y Y Y S V F F D F S K Q D D Y D F Y Y D S D F V 14 1 V F F D F F F D Y F A H D A D D Y D F D S D S A 26 1 A F F D S F F D S Y S Y V D V Y D A V D D V V D 22 1 Grb2.2 D S V S D A V D A D Y P S T N V W M H P V N L G 23 8 Grb2.1 S S S F D F V D Y F A Y A D A D D Y Y V V Y F D 17 1 F D V Y D F D V Y F A Y F S A Y D F Y V V Y D A 14 4 Grb2 D Y Y F F F S Y A F Y Q V D D V V F V A D V D V 19 2 D Y Y F F F S Y A F Y Q V E D V V F V A D V D V 29 1 F Y F F F S Y V D Y F Q S F D F V F V F S F D Y 6 4 D S V Y D F V D V S D Y F Y D V Y F F D F D V V 7 1 F F D D D D Y D S D D S F S F D Y A V Y S S S V 6 1 F F D F F F Y V S V K Q W D W T G M H T L Y P W 5 10 BATF F Y Y D F S Y F F F F H A A A A D A A A A A A A 6 1 Y F D D F D Y V F V F P V A A A D V D Y V A A A 8 1 FOXO3 Y Y D S F Y F S A Y Y Y A D D D D F V V D A F Y 13 2 Goosecoid2 S D A V F D V Y Y Y K Q S D S F D Y D D Y V A F 10 9 MBP A V Y A A S A Y D F V S A F F S A F V D F V A D 8 41 D V F V D F F D V F K Q D D F D D Y F D Y D D A 14 1 V F F V F D V A Y S V S D D F F D A S V F V F D 11 2 Y V D V A Y S D F Y Y P D F Y D Y S Y D D V Y F 29 2 F Y V D Y V F D Y D Y P D S Y D F Y Y D A D V Y 11 1 Snail1 Y V D V D D Y F Y F V S D Y Y D Y Y Y D Y S S V 9 1 V Y F V D Y D D F Y K Q S F A Y V D D F S F A S 15 1 A Y D D Y D F V Y Y K Q D Y F D F F Y D D D Y D 22 9 F V D D Y D F V Y V K Q D Y V D F Y V F D Y D D 8 2 Y F D F Y D F D V Y D S F D Y V D D F D D Y D Y 15 1 D D V V A S F D D A K Q Y Y F F D V D F A D S F 9 2 ROR-alpha V V D S Y A V A V F F P D Y F F Y V Y F S A V S 10 8 Y Y V F F D D V F V F Q Y A Y V D Y F S Y Y V D 10 2

72

Region 1 Region 2 Region 3 Signal:Noise Antigen Ubv Occurrence 2 4 6 8 9 11 12 14 42 44 48 49 62 63 64 66 68 70 71 72 73 74 75 76 Ratio V D V A Y D F V Y D V Y Q K E T H V L R L R G G 18 2 V D A D Y V D Y F F D Y V D F F S V V D D V V D 9 3 D F V S F S D S F Y K Q D S V S Y A S V D S S A 7 2 Y F V D F S D D F Y K Q D S V S Y A V F D V D A 15 4 F D V F A D Y D F F K Q D V D V D A V V D A V D 8 1 D F V F S F D V V Y K Q D S F S V F D V S F S F 8 16 F F V V D V A D F Y K Q D V F S V S F Y Y D V S 6 2 F F D D A Y F D Y F A H F D F A Y S S D F F Y S 6 7 F F D D A Y F D Y F K Q Y S F A Y D Y A F A S F 11 24 FOXP1 F F D D A Y F D Y S K H Y D F A Y S S F D Y Y A 10 1 F F D D A Y F D Y S K Q Y D F A Y S S F D Y Y V 8 4 F F D D A Y F D F Y K Q Y D F S Y D F V Y F F D 6 3 F F D S S F F D F Y K Q Y S F A Y D F Y S S Y D 16 14 F F D V S Y F D F Y K Q F D F A Y S F S Y D S D 14 3 F F D F A F F D S F K Q F D F A F D V D A A Y V 9 1 F F D F A F F D S Y A Q Y D F A Y Y V D D F V D 8 10 F F D V F F F D V Y F Q F D F A Y D Y D S S V D 10 5 F F D S D F F D Y Y K Q F D F A Y F S D S S S D 7 1 F F D S D F F D Y Y K Q V D F A Y A F Y V D V V 10 5 SOX4 F Y D V F D S A Y D S P V V F A F Y F V A F F A 8 1 S S V F S F D S D S V Q L V K S L C S F L R F T 5 1 F F F A F F A F V Y K Q V F F A Y F Y V S S A F 7 1 D V F Y D Y V D A S V H D F Y D Y D D V F Y D F 7 1 D V V S F D V D S V K Q D F Y D Y F F D Y S F D 6 6 V A D F S F Y A S V K Q T C Q Q S G M R Y T N C 8 1 TCF3 A D Y V Y D F V F S K Q F F V F D V V D D V V A 6 1 S D V S D Y V S V F K Q V Y F D F A Y S V D Y F 6 1 Y Y F Y F A Y Y S V S H V V V F D D Y F A D Y S 6 1 D V F F V S F F Y F K Q Y V V V Y D D A D V D Y 6 1 A D A A D A A Y F F K Q V V V F D A Y F A S D F 5 1 V F S A F F D Y F V K Q S S F V V V S D V V A S 8 1 V F D Y F F S S Y Y K Q Y S F Y S D S D F F Y A 18 4 A F D S V D F V S F K Q S F F S A Y S S V S D Y 27 1 Y F F S Y Y S A V A K Q G W W G K G S S S A P G 6 1 S Y D S F Y V A Y F K Q D V Y V D D S D V F Y S 24 1 ZEB1 D S V D F F V A S S S S Y A Y D F D S D Y F Y S 16 1 F D F Y D D F Y F Y D P F S D Y Y D D A D V V V 11 2 F D F F D D V D F Y K Q S F Y Y Y D V D A S A Y 13 1 F D F S V D F F V S K Q V D Y Y Y D V V A S A A 20 4 A Y Y V V Y V A V S K Q Y D A A Y D A A V F Y D 13 3 A F D D Y D Y V S V S H G W F S Y G V V R R A V 7 1 F S D A F Y D F Y Y Y P A D S F A F V D S A F V 22 16 ZEB2 F F D S S D D F Y Y Y P D S F Y V F S F A D V D 17 1 F A D A A F D V A F K Q S S F Y V F S D D F V F 15 3

Figure 3.6 Sequences of binding Ubvs Sequences are shown for positions diversified in the library design and grey shading indicates at least 50% identity amongst clones binding to a given antigen. “Signal:noise” indicates the ratio of the phage ELISA signals for binding to the antigen and an irrelevant protein. “Count” indicates the number of times the sequence occurred amongst sequenced binding clones.

3.4.4 Characterization of Ubvs targeting Her3

Her3 is a cell-surface receptor tyrosine kinase and is of great biological significance both in development (Britsch, Li et al. 1998, Schmucker, Ader et al. 2003, Stern 2008, Buac, Xu et al. 2009) and cancer biology (Stern 2008, Amin, Campbell et al. 2010, Campbell, Amin et al. 2010). Due to its importance in biology, I sought to generate binding Ubvs to the extracellular domain (ECD) of Her3. I identified 29 unique Her3 binding Ubvs that showed conservation at

73

many positions (Fig. 3.6), suggesting that they bind to a common epitope. Based on relative affinities determined by competitive phage ELISAs (Fig. 3.7), the four tightest binders (Her3.1- Her3.4) were chosen for further characterization as purified proteins.

100 250nM 100nM 25nM 0nM

50 % Phage Binding

0

Her3.1Her3.2Her3.3Her3.4Her3.5Her3.6Her3.7Her3.8Her3.9 Her3.10Her3.11Her3.12Her3.13Her3.14Her3.15Her3.16Her3.17Her3.18Her3.19Her3.20Her3.21Her3.22Her3.23Her3.24Her3.25Her3.26Her3.27Her3.28Her3.29Her3.30 Variant ID

Figure 3.7 Competitive phage ELISAs for Her3 binding Ubvs. Thirty unique phage clones were preincubated with 250 nM, 100 nM, 25 nM or 0 nM of recombinant Her3 in solution, and then transferred to ELISA plates with immobilized Her3. Binding to immobilized Her3 was detected and binding signals were normalized to signal obtained with 0 nM Her3 preincubation.

First, I performed dose response binding ELISA against Her3 to obtain relative affinities for the four purified Her3 selected Ubvs (Ubv-Her3.1 – 3.4). The half-maximum effective concentration (EC50) was calculated to be the lowest, and hence the highest affinity, for Ubv-

Her3.2 (EC50 = 90 nM) (Fig. 3.8A). I also conducted binding ELISA against all four ErbB members and confirmed that the four Her3 selected Ubvs bound only to Her3 (Fig. 3.8B), despite homology between EGFR members (Her3 to EGFR identity: 44%, Her3 to Her2 identity: 42%, Her3 to Her4 identity: 57%). Next, I conducted a competition ELISA with the Her3 ligand Neregulin (NRG) (Carraway, Sliwkowski et al. 1994). In the presence of NRG, Ubvs bound Her3 with similar intensities as in the absence of NRG, indicating Ubvs do not compete with NRG binding (Fig. 3.8C). Only Ubv-Her3.2 was carried forward, as it bound Her3 with the highest affinity according to dose response ELISA. To measure the affinity of Ubv- Her3.2 for Her3, we studied binding kinetics by surface plasmon resonance (SPR). While an exact affinity value could not be determined because the Langmiur binding model cannot

74

accurately fit the SPR binding data, it nonetheless provided us with an approximate affinity measurement in the low nM range (Fig. 3.9).

.

A B C

0.8 EGFR 150 BSA Her3.1 EC50 1.5 Her3.2 Her2 NRG Her3 Her3.3 0.6 Her3.4 Her4 100 1.0 Ubwt BSA 0.4 50 0.5 0.2 Percent Binding Absorbance 450nm Absorbance 450nm 0.0 0 0 0.001 0.01 0.1 1 10 Her Her Her Her Ubwt Her Her Her Her Fab 3.1 3.2 3.3 3.4 3.1 3.2 3.3 3.4 D5

Figure 3.8 Analysis of purified Her-binding Ubvs.

(A) ELISAs for Ubvs binding to immobilized Her3. EC50 was determined as the Ubv concentration at which 50% of the saturation signal is reached for binding to immobilized Her3. (B) Specificity ELISAs for Ubvs binding to the human ErbB family members. (C) Competitive ELISA for Ubvs binding to immobilized Her3 in the presence or absence of the Her3 ligand NRG. Fab D5 is an antibody that competes with NRG and served as a positive control.

400

300 U R 200

100

0 120 240 360 480 600 . Time(s)

Figure 3.9 Kinetic analysis of Ubv-Her3.2 binding to Her3 Sensorgram showing binding to immobilized Her3 by 2-fold serial dilutions of Ubv-Her3.2 starting at 50 nM. Colored lines represent actual binding data and black lines represent fits by Langmiur binding model.

75

Table 3.3 Kinetic analysis of Ubv binding to corresponding targets.

-1 ka(M Apparent IC50 -1 -1 Ubv Target sec ) kd(sec ) KD(M) (M) Her3.2 Her3 ECD 1.7E+06 1.5E-03 4.7E-09 Grb2.1 Grb2-SH2 1.6E+06 2.5E-03 1.6E-09 NA Grb2.2 Grb2-SH2 7.4E+05 7.5E-04 1.0E-09 1.5E-08 Grb2.1-2.2 L12 Grb2-SH2 1.0E+06 8.3E-04 8.3E-10 7E-09 Grb2.1-2.2 L16 Grb2-SH2 1.0E+06 1.0E-03 1.0E-09 Grb2.1-2.2 L20 Grb2-SH2 7.8E+05 1.1E-03 1.4E-09 Grb2.2-2.1 L12 Grb2-SH2 1.3E+06 3.6E-04 2.8E-10 Grb2.2-2.1 L16 Grb2-SH2 9.8E+05 4.6E-04 4.7E-10 Grb2.2-2.1 L20 Grb2-SH2 9.5E+05 5.1E-04 5.3E-10

To determine whether the anti-Her3 Ubvs can recognize Her3 specifically in the cellular context, I performed immunoprecipitation (IP) with Ubv-Her3.2. IP was conducted with lysates from four HEK293T cell lines that overexpressed individual ErbB family members, and four cell lines that expressed multiple ErbB family members (Fig. 3.10A). Western blotting analysis of IPs from HEK293T overexpression cells showed that UbV-Her3.2 effectively pulled down Her3, but not other ErbB family members (Fig. 3.10B), confirming specificity. For HEK293T cells that overexpressed Her2 and Her3, which result in constitutive heterodimerization (Holbro, Beerli et al. 2003, Amin, Campbell et al. 2010), IP of Her3 resulted in co-IP of Her2. Similarly, for the three cancer cell lines (A431, MCF7 and SkBr3), which expressed Her3 in combination with other ErbB members, IP of Her3 resulted in co-IP of other ErbB family members (Fig. 3.10B). Taken together, these results show that Ubv-Her3.2 is Her3 specific and can be used to IP Her3 and to co-IP associated proteins to map Her3 receptor interactions.

76

A 293T

EGFRHer2 Her3 Her4 Her2/3UT MCF7SkBr3A431 EGFR

Her2

Her3lo

Her3hi

Her4

GAPDH

B 293T EGFR Her2 Her3 Her4 Her2/3 UT MCF7 SkBr3 A431 UbV: 3.2 wt 3.2 wt 3.2 wt 3.2 wt 3.2 wt 3.2 wt WCL 3.2 wt 3.2 wt 3.2 wt EGFR

Her2

Her3 IP:GST Her4

GST -Ubv

Figure 3.10 Ubv-Her3.2 as a tool to immunoprecipitate Her3 and co-immunoprecipitate Her3 interacting Her members from cell lysates (A) Profiling of ErbB family members from nine cell lines by . Superscripts “hi” and “lo” denote high and low exposure for blot development, respectively. GAPDH serves as loading control. (B) Western blot analysis of IPs with GST Ubv-Her3.2. Ub.wt serves as a negative control. WCL indicates whole cell lysate that had the expression of the ErbB member being probed for that particular blot.

3.4.5 Characterization of Ubvs targeting the Grb2 SH2 domain.

Grb2 is an intracellular adaptor molecule with one SH2 domain flanked by two SH3 domains (Lowenstein, Daly et al. 1992), and it is central to transducing signals resulting from activation of receptor tyrosine kinases (RTKs) (e.g. EGFR, c-MET and FGFR) into intracellular signaling such as the MAPK pathway (Tari and Lopez-Berestein 2001). Selections against the Grb2 SH2 domain (Grb2-SH2) yielded on binder (Ubv-Grb2.1) that originated from library NNK and four binders (Ubv-Grb2.2-5) that originated from library KHT (Fig. 3.6). The binders derived from library KHT exhibited sequence similarities that suggested that they likely bind the same epitope, while the binder derived from library NNK exhibited a unique sequence (Fig. 3.6).

77

Ubv-Grb2.1 and Ubv-Grb2.2 had the highest relative affinities as determined by competitive phage ELISA (Fig. 3.11) and thus were selected for purification and further characterization.

100 250nM 100nM 25nM 0nM

50 % Phage Binding

0

Grb2.1 Grb2.2 Grb2.3 Grb2.4 Grb2.5 Grb2.6 Grb2.7 Variant ID

Figure 3.11 Competition phage ELISA for Grb2 selected Ubvs.

Ubv-phage were preincubated with 250 nM, 100 nM, 25 nM or 0 nM of Grb2-SH2 in solution, and then transferred to ELISA plates with immobilized Grb2-SH2. Binding to immobilized Grb2-SH2 was measured and binding signals were normalized to signal obtained with 0 nM Grb2-SH2 preincubation.

I first assessed binding of each Ubv to Grb2-SH2 and the two SH2 domains from GRAP and GADS, which are most closely related to Grb2-SH2, exhibiting 63% and 56% identity, respectively (Liu, Jablonowski et al. 2006). ELISAs revealed that Ubv-Grb2.1 is highly specific for Grb2-SH2 while Ubv-Grb2.2 exhibited low but detectable binding to the other two SH2 domains (Fig. 3.12A). This suggested that Grb2.1 bound to a non-conserved epitope on Grb2- SH2 while Ubv-Grb2.2 bound to an epitope that was somewhat conserved amongst the three SH2 domains. Mapping of sequence conservation onto the structure of Grb2-SH2 revealed that many of the residues that are identical across all three SH2 domains are located in the ligand- binding site (Fig. 3.12B), suggesting that Ubv-Grb2.2 likely binds to this site and may thus act as an antagonist of ligand binding. Indeed, a competition ELISA confirmed that Ubv-Grb2.2 inhibited binding of Grb2-SH2 to an immobilized phospho-Tyrosine (p-Tyr) peptide ligand with an IC50 value of 16 nM, while Ubv.Grb2.1 had no effect (Fig. 3.12C).

78

A C 1.5 150

Grb2.1 g

n Grb2.1 Grb2.2 i

nd Grb2.2 i m B

n Ub.wt Peptide 1.0 2 100 H 450 S

ce n

a IC :30M

b 50 r

o 0.5 50 s

b IC50:16nM A pTyr-peptide- 0.0 % 0 Grb2 GADS GRAP 0.0001 0.01 1 100 Immobilized SH2 domain [Ubv] M B

180o

Figure 3.12 Analysis of purified Grb2-binding Ubvs (A) ELISAs for Ubvs binding to immobilized SH2 domains of Grb2, GADS and GRAP. (B) Surface representation of Grb2-SH2 structure (PDB entry: 1JYR). Identical residues between GRAP, GADS and Grb2 are shaded in blue. p-Tyr peptide ligand is shown in yellow. (C) Dose response curve for Grb2-SH2 binding to immobilized p-Tyr SHC peptide. Ubv-Grb2.1 and non- biotinylated P-Tyr SHC peptide inhibit Grb2-SH2 binding to immobilized peptide with IC50 of 16 nM and 30 μM, respectively. (N = 3).

To assess whether the Ubvs could bind to full-length Grb2 in cells, I immunoprecipitated intracellularly expressed Ubvs from HEK293T cells and detected Grb2 by western blotting. In addition, I performed the same IP experiment with HEK293T cells overexpressing 3xFLAG- Grb2 (referred to as 293T-3FGrb2). These experiments showed that IP of Ubv-Grb2.1 and Ubv-Grb2.2 co-immunoprecipitated overexpressed 3xFLAG-Grb2 but no detectable levels of endogenous Grb2 (Fig. 3.13), which was expressed at a much lower level (see expression comparison in anti-Grb2 western blots in Fig. 3.13). Importantly, a negative control Ubv that bound to maltose binding protein, Ubv-MBP.1, did not bind Grb2 (Fig. 3.13). Hence, Grb2-SH2 binding Ubvs recognize full length Grb2 intracellularly but affinity was apparently not sufficient to pull down the less abundant endogenous Grb2. To boost affinity by avidity, I took advantage 79

of the small size of Ubvs and linked them in a modular fashion, which has been shown as a viable strategy to improve target binding in cells (Grebien, Hantschel et al. 2011).

MBP.1 Grb2.1 Grb2.2 Grb2.1 Grb2.2 Grb2.2 Grb2.1 Linker Length: 20 16 12 20 16 12 IB:3xFLAG Grb2

IB:Endo Grb2 IP:GST

IB:GST-Ubv

3xFLAG Grb2 WCL Grb2

Figure 3.13 Grb2 immunoprecipitation by Ubvs Co-IP of transiently expressed Grb2 (3xFLAG Grb2) or endogenous Grb2 (Endo Grb2) with Ubvs. Indicated Ubvs were expressed in HEK293T cells or in HEK293T cells over-expressing Grb2 with an N-terminal triple-FLAG epitope fusion. Grb2.1 – Grb2.2 and G2.2 – Grb2.1 indicate tandem fusions of Grb2.1 and Grb2.2 with Grb2.1 at the N- or C-terminus, respectively, and the numbers “20”, “16” and “12” indicate intervening linkers of 20 (GGSGGSGGSGGSGGSGGSGG), 16 (GGSGGSGGSGGSGGSG) or 12 (GGSGGSGGSGGS) residues, respectively. “MBP.1” indicates an Ubv that binds to MBP, which was used as a negative control. Precipitated endogenous Grb2, transiently expressed Grb2 and GST-Ubv fusions were detected by western blotting with anti-Grb2, anti-FLAG or anti-GST antibodies, respectively. “WCL” indicates western blotting of the whole cell lysate with an anti-Grb2 antibody to compare relative levels of transiently expressed and endogenous Grb2.

Plasmids designed to express Ubvs linked in different orientations (Grb2.1-Grb2.2 or Grb2.2- Grb2.1) with Gly-Gly-Ser linkers at different lengths (12, 16, or 20 residues) were transfected into HEK293T or 293T-3FGrb2 cells, and IP experiments were conducted. Compared to unlinked Ubvs, linked Ubvs immunoprecipitated 3XFLAG-Grb2 much more effectively, but linker length had no effect on binding (Fig. 3.13). Notably, the Ubvs in the Grb2.1-Grb2.2 orientation pulled down endogenous Grb2 much more efficiently than those in the Grb2.2- Grb2.1 orientation. Interestingly, surface plasmon resonance (SPR) did not show enhanced affinities for the Ubv linked in the Grb2.1-Grb2.2 orientation (Fig. 3.14 D-F and Table 3.3), and only showed two- to three-fold enhancements for Ubvs in the Grb2.2-Grb2 orientation (Fig. 3.14 G-I and Table 3.3), suggesting that other cellular factors might be contributing to the vastly improved interaction observed by IP. Since, the Ubvs in the Grb2.2-Grb2.1 orientation exhibited the best IP efficiency and had improved affinity by SPR, I chose to conduct further intracellular 80

studies with the Ubv in the Grb2.1-Grb2.2 with the shortest linkage of 12 residues (referred to as Ubv-L12).

A Grb2.1 B Grb2.2 C Ub.wt KD 1.0E-09M KD 1.6E-09M 100 5 40 RU RU 50 RU 0 20

-5 0 0 0 100 200 300 400 0 100 200 300 400 0 100 200 300 400 Time (S) Time (S) Time (S) D E F Grb2.1-Grb2.2-L12 Grb2.1-Grb2.2-L16 Grb2.1-Grb2.2-L20 K 8.3E-10M 90 K 1.0E-9M K 1.4E-09M 90 D D 80 D

60 60

RU RU RU 40 30 30

0 0 0

0 100 200 300 400 0 100 200 300 400 0 100 200 300 400 Time (S) Time (S) Time (S) G H I Grb2.2-Grb2.1-L12 Grb2.2-Grb2.1-L16 Grb2.2-Grb2.1-L20 K 2.8E-10M K 4.7E-10M K 5.3E-10M 120 D 120 D D 100 RU RU RU 60 60 50

0 0 0 0 100 200 300 400 0 100 200 300 400 0 100 200 300 400 Time (S) Time (S) Time (S)

Figure 3.14 Kinetic analysis of Ubvs binding to Grb2-SH2 Sensorgrams showing binding to immobilized Grb2-SH2 by serial diluted (A) Ubv-Grb2.1 (B) Ubv-Grb2.2 (C) Ub.wt (D - F) Ubv-Grb2.1-Grb2.2 fusion with a 12, 16, 20 residue linker, respectively (G-I) Ubv-Grb2.2-Grb2.1 fusion with a 12, 16, 20 residue linker, respectively. Ubv concentrations assayed were 50 nM (magenta), 25 nM (cyan), 12.5 nM (navy), 6.3 nM (green), and 3.2 nM (purple), and the black line represents the fit by 1:1 Langmiur binding model.

Since Ubv-L12 had improved Grb2 binding capacity inside cells, and retained the ability to interfere with the interaction between Grb2-SH2 and p-Tyr peptide ligand (Fig. 3.15A), I rationalized that it may have functional effects inside cells. First, I assessed whether Ubv-L12 could block Grb2 from localizing to the cell membrane by virtue of Grb2-SH2 and RTK cytoplasmic tail interactions (Lowenstein, Daly et al. 1992, Rozakis-Adcock, Fernley et al. 1993, Margolis and Skolnik 1994). HEK293T fractionation experiment showed that cells expressing Ubv-L12 had reduced endogenous Grb2 levels in the membrane fraction compared 81

to mock transfected cells or cells expressing Ubv.MBP.1 (FIG. 3.15B), indicating that Ubv-L12 binds to Grb2-SH2 in cells and competes with endogenous p-Tyr ligands.

A 150 Grb2.1 g n i Grb2.2

nd B i L12 B Mock MBP.1 L12

2 Peptide

H 100 Grb2

S MEB Na+K+ATPase IC50:30M Grb2 50 IC50:16nM HSP90 WCL pTyr-peptide-

% IC50:6nM GST-Ubv 0 0.0001 0.01 1 100 [Ubv] M Figure 3.15 Inhibition of the interaction of Grb2-SH2 and p-Tyr peptide ligand by Ubvs.

(A) Competition ELISAs to assess inhibition of Grb2-SH2 binding to immobilized p-Tyr peptide by the indicated Ubvs. (B) Effects of the intracellular expression of Ubv-L12 on the levels of Grb2 in the membrane fraction (MEB) of HEK293T cells. WCL indicate whole cell lysate. Na+K+ATPase and HSP90 serve as loading control for MEB and WCL, respectively.

Next, I investigated whether Ubv-L12 can antagonize Grb2 mediated signaling in the EGFR pathway. The SH3 domain of Grb2 is known to complex with Sons of Sevenless1, which is a guanine nucleotide exchange factor of the membrane anchored RAS. Upon EGFR stimulation, the EGFR C-terminal tail is phosphorylated and recruits the Grb2:SOS1 complex to the membrane through the Grb2-SH2, and this in turn activates RAS and downstream MAPK pathway (Lowenstein, Daly et al. 1992, Margolis and Skolnik 1994). I reconstituted the EGFR system in HEK293T cells with transfection of a plasmid that expressed EGFR (referred to as 293T-EGFR). 293T-EGFR cells responded to EGF stimulation leading to phosphorylation of EGFR and the downstream kinases MEK and ERK (Fig. 3.16). Grb2 depletion by siRNA led to a reduction in pERK and pMEK (Fig. 3.16), which has been observed in other cell lines (Huang and Sorkin 2005, Modi, Li et al. 2011, Qu, Chen et al. 2014). Grb2 depletion also resulted in a decrease in pEGFR levels. Although, to our knowledge this has never been reported before, it has been reported that knockdown of Grb2 reduces phosphorylation of the RTK, FGFR2 (Lin, Melo et al. 2012, Ahmed, Lin et al. 2013). Interestingly, Grb2 knockdown also reduced total EGFR levels in 293T cells (Fig. 3.16), and this may be a feature of an artificial EGFR system

82

introduced in HEK293T cells, as this has not been reported in other cell lines. Under EGF stimulation, cells that expressed Ubv-L12 exhibited a reduction in levels of pEGFR, total EGFR, pMEK and pERK (Fig. 3.16). Reductions in pMEK and pERK levels by Ubv-L12 were not as robust as siGrb2 treatment, but nonetheless, the effects of Ubv-L12 phenocopied those of an siRNA targeting Grb2. Under unstimulated conditions, Ubv-L12 expression reduced levels of pEGFR, total EGFR and pERK similarly to siGrb2 (Fig. 3.16). Taken together, these results show that in cells, Ubv-L12 binds to Grb2, competes with endogenous p-Tyr ligands, and this in turn reduces Grb2 membrane association and antagonizes Grb2 mediated signaling.

+EGF -EGF

L12 MBP.1si-Grb2Mock L12 MBP.1si-Grb2Mock Grb2 MEB Na+K+ATPase

+EGF -EGF

L12 MBP.1 si-Grb2 Mock L12 MBP.1 si-Grb2 Mock EGFR EGFR pY1173 pMEK 0.73 1.13 0.59 1.0 total MEK

pERKl lo WCL 0.73 1.07 0.64 1.0

pERK hi 0.62 1.1 0.60 1.0 Total ERK

Grb2

HSP90

GST

Figure 3.16 Effects of Ubv-L12 on Grb2-mediated EGFR signaling.

Effects of Ubv-L12 on Grb2-mediated EGFR signaling. Both membrane fraction (MEB) and whole cell lysate (WCL) were isolated from the same samples. Cells were either mock transfected (Mock), transfected with siRNA designed to knock down Grb2 (siGrb2), or transfected with plasmid encoding for GST fusion of Ubv-MBP.1 (MBP.1), or GST fusion of Ubv-L12 (L12). Cells treated with or without EGF treatment are denoted by “+EGF” or “-

83

EGF”, respectively. Blotting of Na+K+ATPase and HSP90 were used as loading controls for the membrane fraction or whole cell lysate samples, respectively. Superscripts “hi” and “lo” denote high or low exposure for blot development, respectively. Numbers are quantification of signal intensities of bands above and were normalized to mock transfection condition.

3.5 Discussion

I have described the design, construction and validation of a highly diverse combinatorial library that co-opted an expansive surface consisting of 24 residues within the 76-residue Ub protein. Our mutation tolerance scan and anti-FLAG stability selection methods proved to increase display levels substantially. By reducing the proportion of non-displayed variants, we effectively improved the functional diversity of the library, a feature that has been positively correlated with selection outcomes (Kramer, Cox et al. 2003). Furthermore, we have demonstrated that Ubvs derived from my naïve Ub libraries can be used as affinity reagents. Notably, Ubv-Her3.2 was very effective recognizing Her3 in IP experiments and could be used to co-IP binding partners, which could be useful in mapping protein-protein interactions. In addition, I have demonstrated that Ubvs can be expressed in cells to perturb signaling, which is an invaluable tool for uncovering protein function inside cells.

Unlike previously reported Ub libraries, I have made a library that utilizes a restricted diversity biased in favor of Tyr and Ser. Minimalistic Fab and FN3 libraries with loops biased towards Tyr and Ser are well known to form high affinity and specific interactions targeting diverse protein folds (Fellouse, Wiesmann et al. 2004, Fellouse, Li et al. 2005, Fellouse, Barthelemy et al. 2006, Gilbreth, Esaki et al. 2008, Koide and Sidhu 2009). It was thought that the binary diversity was sufficient because flexible loops have additional conformational diversity, which aids in forming complementarity shapes to binding antigen. However, my current study shows that even a β-sheet, which has much reduced conformational flexibility, can still form high affinity interactions with restricted chemical diversity. Whether Tyr residues contribute the bulk of the binding energy in Ubvs, like in the antigen-binding sites of antibodies (Fellouse, Barthelemy et al. 2006), remains to be determined.

Engineered binding proteins lacking disulfides are conducive for intracellular applications, and hold great potential for basic research. While there have been some successes with engineered intracellular Abs (intrabodies), it still remains very challenging due to folding 84

issues of Abs in the cytoplasmic environment. Attempts to improve folding have been very time-consuming and relied on trial and error approaches with no standardized procedures (Marschall, Dubel et al. 2015). Others (Mirecka, Hey et al. 2009, Grebien, Hantschel et al. 2011, Parizek, Kummer et al. 2012, Ernst, Avvakumov et al. 2013, Shibasaki, Karasaki et al. 2014) and I have demonstrated the ease of producing functional intracellular binders from alternative scaffolds. Once a high quality scaffold library is obtained, one can have a functional binder in as little as four weeks, given all downstream procedures and assays have been optimized. Due to the specificity of protein binders, there should be a much-reduced chance of off target effects compared to small molecule tools or siRNA for probing protein function. As such, binders derived from alternative scaffolds are ideal for uncovering novel protein functions, as they can be rapidly generated and should have lower off target effects when compared to other tools such as small molecule or siRNA. In addition, in the context of drug target validation, direct protein inhibition or activation by designed binders maybe a superior strategy than siRNA or CRISPR/Cas methods, which rely on abolishing expression of the target protein, and do not accurately mimic drug mediated mechanisms. With constant innovations in technology, it is hoped that the direct delivery of proteins into cells will become a reality in the near future.

Another advantage of alternative scaffolds is that due to their small size, they can be linked in a modular fashion to target different epitopes on the same target to boost affinity, which I demonstrated by the linkage of two Grb2-binding Ubvs. It will also be feasible to fuse Ubvs with differing functions or different targets to produce bi-functional molecules. Notably, heterologous fusion Ubvs can be linked in one genetic fusion owing to their small size, which alleviates the problem of producing mis-paired molecules, like those observed in the production of bi-specific antibodies (De Lau, Heije et al. 1991, Klein, Sustmann et al. 2012).

Finally, as we move further into the post-genomic era, elucidation of protein function will become increasingly more important in the context of disease and drug development. In basic science, intracellular perturbation tools derived from alternative scaffolds can fill the long- standing gap of selectively inhibiting not only enzymes but also protein-protein interactions. In addition to intracellular application, affinity reagents based on alternative scaffolds exceed antibodies in the areas of production and modification for multifunctional purposes. Given that protein folds are highly diverse, and different scaffolds offer different geometries for binding,

85

the scientific community may benefit from expanding the repertoire of scaffolds to tackle the problem of identifying the tightest and most specific affinity reagent to targets of interest.

86

4 Conclusions and Future Perspectives

A long-standing goal in protein science has been to understand how the observable characteristics of an interaction (i.e. affinity and specificity) are explained by the molecular features of the proteins involved. Understanding these molecular underpinnings is the foundation for PPI predictions and molecular design, which encompasses both proteins and small molecules. A deeper understanding will continue to help refine current methodologies.

In Chapter 2 of this thesis, I first explored the molecular landscape of Ubv interactions with USP2 and USP21, and also deduced insights into how wt Ub is recognized. In Chapter 3, I utilized Ub as an alternative scaffold and used previously identified molecular insights to design a phage-displayed Ubv library for the isolation of binding molecules to targets of interest. This library was successful in deriving functional binding reagents against numerous protein targets, and two of these targets, Her3 and Grb2, were investigated in detail.

4.1 Ubvs in saturation scanning

Although many structural studies have elucidated the residues that participate in Ub interactions, it still remains unclear how Ub can recognize diverse protein folds using a single surface composed mainly of a β-sheet. In particular, what are the Ub residues responsible for energetically driving interaction and how are they distributed within the binding surface? How do the functional residues differ among interactions to different family folds and how do they differ between interactions to members within the same family? In Chapter 2, I begin to tackle these questions by using phage display saturation scanning to map in detail the binding landscape of enhanced affinity Ubvs to two DUBs, USP2 and USP21. Since Ub.wt binds partner proteins with very low affinity, interactions mediated by Ub.wt are intractable for saturation scanning, as it is difficult to accurately measure substitution associated affinity changes of a weak interaction. As such, engineered Ubvs with few substitutions that recognize binding partners with enhanced affinity and preserve Ub.wt binding mode (Fig. 2.9) serve as ideal substitutes for Ub.wt in saturation scanning. 87

Using saturation scanning, I assessed the functional importance of each amino acid substitution on the entire Ubv surface that becomes buried when bound to USP2 or USP21. This comprehensive assessment revealed three different categories of residues across the binding surface (Fig.2.3): residues that were intolerant to mutation and thus were already optimized for binding, residues that exhibited no preferences, suggesting they do not energetically contribute to the interaction, and residues that were highly conserved but favoured residues other than wt, representing positions that are functionally important for binding but not optimized.

From the scanning data, I identified a common functional epitope on Ubvs that is important for binding to their cognate USPs (Fig. 2.3), and this docks onto a USP region (site-2) that is distinct from the active side (site-1). I also proved that site-2 on USP2 is critical for recognition and hydrolysis of native Ub substrates, thus supporting the supposition that Ubvs are viable proxies for Ub.wt and that insights derived from Ubv saturation scanning are applicable to Ub.wt. Additionally, from USP sequence alignments, I determined that the USP site-2 residues are strongly conserved amongst the USP family indicating the Ubv functional core likely makes similar energetic contributions in many other USP interactions. Importantly, I identified a cluster of Ubv residues within the functional core that are different from wt and also different between the two scans (Fig. 2.10, Patch B non-wt cluster). This cluster reflects the different binding preferences of the two investigated USPs and we hypothesize that it can have significant implications in designing specific drugs that target site-2 rather than the highly conserved catalytic site.

This study using Ubvs as proxies for Ub.wt in saturation scanning paves the way for more Ub PPIs analysis. By identifying Ub functional residues for binding to different family folds, we will gain a clearer picture of how Ub uses the same surface to bind diverse protein folds. It is likely that different Ub-binding protein folds (e.g. UBC, UIM and HECT domains), which all recognize Ub, will utilize different functional residues within the overlapping buried surface on Ub for recognition. Identification of hot spots for each Ub interaction will undoubtedly provide valuable information on the epitope that should be targeted for potential therapy. With a complete hot spot database for all Ub binding partners, one can focus drug design on epitopes that are more likely to compete with Ub binding and use structural information to rationally add elements to small molecules to gain affinity and specificity towards a particular Ub-binding protein. In addition, the vast molecular knowledge on Ub

88

interactions will undoubtedly uncover trends regarding Ub biology that are not observable with few samples. In the context of broader protein science, the vast molecular knowledge will aid the development of computational algorithms for PPI prediction and protein design.

4.1.1 Technical improvements to saturation scanning

Prior to expanding the saturation scan methodologies to other Ub interactions, some technical adjustments should be made to improve the accuracy of the scanning results. As shown in Fig. 2.3 and Fig. 2.4 not all highly conserved substitutions revealed in the scan actually translated to a gain in affinity (e.g. R72W and R42G), and I attributed this to expression or display biases that are intrinsic to phage display technique. To reduce biases, it would be useful to conduct another selection in parallel against anti-FLAG Ab (Ubvs are FLAG tagged within the phage libraries) or an Ab that recognizes the other side of Ub. Sequences enriched in this selection will reflect sequences that are more highly displayed or expressed, and can be subtracted from the saturation scan frequencies to correct for biases. In addition, deep sequencing the phage libraries prior and post selections will ensure the libraries indeed cover the entire sequence space, and give the scanning results more statistical confidence.

4.2 Naïve Ub library

In Chapter 3, I utilized Ub as an alternative scaffold and designed a phage-displayed Ubv library for the isolation of binding molecules to targets of interest. I showed that a minimalistic randomization strategy biased towards Tyr and Ser is feasible for binding site formation on a β- sheet secondary structure. Previous minimalistic libraries (Fab and Monobody) utilized loops for binding surfaces, and it was suggested that the conformational flexibility of loops enables diverse binding interactions to form with limited chemical diversities (Koide, Gilbreth et al. 2007). However, the success of the Ub library presented in this thesis suggests that loop based binding surfaces are not essential for forming interactions to diverse protein topologies, because small scaffolds with restricted conformationally flexible binding surfaces are still capable of using minimalistic chemical diversities to form diverse interactions, suggesting that conformational diversity is not necessary to compensate for restricted chemical diversities in forming binding surfaces to diverse Ags.

89

Using this navie Ubv combinatorial library, I generated binding proteins to 30 of the 40 proteins tested. Of the 37 proteins that were also subjected to selections with Fab library F, the Ubv libraries performed comparably to Library F as I was able to generate binders to 28 and 26 of targets with Ubv and Fab libraries, respectively. With the characterization of Ubvs that bound to Her3, I proved that Ubvs can be used in place of antibodies in biochemical experiments such as IPs and could potentially be used to map PPI network. In addition, I proved that Grb2-binding Ubvs could be used as intracellular tools to modulate signaling, which is a function that is not easily achieved with antibodies.

4.2.1 Ubvs targeting transcription factors

Many of the proteins subjected to Ub library selection within this thesis were DNA binding domains (DBDs). DBDs are found in transcription factors (TFs), and the human genome is predicted to encode ~1900 DNA binding TFs (Vaquerizas, Kummerfeld et al. 2009). TFs are critical to all cellular functions as they control transcriptional regulation of genes. Although Ubvs targeting DBDs were not extensively characterized in this thesis, I would like to dedicate a section to explain how such tools could be applied to uncover TF biology.

Within the field of TF biology, it has been difficult to accurately describe DNA binding specificities of DBDs. Many HTP techniques, such as chromatin immunoprecipitation sequencing (ChIP-Seq), have been developed to identify binding motifs (Geertz and Maerkl 2010, Ogawa and Biggin 2012). However, a TF can exhibit different DNA binding modes as binding is often complicated by combinatorial factors such as having multiple DBDs linked within a TF, and consequently DNA binding can be mediated by a single DBD or combinations of DBDs within a single TF (Siggers and Gordân 2014). This is exemplified by the Oct1 TF, which has two DBDs, a Pit-Oct-Unc (POU) homeodomain (POUHD) and a POU-specific domain

(POUS), and can singularly or combinatorially bind to a total of three different DNA motifs (Verrijzer, Alkema et al. 1992, Klemm, Rould et al. 1994). Additionally, many TFs are capable of homodimerization or heterodimerization with other TFs, which can alter TF binding specificity (Siggers and Gordân 2014). While it is possible to identify the existence of different DNA recognition modes from distinct binding motifs generated from HTP methods, it is possible that differences can go undetected due to technical limitations, and even if detected, the data do not provide insights into what contributes to the different binding modes.

90

One method to dissect DBD contributions to TF binding is to intracellularly express a designed Ubv that inhibits a DBD from binding DNA. By comparing ChIP-Seq generated binding motifs of cells expressing protein inhibitors of different DBDs of a TF, one can determine the DNA binding contributions from individual DBDs. In addition, expression of a protein inhibitor that recognizes homodimerization or heterodimerization surfaces that can disrupt TF complex formation can shift the DNA binding specificity to that of the monomeric state. Application of affinity reagents that target and inhibit DBDs may also shed light on the functional roles of each DBD in cellular processes by enabling observation of phenotypic outcomes mediated by selectively manipulating an individual DBD. While ChIP-seq of genetically engineered TFs with missing DBDs or mutations that disrupt homodimerization or heterodimization can in principle be used to answer the same questions, protein inhibitors have the advantage of modulating endogenous TFs. Engineered TFs may have compromised folds or may not express at endogenous levels and thus might not provide an accurate depiction of binding specificities of native TFs.

4.2.2 Validation of TFs as drug targets

Many TFs, including JUN, MYC and p53, have long been recognized as critical and central players in cancer and immune diseases, as TFs are often converging points of pathogenic signaling (Bhagwat and Vakoc 2015, Fontaine, Overman et al. 2015, Lazo and Sharlow 2016). However, just like proteins that function by PPIs, TF has been considered to be intractable for drug intervention due the large binding surfaces that lack obvious small molecule binding pockets (Fontaine, Overman et al. 2015, Lazo and Sharlow 2016). Although possible, it remains difficult to develop small molecules to disrupt PPI or protein-DNA interactions (Scott, Bayly et al. 2016), and thus it may be beneficial to first apply TF-binding Ubvs to determine if disruption of a particular TF interaction will truly lead to a desired outcome in in vitro disease models before dedicating resources and time to the development of small molecules. TF modulation by Ubvs would be particularly useful for cases where overexpression of a TF has generated conflicting results in both support and repression of malignancy (Sporn and Liby 2012). In addition, if expression of a Ubv that inhibits a particular DBD does indeed result in therapeutic benefit, then it may be useful to solve the structure of the Ubv-DBD

91

complex. The structure can offer clues as to what epitope on a DBD should be targeted by small molecules and the Ubv residues responsible for engaging DBD maybe used as a template for drug design.

While most TFs are difficult to drug, one class of TFs that have been successfully targeted for therapy is the nuclear hormone receptor (NHR) family. In addition to DBDs, NHRs also have ligand binding domains that are conducive to small molecule binding (Bhagwat and Vakoc 2015). However, resistance to drugs targeting NHR frequently develop in patients (Lewis and Jordan 2005, Joseph, Lu et al. 2013), and thus much effort has been devoted to developing combinatorial therapies that can target multiple domains of a NHR. In this scenario, inhibitors of the DBDs of NHR DBDs may validate the DBD as a viable targeting domain and provide structural insights for drug design.

4.3 Future of affinity reagents based on alternative scaffolds

Affinity reagents based on alternative scaffolds have many benefits over conventional Ab based reagents. First, current Ab reagents are typically derived from animal sources and do not have a known DNA coding sequence. This is problematic because production of animal sourced Abs rely on hybridoma technology, which involves indefinitely maintaining the Ab producing B cells by fusion to myeloma immortal cell lines (Kohler and Milstein 1975). Not only is hybridoma production cost intensive but it also suffers from issues inherent to cell culture propagation such as loss of expression over time, mutations and spurious expression of other Ab genes (Zack, Wong et al. 1995, Bradbury and Plückthun 2015). As a result, the identity of Abs produced from hybridomas are difficult to confirm and this often results in Ab batch-to-batch variations, which contributes to data irreproducibility. Since alternative scaffolds are synthetically engineered usually by display technologies, these binders are recombinant and by default have a known sequence. Therefore, production does not require indefinitely maintaining the cells producing these binders, and identification of a recombinant binder can be easily confirmed by comparing to its known sequence, which helps to keep production

92

consistent. Secondly, alternative scaffolds are carefully chosen or more recently, can be created de novo to be small modular proteins with superior biophysical properties (Khoury, Smadbeck et al. 2014, Woolfson, Bartlett et al. 2015). As such, recombinant binders based on alternative scaffolds can be produced much more cost effectively than Abs. While synthetic Abs developed using display technologies are becoming more prominent and can bypass the requirement for hybridoma production, downstream production of synthetic Abs still necessitates expensive and resource intensive tissue culture methods. With the increased emphasis on reducing data irreproducibility due to Ab reagent inconsistencies, and the declining funding climate, the use of alternative scaffolds will likely rise.

Currently, the intracellular usage of engineered binding proteins is underappreciated, with only a handful of studies published. Most binding protein engineering projects have been directed towards targeting surface receptors for therapeutic purposes, and this is intuitively justified as investments into such ventures provide a larger financial incentive. Many published papers regarding the intracellular use of affinity reagents have been proof of principle studies that demonstrated the possibility to achieve expected functional outcomes by benchmarking to other established molecular tools, such as sh/siRNA (Amstutz, Binz et al. 2005, Mirecka, Hey et al. 2009, Ernst, Avvakumov et al. 2013). However, as we move further into the post-genomic era, the elucidation of protein function rather than gene function will become more important for understanding cell physiology, and the need for protein modulation tools will rise.

As more scientists dedicated to developing affinity reagents continue to publish work dealing with creative usage of alternative scaffolds in uncovering new protein function, the adoption of these binders will most likely expand. To date, binders derived from alternative scaffolds have been used to selectively inhibit function (Sha, Gencer et al. 2013) or activate proteins (Zhang, Wu et al. 2016), trace proteins in live cells without the need of fusing fluorescent proteins to target proteins (Riedl, Flynn et al. 2010, Burgess, Lorca et al. 2012, Zolghadr, Gregor et al. 2012), and identify novel epitopes for drug targeting (Grebien, Hantschel et al. 2011, Gorelik, Orlicky et al. 2016). Currently, no other tools can parallel the intracellular utility of engineered protein binders. I envision that the usage of intracellular

93

protein binders will be a routine methodology in cell biology in the future, just as genetic knock down techniques have evolved to become over the years.

I support the development of more scaffolds that utilize secondary structural elements for binding surface formation. This is because loop based binding surfaces are more conformationally flexible and thus are likely to pay a higher entropic cost when bound to target proteins, which equates to lower affinity. In addition, flexibility can result in non-specific interactions and thus affinity maturation procedures are usually needed to enhance affinity and specificity. It is possible that a single alternative scaffold library using a fixed structural binding site may not perform well against all protein folds or may not be able to target specific epitopes owing to geometric incompatibilities. Thus, it would be wise to develop a collection of scaffolds with varying binding topologies (i.e. concave, convex, or flat) based on different structural elements (i.e. helices, beta-sheets or combinations of), and the entire collection may be pooled together to select for binders against a protein of interest. This will enable competition between different scaffolds for the selection of the best binders. However, precautions must be taken so that certain scaffold libraries will not dominate selection outcomes due to expression or display biases, leading to wrongly identifying enriched binders as the ones possessing the tightest affinity or other desirable qualities. First, such a library collection must be made within the same plasmid vector so that expression and identification of binding variants are standardized for all libraries. Second, the abundance of each scaffold library may need to be adjusted for the proportions of members that are actually displaying a protein variant in each library. A library with all of its members displaying a protein is more likely to dominate a selection outcome compared to a library in which only a fraction of its members are displayed, regardless of the quality of the derived binders. As selection successes accumulate, one might be able to predict which libraries will perform better for particular target proteins.

94

5 References

. Retrieved May 25, 2016, from http://www.affibody.com/en/.

Affibody. (2016). Retrieved 25 May, 2016, from http://www.affibody.com/en/.

Ahmed, Z., C. C. Lin, K. M. Suen, F. A. Melo, J. A. Levitt, K. Suhling and J. E. Ladbury (2013). "Grb2 controls phosphorylation of FGFR2 by inhibiting receptor kinase and Shp2 phosphatase activity." J Cell Biol 200(4): 493-504.

Aizner, Y., O. Sharabi, J. Shirian, G. R. Dakwar, M. Risman, O. Avraham and J. Shifman (2014). "Mapping of the binding landscape for a picomolar protein-protein complex through computation and experiment." Structure 22(4): 636-645.

Amin, D. N., M. R. Campbell and M. M. Moasser (2010). "The role of HER3, the unpretentious member of the HER family, in cancer biology and cancer therapeutics." Semin Cell Dev Biol 21(9): 944-950.

Amstutz, P., H. K. Binz, P. Parizek, M. T. Stumpp, A. Kohl, M. G. Grutter, P. Forrer and A. Pluckthun (2005). "Intracellular kinase inhibitors selected from combinatorial libraries of designed ankyrin repeat proteins." J Biol Chem 280(26): 24715-24722.

Andreatta, M. and M. Nielsen (2012). "Characterizing the binding motifs of 11 common human HLA-DP and HLA-DQ molecules using NNAlign." Immunology 136(3): 306-311.

Anonymous (2015). "Protein binder woes." Nat Methods 12(5): 373.

Arkin, M. R., M. Randal, W. L. DeLano, J. Hyde, T. N. Luong, J. D. Oslob, D. R. Raphael, L. Taylor, J. Wang, R. S. McDowell, J. A. Wells and A. C. Braisted (2003). "Binding of small molecules to an adaptive protein-protein interface." Proc Natl Acad Sci U S A 100(4): 1603- 1608.

Arkin, M. R., Y. Tang and J. A. Wells (2014). "Small-molecule inhibitors of protein-protein interactions: progressing toward the reality." Chem Biol 21(9): 1102-1114.

Arkin, M. R. and J. A. Wells (2004). "Small-molecule inhibitors of protein-protein interactions: progressing towards the dream." Nat Rev Drug Discov 3(4): 301-317.

Badis, G., M. F. Berger, A. A. Philippakis, S. Talukder, A. R. Gehrke, S. A. Jaeger, E. T. Chan, G. Metzler, A. Vedenko, X. Chen, H. Kuznetsov, C. F. Wang, D. Coburn, D. E. Newburger, Q. Morris, T. R. Hughes and M. L. Bulyk (2009). "Diversity and complexity in DNA recognition by transcription factors." Science 324(5935): 1720-1723.

Baker, M. (2015). "Reproducibility crisis: Blame it on the antibodies." Nature 521(7552): 274- 276.

95

Baker, R., S. M. Lewis, A. T. Sasaki, E. M. Wilkerson, J. W. Locasale, L. C. Cantley, B. Kuhlman, H. G. Dohlman and S. L. Campbell (2013). "Site-specific monoubiquitination activates Ras by impeding GTPase-activating protein function." Nat Struct Mol Biol 20(1): 46- 52.

Barnes, L. M., C. M. Bentley and A. J. Dickson (2003). "Stability of protein production from recombinant mammalian cells." Biotechnol Bioeng 81(6): 631-639.

Bass, S., R. Greene and J. A. Wells (1990). "Hormone phage: an enrichment method for variant proteins with altered binding properties." Proteins 8(4): 309-314.

Beatty, G. L., E. G. Chiorean, M. P. Fishman, B. Saboury, U. R. Teitelbaum, W. Sun, R. D. Huhn, W. Song, D. Li, L. L. Sharp, D. A. Torigian, P. J. O'Dwyer and R. H. Vonderheide (2011). "CD40 agonists alter tumor stroma and show efficacy against pancreatic carcinoma in mice and humans." Science 331(6024): 1612-1616.

Behar, G., M. Bellinzoni, M. Maillasson, L. Paillard-Laurance, P. M. Alzari, X. He, B. Mouratou and F. Pecorari (2013). "Tolerance of the archaeal Sac7d scaffold protein to alternative library designs: characterization of anti- Affitins." Protein Eng Des Sel 26(4): 267-275.

Berger, C., S. Weber-Bornhauser, J. Eggenberger, J. Hanes, A. Pluckthun and H. R. Bosshard (1999). "Antigen recognition by conformational selection." FEBS Lett 450(1-2): 149-153.

Berndsen, C. E. and C. Wolberger (2014). "New insights into ubiquitin E3 ligase mechanism." Nat Struct Mol Biol 21(4): 301-307.

Beste, G., F. S. Schmidt, T. Stibora and A. Skerra (1999). "Small antibody-like proteins with prescribed ligand specificities derived from the lipocalin fold." Proc Natl Acad Sci U S A 96(5): 1898-1903.

Betton, J. M., N. Sassoon, M. Hofnung and M. Laurent (1998). "Degradation versus aggregation of misfolded maltose-binding protein in the periplasm of Escherichia coli." J Biol Chem 273(15): 8897-8902.

Bhagwat, A. S. and C. R. Vakoc (2015). "Targeting Transcription Factors in Cancer." Trends in Cancer 1(1): 53-65.

Binz, H. K., P. Amstutz, A. Kohl, M. T. Stumpp, C. Briand, P. Forrer, M. G. Grutter and A. Pluckthun (2004). "High-affinity binders selected from designed ankyrin repeat protein libraries." Nat Biotechnol 22(5): 575-582.

Binz, H. K., M. T. Stumpp, P. Forrer, P. Amstutz and A. Pluckthun (2003). "Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins." J Mol Biol 332(2): 489-503.

Birtalan, S., R. D. Fisher and S. S. Sidhu (2010). "The functional capacity of the natural amino acids for molecular recognition." Mol Biosyst 6(7): 1186-1194.

96

Birtalan, S., Y. Zhang, F. A. Fellouse, L. Shao, G. Schaefer and S. S. Sidhu (2008). "The intrinsic contributions of tyrosine, serine, glycine and arginine to the affinity and specificity of antibodies." J Mol Biol 377(5): 1518-1528.

Bode, W., P. Schwager and R. Huber (1978). "The transition of bovine trypsinogen to a trypsin- like state upon strong ligand binding. The refined crystal structures of the bovine trypsinogen- pancreatic trypsin inhibitor complex and of its ternary complex with Ile-Val at 1.9 A resolution." J Mol Biol 118(1): 99-112.

Bogan, A. A. and K. S. Thorn (1998). "Anatomy of hot spots in protein interfaces." J Mol Biol 280(1): 1-9.

Bond, C. J., C. Wiesmann, J. C. Marsters, Jr. and S. S. Sidhu (2005). "A structure-based database of antibody variable domain diversity." J Mol Biol 348(3): 699-709.

Boyken, S. E., Z. Chen, B. Groves, R. A. Langan, G. Oberdorfer, A. Ford, J. M. Gilmore, C. Xu, F. DiMaio, J. H. Pereira, B. Sankaran, G. Seelig, P. H. Zwart and D. Baker (2016). "De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity." Science 352(6286): 680-687.

Bradbury, A. and A. Plückthun (2015). "Reproducibility: Standardize antibodies used in research." Nature 518(7537): 27-29.

Bradbury, A. R. and A. Plückthun (2015). "Getting to reproducible antibodies: the rationale for sequenced recombinant characterized reagents." Protein engineering, design & selection : PEDS 28(10): 303-305.

Britsch, S., L. Li, S. Kirchhoff, F. Theuring, V. Brinkmann, C. Birchmeier and D. Riethmacher (1998). "The ErbB2 and ErbB3 receptors and their ligand, neuregulin-1, are essential for development of the sympathetic nervous system." Genes Dev 12(12): 1825-1836.

Brown, N. G., R. VanderLinden, E. R. Watson, F. Weissmann, A. Ordureau, K. Wu, W. Zhang, S. Yu and J. S. H. Peter Y. Mercredi, Iain F. Davidson3, Renping Qiao3, Ying Lu7, Prakash Dube8,9, Michael R. Brunner1, Christy R. R. Grace1, Darcie J. Miller1, David Haselbach8,9, Marc A. Jarvis3, Masaya Yamaguchi1, David Yanishevski1, Georg Petzold3, Sachdev S. Sidhu5, Brian Kuhlman6, Marc W. Kirschner7, J. Wade Harper4, Jan-Michael Peters3,11, Holger Stark8,9,11, Brenda A. Schulman1,2,11 (2016). "Dual architectures regulate multiubiquitination and ubiquitin chain elongation for a RING E3." Cell In-press.

Bruncko, M., T. K. Oost, B. A. Belli, H. Ding, M. K. Joseph, A. Kunzer, D. Martineau, W. J. McClellan, M. Mitten, S. C. Ng, P. M. Nimmer, T. Oltersdorf, C. M. Park, A. M. Petros, A. R. Shoemaker, X. Song, X. Wang, M. D. Wendt, H. Zhang, S. W. Fesik, S. H. Rosenberg and S. W. Elmore (2007). "Studies leading to potent, dual inhibitors of Bcl-2 and Bcl-xL." J Med Chem 50(4): 641-662.

Buac, K., M. Xu, J. Cronin, A. T. Weeraratna, S. M. Hewitt and W. J. Pavan (2009). "NRG1 / ERBB3 signaling in melanocyte development and melanoma: inhibition of differentiation and promotion of proliferation." Pigment Cell Melanoma Res 22(6): 773-784.

97

Burgess, A., T. Lorca and A. Castro (2012). "Quantitative live imaging of endogenous DNA replication in mammalian cells." PLoS One 7(9): e45726.

Burroughs, A. M., S. Balaji, L. M. Iyer and L. Aravind (2007). "Small but versatile: the extraordinary functional and structural diversity of the beta-grasp fold." Biol Direct 2: 18.

Burroughs, A. M., L. M. Iyer and L. Aravind (2012). "Structure and evolution of ubiquitin and ubiquitin-related domains." Methods Mol Biol 832: 15-63.

Campbell, M. R., D. Amin and M. M. Moasser (2010). "HER3 comes of age: new insights into its functions and role in signaling, tumor biology, and cancer therapy." Clin Cancer Res 16(5): 1373-1383.

Carraway, K. L., 3rd, M. X. Sliwkowski, R. Akita, J. V. Platko, P. M. Guy, A. Nuijens, A. J. Diamonti, R. L. Vandlen, L. C. Cantley and R. A. Cerione (1994). "The erbB3 gene product is a receptor for heregulin." J Biol Chem 269(19): 14303-14306.

Chaudhury, C., S. Mehnaz, J. M. Robinson, W. L. Hayton, D. K. Pearl, D. C. Roopenian and C. L. Anderson (2003). "The major histocompatibility complex-related Fc receptor for IgG (FcRn) binds albumin and prolongs its lifespan." J Exp Med 197(3): 315-322.

Chen, J. and Z. J. Chen (2013). "Regulation of NF-kappaB by ubiquitination." Curr Opin Immunol 25(1): 4-12.

Chen, J., N. Sawyer and L. Regan (2013). "Protein-protein interactions: general trends in the relationship between binding affinity and interfacial buried surface area." Protein Sci 22(4): 510-515.

Clackson, T. and J. A. Wells (1995). "A hot spot of binding energy in a hormone-receptor interface." Science 267(5196): 383-386.

Colland, F. (2010). "The therapeutic potential of deubiquitinating enzyme inhibitors." Biochem Soc Trans 38(Pt 1): 137-143.

Colland, F., E. Formstecher, X. Jacq, C. Reverdy, C. Planquette, S. Conrath, V. Trouplin, J. Bianchi, V. N. Aushev, J. Camonis, A. Calabrese, C. Borg-Capra, W. Sippl, V. Collura, G. Boissy, J. C. Rain, P. Guedat, R. Delansorne and L. Daviet (2009). "Small-molecule inhibitor of USP7/HAUSP ubiquitin protease stabilizes and activates p53 in cells." Mol Cancer Ther 8(8): 2286-2295.

Colwill, K., G. Renewable Protein Binder Working and S. Graslund (2011). "A roadmap to generate renewable protein binders to the human proteome." Nat Methods 8(7): 551-558.

Couture, M. L. and C. A. Heath (1995). "Relationship between loss of heavy chains and the appearance of nonproducing hybridomas." Biotechnol Bioeng 47(2): 270-275.

Csermely, P., R. Palotai and R. Nussinov (2010). "Induced fit, conformational selection and independent dynamic segments: an extended view of binding events." Trends Biochem Sci 35(10): 539-546.

98

Cunningham, B. C., D. G. Lowe, B. Li, B. D. Bennett and J. A. Wells (1994). "Production of an atrial natriuretic peptide variant that is specific for type A receptor." Embo j 13(11): 2508-2515.

D'Haeseleer, P. (2006). "What are DNA sequence motifs?" Nat Biotechnol 24(4): 423-425.

De Lau, W. B., K. Heije, J. J. Neefjes, M. Oosterwegel, E. Rozemuller and B. J. Bast (1991). "Absence of preferential homologous H/L chain association in hybrid hybridomas." J Immunol 146(3): 906-914.

DeLano, W. L., M. H. Ultsch, A. M. de Vos and J. A. Wells (2000). "Convergent solutions to binding at a protein-protein interface." Science 287(5456): 1279-1283.

Demarest, S. J., G. Chen, B. E. Kimmel, D. Gustafson, J. Wu, J. Salbato, J. Poland, M. Elia, X. Tan, K. Wong, J. Short and G. Hansen (2006). "Engineering stability into Escherichia coli secreted Fabs leads to increased functional expression." Protein Engineering Design and Selection 19(7): 325-336.

Dennis, M. S. and R. A. Lazarus (1994). " inhibitors of tissue factor-factor VIIa. I. Potent inhibitors selected from libraries by phage display." J Biol Chem 269(35): 22129- 22136.

Edwards, A. M., R. Isserlin, G. D. Bader, S. V. Frye, T. M. Willson and F. H. Yu (2011). "Too many roads not taken." Nature 470(7333): 163-165.

Erijman, A., A. Dantes, R. Bernheim, J. M. Shifman and Y. Peleg (2011). "Transfer-PCR (TPCR): a highway for DNA cloning and protein engineering." Journal of structural biology 175(2): 171-177.

Ernst, A., G. Avvakumov, J. Tong, Y. Fan, Y. Zhao, P. Alberts, A. Persaud, J. R. Walker, A. M. Neculai, D. Neculai, A. Vorobyov, P. Garg, L. Beatty, P. K. Chan, Y. C. Juang, M. C. Landry, C. Yeh, E. Zeqiraj, K. Karamboulas, A. Allali-Hassani, M. Vedadi, M. Tyers, J. Moffat, F. Sicheri, L. Pelletier, D. Durocher, B. Raught, D. Rotin, J. Yang, M. F. Moran, S. Dhe-Paganon and S. S. Sidhu (2013). "A strategy for modulation of enzymes in the ubiquitin system." Science 339(6119): 590-595.

Ernst, A., D. Gfeller, Z. Kan, S. Seshagiri, P. M. Kim, G. D. Bader and S. S. Sidhu (2010). "Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing." Mol Biosyst 6(10): 1782-1790.

Ernst, A., S. L. Sazinsky, S. Hui, B. Currell, M. Dharsee, S. Seshagiri, G. D. Bader and S. S. Sidhu (2009). "Rapid evolution of functional complexity in a domain family." Sci Signal 2(87): ra50.

Fairbrother, W. J., H. W. Christinger, A. G. Cochran, G. Fuh, C. J. Keenan, C. Quan, S. K. Shriver, J. Y. Tom, J. A. Wells and B. C. Cunningham (1998). "Novel peptides selected to bind vascular endothelial growth factor target the receptor-binding site." Biochemistry 37(51): 17754-17764.

Fan, M. (2015). Global Markets for Research Antibodies, BCC Research 24, 32.

99

Feldmann, M. and R. N. Maini (2001). "Anti-TNF alpha therapy of rheumatoid arthritis: what have we learned?" Annu Rev Immunol 19: 163-196.

Fellouse, F. A., P. A. Barthelemy, R. F. Kelley and S. S. Sidhu (2006). "Tyrosine plays a dominant functional role in the paratope of a synthetic antibody derived from a four amino acid code." J Mol Biol 357(1): 100-114.

Fellouse, F. A., K. Esaki, S. Birtalan, D. Raptis, V. J. Cancasci, A. Koide, P. Jhurani, M. Vasser, C. Wiesmann, A. A. Kossiakoff, S. Koide and S. S. Sidhu (2007). "High-throughput generation of synthetic antibodies from highly functional minimalist phage-displayed libraries." J Mol Biol 373(4): 924-940.

Fellouse, F. A., B. Li, D. M. Compaan, A. A. Peden, S. G. Hymowitz and S. S. Sidhu (2005). "Molecular recognition by a binary code." J Mol Biol 348(5): 1153-1162.

Fellouse, F. A., Sidhu, S.S. (2007). Making antibodies in bacteria. Making and using antibodies. G. C. Howard, Kaser, M.S. . Boca Raton, FL, CRC Press: 157-180.

Fellouse, F. A., C. Wiesmann and S. S. Sidhu (2004). "Synthetic antibodies from a four-amino- acid code: a dominant role for tyrosine in antigen recognition." Proc Natl Acad Sci U S A 101(34): 12467-12472.

Findlay, G. M., M. J. Smith, F. Lanner, M. S. Hsiung, G. D. Gish, E. Petsalaki, K. Cockburn, T. Kaneko, H. Huang, R. D. Bagshaw, T. Ketela, M. Tucholska, L. Taylor, D. D. Bowtell, J. Moffat, M. Ikura, S. S. Li, S. S. Sidhu, J. Rossant and T. Pawson (2013). "Interaction domains of Sos1/Grb2 are finely tuned for cooperative control of embryonic fate." Cell 152(5): 1008-1020.

Fischer, E. (1894). "Influence of configuration on the action of enzymes." Ber. Dtsch. Chem. Ges. 27(3): 2985-2993.

Flavell, R. A., D. L. Sabo, E. F. Bandle and C. Weissmann (1975). "Site-directed mutagenesis: effect of an extracistronic mutation on the in vitro propagation of bacteriophage Qbeta RNA." Proc Natl Acad Sci U S A 72(1): 367-371.

Fontaine, F., J. Overman and M. Francois (2015). "Pharmacological manipulation of transcription factor protein-protein interactions: opportunities and obstacles." Cell Regen (Lond) 4(1): 2.

Frenzel, A., M. Hust and T. Schirrmann (2013). "Expression of ." Frontiers in immunology 4: 217.

Frescas, D. and M. Pagano (2008). "Deregulated proteolysis by the F-box proteins SKP2 and beta-TrCP: tipping the scales of cancer." Nat Rev Cancer 8(6): 438-449.

Gasteiger, H. C., Gattiker, A., Duvand, S., Wilkins, M.R., Appel, R.D. (2005). The Proteomics Protocol Handbook. J. M. Walker, Humana Press: 571-607.

Gebauer, M. and A. Skerra (2009). "Engineered protein scaffolds as next-generation antibody therapeutics." Curr Opin Chem Biol 13(3): 245-255.

100

Gebauer, M. and A. Skerra (2012). "Anticalins small engineered binding proteins based on the lipocalin scaffold." Methods in enzymology 503: 157-188.

Geertz, M. and S. J. Maerkl (2010). "Experimental strategies for studying transcription factor- DNA binding specificities." Brief Funct Genomics 9(5-6): 362-373.

Gera, N., M. Hussain, R. C. Wright and B. M. Rao (2011). "Highly stable binding proteins derived from the hyperthermophilic Sso7d scaffold." J Mol Biol 409(4): 601-616.

Gfeller, D., F. Butty, M. Wierzbicka, E. Verschueren, P. Vanhee, H. Huang, A. Ernst, N. Dar, I. Stagljar, L. Serrano, S. S. Sidhu, G. D. Bader and P. M. Kim (2011). "The multiple-specificity landscape of modular peptide recognition domains." Mol Syst Biol 7: 484.

Gil, D. and A. G. Schrum (2013). "Strategies to stabilize compact folding and minimize aggregation of antibody-based fragments." Advances in bioscience and biotechnology (Print) 4(4a): 73-84.

Gilbreth, R. N., K. Esaki, A. Koide, S. S. Sidhu and S. Koide (2008). "A dominant conformational role for amino acid diversity in minimalist protein-protein interfaces." J Mol Biol 381(2): 407-418.

Gorelik, M., S. Orlicky, M. A. Sartori, X. Tang, E. Marcon, I. Kurinov, J. F. Greenblatt, M. Tyers, J. Moffat, F. Sicheri and S. S. Sidhu (2016). "Inhibition of SCF ubiquitin ligases by engineered ubiquitin variants that target the Cul1 binding site on the Skp1-F-box interface." Proc Natl Acad Sci U S A 113(13): 3527-3532.

Grasberger, B. L., T. Lu, C. Schubert, D. J. Parks, T. E. Carver, H. K. Koblish, M. D. Cummings, L. V. LaFrance, K. L. Milkiewicz, R. R. Calvo, D. Maguire, J. Lattanze, C. F. Franks, S. Zhao, K. Ramachandren, G. R. Bylebyl, M. Zhang, C. L. Manthey, E. C. Petrella, M. W. Pantoliano, I. C. Deckman, J. C. Spurlino, A. C. Maroney, B. E. Tomczuk, C. J. Molloy and R. F. Bone (2005). "Discovery and cocrystal structure of benzodiazepinedione HDM2 antagonists that activate p53 in cells." J Med Chem 48(4): 909-912.

Grebien, F., O. Hantschel, J. Wojcik, I. Kaupe, B. Kovacic, A. M. Wyrzucki, G. D. Gish, S. Cerny-Reiterer, A. Koide, H. Beug, T. Pawson, P. Valent, S. Koide and G. Superti-Furga (2011). "Targeting the SH2-kinase interface in Bcr-Abl inhibits leukemogenesis." Cell 147(2): 306-319.

Guo, W., J. A. Wisniewski and H. Ji (2014). "Hot spot-based design of small-molecule inhibitors for protein-protein interactions." Bioorg Med Chem Lett 24(11): 2546-2554.

Haglund, K., S. Sigismund, S. Polo, I. Szymkiewicz, P. P. Di Fiore and I. Dikic (2003). "Multiple monoubiquitination of RTKs is sufficient for their endocytosis and degradation." Nat Cell Biol 5(5): 461-466.

Hara, T., K. Nakamura, M. Matsui, A. Yamamoto, Y. Nakahara, R. Suzuki-Migishima, M. Yokoyama, K. Mishima, I. Saito, H. Okano and N. Mizushima (2006). "Suppression of basal autophagy in neural cells causes neurodegenerative disease in mice." Nature 441(7095): 885- 889.

101

Harrison, J. S., T. M. Jacobs, K. Houlihan, K. Van Doorslaer and B. Kuhlman (2015). "UbSRD: The Ubiquitin Structural Relational Database." J Mol Biol.

Hassanzadeh-Ghassabeh, G., N. Devoogdt, P. De Pauw, C. Vincke and S. Muyldermans (2013). "Nanobodies and their potential applications." Nanomedicine (Lond) 8(6): 1013-1026.

Hoffmann, A., M. Kovermann, H. Lilie, M. Fiedler, J. Balbach, R. Rudolph and S. Pfeifer (2012). "New binding mode to TNF-alpha revealed by ubiquitin-based artificial binding protein." PLoS One 7(2): e31298.

Holbro, T., R. R. Beerli, F. Maurer, M. Koziczak, C. F. Barbas, 3rd and N. E. Hynes (2003). "The ErbB2/ErbB3 heterodimer functions as an oncogenic unit: ErbB2 requires ErbB3 to drive breast tumor cell proliferation." Proc Natl Acad Sci U S A 100(15): 8933-8938.

Hosse, R. J., A. Rothe and B. E. Power (2006). "A new generation of protein display scaffolds for molecular recognition." Protein Sci 15(1): 14-27. http://commonfund.nih.gov/proteincapture/index. https://antibodies.cancer.gov/apps/site/default. from https://antibodies.cancer.gov/apps/site/default.

Hu, M., P. Li, M. Li, W. Li, T. Yao, J. W. Wu, W. Gu, R. E. Cohen and Y. Shi (2002). "Crystal structure of a UBP-family deubiquitinating enzyme in isolation and in complex with ubiquitin aldehyde." Cell 111(7): 1041-1054.

Huang, F. and A. Sorkin (2005). "Growth factor receptor binding protein 2-mediated recruitment of the RING domain of Cbl to the epidermal growth factor receptor is essential and sufficient to support receptor endocytosis." Mol Biol Cell 16(3): 1268-1281.

Huang, H., N. O. Economopoulos, B. A. Liu, A. Uetrecht, J. Gu, N. Jarvik, V. Nadeem, T. Pawson, J. Moffat, S. Miersch and S. S. Sidhu (2015). "Selection of recombinant anti-SH3 domain antibodies by high-throughput phage display." Protein science : a publication of the Protein Society 24(11): 1890-1900.

Hurley, J. H., S. Lee and G. Prag (2006). "Ubiquitin-binding domains." Biochem J 399(3): 361- 372.

Husnjak, K. and I. Dikic (2012). "Ubiquitin-binding proteins: decoders of ubiquitin-mediated cellular functions." Annu Rev Biochem 81: 291-322.

Hutchison, C. A., 3rd, S. Phillips, M. H. Edgell, S. Gillam, P. Jahnke and M. Smith (1978). "Mutagenesis at a specific position in a DNA sequence." J Biol Chem 253(18): 6551-6560.

Hwang, H., T. Vreven, J. Janin and Z. Weng (2010). "Protein-protein docking benchmark version 4.0." Proteins 78(15): 3111-3114.

Ibarra-Molero, B., V. V. Loladze, G. I. Makhatadze and J. M. Sanchez-Ruiz (1999). "Thermal versus guanidine-induced unfolding of ubiquitin. An analysis in terms of the contributions from charge-charge interactions to protein stability." Biochemistry 38(25): 8138-8149.

102

James, L. C., P. Roversi and D. S. Tawfik (2003). "Antibody multispecificity mediated by conformational diversity." Science 299(5611): 1362-1367.

Janin, J. and C. Chothia (1990). "The structure of protein-protein recognition sites." J Biol Chem 265(27): 16027-16030.

Jimenez, R., G. Salazar, K. K. Baldridge and F. E. Romesberg (2003). "Flexibility and molecular recognition in the immune system." Proc Natl Acad Sci U S A 100(1): 92-97.

Jin, L., A. Williamson, S. Banerjee, I. Philipp and M. Rape (2008). "Mechanism of ubiquitin- chain formation by the human anaphase-promoting complex." Cell 133(4): 653-665.

Jones, S. and J. M. Thornton (1996). "Principles of protein-protein interactions." Proc Natl Acad Sci U S A 93(1): 13-20.

Joseph, J. D., N. Lu, J. Qian, J. Sensintaffar, G. Shao, D. Brigham, M. Moon, E. C. Maneval, I. Chen, B. Darimont and J. H. Hager (2013). "A clinically relevant androgen receptor mutation confers resistance to second-generation antiandrogens enzalutamide and ARN-509." Cancer Discov 3(9): 1020-1029.

Jung, S., A. Honegger and A. Plückthun (1999). "Selection for improved protein stability by phage display." Journal of molecular biology 294(1): 163-180.

Kanarek, N., N. London, O. Schueler-Furman and Y. Ben-Neriah (2010). "Ubiquitination and degradation of the inhibitors of NF-kappaB." Cold Spring Harb Perspect Biol 2(2): a000166.

Kastritis, P. L. and A. M. Bonvin (2013). "On the binding affinity of macromolecular interactions: daring to ask why proteins interact." J R Soc Interface 10(79): 20120835.

Keskin, O., A. Gursoy, B. Ma and R. Nussinov (2008). "Principles of protein-protein interactions: what are the preferred ways for proteins to interact?" Chem Rev 108(4): 1225- 1244.

Khoury, G. A., J. Smadbeck, C. A. Kieslich and C. A. Floudas (2014). "Protein folding and de novo protein design for biotechnological applications." Trends Biotechnol 32(2): 99-109.

Klein, C., C. Sustmann, M. Thomas, K. Stubenrauch, R. Croasdale, J. Schanzer, U. Brinkmann, H. Kettenberger, J. T. Regula and W. Schaefer (2012). "Progress in overcoming the chain association issue in bispecific heterodimeric IgG antibodies." MAbs 4(6): 653-663.

Klemm, J. D., M. A. Rould, R. Aurora, W. Herr and C. O. Pabo (1994). "Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules." Cell 77(1): 21-32.

Koga, N., R. Tatsumi-Koga, G. Liu, R. Xiao, T. B. Acton, G. T. Montelione and D. Baker (2012). "Principles for designing ideal protein structures." Nature 491(7423): 222-227.

Kohler, G. and C. Milstein (1975). "Continuous cultures of fused cells secreting antibody of predefined specificity." Nature 256(5517): 495-497.

103

Koide, A., C. W. Bailey, X. Huang and S. Koide (1998). "The fibronectin type III domain as a scaffold for novel binding proteins." Journal of molecular biology 284(4): 1141-1151.

Koide, A., R. N. Gilbreth, K. Esaki, V. Tereshko and S. Koide (2007). "High-affinity single- domain binding proteins with a binary-code interface." Proc Natl Acad Sci U S A 104(16): 6632-6637.

Koide, A., J. Wojcik, R. N. Gilbreth, R. J. Hoey and S. Koide (2012). "Teaching an old scaffold new tricks: monobodies constructed using alternative surfaces of the FN3 scaffold." J Mol Biol 415(2): 393-405.

Koide, S., A. Koide and D. Lipovšek (2012). Chapter six - Target-Binding Proteins Based on the 10th Human Fibronectin Type III Domain (10Fn3). Methods in Enzymology. K. D. Wittrup and L. V. Gregory, Academic Press. Volume 503: 135-156.

Koide, S. and S. S. Sidhu (2009). "The importance of being tyrosine: lessons in molecular recognition from minimalist synthetic binding proteins." ACS Chem Biol 4(5): 325-334.

Komander, D., M. J. Clague and S. Urbé (2009). "Breaking the chains: structure and function of the deubiquitinases." Nat Rev Mol Cell Biol 10(8): 550-563.

Kony, D. B., P. H. Hunenberger and W. F. van Gunsteren (2007). "Molecular dynamics simulations of the native and partially folded states of ubiquitin: influence of methanol cosolvent, pH, and temperature on the protein structure and dynamics." Protein Sci 16(6): 1101- 1118.

Koshland, D. E., Jr. (1959). "Enzyme flexibility and enzyme action." J Cell Comp Physiol 54: 245-258.

Kossiakoff, A. A., J. L. Chambers, L. M. Kay and R. M. Stroud (1977). "Structure of bovine trypsinogen at 1.9 A resolution." Biochemistry 16(4): 654-664.

Kramer, R. A., F. Cox, M. van der Horst, S. van der Oudenrijn, P. C. Res, J. Bia, T. Logtenberg and J. de Kruif (2003). "A novel helper phage that improves phage display selection efficiency by preventing the amplification of phages without recombinant protein." Nucleic Acids Res 31(11): e59.

Kunkel, T. A. (1985). "Rapid and efficient site-specific mutagenesis without phenotypic selection." Proc Natl Acad Sci U S A 82(2): 488-492.

Kurisaki, I., M. Takayanagi and M. Nagaoka (2014). "Combined mechanism of conformational selection and induced fit in U1A-RNA molecular recognition." Biochemistry 53(22): 3646- 3657.

Lake, M. W., M. M. Wuebbens, K. V. Rajagopalan and H. Schindelin (2001). "Mechanism of ubiquitin activation revealed by the structure of a bacterial MoeB-MoaD complex." Nature 414(6861): 325-329.

104

Lam, Y. A., C. M. Pickart, A. Alban, M. Landon, C. Jamieson, R. Ramage, R. J. Mayer and R. Layfield (2000). "Inhibition of the ubiquitin-proteasome system in Alzheimer's disease." Proc Natl Acad Sci U S A 97(18): 9902-9906.

Lange, O. F., N. A. Lakomek, C. Farès, G. F. Schröder, K. F. Walter, S. Becker, J. Meiler, H. Grubmüller, C. Griesinger and B. L. de Groot (2008). "Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution." Science 320(5882): 1471-1475.

Lazo, J. S. and E. R. Sharlow (2016). "Drugging Undruggable Molecular Cancer Targets." Annu Rev Pharmacol Toxicol 56: 23-40.

Lee, C. C., J. M. Perchiacca and P. M. Tessier (2013). "Toward aggregation-resistant antibodies by design." Trends Biotechnol 31(11): 612-620.

Lee, C. V., S. S. Sidhu and G. Fuh (2004). "Bivalent antibody phage display mimics natural immunoglobulin." Journal of immunological methods 284(1-2): 119-132.

Levin, A. M. and G. A. Weiss (2006). "Optimizing the affinity and specificity of proteins with molecular display." Molecular bioSystems 2(1): 49-57.

Lewis, J. S. and V. C. Jordan (2005). "Selective estrogen receptor modulators (SERMs): mechanisms of anticarcinogenesis and drug resistance." Mutat Res 591(1-2): 247-263.

Li, C., C. S. Dowd, W. Zhang and I. M. Chaiken (2001). "Phage randomization in a charybdotoxin scaffold leads to CD4-mimetic recognition motifs that bind HIV-1 envelope through non-aromatic sequences." J Pept Res 57(6): 507-518.

Li, W., M. H. Bengtson, A. Ulbrich, A. Matsuda, V. A. Reddy, A. Orth, S. K. Chanda, S. Batalov and C. A. Joazeiro (2008). "Genome-wide and functional annotation of human E3 ubiquitin ligases identifies MULAN, a mitochondrial E3 that regulates the organelle's dynamics and signaling." PLoS One 3(1): e1487.

Liang, X., L. Peng, C.-H. H. Baek and F. Katzen (2013). "Single step BP/LR combined Gateway reactions." BioTechniques 55(5): 265-268.

Lin, C. C., F. A. Melo, R. Ghosh, K. M. Suen, L. J. Stagg, J. Kirkpatrick, S. T. Arold, Z. Ahmed and J. E. Ladbury (2012). "Inhibition of basal FGF receptor signaling by dimeric Grb2." Cell 149(7): 1514-1524.

Lipovsek, D. (2011). "Adnectins: engineered target-binding protein therapeutics." Protein Eng Des Sel 24(1-2): 3-9.

Lipovsek, D. and A. Pluckthun (2004). "In-vitro protein evolution by and mRNA display." J Immunol Methods 290(1-2): 51-67.

Lisabeth, E. M., G. Falivelli and E. B. Pasquale (2013). "Eph receptor signaling and ephrins." Cold Spring Harb Perspect Biol 5(9).

105

Liu, B. A., K. Jablonowski, M. Raina, M. Arce, T. Pawson and P. D. Nash (2006). "The human and mouse complement of SH2 domain proteins-establishing the boundaries of phosphotyrosine signaling." Mol Cell 22(6): 851-868.

Lo Conte, L., C. Chothia and J. Janin (1999). "The atomic structure of protein-protein recognition sites." J Mol Biol 285(5): 2177-2198.

Lofblom, J. (2011). "Bacterial display in combinatorial protein engineering." Biotechnol J 6(9): 1115-1129.

Lofblom, J., J. Feldwisch, V. Tolmachev, J. Carlsson, S. Stahl and F. Y. Frejd (2010). "Affibody molecules: engineered proteins for therapeutic, diagnostic and biotechnological applications." FEBS Lett 584(12): 2670-2680.

Lorey, S., E. Fiedler, A. Kunert, J. Nerkamp, C. Lange, M. Fiedler, E. Bosse-Doenecke, M. Meysing, M. Gloser, C. Rundfeldt, U. Rauchhaus, I. Hanssgen, T. Gottler, A. Steuernagel, U. Fiedler and U. Haupts (2014). "Novel ubiquitin-derived high affinity binding proteins with tumor targeting properties." J Biol Chem 289(12): 8493-8507.

Lowenstein, E. J., R. J. Daly, A. G. Batzer, W. Li, B. Margolis, R. Lammers, A. Ullrich, E. Y. Skolnik, D. Bar-Sagi and J. Schlessinger (1992). "The SH2 and SH3 domain-containing protein GRB2 links receptor tyrosine kinases to ras signaling." Cell 70(3): 431-442.

MacEwan, D. J. (2002). "TNF ligands and receptors--a matter of life and death." Br J Pharmacol 135(4): 855-875.

Mandal, K., M. Uppalapati, D. Ault-Riche, J. Kenney, J. Lowitz, S. S. Sidhu and S. B. Kent (2012). "Chemical synthesis and X-ray structure of a heterochiral {D-protein antagonist plus vascular endothelial growth factor} protein complex by racemic crystallography." Proc Natl Acad Sci U S A 109(37): 14779-14784.

Manivel, V., N. C. Sahoo, D. M. Salunke and K. V. Rao (2000). "Maturation of an antibody response is governed by modulations in flexibility of the antigen-combining site." Immunity 13(5): 611-620.

Marcon, E., H. Jain, A. Bhattacharya, H. Guo, S. Phanse, S. Pu, G. Byram, B. C. Collins, E. Dowdell, M. Fenner, X. Guo, A. Hutchinson, J. J. Kennedy, B. Krastins, B. Larsen, Z. Y. Lin, M. F. Lopez, P. Loppnau, S. Miersch, T. Nguyen, J. B. Olsen, M. Paduch, M. Ravichandran, A. Seitova, G. Vadali, M. S. Vogelsang, J. R. Whiteaker, G. Zhong, N. Zhong, L. Zhao, R. Aebersold, C. H. Arrowsmith, A. Emili, L. Frappier, A. C. Gingras, M. Gstaiger, A. G. Paulovich, S. Koide, A. A. Kossiakoff, S. S. Sidhu, S. J. Wodak, S. Graslund, J. F. Greenblatt and A. M. Edwards (2015). "Assessment of a method to characterize antibody selectivity and specificity for use in immunoprecipitation." Nat Methods 12(8): 725-731.

Margolis, B. and E. Y. Skolnik (1994). "Activation of Ras by receptor tyrosine kinases." J Am Soc Nephrol 5(6): 1288-1299.

Marschall, A. L., S. Dubel and T. Boldicke (2015). "Specific in vivo knockdown of protein function by intrabodies." MAbs 7(6): 1010-1035.

106

Maspero, E., E. Valentini, S. Mari, V. Cecatiello, P. Soffientini, S. Pasqualato and S. Polo (2013). "Structure of a ubiquitin-loaded HECT ligase reveals the molecular basis for catalytic priming." Nature Structural and Molecular Biology 20: 696+.

McCafferty, J., A. D. Griffiths, G. Winter and D. J. Chiswell (1990). "Phage antibodies: filamentous phage displaying antibody variable domains." Nature 348(6301): 552-554.

Meier, F., T. Abeywardana, A. Dhall, N. P. Marotta, J. Varkey, R. Langen, C. Chatterjee and M. R. Pratt (2012). "Semisynthetic, site-specific ubiquitin modification of α-synuclein reveals differential effects on aggregation." Journal of the American Chemical Society 134(12): 5468- 5471.

Mian, I. S., A. R. Bradwell and A. J. Olson (1991). "Structure, function and properties of antibody binding sites." J Mol Biol 217(1): 133-151.

Milo, R. (2013). "What is the total number of protein molecules per cell volume? A call to rethink some published values." Bioessays 35(12): 1050-1055.

Mirecka, E. A., T. Hey, U. Fiedler, R. Rudolph and M. Hatzfeld (2009). " molecules selected against the human papillomavirus E7 protein inhibit the proliferation of target cells." J Mol Biol 390(4): 710-721.

Modi, H., L. Li, S. Chu, J. Rossi, J. K. Yee and R. Bhatia (2011). "Inhibition of Grb2 expression demonstrates an important role in BCR-ABL-mediated MAPK activation and transformation of primary human hematopoietic cells." Leukemia 25(2): 305-312.

Molecular_Partners. (2016). "Molecular Partners " Retrieved 25 May, 2016, from http://www.molecularpartners.com/.

Monod, J., J. Wyman and J. P. Changeux (1965). "On the Nature of Allosteric Transitions: A Plausible Model." J Mol Biol 12: 88-118.

Montoyo, H. P., C. Vaccaro, M. Hafner, R. J. Ober, W. Mueller and E. S. Ward (2009). "Conditional deletion of the MHC class I-related receptor FcRn reveals the sites of IgG homeostasis in mice." Proc Natl Acad Sci U S A 106(8): 2788-2793.

Moore, S. J. and J. R. Cochran (2012). "Engineering knottins as novel binding agents." Methods Enzymol 503: 223-251.

Moreira, I. S., P. A. Fernandes and M. J. Ramos (2007). "Hot spots--a review of the protein- protein interface determinant amino-acid residues." Proteins 68(4): 803-812.

Morrison, K. L. and G. A. Weiss (2001). "Combinatorial alanine-scanning." Curr Opin Chem Biol 5(3): 302-307.

Nord, K., J. Nilsson, B. Nilsson, M. Uhlen and P. A. Nygren (1995). "A combinatorial library of an alpha-helical bacterial receptor domain." Protein Eng 8(6): 601-608.

Ogawa, N. and M. D. Biggin (2012). "High-throughput SELEX determination of DNA sequences bound by transcription factors in vitro." Methods Mol Biol 786: 51-63.

107

Orlova, A., M. Magnusson, T. L. Eriksson, M. Nilsson, B. Larsson, I. Hoiden-Guthenberg, C. Widstrom, J. Carlsson, V. Tolmachev, S. Stahl and F. Y. Nilsson (2006). "Tumor imaging using a picomolar affinity HER2 binding ." Cancer Res 66(8): 4339-4348.

Pal, A., M. A. Young and N. J. Donato (2014). "Emerging potential of therapeutic targeting of ubiquitin-specific proteases in the treatment of cancer." Cancer Res 74(18): 4955-4966.

Pal, G., S. Y. Fong, A. A. Kossiakoff and S. S. Sidhu (2005). "Alternative views of functional protein binding epitopes obtained by combinatorial shotgun scanning mutagenesis." Protein Sci 14(9): 2405-2413.

Pál, G., J. L. Kouadio, D. R. Artis, A. A. Kossiakoff and S. S. Sidhu (2006). "Comprehensive and quantitative mapping of energy landscapes for protein-protein interactions by rapid combinatorial scanning." J Biol Chem 281(31): 22378-22385.

Parizek, P., L. Kummer, P. Rube, A. Prinz, F. W. Herberg and A. Pluckthun (2012). "Designed ankyrin repeat proteins (DARPins) as novel isoform-specific intracellular inhibitors of c-Jun N- terminal kinases." ACS Chem Biol 7(8): 1356-1366.

Pawson, T. and P. Nash (2000). "Protein-protein interactions define specificity in signal transduction." Genes Dev 14(9): 1027-1047.

Peleg, O., J. M. Choi and E. I. Shakhnovich (2014). "Evolution of specificity in protein-protein interactions." Biophys J 107(7): 1686-1696.

Pepper, L. R., Y. K. Cho, E. T. Boder and E. V. Shusta (2008). "A decade of yeast surface display technology: where are we now?" Comb Chem High Throughput Screen 11(2): 127-134.

Perchiacca, J. M. and P. M. Tessier (2012). "Engineering aggregation-resistant antibodies." Annual review of chemical and biomolecular engineering 3: 263-286.

Persson, H., W. Ye, A. Wernimont, J. J. Adams, A. Koide, S. Koide, R. Lam and S. S. Sidhu (2013). "CDR-H3 diversity is not required for antigen recognition by synthetic antibodies." J Mol Biol 425(4): 803-811.

Pieris. (2016). Retrieved 25 May, 2016, from http://www.pieris.com/.

Pierret, B., H. Virelizier and C. Vita (1995). "Synthesis of a metal binding protein designed on the alpha/beta scaffold of charybdotoxin." Int J Pept Protein Res 46(6): 471-479.

Pluckthun, A. (2009). Alternative Scaffolds: Expanding the Options of Antibodies Melvyn Little, New York, Cambridge University Press

Plückthun, A. (2015). "Designed ankyrin repeat proteins (DARPins): binding proteins for research, diagnostics, and therapy." Annual review of pharmacology and toxicology 55: 489- 511.

Ponten, F., K. Jirstrom and M. Uhlen (2008). "The Human Protein Atlas--a tool for pathology." J Pathol 216(4): 387-393.

108

Popovic, D., D. Vucic and I. Dikic (2014). "Ubiquitination in disease pathogenesis and treatment." Nature medicine 20(11): 1242-1253.

Qu, Y., Q. Chen, X. Lai, C. Zhu, C. Chen, X. Zhao, R. Deng, M. Xu, H. Yuan, Y. Wang, J. Yu and J. Huang (2014). "SUMOylation of Grb2 enhances the ERK activity by increasing its binding with Sos1." Mol Cancer 13: 95.

Rabijns, A., H. L. De Bondt and C. De Ranter (1997). "Three-dimensional structure of staphylokinase, a plasminogen activator with therapeutic potential." Nat Struct Biol 4(5): 357- 360.

Ray, D., H. Kazan, K. B. Cook, M. T. Weirauch, H. S. Najafabadi, X. Li, S. Gueroussov, M. Albu, H. Zheng, A. Yang, H. Na, M. Irimia, L. H. Matzat, R. K. Dale, S. A. Smith, C. A. Yarosh, S. M. Kelly, B. Nabet, D. Mecenas, W. Li, R. S. Laishram, M. Qiao, H. D. Lipshitz, F. Piano, A. H. Corbett, R. P. Carstens, B. J. Frey, R. A. Anderson, K. W. Lynch, L. O. Penalva, E. P. Lei, A. G. Fraser, B. J. Blencowe, Q. D. Morris and T. R. Hughes (2013). "A compendium of RNA-binding motifs for decoding gene regulation." Nature 499(7457): 172-177.

Richter, A., E. Eggenstein and A. Skerra (2014). "Anticalins: exploiting a non-Ig scaffold with hypervariable loops for the engineering of binding proteins." FEBS Lett 588(2): 213-218.

Riedl, J., K. C. Flynn, A. Raducanu, F. Gärtner, G. Beck, M. Bösl, F. Bradke, S. Massberg, A. Aszodi, M. Sixt and R. Wedlich-Söldner (2010). "Lifeact mice for studying F-actin dynamics." Nature methods 7(3): 168-169.

Rognan, D. (2015). "Rational design of protein–protein interaction inhibitors

." Med.Chem.Commun.(6): 10.

Roscoe, B. P., K. M. Thayer, K. B. Zeldovich, D. Fushman and D. N. Bolon (2013). "Analyses of the effects of all ubiquitin point mutants on yeast growth rate." J Mol Biol 425(8): 1363- 1377.

Rozakis-Adcock, M., R. Fernley, J. Wade, T. Pawson and D. Bowtell (1993). "The SH2 and SH3 domains of mammalian Grb2 couple the EGF receptor to the Ras activator mSos1." Nature 363(6424): 83-85.

Rubin, M. M. and J. P. Changeux (1966). "On the nature of allosteric transitions: implications of non-exclusive ligand binding." J Mol Biol 21(2): 265-274.

Ruhlmann, A., D. Kukla, P. Schwager, K. Bartels and R. Huber (1973). "Structure of the complex formed by bovine trypsin and bovine pancreatic trypsin inhibitor. Crystal structure determination and stereochemistry of the contact region." J Mol Biol 77(3): 417-436.

Sagawa, T., M. Oda, M. Ishimura, K. Furukawa and T. Azuma (2003). "Thermodynamic and kinetic aspects of antibody evolution during the immune response to hapten." Mol Immunol 39(13): 801-808.

109

Sandstrom, K., Z. Xu, G. Forsberg and P. A. Nygren (2003). "Inhibition of the CD28-CD80 co- stimulation signal by a CD28-binding affibody ligand developed by combinatorial protein engineering." Protein Eng 16(9): 691-697.

Sankaranarayanan, R., A. C. Dock-Bregeon, P. Romby, J. Caillet, M. Springer, B. Rees, C. Ehresmann, B. Ehresmann and D. Moras (1999). "The structure of threonyl-tRNA synthetase- tRNA(Thr) complex enlightens its repressor activity and reveals an essential zinc ion in the active site." Cell 97(3): 371-381.

Scazzocchio, C. and H. M. Sealy-Lewis (1978). "A mutation in the xanthine dehydrogenase (purine hydroxylase I) of Aspergillus nidulans resulting in altered specificity. Implications for the geometry of the active site." Eur J Biochem 91(1): 99-109.

Schad, E. M., I. Zaitseva, V. N. Zaitsev, M. Dohlsten, T. Kalland, P. M. Schlievert, D. H. Ohlendorf and L. A. Svensson (1995). "Crystal structure of the superantigen staphylococcal enterotoxin type A." Embo j 14(14): 3292-3301.

Schaffer, M. L., K. Deshayes, G. Nakamura, S. Sidhu and N. J. Skelton (2003). "Complex with a phage display-derived peptide provides insight into the function of insulin-like growth factor I." Biochemistry 42(31): 9324-9334.

Schiff, D., S. Kesari, J. de Groot, T. Mikkelsen, J. Drappatz, T. Coyle, L. Fichtel, B. Silver, I. Walters and D. Reardon (2015). "Phase 2 study of CT-322, a targeted biologic inhibitor of VEGFR-2 based on a domain of human fibronectin, in recurrent glioblastoma." Invest New Drugs 33(1): 247-253.

Schlegel, S., E. Rujas, A. J. Ytterberg, R. A. Zubarev, J. Luirink and J. W. de Gier (2013). "Optimizing heterologous protein production in the periplasm of E. coli by regulating gene expression levels." Microb Cell Fact 12: 24.

Schlehuber, S., G. Beste and A. Skerra (2000). "A novel type of receptor protein, based on the lipocalin scaffold, with specificity for digoxigenin." J Mol Biol 297(5): 1105-1120.

Schmucker, J., M. Ader, D. Brockschnieder, A. Brodarac, U. Bartsch and D. Riethmacher (2003). "erbB3 is dispensable for oligodendrocyte development in vitro and in vivo." Glia 44(1): 67-75.

Schneider, T. D. and R. M. Stephens (1990). "Sequence logos: a new way to display consensus sequences." Nucleic Acids Res 18(20): 6097-6100.

Schreiber, G. and A. R. Fersht (1993). "Interaction of barnase with its polypeptide inhibitor barstar studied by protein engineering." Biochemistry 32(19): 5145-5150.

Schreiber, G. and A. E. Keating (2011). "Protein binding specificity versus promiscuity." Curr Opin Struct Biol 21(1): 50-61.

Schulman, B. A. and J. W. Harper (2009). "Ubiquitin-like protein activation by E1 enzymes: the apex for downstream signalling pathways." Nat Rev Mol Cell Biol 10(5): 319-331.

110

Scott, D. E., A. R. Bayly, C. Abell and J. Skidmore (2016). "Small molecules, big targets: drug discovery faces the protein-protein interaction challenge." Nature reviews. Drug discovery.

Sha, F., E. B. Gencer, S. Georgeon, A. Koide, N. Yasui, S. Koide and O. Hantschel (2013). "Dissection of the BCR-ABL signaling network using highly specific monobody inhibitors to the SHP2 SH2 domains." Proc Natl Acad Sci U S A 110(37): 14924-14929.

Sharabi, O., A. Erijman and J. M. Shifman (2013). "Computational methods for controlling binding specificity." Methods Enzymol 523: 41-59.

Shibasaki, S., M. Karasaki, T. Graslund, P. A. Nygren, H. Sano and T. Iwasaki (2014). "Inhibitory effects of H-Ras/Raf-1-binding affibody molecules on synovial cell function." AMB Express 4(1): 82.

Sidhu, S. S., H. B. Lowman, B. C. Cunningham and J. A. Wells (2000). "Phage display for selection of novel binding peptides." Methods Enzymol 328: 333-363.

Siegel, J. B., A. L. Smith, S. Poust, A. J. Wargacki, A. Bar-Even, C. Louw, B. W. Shen, C. B. Eiben, H. M. Tran, E. Noor, J. L. Gallaher, J. Bale, Y. Yoshikuni, M. H. Gelb, J. D. Keasling, B. L. Stoddard, M. E. Lidstrom and D. Baker (2015). "Computational protein design enables a novel one-carbon assimilation pathway." Proceedings of the National Academy of Sciences 112(12): 3704-3709.

Siggers, T. and R. Gordân (2014). "Protein-DNA binding: complexities and multi-protein codes." Nucleic acids research 42(4): 2099-2111.

Sippl, W., V. Collura and F. Colland (2011). "Ubiquitin-specific proteases as cancer drug targets." Future Oncol 7(5): 619-632.

Smith, G. P. (1985). "Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface." Science 228(4705): 1315-1317.

Smith, G. P., S. U. Patel, J. D. Windass, J. M. Thornton, G. Winter and A. D. Griffiths (1998). "Small binding proteins selected from a combinatorial repertoire of knottins displayed on phage." J Mol Biol 277(2): 317-332.

Smith, G. P. and J. K. Scott (1993). "Libraries of peptides and proteins displayed on filamentous phage." Methods in enzymology 217: 228-257.

Sondek, J. and D. Shortle (1992). "A general strategy for random insertion and substitution mutagenesis: substoichiometric coupling of trinucleotide phosphoramidites." Proc Natl Acad Sci U S A 89(8): 3581-3585.

Sorensen, J., I. Velikyan, D. Sandberg, A. Wennborg, J. Feldwisch, V. Tolmachev, A. Orlova, M. Sandstrom, M. Lubberink, H. Olofsson, J. Carlsson and H. Lindman (2016). "Measuring HER2-Receptor Expression In Metastatic Breast Cancer Using [(68)Ga]ABY-025 Affibody PET/CT." Theranostics 6(2): 262-271.

Spence, J., R. R. Gali, G. Dittmar, F. Sherman, M. Karin and D. Finley (2000). "Cell cycle- regulated modification of the ribosome by a variant multiubiquitin chain." Cell 102(1): 67-76.

111

Sporn, M. B. and K. T. Liby (2012). "NRF2 and cancer: the good, the bad and the importance of context." Nat Rev Cancer 12(8): 564-571.

Stern, D. F. (2008). "ERBB3/HER3 and ERBB2/HER2 duet in mammary development and breast cancer." J Mammary Gland Biol Neoplasia 13(2): 215-223.

Sundberg, E. J. and R. A. Mariuzza (2000). "Luxury accommodations: the expanding role of structural plasticity in protein-protein interactions." Structure 8(7): R137-142.

Tanaka, S. I., T. Takahashi, A. Koide, S. Ishihara, S. Koikeda and S. Koide (2015). "Monobody- mediated alteration of enzyme specificity." Nat Chem Biol.

Tari, A. M. and G. Lopez-Berestein (2001). "GRB2: a pivotal protein in signal transduction." Semin Oncol 28(5 Suppl 16): 142-147.

Taussig, M. J., O. Stoevesandt, C. A. Borrebaeck, A. R. Bradbury, D. Cahill, C. Cambillau, A. de Daruvar, S. Dubel, J. Eichler, R. Frank, T. J. Gibson, D. Gloriam, L. Gold, F. W. Herberg, H. Hermjakob, J. D. Hoheisel, T. O. Joos, O. Kallioniemi, M. Koegl, Z. Konthur, B. Korn, E. Kremmer, S. Krobitsch, U. Landegren, S. van der Maarel, J. McCafferty, S. Muyldermans, P. A. Nygren, S. Palcy, A. Pluckthun, B. Polic, M. Przybylski, P. Saviranta, A. Sawyer, D. J. Sherman, A. Skerra, M. Templin, M. Ueffing and M. Uhlen (2007). "ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome." Nat Methods 4(1): 13-17.

Terrell, J., S. Shih, R. Dunn and L. Hicke (1998). "A function for monoubiquitination in the internalization of a G protein-coupled receptor." Mol Cell 1(2): 193-202.

Teyra, J., S. S. Sidhu and P. M. Kim (2012). "Elucidation of the binding preferences of peptide recognition modules: SH3 and PDZ domains." FEBS Lett 586(17): 2631-2637.

Tonikian, R., X. Xin, C. P. Toret, D. Gfeller, C. Landgraf, S. Panni, S. Paoluzi, L. Castagnoli, B. Currell, S. Seshagiri, H. Yu, B. Winsor, M. Vidal, M. B. Gerstein, G. D. Bader, R. Volkmer, G. Cesareni, D. G. Drubin, P. M. Kim, S. S. Sidhu and C. Boone (2009). "Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins." PLoS Biol 7(10): e1000218.

Tonikian, R., Y. Zhang, C. Boone and S. S. Sidhu (2007). "Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries." Nat. Protocols 2(6): 1368- 1386.

Ulmer, K. M. (1983). "Protein engineering." Science 219(4585): 666-671.

Ulrich, H. D. and H. Walden (2010). "Ubiquitin signalling in DNA replication and repair." Nat Rev Mol Cell Biol 11(7): 479-489.

Valimberti, I., M. Tiberti, M. Lambrughi, B. Sarcevic and E. Papaleo (2015). "E2 superfamily of ubiquitin-conjugating enzymes: constitutively active or activated through phosphorylation in the catalytic cleft." Sci Rep 5: 14849.

112

Van Cutsem, E., C. H. Kohne, E. Hitre, J. Zaluski, C. R. Chang Chien, A. Makhson, G. D'Haens, T. Pinter, R. Lim, G. Bodoky, J. K. Roh, G. Folprecht, P. Ruff, C. Stroh, S. Tejpar, M. Schlichting, J. Nippgen and P. Rougier (2009). "Cetuximab and chemotherapy as initial treatment for metastatic colorectal cancer." N Engl J Med 360(14): 1408-1417.

Vaquerizas, J. M., S. K. Kummerfeld, S. A. Teichmann and N. M. Luscombe (2009). "A census of human transcription factors: function, expression and evolution." Nat Rev Genet 10(4): 252- 263.

Verrijzer, C. P., M. J. Alkema, W. W. van Weperen, H. C. Van Leeuwen, M. J. Strating and P. C. van der Vliet (1992). "The DNA binding specificity of the bipartite POU domain and its subdomains." The EMBO journal 11(13): 4993-5003.

Wade, M., Y. C. Li and G. M. Wahl (2013). "MDM2, MDMX and p53 in oncogenesis and cancer therapy." Nat Rev Cancer 13(2): 83-96.

Weiss, G. A., C. K. Watanabe, A. Zhong, A. Goddard and S. S. Sidhu (2000). "Rapid mapping of protein functional epitopes by combinatorial alanine scanning." Proc Natl Acad Sci U S A 97(16): 8950-8954.

Wells, J. A. (1996). "Binding in the growth hormone receptor complex." Proc Natl Acad Sci U S A 93(1): 1-6.

Wells, J. A. and A. M. de Vos (1996). "Hematopoietic receptor complexes." Annu Rev Biochem 65: 609-634.

Williams, A. and L. G. Baird (2003). "DX-88 and HAE: a developmental perspective." Transfus Apher Sci 29(3): 255-258.

Wintrode, P. L., G. I. Makhatadze and P. L. Privalov (1994). "Thermodynamics of ubiquitin unfolding." Proteins 18(3): 246-253.

Wlodarski, T. and B. Zagrovic (2009). "Conformational selection and induced fit mechanism underlie specificity in noncovalent interactions with ubiquitin." Proc Natl Acad Sci U S A 106(46): 19346-19351.

Wojcik, J., O. Hantschel, F. Grebien, I. Kaupe, K. L. Bennett, J. Barkinge, R. B. Jones, A. Koide, G. Superti-Furga and S. Koide (2010). "A potent and highly specific FN3 monobody inhibitor of the Abl SH2 domain." Nature structural & molecular biology 17(4): 519-527.

Woolfson, D. N., G. J. Bartlett, A. J. Burton, J. W. Heal, A. Niitsu, A. R. Thomson and C. W. Wood (2015). "De novo protein design: how do we expand into the universe of possible protein structures?" Curr Opin Struct Biol 33: 16-26.

Wörn, A. and A. Plückthun (2001). "Stability engineering of antibody single-chain Fv fragments." Journal of molecular biology 305(5): 989-1010.

Wu, T. T. and E. A. Kabat (1970). "An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity." J Exp Med 132(2): 211-250.

113

Wurch, T., A. Pierre and S. Depil (2012). "Novel protein scaffolds as emerging therapeutic proteins: from discovery to clinical proof-of-concept." Trends Biotechnol 30(11): 575-582.

Yang, Y., J. Kitagaki, H. Wang, D. X. Hou and A. O. Perantoni (2009). "Targeting the ubiquitin-proteasome system for cancer therapy." Cancer Sci 100(1): 24-28.

Yarden, Y. and M. X. Sliwkowski (2001). "Untangling the ErbB signalling network." Nat Rev Mol Cell Biol 2(2): 127-137.

Ye, Y. and M. Rape (2009). "Building ubiquitin chains: E2 enzymes at work." Nat Rev Mol Cell Biol 10(11): 755-764.

Zack, D. J., A. L. Wong, M. Stempniak and R. H. Weisbart (1995). "Two kappa immunoglobulin light chains are secreted by an anti-DNA hybridoma: implications for isotypic exclusion." Molecular immunology 32(17-18): 1345-1353.

Zhang, W., K. P. Wu, M. A. Sartori, H. B. Kamadurai, A. Ordureau, C. Jiang, P. Y. Mercredi, R. Murchie, J. Hu, A. Persaud, M. Mukherjee, N. Li, A. Doye, J. R. Walker, Y. Sheng, Z. Hao, Y. Li, K. R. Brown, E. Lemichez, J. Chen, Y. Tong, J. W. Harper, J. Moffat, D. Rotin, B. A. Schulman and S. S. Sidhu (2016). "System-Wide Modulation of HECT E3 Ligases with Selective Ubiquitin Variant Probes." Mol Cell 62(1): 121-136.

Zhang, Y., L. Zhou, L. Rouge, A. H. Phillips, C. Lam, P. Liu, W. Sandoval, E. Helgason, J. M. Murray, I. E. Wertz and J. E. Corn (2013). "Conformational stabilization of ubiquitin yields potent and selective inhibitors of USP7." Nat Chem Biol 9(1): 51-58.

Zilfou, J. T. and S. W. Lowe (2009). "Tumor suppressive functions of p53." Cold Spring Harb Perspect Biol 1(5): a001883.

Zolghadr, K., J. Gregor, H. Leonhardt and U. Rothbauer (2012). Case Study on Live Cell Apoptosis-Assay Using Lamin-Chromobody Cell-Lines for High-Content Analysis. Single Domain Antibodies: Methods and Protocols. D. Saerens and S. Muyldermans. Totowa, NJ, Humana Press: 569-575.

114