Engineering a 45-Amino Acid Scaffold for Molecular Imaging

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA

BY

Max Kruziki

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Advisor: Benjamin J. Hackel

December 2017

© Max Kruziki 2017

Acknowledgements

I am thankful for all the wonderful people I have met during my years in graduate school. They have had the biggest impact on my happiness and success both at work and in life.

Katie, who will soon be my wife, has inspired me to push myself to be the best that I can be, in the hopes of even approaching all she has accomplished. She has helped keep me hopeful when experiments weren’t going well, and has provided encouragement and friendship whenever I needed it most.

All of my lab mates, but especially Danny, Brett, Larry, and Sadie, who have shared an office with me nearly my entire time in Minnesota, made each day enjoyable. We could always laugh and joke around, and yet switch to serious scientific discussion when someone had a question or needed help. The undergraduates who have worked with me, Vandon, Andrew, Lizzie, and Feifan, were great people to work with and mentor, and it was fulfilling to watch their scientific knowledge grow.

The entire CEMS department has been like a family to me, especially the other students in my class, who struggled with me through early classes and long homework assignments. Without their kindness and support during the intense first few semesters, grad school would have been much different and much less fun.

Ben, my advisor, has instilled in me the desire to do good science and to be vigilant not to cut corners or take the easy way out. His optimism is contagious, and our weekly meetings always left me feeling refreshed and encouraged to conquer whatever problems I might face.

Finally, I want to thank my family and friends outside of CEMS. My brother, Jake, along with my other close friends, gives me an escape to talk about topics outside of chemical engineering and to fulfill my interests outside of science. Most importantly, my parents’ unwavering belief and support are what have allowed me to accomplish all I have throughout life.


Abstract

Cancer is the second leading cause of death in the United States. Molecularly targeted cancer treatments, including monoclonal antibodies and kinase inhibitors, exhibit strong performance on a small subset of patients but are inconsistent due to tumor heterogeneity. Biopsy-based genetic and protein tumor characterization provides value but cannot address spatial or temporal variations in heterogeneity. Non-invasive methods to characterize cancer cells, such as molecular imaging, will allow for easier patient stratification and treatment monitoring. Currently, molecular imaging is limited by the modest availability of quality probes that efficiently distribute throughout the body and quantitatively localize at the site of the cancer biomarker. Engineering effective diagnostic molecular probes would provide a substantial advance in cancer characterization and personalized medicine.

Protein scaffolds, which comprise a large stabilizing framework and a randomized region onto which binding interactions can be engineered, offer an efficient platform for probe engineering. More broadly, engineered binding proteins are useful in many aspects of biotechnology and medicine.

In this thesis, we mined ~100,000 known protein topologies to identify candidate small protein scaffolds. We developed the 45-amino acid Gp2 scaffold and evolved multiple Gp2 variants that strongly (as strong as 0.2 nM) and specifically (greater than 50:1 target:control) bind their respective targets while also retaining high thermal stability (65-80 ºC thermal denaturation midpoint).


A Gp2 variant that was evolved to bind with strong affinity to epidermal growth

factor receptor (EGFR), a cell surface biomarker overexpressed in multiple cancer types,

was more thoroughly investigated in pre-clinical studies. This variant exhibited strong (18

nM), selective binding, and was passive on normal EGFR signaling pathways, which is

important to reduce off-target side effects. PET imaging of subcutaneously xenografted

tumors in mice revealed effective probe localization to EGFR-high tumors while low signal

was observed in EGFR-low tumors and from non-targeted control Gp2.

Gp2 evolution was studied by comparing the efficacy of different combinatorial

library amino acid diversity based on high throughput sequencing data, natural Gp2

homologs, structural data, and computed stability. Multiple library designs elucidated amino acid diversity that was beneficial or detrimental in different sections of the Gp2 protein, and will aid future evolution and developability of Gp2. From these libraries, high affinity Gp2 variants targeting an additional clinically-relevant cancer biomarker, programmed death-ligand 1 (PD-L1), were evolved, isolated, and characterized.

Collectively, this work identifies and validates Gp2 as a new potential tool for biomarker-based cancer detection and sets a strong foundation for future optimization.


Table of Contents

List of Figures

List of Tables

Chapter 1: Introduction

1.1.1 Benefits of molecular recognition

1.1.2 Protein scaffolds as imaging agents

1.1.3 Protein evolution and design

Chapter 2: A 45-amino acid scaffold mined from the Protein Data Bank for high affinity ligand engineering

2.1. Abstract

2.2. Introduction

2.3 Experimental Procedures

2.3.1 Protein Data Bank Analysis

2.3.2 Library Construction

2.3.3 Binder Selection and

2.3.4 Illumina MiSeq Analysis

2.3.5 Affinity and Biophysical Properties

2.4. Results

2.4.1 Scaffold Discovery and Library Construction

2.4.2 Yeast Surface Display Selection Against Model Protein Targets

2.4.3 Soluble Protein Characterization

2.4.4 EGFR-Targeting Gp2 Domains

2.4.5 Deep Sequencing of Naïve and Binding Populations

2.5 Discussion

2.6 Significance

2.7 Acknowledgements

2.8 Supplemental Data

2.8.1 Supplemental Experimental Procedures

Chapter 3: A 64Cu-labeled Gp2 Domain for PET Imaging of Epidermal Growth Factor Receptor

3.1 Abstract

3.2 Introduction

3.3 Materials and Methods

3.3.1 Protein production and DOTA conjugation

3.3.2 Size Exclusion Chromatography

3.3.4 Cell growth

3.3.5 Affinity measurement

3.3.6 Western Blot Analysis

3.3.7 Internalization

3.3.8 Copper chelation and purification

3.3.9 Radio TLC

3.3.10 Tumor inoculation

3.3.11 EGFR expression quantification

3.3.12 PET imaging – static and dynamic

3.3.13 Tissue gamma counting

3.3.14 Statistics

3.4 Results

3.4.1 Gp2 Production and Conjugation

3.4.2 EGFR Binding

3.4.3 Biological Activity

3.4.4 Copper chelation and purification

3.4.5 Murine model micro-PET/CT and tissue biodistribution

3.5 Discussion

3.6 Acknowledgement

3.7 Supplemental Information

Chapter 4: High affinity PD-L1 binding Gp2 proteins isolated from diversity constrained combinatorial library

4.1. Introduction

4.2 Methods

4.2.1 Design of second generation library

4.2.2 General Constraint

4.2.3 Extended Paratope

4.2.4 PD-L1 designer libraries

4.2.5 High Background Bead Sorts

4.2.6 Library construction, sorting, and sequencing

4.2.7 Protein production and characterization

4.3 Results

4.3.1 Design of second generation library

4.3.2 Sorting and sequencing

4.3.3 Further evolution of PD-L1 ligands

4.4 Discussion

4.5 Conclusion

4.6 Supplemental Information

Chapter 5: Concluding Remarks

Bibliography

Appendix A: Immunogenicity of Gp2-EGFR

A.1 Introduction

A.2 Methods

A.3 Results and Discussion

Appendix B: Gp2-EGFR framework hydrophilicity engineering

B.1 Introduction

B.2 Methods

B.2.1 Generating Hydrophilic Mutants

B.2.2 Protein Production and Characterization

B.3 Results and Discussion

B.3.1 Designed hydrophilic mutants

B.3.2 Combination hydrophilic mutants

List of Figures

Figure 1.1. Protein scaffold interacting with a target

Figure 1.3. Protein fitness in the sequence space landscape

Figure 2.1. Summary of information for top potential scaffolds

Figure 2.2. Solution structure of Gp2

Figure 2.3. Binding characterization

Figure 2.4. Soluble Gp2 characterization

Figure 2.5. GαEGFR2.2.3 affinity and stability

Figure 2.6. Deep sequencing comparison of naïve and binding populations

Figure S2.1. Related to Figure 2.6. High throughput analysis of Gp2 library and evolved sequences

Figure S2.2. Related to Figure 2.4. Affinity maturation and characterization of GαLysA0.3.3

Figure S2.3. Related to Figure 2.3. Collection of affinity titration curve replicates

Figure 3.1 Gp2 conjugation

Figure 3.2 Affinity titration

Figure 3.3 Biological Activity

Figure 3.4 PET/CT imaging

Figure 3.5 Resected tissue gamma counting

Figure 3.6 Dynamic PET scans

Figure S3.1. Mass spectrometry traces of DOTA conjugation to Gp2

Figure S3.2. Size exclusion chromatography

Figure S3.3. Multiple replicates of titrations of murine EGFR

Figure S3.4. EGFR expression of excised mouse flank tumor xenografts was analyzed via flow cytometry

Figure 4.1. Change in amino acid frequency from initial naïve library to evolved binding sequences isolated from the Gp2 generation 1 library

Figure 4.2. Frequency of amino acids in sequences of natural homologs to Gp2

Figure 4.3. Change in folding energy

Figure 4.4. Solvent accessible surface area

Figure 4.5. Amino acid frequency in extended paratope residues from generation 1 Gp2

Figure 4.6. Expected amino acid frequency for CDR+ and CDR- sites

Figure 4.7. Potential of bonded cysteines in generation 1 Gp2

Figure 4.8. Binding yeast recovery during magnetic bead sorts with high levels of non-binding yeast (EBY)

Figure 4.9. Sequencing and library identity of naïve and evolved populations of the generation 2 Gp2

Figure 4.10. PD-L1 binding Gp2 protein characterization

Figure 4.11. Sequences of strongest binding evolved and parental PD-L1 binding Gp2

Figure A1. Immunogenic response to Gp2 and controls

Figure B1. Hydrophilicity of wild type Gp2

Figure B2. Single hydrophilic mutant characterization

Figure B3. Characterization of hydrophilic combination mutants

List of Tables

Table 1.1. Characterization of wild-type Gp2 and the top binding molecule for each target

Table S2.1. Related to Figure 2.1. Brief summary of scaffolds

Table S2.2. Oligonucleotide sequences used in current study. xyz codons represent CDR` diversity

Table S3.1. Protein sequence alignment of two probes used in this study

Table 4.1. Designs of second generation Gp2 libraries

Table 4.2. Expected amino acid frequency for each degenerate codon used in the Gp2 second generation library

Table 4.3. Library design for PD-L1 binding Gp2 evolution

Table 4.4. Expected amino acid frequency for each degenerate codon used in PD-L1 library designs

Table S4.1. Oligonucleotide sequences used to construct generation 2 Gp2

Chapter 1: Introduction

1.1.1 Benefits of molecular recognition

Cancer is the second leading cause of death in the United States, killing an estimated 600,000 people in 2017 [1]. Cancer mortality has declined by 30% over the past two decades due to reduced smoking and advances in early detection and treatment. This declining trend is driven by improved survivability in four of the five main cancer types (lung, colorectal, breast, and prostate); however, there is still room for significant improvement in early detection and treatment.

One of the reasons cancer treatments are so unreliable is that cancer differs vastly not only across patients, but also within a given patient. Advancements in microscopic visualization and ex vivo molecular characterization of cancer cells have revealed a high degree of heterogeneity among tumor cells across patients [2]. Genomics studies in recent years have further elucidated this heterogeneity, enabling the description of the exact mutations that give rise to cancerous cells in one patient versus another. Typically, upon cancer diagnosis, a treatment plan is decided based on a single biopsy of a portion of the primary tumor. However, cost and time limitations prevent full genomic studies on every patient. Moreover, even if every biopsy were characterized, more advanced tumors may have heterogeneity within a single patient’s primary tumor, and this diversity can become even more pronounced as the disease progresses towards metastatic spreading. Therefore, noninvasive methods must be developed to more easily and accurately characterize cancer molecular signatures.


Traditionally, cancer treatments, like chemotherapy, were designed to broadly inhibit cell division and kill growing cells indiscriminately. However, these treatments are often unpredictable in their efficacy, and the lack of specificity results in severe damage to healthy cells. Treatments have thus been transitioning towards more cancer-targeted therapies [3, 4]. These newer drugs act based on the increased expression of certain proteins (biomarkers) or metabolic pathways within cancer cells.

Because biomarker and pathway expression differ between patients, knowing the overall molecular signature of a patient’s cancer becomes important for predicting the therapeutic response of a patient to a specific targeted drug. For example, epidermal growth factor receptor (EGFR) is a cell surface glycoprotein in the ErbB family of receptor tyrosine kinases [5] and is a target for multiple new therapies. EGFR overexpression, amplification, or mutation occurs in several tumor types [6] and can lead to increased activity that is suspected to contribute to tumor growth [7]. The drugs gefitinib and erlotinib act by inhibiting the EGFR signaling pathway. Consequently, certain mutations or gene amplifications related to this signaling can be predictive of their efficacy [8].

Similarly, another receptor tyrosine kinase present on cell surfaces, Met, is expressed in various tumors and may contribute to their poor prognoses [9, 10]. Met expression is often associated with acquired resistance to other drugs, such as EGFR inhibitors in non-small cell lung cancer [11]. In a clinical trial for onartuzumab (Met antibody) plus erlotinib (EGFR inhibitor), Met-high expressing patients responded favorably, whereas Met-low patients saw detrimental effects over placebo [12]. Knowing the expression level of Met can help predict if patients will become resistant to other treatments and provides a potential pathway for blocking that resistance by using Met inhibitors. Several Met inhibitors are also being examined in clinical trials in combination with other drugs [13].

In a final example, PD-L1 is a cell surface receptor important in regulating the body’s immune response by binding to PD-1 receptors expressed on T cells and halting their development. Recent studies have shown that PD-L1 is overexpressed in several cancers [14] and can help the cancer evade immune response [15]. Multiple checkpoint inhibitors have been FDA approved to target this interaction, including atezolizumab, avelumab, and durvalumab [16–18]. PD-L1 overexpression has also been shown to be predictive of response for melanoma patients; however, no correlation was seen in non-small cell lung cancer patients [19].

Taken together, these examples provide evidence that expression of biomarkers can be predictive of therapeutic response to targeted therapies and make clear the need to efficiently and accurately assess expression levels.

As mentioned earlier, localized biopsies and genetic testing are currently the primary routes of molecular characterization in cancer. Positron emission tomography (PET) using biomarker-specific molecular probes offers a promising non-invasive method that could be used to assess biomarker expression levels throughout a patient’s cancer. PET currently sees significant clinical usage with 18F-FDG, which can detect metabolic changes typical in cancer cells [20]. The radioactive sugar analog detected in 18F-FDG PET allows for high depth penetration and very high signal over background, but the specificity is hindered by the tendency of 18F-FDG signal to accumulate in other non-malignant tissues that have high sugar utilization. 18F-FDG could be replaced with specific molecular recognition probes that detect biomarker levels. Utilizing the current PET imaging infrastructure with these new molecular recognition probes will allow medical professionals to non-invasively predict therapeutic response to different targeted therapies.

In addition to determining expression levels, molecular recognition probes can potentially aid in early detection. Non-invasive screening methods that have a low false positive rate and are not harmful to the patient are important as the first line of early detection. For patients in higher risk categories, or who have tested positive in a screen, computed tomography (CT) imaging is often used due to the high resolution of anatomical structures and wide availability. However, CT is limited in terms of signal to background and small tumors may be overlooked. Other modalities can provide improved imaging sensitivity and selectivity. For example, a lipopeptide microbubble targeting vascular endothelial growth factor receptor 2 has been used as an ultrasound contrast agent in clinical trials for patients with prostate cancer [21]. Similarly, photoacoustic imaging of molecularly targeted agents is in the early stages of development, such as a protein-small molecule conjugate used to target B7-H3, a breast cancer biomarker [22]. When combined with ultrasound and photoacoustic imaging, which are widely available, cost-effective, and non-invasive, molecular probes can provide additional molecular information and specificity, leading to earlier detection and improved efficacy.

1.1.2 Protein scaffolds as imaging agents

Protein scaffolds that can be evolved to bind a target of interest are immensely useful for developing molecular probes towards new biomarkers (Figure 1.1). Additionally, proteins that non-covalently recognize and bind other molecules have many uses in biotechnology and medicine outside of imaging. However, the discovery and engineering of novel binding function is challenging because the complexity of both inter- and intra-molecular protein interactions effectively precludes explicit design while the enormity of protein sequence possibilities challenges combinatorial discovery. Yet, nature has solved the binder discovery and engineering problem in its immune systems by using a protein ‘scaffold’ comprised of a stable, unchanged region and a variable region through which chemical diversity can be obtained. In most cases, including humans, this scaffold is the antibody, a large (150 kDa) protein that contains a region that is highly diverse throughout the myriad of antibody variants in the body. This variable region determines what target a specific antibody will bind. These proteins excel as binding molecules for multiple reasons: their stable structure allows a consistent region to be exposed for potential interactions, their size allows for a large interaction area to drive strong and specific interactions, and the amino acid building blocks allow for a multitude of unique interactions.

Figure 1.1. Protein scaffold interacting with a target. The variable paratope region (red) of the protein scaffold is supported by a conserved framework region (blue) to bind to a given target (black). Adapted from [23].


Antibodies have been successfully developed to target many cancer biomarkers in the clinic. However, the large size and neonatal Fc receptor binding of antibodies lead to slow clearance from the blood [24], which, while beneficial for minimally toxic therapeutic applications, hampers molecular imaging [25–27] due to high background signal (Figure 1.2). Smaller molecules have been reported to distribute more effectively into tissue, especially enabling penetration into solid tumors [28]. Additionally, the cysteine-containing, multi-domain structure of antibodies can lead to complications during synthesis and site-selective conjugation of other molecules. These drawbacks have prompted a search for smaller antibody alternatives that reduce or eliminate the negative characteristics while maintaining strong, specific binding to a target [29].

Figure 1.2. Antibody vs. small protein imaging in mice. At early timepoints, the smaller affibody provides an image of much higher tumor (T) to background ratio compared to the large, slow clearing antibody. Adapted from Orlova 2009.

As protein size becomes smaller, added difficulties arise when generating a protein ligand towards a target of interest. The smaller size of the protein generally leads towards smaller potential interaction areas, which are correlated with decreased binding affinity [30]. Amino acid changes or mutations can be added to a ligand to increase the interaction area and strength and thereby the affinity. However, for small proteins even a small number of mutations is a high percentage of the total protein. Since mutations are on average deleterious to protein stability, this can quickly result in an unstable, inactive protein [31]. Thus, amino acid mutations need to be balanced between function (intermolecular interactions) and stability (intramolecular interactions), a difficult task for small proteins.

One method for developing new, smaller ligands towards targets of interest is to use alternative, non-antibody protein scaffolds. As mentioned above, protein scaffolds consist of a small randomized paratope, or proposed binding region, and a larger conserved framework that stabilizes the paratope structure [23]. The paratope region contains most, if not all, of the amino acids that interact with a given target and each paratope is unique and specific for the target it binds. The framework contains amino acids that have strong intramolecular interactions to drive correct folding of the protein. Protein scaffolds aim to have a framework with high thermal and chemical stability to remain functional for each new paratope sequence. Another important quality of protein scaffolds is their developability, i.e. ease of evolution to bind new targets, ease of synthesis or production, and controlled conjugation of chemical moieties in downstream reactions.

A multitude of protein scaffolds have been developed over the past 15-20 years, including a handful that are smaller than 10 kDa [32]. Some examples include [33], fynomers

[34–38], and affitins [39–41]. While these scaffolds have different secondary structures, they all consist of two loops as the paratope region and are larger than 60 amino acids.

Each has been evolved towards a small number of different targets and often contains properties that make it useful in certain situations. For example, the parental structure of affitins is derived from a hyperthermophilic organism, and therefore affitins often have very high thermal stability.

The smallest scaffolds, bicyclic peptides and knottins, can be as small as 2-4 kDa [42–45]. These scaffolds are stabilized by covalent bonds tethering regions of the protein distant in primary sequence, usually through disulfide bonds or non-natural amino acids. For knottins, the disulfide structure complicates synthesis and limits downstream applications. For peptides, the less constrained structure increases entropy penalties when binding and, in combination with the small potential interaction area, leads to difficult evolution of high-affinity ligands [30].

One of the most well-established small scaffolds is the Affibody [46]. The Affibody is a 3-helix bundle consisting of 58 amino acids. A portion of the outer surfaces of two of the helices forms the paratope, which has been evolved to bind many unique targets, such as HER2, EGFR, VEGFR2, FVIII, and others, with affinities as strong as sub-nanomolar. An affibody targeting HER2 has reached phase II clinical trials for PET imaging in breast cancer patients (NCT01858116). Historically, affibodies were often heavily destabilized after evolution towards new targets, with a median thermal stability of 46 °C [23], but recent advances in paratope design have remedied this drawback, increasing median evolved stability to 62 °C [47].

Overall, many scaffolds have been developed but none have flourished as the end-all replacement for antibodies or other ligands in cancer imaging. Thus, there remains a need for a small, efficiently evolved, stable scaffold to improve the ease of generating ligands and provide beneficial molecules for use in the clinic.

1.1.3 Protein evolution and design

When engineering a protein scaffold to recognize a new target, the optimal strategy consists of mutating each amino acid in the paratope to a new amino acid that benefits binding to the target based on the exposed sidechain chemistries. However, choosing the optimal mutation without prior knowledge proves nearly impossible in most practical cases due to the number of interactions that need to be accounted for when mutating even a small number of amino acids. Rational design of the paratope has been successfully carried out, but requires large amounts of structural data about both the ligand and the target [48]. Even then, the design route is challenging. Phylogenetic data can provide insight into potential beneficial interactions as well [49], but are often limited depending on the scaffold of interest. Even when these data exist, they may be insufficient for low-throughput rational design, in which each ligand is built and tested computationally or experimentally, due to the vast number of combinations that need to be examined.

High-throughput methods for protein engineering allow for large numbers of proteins to be constructed and screened without explicitly requiring previous knowledge of, or complicated predictions of, potential interactions. However, knowledge of the correct combinatorial library design can greatly improve discovery methods [47, 50, 51].

Multiple methods for high-throughput protein sorting have been reported [52], all of which rely on displaying a protein on a particle that is linked to the gene that encodes that protein. This phenotype-genotype linkage allows for protein function to be evaluated, the best performing proteins to be collected, and the DNA that encodes these proteins to be amplified and identified. Often, this amplified DNA is reentered into the system, translated into protein, and sorted again to enrich functional proteins and reduce false positives. Mutations can be added to the DNA while it is amplified and the sorts can be carried out with increasing stringency, resulting in directed evolution towards proteins with the most function.

Many of these display systems utilize cells to produce the DNA and protein, such as phage display [53] and yeast display [54]. For example, for ligand discovery and evolution in yeast display, DNA encoding unique ligand variants linked to a yeast mating protein (Aga2p) is transformed into cells. As yeast grow, they produce the Aga2p-ligand construct and shuttle it to the outer surface of their cells, where the ligand is displayed (about 10,000 times per cell). The function of the ligand displayed on a single yeast cell is measured, and the cells are sorted using magnetic beads [55] displaying target or flow cytometry [56] with fluorescent target, with the best performing yeast being collected. As the collected yeast grow, the daughter cells contain copies of DNA of the best performing ligands and more sorts can be performed.

The number of unique protein variants that can be investigated in cellular display systems is limited by transformation and cell size, with maximums of approximately 10⁹ for yeast and 10¹⁰ for phage. The benefits of using cells are potentially better folding of proteins due to the additional cellular machinery and multivalent display, which can allow for more unique, functional proteins to be isolated through initial avidity-driven capture of weak binders, which can be further evolved to improve affinity. Conversely, cell-free display systems such as mRNA [57] or ribosome display [58], where the ligand is linked directly to the mRNA or ribosome/mRNA complex, can have many more proteins investigated, up to 10¹³, with the only limitation being how much mass of DNA can be produced, not transformation efficiency as in yeast or phage. However, these proteins are produced through in vitro translation and, lacking some cellular machinery, may be more likely to be unfolded. They also contain only one copy of the ligand per particle, so very weak binders will not be collected and evolved.

Even though high-throughput display systems allow for investigation of up to 10¹³ unique proteins, this is still only a fraction of the total diversity available for even a small section of a protein (12 sites have 20¹² ≈ 10¹⁵ choices, which requires 32¹² ≈ 10¹⁸ DNA sequences using standard synthesis techniques). Moreover, the vast majority of proteins do not contain the function of interest, and if a protein does have the function of interest, a single amino acid change may completely abolish it [59] (Figure 1.3). On average, mutations are destabilizing, so even a mutation that would be functionally beneficial may completely destabilize and unfold the protein [60]. Additionally, as proteins evolve in nature, there typically isn’t a selective pressure for extreme stability as most cells don’t exist in extreme conditions. In fact, it can be beneficial for proteins to be easily degraded after they are no longer needed. Thus, proteins often cannot tolerate a large number of destabilizing mutations. Therefore, it is vital that the initial library of proteins be chosen in order to maximize the number of potentially functional library members.
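The scale of this sampling problem can be checked with a short calculation. The Python sketch below counts the protein and DNA sequences for 12 fully randomized sites and compares them with the approximate display capacities quoted above. It assumes that the 32 codons per site come from a degenerate codon scheme such as NNK; the text states only the number 32, so the scheme is an assumption, and all function and variable names are illustrative.

```python
# Sketch: compare theoretical library diversity with display-system capacity.
# Assumption: 32 codons per randomized site (e.g., an NNK-style degenerate codon).
# The capacities below are the approximate maxima quoted in the text.

AMINO_ACIDS = 20
CODONS_PER_SITE = 32  # assumed degenerate codon scheme

DISPLAY_CAPACITY = {
    "yeast display": 10**9,
    "phage display": 10**10,
    "cell-free display": 10**13,
}


def protein_diversity(sites: int) -> int:
    """Number of distinct amino acid sequences over `sites` randomized positions."""
    return AMINO_ACIDS**sites


def dna_diversity(sites: int) -> int:
    """Number of distinct DNA sequences encoding `sites` randomized positions."""
    return CODONS_PER_SITE**sites


def max_fraction_sampled(sites: int, capacity: int) -> float:
    """Upper bound on the fraction of DNA sequence space a library can cover."""
    return min(1.0, capacity / dna_diversity(sites))


if __name__ == "__main__":
    sites = 12
    print(f"protein sequences: {protein_diversity(sites):.2e}")
    print(f"DNA sequences:     {dna_diversity(sites):.2e}")
    for system, capacity in DISPLAY_CAPACITY.items():
        fraction = max_fraction_sampled(sites, capacity)
        print(f"{system}: at most {fraction:.1e} of DNA sequence space")
```

For 12 sites this gives roughly 4 × 10¹⁵ protein sequences and 1.2 × 10¹⁸ DNA sequences, so even a cell-free library of 10¹³ members covers less than one part in 10⁵ of the DNA sequence space.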


Figure 1.3. Protein fitness in the sequence space landscape. The vast amount of sequence space is filled with proteins that do not carry the function of interest (black). In areas where there are pockets of fitness, the sequence space is usually jagged such that a single amino acid change could abolish function or local maxima can limit evolution. Adapted from [59].

Increasing the fraction of functional proteins in a library can be achieved through two

methods: a more stable framework or improved paratope randomization. Most proteins that

exist in nature are only moderately stable, leaving room for potential engineered

improvements [61, 62]. With a stable framework a higher number of destabilizing, but perhaps beneficial to function, mutations can be tolerated before the protein becomes completely unfolded and thus non-functional [63]. As more mutations are made, the

potential interaction area between target and ligand can increase, increasing binding

affinity [30]. Improving the composition of the paratope increases the fraction of functional

proteins by increasing beneficial amino acids and decreasing detrimental amino acids in

the initial library. Not all amino acids provide equal benefit for binding interfaces; for

example, antibody complementarity determining regions have a distinct amino acid profile


where certain amino acids such as tyrosine and are highly enriched [64]. Increasing the prevalence of these amino acids can lead to a higher fraction of functional paratopes in the library. On the other hand, limiting detrimental amino acids can decrease the fraction of completely destabilized and unfolded library members. Decreasing the prevalence of destabilizing mutations requires phylogenetic data to suggest what amino acids are not allowed at certain sites [47, 51]. Such data can still only provide guidelines of what to allow at a certain site due to the complication that all the other amino acids within the paratope site are changing as well.

In the work described here, a new small protein scaffold, Gp2, is developed and engineered for use as a molecular PET imaging agent. Paratope design and rational mutations to the natural framework are used to begin optimization of the scaffold.


Chapter 2: A 45-amino acid scaffold mined from the Protein Data Bank for high affinity ligand engineering

2.1. Outline

Small protein ligands can provide superior physiological distribution versus antibodies and improved stability, production, and specific conjugation. Systematic evaluation of the

Protein Data Bank identified a scaffold to push the limits of small size and robust evolution of stable, high-affinity ligands: 45-residue T7 phage gene 2 protein (Gp2) contains an α-helix opposite a β-sheet with two adjacent loops amenable to mutation. De novo ligand discovery from 10^8 mutants and directed evolution towards four targets yielded target-specific binders with affinities as strong as 200 ±100 pM, Tm’s from 65 ±3 ºC to 80 ±1 ºC, and retained activity after thermal denaturation. For cancer targeting, a Gp2 domain for epidermal growth factor receptor was evolved with 18 ±8 nM affinity, receptor-specific binding, and high thermal stability with refolding. The efficiency of evolving new binding function and the size, affinity, specificity, and stability of evolved domains render Gp2 a uniquely effective ligand scaffold.

2.2. Introduction

Molecules that bind targets specifically and with high affinity are useful clinically for imaging, therapeutics, and diagnostics as well as scientifically as reagents for biological modulation, detection, and purification. Antibodies have been successfully used for these applications in many cases, but their drawbacks have instigated a search for alternative protein scaffolds from which improved binding molecules can be developed [29, 65].

Biodistribution mechanisms such as extravasation [66, 67] and tissue penetration [68, 69]

are limited by large size (150 kDa for immunoglobulin G, 50 kDa for antigen-binding

fragments, and even 27 kDa for single-chain variable fragments) thereby reducing delivery to numerous locales including many solid tumors. Additionally, large size and FcRn-

mediated recycling slow plasma clearance [24]. While beneficial for minimally toxic

molecular therapeutic applications, slow clearance greatly hinders molecular imaging and

systemically toxic therapeutics such as radioimmunotherapy [70] via high background.

Smaller agents yield improved results [25–27]. Moreover small size does not preclude

therapeutic applications where blocking a protein/protein interaction is required [71]. As

scientific reagents, small size aids synthesis and selective conjugation including protein

fusion. Yet significant reduction in scaffold size increases the challenge of balancing

evolved intermolecular interaction demands for affinity [30, 72] or function while retaining

beneficial intramolecular interactions for stability and solubility.

Protein scaffolds, frameworks upon which numerous functionalities can be independently

engineered, offer a consistent source of binding reagents for the multitude of biomarkers

and applications thereof [29, 65, 73]. A successful protein scaffold should be efficiently

evolvable to contain all of the following properties. High affinity (low-nanomolar

dissociation constant) and specificity provide potent delivery [27, 67], reduce side effects

in clinical applications, and are requisite for precise use in biological study. Stable protein

scaffolds provide tolerance to mutations in the search for diverse and improved function

[63], resistance to chemical and thermal degradation in production and synthetic

manipulation, in vivo integrity to avoid immunogenicity and off-target effects [74, 75], and

robustness to harsh washing conditions in vitro. Contrary to the multi-domain architecture of many antibodies and fragments, single domain architecture facilitates production of native ligands as well as incorporation into protein fusions and other multicomponent systems. Cysteine-free structure allows for bacterial production in the reducing E. coli cellular environment, intracellular stability in mammals, and the option of a genetically introduced thiol for site-specific chemical conjugation.

A multitude of alternative protein scaffolds have arisen that possess many of these beneficial properties (Table S1). (11 kDa) [76, 77], nanobodies (11 kDa) [78], designed ankyrin repeat proteins (20 kDa) [79], and (20 kDa) [80] have been evolved to interact with numerous targets with high affinity while maintaining stability.

However, the relatively large size of these scaffolds leaves room for potential improvement in solid tumor penetration and biodistribution through decreased size. Very small size has been achieved in the case of the cystine knottin scaffold (20-50 amino acids) [81] and cyclic peptides (17 amino acids) (Heinis 2009). Knottins often use grafting of known binding motifs, which is only applicable to a subset of targets [82], although binders have been evolved from naïve libraries [83]. Peptides, partially due to limited potential for interfacial area as well as the entropic cost of conformational flexibility [84], often require extensive optimization to yield the affinity and specificity required for many applications. In addition, the multiple disulfide bonds required for stabilization can complicate production and range of application in both cases. Slightly larger scaffolds, such as Fynomers (63 amino acids) [36], affitin (65 amino acids) [41], or sso7d (63 amino acids) [85], have moved closer to the small size of knottins and bicyclic peptides without the need for disulfide bonds. Affibodies (58 amino acids) are the smallest heavily-investigated disulfide-free scaffold in the literature [86]. Their helical paratope has provided high affinity towards many targets; however, they are typically severely destabilized after mutation (midpoint of thermal denaturation (Tm) range: 37-65 ºC; median: 46 ºC) [23]. There is still space to develop a scaffold that approaches the small size of knottins and peptides, but also possesses the other beneficial properties.

We hypothesized that a thorough exploration of known protein topologies could reveal an effective scaffold that pushed current limits of small size with potential for high affinity and retention of stability. A search of the Protein Data Bank was performed to identify a small protein with characteristics amenable to use as a protein scaffold, including retention of stability upon mutation, lack of disulfides and large available binding surface. Gene 2 protein from T7 phage was selected as the scaffold to investigate further based on the considered characteristics. A library of truncated gene 2 protein (Gp2) mutants was created by diversifying two solvent-exposed loops in the protein. Using yeast surface display and affinity maturation, target-specific Gp2 molecules with nanomolar or better affinity to three model proteins were isolated. The binding proteins retained the wild-type secondary

structure characteristics and remained folded up to an average of 72 °C. These results verified

Gp2 as a promising scaffold for the isolation of strong, specific binders to a wide variety

of targets and motivated ligand discovery for the cancer biomarker epidermal growth factor

receptor (EGFR). EGFR is a clinical biomarker for imaging and therapy due to its

overexpression in multiple cancer types, including head and neck, breast, bladder, prostate,

, colorectal, non-small cell lung cancer and glioma carcinoma [87, 88]. Improved


EGFR targeting could empower molecular imaging for patient stratification [89–91] and

treatment monitoring as well as advanced therapeutics. An EGFR-binding Gp2 was isolated and matured. Binding was verified for membrane-bound EGFR on mammalian

cells, and Gp2 was used to detect relative EGFR expression levels across multiple cell

lines. The unique combination of small size and efficient evolution of high affinity and

stability gives Gp2 significant potential for the development of molecular targeting agents.

2.3 Experimental Procedures

Detailed experimental procedures are described in Supplemental Experimental Procedures.

2.3.1 Protein Data Bank Analysis

All single domains within the Protein Data Bank (accessed November 2011) were filtered

to identify those with a size of 40-65 amino acids and at least 30% β-sheet. Note that several

proteins outside of these bounds were nevertheless returned by the website’s filter. PyMol

(Schrödinger, New York, NY) was used to visualize the structures of the 259 remaining

proteins, and those that contained two or more adjacent solvent exposed loops whose

surfaces formed one continuous face and had no disulfide bonds were further considered.

The solvent accessible surface area of the loop residues was calculated, using a 1.4 Å

sphere, after mutating residues to serine in Pymol [92]. For the top 14 proteins, the

destabilization upon random mutation of the loop residues was calculated using ERIS [93].

Amino acid sequences were randomly generated using equal probability of each amino

acid within the loops noted in Figure 2.1. Random mutants were analyzed until the mean

destabilization converged within 0.25 kcal/mol for at least ten consecutive mutants


(minimum 35 mutants total). This resulted in a maximum of 84 mutants with a median of

42 mutants.
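The stopping rule above can be sketched in code. This is one possible reading of the criterion, not the procedure actually used: the running mean is taken as converged when its range over the last ten mutants is at most 0.25 kcal/mol, with at least 35 mutants evaluated. The function `sample_ddg` is a hypothetical stand-in for generating a random loop mutant and scoring it (for example with Eris); here it simply draws from a normal distribution with made-up parameters.

```python
import random
from statistics import mean

def sample_until_converged(sample_ddg, tol=0.25, window=10,
                           min_total=35, max_total=1000):
    """Draw ddG values until the running mean has varied by no more than
    `tol` (kcal/mol) over the last `window` mutants (one reading of the rule)."""
    values, running = [], []
    while len(values) < max_total:
        values.append(sample_ddg())
        running.append(mean(values))
        if len(values) >= min_total:
            recent = running[-window:]
            if max(recent) - min(recent) <= tol:
                break
    return values, running[-1]

# Hypothetical sampler: the mean and spread are placeholders, not Eris output.
random.seed(0)
values, final_mean = sample_until_converged(lambda: random.gauss(5.6, 3.0))
print(len(values), round(final_mean, 2))
```

The `max_total` cap is an added safeguard so the loop always terminates; it is not part of the described method.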

2.3.2 Library Construction

A genetic library was constructed based on a truncated form of T7 phage gene 2 protein,

the top scoring protein, in which the sequence encoding for two loops was randomized

using degenerate oligonucleotides encoding for an amino acid distribution mimicking

antibody CDRs [94]. At diversified positions (Figure 2.1) the mixtures of nucleotides were:

15% A, 15% C, 25% G, and 45% T in the first position; 45% A, 15% C, 25% G, and 15% T

in the second position; and 45% C, 10% G, and 45% T in the third position. The DNA was

transformed into yeast surface display system strain EBY100 via homologous

recombination with a pCT vector [95]. The total number of transformants was determined through serial dilution on SD-CAA plates (0.07 M sodium citrate (pH 5.3), yeast nitrogen

base (6.7 g/L), casamino acids (5 g/L), and glucose (20 g/L)). Flow cytometry was used to

determine the amount of full length Gp2 displayed, through labeling of the N-terminal HA

and C-terminal c-MYC epitope, and supported with DNA sequence verification.
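The amino acid distribution implied by the nucleotide mixtures above can be computed directly from the standard genetic code. The sketch below assumes the three codon positions are drawn independently and reads "15% A and C" as 15% of each base (the percentages then sum to 100%); it is an illustration of the calculation, not the design script used in this work.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide mixtures at codon positions 1, 2, and 3 (from the text).
MIX = [
    {"A": 0.15, "C": 0.15, "G": 0.25, "T": 0.45},
    {"A": 0.45, "C": 0.15, "G": 0.25, "T": 0.15},
    {"A": 0.00, "C": 0.45, "G": 0.10, "T": 0.45},
]

def amino_acid_frequencies(mix):
    """Sum codon probabilities per encoded amino acid ('*' is a stop codon)."""
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        p = mix[0][codon[0]] * mix[1][codon[1]] * mix[2][codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p
    return freqs

freqs = amino_acid_frequencies(MIX)
for aa, p in sorted(freqs.items(), key=lambda kv: -kv[1]):
    if p > 0:
        print(f"{aa}: {p:.1%}")
```

Excluding A from the third position removes two of the three stop codons (TAA and TGA), leaving TAG as the only stop codon that can occur at a diversified site.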

The design of the focused lysozyme library is described in the Supplemental Experimental

Procedures.

2.3.3 Binder Selection and Affinity Maturation

Yeast displaying the Gp2 library were exposed to control magnetic beads (first avidin-coated beads then beads with immobilized non-target protein) to remove any non-specific binding interactions. Yeast were then exposed to magnetic beads with immobilized


biotinylated target protein (goat IgG, rabbit IgG, lysozyme, or EGFR ectodomain) and

bound yeast were selected. Magnetic sorts on the initial library were performed at 4 °C with

only one wash. Non-naïve populations were sorted more stringently, at room temperature

and with three washes. Flow cytometry selections, with biotinylated target protein and

Alexa Fluor 647-conjugated streptavidin, were used [50] to isolate full length (c-MYC

positive) Gp2 domain mutants that bind selectively and with high affinity toward the target

proteins. Loop-focused and total gene error-prone PCR, using nucleotide analogs, and genetic loop shuffling between binding sequences were used to evolve improved function

[96].

2.3.4 Illumina MiSeq Analysis

Illumina MiSeq paired-end sequencing was conducted to obtain 4.9 x 10^4 reads from the

naïve library and a total of 6.4 x 10^5 (CV = 24%) reads from binding populations. Loop

one was analyzed independently from loop two. Mature loop one sequences were grouped

into 95 families (>60% sequence similarity) with 249 unique sequences, and after damping

(quad rooted) the adjusted sequence count was n ≥ 173 at each position. Mature loop two

sequences were grouped into 96 families with 198 unique sequences, and after damping

the adjusted sequence count was n ≥ 96 at each position. Sequences from the naïve library

population were grouped into 4.9 x 10^4 families with 4.9 x 10^4 unique sequences; after damping the adjusted sequence count was ≥ 1.1 x 10^4 (due to fewer sequences at extended loop positions). Mature framework analysis grouped the sequences into 153 families (>85% sequence similarity) with 4209 unique sequences, and after damping the adjusted sequence count was n ≥ 421 at each position. P values were calculated using Welch’s t-test, assuming a normal distribution and using the number of damped sequences at each position as the sample size.
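Fourth-root ("quad rooted") damping can be illustrated with a short sketch. It assumes that each unique sequence contributes the fourth root of its read count to the site-wise tallies, so a clone read 10,000 times counts ten times as much as a clone read once rather than 10,000 times as much. Whether damping was applied per unique sequence or per family is not specified in the text, and the sequences and counts below are hypothetical.

```python
from collections import defaultdict

def damped_site_frequencies(sequence_counts, position):
    """Amino acid frequencies at one position, weighting each unique
    sequence by the fourth root of its read count."""
    weights = defaultdict(float)
    for seq, count in sequence_counts.items():
        weights[seq[position]] += count ** 0.25
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}, total

# Hypothetical loop sequences and read counts (not data from this work).
reads = {"FSYGNL": 10000, "HSVHGY": 16, "YSVHGY": 1}
freqs, n_damped = damped_site_frequencies(reads, position=0)
print(freqs, n_damped)
```

The returned total plays the role of the damped sequence count n used as the sample size in the statistical test.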

2.3.5 Affinity and Biophysical Properties

Binding affinities of evolved Gp2 molecules were determined by titrating target and evaluating binding using yeast surface display and flow cytometry [95]. For GαLysA0.3.3

Alexa Fluor 488 conjugated streptavidin was mixed equimolar with singly biotinylated lysozyme at 2 µM prior to titration and yeast labeling. Gp2 domains were produced in E. coli, using strain JE1(DE3) for wild-type or BL21(DE3) for other Gp2 domains, purified by metal affinity chromatography and reverse phase high performance liquid chromatography, and verified by mass spectrometry. Purified Gp2 was suspended at 0.2 to

0.9 mg/mL in PBS (8.0 g/L NaCl, 0.2 g/L KCl, 1.44 g/L Na2HPO4, 0.24 g/L KH2PO4, pH

7.2) or 10 mM sodium acetate pH 5.5, and secondary structure and thermal stability were evaluated by wavelength (200 to 260 nm) and temperature scans (25 to 98 °C at 218 nm) via circular dichroism spectroscopy. Binding affinity was further measured by equilibrium competition titration with purified Gp2, target protein, and Gp2 displayed on the yeast surface using flow cytometry [97]. For cellular EGFR affinity, A431 epidermoid carcinoma were cultured in Dulbecco’s modified Eagle’s medium with 10% fetal bovine serum at 37

ºC in humidified air with 5% CO2. Cells to be used in flow cytometry were detached using trypsin for a shorter time (3-5 min) than recommended. Detached cells were washed and labeled with Gp2-EGFR2.2.3 at varying concentrations for 15-30 min at 4 °C. Cells were pelleted and washed with PBSA (PBS + 0.1% w/v BSA), then labeled with fluorescein conjugated rabbit anti-His6 antibody for 15 min at 4 °C. For blocking, cells were also


labeled with biotin-Gp2-EGFR2.2.3 and detected by Alexa Fluor 647 conjugated

streptavidin. Fluorescence was analyzed on a C6 Accuri flow cytometer (BD Biosciences,

San Jose, CA). The equilibrium dissociation constant, KD, was identified by minimizing

the sum of squared errors assuming a 1:1 binding interaction.
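A minimal sketch of such a fit is given below. The text states only that KD was found by minimizing the sum of squared errors under a 1:1 binding model; the grid search over log-spaced KD values, the free background and amplitude terms, and the synthetic titration data are all assumptions made for illustration.

```python
def fit_kd(conc, signal, log10_min=-2.0, log10_max=4.0, steps=1200):
    """Grid-search KD (same units as conc) minimizing the sum of squared
    errors for signal = background + amplitude * conc / (conc + KD)."""
    best = None
    n = len(conc)
    for i in range(steps + 1):
        kd = 10 ** (log10_min + (log10_max - log10_min) * i / steps)
        x = [c / (c + kd) for c in conc]  # fraction bound for 1:1 binding
        # Closed-form linear least squares for background and amplitude.
        mx, my = sum(x) / n, sum(signal) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, signal))
        amplitude = sxy / sxx
        background = my - amplitude * mx
        sse = sum((background + amplitude * xi - yi) ** 2
                  for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, background, amplitude)
    return best[1], best[2], best[3]

# Synthetic titration (hypothetical values), generated with KD = 18 nM.
conc_nM = [0.3, 1, 3, 10, 30, 100, 300, 1000]
signal = [50 + 900 * c / (c + 18.0) for c in conc_nM]
kd, background, amplitude = fit_kd(conc_nM, signal)
print(f"fitted KD ~ {kd:.1f} nM")
```

This form assumes the target concentration is in excess of the displayed ligand, so that the free and total target concentrations are approximately equal.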

2.4. Results

2.4.1 Scaffold Discovery and Library Construction

The structural information in the Protein Data Bank was surveyed to locate potential

binding scaffolds. Based on the efficacy of diversifying loops at one end of a β-sandwich

in antibody domains and the fibronectin type III domain, we sought scaffolds with solvent-exposed loops on a β-sheet-rich framework. As two loops can provide sufficient diversity for high affinity binding while a single loop is often insufficient [97], we sought protein

topologies with two adjacent loops amenable to mutation. All proteins in the Protein Data

Bank were filtered to identify single domains of 40-65 amino acids and at least 30% β-

sheet content, although the Protein Data Bank filter also anomalously returned several

proteins as large as 70 amino acids and with only 25% β-sheet content. Resultant proteins

were visualized in PyMol. Those with two adjoining loops (regions of at least four amino

acids without defined secondary structure) and no disulfide bonds were retained. The

potential of the diversified loops to provide sufficient binding interface while maintaining

a folded protein was assessed via two metrics: (1) solvent accessible surface area of the

loop amino acids was calculated using serine as the side chain basis for all diversified sites

to avoid wild-type sequence bias; and (2) the average destabilization upon random

mutation of the loops was computed using Eris [93]. Multiple proteins exhibited favorable properties within several categories (Figure 2.1). One scaffold that performed well on all criteria, especially differentiated by loop orientation, surface area (709 Å^2, second highest), and stability (ΔΔGf,mut = 5.6 kcal/mol, third lowest), was selected for focused investigation:

T7 phage gene 2 protein [98].


Figure 2.1. Summary of information for top potential scaffolds. (A) Structure and proposed paratope surface (red) of top five scoring scaffolds. Images created in Pymol. (B) PDB ID, amino acid length, % β strand, % α helix, and number of disulfides are all from the Protein Data Bank. Loop residues are identified as residues between secondary structural elements. Loop accessible surface area (ASA) is calculated using GetArea after mutating loop residues to serines within PyMol. ΔΔGf,mut was calculated for random loop mutants using Eris. Fibronectin (1ttg) and Affibody (2b88) are included for comparison. Also see Table S1.1 for commonly used scaffolds.

T7 phage gene 2 protein, a 64 amino acid E. coli RNA polymerase inhibitor from T7 phage,

was the top protein selected by the ranking system (Figure 2.2). The coils at the N- and C-

termini (14 and 5 amino acids, respectively) were genetically removed to minimize protein

size. In addition, a framework mutation (I17V) was made on the basis of 59% frequency

in naturally occurring homologs and computational stabilization (ΔΔGf,IV = -3.8 kcal/mol via Eris [93]). These modifications resulted in the 5.2 kDa, 45 amino acid Gp2 scaffold.

Figure 2.2. Solution structure of Gp2. Diversified amino acids are highlighted in red and underlined in sequence. The N- and C- terminal tails removed to create the Gp2 scaffold are highlighted in blue. An I17V (boldface in sequence) mutation was added based on prevalence in homologous protein sequences. Images created in Macpymol (PDB ID: 2wnm).

To evaluate the potential of Gp2 as a protein scaffold for molecular recognition, a

combinatorial library of selectively randomized Gp2 sequences was constructed for use in

discovery and directed evolution. Two loops, comprising amino acids E7-S12 and V34-E39 (Figure 2.2), were selected as the proposed binding surface, based on their lack of defined secondary structure, continuous solvent-exposed face, and high solvent-exposed surface area (709 Å^2). A yeast surface displayed library, containing approximately 4 x 10^8

Gp2 molecules based on the number of yeast transformants, was constructed by

randomizing 6 amino acids from each loop using amino acid frequencies consistent with

antibody complementarity-determining regions (CDRs) [94], and allowing loop length

diversity of 6, 7, or 8 amino acids in each loop (Figure 2.2). Hemagglutinin (HA) and c-myc epitope tags placed upstream and downstream of the Gp2 gene, respectively, allowed for differentiation through flow cytometry of yeast that lost the plasmid (HA-/c-myc-), yeast

carrying a plasmid with incomplete Gp2 (HA+/c-myc-) and yeast expressing full-length

Gp2 (HA+/c-myc+). A quality control check by flow cytometry indicated that 44% of yeast

harboring plasmid expressed full-length Gp2. This was supported through sequencing Gp2

genes, resulting in four full-length Gp2 genes, one Gp2 containing a stop codon in the

diversified loop and five frame shifted genes. Therefore, the actual size of the library was

1.8 x 10^8 unique, full-length Gp2 proteins. Amino acid diversity matched the designed

distribution (median absolute deviation: 0.5%; Figure S2.1).
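The library size estimate follows from the transformant count and the full-length fraction, and the sequencing result gives an independent, small-sample check. The short calculation below uses only the numbers reported above; the variable names are illustrative.

```python
transformants = 4e8          # yeast transformants (from serial dilution)
full_length_fraction = 0.44  # HA+/c-myc+ fraction by flow cytometry

full_length_library = transformants * full_length_fraction

# Independent check from the ten sequenced clones reported above.
sequenced = {"full_length": 4, "stop_codon": 1, "frame_shift": 5}
sequencing_fraction = sequenced["full_length"] / sum(sequenced.values())

print(f"full-length library: {full_length_library:.2e}")
print(f"full-length fraction by sequencing: {sequencing_fraction:.0%}")
```

The product is approximately 1.8 x 10^8, and the 40% full-length fraction from ten sequenced clones agrees with the 44% measured by flow cytometry. The result is an upper bound on the number of unique sequences, since it assumes every transformant carries a distinct gene.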

2.4.2 Yeast Surface Display Selection Against Model Protein Targets


Lysozyme and immunoglobulin G (IgG) from rabbit and goat were chosen as the first

targets for evaluating the binding potential of the Gp2 scaffold. The differing size and

surface topologies of lysozyme (14 kDa) and the IgG proteins (150 kDa) test the scaffold’s

ability to target diverse proteins [99, 100]. The similarity of the two IgG proteins tests the

ability of Gp2 to create very specific ligands, i.e. binding to goat IgG but not rabbit IgG

and vice versa.

During each evolutionary round, the Gp2 library underwent two sorts to enrich specific binders (either with target-coated magnetic beads or through fluorescence activated cell

sorting (FACS)) and one c-myc+ sort to isolate full-length Gp2. Sorted Gp2 sequences were

mutated through parallel error-prone PCR reactions targeting the loops or the entire gene and transformed into yeast with loop shuffling driven by homologous recombination [94].

After a single maturation cycle, clear binding was evident by flow cytometry for each target

campaign (data not shown). Two to four rounds of selection and mutation were carried out

for each target to isolate binders with low-nanomolar to picomolar affinity.

Table 2.1. Characterization of wild-type Gp2 and the top binding molecule for each target. Loop 1 and Loop 2 indicate amino acid sequences in the diversified loops (E7-S12 and V34-E39). Framework indicates mutations outside of the loops resulting from error-prone PCR during evolution. Kd values represent equilibrium dissociation constants for clones displayed on the surface of yeast or as purified soluble protein (±SD for n ≥ 3). n.d. indicates data not determined. Tm indicates the midpoint of thermal denaturation as measured by circular dichroism (±SD for n = 2).

Name         Loop 1    Loop 2    Framework   Kd,yeast (nM)  Kd,soluble (nM)  Tm (ºC)

wild-type    ESSEHS    VPAGFE    -           -              -                67 ± 1

GαGIgG2.2.1  YDYDADYY  YSNHSDYL  E30V, Q32R  0.2 ± 0.1      0.4 ± 0.3        70 ± 4

GαLysA0.3.3  FSYGNL    SGAYEY    -           0.9 ± 0.7      n.d.             65 ± 3

GαRIgG3.2.3  HSVHGY    GNALGY    E30F, W31G  2.3 ± 1.4      1.0 ± 0.5        80 ± 1

Single clones were isolated at the end of rounds where strong binding was detectable by

FACS. Target affinity on the surface of yeast was determined by concentration titration

(Table 2.1 and Figure 2.3A). Affinities measured on the surface of yeast have been

previously shown to match affinities measured by surface plasmon resonance and related

techniques [101]. The goat IgG binding population contained one dominant sequence family that had affinities as strong as 200 ±100 pM after one mutagenic cycle. The rabbit

IgG binding population contained a family of low nanomolar affinity binders at the end of the third mutagenic cycle, which improved to 2.3 ±1.4 nM after another round of affinity maturation. A single dominant clone group was isolated for lysozyme binders after the second round of mutagenesis, albeit with weaker binding. Lysozyme-binding clones

isolated in subsequent rounds showed point mutations but no significant increase in binding

affinity. A sublibrary was created in which amino acid diversity was chosen based on

weighted scores of parental binding sequences, computational stability analysis (using

FoldX), natural homolog sequences, and potential for complementarity (Figure S2.2). The

strongest library member isolated was GαLysA0.3.3 with an affinity of 0.9 ±0.7 nM (Figure

2.3A). During selection and maturation, the lysozyme target was always displayed as a

complex with streptavidin. Resultantly, GαLysA0.3.3 bound to a streptavidin-lysozyme complex but not streptavidin or lysozyme alone. Nevertheless, this still supports Gp2 as able to generate specific, high affinity binders to the target presented.

Figure 2.3. Binding characterization. (A) Yeast displaying GαGIgG2.2.1 (red circles), GαRIgG3.2.3 (green triangles), and GαLysA0.3.3 (gray squares) were incubated with the indicated concentrations of biotinylated target. Binding was detected by streptavidin-fluorophore via flow cytometry. Titration indicates equilibrium dissociation constants of 0.2 ±0.1 nM, 1.5 ±0.6 nM, and 27 ±17 nM. (B) Yeast were incubated with 1 µM of goat IgG (G), lysozyme (L), rabbit IgG (R), or transferrin (T). Binding was detected by streptavidin-fluorophore via flow cytometry. Data are normalized to signal of intended target for each yeast strain. Error bars represent ±SD of n = 3 samples. See Figure S2.3 for full collection of affinity titration curves.

To examine target specificity, the strongest binding ligand from each target campaign was incubated with four proteins at 1 µM. All Gp2 proteins show strong binding to the target protein for which they were selected and matured, while showing fluorescence only slightly above background signal for the three other proteins (Figure 2.3B). Four depletion sorts were performed per round, consisting of two sorts to bare streptavidin coated beads and two sorts to non-target protein coated beads. This sorting process enabled only Gp2 binders to the protein of interest to be carried through the affinity maturation process.


2.4.3 Soluble Protein Characterization

To examine if untethered Gp2 will retain its useful properties, the protein was produced and tested for binding and stability. Wild-type Gp2 could not be produced at detectable levels in a typical production E. coli strain, likely due to the protein’s native RNA polymerase inhibition function [102]. However, mutated clones were able to be produced, presumably due to mutation of many of the residues on Gp2 normally required to interact with DNA during transcription inhibition [98]. Protein titer, with no optimization, varied between 0.2 mg/L and 2.2 mg/L for different Gp2 mutants. The JE1 E. coli strain has the

β-jaw portion of the E. coli RNA polymerase protein deleted [103] and enabled production of Gp2 wild-type at 0.7 mg/L. All Gp2 molecules were verified by mass spectrometry to within 0.1% of the expected mass.

Structure and stability after mutation are important qualities of a potential protein scaffold.

In order to evaluate secondary structure changes after Gp2 was evolved for new binding function, circular dichroism measurements were taken of the top binder for each target.

The ellipticity spectra of the binding Gp2 proteins deviate only slightly from the wild-type

Gp2, suggesting that the secondary structure remains relatively unchanged after multiple mutations in the backbone and entirely new loop regions (Figure 2.4A). The midpoint of thermal denaturation of each protein was measured by monitoring the loss of secondary structure at 218 nm as temperature increased (Table 2.1 and Figure 2.4B). Interestingly, the average stability of Gp2 binding mutants increased by 5 °C over Gp2 wild-type, which had a Tm = 67 ±1 °C. Gp2 mutants that were raised to 98 °C and brought down to room temperature had similar CD spectra as the initial measurements (Figure 2.4A). GαGIgG2.2.1


(containing a framework mutation, R44E, to increase production) maintained binding

activity (Figure 2.4D) after reverse phase high performance liquid chromatography and

heating to 98 °C, illustrating the chemical and thermal stability of the protein.

Figure 2.4. Soluble Gp2 characterization. (A,B) Purified Gp2 clones (blue: wild-type, red: GαG2.2.1, green: GαR3.2.3) were analyzed by circular dichroism spectroscopy. (A) Molar ellipticity (θ) was measured at indicated wavelengths before (solid lines) and after (dashed lines) thermal denaturation. (B) Molar ellipticity was monitored at 218 nm upon heating from 25 ºC to 98 ºC at 1 ºC per minute. (C) Biotinylated IgG (2 nM goat or 4 nM rabbit as appropriate) was incubated with the indicated concentration of purified Gp2 and used to label yeast displaying the corresponding Gp2 clone. IgG binding was measured by streptavidin-fluorophore via flow cytometry. (D) Purified GαG2.2.1 was subjected to various treatments prior to use in the competitive binding assay (as in (C)). Unblocked is a control without Gp2 competition. Pre-HPLC uses competition by GαG2.2.1 that was purified on a metal affinity column. Post-HPLC uses competition by GαG2.2.1 purified by metal affinity chromatography and reverse phase high performance liquid chromatography and lyophilization. Post-heat uses competition by GαG2.2.1 purified as in Post-HPLC along with heating to 98 ºC and cooling back to 22 ºC. Error bars are ±SD on n = 2 samples. See Figure S2.2 for GαLysA0.3.3.

The binding affinities of select soluble Gp2 clones, measured by equilibrium competition

titration, were 0.4 ±0.3 nM for GαGIgG2.2.1 and 1.0 ±0.5 nM for GαRIgG3.2.3 (Table 2.1

and Figure 2.4C), showing good agreement with the KD’s obtained from yeast surface

display affinity titration.

2.4.4 EGFR-Targeting Gp2 Domains


The success and ease of ligand discovery for the three model targets from the current Gp2

library motivated the search for a Gp2 ligand that strongly and specifically bound EGFR

for use in cancer imaging or treatment. Through similar methods as the model targets, Gp2

ligand discovery was performed using soluble biotinylated EGFR ectodomain as the target.

After the second round of affinity maturation, strong binding was evident by flow

cytometry with 50 nM soluble EGFR (Figure 2.5A). Deep sequencing indicated that

GαEGFR2.2.3 was the dominant clone in the library and revealed several other families of

sequences that were enriched during evolution. Purified Gp2-EGFR2.2.3 binds to intact

membrane-bound EGFR with an affinity of 18 ±8 nM as measured by labeling A431

epidermoid carcinoma cells (Figure 2.5B) that highly overexpress EGFR on their surface

[104, 105]. Gp2-EGFR2.2.3 exhibits secondary structure consistent with the wild-type Gp2

domain (Figure 2.4C) and is thermally stable (Tm = 71 ±2 ºC, Figure 2.5C). Fluorescence

microscopy was used to evaluate binding specificity. GαEGFR2.2.3, but not wild-type Gp2, effectively labels the cell surface of EGFRhigh A431 epidermoid carcinoma cells but not

EGFRlow MCF7 mammary carcinoma cells (Figure 2.5D). Moreover, binding of

biotinylated GαEGFR2.2.3 was blockable by excess unlabeled Gp2 (Figure 2.5E). Gp2

biotinylation was straightforward (with N-hydroxy succinimide biotin) due to the lone

lysine residue at the N-terminus. To further validate that the evolved Gp2 domain can

differentiate between different EGFR levels, four cell lines, A431 epidermoid carcinoma

(EGFRhigh), MDA-MB-231 mammary carcinoma (EGFRmid), DU145 prostate carcinoma

(EGFRmid), and MCF7 mammary carcinoma (EGFRlow) were labeled with GαEGFR2.2.3


and a secondary fluorophore. Fluorescence signal correlated with cell surface expression

level of EGFR (Figure 2.5F).

Figure 2.5. GαEGFR2.2.3 affinity and stability. (A) Population of yeast displaying Gp2 incubated with 50 nM EGFR and anti-c-myc antibody. Secondary fluorophores detect binding of EGFR and antibody. Spread of double positive cells suggests moderate diversity of EGFR binding Gp2 molecules. GαEGFR2.2.3 was collected from top 1% of double positives. (B) A431 cells were incubated with indicated amount of purified Gp2. Binding was detected by anti-His6 fluorophore. Titration indicates an equilibrium dissociation constant of 18 ± 8 nM. (C) Molar ellipticity was monitored at 218 nm upon heating from 25 ºC to 98 ºC at 1 ºC per minute. Molar ellipticity before (solid) and after (dashed) thermal denaturation is inset. (D) Fluorescence microscopy of adhered cancer cell lines, incubated with 100 nM Gp2 protein and detected by anti-His6-fluorescein. A431 cells labeled with GαEGFR2.2.3 (left column) show localization to cell surface. A431 cells labeled with Gp2 WT (middle column) and MCF7 cells labeled with GαEGFR2.2.3 (lower column) show very little detectable binding. Scale bar represents 200 µm. (E) A431 cells preincubated with PBS or 1 µM GαEGFR2.2.3 and labeled with 200 nM biotinylated GαEGFR2.2.3. Binding detected with streptavidin fluorophore. (F) Cancer cell lines expressing varying levels of EGFR: MCF7 with 2 x 10^4 receptors (gray line) [106], MDA-MB-231 with 1 x 10^5 receptors (red dot) [106], DU145 with 2 x 10^5 receptors (black dash) [107], and A431 with 3 x 10^6 receptors (green dash) [105] were incubated with 1 µM GαEGFR2.2.3 and detected with anti-His6-fluorescein by flow cytometry.

2.4.5 Deep Sequencing of Naïve and Binding Populations

Deep sequencing was performed on the original library and the first enriched population in

each target campaign that displayed significant selective binding represented by a 10:1

ratio of cells on target beads to control beads. Multiple binding sequence families were

identified for each target (4,209 unique sequences in 153 families identified by 85%

sequence identity). Sitewise changes in amino acid frequencies, relative to the original

library, for heavily diversified loops (Figure 2.6A) and framework sites subject to error-prone PCR (Figure 2.6B) reveal site-specific amino acid preferences. Additionally, loop length frequency shows a slight change from the initial library to the evolved binders

(Figure 2.6C).

Figure 2.6. Deep sequencing comparison of naïve and binding populations. All sequences were grouped into families and damped. (A) Diversified loop positions (CDR′) are shown at the top, with 9a and 9b, or 36a and 36b, representing loop length diversity positions. Positions with an absolute change of 5% or greater are labeled. All labeled positions have p < 0.001 (n ≥ 96, number of damped sequences). (B) Relative change in amino acid frequency by framework position from the naïve library to combined mature populations. Certain mutations are more common than others due to error-prone PCR limitations. Positions where wild type was conserved (#) or mutated (*) significantly more often (p < 0.001 for 3% deviation, n ≥ 440) than the mean mutation rate are denoted. (C) Frequency of Gp2 loop length in amino acids. The naïve library was designed with equal loop length frequencies, but DNA bias during construction led to overrepresentation of shorter loops. Mature combines high throughput sequencing from the four binding populations. All have p < 0.001. Error bars are ±SD for n = 4.9 x 10^4 for naïve and n ≥ 477 for mature (number of damped sequences). See also Figure S2.1.

2.5 Discussion

We sought to identify a protein scaffold with a unique combination of small size and robust

evolvability of specific, high affinity binding while retaining stability. We also sought to

evaluate the ability to identify such a scaffold via systematic analysis of protein topology

and an estimation of mutational tolerance. Gp2 provides very small size (5 kDa, 45 amino

acids) and has been efficiently evolved, to four different targets, to picomolar to nanomolar

affinities with good specificity including strong discrimination between related IgGs.

Evolved molecules exhibit good thermal and chemical stability (Tm = 65 - 80 ºC with

reversible unfolding and tolerance to reverse phase chromatography) and are functional in

cellular assays. Gp2 succeeded with the combination of an effective topology,

diversification design (tolerant loops of sufficient size, antibody-inspired amino acid bias,

and shape diversity via loop length variation), and evolution method.

The particular topology of an α-helix opposite a three-strand β-sheet underpinning two diversified loops is a new structure for a generalized binding scaffold. Of course, loops have been successful with numerous other topologies including – as examples rather than an all-inclusive list – antibodies, fibronectin domains, and green fluorescent protein [108].

Yet not all loop-presenting frameworks are equally efficacious as evidenced by varying

performance of different scaffolds [109]. Of note here, the second candidate in our

theoretical scaffold evaluation – an SH3 domain from CD2 associated protein [110] – has

substantially different framework topology and loop orientation (Figure 2.1). This SH3

domain is a homolog (RMSD = 1.4 Å and TM-score = 0.79, where greater than 0.5

indicates similar fold [111]) of the Fynomer scaffold, which has been evolved for low

nanomolar affinity to five targets (Table S2.1) [36, 112]. While the scaffold evaluation

algorithm provides a framework for comparison, comparative experimental evaluation of

numerous scaffolds is beyond the scope of the current study. Importantly, the demonstrated

efficacy of Gp2 provides evidence of an improved ability to merge design goals in

evolutionary capacity, affinity, and stability if the topology is properly selected.

Regarding evolutionary library design, while the diversified loops provided sufficient diversity for high affinity binding, adjacent framework mutations in several clones were noted (Table 1). Deep sequencing suggests that added diversity at the N-terminal edge of the second loop could be beneficial as E30, W31, and Q32 exhibited higher than average variance in binders relative to the naïve library (p < 0.001), presumably resulting from error-prone PCR during evolution (Figure 2.6B). In addition, K1 also exhibits an increased mutation rate (p=0.006), perhaps to account for change in exposure or structure due to the removal of the N-terminal tail. Conversely, lower than average mutation rates of sites F2,

A4, P16, A25, Y33, and V40 (p < 0.001) suggest an evolutionary benefit to conservation at those sites. Notably, Y33 and V40 directly flank the diversified region of the second loop and are relatively buried in the protein core (19% and 21% solvent accessibility, respectively), suggesting a loop-anchoring benefit. Overall, mutation tolerance correlated with solvent accessibility (p=0.001, Figure S2.1). Antibody-inspired amino acid distribution was effective as it was closely maintained in binding sequences (Figure 2.6A) although evolutionary efficiency would benefit from slightly increased , especially at select sites: amino acids 9-11 in the middle of the first loop and 34-35 at the start of the

second loop. Meanwhile, evolutionary preference was evident for hydrophilic residues at

sites 10 and 36 and hydrophobic residues at sites 12, 37, and 38. Loop length diversity was

effectively utilized in binding sequences with a slight trend towards a longer first loop

(18% decrease of wild-type 6-amino acid length and 33% increase of 8 amino acid loop; p

< 0.001) and a second loop of wild-type length (12% increase; p < 0.001) (Figure 2.6C).

Collectively, the diversification design was effective yet represents an avenue for

improvement. While the Gp2 scaffold was robust in efficiently identifying effective

binding molecules for each of the four disparate targets, one possible limitation is the

modest number of lead molecules for each target. As scaffold size is reduced, the need to

simultaneously optimize inter- and intra-molecular interactions is heightened, and the

solution space is diminished. While all evolved clones exhibited appropriate secondary

structure, good thermal stability, and effective binding, the frequency of mutational

tolerance of the naïve library could be different. In fact, evolved proteins were readily produced in E. coli, albeit at moderate yields, and eight randomly selected initial library mutants were unable to be recovered in detectable amounts (data not shown), suggesting lack of solubility or stability in some naïve mutants. Refining the combinatorial library using sitewise optimization of diversity based on the aforementioned data could improve the breadth of solutions.

Effective lead ligands, such as GαEGFR2.2.3, which binds cellular EGFR with 18 ± 8 nM

affinity and has high thermal stability (71 °C), were identified. GαEGFR2.2.3 was effective

in cellular labeling for immunofluorescence, flow cytometry, and binding inhibition.

Notably, GαEGFR2.2.3 contains a single lysine at the N-terminus, distal to the proposed

binding paratope, for conjugation of imaging moieties or molecules of therapeutic interest.

The conjugation of biotin to the molecule without strong inhibition of binding suggests that

other molecules can similarly be conjugated. Similar small proteins have shown advantages

over antibodies in molecular imaging of solid tumors [25, 26], where enhanced transport

and clearance afforded by small size are particularly valuable, and have exhibited clinical

potential [113]. It should be noted that, as with any synthetically engineered protein with

non-human sequence components, immunogenicity of evolved molecules will need to be evaluated. Overall, Gp2 molecules represent a promising alternative with a unique blend of small size and robust evolution of stable, high affinity binders.

2.6 Significance

A systematic evaluation of the Protein Data Bank based on size, protein secondary

structure, paratope structure, and resilience to mutation led to further investigation of the

small Gp2 domain. The ability to discover and evolve novel binding function on mutated

Gp2 molecules is generalizable, with four targets currently tested. The small size, retention

of thermal stability, and ease of evolving high affinity binding provide a unique and

powerful combination in an alternative scaffold. The Gp2 molecules reported here that bind

strongly and specifically to rabbit IgG, goat IgG, lysozyme, and EGFR are of immediate

use in biotechnology or, in the case of GαEGFR2.2.3, as a potential clinical imaging

agent due to its single distal lysine available for conjugation of small molecules or other

moieties. Moreover, deep sequencing of the binding populations provided insights on Gp2

evolution including mutational tolerance at non-loop positions and amino acid preference

at individual loop positions. Overall, the initial successes of Gp2 combined with knowledge

gained from sequence analysis suggest a promising scaffold that can be used to discover

small, stable binders to many targets while providing an opportunity for ongoing study of

evolutionary optimization of inter- and intra-molecular interactions in a small scaffold.

2.7 Acknowledgements

We are grateful to Dr. Sivaramesh Wigneshweraraj for the JE1 E. coli strain and Aaron

Becker at the University of Minnesota Genomics Center for assistance with Illumina sequencing. This work was partially funded by the Department of Defense (Grant

W81XWH-13-1-0471 to B.J.H.), the National Institutes of Health (Grant EB019518 to

B.J.H.), and the University of Minnesota.

2.8 Supplemental Data

Figure S2.1. Related to Figure 2.6. High throughput analysis of Gp2 library and evolved sequences. (A) Naïve library amino acid frequency determined by high throughput sequencing (n = 6.7 x 10^5). The theoretical aim of mimicking antibody CDRs with reduced glycine was closely matched by the experimental frequency. Nucleotide mixtures at each position in a codon are inset. (B) The relative observed mutation rate in evolved binders for each framework site (1-6, 13-33, and 40-45) is plotted versus the relative solvent accessible surface area based on the wild-type structure.

Figure S2.2. Related to Figure 2.4. Affinity maturation and characterization of GαLysA0.3.3. (A) Amino acid diversity of Gp2 library based on dominant Gp2-lysozyme binder found in high throughput sequencing. (B) GαLysA0.3.3 was analyzed by circular dichroism spectroscopy. Molar ellipticity at indicated wavelengths before (solid) and after (dashed) thermal denaturation. (C) Molar ellipticity was monitored at 218 nm upon heating from 25 ºC to 98 ºC at 1 ºC per minute.

Figure S2.3. Related to Figure 2.3. Collection of affinity titration curve replicates used in determining yeast based affinity.

Table S2.1. Related to Figure 2.1. Brief summary of scaffolds reported in literature that have been used to evolve novel binding activity. For a recent, in-depth review see [32].

Scaffold | Size (amino acids) | Wild type Tm (°C) | Evolved Tm (Median) (°C) | Targets | Affinity (Median) (nM) | Discovery Methods | Source
bicyclics | 17 (2-3SS) | n.d. | n.d. | plasma kallikrein, cathepsin G, uPA | 0.3-157 (99) | phage display | a
knottin | 20-50 (3SS) | ~100 | n.d. | Thrombin, αvβ3 integrin, αvβ6 integrin, AntiEBV Antibody, HIV gp120, Dll4-Fc | 0.78-330 (9) | grafting and phage/yeast display | b
Gp2 | 45-49 | 67 | 65-80 (71) | goat IgG, rabbit IgG, lysozyme, EGFR | 0.2-18 (1.6) | yeast display | This work
Albumin Binding Domain | 46 | ~80 | ~60 | ErbB3, Z2 domain, TNFα | 4-400 (267) | phage display | c
Affibody | 58 | 72 | 37-65 (46) | SPA, DNA Polymerase, IgA, HER2/neu, HIV-I gp120, CD25, factor VIII, IgA, HER2, EGFR, PDGFR, amyloid β, TNF-α, ErbB3, hCD28, IGF-1, VEGFR2 | 0.2-8500 (15) | phage display, bacterial display | d
sso7d | 63 | 98 | 75-102 (93) | Lysozyme, β-Catenin peptide, Fluorescein, cIgY, Streptavidin, mIgG, hIgG, Red clover necrotic mosaic virus | 12-7600 (225) | yeast display | e
Fynomer | 63 | 71 | n.d. | extradomain B fibronectin, mouse serum albumin, chymase, BACE-2, IL-17A | 0.9-85 (6) | phage display | f
Affitin | 66 | 90 | 66-90 (80) | PulD, hIgG, lysozyme, CelD | 0.14-48 (23) | ribosome display | g
Alphabody | 70-100 | n.d. | n.d. | IL-23 | 3.8 | phage display | h
 | 76 | n.d. | n.d. | TNFα | 180 | ribosome display | i
fibronectin / Adnectin / monobody | 94 | 86 | 42-73 (57) | TNF-α, hSUMO1, AblSH2, GFP, Erk-2, mAbs, SH2D5, Grb7, Syk, lysozyme, αvβ3, EGFR, VEGFR2, SUMO, MBP, Src SH3, Phospho-IκBα, ARVCF Peptide | 0.02-250 (12) | mRNA display, phage display, yeast display | j
Avimer | 70-120 | n.d. | n.d. | cMet, CD40L, IL-6, CD28 | 0.1-0.2 (0.1) | phage display | k
Nanobody (VHH) | ~100 | n.d. | 60-78 (65) | EGFR, F4 fimbriae, Hen lysozyme, Human lysozyme, NmcA, BcII, TEM-1, Azo dye RR6 | 0.04-20 (1.6) | llama immunization and phage display | l
Obodies | 111 | n.d. | 66-81 (72) | lysozyme | 3 | phage display | m
Anticalin | 133-178 | 79 | 53-73 (68) | hemoglobin, fluorescein, streptavidin, digoxigenin, nonsymmetric phthalic acid ester | 4-6000 (35) | | n
DARPin | 130-200 | >90 | 66-85 (79) | BCL-2, BCL-XL, BCL-W, MCL-1, ERK2, JNK1α1, JNK2α1, pERK2, HER2, EpCAM | 0.02-117 (1.0) | ribosome display | o

a [42–45]; b [83, 114–118]; c [119–121]; d [71, 76, 122–137]; e [85, 138, 139]; f [34–38]; g [39–41]; h [140]; i [141]; j [94, 97, 142–151]; k [33]; l [152–155]; m [156]; n [157–161]; o [162–167]

2.8.1 Supplemental Experimental Procedures

Library Construction

A genetic library was constructed based on a truncated form of T7 phage gene 2 protein,

the top scoring protein, in which the sequence encoding for two loops was randomized

using degenerate oligonucleotides encoding for an amino acid distribution mimicking

antibody CDRs [94]. Naïve library design and construction was carried out as described

previously [94]. Degenerate nucleotides (Table below), at diversified positions (sites 7-12

and 34-39, Figure 2.1), had 15% A and C, 25% G, and 45% T in the first position; 45% A,

15% C, 25% G, and 15% T in the second position; and 45% C, 10% G, and 45% T in the

third position. This mixture of nucleotides resulted in a theoretical amino acid composition

of 17% Y, 13% S, 11% D, 9% N, 6% A and H, 5% C and T, 4% G and P, 3% F, R, V, and

Z, 2% L, I, E and K, 1% Q and W, and <1% M. The design also included three loop lengths

in each loop: six (wild-type length), seven, or eight amino acids. Overlap extension

reactions of eight oligonucleotides were carried out separately to create full-length Gp2 genes and avoid shorter loop length bias. Gene reactions were combined and transformed into EBY100 yeast using homologous recombination with linearized yeast surface display pCT vector [95].

Degenerate oligonucleotides were used to create a diversified library heavily based on the parental lysozyme-binding clone found in high throughput sequencing. The amino acid composition at each residue was determined by weighing composition in previous binding sequences and natural homolog sequences, computational stability, and potential for complementarity. Previous binding sequences were counted as one family if they differed

by two or fewer amino acids in the loop region, and the frequency of each family member

was square rooted to avoid over counting dominant members. Running the NCBI BLAST

online tool of the full length Gp2 gene sequence returned 32 natural homologous genes.

These were translated and aligned in Clustal W2, and the frequency of amino acids at each

position was counted.
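The family grouping and square-root damping described above can be sketched in Python. This is a minimal illustration rather than the analysis code used in this work: the greedy, seed-based grouping strategy, the function names, and the example loop sequences and counts are all assumptions made for the sketch.

```python
from math import sqrt

def hamming(a, b):
    """Count mismatched positions; unequal lengths are treated as fully different."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))

def group_families(counts, max_diff=2):
    """Greedy grouping (an assumed strategy): visit sequences from most to
    least abundant and add each to the first family whose seed sequence
    differs by max_diff or fewer amino acids; otherwise start a new family."""
    families = []
    for seq in sorted(counts, key=counts.get, reverse=True):
        for fam in families:
            if hamming(seq, fam[0]) <= max_diff:
                fam.append(seq)
                break
        else:
            families.append([seq])
    return families

def damped_weights(counts):
    """Square-root damping so that dominant members are not over-counted."""
    return {seq: sqrt(n) for seq, n in counts.items()}

# Hypothetical loop sequences and read counts, for illustration only.
example_counts = {"YSDNAH": 100, "YSDNAY": 4, "GGHWPL": 9}
example_families = group_families(example_counts)
example_weights = damped_weights(example_counts)
```

With these hypothetical counts, the two sequences that differ at a single loop position fall into one family, and a 25-fold difference in raw counts (100 versus 4) shrinks to a 5-fold difference in damped weight.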

Computational stability of each loop site was calculated using the FoldX force field

[168]. Four Gp2 structures from the PDB (2WNM, 2LMC, 4LK0, and 4LLG) had their

loop regions randomized and then relaxed. For each random loop region, each loop position

was mutated, independently, to the 19 other amino acids and the change in stability was

calculated. The number of random loop sequences evaluated depended on how long each

amino acid at each site took to converge to an average stability (between 40-50 depending

on the structure). Stability was converted to a frequency by normalizing between 0 and 1

using S = exp(-ΔΔG/kT), setting kT = 1, and dividing by the sum at that position.

Complementarity was based on the CDRs of antibodies; the frequencies were set equal to

the initial diversity of the CDR-like diversification scheme. Frequency in previous binders was weighted heavily to ensure that the parental amino acid was present in the design.

Natural sequences, complementarity, and stability were chosen to have a ratio of 1:2:16, respectively, based on the relative confidence and impact of each category. The final weights were 48%, 2.5%, 5%, and 40% for previous binders, natural homologs, complementarity, and stability, respectively. Possible amino acid combinations arising from degenerate codons were scored at each position. The design (Table S2.2) that remained below approximately 6 x 10^6 members with the highest product of position scores was chosen, using solvent accessible surface to break ties (giving more diversity for a more

exposed position).
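The conversion of computed stability changes into per-site frequencies, S = exp(-ΔΔG/kT) with kT = 1 followed by division by the sum at that position, can be sketched as follows. The function name and the example ΔΔG values are hypothetical; only the formula itself is taken from the text.

```python
from math import exp

def stability_to_frequency(ddg_by_aa, kT=1.0):
    """Boltzmann-style weighting: S = exp(-ddG / kT), normalized so the
    values at one position sum to 1. ddG must be in the same units as kT."""
    weights = {aa: exp(-ddg / kT) for aa, ddg in ddg_by_aa.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

# Hypothetical stability changes at a single loop position (not FoldX output).
example_ddg = {"A": 0.0, "G": 1.0, "Y": -0.5}
example_freqs = stability_to_frequency(example_ddg)
```

Stabilizing substitutions (negative ΔΔG) receive the largest share of the frequency, while destabilizing ones are suppressed exponentially rather than excluded outright.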

Table S2.2. Oligonucleotide sequences used in current study. xyz codons represent CDR′ diversity.

Sequence | Description
TGGTGGTTCTGCTAGCAAATTTTGGGCGACTGTACCATGGATCTGCCCCGGGATGTACCA | Naïve library construction: vector
GCTTTTGTTCGGATCCCGGACGCACGCGGGTCACCATATGGTACATCCCGGGGCAGATCC | Naïve library construction: vector
TAGCAAATTTTGGGCGACTGTAxyzxyzxyzxyzxyzxyzTTCGAGGTTCCGGTTTATGCT | Naïve library construction: loop 1, length 6
TAGCAAATTTTGGGCGACTGTAxyzxyzxyzxyzxyzxyzxyzTTCGAGGTTCCGGTTTATGCT | Naïve library construction: loop 1, length 7
TAGCAAATTTTGGGCGACTGTAxyzxyzxyzxyzxyzxyzxyzxyzTTCGAGGTTCCGGTTTATGCT | Naïve library construction: loop 1, length 8
GAACTGGCCGAATGGCAGTACxyzxyzxyzxyzxyzxyzGTGACCCGCGTGCGTCCGGGAT | Naïve library construction: loop 2, length 6
GAACTGGCCGAATGGCAGTACxyzxyzxyzxyzxyzxyzxyzGTGACCCGCGTGCGTCCGGGAT | Naïve library construction: loop 2, length 7
GAACTGGCCGAATGGCAGTACxyzxyzxyzxyzxyzxyzxyzxyzGTGACCCGCGTGCGTCCGGGAT | Naïve library construction: loop 2, length 8
GTGGTGGTTCTGCTAGCAAATTTTGGGCGACTGTA | Naïve library construction: framework and loop 1 mutagenesis
CTGCCATTCGGCCAGTTCCAGTGCTTCGTCCAGGGTTTCAGCATAAACCGGAACCTCGAA | Naïve library construction: framework and loop 1 mutagenesis
GAGGTTCCGGTTTATGCTGAAACCCTGGACGAAGCACTGGAACTGGCCGAATGGCAGTAC | Naïve library construction: framework and loop 2 mutagenesis
AGCTTTTGTTCGGATCCCGGACGCACGCGGGTCAC | Naïve library construction: framework and loop 2 mutagenesis
CGACGATTGAAGGTAGATACCCATACGACGTTCCAGACTACGCTCTGCAG | Full gene mutagenesis and plasmid transfer
ATCTCGAGCTATTACAAGTCCTCTTCAGAAATAAGCTTTTGTTCGGATCC | Full gene mutagenesis and plasmid transfer
AGCAAATTTTGGGCGACTGTAGAATCCTCTGAACACAGCTTCGAGGTTCCGGTTTATGCT | Wild-type loop 1 construction
CGGACGCACGCGGGTCACTTCGAAACCAGCCGGAACGTACTGCCATTCGGCCAGTTC | Wild-type loop 2 construction
TAGCAAATTTTGGGCGACTGTATHCTYGYMTKRCARMMKCTTCGAGGTTCCGGTTTATGCT | Lysozyme designer library construction: loop 1
GAAGCACTGGAACTGGCCGAATGGCAGTACTHTSSTRMAKRTGVAYRTGTGACCCGCGTGCGTCCGGGAT | Lysozyme designer library construction: loop 2
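As an illustration of how degenerate codons translate into amino acid diversity, the following Python sketch expands IUPAC-coded codons with the standard genetic code and multiplies the per-site diversities. The codon lists are read from the two lysozyme designer library oligonucleotides in the table above, assuming the degenerate region begins directly after the constant GTA (loop 1) and TAC (loop 2) prefixes, as it does for the xyz codons in the naïve library oligonucleotides; that reading frame, and all function and variable names, are assumptions of this sketch.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand_codon(degenerate):
    """Return the set of amino acids ('*' = stop) encoded by a degenerate codon."""
    options = (IUPAC[base] for base in degenerate)
    return {CODON_TABLE["".join(codon)] for codon in product(*options)}

def library_size(codons):
    """Theoretical amino acid diversity: product of per-site diversities."""
    size = 1
    for codon in codons:
        size *= len(expand_codon(codon))
    return size

# Degenerate codons as read from the designer library oligonucleotides
# (reading frame is an assumption, see the note above).
LOOP1 = ["THC", "TYG", "YMT", "KRC", "ARM", "MKC"]
LOOP2 = ["THT", "SST", "RMA", "KRT", "GVA", "YRT"]
designed_size = library_size(LOOP1 + LOOP2)
```

Under that reading, the first codon (THC) encodes F, S, or Y, and the twelve sites together give a theoretical diversity of about 3.5 x 10^6, consistent with the design ceiling of approximately 6 x 10^6 members.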

Binder Selection and Affinity Maturation

Selection and maturation of yeast was performed largely following a previously outlined protocol [50]. Briefly, yeast were grown to logarithmic phase in SD-CAA at 30 °C, pelleted and resuspended to 1-3 x 10^7 cells/mL in SG-CAA (0.1 M sodium phosphate, pH 6.0, 6.7 g/L yeast nitrogen base, 5 g/L casamino acids, 19 g/L galactose, 1 g/L glucose) and grown for 8-24 h at 30 °C to induce protein expression. One round of magnetic sorting consisted of two negative selections, one for clones that do not bind streptavidin-coated magnetic beads, and one for clones that do not bind biotinylated control protein (either goat IgG, rabbit IgG, transferrin, or lysozyme) followed by a positive selection for clones that bind biotinylated target protein conjugated to streptavidin-coated magnetic beads. A round of flow cytometry sorting consisted of labeling the induced yeast library with 0.25 mg/L mouse anti-c-myc antibody (clone 9E10) and biotinylated target protein (concentration based on expected population binding strength), followed by fluorescent secondary labeling with fluorescein-conjugated goat anti-mouse antibody and Alexa Fluor 647-conjugated streptavidin. If clones were positive for target binding, represented by positive Alexa Fluor 647, and contained full length Gp2, represented by positive fluorescein signal, they were collected via fluorescence activated cell sorting (FACS). If no target binding was detected, all full length Gp2 clones were collected via FACS. The naïve library underwent two rounds of magnetic sorts (one wash at 4 °C, three washes at 22 °C) and one flow cytometry sort. After the flow cytometry sort, plasmid DNA was extracted, mutagenized by error-prone PCR [169], using nucleotide analogs 2'-Deoxy-P-nucleoside-5'-Triphosphate and 8-Oxo-2'-deoxyguanosine-5'-Triphosphate (Trilink, San Diego, CA), of the full gene or the diversified loops (both done in parallel [96]), and retransformed into yeast (Table S2.2). Mutagenized populations underwent two rounds of magnetic sorting (both using three washes at 22 °C) and one flow cytometry sort. When dilution plating of the individual magnetic bead sorts indicated 5-10 fold more yeast collected during positive selection than yeast bound to the negative control protein conjugated beads, only one magnetic bead round (three washes at 22 °C) and one flow cytometry round were done for each mutagenized population.

To isolate the strongest affinity ligands, FACS was used as above. However, the collection gate was drawn so that only cells whose binding/display signal was in the top 1% were collected. These cells were grown to saturation in SD-CAA medium and zymoprepped

(Zymo Research Corp., Irvine, CA) to isolate plasmid DNA. Single plasmids were obtained by transforming 1 – 5 µL of DNA into E. coli DH5α. Single colonies were grown in LB

medium (tryptone (10 g/L), yeast extract (5 g/L), and sodium chloride (10 g/L)) containing ampicillin (100 mg/L) and miniprepped (Epoch Biolabs, Inc., Sugar Land, TX). Purified

DNA was sent for sequencing to Eurofins Genomics (Huntsville, AL).

The designed library based on the dominant clones from the Gp2-lysozyme Illumina sequencing was transformed into yeast and sorted with one normal bead sort round followed by FACS two times to isolate the strongest binders, as above.

Gp2 Production

The Gp2 gene was removed from the yeast surface display vector using NheI and BamHI restriction enzymes and ligated into a pET vector containing a C-terminal His6 tag

(Novagen, EMD Millipore, Billerica, MA). The plasmid was transformed either into

BL21(DE3) or JE1(DE3) (kind gift of Dr. Sivaramesh Wigneshweraraj) E. coli and grown in LB medium containing kanamycin (50 mg/L). One liter of LB medium with kanamycin was inoculated with 5 mL of overnight culture, grown at 37 ºC to an optical density at 600 nm of 0.6-1.5 units, and induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside for 20-

24 hours at 30 °C. Cells were pelleted, resuspended in 10 mL of lysis buffer (50 mM sodium phosphate (pH 8.0), 0.5 M NaCl, 5% glycerol, 5 mM 3-[(3-cholamidopropyl) dimethylammonio]-1-propanesulfonate, and 25 mM imidazole), and underwent four freeze-thaw cycles. The soluble fraction was isolated by centrifugation at 12,000 g for 10 min, and Gp2 was purified by metal affinity chromatography on a HisPur resin (Pierce,

Thermo Fisher Scientific, Rockford, IL) and by reverse phase high performance liquid chromatography with a C18 column using a 15 minute gradient of 10% solvent B (90%

acetonitrile, 9.9% water, 0.1% trifluoroacetic acid) to 90% solvent B with the remainder

composed of solvent A (99.9% water, 0.1% trifluoroacetic acid).

Affinity Measurement

The plasmid containing the clone of interest was transformed into yeast using the Frozen

EZ Transformation Kit II (Zymo Research Corp., Irvine, CA), plated on SD medium plates

and then grown directly in SG medium at 30 °C for at least 12 h to induce protein

expression. Cells were pelleted and washed with PBSA (8.0 g/L NaCl, 0.2 g/L KCl, 1.44

g/L Na2HPO4, 0.24 g/L KH2PO4, pH 7.2, 1.0 g/L bovine serum albumin) and resuspended

in PBSA containing biotin conjugated target over a range of concentrations. Sample

volumes and cell densities were selected to ensure at least a 15-fold excess of target to displayed Gp2. The samples were incubated at 22 °C for an appropriate amount of time to reach 90% of the approach to equilibrium. After incubation, cells were washed and labeled with mouse anti-c-myc antibody (clone 9E10) for 10-15 min, washed, and labeled with

fluorescein-conjugated goat anti-mouse antibody and Alexa Fluor 647-conjugated

streptavidin for 10-15 min. Yeast were washed and Alexa Fluor 647 fluorescence was

analyzed on a FACS Calibur flow cytometer (BD Biosciences, San Jose, CA). For

GαLysA0.3.3, Alexa Fluor 488 conjugated streptavidin was premixed at equimolar

concentration with singly biotinylated lysozyme (verified by mass spectrometry) at 2 µM

and then diluted for use during the target labeling step. Alexa Fluor 647 conjugated goat

anti-mouse antibody was used to measure 9E10 binding in the second labeling step.

Binding was determined by measuring Alexa Fluor 488 on a flow cytometer. The relative

fraction bound for cells displaying Gp2 was determined by subtracting background

fluorescence of an unlabeled control and normalizing to the maximum saturated signal at high concentrations. The equilibrium dissociation constant, KD, was identified as the concentration where half-maximal binding occurred.
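The background subtraction, normalization, and identification of KD described above can be sketched in Python. This is an illustrative sketch under a 1:1 binding model, in which the fraction bound is L/(L + KD) and half-maximal binding occurs at L = KD; the log-spaced grid search, the function names, and the synthetic titration data are assumptions and do not represent the fitting routine used in this work.

```python
def normalize(signals, background, saturated):
    """Subtract background and scale to the saturated signal (fraction bound)."""
    return [(s - background) / (saturated - background) for s in signals]

def fraction_bound(conc, kd):
    """1:1 binding isotherm with target in excess over displayed ligand."""
    return conc / (conc + kd)

def fit_kd(concs, fractions, lo=1e-3, hi=1e4, steps=7000):
    """Least-squares KD by log-spaced grid search; units follow `concs`."""
    best_kd, best_err = None, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        err = sum((fraction_bound(c, kd) - f) ** 2
                  for c, f in zip(concs, fractions))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd

# Synthetic titration (hypothetical values): concentrations in nM and
# fluorescence generated from a true KD of 18 nM with no noise.
concs_nM = [0.1, 1.0, 10.0, 100.0, 1000.0]
background, saturated = 50.0, 1050.0
signals = [background + (saturated - background) * c / (c + 18.0)
           for c in concs_nM]
kd_est = fit_kd(concs_nM, normalize(signals, background, saturated))
```

Fitting all titration points, rather than interpolating only around the half-maximal signal, makes the estimate less sensitive to noise in any single sample.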

Soluble affinity of select Gp2 proteins was determined using equilibrium competition titration, as described previously [97]. Briefly, the soluble Gp2 clone was allowed sufficient time to reach 90% equilibrium with biotin-conjugated target over a range of Gp2 concentrations. Yeast cells displaying the same Gp2 clone and yeast cells harboring no plasmid (to aid in pelleting) were added to the solution, and allowed to equilibrate for six days. Cells were washed, labeled for target binding and analyzed with flow cytometry, as above.
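The incubation time needed to reach 90% of the approach to equilibrium can be estimated from pseudo-first-order kinetics, which apply when target is in excess over the displayed or soluble Gp2: the approach follows 1 - exp(-(kon·L + koff)·t), with koff = kon·KD. The sketch below is illustrative only; the association rate constant of 1 x 10^5 M^-1 s^-1 is an assumed, typical order of magnitude for protein-protein binding and was not measured here, and the function name is hypothetical.

```python
from math import log

def time_to_fraction(conc_M, kd_M, kon_per_M_s, fraction=0.90):
    """Seconds needed to reach `fraction` of the approach to equilibrium,
    assuming target excess (pseudo-first-order kinetics)."""
    koff = kon_per_M_s * kd_M            # s^-1
    k_obs = kon_per_M_s * conc_M + koff  # observed rate constant, s^-1
    return -log(1.0 - fraction) / k_obs

# Hypothetical binder with KD = 1 nM and an assumed kon of 1e5 M^-1 s^-1.
t_low = time_to_fraction(1e-10, 1e-9, 1e5)   # 0.1 nM target
t_high = time_to_fraction(1e-7, 1e-9, 1e5)   # 100 nM target
```

Under these assumed constants, the lowest target concentrations dominate the required incubation time (hours at 0.1 nM versus minutes at 100 nM), which is why incubation times are chosen per concentration range.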

A431 epidermoid carcinoma cells were kindly provided by Dr. Daniel Vallera (University of Minnesota). MDA-MB-231 mammary carcinoma cells were kindly provided by Dr. Jayanth Panyam (University of Minnesota). DU145 prostate carcinoma cells were kindly provided by Dr. Efrosini Kokkoli (University of Minnesota). MCF7 mammary carcinoma cells were kindly provided by Dr. Deepali Sachdev (University of Minnesota). A431, MDA-MB-231, and MCF7 were cultured in Dulbecco’s modified Eagle’s medium with 10% fetal bovine serum at 37 ºC in humidified air with 5% CO2. DU145 was cultured in Minimum Essential Media with 10% fetal bovine serum at 37 ºC in humidified air with 5% CO2.

Gp2 Blocking

N-hydroxysuccinimidobiotin (Thermo Fisher Scientific, Rockford, IL) in dimethyl sulfoxide was added to purified Gp2 suspended in PBS (8.0 g/L NaCl, 0.2 g/L KCl, 1.44 g/L Na2HPO4, 0.24 g/L KH2PO4, pH 7.2) at a 5:1 molar ratio. Biotin-free Gp2 at 1 µM and pure PBS were used to label A431 cells for 15 min at 4 °C. Cells were pelleted and washed with PBSA, and 200 nM biotin-Gp2 was used to label both samples. Cells were pelleted and washed with PBSA and incubated with 0.25 mg/L Alexa Fluor 647-conjugated streptavidin for 10-15 min. Fluorescence was analyzed on a C6 Accuri flow cytometer.

Illumina Analysis Preparation

To isolate the plasmid DNA from protein-displaying yeast, Zymoprep Yeast Plasmid

Miniprep II was used. DNA samples from four separate magnetic bead sorted populations as well as the naïve library were divided into separate groups based on target binding campaign and naïve library. In total, five pools of DNA were isolated and separately analyzed. Following plasmid DNA extraction, two rounds of PCR were completed to assemble the Gp2 gene fragment with Illumina primers, index tags, multiplexing bar codes, and TruSeq universal adapter. For all PCR conducted during amplicon library preparation,

KAPA HiFi polymerase was used as it has been shown to reduce clonal amplification bias due to GC content [170] as well as fragment length bias [171]. Compatible multiplexing and adapter primers were designed according to TruSeq sample preparation guidelines. To increase library sequence diversity, ~25% PhiX control library was spiked in, which is common practice for amplicons with large regions of conservation. Samples were submitted to the University of Minnesota Genomics Center.

Full length, in-frame Gp2 sequences were isolated from PANDAseq-assembled [172] Illumina files. For the binding populations, loop sequences that had fewer than 28 occurrences were ignored to account for background from carryover during bead sorting. At each position within a family, amino acid counts were summed and then damped (quad rooted), and frequency was calculated from the sum across families. The amount of damping was chosen to balance the importance of dominant sequences, multiplied many fold during directed evolution, and rare sequences, containing important information about tolerated mutations.

For framework analysis, the family threshold was set to 85% sequence identity and counts were damped (quad root). These values were selected for the same reasons as above, but the family threshold differs because of the change in amino acid length of the analyzed section.
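The occurrence filter and quad-root damping of sitewise amino acid counts can be sketched in Python as below. This is an illustrative reading of the procedure: family assignment is taken as given, and the data structures, function names, and two-residue example sequences are assumptions rather than the scripts used for the analysis.

```python
from collections import Counter, defaultdict

def drop_rare(counts, min_count=28):
    """Ignore sequences observed fewer than min_count times."""
    return {seq: n for seq, n in counts.items() if n >= min_count}

def damped_site_frequencies(families, root=4):
    """families: list of dicts mapping equal-length sequences to read counts.
    Within each family, counts are summed per amino acid at each position
    and damped by the given root; damped values are then summed across
    families and normalized to frequencies."""
    length = len(next(iter(families[0])))
    totals = [defaultdict(float) for _ in range(length)]
    for fam in families:
        for pos in range(length):
            site_counts = Counter()
            for seq, n in fam.items():
                site_counts[seq[pos]] += n
            for aa, n in site_counts.items():
                totals[pos][aa] += n ** (1.0 / root)
    freqs = []
    for site in totals:
        site_sum = sum(site.values())
        freqs.append({aa: v / site_sum for aa, v in site.items()})
    return freqs

# Hypothetical reads: one dominant clone, one minor clone, one carryover read.
raw_counts = {"YS": 10000, "DS": 256, "GS": 5}
kept = drop_rare(raw_counts)
# Each surviving sequence is treated here as its own family (an assumption).
example_site_freqs = damped_site_frequencies([{s: n} for s, n in kept.items()])
```

In this hypothetical example, the dominant clone outnumbers the minor clone about 39-fold in raw reads, but after quad-root damping the first position is weighted 10:4, so tolerated minor-clone residues remain visible in the frequency profile.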

Fluorescence Microscopy

A431 and MCF7 cells were cultured in Dulbecco’s modified Eagle’s medium with 10% fetal bovine serum at 37 ºC in humidified air with 5% CO2 in a 12-well plate. At ~80% confluency, media was removed and cells were washed three times by adding 1 mL PBSA and aspirating. 1 mL of 100 nM Gp2 was added to the wells and incubated for 15 min at 4 °C. Cells were washed three times with PBSA. 0.3 mL of fluorescein conjugated rabbit anti-His6 antibody was added to each well and incubated for 5-10 min at 4 °C. Cells were washed three times with PBSA and viewed on an Evos Fl microscope (Advanced

Microscopy Group, Mill Creek, WA).

Circular Dichroism

Purified Gp2 was lyophilized and resuspended in PBS or 10 mM sodium acetate pH 5.5

(for GαLysA0.3.3 and GαEGFR2.2.3 due to low solubility in PBS) to a concentration of 0.2-

0.9 mg/mL. Ellipticity was measured from 260 to 200 nm on a Jasco J-815 (Jasco, Inc.,

Easton, MD) spectrophotometer in a quartz cuvette with 1 mm path length. Thermal

denaturation was carried out by measuring the ellipticity at 218 nm from 25 to 98 °C and

Tm was calculated from a standard two-state unfolding curve.
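A two-state unfolding fit of the kind referred to above can be sketched in Python as follows. The sketch assumes flat folded and unfolded baselines taken from the first and last data points and a van't Hoff temperature dependence, and it uses a coarse grid search; the actual fit may well have used sloping baselines or a nonlinear solver. All names, the enthalpy grid, and the synthetic melt data are hypothetical.

```python
from math import exp

R_KJ = 8.314e-3  # gas constant, kJ mol^-1 K^-1

def fraction_unfolded(temp_c, tm_c, dh_kj):
    """Two-state model: K = exp(dH/R * (1/Tm - 1/T)), fu = K / (1 + K)."""
    t, tm = temp_c + 273.15, tm_c + 273.15
    k = exp(dh_kj / R_KJ * (1.0 / tm - 1.0 / t))
    return k / (1.0 + k)

def two_state_signal(temp_c, tm_c, dh_kj, folded, unfolded):
    """Ellipticity as a population-weighted mix of two flat baselines."""
    return folded + (unfolded - folded) * fraction_unfolded(temp_c, tm_c, dh_kj)

def fit_tm(temps, signals, tm_step=0.5, dh_grid=range(100, 501, 20)):
    """Grid search over Tm (deg C) and unfolding enthalpy (kJ/mol)."""
    folded, unfolded = signals[0], signals[-1]  # assumed flat baselines
    t_min, t_max = min(temps), max(temps)
    best = (float("inf"), None, None)
    for i in range(int(round((t_max - t_min) / tm_step)) + 1):
        tm_c = t_min + i * tm_step
        for dh in dh_grid:
            err = sum(
                (two_state_signal(t, tm_c, dh, folded, unfolded) - s) ** 2
                for t, s in zip(temps, signals)
            )
            if err < best[0]:
                best = (err, tm_c, dh)
    return best[1], best[2]

# Synthetic melt from 25 to 98 deg C (hypothetical: Tm = 71, dH = 300 kJ/mol).
temps_c = list(range(25, 99))
melt = [two_state_signal(t, 71.0, 300.0, -10.0, -2.0) for t in temps_c]
tm_fit, dh_fit = fit_tm(temps_c, melt)
```

Because reversible unfolding was observed, an equilibrium two-state description of this kind is a reasonable model for the thermal transition, with Tm defined as the temperature at which half of the population is unfolded.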

Chapter 3: A 64Cu-labeled Gp2 Domain for PET Imaging of Epidermal

Growth Factor Receptor

3.1 Outline

Purpose: Determine the efficacy of a 45-amino acid Gp2 domain, engineered to bind to

epidermal growth factor receptor (EGFR), as a positron emission tomography (PET) probe

of EGFR in a xenograft mouse model.

Methods: The EGFR-targeted Gp2 (Gp2-EGFR) and a non-binding control were site-specifically labeled with 1,4,7,10-tetraazacyclododecane-1,4,7,10-tetraacetic acid (DOTA) chelator. Binding affinity was tested towards human EGFR and mouse EGFR. Biological activity on downstream EGFR signaling was examined in cell culture. DOTA-Gp2 molecules were labeled with 64Cu and intravenously injected (0.6-2.3 MBq) into mice bearing EGFRhigh (n=7) and EGFRlow (n=4) xenografted tumors. PET/computed tomography (CT) images were acquired at 45 min, 2 h, and 24 h. Dynamic PET (25 min) was also acquired. Tomography results were verified with gamma counting of resected tissues. Two-tailed t tests with unequal variances provided statistical comparison.

Results: DOTA-Gp2-EGFR bound strongly to human (KD = 7 ± 5 nM) and murine (KD = 29 ± 6 nM) EGFR, and non-targeted Gp2 had no detectable binding. Gp2-EGFR neither agonized EGFR nor antagonized EGF-EGFR. 64Cu-Gp2-EGFR tracer effectively localized to EGFRhigh tumors at 45 minutes (3.2 ± 0.5 %ID/g). High specificity was observed, with significantly lower uptake in EGFRlow tumors (0.9 ± 0.3 %ID/g, p < 0.001) and high tumor-to-background ratios (11 ± 6 tumor:muscle, p < 0.001). Non-targeted Gp2 tracer had low uptake in EGFRhigh tumors (0.5 ± 0.3 %ID/g, p < 0.001). Similar data were observed at 2 h, and tumor signal was retained at 24 h (2.9 ± 0.3 %ID/g).

Conclusion: An engineered Gp2 PET imaging probe exhibited low background and target-specific EGFRhigh tumor uptake at 45 min, with tumor signal retained at 24 h post-injection, and compared favorably with published EGFR PET probes for alternative protein scaffolds. These beneficial in vivo characteristics, combined with the thermal stability, efficient evolution, and small size of the Gp2 domain, validate its use as a future class of molecular imaging agents.

3.2 Introduction

Molecular cancer therapeutics have provided effective treatments for many cancers, yet are typically characterized by efficacy on only a subset of patients, even within a type of cancer as defined by tissue[173, 174]. Personalized or precision medicine via molecular characterization to differentiate responders from non-responders can aid patient outcomes[175]. Epidermal growth factor receptor (EGFR) overexpression is present in many cancer types[88, 176–181], correlates with differentiation, reduced disease-free and overall survival, and is an independent prognostic indicator of poor survival in colorectal cancer patients[182, 183]. EGFR amplification is predictive of response to cetuximab in wild-type KRAS metastatic colorectal cancer patients[89–91]. In HER2-positive primary breast cancer, EGFR overexpression – but not copy number – is a poor prognostic factor and predictive of response to trastuzumab[184]. The current biopsy/immunohistochemistry approach to EGFR characterization is invasive and does not account for spatiotemporal heterogeneity, most notably differential expression between primary tumors and metastases[185–187]. Positron emission tomography (PET) targeting EGFR could inform personalized treatment plans by enabling identification, localization, and characterization of primary tumors and metastases, while being non-invasive, quantitative, and sensitive to picomolar quantities. PET-based imaging has been clinically useful for other receptors, such as imaging estrogen receptor for breast cancer[188, 189].

Numerous scaffolds have been explored as molecular PET tracers of EGFR. Therapeutic monoclonal antibodies (~150 kDa) have been radiolabeled to visualize EGFR in vivo, but slow clearance results in high background and liver signal and necessitates late imaging times that elevate patient dose[190–195]. 94-residue fibronectin domains[104, 196], 58-residue affibodies[197–201], 120-residue nanobodies[202–204] and 400-residue Fab fragments[205] have provided good tumor-to-background ratios at early time points (≤4 h) via nuclear imaging due to their fast clearance, better extravasation, and increased tissue penetration compared to antibodies[27, 66, 67, 206]. Additional scaffolds have been used for other targets[29]. Small molecule inhibitors[207–212] and natural EGF ligand[213] provide molecular characterization but are not biologically passive.

We recently developed the 45-residue Gp2 domain as a small, stable protein scaffold that has been successfully evolved towards multiple targets with high affinity (0.2-18 nM KD) while retaining thermal stability (65-80 °C)[214]. The Gp2 scaffold contains a framework of a single alpha-helix and three beta-strands, and two solvent-exposed loops that form the diversified paratope. Thermal stability, lack of cysteine, and presence of a single lysine residue distant from the proposed paratope provide ease of chemical conjugation of imaging moieties through amine or thiol chemistry. Additionally, the small size and straightforward structure enable direct chemical synthesis. The two Gp2 variants used here are Gp2-EGFR2.2.3, which was previously evolved to bind to EGFR with 18 ± 8 nM affinity, and an EGFR non-binding control, Gp2-rIgG3.2.3, which was previously evolved to bind to an irrelevant control (rabbit IgG; notably the molecule does not cross-react with murine IgG) (herein referred to as Gp2-EGFR and Gp2-nb, respectively). These variants share 70% sequence identity (Table S3.1).

We hypothesize that the small size of Gp2 will provide high tumor uptake with fast blood clearance, enabling high contrast images at early time points. The ease of evolution and synthesis combined with high thermal stability and different paratope topology may provide a useful tool as an alternative imaging agent to available molecules. In particular, variant Gp2-EGFR2.2.3 has 18 ± 8 nM affinity for cell-surface EGFR with a midpoint of thermal denaturation of 71 °C. The current study evaluates the ability of this scaffold to function as a molecular PET agent in a small animal model.

3.3 Materials and Methods

3.3.1 Protein production and DOTA conjugation

Gp2 domains were produced recombinantly in E. coli as described previously[214]. Briefly, one liter of LB medium with 50 mg/L kanamycin was inoculated with 5 mL of overnight BL21(DE3) E. coli culture carrying the pET-Gp2-His6 plasmid, grown at 37 °C to an optical density (600 nm) of 0.6-1.5 units, and induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside for 20-24 hours at 30 °C. Cells were pelleted, resuspended in 10 mL of lysis buffer (50 mM sodium phosphate (pH 8.0), 0.5 M NaCl, 5% glycerol, 5 mM 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, and 25 mM imidazole), and subjected to four freeze-thaw cycles. The soluble fraction was isolated by centrifugation at 12,000 g for 10 min. Gp2 was purified by metal affinity chromatography on a HisPur resin (Pierce, Thermo Fisher Scientific). Purified Gp2, 30-60 µM, in PBS containing 150 mM imidazole was mixed with 25- to 50-fold molar excess 10 mg/mL DOTA-NHS-ester (Macrocyclics) in dimethyl sulfoxide and allowed to react at room temperature for 1 h. The reaction was quenched with excess 1 M Tris pH 8.0, purified on a PD-10 column (GE Healthcare), and evaluated by matrix-assisted laser desorption ionization mass spectrometry.

3.3.2 Size Exclusion Chromatography

Protein solutions in 100 mM sodium acetate at pH 5.0 were filtered with a 0.2 µm filter to remove any particulates. 200 µL of 40 µM DOTA-Gp2 was loaded onto an AKTA primeplus (GE Healthcare Bio-Sciences) with a Superdex 75 10/300 GL column. The mobile phase was 100 mM sodium acetate at pH 5.0 flowing at 0.5 mL/min.

3.3.4 Cell growth

A431 epidermoid carcinoma cells were kindly provided by Dr. Daniel Vallera (University of Minnesota). MDA-MB-435 cells, which have similarities to a melanoma cell line but also show evidence of breast cancer lineage[215], were kindly provided by Dr. Tim Starr (University of Minnesota). Cells were cultured in Dulbecco’s modified Eagle’s medium with 10% fetal bovine serum at 37 °C in humidified air with 5% CO2.

3.3.5 Affinity measurement


Cells to be used in flow cytometry were detached using trypsin for a shorter time (3–5 min) than recommended. Detached cells were washed and labeled with Gp2 at varying concentrations for 15–30 min at 4 °C. Cells were pelleted and washed with PBSA (PBS + 0.1% w/v BSA), then labeled with fluorescein-conjugated rabbit anti-His6 antibody (Abcam ab1206) for 15 min at 4 °C. Fluorescence was analyzed on a C6 Accuri flow cytometer (BD Biosciences). The equilibrium dissociation constant, KD, was identified by minimizing the sum of squared errors assuming a 1:1 binding interaction.
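The least-squares estimate of KD for a 1:1 interaction can be illustrated with a short sketch. It assumes the measured fluorescence is a background plus a signal range multiplied by the equilibrium occupancy L/(L + KD), and it scans a logarithmic grid of candidate KD values rather than using whatever optimizer was applied in this work. The function names, grid, and synthetic titration are hypothetical.

```python
def bound_fraction(conc_nm, kd_nm):
    """Fractional occupancy for a 1:1 interaction at equilibrium."""
    return conc_nm / (conc_nm + kd_nm)

def linear_sse(x, y):
    """SSE of the best line y = a + b*x; a is background, b is signal range."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    if sxx == 0.0:
        return float("inf")
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    a = ym - b * xm
    return sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, y))

def fit_kd(conc_nm, fluorescence, kd_grid):
    """Return the candidate KD that minimizes the sum of squared errors."""
    best_sse, best_kd = float("inf"), None
    for kd in kd_grid:
        theta = [bound_fraction(c, kd) for c in conc_nm]
        sse = linear_sse(theta, fluorescence)
        if sse < best_sse:
            best_sse, best_kd = sse, kd
    return best_kd

# Synthetic titration (illustrative only): KD = 7 nM, background 50, range 1000
conc = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
fluor = [50.0 + 1000.0 * bound_fraction(c, 7.0) for c in conc]
kd_grid = [10 ** (-1.0 + 4.0 * i / 2000) for i in range(2001)]  # 0.1 to 1000 nM
kd_fit = fit_kd(conc, fluor, kd_grid)
print(round(kd_fit, 2))
```

The model ignores ligand depletion, which is reasonable only when the ligand is in large excess over receptor.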

Affinity of Gp2 towards soluble murine EGFR ectodomain (Sino Biological) was determined using Gp2 displayed on the yeast surface as described previously[214].

3.3.6 Western Blot Analysis

A431 cells were grown to approximately 60% confluency, washed with PBS and incubated in serum-free medium overnight at 37 °C in humidified air with 5% CO2. The next day, cells were washed with PBS and exposed to four different conditions at 37 °C: (1) PBS for 20 min; (2) 5 nM DOTA-Gp2-EGFR for 20 min; (3) 5 nM epidermal growth factor (Gemini Bio Products) for 20 min; or (4) 5 nM DOTA-Gp2-EGFR for 30 min, washed with PBS, followed by 5 nM epidermal growth factor for 20 min. Cells were detached from the plate by mechanical shearing in RIPA buffer (PBS with 1% v/v Triton X-100, 0.5% w/v sodium deoxycholate, 0.1% w/v sodium dodecyl sulfate). Cells were lysed through rotation at 4 °C for 30 min in RIPA buffer. After centrifuging at 15,000 g for 15 min at 4 °C, the supernatant was collected and protein concentration was determined with a Pierce BCA assay kit (Thermo Scientific).


Whole-cell lysates (60 µg) were boiled in 5X Laemmli loading buffer at 95 °C for 5 minutes, separated by 8% SDS-PAGE, transferred to PVDF membrane and subjected to the indicated immunoblotting analyses according to manufacturer guidelines. Primary antibodies against phosphorylated AKT serine 473 (#9271 Cell Signaling Technology), total AKT (#9272), phosphorylated EGFR tyrosine 1068 (#2234), total EGFR (#2232S) and actin (#A3853 Sigma-Aldrich) were incubated overnight at 4 °C. After washing with Tris-buffered saline with Tween-20 (50 mM Tris, 150 mM sodium chloride and 0.05% Tween-20), the membrane was further immunoblotted with either anti-rabbit horseradish peroxidase-conjugated antibody (#NA934V GE Healthcare Life Science) or anti-mouse horseradish peroxidase-conjugated (#170-6516 Biorad) secondary antibody for 1 h at 37 °C.

3.3.7 Internalization

Gp2-EGFR and Gp2-nb in PBS with 150 mM imidazole were allowed to react with fluorescein isothiocyanate in DMSO (3 mg/mL) at 100x molar excess at room temperature for 1 h. The reaction was quenched with excess 1 M Tris buffer pH 8 and purified on a Zeba Spin Desalting Column with 7K molecular weight cutoff (ThermoFisher). Fluorescein conjugation was verified by matrix-assisted laser desorption ionization mass spectrometry.

A431 and MDA-MB-435 cells were grown and detached as above. Cells were labeled with 100 nM fluorescein-conjugated Gp2 at 37 °C for 0.5 and 1 h, followed by incubation with 0.2 M acetic acid, 0.5 M NaCl pH 2 for 5 min to strip extracellular binding. Fluorescence was detected by flow cytometry. Internalization was calculated by normalizing the change in fluorescence signal over time to fluorescence signal of A431 cells labeled with 100 nM fluorescein-Gp2-EGFR at 4 °C for 0.5 h.
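One plausible reading of this normalization is sketched below: the change in acid-resistant signal between the two time points is divided by the elapsed time and by the surface-bound reference signal measured at 4 °C. The function name and the fluorescence values are hypothetical and are not data from this study.

```python
def internalization_rate(stripped_t1, stripped_t2, t1_h, t2_h, surface_signal):
    """Change in acid-resistant signal per hour, as a multiple of the surface signal."""
    return (stripped_t2 - stripped_t1) / (t2_h - t1_h) / surface_signal

# Hypothetical fluorescence values, for illustration only
rate = internalization_rate(
    stripped_t1=500.0,     # acid-stripped signal after 0.5 h at 37 °C
    stripped_t2=900.0,     # acid-stripped signal after 1 h at 37 °C
    t1_h=0.5,
    t2_h=1.0,
    surface_signal=400.0,  # unstripped signal after 0.5 h at 4 °C
)
print(rate)  # 2.0 surface equivalents per hour
```

Expressing the rate in surface equivalents per hour makes values comparable across cell lines with different receptor densities.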

3.3.8 Copper chelation and purification

64CuCl2 (UW-Madison) was diluted into 150 µL of 100 mM sodium acetate pH 5.0 and the pH was adjusted to 5.0. Approximately 50 MBq of the 64CuCl2 was added to 100 µL DOTA-Gp2 in 100 mM sodium acetate pH 5.0 at 30-60 µM. The mixture was allowed to incubate at 47 °C for 1 h and purified by PD-10 column equilibrated with 10 mM sodium acetate pH 5.0 in order to remove unchelated copper.

3.3.9 Radio TLC

1 µL of 64Cu-Gp2 was spotted on filter paper and a mobile phase of PBS was applied for 20 minutes. An AR-2000 radio-thin layer chromatography scanner (Eckert & Ziegler) scanned and analyzed the filter paper for migration of radioactive peaks. Comparison of scans before and after PD-10 purification showed removal of nearly all of the peak near the solvent front (the unconjugated 64Cu) while retaining the less mobile peak (64Cu-Gp2), which cold PD-10 purifications along with SDS-PAGE and binding assays have shown to contain highly pure Gp2.

3.3.10 Tumor inoculation

Eight-week-old female (Foxn1nu/Foxn1nu) mice (Jackson Laboratory) were anesthetized with 1.5% isoflurane in 1 mL/min O2 and subcutaneously injected with 10 million MDA-MB-435 cells in 50% v/v Matrigel Matrix (Corning) in one shoulder. After 4 weeks, the mice were injected with two million A431 cells in 50% Matrigel Matrix into the opposite shoulder. Xenografted tumors were grown to 5-10 mm in diameter (approximately two weeks for A431 and six weeks for MDA-MB-435).

3.3.11 EGFR expression quantification

To quantify EGFR expression within in vivo xenografted tumor cells, GentleMACS dissociator C Tubes (Miltenyi Biotec) were used to generate single cell suspensions from each excised tumor. Receptor expression was quantified by flow cytometry with Quantum Simply Cellular anti-mouse IgG calibration beads (Bangs Laboratories), using Gp2-EGFR and/or mouse anti-EGFR antibody (Abcam ab30) at 1 µM, followed by secondary labeling with fluorescein-conjugated rabbit anti-His6 (Abcam ab1206) or AlexaFluor 647-conjugated goat anti-mouse IgG (ThermoFisher), respectively. The cell population from the A431 tumor was approximated as two normally distributed subpopulations.

3.3.12 PET imaging – static and dynamic

All procedures performed in studies involving animals were in accordance with the ethical standards of the University of Minnesota and approved by the Institutional Animal Care and Use Committee. Mice were anesthetized with 1.5% isoflurane in 1 mL/min O2 and tail vein injected with approximately 0.6 to 2.3 MBq of 64Cu-Gp2 as measured by an Atomlab 100 dosimeter with a setting of 50.2. Five-minute static PET scans were performed at 45 min, 2 h and 24 h after injection using an Inveon micro-PET/CT (Siemens). The PET energy cutoffs were 350-650 keV with a timing window of 3.438 ns. The PET images were reconstructed with an OSEM2D method using 4 iterations of Fourier rebinning. PET images were smoothed with a 1 x 1 x 1 voxel Gaussian filter. The CT used 340 projections of 80 kV at 500 µA with 200 ms exposure over 384 s of total scan time with an effective pixel size of 98.3 µm. The CT was reconstructed using the Feldkamp algorithm with a Shepp-Logan filter. The preceding methods are included in the Inveon Acquisition Workplace software (Siemens). A second batch of independently produced, DOTA-conjugated, 64Cu-chelated, and purified 64Cu-Gp2-EGFR injected into another set of tumor-inoculated mice validated the results of the other 45 min and 2 h PET/CT scans.

PET images were quantified using the Inveon Research Workplace software (Siemens). Using the CT as an anatomical guide, the volume of 10-20 mm3 that resulted in the maximum average PET signal for that tissue was selected. The anterior end of the liver was selected to avoid noise from kidney signal. The posterior leg furthest from any bladder signal was chosen to represent muscle background.

3.3.13 Tissue gamma counting

After imaging, mice were euthanized by cervical dislocation under isoflurane anesthesia. Blood, bone, brain, heart, large intestine, kidneys, liver, lungs, muscle, pancreas, skin, spleen, stomach, and tumors were resected, weighed, and had their activity measured by a CRC-25W (Capintec) gamma counter averaged over 45 seconds. The CRC-25W collected counts from all windows and was calibrated through serial dilutions based on the dose reported by the Atomlab 100 dosimeter used to measure injected dose. Renal radiation dose was calculated with the Medical Internal Radiation Dose (MIRD) method.
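The conversion from a measured tissue activity to percent injected dose per gram (%ID/g) can be sketched as below. The sketch assumes that tissue activity is decay-corrected back to the time of injection, which is a common convention but is not stated explicitly here; the 12.7 h half-life of 64Cu is the value quoted later in this chapter. Function names and the example numbers are hypothetical.

```python
CU64_HALF_LIFE_H = 12.7  # half-life of 64Cu in hours, as quoted in the text

def decay_corrected(activity_mbq, elapsed_h, half_life_h=CU64_HALF_LIFE_H):
    """Back-correct a measured activity to the time of injection."""
    return activity_mbq * 2.0 ** (elapsed_h / half_life_h)

def percent_id_per_gram(tissue_mbq, tissue_mass_g, injected_mbq, elapsed_h):
    """Percent of injected dose per gram of tissue, decay-corrected to injection."""
    corrected = decay_corrected(tissue_mbq, elapsed_h)
    return 100.0 * corrected / injected_mbq / tissue_mass_g

# Hypothetical numbers: 0.005 MBq measured in a 0.1 g tissue,
# one half-life (12.7 h) after a 1.0 MBq injection
value = percent_id_per_gram(0.005, 0.1, 1.0, elapsed_h=12.7)
print(round(value, 3))  # 10.0 %ID/g
```

If both the injected dose and the tissue are measured at the same moment, the correction cancels and only the ratio of activities and the tissue mass matter.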

3.3.14 Statistics


Comparisons between two samples were determined using a two-tailed Student’s t-test for unequal variances. P-values are stated where relevant. Data are presented as average ± standard deviation.
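For reference, the unequal-variance (Welch) form of the t-test can be written out in a few lines. This is a standard-library sketch under stated assumptions, not the software used in this work: the p-value is obtained by Simpson integration of the t density, which is adequate for moderate p-values but loses precision for very small ones. The sample values are arbitrary and purely illustrative.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value by Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

# Arbitrary illustrative samples (not data from this study)
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
p_value = two_tailed_p(t_stat, dof)
print(round(t_stat, 3), round(dof, 2))
```

The non-integer degrees of freedom are what distinguish this test from the pooled-variance version.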

3.4 Results

3.4.1 Gp2 Production and Conjugation

EGFR-binding Gp2-EGFR and non-binding control Gp2-nb, both containing a C-terminal His6 tag, were produced in the soluble fraction of E. coli and purified by immobilized metal affinity chromatography. Purity was verified with SDS-PAGE and molecular weight was verified by matrix-assisted laser desorption ionization mass spectrometry (Gp2-EGFR expected: 6873, actual: 6869; Gp2-nb expected: 6228, actual: 6226). The copper chelator 1,4,7,10-tetraazacyclododecane-1,4,7,10-tetraacetic acid (DOTA) was conjugated to the N-terminal lysine residue distal from the proposed paratope in the Gp2 scaffold framework (Fig. 3.1). Mass spectrometry was used to verify an average labeling of 0.83 DOTA per molecule for Gp2-EGFR and 1.1 DOTA per molecule for Gp2-nb (Fig. S3.1). Size exclusion chromatography verified that DOTA-Gp2-EGFR (7.4 kDa) is dominantly monomeric, eluting at a comparable time to control proteins of a similar size (6.5 kDa aprotinin and 7.5 kDa affibody; Fig. S3.2).


Figure 3.1 Gp2 conjugation. Purified Gp2 was conjugated with the N-hydroxysuccinimidyl ester of the chelator DOTA then radiolabeled with 64Cu.

3.4.2 EGFR Binding

Gp2-EGFR binding affinity towards cellular EGFR was previously found to be 18 ± 8 nM[214]. The effect of DOTA conjugation on binding affinity was examined by labeling EGFR-expressing A431 epidermoid carcinoma cells with varying levels of DOTA-Gp2-EGFR. DOTA conjugation did not significantly change ligand affinity (7 ± 5 nM; Fig. 3.2). The non-binding control DOTA-Gp2-nb showed no detectable binding up to 300 nM on A431 cells (Fig. 3.2). Preclinical imaging experiments with EGFR-targeted Gp2 were carried out in mice, so the affinity of Gp2-EGFR towards murine EGFR was examined. Yeast displaying Gp2-EGFR were labeled with varying levels of recombinantly produced murine EGFR extracellular domain, which revealed an affinity of 29 ± 6 nM (Fig. S3.3).


Figure 3.2 Affinity titration. A431 cells were labeled with DOTA-Gp2 domains (squares, DOTA-Gp2-EGFR; triangles, DOTA-Gp2-nb) at the indicated concentrations. Binding was detected by fluorescein-conjugated anti-His6 antibody via flow cytometry. Fluorescence signal is normalized between minimal and maximal fluorescence. One representative titration of triplicate experiments is presented. A representative Gp2-EGFR titration curve is also included for comparison (circles, dotted line). The equilibrium dissociation constant for DOTA-Gp2-EGFR, assuming a 1:1 binding model, is 7±5 nM

3.4.3 Biological Activity

The effect of DOTA-Gp2-EGFR binding on the EGFR signaling pathway was determined by western blot to detect phosphorylated AKT (p-AKT at S473), a downstream protein kinase, and phosphorylated EGFR (p-EGFR at Y1068) (Fig. 3.3). A431 cells labeled with 5 nM DOTA-Gp2-EGFR show no change in p-AKT or p-EGFR compared to the PBS-only control, suggesting that DOTA-Gp2-EGFR is not agonistic to EGFR. DOTA-Gp2-EGFR is also not antagonistic, as blocking the A431 cells with 5 nM Gp2 before addition of 5 nM epidermal growth factor showed no change in level of p-AKT or p-EGFR compared to the EGF-only control.

The ability of A431 cells to internalize Gp2 was examined through flow cytometry of cells grown in tissue plate culture. Fluorescein was conjugated to Gp2-EGFR (0.45 fluorescein/protein) and Gp2-nb (0.61 fluorescein/protein) through amine chemistry and used to label A431 and MDA-MB-435 cells at 37 °C for up to 1 h. At 0.5 and 1 h, cells were acid stripped and the increase in signal over time was used to calculate internalization rate (Fig. 3.3b). Fluorescein-Gp2-EGFR rapidly internalized into A431 cells (2.2 ± 0.3-fold of saturated surface EGFR per hour) compared to control cells (MDA-MB-435, p < 0.001) and control non-binder (fluorescein-Gp2-nb, p < 0.001).


Figure 3.3 Biological Activity. (a) A431 cells were labeled with four different conditions in triplicate: PBS only, 5 nM DOTA-Gp2-EGFR in PBS, 5 nM epidermal growth factor (EGF) in PBS, or 5 nM DOTA-Gp2-EGFR followed by 5 nM EGF (Gp2 block). Cells were lysed and separated by SDS-PAGE. Blotting was done to detect phosphorylated AKT (S473), a protein kinase in the EGFR signaling pathway, and phosphorylated EGFR (Y1068), as well as total amounts of the two proteins and actin to verify similar total protein concentration. DOTA-Gp2-EGFR is neither agonistic, since it does not activate the EGFR pathway, nor antagonistic, since it does not block activation when EGF is present. (b) A431 and MDA-MB-435 cells were labeled with 100 nM fluorescein conjugated Gp2 at 37 °C for 0.5 and 1 h, followed by incubation with acid for 5 min to strip extracellular binding. Internalization was calculated by normalizing the change in fluorescence signal over time to fluorescence signal of A431 cells labeled with 100 nM fluorescein-Gp2-EGFR at 4 °C for 0.5 h. Error bars represent standard deviation for n = 3 biological replicates. P < 0.001 is indicated by *.

3.4.4 Copper chelation and purification

Radioactive 64Cu was incubated with DOTA-Gp2 for 1 h at 47 °C. Free 64Cu was separated by size exclusion chromatography resulting in, on average, 93% purity. Labeling efficiency was 29%, perhaps due to the low number of DOTA per Gp2 (to assure site-specific conjugation) or modest protein concentration. Based on historical yields from non-radioactive DOTA-Gp2 purifications, the specific activity of chelated protein was 0.6-1.1 MBq/nmol. Radiolabeled DOTA-Gp2 variants are referred to as 64Cu-Gp2-EGFR or 64Cu-Gp2-nb.

3.4.5 Murine model micro-PET/CT and tissue biodistribution

The efficacy of the Gp2 domain was evaluated in a murine model with xenografted human tumor lines. To assess specificity, EGFRhigh A431 tumors (mean: 5.2 x 10^5 EGFR/cell; 75th percentile: 1.9 x 10^6) and EGFRlow MDA-MB-435 tumors (mean and 75th percentile: < 4 x 10^3 EGFR/cell) were simultaneously evaluated. A non-binding Gp2 domain was tested in parallel. 64Cu-Gp2 was injected via the tail vein into mice harboring dual tumors. PET/CT was performed at 45 min and 2 h. 64Cu-Gp2-EGFR effectively localized to A431 tumors highly expressing EGFR (3.2 ± 0.5 %ID/g) and cleared from background (11 ± 6 tumor:background ratio, p < 0.001) as early as 45 minutes after injection (Fig. 3.4). Targeting was molecularly specific as EGFRlow MDA-MB-435 tumors had demonstrably lower signal (0.9 ± 0.3 %ID/g, p < 0.001). Moreover, the non-targeted control 64Cu-Gp2-nb exhibited lower signal in EGFRhigh tumors (0.5 ± 0.3 %ID/g, p < 0.001). As for most small protein imaging agents, high kidney signal is observed (78 ± 16 %ID/g) resulting from renal processing. Similar imaging is observed at 2 h where 64Cu-Gp2-EGFR uptake to EGFRhigh tumors was 3.2 ± 0.6 %ID/g and 12 ± 4 tumor:background (p = 0.006). Specificity is retained at 2 h as EGFRlow tumors had low uptake (0.7 ± 0.2 %ID/g, p = 0.009) and the non-targeted control had lower signal in EGFRhigh tumors (0.7 ± 0.3 %ID/g, p = 0.007). While early time point imaging is the preferred translational route, we acknowledge that for alternative applications, such as targeted therapy, and biological safety concerns the behavior of engineered proteins at later times is relevant. Even with the fast clearance, preferential EGFRhigh tumor signal from 64Cu-Gp2-EGFR is still evident at 24 h (2.9 ± 0.3 %ID/g) with high tumor:background (8 ± 6, p = 0.009).

Figure 3.4 PET/CT imaging. Coronal and axial micro-PET/CT images of anesthetized athymic nude mice bearing subcutaneously xenografted A431 tumors (EGFRhigh) in the left shoulder and MDA-MB-435 tumors (EGFRlow) in the right shoulder. The mouse in the 24 h image lacks an MDA-MB-435 tumor. Mice were injected by tail vein with 0.6-2.6 MBq of either 64Cu-Gp2-EGFR (top row) or the non-targeted control, 64Cu-Gp2-nb (bottom row). Five-minute static PET scans followed by CT scans were acquired at 45 min (left), 2 h (middle), and 24 h (right, for targeted Gp2 only) post-injection. Image planes were selected such that both tumors appear in the image.

PET images were corroborated by ex vivo tissue gamma counting at 2 h and 24 h (Fig. 3.5). At 2 hours post injection, 64Cu-Gp2-EGFR localized significantly more to xenografted EGFRhigh tumors (7.0 ± 1.9 %ID/g) as compared to EGFRlow tumors (1.4 ± 0.3 %ID/g; p < 0.001). The targeted Gp2 had 14 ± 8 tumor-to-blood ratio and 23 ± 6 tumor-to-muscle at 2 h, compared to 1.8 ± 1 tumor-to-blood (p = 0.005) and 3.3 ± 3.1 tumor-to-muscle (p < 0.001) for the non-targeted Gp2. In addition, the non-targeted Gp2 showed significantly lower EGFRhigh tumor uptake with 1.4 ± 0.4 %ID/g (p = 0.001). Renal retention was high for the targeted (244 ± 66 %ID/g) and non-targeted (208 ± 19 %ID/g) probes. Liver signal was modest for both (4.8 ± 1.8 and 4.9 ± 1.9 %ID/g). At 24 h the fast clearance leads to lower signal in most tissues, including EGFRhigh tumor (4.0 ± 0.3 %ID/g) and kidney (114 ± 20 %ID/g), with the exception of a notable increase in liver signal (10.1 ± 1.3 %ID/g). Tumor-to-blood and tumor-to-muscle ratios (3.4 ± 1.1, p = 0.002 and 8.1 ± 3.6, p < 0.001, respectively) indicate there is still preferential uptake to EGFRhigh tumor.


Figure 3.5 Resected tissue gamma counting. After PET/CT imaging, mice were euthanized and tissues were collected, weighed, and measured for activity. (A) The targeted 64Cu-Gp2-EGFR (dark gray) and non-targeted 64Cu-Gp2-nb (light gray) distribution is shown for the selected tissues at 2 h post-injection. (B) Ratios of tumor signal to relevant background signals in blood and muscle. The data are combined over two separate experiments, n = 4 for mice containing EGFRhigh and EGFRlow tumors and another n = 3 for mice containing only EGFRhigh tumors. Significance for important comparisons (p < 0.005) is denoted by *. (C and D) Biodistribution and tumor-to-background ratios of 64Cu-Gp2-EGFR in n = 3 mice at 24 h post-injection. Error bars represent standard deviation.

The rapid distribution and clearance of 64Cu-Gp2 evident at the 45 minute scan was more thoroughly investigated by 25-minute dynamic PET scans (Fig. 3.6). Using heart signal as a surrogate for probe blood levels, clearance half-time was revealed to be 3.2 ± 1.0 min, supporting the low accumulation in muscle background seen at 45 minutes post-injection.


Figure 3.6 Dynamic PET scans. 25 minute dynamic PET scans were acquired on anesthetized mice containing xenografted EGFRhigh tumors. The average signal within ~15 mm3 regions, guided by an anatomical CT scan, is presented. Data were fit assuming exponential kinetics. The clearance half-time within the heart (predominantly blood pool) was t1/2 = 3.2 ± 1.0 min (n=2 mice).
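A clearance half-time can be extracted from a time-activity curve as sketched below. The sketch assumes a single-exponential decay to zero and fits a straight line to the logarithm of the signal; the fitting procedure actually used for Figure 3.6 is described only as assuming exponential kinetics, so this is one possible reading. The function name and the synthetic curve are hypothetical.

```python
import math

def clearance_half_time(times_min, signal):
    """Half-time from a least-squares line through ln(signal) versus time."""
    logs = [math.log(s) for s in signal]
    n = len(times_min)
    t_mean, y_mean = sum(times_min) / n, sum(logs) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
             / sum((t - t_mean) ** 2 for t in times_min))
    return math.log(2.0) / -slope  # slope is negative for a decaying signal

# Synthetic time-activity curve (illustrative only): half-time of 4 min
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
heart = [100.0 * 0.5 ** (t / 4.0) for t in times]
half_time = clearance_half_time(times, heart)
print(round(half_time, 3))  # 4.0
```

A log-linear fit weights late, low-signal points heavily; with noisy data or a non-zero plateau, a direct nonlinear fit with an offset term would be the safer choice.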

3.5 Discussion

Other small scaffolds have been successfully used for in vivo imaging previously, but drawbacks, such as the relatively larger size of fibronectins (11 kDa)[76] and (20 kDa)[79], or the difficulty of broad evolution and presence of cysteines in knottins[81] and cyclic peptides[45], have driven the search for an optimal scaffold. Cysteine-free Affibodies[86] have gone to smaller size (58 amino acids) and their helical paratope has yielded high affinity binders; however, they are typically severely destabilized after mutation[23]. Gp2 domains push the size even smaller (45-49 amino acids), have thus far remained highly thermally stable after mutation, and provide a vastly different paratope structure compared to Affibodies. Beyond its previous characterization for high-affinity, EGFR-specific binding[214], further biophysical evaluation of Gp2-EGFR in the current study revealed that it is well-suited for use in molecular imaging. Though selected solely for EGFR ectodomain binding, the current Gp2 variant is neither agonistic nor antagonistic (Fig. 3.3a). This enables passive imaging – unlike radiolabeled EGF or bivalent, crosslinking-compatible antibodies – which is preferred to avoid impacting EGF signaling cascades. Additionally, Gp2-EGFR is internalized into A431 cells (Fig. 3.3b). Internalization potentially allows for an accumulation of signal in target tissues over time, but may not be necessary for Gp2 due to the rapid clearance of the small agent, which has the benefit of reducing background. Primary amine / N-hydroxysuccinimidyl chemistry was selected for conjugation at the N-terminal lysine distal to the evolved loops (Fig. 3.1). As hoped, DOTA conjugation did not hinder binding affinity (18 ± 8 nM as Gp2-EGFR to 7 ± 5 nM as DOTA-Gp2-EGFR). Importantly, Gp2-EGFR exhibits cross-reactive binding to murine EGFR, which aids the validity of the murine model to assess the probe’s tumor selectivity relative to lower levels of EGFR expression in healthy tissue including liver. Modest liver accumulation was observed (4.8 ± 1.8 %ID/g at 2 h), which was due to physiological processing, not EGFR targeting, as the non-binding control exhibited equivalent hepatic retention (4.9 ± 1.8 %ID/g). This liver signal remains below the EGFRhigh tumor signal (1.5 ± 0.4 tumor:liver). Nevertheless, efforts are underway to mutate surface hydrophobic amino acids to increase Gp2 hydrophilicity, which effectively reduced liver signal for engineered fibronectin domains[196].

The relevance of non-invasive EGFR detection in the clinic has led to development of many imaging probes, including a variety of small protein scaffolds. The increased extravasation and tissue penetration of protein scaffolds compared to larger proteins allow for high contrast early imaging resulting in lower patient dose. Multiple successes have been realized for EGFR previously. The beneficial properties of Gp2 domains as evolvable protein scaffolds, such as small size, lack of cysteines, and high thermal stability, do not guarantee successful translation to an imaging agent. However, these properties provide benefits during evolution, conjugation, administration, and biodistribution that are useful for imaging agents or therapeutics towards many targets. Due to the variations between labs, strict quantitative comparisons between scaffolds do not prove superiority. Moreover, comparisons across scaffolds must take care to acknowledge the context-dependent properties – affinity, charge, hydrophilicity – of individual protein variants.

Nevertheless, the current data demonstrate that the Gp2 domain is a promising PET imaging agent for EGFR with potential benefits versus other probes, and further optimization of the affinity and biophysical properties of Gp2-EGFR could lead to a clinically effective PET imaging agent. 64Cu-Gp2-EGFR exhibits tumor accumulation (3.2 ± 0.5 %ID/g at 0.75 h via PET; 7.0 ± 1.9 %ID/g at 2 h via excised tissue) comparable to other small protein PET probes including fibronectin domains (3.4 ± 1.0 and 2.4 ± 1.0 %ID/g at 1 h)[104] and affibodies (5.7 ± 0.6 and 9.7 ± 4.9 %ID/g at 1 h)[197] as well as nanobodies for single-photon emission computed tomography (4.6 ± 0.4 %ID/g at 1 h)[204]. The dramatically lower uptake of 64Cu-Gp2-EGFR in EGFRlow tumors and non-binding control in EGFRhigh tumors was similarly observed for the fibronectin domain. For affibody, neither EGFRlow tumors nor non-targeted affibody were evaluated as controls. Blocking did yield a reduction, albeit incomplete (47%), in EGFRhigh tumor uptake. 64Cu-Gp2-EGFR exhibits high tumor:blood ratio (14 ± 8 at 2 h) because of rapid clearance (3.2 ± 1.0 min half-time). Conversely, affibody provides limited tumor:blood differentiation (1.2 ± 1.1 and 1.0 ± 0.1 at 1 h and 4 h) because of slower clearance (20–120 min half-time[197, 216–219]) while fibronectin is intermediate (8.9 ± 4.7 and 6.4 ± 4.3 at 1 h and 4 h[104]) with rapid clearance (2.1 ± 0.3 min half-time[104]). Tumor:muscle specificity is also strong for 64Cu-Gp2-EGFR (11 ± 6 at 0.75 h via PET; 23 ± 6 at 2 h via excised tissue), comparable to affibody (16 ± 7 at 1 h, 18 ± 4 at 4 h, both via excised tissue) and higher than fibronectin (8.6 ± 3.0 at 1 h via PET; 10 ± 4 and 4.2 ± 1.3 at 1 and 4 h via excised tissue).

The main disadvantage with Gp2 as an imaging agent is the high kidney signal due to

partial renal retention during clearance, which is observed for most small protein

scaffolds[29]. Dosimetry calculations indicate 3.0 mGy/MBq renal dose, which is 3% of the maximum tolerated dose for a 185 MBq injection thereby rendering this a minor

concern clinically for non-renal tumors. Yet strategies exist to lower kidney signal.

Modulation of charge has been shown to reduce renal uptake in fibronectin domains[196],

affibodies[220], and knottins[221]. Preliminary data indicate an ability to modify charge

on Gp2-EGFR while retaining activity. Additionally, alternative radiochemical

conjugation has drastically reduced renal uptake of other small protein scaffolds[26, 219,

222–226]. Specifically, transchelation from the DOTA chelator[227] may account for

some signal in the liver and kidney, and other chelators such as NOTA or PCTA have

shown higher stability in vivo[228]. Notably, 64Cu (t1/2 = 12.7 h) was used in the current

study to enable examination of distribution kinetics over short and long time periods, which

is important for initial physiological characterization of this new protein scaffold. Yet,


clinical use may benefit from a radioisotope with decay kinetics that align with the rapid

distribution of the small Gp2 domain to reduce patient dose. Future studies with 18F (t1/2 =

110 min), 68Ga (t1/2 = 68 min), or 61Cu (t1/2 = 3.3 h) will be valuable for clinical translation.

Evaluation on cells with intermediate EGFR expression will also be informative. It should

be noted that, as with any synthetically engineered protein with non-human sequence

components, immunogenicity of evolved molecules will need to be evaluated.
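
To make the isotope comparison above concrete, the fraction of activity remaining after a time t is 0.5^(t / t1/2). The following is a minimal sketch that uses only the half-lives quoted in the text; the function name and the choice of 0.75 h and 2 h as time points (taken from the PET and excised-tissue measurements reported above) are illustrative assumptions, not part of the original analysis.

```python
# Half-lives in hours, as quoted in the text.
HALF_LIFE_H = {"64Cu": 12.7, "18F": 110 / 60, "68Ga": 68 / 60, "61Cu": 3.3}


def fraction_remaining(t_hours, half_life_hours):
    """Fraction of the initial activity remaining after t_hours of decay."""
    return 0.5 ** (t_hours / half_life_hours)


# Illustrative time points: 0.75 h (PET) and 2 h (excised tissue).
for isotope, t_half in HALF_LIFE_H.items():
    at_pet = fraction_remaining(0.75, t_half)
    at_2h = fraction_remaining(2.0, t_half)
    print(f"{isotope}: {at_pet:.2f} remaining at 0.75 h, {at_2h:.2f} at 2 h")
```

Shorter-lived isotopes retain less activity at a given time point, which is the trade-off between imaging signal and patient dose discussed above.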

Overall, the performance of these initial Gp2 domains in vivo gives promise to the potential of Gp2-EGFR, and other targeted Gp2 domains, as molecular imaging agents.

3.6 Acknowledgement

We are grateful to Joanne Johnson of the Center for Clinical Imaging Research for assistance with PET/CT imaging and Dr. Blake Jacobson from the Department of Medicine at the University of Minnesota for assistance with ex vivo tumor analysis. This work was

funded by the National Institutes of Health (EB019518 to BJH), Komen for the Cure

(SAC110039 to DY), National Cancer Institute Cancer Center Support Grant P30 077598, and the University of Minnesota.

3.7 Supplemental Information

Table S3.1. Protein sequence alignment of two probes used in this study. Gp2-rIgG3.2.3 is the non-binding control. The diversified paratope is bolded. Additional framework sequence differences are underlined.


Figure S3.1. Mass spectrometry traces of DOTA conjugation to Gp2. (a) Gp2-EGFR (b) DOTA-Gp2-EGFR (c) Gp2-nb (d) DOTA-Gp2-nb


Figure S3.2. Size exclusion chromatography results of two trials of DOTA-Gp2-EGFR (GaE-DOTA) and various controls.


Figure S3.3. Multiple replicates of titrations of murine EGFR/human Fc conjugate labeling yeast surface display Gp2-EGFR, detected with a fluorophore conjugated anti-human Fc antibody and measured by flow cytometry.

Figure S3.4. EGFR expression of excised mouse flank tumor xenografts was analyzed via flow cytometry. Xenograft cell lines included MDA-MB-435 (EGFRLOW, n=2) and A431 (EGFRHIGH, n=4). GentleMACS dissociator C Tubes were used to generate single cell suspensions from each excised tumor. Fluorescence intensities of gated events are shown on a logarithmic scale. Receptor expression was quantified with Quantum Simply Cellular anti-mouse IgG beads (Bangs Laboratories), using Gp2-EGFR and/or mouse anti-EGFR antibody at 1 µM, followed by secondary labeling with fluorescein conjugated rabbit anti-His6 or AlexaFluor 647 conjugated goat anti-mouse IgG, respectively. The A431 tumor cell population can be approximated as two subpopulations with normal distributions (solid and dotted lines). The majority (58%) of the tumor cells express high levels of EGFR (5.2 x 10^5) while the second subpopulation displays significantly fewer (< 4 x 10^3).


Chapter 4: High affinity PD-L1 binding Gp2 proteins isolated from diversity constrained combinatorial library

4.1. Introduction

Molecular recognition ligands have many important applications in biotechnology and medicine. Protein scaffolds enable efficient generation of binding ligands with engineered control over affinity, stability, and other biophysical properties. Protein scaffolds consist of a small binding region, or paratope, that is engineered for each target epitope, to provide selective potent interaction, and supported by a larger, conserved framework.

One such scaffold that has been engineered to bind multiple targets is Gp2. Based on the

T7 RNA polymerase inhibitor, Gp2 is a 45 (up to 49 with loop length diversity) amino acid scaffold with a two-loop paratope and a framework consisting of three beta strands and an alpha helix. Gp2 has been evolved to bind multiple targets with high affinity including goat immunoglobulin G (IgG), rabbit IgG, lysozyme, epidermal growth factor receptor (EGFR)

[214], and insulin receptor [229].

Despite the successes, Gp2 can potentially be optimized to perform better during evolution and in downstream applications. In order to develop new binding function, a large portion of the paratope must be mutated to provide significant interaction area [30] between the target and ligand. However, mutations are destabilizing on average [60] and mutating an intramolecularly stabilizing site to a sub-optimal amino acid can eliminate function by unfolding an otherwise functional paratope. This balancing of intermolecular binding function and intramolecular stability is especially challenging in small proteins where a


large fraction of the total amino acids must be mutated to result in strong binding. In

addition to the primary functionality being selected for, the protein also must exhibit other

desired characteristics, e.g. thermal or protease stability, solubility, or binding of a specific

epitope. A higher number of hits from discovery and evolution will increase the quality

and diversity of the lead molecules.

The first generation Gp2 combinatorial library contained all 20 amino acids at each site, using amino acid frequencies that mimicked the complementarity determining region

(CDR-H3) of the human antibody repertoire. However, mutational tolerance varies by site based on the nearby environment. Variable diversities at each site, in terms of identity and prevalence of amino acids allowed, have been shown to aid evolution in other small scaffolds, such as fibronectin domains [51] and affibody domains [47]. Amino acid design has been based on amino acids that are frequently observed at protein-protein interfaces, such as tyrosine and serine [50, 64, 230], or glycine in loops. Beneficial site-wise constraint can also be identified through phylogenetic diversity [49], computational stability analysis

[231], or deep-sequencing of high-throughput evolution for function or stability [47, 51].

Small protein ligands have improved performance in molecular imaging [26, 232] compared to larger proteins, such as antibodies; these benefits are hypothesized to result from improved extravasation [66, 233, 234] and improved clearance [67].

With the rise of targeted therapies in oncology, and the observed heterogeneity of response, knowing the molecular signature of a patient’s cancer becomes important for predicting therapeutic response of a specific patient to a specific targeted drug. Non-invasive imaging via molecular probes can provide patient stratification through added knowledge of tumor characteristics. A Gp2 positron emission tomography (PET) imaging agent has been successfully used to target tumors with high EGFR expression, showing clear localization compared to low EGFR expressing tumors and muscle or blood background (Kruziki EGFR). PD-L1 is a cell surface receptor that regulates the body’s immune response by binding to PD-1 receptors expressed on T cells and halting their development. Recent studies have shown that PD-L1 is overexpressed in several cancers [14] and can help the cancer evade immune response [15]. Multiple checkpoint inhibitors have been FDA approved to target this interaction, including Atezolizumab, Avelumab, and Durvalumab

[16–18]. PD-L1 overexpression has also been shown to be predictive of response for melanoma patients; however, no correlation was seen in non-small cell lung cancer patients

[19]. Therefore, PD-L1 arises as an intriguing target for evolution of Gp2 binders for use in PET imaging.

In this work, we developed constrained Gp2 libraries to investigate how focused diversity can benefit ligand evolution. Gp2 clones targeting PD-L1, isolated from one of these generation 2 libraries, were further evolved to high affinity ligands, which have potential use in biotechnology and imaging.

4.2 Methods

4.2.1 Design of second generation library

Gp2 binding sequences were collected from published [214, 229] and unpublished sources

(Hong Zhou, Christopher Garcia?, Anthony Braun). In order to balance high-throughput and Sanger sequencing, 6.5 x 10^5 sequences (34,000 unique) from high-throughput

sequencing of 4 populations (1.6 x 10^5 each) were combined with 110 unique sequences from 6 populations of Sanger sequencing scaled up so that each population had 1.6 x 10^5 counts. The total collection of previous binding sequences was aligned, grouped by family

(80% similarity), and then quad-root damped to decrease weighting on dominant families as described previously [214, 235] (Figure 4.1).

Figure 4.1. Change in amino acid frequency from initial naïve library to evolved binding sequences isolated from the Gp2 generation 1 library. Sites with more than a 5% change are labeled.
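
The exact damping procedure is given in the cited references rather than here, so the following is only a sketch of one plausible reading: each sequence family contributes a total weight equal to the fourth root of its read count, shared among its members in proportion to their counts, and site-wise frequencies are then computed from the weighted sequences. The function names and the toy families are hypothetical.

```python
from collections import defaultdict


def quad_root_weights(families):
    """Weight each sequence so that a family's total weight is count ** 0.25.

    families: list of dicts mapping aligned sequence -> observed count.
    Assumes a sequence belongs to exactly one family.
    """
    weights = {}
    for family in families:
        total = sum(family.values())
        damped_total = total ** 0.25
        for seq, count in family.items():
            weights[seq] = damped_total * count / total
    return weights


def sitewise_frequency(weights):
    """Weighted amino acid frequency at each aligned position."""
    length = len(next(iter(weights)))
    freqs = [defaultdict(float) for _ in range(length)]
    norm = sum(weights.values())
    for seq, weight in weights.items():
        for position, amino_acid in enumerate(seq):
            freqs[position][amino_acid] += weight / norm
    return freqs


# Toy data: one dominant family (10,000 reads) and one rare family (16 reads).
families = [{"GYS": 9000, "GYT": 1000}, {"PWA": 16}]
freqs = sitewise_frequency(quad_root_weights(families))
print(round(freqs[0]["G"], 3), round(freqs[0]["P"], 3))
```

In this toy case the dominant family outnumbers the rare one 625 to 1 in raw reads but only 5 to 1 after damping, which illustrates how damping reduces the weight of dominant families.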

The amino acid frequency at each site was calculated across natural homologs to Gp2

(Figure 4.2). 32 sequences were identified through a BLAST search of the wild type Gp2

protein sequence (in 2016). These sequences were aligned and site-wise amino acid distribution was calculated counting each sequence equally.

Figure 4.2. Frequency of amino acids in sequences of natural homologs to Gp2. Amino acids with above 5% prevalence are labeled. The four most prevalent amino acids at each site are ranked at the bottom.

To predict destabilization upon mutation, FoldX was used to calculate the change in folding energy for each single mutation at each site of a Gp2 with a randomized paratope (Figure

4.3). This method was repeated for 40 random paratope sequences with four unique PDB files describing T7 phage Gp2 (2WNM, 2LMC, 4LK0, 4LLG) and averaged to determine the final stability of each amino acid at each site.


Figure 4.3. Change in folding energy for each single mutation averaged over more than 40 random paratopes and 4 protein data bank structures. Positive values represent less stable folds.

Solvent accessible surface area was calculated through the GetArea webserver [92], using a 1.4 angstrom probe and averaged over four PDB files (Figure 4.4).


Figure 4.4. Solvent accessible surface area calculated from the GetArea webserver for 4 protein data bank files and averaged. Area is normalized to surface area for each amino acid in the unfolded state.

4.2.2 General Constraint

Site E7 had decreased levels of select amino acids in previous binders. Stability

calculations suggested that diversity should not be too deleterious as 16 amino acids yield

≤ 1 kcal/mol destabilization upon mutation. Natural homologs were slightly constrained with 75% wildtype, but had a wide mixture of amino acids in the remaining 25% although this may be because wild type is important for natural function at this site [98]. An average

SASA of 0.35 suggests a modest likelihood for beneficial intermolecular contact upon

mutation. A medium level of diversity was chosen; the design includes H, I, K, L, M, N,

P, Q, R, S, T (encoded by MNK codon).

Site S8 had increased levels of the hydrophilic amino acids R and S and decreased levels of

hydrophobic amino acids including A, F, and L in previous binders. Stability calculations suggested

that diversity may be deleterious. Natural homologs suggested small amino acids G, S, and

T as well as a low level of larger amino acids are viable. An average SASA of 0.30

suggested a benefit of conservation at this site. A low level of diversity was chosen, aiming

to include wild type and the upregulated hydrophilic amino acids. The design includes D,

G, H, N, R, S and Y on a separate oligonucleotide (encoded by VRT + TAC codons).

Site S9 had increased levels of a variety of chemically disparate amino acids including F,

G, E, L, K, W, and Y in previous binders. Stability and natural homolog analysis suggested that a wide variety of amino acids are permitted at this site. An average SASA of 0.94

suggested that high diversity may be beneficial at this site. The design chosen allows

all 20 amino acids and mimics the antibody complementarity determining region (CDR),

similar to generation 1 Gp2.

The optional loop 1 insertion sites, 9a and 9b, have some amino acids that are increased in binders, especially A, H, and G at site 9b. However, due to the limited information from other sources due to these being non-natural loop length extensions, and their position in the middle of the proposed paratope, CDR was chosen for the design at both sites.

Site E10 had an increased level of hydrophilicity, especially N and D, and highly increased levels of G and P in previous binders. Stability and natural homolog data suggested that a wide variety of amino acids are permitted at this site. An average SASA of 0.78 suggested that high diversity may be beneficial at this site. Aiming for a high level of diversity with increased G, the amino acids A, D, H, N, P, S, T, Y and G on a separate oligonucleotide

(encoded by NMT + GGT codons) were selected for the design at this site.

Site H11 had an overall increase in hydrophilicity and a large increase in G in previous binders. Stability and natural homologs suggested that a wide variety of amino acids are permitted at this site. An average SASA of 0.55 suggested that some diversity may be beneficial at this site. A design of CDR was chosen for this site.

Site S12 had a large increase in the frequency of hydrophobic amino acids in previous binders. Stability calculations suggested that many amino acids are permitted at this site, however natural homologs suggested more constraint with 83% as S and 6% as other small amino acids. An average SASA value of 0.47 suggested that some diversity may be

beneficial. A design was chosen that increased the level of hydrophobic amino acids, while maintaining a medium level of diversity. The design includes A, G, L, M, R, S, T, V, W

(encoded by DBG codon).

In loop 2, sites V34 and P35 had highly increased levels of G and P, which are normally helpful in making a turn in protein secondary structure. Stability calculations and homolog analysis of V34 suggest that various amino acids are permitted. At P35, stability data suggested that any mutation would be detrimental. Yet we avoided completely locking in proline because we did not want to rely too heavily on the computational stability analysis.

V34 has a medium SASA of 0.57 and P35 has a high SASA of 0.74. However, the combination of both sites showing highly increased levels of G and P led to a constrained design with high levels of these amino acids. V34 was designed as A, E, G, P, Q, R

(encoded by SVR codon). P35 was designed as A, D, H, N, P, S, T, Y and G (encoded by

NMT + GGT codons).

Site A36 had an increase in total hydrophilicity in previous binders. Stability analysis suggested that a wide variety of amino acids are tolerated. Natural homologs were relatively constrained with 81% as A. An average SASA of 0.45 suggests that some diversity may be beneficial. A design was chosen to increase hydrophilicity with A, D, E,

H, K, N, P, Q, S, T, Y, and a stop codon that was carried along due to the genetic code limitations (encoded by NMB codon).

Sites 36a and 36b, the non-natural loop length extension sites, had less information to use for design choice compared to the natural sites. In previous binders, 36a showed an

increased hydrophilicity while 36b showed increased hydrophobicity. Due to the limited information CDR was chosen for both sites.

Site G37 had increased hydrophobicity in previous binders, with increased levels of L, W, and Y especially, but hydrophilic D was also increased. Stability and natural homologs suggested limiting diversity at this site. However, SASA of 0.78 suggested that diversity could provide beneficial binding interactions. A design was chosen with moderate diversity and increased hydrophobicity, including D, F, I, N, V, Y, L, H (encoded by NWT codon).

Site F38 had significantly increased hydrophobicity in previous binders, with highly increased C. Stability and natural homolog analysis suggested limiting diversity at this site.

An average SASA of 0.26 suggested that diversity may not be very beneficial at this site.

A moderate diversity design to increase the level of hydrophobicity was chosen, including

D, F, I, N, V, Y, L, H (encoded by NWT codon).

Site E39 had an increase in Y as well as other dissimilar amino acids. Stability and natural homologs both suggested that many amino acids are permitted at this site. An average

SASA of 0.70 suggested that high diversity could provide beneficial binding interactions.

CDR was chosen for this site.
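
The degenerate codons named in the site designs above (MNK, VRT, NMT, and so on) can be expanded into the amino acids they encode using the IUPAC nucleotide codes and the standard genetic code. The sketch below does this under the assumption of equimolar nucleotides at each mixed position; the function name is hypothetical and stop codons are reported as '*'.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_frequencies(degenerate_codon):
    """Expected amino acid frequencies for one degenerate codon,
    assuming equimolar nucleotides at every mixed position."""
    choices = (IUPAC[base] for base in degenerate_codon)
    codons = ["".join(triplet) for triplet in product(*choices)]
    counts = Counter(CODON_TABLE[codon] for codon in codons)
    return {aa: n / len(codons) for aa, n in sorted(counts.items())}


for name in ("MNK", "VRT", "NMB"):
    freqs = codon_frequencies(name)
    print(name, {aa: f"{100 * f:.0f}%" for aa, f in freqs.items()})
```

A design that adds a second codon on a separate oligonucleotide (for example VRT + TAC) would additionally depend on the mixing ratio of the two oligonucleotides, which is not modeled here.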

4.2.3 Extended Paratope


Figure 4.5. Amino acid frequency in extended paratope residues from generation 1 Gp2. (Left) Expected amino acid frequency change in binders after error prone PCR was calculated based on theoretical error rates and assuming 100% wild type initially. (Right) Actual amino acid frequency change seen in binding population compared to initial population.

Site E30 had a high percentage of G in previous binders but the ratio of actual/expected was relatively low. Conversely, V, F, A, and Q all occurred much more often than expected, and K was slightly higher than expected. The chosen design was E, Q, L, V, A, P (encoded by SHG codon) aiming to include wild type and allow for the chemically similar Q or much smaller A, V to reduce potential steric hindrance.


Site W31 had increased frequency of G, C, and L, moderately higher A, S, and slightly

higher Q. The chosen design was A, G, L, S, V, W (encoded by KBG) aiming to include

wild type and a variety of smaller amino acids to potentially provide additional beneficial

binding interactions.

Site Q32 had lower R than expected, while L, H, K were higher than expected and E was

slightly higher than expected. The design was chosen as D, E, H, L, Q, V (encoded by

SWK codon) to allow for chemical homology and smaller amino acids to reduce potential

steric hindrance.

See Supplemental Table S4.1 for oligonucleotide designs and mixtures.

4.2.4 PD-L1 designer libraries

Cysteines at positions 7 and 12 that appeared in all parental clones were kept locked in as

cysteine. Amino acid diversities at other sites (Table 4.4) were chosen to include parental

and maximize the number of chemically similar amino acids while keeping total

combinatorial diversity under 10^7. Four oligonucleotides that spanned the Gp2 gene were

ordered (IDT) for each library. The two oligonucleotides that encoded the first half of the

gene and the two that encoded the second half of the gene were mixed and cycled for 10

cycles using Phusion polymerase (NEB) following manufacturer recommendations. After

completion, the two reactions were mixed and cycled for 10 cycles using Phusion to create

the full gene. The correct size band was extracted from an agarose gel after electrophoresis

and purified. This product was PCR amplified to prepare for electroporation into yeast as

previously described [214].
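
The diversity limit mentioned above can be checked by multiplying the number of amino acids allowed at each diversified site. The per-site choices below are hypothetical placeholders (the actual designs are in Table 4.4, which is not reproduced here); only the 10^7 ceiling and the locked cysteines at positions 7 and 12 come from the text.

```python
from math import prod


def combinatorial_diversity(residues_per_site):
    """Number of distinct protein sequences encoded by a library design."""
    return prod(len(residues) for residues in residues_per_site)


# Hypothetical loop-one design for positions 7-12: cysteines locked at both
# ends, with illustrative residue sets at the four sites in between.
design = ["C", "DGHNRSY", "ADHNPSTY", "DFHILNVY", "AEGPQR", "C"]

diversity = combinatorial_diversity(design)
print(diversity, diversity < 10**7)
```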


4.2.5 High Background Bead Sorts

EBY100 [236] yeast containing no plasmid were grown in YPD (yeast extract (10 g/L),

bactopeptone (20 g/L), and glucose (20 g/L)) medium until saturation. EBY100 yeast

containing a plasmid encoding a binding ligand were grown in SD-CAA (0.07 M sodium citrate (pH 5.3), yeast nitrogen base (6.7 g/L), casamino acids (5 g/L), and glucose (20 g/L)) medium until OD600 = 6, then diluted in SG-CAA (Na2HPO4•7H2O (10.2 g/L),

NaH2PO4•H2O (8.6 g/L), galactose (19 g/L), glucose (1 g/L), yeast nitrogen base (6.7 g/L),

casamino acids (5 g/L)) medium to an OD600 < 1 and allowed to grow overnight to induce

protein expression. OD600 was used to estimate cell count (1 OD600 = 1 x 10^7 cells/mL) and

cells were mixed at the specified ratios in PBS (8.0 g/L NaCl, 0.2 g/L KCl, 1.44 g/L

Na2HPO4, 0.24 g/L KH2PO4, pH 7.2) supplemented with 1 g/L bovine serum albumin. The

pre-sorted mixture was diluted and plated on YPD, to count total yeast, and SD-CAA, to count plasmid-containing yeast, plates at appropriate dilutions to yield 20-200 colonies.

Bead sorts were carried out as described previously [214] with 1 or 3 washes. Collected beads were plated on YPD and SD-CAA at appropriate dilutions.
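
The plate counts described above translate into sort performance with simple arithmetic. The sketch below assumes common definitions, which may differ from those used in [214]: yield is the fraction of input binders recovered, and enrichment is the fold change in the binder fraction of the population. YPD counts stand in for total yeast and SD-CAA counts for plasmid-containing (binding) yeast; all numbers and function names are illustrative.

```python
def cells_from_colonies(colonies, dilution_factor):
    """Estimate cells in the undiluted sample from one countable plate.

    dilution_factor is the overall fold dilution between the sample and the
    plated aliquot (including the fraction of the volume that was plated).
    """
    return colonies * dilution_factor


def sort_metrics(pre_total, pre_binders, post_total, post_binders):
    """Return (yield, enrichment) for one bead sort."""
    yield_fraction = post_binders / pre_binders
    enrichment = (post_binders / post_total) / (pre_binders / pre_total)
    return yield_fraction, enrichment


# Illustrative counts only.
pre_total = cells_from_colonies(50, 2e8)      # YPD plate before the sort
pre_binders = cells_from_colonies(100, 1e3)   # SD-CAA plate before the sort
post_total = cells_from_colonies(150, 1e3)    # YPD plate after the sort
post_binders = cells_from_colonies(50, 1e3)   # SD-CAA plate after the sort

recovered, enrichment = sort_metrics(pre_total, pre_binders, post_total, post_binders)
print(f"yield = {recovered:.2f}, enrichment = {enrichment:.3g}-fold")
```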

4.2.6 Library construction, sorting, and sequencing

Naïve libraries were constructed and sorted as described previously [214]. Briefly,

degenerate oligonucleotides were purchased (IDT) and assembled using overlap extension.

Each library was separately transformed into yeast using homologous recombination.

Yeast, ten times the library size, were sorted on magnetic beads with one wash, followed

by another magnetic sort with two washes. Full length clones were mutated by error-prone

PCR using the dNTP analogs 8-oxo-dGTP and dPTP (Trilink Biotechnologies). First and

second loops were PCR amplified separately and mixed together during homologous

recombination such that loops from different clones could shuffle and recombine together. Typical

sorting procedure of two bead sorts and one flow cytometry sort [214] was followed until

selective binding, as determined by 10-fold higher recovery on target beads, was observed.

The initial library and isolated binders were sequenced as described previously [214].

PD-L1 designer libraries were constructed by using an analogous method to CDR-walking

[97]. For each parental clone, loop one was diversified while retaining parental loop two,

and vice versa. The diversity was chosen by keeping the parental amino acid and selecting

chemically homologous amino acids accessible with degenerate codons. The diversified

oligonucleotide for one loop and parental oligonucleotide for the other loop were

constructed using overlap extension and transformed into yeast. After one standard bead

sort the number of recovered yeast was <10,000 for each library. The functionally mutated

first and second loops, from their respective libraries, were isolated by PCR and

recombined into yeast for each variant. With fewer than 10,000 clones recovered, all

combinations (<10^8) could be sampled by yeast display sorting. Yeast with combined loops

were incubated with 50 nM biotin-PD-L1, washed, and then captured by beads. The

enriched population was sorted by flow cytometry with 50 nM and 5 nM biotin-PD-L1 to

isolate the strongest binders. A final depletion sort using three rounds of bare magnetic

beads and one round of human IgG coated beads was used to remove non-PD-L1 binders.

4.2.7 Protein production and characterization

Protein was produced and characterized as previously described (Kruziki Gp2). Briefly,

proteins were produced in T7 express, or T7 SHuffle E. coli (NEB) when proteins

contained multiple cysteines, and were purified using immobilized metal affinity

chromatography and reverse phase high-performance liquid chromatography with a 90%

buffer A (99.9% H2O, 0.1% trifluoroacetic acid (TFA))/10% buffer B (90% acetonitrile,

9.9% H2O, 0.1% TFA) to 10% A/90% B gradient over 15 minutes on a C18 column

(Waters). Affinity titrations were carried out using flow cytometry to measure the level of fluorophore tagged anti-His6 antibody bound to CHO-K1 cells transfected with human

PD-L1 (CHO-hPD-L1) [237] that were labeled with varying concentrations of Gp2.

Mammalian cells were grown in DMEM plus 10% fetal bovine serum (A431) or

F12K plus 10% fetal bovine serum and 2 mg/L G418. Binding curves were fit by

minimizing the sum of squared residuals. Melting temperature was measured by

monitoring ellipticity at 218 nm using circular dichroism over a temperature range of 25-

98 °C in 100 mM sodium acetate, pH 5.
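
The text states only that binding curves were fit by minimizing the sum of squared residuals. As a minimal sketch, assuming a 1:1 equilibrium binding model (signal = background + amplitude x [Gp2] / ([Gp2] + KD)), which is an assumption here and not a model stated in the text, the fit can be done by scanning KD on a logarithmic grid and solving for amplitude and background by linear regression at each candidate. The function name, grid range, and synthetic data are all illustrative.

```python
def fit_binding_curve(conc_nM, signal):
    """Fit signal = bg + amp * c / (c + Kd) by least squares.

    Kd is scanned on a log-spaced grid (0.01 nM to 100 uM). For each
    candidate Kd, the best amplitude and background follow from ordinary
    linear regression of the signal on the predicted fraction bound.
    """
    n = len(conc_nM)
    mean_y = sum(signal) / n
    best = None
    for step in range(-200, 501):
        kd = 10 ** (step / 100)
        bound = [c / (c + kd) for c in conc_nM]
        mean_x = sum(bound) / n
        sxx = sum((x - mean_x) ** 2 for x in bound)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound, signal))
        amp = sxy / sxx
        bg = mean_y - amp * mean_x
        ssr = sum((bg + amp * x - y) ** 2 for x, y in zip(bound, signal))
        if best is None or ssr < best["ssr"]:
            best = {"kd_nM": kd, "amplitude": amp, "background": bg, "ssr": ssr}
    return best


# Synthetic, noise-free titration generated with Kd = 10 nM (illustrative only).
conc = [0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000]
data = [50 + 1000 * c / (c + 10) for c in conc]
fit = fit_binding_curve(conc, data)
print(f"Kd ~ {fit['kd_nM']:.1f} nM, sum of squared residuals = {fit['ssr']:.3g}")
```

With real, noisy data a general-purpose nonlinear optimizer would typically replace the grid scan; the grid is used here only to keep the example dependency-free.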

4.3 Results

4.3.1 Design of second generation library

While the first-generation Gp2 library yielded binders to each target, the discovery and

evolution campaigns were typically dominated by a small number of clones. We hypothesized that a different library design could provide a higher fraction of folded, functional mutants. Multiple library designs were constructed (Table 4.1 and Table 4.2).

The base library was akin to the first-generation library with antibody CDR diversity

(abbreviated CDR+) (updated for current database values) at each site (Figure 4.6).

Table 4.1. Designs of second generation Gp2 libraries. Codon or amino acid diversity is shown for each site in each library. CDR+ represents base antibody CDR diversity and CDR- represents the closest match to that diversity while removing cysteine (and thus arginine, glycine, and tryptophan due to genetic code limitations). When two codons are listed, the second codon was on a separate oligonucleotide. For cysteine pair libraries, the slashes indicate first loop/second loop/cross loop library designs and have a ‘C’ for the cysteines locked in that library. Framework positions have a highlighted background.

Site  CDR (+CWRG)  CDR (-CWRG)       CDR+ (w/ Constrain 9)  CDR- (w/ Constrain 9)  CDR- (w/ Const 9 & EWQ)  CDR (-CWRG) (Cys Pairs)
E7    CDR+         CDR-              MNK                    MNK                    MNK                      C/CDR-/CDR-
S8    CDR+         CDR-              VRT + TAC (Y)          VRT + TAC              VRT + TAC                CDR-/CDR-/C
S9    CDR+         CDR-              CDR+                   CDR-                   CDR-                     CDR-
9a    CDR+         CDR-              CDR+                   CDR-                   CDR-                     CDR-
9b    CDR+         CDR-              CDR+                   CDR-                   CDR-                     CDR-
E10   CDR+         CDR- + GGT (G)    NMT + GGT              NMT + GGT              NMT + GGT                CDR- + GGT
H11   CDR+         CDR- + GGT        CDR+                   CDR-                   CDR-                     CDR- + GGT
S12   CDR+         CDR-              DBG                    DBG                    DBG                      C/CDR-/CDR-
E20   E/A          E/A               E/A                    E/A                    E/A                      E/A
E27   E/Q          E/Q               E/Q                    E/Q                    E/Q                      E/Q
E30   E            E                 E                      E                      SHG                      E
W31   W            W                 W                      W                      KBG                      W
Q32   Q            Q                 Q                      Q                      SWK                      Q
V34   CDR+         CDR- + GGT        SVR                    SVR                    SVR                      CDR- + GGT
P35   CDR+         CDR- + GGT        NMT + GGT              NMT + GGT              NMT + GGT                CDR- + GGT
A36   CDR+         CDR-              NMB                    NMB                    NMB                      CDR-/C/CDR-
36a   CDR+         CDR-              CDR+                   CDR-                   CDR-                     CDR-
36b   CDR+         CDR-              CDR+                   CDR-                   CDR-                     CDR-/CDR-/C
G37   CDR+         CDR-              NWT                    NWT                    NWT                      CDR-
F38   CDR+         CDR-              NWT                    NWT                    NWT                      CDR-
E39   CDR+         CDR-              CDR+                   CDR-                   CDR-                     CDR-/C/CDR-
R44   R/Q          R/Q               R/Q                    R/Q                    R/Q                      R/Q

Table 4.2. Expected amino acid frequency for each degenerate codon used in the Gp2 second generation library.

AA    MNK   VRT   NMT   DBG   SHG   KBG   SWK   SVR   NMB   NWT
A     -     -     13%   11%   17%   17%   -     17%   13%   -
C     -     -     -     -     -     -     -     -     -     -
D     -     17%   13%   -     -     -     13%   -     8%    13%
E     -     -     -     -     17%   -     13%   17%   4%    -
F     -     -     -     -     -     -     -     -     -     13%
G     -     17%   -     11%   -     17%   -     17%   -     -
H     6%    17%   13%   -     -     -     13%   -     8%    13%
I     6%    -     -     -     -     -     -     -     -     13%
K     6%    -     -     -     -     -     -     -     4%    -
L     13%   -     -     11%   17%   17%   25%   -     -     13%
M     6%    -     -     11%   -     -     -     -     -     -
N     6%    17%   13%   -     -     -     -     -     8%    13%
P     13%   -     13%   -     17%   -     -     17%   13%   -
Q     6%    -     -     -     17%   -     13%   17%   4%    -
R     19%   17%   -     11%   -     -     -     17%   -     -
S     6%    17%   13%   11%   -     17%   -     -     13%   -
T     13%   -     13%   11%   -     -     -     -     13%   -
V     -     -     -     11%   17%   17%   25%   -     -     13%
W     -     -     -     11%   -     17%   -     -     -     -
Y     -     -     13%   -     -     -     -     -     8%    13%
Z     -     -     -     -     -     -     -     -     4%    -

[Bar chart: amino acid frequency (0% to 20%) for A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, and Z; series: Abysis, CDR (+CWRG), CDR (-CWRG).]

Figure 4.6. Expected amino acid frequency for CDR+ and CDR- sites. The literature values for the antibody CDR (Abysis) were used to guide the design.


Cysteines can limit ease of protein production, preclude downstream applications in reducing environments, and can cause aggregation. Therefore, a second library eliminated cysteine, while attempting to maintain CDR frequency of other amino acids (abbreviated

CDR-), to examine if disulfide-free clones could be readily discovered. This was achieved by elimination of guanine at the second position within each codon, which also eliminates

W, R, and G. Because of the evolutionary benefit of glycine in several sites in the first-generation evolution, glycine was uniquely allowed at sites 10, 11, 34, and 35 via an additional oligonucleotide.

A third library design constrained diversity at several sites. Sitewise constraint of amino acids in the paratope region has been beneficial for binder discovery in other small protein scaffolds [47, 51]. We hypothesized that this result would carry over to Gp2, provided we had sufficient structural and phylogenetic data to design the library. The general constrained library design relied on previous binding sequences (Figure 4.1), sequences from homologous proteins (Figure 4.2), computational stability calculations (Figure 4.3), wild type solvent accessible surface area (SASA) (Figure 4.4), and chemical homology. In general, we looked for trends in enrichment or reduction of certain amino acid properties at each site, e.g. size, hydrophobicity, charge, or retention of wild-type, from previous binding sequences and natural homologs. Chemical homology was used to add in or remove amino acids that had similar properties to included or excluded amino acids, respectively. Amino acids near the cutoff of being included or excluded were decided on by computational stability, i.e. they were disallowed if greatly destabilizing or allowed if neutral or beneficial. Finally, sites with higher SASA were allowed to have a higher


amount of diversity due to the potential for these exposed amino acids to form binding

interactions. In depth site-by-site design decisions are described in the Methods section.

Two versions of the constrained library were built; one where non-constrained sites were

CDR diversity and one where the non-constrained sites had the cysteine-free CDR diversity.

A fourth library extended the second loop paratope based on previous binder sequences. A large fraction of the strongest binding clones isolated during initial sorts had mutations in the residues directly preceding loop two. Diversity was chosen in order to have 4-6 amino acids at each site to avoid overdiversification. Amino acid diversity was selected by comparing expected (theoretical analog mutation rate of parental codon) vs. actual

(Illumina sequencing) frequencies from generation 1 Gp2. The extended loop library was built to have other paratope positions match the diversity of the constrained, cysteine-free

library.

Although cysteines can cause production and handling issues, cysteines that form

disulfides can provide needed stabilization. In the previous binder sequencing certain

cysteine pairs occurred much more than expected based on their positional frequency

(Figure 4.7). A fifth library constrained two sites at a time to be uniquely cysteine to

examine potential benefits of disulfide bonds. Three variations of the fifth library were

created: a loop one library with cysteines at positions 7 and 12, a loop two library with

cysteines at positions 36 and 39, and a cross loop library with cysteines at positions 8 and

36. These libraries had cysteine-free CDR diversity (CDR-) at all other positions.


Figure 4.7. Potential of disulfide bonded cysteines in generation 1 Gp2. Frequencies of cysteines at indicated positions were multiplied to calculate predicted frequency. The predicted frequency was compared to the actual pairwise frequency and the absolute difference and calculated mutual information are shown. High mutual information indicated cysteines at that site show up in pairs often. The sites with the top three values for difference and mutual information are shown in cartoon structure with the distance between residues (PDB ID: 2wnm).
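
The pairwise analysis in Figure 4.7 can be sketched as follows. The predicted pair frequency is the product of the two single-site cysteine frequencies, as the caption states. The mutual information here is computed between binary indicators (cysteine or not cysteine) at the two sites, which is an assumption about how it was calculated; the function name and toy sequences are hypothetical.

```python
from math import log2


def cysteine_pair_stats(sequences, i, j):
    """Compare observed and predicted co-occurrence of cysteine at sites i, j.

    sequences: equal-length aligned sequences; i, j: zero-based positions.
    """
    n = len(sequences)
    joint = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for seq in sequences:
        joint[(int(seq[i] == "C"), int(seq[j] == "C"))] += 1
    p = {key: count / n for key, count in joint.items()}

    p_i = p[(1, 0)] + p[(1, 1)]  # cysteine frequency at site i
    p_j = p[(0, 1)] + p[(1, 1)]  # cysteine frequency at site j
    predicted = p_i * p_j        # expected pair frequency if independent

    mutual_information = 0.0
    for (a, b), p_ab in p.items():
        if p_ab > 0:
            p_a = p_i if a else 1 - p_i
            p_b = p_j if b else 1 - p_j
            mutual_information += p_ab * log2(p_ab / (p_a * p_b))

    return {
        "predicted": predicted,
        "observed": p[(1, 1)],
        "difference": p[(1, 1)] - predicted,
        "mutual_information_bits": mutual_information,
    }


# Toy alignment in which cysteines at the first and last site always co-occur.
toy = ["CAC", "CAC", "AAA", "AAA"]
print(cysteine_pair_stats(toy, 0, 2))
```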

Increased binding specificity has been seen in other small protein scaffolds after surface charge neutralization [238]. Data from charge neutralization experiments on a few Gp2 clones for the purpose of affecting in vivo biodistribution indicated multiple sites where neutralization was tolerated (manuscript in preparation). These charged framework sites were allowed to be parental or one neutral mutant in all libraries: E/A at site 20, E/Q at site 27, and R/Q at site 44 (Table 4.1).

4.3.2 Sorting and sequencing

Combinatorial libraries were built using degenerate oligonucleotides and transformed into yeast for use in yeast display [54]. Based on the number of transformants, each library size was estimated to be 3-5 × 10⁸. To ensure each library experienced the same handling, they were combined and sorted together, resulting in a total library size of 4 × 10⁹. Ten times the library size was used in each sort so that each variant had a greater than 99.9% chance of being included at least once [239]. Because of the high number of yeast needed for sorting, a model case with approximately 10⁵ binding yeast in up to 10¹⁰ non-binding yeast was sorted on magnetic beads to assess yield and enrichment at high cell densities. The results (Figure 4.8) indicate that for certain clones and/or targets, low recovery can be observed for moderate to weak binding interactions (KD > 1 µM) with 10¹⁰ total yeast. Yet four of six clones, including a 170 nM affinity binder, exhibited high yield and enrichment even with 10¹⁰ yeast cells. The recovery when sorting lower amounts of total yeast agreed with previously published results [55].
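The tenfold oversampling figure can be checked with a simple sampling model. The sketch below assumes every variant is equally represented and that cells are drawn independently, so the number of copies of a given variant in the sorted sample is approximately Poisson distributed with mean equal to the oversampling factor. Whether this is the exact calculation used in [239] is an assumption, and the function names are illustrative.

```python
import math

def prob_variant_sampled(oversampling: float) -> float:
    """Poisson estimate of the chance a given variant appears at least once.

    P(at least one copy) = 1 - exp(-oversampling), where oversampling is the
    number of cells sorted divided by the library diversity.
    """
    return -math.expm1(-oversampling)

def oversampling_needed(target_prob: float) -> float:
    """Oversampling factor required to reach a target inclusion probability."""
    return -math.log1p(-target_prob)

library_size = 4e9                  # combined diversity reported in the text
cells_sorted = 10 * library_size    # tenfold oversampling
coverage = prob_variant_sampled(cells_sorted / library_size)
minimum_factor = oversampling_needed(0.999)
```

Under these assumptions, tenfold oversampling comfortably exceeds the 99.9% threshold, and roughly sevenfold oversampling would already be enough to reach it.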


Figure 4.8. Binding yeast recovery during magnetic bead sorts with high levels of non-binding yeast (EBY). 10⁵ yeast expressing different scaffolds of varying affinities were counted by plating dilutions after bead sorts. Low recovery is seen for some weak affinity scaffolds even with 1 wash. Yield is only slightly decreased when using as many as 10¹⁰ non-binding yeast.

For the second generation Gp2 naïve library, 4 × 10¹⁰ yeast (split into four tubes so as not to exceed 10¹⁰ yeast per tube) were sorted using one wash to maximize yield, acknowledging that some low affinity binders may be lost. After two rounds of

sorting and mutagenesis, populations of binding yeast emerged for PD-L1, MET, and

tumor necrosis factor receptor (TNFR). In particular, sorted populations exhibited greater

than 14-fold higher yeast recovered on beads coated in target compared to beads with

streptavidin or a control protein (human IgG). PD-L1 and MET populations also showed signal over background on flow cytometry when labeled with 100 nM target and detected with fluorophore-tagged streptavidin. Naïve and binding populations were sequenced and

unique sequences were separated by loop (339 for each loop from naïve; 53 unique loop

one and 30 unique loop two from binders). Sequences with greater than 80% amino acid

identity were grouped into families (339 for each loop from naïve; 38 loop one families and 18 loop two families from binders) to accentuate diversity rather than dominant clones

(Figure 4.9). Each family was assigned to the library that it had the highest probability of arising from based on the sequence and designed diversity. Each family was counted once after being assigned.
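The family grouping step can be illustrated with a short sketch. The clustering procedure is not specified in the text, so the version below makes several assumptions: loops are compared position by position without alignment, loops of different length are never placed in the same family, and each sequence is greedily added to the first family whose founding member it matches at greater than 80% identity. The function names and example loops are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; loops of unequal length score 0."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def group_into_families(seqs, cutoff=0.8):
    """Greedy grouping: join the first family whose founder exceeds the cutoff."""
    families = []
    for s in seqs:
        for family in families:
            if identity(s, family[0]) > cutoff:
                family.append(s)
                break
        else:
            # No existing family is similar enough, so s founds a new one.
            families.append([s])
    return families

# Hypothetical loop sequences: the first two differ at one of six positions.
loops = ["ACDEFG", "ACDEFH", "WWWWWW"]
families = group_into_families(loops)
```

Counting each resulting family once, rather than each read, keeps a single dominant clone from overwhelming the library comparison. Note that greedy grouping depends on input order; a linkage-based clustering would be an alternative.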


Figure 4.9. Sequencing and library identity of naïve and evolved populations of the generation 2 Gp2. Sequences of each loop separately were grouped into families with a cutoff of 80% sequence identity. Each family was assigned to the library it had the highest probability of originating from. Error bars represent standard deviations of n = 339 naïve families, n = 38 loop one families, and n = 18 loop two families.

In loop one, the cysteine constraint library performed very well, increasing in prevalence from 13% of the naïve library to 47% of the binding sequences (p < 0.001). Cysteine-free CDR (which was the diversity at other sites in the cysteine constraint library) performed poorly, decreasing in prevalence from 23% to 5% (p = 0.006), which highlights the benefit of the cysteine constraint. Potential disulfides formed by these cysteines may help stabilize the Gp2 protein fold to enable strong binding interactions. Conversely, the cross-loop cysteine library performed poorly, decreasing from 21% to 5% (p = 0.009), indicating that heavily conserving cysteines is not beneficial at all sites. The constrained, cysteine-free library decreased from 22% to 8% (p = 0.02), but comparing this to the decrease for unconstrained, cysteine-free (17% vs 14%, respectively; p = 0.33) does not show a significant effect for constraint. Base case CDR increased from 10% of naïve to 24% of binders (p = 0.006), while constrained with base case CDR was nearly unchanged (12% vs 11%), suggesting that constraint is detrimental in the base case CDR context (p < 0.001). While constraint has been beneficial for other small scaffolds (affibody, Fn), perhaps the low number of binding sequences and natural Gp2 homologs led to overconstraint or incorrect constraint. Comparing base case CDR to cysteine-free CDR in either the unconstrained (14% increase vs 17% decrease, p < 0.001) or the constrained (1% decrease vs 14% decrease, p < 0.001) cases suggests that eliminating cysteine, glycine, arginine, and tryptophan in loop one hurts binder discovery (even when glycine is doped in at certain positions).
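As an illustration of how a change in library prevalence between naïve and binding families might be tested, the sketch below applies a pooled two-proportion z-test. The statistical test actually used for the p-values above is not stated, so this is an assumed stand-in, and the family counts are approximations back-calculated from the reported percentages (13% of 339 naïve families and 47% of 38 loop one binder families).

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    x1/n1 is the reference proportion (naïve), x2/n2 the comparison (binders).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return z, p_value

# Approximate counts for the loop one cysteine constraint library (assumed).
naive_count, naive_total = 44, 339      # about 13% of naïve families
binder_count, binder_total = 18, 38     # about 47% of binder families
z, p_value = two_proportion_z_test(naive_count, naive_total,
                                   binder_count, binder_total)
```

With only 38 and 18 binder families, an exact test such as Fisher's may be preferable to the normal approximation; the z-test is shown only because it needs nothing beyond the standard library.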

In loop two, conversely, cysteine-free CDR performed the best of all the libraries, increasing from 22% of naïve to 39% of binders (p = 0.04). Cysteine-free CDR outperformed the related library modifications including base case CDR (17% increase vs 7% increase, p < 0.001), cross loop cysteine (17% increase vs. 21% decrease, p < 0.001), intraloop cysteine (17% increase vs. 3% increase, p = 0.003) and, nominally, constrained, cysteine-free CDR (17% increase vs 7% increase, p = 0.06), suggesting that removing cysteine, glycine, arginine, and tryptophan in loop two is beneficial to discovery but the other constraints are not beneficial. Constrained, cysteine-free CDR outperformed constrained, base case CDR (7% increase vs 9% decrease, p < 0.001) and base case CDR outperformed constrained, base case CDR (7% increase vs 9% decrease, p < 0.001), also suggesting that removing the four amino acids helps but the designed constraint is not beneficial. Extending the loop two paratope to include 30EWQ32 was also detrimental when compared to the relevant constrained, cysteine-free CDR (5% decrease vs. 7% increase, p < 0.001). The extended paratope may cause extra destabilization resulting in more unfolded proteins, or, since the library is not fully sampled, may simply have a lower fraction of beneficial target interactions.

4.3.3 Further evolution of PD-L1 ligands

High affinity PD-L1 ligands are of interest in the clinic as PET imaging agents to stratify

patients and predict therapeutic response. The six most abundant variants sequenced from

the PD-L1 binding population were constructed in a production vector and produced in E.

coli. All six variants contained cysteines in the first loop at sites 7 and 12. When using the

NEB SHuffle E. coli strain, four were able to be produced and purified from the soluble fraction (Figure 4.10). Flow cytometry verified that all four PD-L1 binding Gp2 variants bound to CHO-hPD-L1 cells; however, the large difference in signal between 150 nM and 3 µM Gp2 concentrations indicated a weak binding affinity (Figure 4.10).


Figure 4.10. PD-L1 binding Gp2 protein characterization. Aligned protein sequences of PD-L1 binders recovered from the generation 2 library sorting; blue lettering indicates the paratope region. Clones C and G were unable to be produced in the soluble fraction. The produced clones were used to label CHO-hPD-L1 cells at a high (3 µM) and medium (150 nM) concentration. Binding was detected using a fluorescently tagged anti-His6 antibody with flow cytometry. Proteins (excluding F) were also used to label CHO-K1 cells as a negative control. Binding is visible at 3 µM but nearly undetectable above background at 150 nM, suggesting weak affinity.

Eight libraries were constructed to increase the binding strength of these Gp2 lead

molecules to PD-L1. Two sublibraries were constructed for each lead clone: one with

extensively diversified loop 1 and conserved parental loop 2; and one sublibrary of the

inverse with conserved loop 1 and extensively diversified loop 2. Amino acid diversity at

each site was chosen to include homologous amino acids and to keep the library size below 10⁷ unique members (Table 4.3 and Table 4.4) so that the full library could be reliably sampled during yeast display sorting. Yeast were sorted by magnetic beads and fluorescence-activated cell sorting (FACS). Collected yeast of a single parental variant had their mutated loops recombined, i.e. loop 1 of library one and loop 2 of library two were combined into a single new library. The highest affinity Gp2 mutants were isolated and sequenced after two rounds of FACS (Figure 4.11).

Table 4.3. Library design for PD-L1 binding Gp2 evolution. The degenerate codon used at each site for each library is displayed. ‘C’ signifies that position was locked in as cysteine. N/A signifies that parental clone did not have the extended loop positions present.


Table 4.4. Expected amino acid frequency for each degenerate codon used in PD-L1 library designs.
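Expected amino acid frequencies for a degenerate codon, and the resulting theoretical library diversity, can be computed directly from the genetic code. The sketch below assumes the standard genetic code, IUPAC degenerate base symbols, and an equimolar mixture of bases at each degenerate position; it does not cover custom nucleotide mixtures. The function names are illustrative and this is not the script used to produce the table.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def amino_acid_frequencies(degenerate_codon: str) -> dict:
    """Expected residue frequencies for one degenerate codon (equimolar bases)."""
    options = [IUPAC[base] for base in degenerate_codon.upper()]
    codons = ["".join(c) for c in product(*options)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}

def library_diversity(degenerate_codons) -> int:
    """Number of distinct protein sequences encoded, ignoring stop codons."""
    size = 1
    for codon in degenerate_codons:
        size *= len(set(amino_acid_frequencies(codon)) - {"*"})
    return size

nmt = amino_acid_frequencies("NMT")            # eight residues, equally likely
diversity = library_diversity(["NMT", "TGC"])  # 8 options x fixed cysteine
```

Multiplying the per-site counts across all diversified positions gives the theoretical diversity that can be compared against a sampling limit such as the library size ceiling mentioned in the text.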


Figure 4.11. Sequences of strongest binding evolved and parental PD-L1 binding Gp2. Clones E1 and E4 were the only evolved proteins produced at measurable levels in the soluble fraction using E. coli. Strong binding was detectable with 10 nM Gp2 labeling of CHO-hPD-L1 cells detected by fluorophore tagged anti-His6 antibody (compared to 3 µM with parental proteins).

4.4 Discussion

The ability to isolate numerous unique binding variants from a combinatorial protein

scaffold library provides benefits in downstream applications through a variety of

biophysical properties and variable binding epitopes. Previous successful campaigns to

isolate Gp2 binders toward multiple targets [214, 229] have been limited by the modest

amount of high affinity sequences recovered. Gp2 would benefit from optimization of its

paratope and framework, resulting in a higher frequency of functional clones and thus more

effective discovery and evolution of high affinity binders. Here, we have taken the next

steps towards building a knowledge base for optimization of Gp2.


The high percentage of cysteines in potential disulfide locations that emerged in the Gp2

generation 1 evolved binders suggested that many randomized loops could be too destabilizing unless a disulfide was present (to be clear, stable cysteine-free mutants were

also found but the frequency of such mutants was modest). To examine if disulfide

stabilization is beneficial, libraries were built with conserved cysteines at sites that can

potentially form disulfides due to proximity while all other sites in these libraries were

allowed full diversity. The enrichment of clones with paired cysteines at the termini of loop

1 (sites 7 and 12) compared to the appropriate controls (Figure 4.9) suggested that cysteine

stabilization is beneficial for discovering Gp2 binders. However, paired cysteines in loop

2 or across loops did not perform well compared to their controls, so cysteine locations

must be chosen with care. Additionally, cysteines led to difficulty in producing certain

clones during further PD-L1 binder evolution, which must be balanced with potential

evolutionary benefit. The cysteine-free CDR library performed poorly in loop 1 but better than the base case CDR library in loop 2 demonstrating that completely removing cysteine

(and tryptophan, arginine and sometimes glycine) is advisable only for loop 2.

As another approach to increase the quality of the library, we looked at trends in the evolved Gp2 generation 1 amino acid diversity to see if any sites benefitted from conservation. By omitting the undesirable amino acids from the naïve library, a higher fraction of the library could potentially be stable and well-folded. For example, site S8 had increased levels of small, hydrophilic amino acids in the evolved binders in generation 1, and F38 showed increased hydrophobicity; both sites had low diversity in naturally occurring homologs, and so diversity was limited at these sites. Other sites that had similar


patterns were constrained as well. However, the constrained libraries were either neutral or

detrimental to the frequency of evolved sequences in both loops when compared to the

unconstrained base cases. One possible explanation is that the relatively low number of sequences from generation 1 binders led to overconstraint or incorrect constraint. Since

constraint has proven beneficial in other protein scaffold libraries, another attempt at

constraining diversity in Gp2 should be performed when more binding sequences are

available.

The expanded paratope library decreased in frequency from naïve to evolved binders

compared to the increase of the relevant cysteine-free constrained library control. This

result suggested that added destabilization from higher diversity outweighs the benefit of expanding the paratope to allow for increased interaction area. Taken together, these trends and sequences serve as a foundation of the information needed to more fully optimize the

Gp2 paratope.

PD-L1 is expressed in multiple cancer types and is associated with poor prognosis [14,

240]. Tumor expression of PD-L1 is correlated with response to targeted therapies [241–

243]. However, the expression level of PD-L1 can vary during the course of a treatment

[244] and the difficulty of repeated invasive biopsies can limit ability to monitor PD-L1

levels through immunohistochemistry. Radiolabeled antibodies [245] and nanobodies

[246] that bind PD-L1 have been used in PET imaging to detect PD-L1 expression, but

the large size of the tracers limits image quality at early timepoints. Small protein Affibody

molecules targeting PD-L1 have recently been used in PET imaging for early timepoints,

but modest tumor signal (2.5 % ID/g) leaves room for improvement [247].


Radiolabeled Gp2 molecules have been successfully used for early timepoint PET imaging of EGFR biomarkers in mice [248]. PD-L1 binding Gp2 molecules isolated after one round of mutagenesis had low affinity, showing weak binding signal with 150 nM labeling. PET image quality is highly dependent on strength of ligand binding [67]. An analogous approach to CDR walking [97, 249] enabled evolution of Gp2 molecules with much higher binding affinity towards PD-L1. This evolution approach can be applied to efficiently create higher affinity ligands in other Gp2 campaigns or in other protein scaffolds. The highest affinity PD-L1 targeting Gp2 molecules generated in this study have potential for use as radiotracers for PET imaging, or in other biotechnological applications.

4.5 Conclusion

Overall, these initial investigations in paratope diversification or conservation form a strong starting point for Gp2 optimization. The beneficial nature of cysteines and detriment of certain conservation schemes can be used in further iterations of the Gp2 naïve library.

PD-L1 targeted Gp2 molecules that arose out of the paratope optimization have been evolved to bind strongly towards cellular PD-L1 and can be utilized for molecular imaging or other applications.

4.6 Supplemental Information

Table S4.1. Oligonucleotide sequences used to construct generation 2 Gp2. When separate oligonucleotides were mixed the following frequencies were desired: For CDR- and all cysteine libraries loop 1 76% main design, 13% with G10, and 11% with G11. For CDR- and all cysteine libraries loop 2 78% main design, 11% with G34, and 11% with G35. For constrained CDR+, constrained CDR-, and extended paratope loop 1 75% main design, 14% with Y8, 9% with G10, and 2% with Y8 and G10. For constrained CDR+, constrained CDR-, and extended paratope loop 2 89% main design and 11% with G35.


WNM2F_gen2 TTCGAGGTTCCGGTTTATGCTGMAACCCTGGACGAAGCACTGSAGCTGGCC
WNM1R_Gen2 GGCCAGCTSCAGTGCTTCGTCCAGGGTTKCAGCATAAACCGGAACCTCGAA
WNM2R_Gen2 CAAGTCCTCTTCAGAAATAAGCTTTTGTTCGGATCCCGGCYGCACGCGGGTCAC

Gp2CDRplusA1 TAGCAAATTTTGGGCGACTGTA (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT
Gp2CDRplusA2 TAGCAAATTTTGGGCGACTGTA (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT
Gp2CDRplusA3 TAGCAAATTTTGGGCGACTGTA (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT
Gp2CDRplusC1 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT
Gp2CDRplusC2 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT
Gp2CDRplusC3 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2CDRminusA1 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminusA2 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminusA3 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1)

120

(N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_G10 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 _A1 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_G10 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 _A2 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_G10 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 _A3 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_G11 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 _A1 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_G11 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 _A2 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_G11 TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 _A3 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT Gp2CDRminusC1 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminusC2 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminusC3 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_G34 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT _C1 (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) 
(N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

121

Gp2CDRminus_G34 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT _C2 (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_G34 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT _C3 (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_G35 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC _C1 (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_G35 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC _C2 (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_G35 GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC _C3 (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:14 15 34 A1 37) (N2:43 19 19 19) (N3:0 22 15 63) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:14 15 34 A2 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:14 15 34 A3 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:14 15 34 Y_A1 37) (N2:43 19 19 19) (N3:0 22 15 63) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:14 15 34 Y_A2 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:14 15 34 Y_A3 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) NMT (N1) (N2) (N3)DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:14 15 34 G_A1 37) (N2:43 19 19 19) (N3:0 22 15 63) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT 122

Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:14 15 34 G_A2 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) GGT(N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:14 15 34 G_A3 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:14 15 34 YG_A1 37) (N2:43 19 19 19) (N3:0 22 15 63) GGT (N1) (N2) (N3) DBGTTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:14 15 34 YG_A2 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:14 15 34 YG_A3 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRplus_cnsv_ GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR C1 NMT NMB NWT NWT (N1: 14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) GTGACCCGCGTGCRGCCGGGAT Gp2CDRplus_cnsv_ GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR C2 NMT NMB (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRplus_cnsv_ GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR C3 NMT NMB (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRplus_cnsv_ GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR G_C1 GGT NMB NWT NWT (N1: 14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) GTGACCCGCGTGCRGCCGGGAT Gp2CDRplus_cnsv_ GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR G_C2: GGT NMB (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRplus_cnsv_ GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR G_C3 GGT NMB (N1:14 15 34 37) (N2:43 19 19 19) (N3:0 22 15 63) (N1) (N2) (N3) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:18 15 25 v_A1 42) (N2:42 30 0 28) (N3:0 0 20 80) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT

123

Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:18 15 25 v_A2 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:18 15 25 v_A3 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:18 15 25 v_Y_A1 42) (N2:42 30 0 28) (N3:0 0 20 80) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:18 15 25 v_Y_A2 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:18 15 25 v_Y_A3 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) NMT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:18 15 25 v_G_A1 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:18 15 25 v_G_A2 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK VRT (N1:18 15 25 v_G_A3 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:18 15 25 v_YG_A1 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:18 15 25 v_YG_A2 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns TAGCAAATTTTGGGCGACTGTA MNK TAC (N1:18 15 25 v_YG_A3 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) DBG TTCGAGGTTCCGGTTTATGCT Gp2CDRminus_cns GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR v_C1 NMT NMB NWT NWT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_cns GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR 
v_C2 NMT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT 124

Gp2CDRminus_cns GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR v_C3 NMT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_cns GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR v_G_C1 GGT NMB NWT NWT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_cns GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR v_G_C2 GGT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT Gp2CDRminus_cns GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC SVR v_G_C3 GGT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2CDR_EWQ_C1 GACGAAGCACTGSAGCTGGCC SHG KBG SWK TAC SVR NMT NMB NWT NWT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GTGACCCGCGTGCRGCCGGGAT
Gp2CDR_EWQ_C2 GACGAAGCACTGSAGCTGGCC SHG KBG SWK TAC SVR NMT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT
Gp2CDR_EWQ_C3 GACGAAGCACTGSAGCTGGCC SHG KBG SWK TAC SVR NMT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT
Gp2CDR_EWQ_G_C1 GACGAAGCACTGSAGCTGGCC SHG KBG SWK TAC SVR GGT NMB NWT NWT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GTGACCCGCGTGCRGCCGGGAT
Gp2CDR_EWQ_G_C2 GACGAAGCACTGSAGCTGGCC SHG KBG SWK TAC SVR GGT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT
Gp2CDR_EWQ_G_C3 GACGAAGCACTGSAGCTGGCC SHG KBG SWK TAC SVR GGT NMB (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) NWT NWT (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop1_A1 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_A2 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) 125

(N1) (N2) (N3) (N1) (N2) (N3) TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_A3 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_G10 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) _A1 (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) GGT (N1) (N2) (N3) TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_G10 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) _A2 (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_G10 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) _A3 (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_G11 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) _A1 (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) GGT TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_G11 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) _A2 (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT TGC TTCGAGGTTCCGGTTTATGCT Gp2Cys_loop1_G11 TAGCAAATTTTGGGCGACTGTA TGC (N1:18 15 25 42) _A3 (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT TGC TTCGAGGTTCCGGTTTATGCT

Gp2Cys_loop2_C1: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) TGC (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_C2: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_C3: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_G34_C1: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_G34_C2: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_G34_C3: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_G35_C1: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT TGC (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_G35_C2: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_loop2_G35_C3: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_diloop_A1: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_A2: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_A3: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_G10_A1: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) GGT (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_G10_A2: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_G10_A3: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_G11_A1: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_G11_A2: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_G11_A3: TAGCAAATTTTGGGCGACTGTA (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GGT (N1) (N2) (N3) TTCGAGGTTCCGGTTTATGCT

Gp2Cys_diloop_C3: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_diloop_G34_C3: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC GGT (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) (N1) (N2) (N3) (N1) (N2) (N3) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT

Gp2Cys_diloop_G35_C3: GACGAAGCACTGSAGCTGGCCGAATGGCAGTAC (N1:18 15 25 42) (N2:42 30 0 28) (N3:0 0 20 80) GGT (N1) (N2) (N3) (N1) (N2) (N3) TGC (N1) (N2) (N3) (N1) (N2) (N3) (N1) (N2) (N3) GTGACCCGCGTGCRGCCGGGAT
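
As a rough illustration of what the nucleotide mixtures in these oligonucleotides encode, the following Python sketch estimates the amino acid distribution produced by a single degenerate (N1) (N2) (N3) codon. It assumes that the four percentages in each mixture are listed in the order A, C, G, T (the listing does not state the order) and that the standard genetic code applies. The function name amino_acid_frequencies is hypothetical and used only for this example.

```python
from itertools import product

# ASSUMPTION: the four percentages of each mixture are ordered A, C, G, T.
BASES = "ACGT"

# Standard genetic code, written in the conventional TCAG codon order.
_TCAG = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_TCAG)
    for j, b in enumerate(_TCAG)
    for k, c in enumerate(_TCAG)
}


def amino_acid_frequencies(n1, n2, n3):
    """Return {amino acid: probability} for one degenerate codon.

    n1, n2, n3 are percentage tuples (A, C, G, T) for codon positions 1-3.
    Stop codons are reported under the key "*".
    """
    freqs = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(
        zip(BASES, n1), zip(BASES, n2), zip(BASES, n3)
    ):
        aa = CODON_TABLE[b1 + b2 + b3]
        freqs[aa] = freqs.get(aa, 0.0) + (p1 / 100) * (p2 / 100) * (p3 / 100)
    return freqs


# Mixtures as listed in the oligonucleotide sequences.
N1 = (18, 15, 25, 42)
N2 = (42, 30, 0, 28)
N3 = (0, 0, 20, 80)

freqs = amino_acid_frequencies(N1, N2, N3)
for aa, p in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.3f}")
```

Under the assumed base order, the second codon position contains no G, so the degenerate codon itself cannot encode cysteine. That would be consistent with cysteine being supplied only by the fixed TGC codons in these designs.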


Chapter 5: Concluding Remarks

Molecules that bind specifically to other targets have many uses in biotechnology and medicine. Small non-immunoglobulin protein scaffolds have become more popular in recent years due to their less complex structure, smaller size, and ease of production. Small protein scaffolds are especially useful in molecular imaging, where the large size and secondary functions of the typically used antibody lead to slow clearance and high background.

This thesis describes the discovery and evaluation of a novel small protein scaffold, Gp2. The work starts with a systematic evaluation of known protein structures to identify a potential scaffold. From there, we demonstrated proof of concept by evolving Gp2 to bind model protein targets. We then used the scaffold to generate a Gp2 variant that binds the cancer biomarker EGFR and verified this variant as an effective PET imaging agent in vivo. We concluded the study with steps toward optimizing the Gp2 paratope and framework to allow more efficient evolution and development of future Gp2 variants.

In chapter 2, we described a method to discover potential protein scaffolds from the Protein Data Bank, an online repository of protein structures. Of the protein structures scored on size, secondary structure, paratope surface area, and mutational tolerance, Gp2 from T7 phage emerged as the top candidate. A combinatorial library of Gp2 variants was constructed, sorted, and evolved to bind four targets: lysozyme, rabbit IgG, goat IgG, and EGFR. The lead Gp2 proteins in each campaign bound strongly and specifically and retained high thermal stability. These proteins have potential use in biotechnology and, in the case of Gp2αEGFR2.2.3 (Gp2-EGFR), as a clinical imaging agent for the stratification of cancer patients.

In chapter 3, the performance of Gp2-EGFR as an imaging agent was thoroughly investigated. Gp2-EGFR and a non-binding control Gp2 (Gp2-nb) were site-specifically labeled with the DOTA chelator. DOTA-Gp2-EGFR retained strong binding to both human EGFR and murine EGFR. Murine EGFR affinity is an important trait that allows preclinical imaging to better match the clinical situation, where off-target tissues will express low levels of human EGFR. In vitro biological activity assays revealed that Gp2-EGFR neither agonized EGFR signaling nor antagonized the EGF-EGFR interaction. This biological passiveness is a beneficial quality in an imaging agent because it reduces unwanted side effects. Radioactively labeled 64Cu-Gp2-EGFR tracer effectively localized to EGFRhigh tumors at 45 minutes (3.2 ± 0.5 %ID/g). High specificity was observed, with significantly lower signal in EGFRlow tumors (0.9 ± 0.3 %ID/g, p < 0.001) and high tumor-to-background ratios (11 ± 6 tumor:muscle, p < 0.001) in EGFRhigh tumors. Non-targeted Gp2 tracer had low uptake in EGFRhigh tumors (0.5 ± 0.3 %ID/g, p < 0.001). Similar values were seen in PET image quantification 2 h post injection. Ex vivo tissue analysis at 2 h correlated with PET imaging results. Additionally, 25-minute dynamic early time point PET studies revealed the clearance half-time to be 3.2 ± 1.0 min. One of the main drawbacks of Gp2-EGFR is that renal retention was high for both the targeted (244 ± 66 %ID/g) and non-targeted (208 ± 19 %ID/g) probes, as is the case with most other small protein scaffolds. Overall, Gp2-EGFR performed favorably when compared to two other EGFR-binding small scaffolds, Fn3 and affibody, showing promise for translating Gp2 to clinical imaging.

Despite the successes of Gp2 as a protein scaffold, low numbers of binding sequences per campaign and low solubility of some mutants could limit its usefulness in certain cases. In chapter 4, we attempted to optimize the Gp2 paratope to increase the number of functional clones in the naïve library and to increase the hydrophilicity of the framework to improve solubility. High-throughput sequencing of previous Gp2 binders, computed stability, sequences of natural Gp2 homologs, and solvent accessible surface area of residue side chains were used to design multiple libraries to investigate the effect of sitewise amino acid diversity on Gp2 binder evolution. Libraries were designed to study the effect of cysteines and probable disulfides, constraint of certain sites to moderate diversity, and expansion of the paratope.

In loop one of Gp2, cysteines were beneficial to finding binding sequences. In loop two the opposite was seen: completely removing cysteines (and therefore arginine and tryptophan, due to the genetic code) was beneficial. The constrained libraries and the expanded paratope both performed worse than their relevant controls. Taken together, these results suggest that certain constraints (such as cysteines in loop one) are beneficial for the Gp2 library, but more sequence and structural information is needed to design better-optimized constraints.
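
The coupling between cysteine, arginine, and tryptophan noted above follows from the standard genetic code. One codon-level change that would produce it, offered here only as an assumption since the exact codon design is not restated in this chapter, is excluding guanine from the second codon position. The short Python sketch below lists the amino acids that lose every codon under such an exclusion. The helper name amino_acids_lost is hypothetical.

```python
# Standard genetic code, written in the conventional TCAG codon order.
_TCAG = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_TCAG)
    for j, b in enumerate(_TCAG)
    for k, c in enumerate(_TCAG)
}


def amino_acids_lost(excluded_base, position):
    """Amino acids left with no codon when `excluded_base` is banned
    at codon `position` (0, 1, or 2). Stop codons are ignored."""
    all_amino_acids = set(CODON_TABLE.values()) - {"*"}
    still_encodable = {
        aa
        for codon, aa in CODON_TABLE.items()
        if aa != "*" and codon[position] != excluded_base
    }
    return sorted(all_amino_acids - still_encodable)


# ASSUMPTION: cysteine is removed by excluding G at the second codon position.
lost = amino_acids_lost("G", 1)
print(lost)
```

Excluding G at the second position leaves no codon for cysteine, glycine, arginine, or tryptophan, while serine remains available through its TCN codons.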

The Gp2 scaffold is a useful tool for biotechnology and medicine as it currently exists. The choice between antibodies, protein scaffolds, and peptides for a specific binding need should be based on benefits and drawbacks due to their size, evolvability, and binding strength and specificity. Thus, Gp2 should continue to be used to develop binders to targets of interest when small protein size, high thermal stability, and high specificity are required.

Yet, new scaffolds should continue to be developed and compared to existing scaffold performance on a variety of targets. As protein structure knowledge advances experimentally and through computational predictions, patterns of which scaffold topologies are easiest to develop towards specific target topologies should arise.

As the current Gp2 libraries are used to discover new binders, they should be sequenced and aggregated with other known Gp2 sequences. With increased sequence data, new generations of Gp2 libraries can be built with varying diversity or constraint. Better libraries will allow more sequences to be discovered for each target, aiding both practical applications and collection of sequence data for even further optimized libraries.

The evolved Gp2 variants that bind the clinically relevant EGFR and PD-L1 receptors need to undergo more thorough immunogenicity and toxicity tests before translation to the clinic. The Gp2 clones described in this thesis should be compared to other preclinical EGFR and PD-L1 imaging probes. Bringing forward only the best probes for each target will ultimately benefit patients and save time and money during clinical trials.

The field of protein engineering provides the knowledge and tools to develop proteins for many applications, from purification to catalysis to medicine and much more. As knowledge advances in these other areas, new problems will continually arise that protein engineering can be used to solve. For example, as cancers are further characterized, new targets will emerge that allow patient stratification through diagnostic imaging for increased therapeutic efficacy. The advancements outlined in this thesis make important contributions towards improving both protein engineering tools and methodologies as well as diagnostic imaging tools.


Bibliography

1. Siegel RL, Miller KD, Jemal A (2017) Cancer statistics, 2017. CA Cancer J Clin 67:7–30. doi: 10.3322/caac.21387 2. Allison KH, Sledge GW (2014) Heterogeneity and Cancer. Oncology 3. Phelps MA, Sparreboom A (2014) A Snapshot of Challenges and Solutions in Cancer Drug Development and Therapy. Clin Pharmacol Ther 95:341–346. doi: 10.1038/clpt.2014.15 4. Lordick F, Hacker U (2014) Chemotherapy and Targeted Therapy. In: Imaging Complicat. Toxic. Follow. Tumor Ther. Springer, Cham, pp 3–15 5. Yarden Y (2001) The EGFR family and its ligands in human cancer: signalling mechanisms and therapeutic opportunities. Eur J Cancer 37:3–8. doi: 10.1016/S0959-8049(01)00230-1 6. Yewale C, Baradia D, Vhora I, et al (2013) Epidermal growth factor receptor targeting in cancer: A review of trends and strategies. Biomaterials 34:8690–8707. doi: 10.1016/j.biomaterials.2013.07.100 7. Scaltriti M, Baselga J (2006) The Epidermal Growth Factor Receptor Pathway: A Model for Targeted Therapy. Clin Cancer Res. doi: 10.1158/1078-0432.CCR-06- 1554 8. Pao W, Miller V, Zakowski M, et al (2004) EGF receptor gene mutations are common in lung cancers from never smokers and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc Natl Acad Sci 101:13306–11. doi: 10.1073/pnas.0405220101 9. Stéphane Garcia B, Dalès J-P, Charafe-Jauffret E, et al (2007) Poor prognosis in breast carcinomas correlates with increased expression of targetable CD146 and c- Met and with proteomic basal-like phenotype. Hum Pathol. doi: 10.1016/j.humpath.2006.11.015 10. Lee HE, Kim MA, Lee HS, et al (2012) MET in gastric carcinomas: comparison between protein expression and gene copy number and impact on clinical outcome. Br J Cancer 107:325–333. doi: 10.1038/bjc.2012.237 11. Cepero V, Sierra JR, Corso S, et al (2010) MET and KRAS gene amplification mediates acquired resistance to MET tyrosine kinase inhibitors. Cancer Res 70:7580–90. doi: 10.1158/0008-5472.CAN-10-0436 12. 
Spigel DR, Ervin TJ, Ramlau R a, et al (2013) Randomized Phase II Trial of Onartuzumab in Combination With Erlotinib in Patients With Advanced Non- Small-Cell Lung Cancer. J Clin Oncol 31:4105–14. doi: 10.1200/JCO.2012.47.4189 13. Kim B, Wang S, Lee JM, et al (2015) Synthetic lethal screening reveals FGFR as one of the combinatorial targets to overcome resistance to Met-targeted therapy. Oncogene 34:1083–1093. doi: 10.1038/onc.2014.51 14. Zou W, Chen L (2008) Inhibitory B7-family molecules in the tumour microenvironment. Nat Rev Immunol 8:467–477. doi: 10.1038/nri2326 15. Dong H, Strome SE, Salomao DR, et al (2002) Tumor-associated B7-H1 promotes T-cell : A potential mechanism of immune evasion. Nat Med 8:nm730. doi: 10.1038/nm730 134

16. FDA (2016) Approved Drugs - Atezolizumab (TECENTRIQ). https://www.fda.gov/Drugs/InformationOnDrugs/ApprovedDrugs/ucm525780.htm . Accessed 21 Nov 2017 17. FDA (2017) Approved Drugs - Avelumab (BAVENCIO). https://www.fda.gov/Drugs/InformationOnDrugs/ApprovedDrugs/ucm547965.htm . Accessed 21 Nov 2017 18. FDA (2017) Approved Drugs - Durvalumab (Imfinzi). https://www.fda.gov/Drugs/InformationOnDrugs/ApprovedDrugs/ucm555930.htm . Accessed 21 Nov 2017 19. Patel SP, Kurzrock R (2015) PD-L1 Expression as a Predictive Biomarker in Cancer Immunotherapy. Mol Cancer Ther 14:847–56. doi: 10.1158/1535-7163.MCT-14- 0983 20. Israel O, Kuten A (2007) Early detection of cancer recurrence: 18F-FDG PET/CT can make a difference in diagnosis and patient care. J Nucl Med 48 Suppl 1:28S– 35S. 21. Kaneko OF, Willmann JK (2012) Ultrasound for molecular imaging and therapy in cancer. Quant Imaging Med Surg 2:87–97. doi: 10.3978/j.issn.2223- 4292.2012.06.06 22. Wilson KE, Bachawal S V, Abou-Elkacem L, et al (2017) Spectroscopic Photoacoustic Molecular Imaging of Breast Cancer using a B7-H3-targeted ICG Contrast Agent. Theranostics 7:1463–1476. doi: 10.7150/thno.18217 23. Hackel BJ (2014) Alternative Protein Scaffolds for Molecular Imaging and Therapy. Eng Transl Med. doi: 10.1007/978-1-4471-4372-7 24. Lobo ED, Hansen RJ, Balthasar JP (2004) Antibody pharmacokinetics and pharmacodynamics. J Pharm Sci 93:2645–2668. doi: 10.1002/jps.20178 25. Natarajan A, Hackel BJ, Gambhir SS (2013) A novel engineered anti-CD20 tracer enables early time PET imaging in a humanized transgenic mouse model of B-cell non-Hodgkins lymphoma. Clin cancer Res 19:6820–9. doi: 10.1158/1078- 0432.CCR-13-0626 26. Orlova A, Wållberg H, Stone-Elander S, Tolmachev V (2009) On the selection of a tracer for PET imaging of HER2-expressing tumors: direct comparison of a 124I- labeled and trastuzumab in a murine xenograft model. J Nucl Med 50:417–25. doi: 10.2967/jnumed.108.057919 27. 
Zahnd C, Kawe M, Stumpp MT, et al (2010) Efficient tumor targeting with high- affinity designed ankyrin repeat proteins: effects of affinity and molecular size. Cancer Res 70:1595–605. doi: 10.1158/0008-5472.CAN-09-2724 28. Chauhan VP, Stylianopoulos T, Boucher Y, Jain RK (2011) Delivery of Molecular and Nanoscale Medicine to Tumors: Transport Barriers and Strategies. Annu Rev Chem Biomol Eng 2:281–298. doi: 10.1146/annurev-chembioeng-061010-114300 29. Stern LA, Case BBA, Hackel BJB (2013) Alternative non-antibody protein scaffolds for molecular imaging of cancer. Curr Opin Chem Eng 2:425–32. 30. Chen J, Sawyer N, Regan L (2013) Protein-protein interactions: General trends in the relationship between binding affinity and interfacial buried surface area. Protein Sci 22:510–515. doi: 10.1002/pro.2230

135

31. Roodveldt C, Aharoni A, Tawfik DS (2005) Directed evolution of proteins for heterologous expression and stability. Curr Opin Struct Biol 15:50–56. doi: 10.1016/j.sbi.2005.01.001 32. Škrlec K, Štrukelj B, Berlec A (2015) Non-immunoglobulin scaffolds: a focus on their targets. Trends Biotechnol 33:1–11. doi: 10.1016/J.TIBTECH.2015.03.012 33. Silverman J, Liu Q, Bakker A, et al (2005) Multivalent proteins evolved by exon shuffling of a family of human receptor domains. Nat Biotechnol 23:1556– 1561. doi: 10.1038/nbt1166 34. Banner DW, Gsell B, Benz J, et al (2013) Mapping the conformational space accessible to BACE2 using surface mutants and cocrystals with Fab fragments, Fynomers and Xaperones. Acta Crystallogr Sect D Biol Crystallogr 69:1124–1137. doi: 10.1107/S0907444913006574 35. Bertschinger J, Grabulovski D, Neri D (2007) Selection of single domain binding proteins by covalent DNA display. Protein Eng Des Sel 20:57–68. doi: 10.1093/protein/gzl055 36. Grabulovski D, Kaspar M, Neri D (2007) A novel, non-immunogenic Fyn SH3- derived binding protein with tumor vascular targeting properties. J Biol Chem 282:3196–204. doi: 10.1074/jbc.M609211200 37. Schlatter D, Brack S, Banner DW, et al (2012) Generation, characterization and structural data of chymase binding proteins based on the human Fyn kinase SH3 domain. MAbs 4:497–508. doi: 10.4161/mabs.20452 38. Silacci M, Baenziger-Tobler N, Lembke W, et al (2014) Linker length matters, Fynomer-Fc fusion with an optimized linker displaying picomolar IL-17A inhibition potency. J Biol Chem 289:14392–14398. doi: 10.1074/jbc.M113.534578 39. Béhar G, Bellinzoni M, Maillasson M, et al (2013) Tolerance of the archaeal Sac7d scaffold protein to alternative library designs: Characterization of anti- immunoglobulin G Affitins. Protein Eng Des Sel 26:267–275. doi: 10.1093/protein/gzs106 40. Correa A, Pacheco S, Mechaly AE, et al (2014) Potent and specific inhibition of glycosidases by small artificial binding proteins (Affitins). 
PLoS One. doi: 10.1371/journal.pone.0097438 41. Mouratou B, Schaeffer F, Guilvout I, et al (2007) Remodeling a DNA-binding protein as a specific in vivo inhibitor of bacterial secretin PulD. Proc Natl Acad Sci 104:17983–17988. doi: 10.1073/pnas.0702963104 42. Angelini A, Cendron L, Chen S, et al (2012) Bicyclic peptide inhibitor reveals large contact interface with a protease target. ACS Chem Biol 7:817–821. doi: 10.1021/cb200478t 43. Baeriswyl V, Rapley H, Pollaro L, et al (2012) Bicyclic peptides with optimized ring size inhibit human plasma kallikrein and its orthologues while sparing paralogous proteases. ChemMedChem 7:1173–1176. doi: 10.1002/cmdc.201200071 44. Chen S, Rentero Rebollo I, Buth S a., et al (2013) Bicyclic peptide ligands pulled out of cysteine-rich peptide libraries. J Am Chem Soc 135:6562–6569. doi: 10.1021/ja400461h

136

45. Heinis C, Rutherford T, Freund S, Winter G (2009) Phage-encoded combinatorial chemical libraries based on bicyclic peptides. Nat Chem Biol 5:502–507. doi: 10.1038/nchembio.184 46. Nord K, Gunneriusson E, Ringdahl J, et al (1997) Binding proteins selected from combinatorial libraries of an α-helical bacterial receptor domain. Nat Biotechnol 15:772–777. doi: 10.1038/nbt0897-772 47. Woldring DR, Holec P V., Stern LA, et al (2017) A Gradient of Sitewise Diversity Promotes Evolutionary Fitness for Binder Discovery in a Three-Helix Bundle Protein Scaffold. Biochemistry 56:1656–1671. doi: 10.1021/acs.biochem.6b01142 48. Chen R, Greer A, Dean AM (1995) A highly active decarboxylating dehydrogenase with rationally inverted coenzyme specificity. Proc Natl Acad Sci 92:11666–70. doi: 10.1073/PNAS.92.25.11666 49. Steipe B, Schiller B, Pluckthun A, et al (1994) Sequence statistics reliably predict stabilizing mutations in a . J Mol … 240:188–92. doi: 10.1006/JMBI.1994.1434 50. Hackel BJ, Wittrup KD (2010) The full amino acid repertoire is superior to serine/tyrosine for selection of high affinity immunoglobulin G binders from the fibronectin scaffold. Protein Eng Des Sel 23:211–219. doi: 10.1093/protein/gzp083 51. Woldring DR, Holec P V., Zhou H, Hackel BJ (2015) High-Throughput Ligand Discovery Reveals a Sitewise Gradient of Diversity in Broadly Evolved Hydrophilic Fibronectin Domains. PLoS One 10:e0138956. doi: 10.1371/journal.pone.0138956 52. Packer MS, Liu DR (2015) Methods for the directed evolution of proteins. Nat Rev Genet 16:379–394. doi: 10.1038/nrg3927 53. Clackson T, Hoogenboom HR, Griffiths AD, Winter G (1991) Making antibody fragments using phage display libraries. 352:624–628. doi: 10.1038/352624a0 54. Boder EET, Wittrup KDK (1997) Yeast surface display for screening combinatorial polypeptide libraries. Nat Biotechnol 15:553–7. doi: 10.1038/nbt0697-553 55. 
Ackerman M, Levary D, Tobon G, et al (2009) Highly avid magnetic bead capture: An efficient selection method for de novo protein engineering utilizing yeast surface display. Biotechnol Prog 25:774–83. doi: 10.1021/bp.174 56. VanAntwerp JJJ, Wittrup KDD (2000) Fine affinity discrimination by yeast surface display and flow cytometry. Biotechnol Prog 16:31–7. doi: 10.1021/bp990133s 57. Wilson DS, Keefe AD, Szostak JW (2001) The use of mRNA display to select high- affinity protein-binding peptides. Proc Natl Acad Sci 98:3750–5. doi: 10.1073/pnas.061028198 58. Hanes J, Plückthun A (1997) In vitro selection and evolution of functional proteins by using ribosome display. Proc Natl Acad Sci 94:4937–42. 59. Romero PA, Arnold FH (2009) Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10:866–876. doi: 10.1038/nrm2805 60. Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 19:596–604. doi: 10.1016/j.sbi.2009.08.003 61. Pace CN (1975) The stability of globular proteins. CRC Crit Rev Biochem 3:1–43. 62. Taverna DM, Goldstein RA (2002) Why are proteins marginally stable? Proteins 46:105–9.

137

63. Bloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability. Proc Natl Acad Sci 103:5869–74. doi: 10.1073/pnas.0510098103 64. Birtalan S, Zhang Y, Fellouse FA, et al (2008) The Intrinsic Contributions of Tyrosine, Serine, Glycine and Arginine to the Affinity and Specificity of Antibodies. doi: 10.1016/j.jmb.2008.01.093 65. Banta S, Dooley K, Shur O (2013) Replacing antibodies: engineering new binding proteins. Annu Rev Biomed Eng 15:93–113. doi: 10.1146/annurev-bioeng-071812- 152412 66. Yuan F, Dellian M, Fukumura D, et al (1995) Vascular permeability in a human tumor xenograft: molecular size dependence and cutoff size. Cancer Res 55:3752– 3756. 67. Schmidt MM, Wittrup KD (2009) A modeling analysis of the effects of molecular size and binding affinity on tumor targeting. Mol Cancer Ther 8:2861–71. doi: 10.1158/1535-7163.MCT-09-0195 68. Thurber GM, Schmidt MM, Wittrup KD (2008) Factors determining antibody distribution in tumors. Trends Pharmacol Sci 29:57–61. doi: 10.1016/j.tips.2007.11.004 69. Thurber G, Schmidt M, Wittrup K (2008) Antibody tumor penetration: transport opposed by systemic and antigen-mediated clearance. Adv Drug Deliv Rev 60:1421. doi: 10.1016/j.addr.2008.04.012.Antibody 70. Wu AM, Senter PD (2005) Arming antibodies: prospects and challenges for immunoconjugates. Nat Biotechnol 23:1137–46. doi: 10.1038/nbt1141 71. Fleetwood F, Klint S, Hanze M, et al (2014) Simultaneous targeting of two ligand- binding sites on VEGFR2 using biparatopic Affibody molecules results in dramatically improved affinity. Sci Rep 4:7518. doi: 10.1038/srep07518 72. Engh RA, Bossemeyer D (2002) Structural aspects of protein kinase control-role of conformational flexibility. Pharmacol Ther 93:99–111. doi: 10.1016/S0163- 7258(02)00180-8 73. Sidhu SS (2012) Antibodies for all: The case for genome-wide affinity reagents. FEBS Lett 586:2778–2779. doi: 10.1016/j.febslet.2012.05.044 74. 
Rosenberg AS (2006) Effects of protein aggregates: an immunologic perspective. AAPS J 8:E501–E507. doi: 10.1208/aapsj080359 75. Hermeling S, Crommelin DJA, Schellekens H, Jiskoot W (2004) Structure- immunogenicity relationships of therapeutic proteins. Pharm Res 21:897–903. doi: 10.1023/B:PHAM.0000029275.41323.a6 76. Koide A, Bailey CW, Huang X, Koide S (1998) The fibronectin type III domain as a scaffold for novel binding proteins. J Mol Biol 284:1141–51. doi: 10.1006/jmbi.1998.2238 77. Lipovsek D (2011) Adnectins: engineered target-binding protein therapeutics. Protein Eng Des Sel 24:3–9. doi: 10.1093/protein/gzq097 78. Revets H, Baetselier P De, Muyldermans S (2005) Nanobodies as novel agents for cancer therapy. Expert Opin Biol Ther 5:111–124. 79. Tamaskovic R, Simon M, Stefan N, et al (2012) Designed ankyrin repeat proteins (DARPins) from research to therapy. In: Methods Enzymol. pp 101–34

138

80. Gebauer M, Skerra A (2012) Anticalins: Small Engineered Binding Proteins Based on the Lipocalin Scaffold. In: Methods Enzymol. pp 157–88 81. Moore SJ, Leung CL, Cochran JR (2012) Knottins: Disulfide-bonded therapeutic and diagnostic peptides. Drug Discov Today Technol. doi: 10.1016/j.ddtec.2011.07.003 82. Ackerman SE, Currier N V., Bergen JM, Cochran. JR (2014) Cystine-knot peptides: emerging tools for cancer imaging and therapy. Expert Rev Proteomics 11:561–572. 83. Getz JA, Rice JJ, Daugherty PS (2011) Protease-resistant peptide ligands from a knottin scaffold library. ACS Chem Biol 6:837–844. doi: 10.1021/cb200039s 84. Castel G, Chtéoui M, Heyd B, Tordo N (2011) Phage display of combinatorial peptide libraries: Application to antiviral research. Molecules 16:3499–3518. doi: 10.3390/molecules16053499 85. Gera N, Hussain M, Wright RC, Rao BM (2011) Highly stable binding proteins derived from the hyperthermophilic Sso7d scaffold. J Mol Biol 409:601–16. doi: 10.1016/j.jmb.2011.04.020 86. Löfblom J, Feldwisch J, Tolmachev V, et al (2010) Affibody molecules: engineered proteins for therapeutic, diagnostic and biotechnological applications. FEBS Lett 584:2670–80. doi: 10.1016/j.febslet.2010.04.014 87. Corcoran E, Hanson R (2013) Imaging EGFR and HER2 by PET and SPECT: A Review. Med Res Rev 2:1–48. doi: 10.1002/med 88. Hynes NE, MacDonald G (2009) ErbB receptors and signaling pathways in cancer. Curr Opin Cell Biol 21:177–84. doi: 10.1016/j.ceb.2008.12.010 89. Scartozzi M, Bearzi I, Mandolesi A, et al (2009) Epidermal Growth Factor Receptor (EGFR) gene copy number (GCN) correlates with clinical activity of irinotecan- cetuximab in K-RAS wild-type colorectal cancer: a fluorescence in situ (FISH) and chromogenic in situ hybridization (CISH) analysis. BMC Cancer 9:303. doi: 10.1186/1471-2407-9-303 90. 
Laurent-Puig P, Cayre A, Manceau G, et al (2009) Analysis of PTEN, BRAF, and EGFR status in determining benefit from cetuximab therapy in wild-type KRAS metastatic colon cancer. J Clin Oncol 27:5924–30. doi: 10.1200/JCO.2008.21.6796 91. Moroni M, Veronese S, Benvenuti S, et al (2005) Gene copy number for epidermal growth factor receptor (EGFR) and clinical response to antiEGFR treatment in colorectal cancer: a cohort study. Lancet Oncol 6:279–86. doi: 10.1016/S1470- 2045(05)70102-9 92. Fraczkiewicz R, Braun W (1998) Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. J. Comput. Chem. 19: 93. Yin S, Ding F, Dokholyan N (2007) Eris: an automated estimator of protein stability. Nat Methods 4:466–467. 94. Hackel BJ, Ackerman ME, Howland SW, Wittrup KD (2010) Stability and CDR composition biases enrich binder functionality landscapes. J Mol Biol 401:84–96. doi: 10.1016/j.jmb.2010.06.004 95. Chao G, Lau WL, Hackel BJ, et al (2006) Isolating and engineering human antibodies using yeast surface display. Nat Protoc 1:755–68. doi:

139

10.1038/nprot.2006.94 96. Hackel BJ, Kapila A, Wittrup KD (2008) Picomolar affinity fibronectin domains engineered utilizing loop length diversity, recursive mutagenesis, and loop shuffling. J Mol Biol 381:1238–52. doi: 10.1016/j.jmb.2008.06.051 97. Lipovsek D, Lippow SM, Hackel BJ, et al (2007) Evolution of an interloop disulfide bond in high-affinity antibody mimics based on fibronectin type III domain and selected by yeast surface display: molecular convergence with single-domain camelid and shark antibodies. J Mol Biol 368:1024–41. doi: 10.1016/j.jmb.2007.02.029 98. Cámara B, Liu M, Reynolds J, et al (2010) T7 phage protein Gp2 inhibits the RNA polymerase by antagonizing stable DNA strand separation near the transcription start site. Proc Natl Acad Sci 107:2247–52. doi: 10.1073/pnas.0907908107 99. Diamond R (1974) Real-space refinement of the structure of hen egg-white lysozyme. J Mol Biol 82:371–391. 100. Harris LJ, Skaletsky E, McPherson A (1998) Crystallographic structure of an intact IgG1 . J Mol Biol 275:861–72. doi: 10.1006/jmbi.1997.1508 101. Gai SA, Wittrup KD (2007) Yeast surface display for protein engineering and characterization. Curr Opin Struct Biol 17:467–73. doi: 10.1016/j.sbi.2007.08.012 102. Nechaev S, Severinov K (1999) Inhibition of Escherichia coli RNA polymerase by bacteriophage T7 gene 2 protein. J Mol Biol 289:815–26. doi: 10.1006/jmbi.1999.2782 103. Ederth J, Artsimovitch I, Isaksson L a, Landick R (2002) The downstream DNA jaw of bacterial RNA polymerase facilitates both transcriptional initiation and pausing. J Biol Chem 277:37456–63. doi: 10.1074/jbc.M207038200 104. Hackel BJB, Kimura RRH, Gambhir SS (2012) Use of 64Cu-labeled fibronectin domain with EGFR-overexpressing tumor xenograft: molecular imaging. Radiology 263:179–188. doi: 10.1148/radiol.12111504/-/DC1 105. Spangler JB, Neil JR, Abramovitch S, et al (2010) Combination antibody treatment down-regulates epidermal growth factor receptor by inhibiting endosomal recycling. 
Proc Natl Acad Sci 107:13252–7. doi: 10.1073/pnas.0913476107 106. Reilly RM, Kiarash R, Sandhu J, et al (2000) A comparison of EGF and MAb 528 labeled with 111In for imaging human breast cancer. J Nucl Med 41:903–911. 107. Malmberg J, Tolmachev V, Orlova A (2011) Imaging agents for in vivo molecular profiling of disseminated prostate cancer--targeting EGFR receptors in prostate cancer: comparison of cellular processing of [111In]-labeled affibody molecule Z(EGFR:2377) and cetuximab. Int J Oncol 38:1137–43. doi: 10.3892/ijo.2011.915 108. Pavoor T V, Cho YK, Shusta E V (2009) Development of GFP-based biosensors possessing the binding properties of antibodies. Proc Natl Acad Sci 106:11895– 11900. 109. Binz HK, Amstutz P, Plückthun A (2005) Engineering novel binding proteins from nonimmunoglobulin domains. Nat Biotechnol 23:1257–68. doi: 10.1038/nbt1127 110. Roldan JLO, Blackledge M, van Nuland NAJ, Azuaga AI (2011) Solution structure, dynamics and thermodynamics of the three SH3 domains of CD2AP. J Biomol NMR

140

50:103–17. doi: 10.1007/s10858-011-9505-5 111. Zhang Y, Skolnick J (2005) TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33:2302–2309. doi: 10.1093/nar/gki524 112. Panni S, Dente L, Cesareni G (2002) In vitro evolution of recognition specificity mediated by SH3 domains reveals target recognition rules. J Biol Chem 277:21666– 74. doi: 10.1074/jbc.M109788200 113. Baum RP, Prasad V, Müller D, et al (2010) Molecular imaging of HER2-expressing malignant tumors in breast cancer patients using synthetic 111In- or 68Ga-labeled affibody molecules. J Nucl Med 51:892–897. doi: 10.2967/jnumed.109.073239 114. Silverman AP, Levin AM, Lahti JL, Cochran JR (2009) Engineered Cystine-Knot Peptides that Bind ??v??3 Integrin with Antibody-Like Affinities. J Mol Biol 385:1064–1075. doi: 10.1016/j.jmb.2008.11.004 115. Souriau C, Chiche L, Irving R, Hudson P (2005) New binding specificities derived from Min-23, a small cystine-stabilized peptidic scaffold. Biochemistry 44:7143– 7155. doi: 10.1021/bi0481592 116. Stricher F, Huang CC, Descours A, et al (2008) Combinatorial Optimization of a CD4-Mimetic Miniprotein and Cocrystal Structures with HIV-1 gp120 Envelope Glycoprotein. J Mol Biol 382:510–524. doi: 10.1016/j.jmb.2008.06.069 117. Zoller F, Markert A, Barthe P, et al (2012) Combination of phage display and molecular grafting generates highly specific tumor-targeting miniproteins. Angew Chemie - Int Ed 51:13136–13139. doi: 10.1002/anie.201203857 118. Kimura RH, Jones DS, Jiang L, et al (2011) Functional mutation of multiple solvent- exposed loops in the Ecballium elaterium trypsin inhibitor-II cystine knot miniprotein. PLoS One 6:e16112. doi: 10.1371/journal.pone.0016112 119. Nilvebrant J, Åstrand M, Löfblom J, Hober S (2013) Development and characterization of small bispecific albumin-binding domains with high affinity for ErbB3. Cell Mol Life Sci 70:3973–3985. doi: 10.1007/s00018-013-1370-9 120. 
Nilvebrant J, Alm T, Hober S, Lofblom J (2011) Engineering Bispecificity into a Single Albumin-Binding Domain. Curr Sci. doi: 10.1371/Citation 121. Alm T, Yderland L, Nilvebrant J, et al (2010) A small bispecific protein selected for orthogonal affinity purification. Biotechnol J 5:605–617. doi: 10.1002/biot.201000041 122. Eklund M, Axelsson L, Uhlén M, Nygren PÅ (2002) Anti-idiotypic protein domains selected from -based affibody libraries. Proteins Struct Funct Genet 48:454–462. doi: 10.1002/prot.10169 123. Wikman M, Steffen a. C, Gunneriusson E, et al (2004) Selection and characterization of HER2/neu-binding affibody ligands. Protein Eng Des Sel 17:455–462. doi: 10.1093/protein/gzh053 124. Grimm S, Yu F, Nygren PÅ (2011) Ribosome display selection of a murine IgG1 fab binding affibody molecule allowing species selective recovery of monoclonal antibodies. Mol Biotechnol 48:263–276. doi: 10.1007/s12033-010-9367-1 125. Wikman M, Rowcliffe E, Friedman M, et al (2006) Selection and characterization of an HIV-1 gp120-binding affibody ligand. Biotechnol Appl Biochem 45:93–105. doi: 10.1042/BA20060016

126. Grönwall C, Snelders E, Palm AJ, et al (2008) Generation of Affibody ligands binding interleukin-2 receptor alpha/CD25. Biotechnol Appl Biochem 50:97–112. doi: 10.1042/BA20070261
127. Nord K, Nord O, Uhlén M, et al (2001) Recombinant human factor VIII-specific affinity ligands selected from phage-displayed combinatorial libraries of protein A. Eur J Biochem 268:4269–4277. doi: 10.1046/j.1432-1327.2001.02344.x
128. Rönnmark J, Grönlund H, Uhlén M, Nygren PÅ (2002) Human immunoglobulin A (IgA)-specific ligands from combinatorial engineering of protein A. Eur J Biochem 269:2647–2655. doi: 10.1046/j.1432-1033.2002.02926.x
129. Friedman M, Orlova A, Johansson E, et al (2008) Directed Evolution to Low Nanomolar Affinity of a Tumor-Targeting Epidermal Growth Factor Receptor-Binding Affibody Molecule. J Mol Biol 376:1388–1402. doi: 10.1016/j.jmb.2007.12.060
130. Lindborg M, Cortez E, Höidén-Guthenberg I, et al (2011) Engineered high-affinity affibody molecules targeting platelet-derived growth factor receptor β in vivo. J Mol Biol 407:298–315. doi: 10.1016/j.jmb.2011.01.033
131. Grönwall C, Jonsson A, Lindström S, et al (2007) Selection and characterization of Affibody ligands binding to Alzheimer amyloid β peptides. J Biotechnol 128:162–183. doi: 10.1016/j.jbiotec.2006.09.013
132. Löfdahl P-A, Nygren P-A (2010) Affinity maturation of a TNFalpha-binding affibody molecule by Darwinian survival selection. Biotechnol Appl Biochem 55:111–120. doi: 10.1042/BA20090274
133. Kronqvist N, Malm M, Göstring L, et al (2011) Combining phage and staphylococcal surface display for generation of ErbB3-specific Affibody molecules. Protein Eng Des Sel 24:385–396. doi: 10.1093/protein/gzq118
134. Sandström K, Xu Z, Forsberg G, Nygren P (2003) Inhibition of the CD28-CD80 co-stimulation signal by a CD28-binding affibody ligand developed by combinatorial protein engineering. Protein Eng 16:691–697. doi: 10.1093/protein/gzg086
135. Li J, Lundberg E, Vernet E, et al (2010) Selection of affibody molecules to the ligand-binding site of the insulin-like growth factor-1 receptor. Biotechnol Appl Biochem 55:99–109. doi: 10.1042/BA20090226
136. Olson CA, Liao HI, Sun R, Roberts RW (2008) mRNA display selection of a high-affinity, modification-specific phospho-ikba-binding fibronectin. ACS Chem Biol 3:480–485. doi: 10.1021/cb800069c
137. Huang J, Koide A, Makabe K, Koide S (2008) Design of protein function leaps by directed domain interface evolution. Proc Natl Acad Sci 105:6578–6583. doi: 10.1073/pnas.0801097105
138. Gera N, Hill AB, White DP, et al (2012) Design of pH Sensitive Binding Proteins from the Hyperthermophilic Sso7d Scaffold. PLoS One. doi: 10.1371/journal.pone.0048928
139. Hussain M, Lockney D, Wang R, et al (2013) Avidity-mediated virus separation using a hyperthermophilic affinity ligand. Biotechnol Prog 29:237–246. doi: 10.1002/btpr.1655
140. Desmet J, Verstraete K, Bloch Y, et al (2014) Structural basis of IL-23 antagonism by an Alphabody protein scaffold. Nat Commun 5:5237. doi: 10.1038/ncomms6237
141. Hoffmann A, Kovermann M, Lilie H, et al (2012) New binding mode to TNF-alpha revealed by ubiquitin-based artificial binding protein. PLoS One 7:2–11. doi: 10.1371/journal.pone.0031298
142. Xu L, Aha P, Gu K, et al (2002) Directed evolution of high-affinity antibody mimics using mRNA display. Chem Biol 9:933–42.
143. Koide A, Wojcik J, Gilbreth RN, et al (2012) Teaching an old scaffold new tricks: monobodies constructed using alternative surfaces of the FN3 scaffold. J Mol Biol 415:393–405. doi: 10.1016/j.jmb.2011.12.019
144. Mann JK, Wood JF, Stephan AF, et al (2013) Epitope-guided engineering of monobody binders for in vivo inhibition of Erk-2 signaling. ACS Chem Biol 8:608–616. doi: 10.1021/cb300579e
145. Sullivan MA, Brooks LR, Weidenborner P, et al (2013) Anti-idiotypic monobodies derived from a fibronectin scaffold. Biochemistry 52:1802–1813. doi: 10.1021/bi3016668
146. Wojcik J, Hantschel O, Grebien F, et al (2010) A potent and highly specific FN3 monobody inhibitor of the Abl SH2 domain. Nat Struct Mol Biol 17:519–527. doi: 10.1038/nsmb.1793
147. Richards J, Miller M, Abend J, et al (2003) Engineered fibronectin type III domain with a RGDWXE sequence binds with enhanced affinity and specificity to human αvβ3 integrin. J Mol Biol 326:1475–1488. doi: 10.1016/S0022-2836(03)00082-2
148. Getmanova EV, Chen Y, Bloom L, et al (2006) Antagonists to human and mouse vascular endothelial growth factor receptor 2 generated by directed protein evolution in vitro. Chem Biol 13:549–556. doi: 10.1016/j.chembiol.2005.12.009
149. Gilbreth RN, Truong K, Madu I, et al (2011) Isoform-specific monobody inhibitors of small ubiquitin-related modifiers engineered using structure-guided library design. Proc Natl Acad Sci 108:7751–7756. doi: 10.1073/pnas.1102294108
150. Gilbreth RN, Esaki K, Koide A, et al (2008) A Dominant Conformational Role for Amino Acid Diversity in Minimalist Protein-Protein Interfaces. J Mol Biol 381:407–418. doi: 10.1016/j.jmb.2008.06.014
151. Karatan E, Merguerian M, Han Z (2004) Molecular recognition properties of FN3 monobodies that bind the Src SH3 domain. Chem Biol 11:835–844. doi: 10.1016/j
152. Arbabi Ghahroudi M, Desmyter A, Wyns L, et al (1997) Selection and identification of single domain antibody fragments from camel heavy-chain antibodies. FEBS Lett 414:521–526. doi: 10.1016/S0014-5793(97)01062-4
153. Gottlin EB, Guan X, Pegram C, et al (2009) Isolation of novel EGFR-specific VHH domains. J Biomol Screen Off J Soc Biomol Screen 14:77–85. doi: 10.1177/1087057108327064
154. Harmsen MM, Van Solt CB, Van Zijderveld-Van Bemmel AM, et al (2006) Selection and optimization of proteolytically stable llama single-domain antibody fragments for oral immunotherapy. Appl Microbiol Biotechnol 72:544–551. doi: 10.1007/s00253-005-0300-7
155. Dumoulin M, Conrath K, Van Meirhaeghe A, et al (2002) Single-domain antibody fragments with high conformational stability. Protein Sci 11:500–515. doi: 10.1110/ps.34602
156. Steemson JD, Baake M, Rakonjac J, et al (2014) Tracking molecular recognition at the atomic level with a new protein scaffold based on the OB-fold. PLoS One 9:e86050. doi: 10.1371/journal.pone.0086050
157. Vogt M, Skerra A (2004) Construction of an artificial receptor protein (“anticalin”) based on the human apolipoprotein D. ChemBioChem 5:191–199. doi: 10.1002/cbic.200300703
158. Beste G, Schmidt FS, Stibora T, Skerra A (1999) Small antibody-like proteins with prescribed ligand specificities derived from the lipocalin fold. Proc Natl Acad Sci 96:1898–1903. doi: 10.1073/pnas.96.5.1898
159. Lamla T, Erdmann VA (2003) Searching sequence space for high-affinity binding peptides using ribosome display. J Mol Biol 329:381–388. doi: 10.1016/S0022-2836(03)00432-7
160. Schlehuber S, Beste G, Skerra A (2000) A novel type of receptor protein, based on the lipocalin scaffold, with specificity for digoxigenin. J Mol Biol 297:1105–1120. doi: 10.1006/jmbi.2000.3646
161. Mercader JV, Skerra A (2002) Generation of anticalins with specificity for a nonsymmetric phthalic acid ester. Anal Biochem 308:269–277. doi: 10.1016/S0003-2697(02)00200-2
162. Schilling J, Schöppe J, Plückthun A (2014) From DARPins to LoopDARPins: Novel LoopDARPin design allows the selection of low picomolar binders in a single round of ribosome display. J Mol Biol 426:691–721. doi: 10.1016/j.jmb.2013.10.026
163. Parizek P, Kummer L, Rube P, et al (2012) Designed ankyrin repeat proteins (DARPins) as novel isoform-specific intracellular inhibitors of c-Jun N-terminal kinases. ACS Chem Biol 7:1356–66. doi: 10.1021/cb3001167
164. Fogolari F, Brigo A, Molinari H (2002) The Poisson-Boltzmann equation for biomolecular electrostatics: a tool for structural biology. J Mol Recognit 15:377–92. doi: 10.1002/jmr.577
165. Kummer L, Parizek P, Rube P, et al (2012) Structural and functional analysis of phosphorylation-specific binders of the kinase ERK from designed ankyrin repeat protein libraries. Proc Natl Acad Sci 109:E2248–E2257. doi: 10.1073/pnas.1205399109
166. Theurillat J-P, Dreier B, Nagy-Davidescu G, et al (2010) Designed ankyrin repeat proteins: a novel tool for testing epidermal growth factor receptor 2 expression in breast cancer. Mod Pathol 23:1289–1297. doi: 10.1038/modpathol.2010.103
167. Winkler J, Martin-Killias P, Plückthun A, Zangemeister-Wittke U (2009) EpCAM-targeted delivery of nanocomplexed siRNA to tumor cells with designed ankyrin repeat proteins. Mol Cancer Ther 8:2674–2683. doi: 10.1158/1535-7163.MCT-09-0402
168. Schymkowitz J, Borg J, Stricher F, et al (2005) The FoldX web server: an online force field. Nucleic Acids Res 33:W382-8. doi: 10.1093/nar/gki387
169. Zaccolo M, Williams DM, Brown DM, Gherardi E (1996) An approach to random mutagenesis of DNA using mixtures of triphosphate derivatives of nucleoside analogues. J Mol Biol 255:589–603. doi: 10.1006/jmbi.1996.0049

170. van Dijk EL, Jaszczyszyn Y, Thermes C (2014) Library preparation methods for next-generation sequencing: tone down the bias. Exp Cell Res 322:12–20. doi: 10.1016/j.yexcr.2014.01.008
171. Dabney J, Meyer M (2012) Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. Biotechniques 52:87–94. doi: 10.2144/000113809
172. Masella AP, Bartram AK, Truszkowski JM, et al (2012) PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics 13:31. doi: 10.1186/1471-2105-13-31
173. Hamburg MA, Collins FS (2010) The Path to Personalized Medicine. N Engl J Med 363:301–304. doi: 10.1056/NEJMp1002530
174. Scott AM, Wolchok JD, Old LJ (2012) Antibody therapy of cancer. Nat Rev Cancer 12:278–287.
175. Kircher MF, Hricak H, Larson SM (2012) Molecular imaging for personalized cancer care. Mol Oncol 6:182–195. doi: 10.1016/j.molonc.2012.02.005
176. Shinojima N, Tada K, Shiraishi S, et al (2003) Prognostic value of epidermal growth factor receptor in patients with glioblastoma multiforme. Cancer Res 63:6962–6970.
177. Nieto Y, Nawaz F, Jones RB, et al (2007) Prognostic significance of overexpression and phosphorylation of epidermal growth factor receptor (EGFR) and the presence of truncated EGFRvIII in locoregionally advanced breast cancer. J Clin Oncol 25:4405–4413.
178. Lugli A, Iezzi G, Hostettler I, et al (2010) Prognostic impact of the expression of putative cancer stem cell markers CD133, CD166, CD44s, EpCAM, and ALDH1 in colorectal cancer. Br J Cancer 103:382–390.
179. Galizia G, Lieto E, Orditura M, et al (2007) Epidermal growth factor receptor (EGFR) expression is associated with a worse prognosis in gastric cancer patients undergoing curative surgery. World J Surg 31:1458–1468.
180. Parra HS, Cavina R, Latteri F, et al (2004) Analysis of epidermal growth factor receptor expression as a predictive factor for response to gefitinib (“Iressa”, ZD1839) in non-small-cell lung cancer. Br J Cancer 91:208–212.
181. Schlomm T, Kirstein P, Iwers L, et al (2007) Clinical significance of epidermal growth factor receptor protein overexpression and gene copy number gains in prostate cancer. Clin Cancer Res 13:6579–6584.
182. Huang C-W, Tsai H-L, Chen Y-T, et al (2013) The prognostic values of EGFR expression and KRAS mutation in patients with synchronous or metachronous metastatic colorectal cancer. BMC Cancer 13:599. doi: 10.1186/1471-2407-13-599
183. Rokita M, Stec R, Bodnar L, et al (2013) Overexpression of epidermal growth factor receptor as a prognostic factor in colorectal cancer on the basis of the Allred scoring system. Onco Targets Ther 6:967–76. doi: 10.2147/OTT.S42446
184. Lee HJ, Seo AN, Kim EJ, et al (2015) Prognostic and predictive values of EGFR overexpression and EGFR copy number alteration in HER2-positive breast cancer. Br J Cancer 112:103–11. doi: 10.1038/bjc.2014.556
185. Scartozzi M, Bearzi I, Berardi R, et al (2004) Epidermal growth factor receptor (EGFR) status in primary colorectal tumors does not correlate with EGFR expression in related metastatic sites: implications for treatment with EGFR-targeted monoclonal antibodies. J Clin Oncol 22:4772–8. doi: 10.1200/JCO.2004.00.117
186. Yarom N, Marginean C, Moyana T, et al (2014) EGFR expression variance in paired colorectal cancer primary and metastatic tumors. Cancer Biol Ther 10:416–421. doi: 10.4161/cbt.10.5.12610
187. Bozzetti C, Tiseo M, Lagrasta C, et al (2008) Comparison between epidermal growth factor receptor (EGFR) in primary non-small cell lung cancer (NSCLC) and in fine-needle aspirates from distant metastatic sites. J Thorac Oncol 3:18–22.
188. Linden HM, Stekhova SA, Link JM, et al (2006) Quantitative Fluoroestradiol Positron Emission Tomography Imaging Predicts Response to Endocrine Treatment in Breast Cancer. J Clin Oncol 24:2793–2799. doi: 10.1200/JCO.2005.04.3810
189. Peterson LM, Kurland BF, Schubert EK, et al (2014) A phase 2 study of 16α-[18F]-fluoro-17β-estradiol positron emission tomography (FES-PET) as a marker of hormone sensitivity in metastatic breast cancer (MBC). Mol Imaging Biol 16:431–40. doi: 10.1007/s11307-013-0699-7
190. Cai W, Chen K, He L, et al (2007) Quantitative PET of EGFR expression in xenograft-bearing mice using 64Cu-labeled cetuximab, a chimeric anti-EGFR monoclonal antibody. Eur J Nucl Med Mol Imaging 34:850–8. doi: 10.1007/s00259-006-0361-6
191. Niu G, Li Z, Xie J, et al (2009) PET of EGFR antibody distribution in head and neck squamous cell carcinoma models. J Nucl Med 50:1116–23. doi: 10.2967/jnumed.109.061820
192. Niu G, Sun X, Cao Q, et al (2010) Cetuximab-based immunotherapy and radioimmunotherapy of head and neck squamous cell carcinoma. Clin Cancer Res 16:2095–105. doi: 10.1158/1078-0432.CCR-09-2495
193. Li WP, Meyer LA, Capretto DA, et al (2008) Receptor-binding, biodistribution, and metabolism studies of 64Cu-DOTA-cetuximab, a PET-imaging agent for epidermal growth-factor receptor-positive tumors. Cancer Biother Radiopharm 23:158–171.
194. Menke-Van der Houven van Oordt CW, Gootjes EC, Huisman MC, et al (2015) 89Zr-cetuximab PET imaging in patients with advanced colorectal cancer. Oncotarget 6:30384–30393. doi: 10.18632/oncotarget.4672
195. Bhattacharyya S, Kurdziel K, Wei L, et al (2013) Zirconium-89 labeled panitumumab: A potential immuno-PET probe for HER1-expressing carcinomas. Nucl Med Biol 40:451–457. doi: 10.1016/j.nucmedbio.2013.01.007
196. Hackel BJ, Sathirachinda A, Gambhir SS (2012) Designed hydrophilic and charge mutations of the fibronectin domain: towards tailored protein biodistribution. Protein Eng Des Sel 25:639–47. doi: 10.1093/protein/gzs036
197. Miao Z, Ren G, Liu H, et al (2010) Small-animal PET imaging of human epidermal growth factor receptor positive tumor with a 64Cu labeled affibody protein. Bioconjug Chem 21:947–954. doi: 10.1021/bc900515p

198. Miao Z, Ren G, Liu H, et al (2012) PET of EGFR expression with an 18F-labeled affibody molecule. J Nucl Med 53:1110–8. doi: 10.2967/jnumed.111.100842
199. Nordberg E, Orlova A, Friedman M, et al (2008) In vivo and in vitro uptake of 111In, delivered with the affibody molecule (ZEGFR:955)2, in EGFR expressing tumour cells. Oncol Rep 19:853–857.
200. Tolmachev V, Rosik D, Wållberg H, et al (2010) Imaging of EGFR expression in murine xenografts using site-specifically labelled anti-EGFR 111In-DOTA-ZEGFR:2377 Affibody molecule: aspect of the injected tracer amount. Eur J Nucl Med Mol Imaging 37:613–22. doi: 10.1007/s00259-009-1283-x
201. Su X, Cheng K, Jeon J, et al (2014) Comparison of Two Site-Specifically 18F-Labeled Affibodies for PET Imaging of EGFR Positive Tumors. Mol Pharm 11:3947–3956. doi: 10.1021/mp5003043
202. Huang L, Gainkam LOT, Caveliers V, et al (2008) SPECT imaging with 99mTc-labeled EGFR-specific nanobody for in vivo monitoring of EGFR expression. Mol Imaging Biol 10:167–175.
203. Gainkam LOT, Keyaerts M, Caveliers V, et al (2011) Correlation between epidermal growth factor receptor-specific nanobody uptake and tumor burden: a tool for noninvasive monitoring of tumor response to therapy. Mol Imaging Biol 13:940–948.
204. Gainkam LOT, Huang L, Caveliers V, et al (2008) Comparison of the biodistribution and tumor targeting of two 99mTc-labeled anti-EGFR nanobodies in mice, using pinhole SPECT/micro-CT. J Nucl Med 49:788–95. doi: 10.2967/jnumed.107.048538
205. Chakravarty R, Goel S, Valdovinos HF, et al (2014) Matching the decay half-life with the biological half-life: ImmunoPET imaging with 44Sc-labeled Cetuximab Fab fragment. Bioconjug Chem 25:2197–2204. doi: 10.1021/bc500415x
206. Thurber GM, Wittrup KD (2008) Quantitative spatiotemporal analysis of antibody fragment diffusion and endocytic consumption in tumor spheroids. Cancer Res 68:3334–3341.
207. Memon AA, Jakobsen S, Dagnaes-Hansen F, et al (2009) Positron emission tomography (PET) imaging with [11C]-labeled erlotinib: a micro-PET study on mice with lung tumor xenografts. Cancer Res 69:873–8. doi: 10.1158/0008-5472.CAN-08-3118
208. Zhang MR, Kumata K, Hatori A, et al (2010) [11C]Gefitinib ([11C]Iressa): Radiosynthesis, In Vitro uptake, and In Vivo imaging of intact murine fibrosarcoma. Mol Imaging Biol 12:181–191. doi: 10.1007/s11307-009-0265-5
209. Wang H, Yu J, Yang G, et al (2007) Assessment of 11C-labeled-4-N-(3-bromoanilino)-6,7-dimethoxyquinazoline as a positron emission tomography agent to monitor epidermal growth factor receptor expression. Cancer Sci 98:1413–6. doi: 10.1111/j.1349-7006.2007.00562.x
210. Dai D, Li X-F, Wang J, et al (2016) Predictive efficacy of 11C-PD153035 PET imaging for EGFR-tyrosine kinase inhibitor sensitivity in non-small cell lung cancer patients. Int J Cancer 138:1003–1012. doi: 10.1002/ijc.29832
211. Bahce I, Smit EF, Lubberink M, et al (2013) Development of [11C]erlotinib positron emission tomography for in vivo evaluation of EGF receptor mutational status. Clin Cancer Res 19:183–193. doi: 10.1158/1078-0432.CCR-12-0289
212. Slobbe P, Windhorst AD, Walsum MS van, et al (2014) Development of [18F]afatinib as new TKI-PET tracer for EGFR positive tumors. Nucl Med Biol 41:749–757. doi: 10.1016/j.nucmedbio.2014.06.005
213. Kareem H, Sandström K, Elia R, et al (2010) Blocking EGFR in the liver improves the tumor-to-liver uptake ratio of radiolabeled EGF. Tumour Biol 31:79–87. doi: 10.1007/s13277-009-0011-2
214. Kruziki MA, Bhatnagar S, Woldring DR, et al (2015) A 45-Amino-Acid Scaffold Mined from the PDB for High-Affinity Ligand Engineering. Chem Biol 22:946–956. doi: 10.1016/j.chembiol.2015.06.012
215. Chambers AF (2009) MDA-MB-435 and M14 Cell Lines: Identical but not M14 Melanoma? Cancer Res 69:5292–5293. doi: 10.1158/0008-5472.CAN-09-1528
216. Ahlgren S, Orlova A, Wallberg H, et al (2010) Targeting of HER2-Expressing Tumors Using 111In-ABY-025, a Second-Generation Affibody Molecule with a Fundamentally Reengineered Scaffold. J Nucl Med 51:1131–1138. doi: 10.2967/jnumed.109.073346
217. Ahlgren S, Wallberg H, Tran TA, et al (2009) Targeting of HER2-Expressing Tumors with a Site-Specifically 99mTc-Labeled Recombinant Affibody Molecule, ZHER2:2395, with C-Terminally Engineered Cysteine. J Nucl Med 50:781–789. doi: 10.2967/jnumed.108.056929
218. Kramer-Marek G, Kiesewetter DO, Martiniova L, et al (2008) [18F]FBEM-ZHER2:342-Affibody molecule: a new molecular tracer for in vivo monitoring of HER2 expression by positron emission tomography. Eur J Nucl Med Mol Imaging 35:1008–1018. doi: 10.1007/s00259-007-0658-0
219. Cheng Z, De Jesus OP, Namavari M, et al (2008) Small-Animal PET Imaging of Human Epidermal Growth Factor Receptor Type 2 Expression with Site-Specific 18F-Labeled Protein Scaffold Molecules. J Nucl Med 49:804–813. doi: 10.2967/jnumed.107.047381
220. Tran T, Engfeldt T, Orlova A, et al (2007) Detection of HER2 Expression in Malignant Tumors. 2:1956–1964.
221. Kimura RH, Teed R, Hackel BJ, et al (2012) Pharmacokinetically stabilized cystine knot peptides that bind Alpha-v-Beta-6 integrin with single-digit nanomolar affinities for detection of pancreatic cancer. Clin Cancer Res 18:839–849. doi: 10.1158/1078-0432.CCR-11-1116
222. Cheng Z, De Jesus OP, Kramer DJ, et al (2010) 64Cu-labeled affibody molecules for imaging of HER2 expressing tumors. Mol Imaging Biol 12:316–24. doi: 10.1007/s11307-009-0256-6
223. Ahlgren S, Orlova A, Rosik D, et al (2008) Evaluation of maleimide derivative of DOTA for site-specific labeling of recombinant affibody molecules. Bioconjug Chem 19:235–43. doi: 10.1021/bc700307y
224. Tolmachev V, Friedman M, Sandström M, et al (2009) Affibody molecules for epidermal growth factor receptor targeting in vivo: aspects of dimerization and labeling chemistry. J Nucl Med 50:274–83. doi: 10.2967/jnumed.108.055525

225. Tolmachev V, Nilsson FY, Widström C, et al (2006) 111In-benzyl-DTPA-ZHER2:342, an affibody-based conjugate for in vivo imaging of HER2 expression in malignant tumors. J Nucl Med 47:846–53.
226. Kramer-Marek G, Kiesewetter DO, Capala J (2009) Changes in HER2 expression in breast cancer xenografts after therapy can be quantified using PET and (18)F-labeled affibody molecules. J Nucl Med 50:1131–9. doi: 10.2967/jnumed.108.057695
227. Boswell CA, Sun X, Niu W, et al (2004) Comparative in vivo stability of copper-64-labeled cross-bridged and conventional tetraazamacrocyclic complexes. J Med Chem 47:1465–1474.
228. Ait-Mohand S, Fournier P, Dumulon-Perreault V, et al (2011) Evaluation of 64Cu-labeled bifunctional Chelate-Bombesin conjugates. Bioconjug Chem 22:1729–1735. doi: 10.1021/bc2002665
229. Chan JY, Hackel BJ, Yee D (2017) Targeting Insulin Receptor in Breast Cancer Using Small Engineered Protein Scaffolds. Mol Cancer Ther 16:1324–1334. doi: 10.1158/1535-7163.MCT-16-0685
230. Fellouse FA, Wiesmann C, Sidhu SS (2004) Synthetic antibodies from a four-amino-acid code: a dominant role for tyrosine in antigen recognition. Proc Natl Acad Sci 101:12467–72. doi: 10.1073/pnas.0401786101
231. Smith CA, Kortemme T (2011) Predicting the Tolerated Sequences for Proteins and Protein Interfaces Using RosettaBackrub Flexible Backbone Design. PLoS One 6:e20451. doi: 10.1371/journal.pone.0020451
232. Sörensen J, Sandberg D, Sandström M, et al (2014) First-in-human molecular imaging of HER2 expression in breast cancer metastases using the 111In-ABY-025 affibody molecule. J Nucl Med 55:730–5. doi: 10.2967/jnumed.113.131243
233. Thurber GM, Zajic SC, Wittrup KD (2007) Theoretic criteria for antibody penetration into solid tumors and micrometastases. J Nucl Med 48:995–9. doi: 10.2967/jnumed.106.037069
234. Wittrup KD, Thurber GM, Schmidt MM, Rhoden JJ (2012) Practical theoretic guidance for the design of tumor-targeting agents. Methods Enzymol 503:255–68. doi: 10.1016/B978-0-12-396962-0.00010-0
235. Woldring DR, Holec PV, Hackel BJ (2016) ScaffoldSeq: Software for characterization of directed evolution populations. Proteins Struct Funct Bioinforma 84:869–874. doi: 10.1002/prot.25040
236. Boder ET, Wittrup KD (2000) Yeast surface display for directed evolution of protein expression, affinity, and stability. Methods Enzymol 328:430–44.
237. Lesniak WG, Chatterjee S, Gabrielson M, et al (2016) PD-L1 Detection in Tumors Using [64Cu]Atezolizumab with PET. Bioconjug Chem 27:2103–2110. doi: 10.1021/acs.bioconjchem.6b00348
238. Traxlmayr MW, Kiefer JD, Srinivas RR, et al (2016) Strong Enrichment of Aromatic Residues in Binding Sites from a Charge-neutralized Hyperthermostable Sso7d Scaffold Library. J Biol Chem 291:22496–22508. doi: 10.1074/jbc.M116.741314
239. Reetz MT, Kahakeaw D, Lohmer R (2008) Addressing the Numbers Problem in Directed Evolution. ChemBioChem 9:1797–1804. doi: 10.1002/cbic.200800298
240. Le DT, Uram JN, Wang H, et al (2015) PD-1 Blockade in Tumors with Mismatch-Repair Deficiency. N Engl J Med 372:2509–2520. doi: 10.1056/NEJMoa1500596
241. Lipson EJ, Vincent JG, Loyo M, et al (2013) PD-L1 expression in the Merkel cell carcinoma microenvironment: association with inflammation, Merkel cell polyomavirus and overall survival. Cancer Immunol Res 1:54–63. doi: 10.1158/2326-6066.CIR-13-0034
242. Garon EB, Rizvi NA, Hui R, et al (2015) Pembrolizumab for the Treatment of Non–Small-Cell Lung Cancer. N Engl J Med 372:2018–2028. doi: 10.1056/NEJMoa1501824
243. Taube JM, Young GD, McMiller TL, et al (2015) Differential Expression of Immune-Regulatory Genes Associated with PD-L1 Display in Melanoma: Implications for PD-1 Pathway Blockade. Clin Cancer Res 21:3969–76. doi: 10.1158/1078-0432.CCR-15-0244
244. Frederick DT, Piris A, Cogdill AP, et al (2013) BRAF inhibition is associated with enhanced melanoma antigen expression and a more favorable tumor microenvironment in patients with metastatic melanoma. Clin Cancer Res 19:1225–31. doi: 10.1158/1078-0432.CCR-12-1630
245. Heskamp S, Hobo W, Molkenboer-Kuenen JDM, et al (2015) Noninvasive Imaging of Tumor PD-L1 Expression Using Radiolabeled Anti-PD-L1 Antibodies. Cancer Res 75:2928–36. doi: 10.1158/0008-5472.CAN-14-3477
246. Li D, Zhu X-H (2016) Immuno-PET imaging using 89Zr labeled PD-L1 antibody in non-small cell lung cancer Xenograft. J Nucl Med 57:no pagination.
247. González Trotter DE, Meng X, McQuade P, et al (2017) In Vivo Imaging of the Programmed Death Ligand 1 by (18)F PET. J Nucl Med 58:1852–1857. doi: 10.2967/jnumed.117.191718
248. Kruziki MA, Case BA, Chan JY, et al (2016) 64Cu-Labeled Gp2 Domain for PET Imaging of Epidermal Growth Factor Receptor. Mol Pharm 13:3747–3755. doi: 10.1021/acs.molpharmaceut.6b00538
249. Barbas CF, Hu D, Dunlop N, et al (1994) In vitro evolution of a neutralizing human antibody to human immunodeficiency virus type 1 to enhance affinity and broaden strain cross-reactivity. Proc Natl Acad Sci 91:3809–3813. doi: 10.1073/pnas.91.9.3809
250. Schellekens H (2005) Factors influencing the immunogenicity of therapeutic proteins. Nephrol Dial Transplant. doi: 10.1093/ndt/gfh1092
251. Adair F, Ozanne D (2002) The immunogenicity of therapeutic proteins. BioPharm 15:30–36.
252. Chaplin DD (2010) Overview of the immune response. J Allergy Clin Immunol 125:S3–S23. doi: 10.1016/J.JACI.2009.12.980
253. Fireman P, Fineberg SE, Galloway JA (1982) Development of IgE antibodies to human (recombinant DNA), porcine, and bovine insulins in diabetic subjects. Diabetes Care 5:119–125.
254. Moore WV, Leppert P (1980) Role of aggregated human growth hormone (hGH) in development of antibodies to hGH. J Clin Endocrinol Metab 51:691–697. doi: 10.1210/jcem-51-4-691
255. Schernthaner G (1993) Immunogenicity and allergenic potential of animal and human insulins. Diabetes Care 16:155–165.
256. van Beers MMC, Sauerborn M, Gilli F, et al (2011) Oxidized and Aggregated Recombinant Human Interferon Beta is Immunogenic in Human Interferon Beta Transgenic Mice. Pharm Res 28:2393–2402. doi: 10.1007/s11095-011-0451-4
257. Klitgaard JL, Coljee VW, Andersen PS, et al (2006) Reduced susceptibility of recombinant polyclonal antibodies to inhibitory anti-variable domain antibody responses. J Immunol 177:3782–90. doi: 10.4049/JIMMUNOL.177.6.3782
258. O’Hagan DT, Jeffery H, Davis SS (1993) Long-term antibody responses in mice following subcutaneous immunization with ovalbumin entrapped in biodegradable microparticles. Vaccine 11:965–969. doi: 10.1016/0264-410X(93)90387-D
259. Tangri S, Mothé BR, Eisenbraun J, et al (2005) Rationally engineered therapeutic proteins with reduced immunogenicity. J Immunol 174:3187–96. doi: 10.4049/JIMMUNOL.174.6.3187
260. Schein CH (1990) Solubility as a Function of Protein Structure and Solvent Components. Nat Biotechnol 8:308–317. doi: 10.1038/nbt0490-308
261. Wang W (1999) Instability, stabilization, and formulation of liquid protein pharmaceuticals. Int J Pharm 185:129–188. doi: 10.1016/S0378-5173(99)00152-0
262. Mosavi LK, Peng Z-Y (2003) Structure-based substitutions for increased solubility of a designed protein. Protein Eng 16:739–745. doi: 10.1093/protein/gzg098
263. Roosild TP, Choe S (2005) Redesigning an integral membrane K+ channel into a soluble protein. Protein Eng Des Sel 18:79–84. doi: 10.1093/protein/gzi010
264. Jenkins TM, Hickman AB, Dyda F, et al (1995) Catalytic domain of human immunodeficiency virus type 1 integrase: identification of a soluble mutant by systematic replacement of hydrophobic residues. Proc Natl Acad Sci U S A 92:6057–6061. doi: 10.1073/pnas.92.13.6057
265. Trevino SR, Scholtz JM, Pace CN (2008) Measuring and Increasing Protein Solubility. J Pharm Sci 97:4155–4166. doi: 10.1002/jps.21327
266. Giglio J, Fernández S, Rey A, Cerecetto H (2011) Synthesis and biological characterisation of novel dithiocarbamate containing 5-nitroimidazole 99mTc-complexes as potential agents for targeting hypoxia. Bioorganic Med Chem Lett 21:394–397. doi: 10.1016/j.bmcl.2010.10.130
267. Ono M, Arano Y, Mukai T, et al (2002) Control of radioactivity pharmacokinetics of 99mTc-HYNIC-labeled polypeptides derivatized with ternary ligand complexes. Bioconjug Chem 13:491–501. doi: 10.1021/bc010043k
268. Koehler L, Graf F, Bergmann R, et al (2010) Radiosynthesis and radiopharmacological evaluation of cyclin-dependent kinase 4 (Cdk4) inhibitors. Eur J Med Chem 45:727–737. doi: 10.1016/j.ejmech.2009.11.020


Appendix A: Immunogenicity of Gp2-EGFR

A.1 Introduction

Proteins of foreign origin, for example animal or bacterial proteins injected into humans, often act as antigens and elicit immune responses [250]. The immune response is rapid, often occurring after a single injection. Antibodies generated against the drug can alter its pharmacokinetics [251]. Antibodies may also be neutralizing, leading to a loss of efficacy or binding in therapeutic proteins or repeated diagnostic studies [75].

Proteins of non-human origin that are degraded and displayed on major histocompatibility complexes act as antigens that T-cell receptors recognize, which activates an immune response [252]. Normally, T-cells that bind self-antigens are suppressed; however, recombinantly produced fully human or humanized proteins can also trigger a reaction due to improper post-translational modifications, impurities, or other differences [75]. Even small variations in protein sequence can have a large effect. For example, bovine and human insulin differ by 3 amino acids, and trace amounts of bovine insulin cause an immune response [253].

The stability of a protein is also important for immunogenicity. Protein aggregates, including aggregates of insulin and human growth hormone, have been shown to increase antibody titer [254, 255]. Degradation or other chemical modification of the protein, as well as impurities, can also have unforeseen immunogenicity consequences in vivo [256].

To predict human immune response, animal models may be used, especially if the protein is of neither human nor animal origin and is therefore foreign to both species. However, the actual immune response in humans or other animal strains can vary [257]. Therefore, animal models should only be used to help predict basic immune properties. Herein, we evaluate an EGFR-targeted Gp2 domain for generation of anti-Gp2 antibodies in mice.

A.2 Methods

Gp2αEGFR2.2.3 (Gp2-EGFR) was produced in E. coli and purified by metal affinity column as before [248]. Purified Gp2 was allowed to react with 2,2′-(7-(1-carboxy-4-((2,5-dioxopyrrolidin-1-yl)oxy)-4-oxobutyl)-1,4,7-triazonane-1,4-diyl)diacetic acid N-hydroxysuccinimide ester (NHS-NODA-GA) (Chematech) at room temperature for 1 h. Conjugation was verified by mass spectrometry, and NODAGA-Gp2 was purified by reverse-phase high-performance liquid chromatography (HPLC) on a C18 column (Waters) with a gradient from 90% buffer A (99.9% H2O, 0.1% trifluoroacetic acid (TFA))/10% buffer B (90% acetonitrile, 9.9% H2O, 0.1% TFA) to 10% A/90% B over 15 minutes. Ga(NO3)3*xH2O was diluted in 100 mM sodium acetate pH 5 to ~100 mM and allowed to chelate to NODAGA-Gp2 for 1 h at 42 °C. The product was purified by HPLC and resuspended in 10 mM sodium acetate pH 5.
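The HPLC gradient described above can be made concrete with a short calculation. The following is a minimal sketch, assuming the gradient is linear in time (the text gives only the start point, end point, and duration); the function names `percent_b` and `acetonitrile_percent` are hypothetical and used only for illustration.

```python
# Sketch of the mobile-phase composition during the purification gradient.
# Assumption (not stated in the text): the gradient is linear over 15 min.

def percent_b(t_min, start_b=10.0, end_b=90.0, duration_min=15.0):
    """Percent buffer B at time t_min for a linear gradient."""
    if t_min <= 0:
        return start_b
    if t_min >= duration_min:
        return end_b
    return start_b + (end_b - start_b) * t_min / duration_min


def acetonitrile_percent(t_min):
    """Overall percent acetonitrile in the mobile phase at time t_min.

    Buffer B is 90% acetonitrile; buffer A contains none.
    """
    return 0.90 * percent_b(t_min)


if __name__ == "__main__":
    for t in (0, 5, 10, 15):
        print(f"t = {t:>2} min: {percent_b(t):.1f}% B, "
              f"{acetonitrile_percent(t):.1f}% acetonitrile")
```

Under the linearity assumption, the midpoint of the run (7.5 min) corresponds to 50% B, or 45% acetonitrile overall.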

Mice were injected by tail vein with 10 µg of Gp2, 10 µg of ovalbumin, or vehicle only in 100 µL sodium acetate weekly for 3 weeks. Saphenous vein blood draws were carried out prior to each protein injection and weekly for 3 weeks after the final injection. At each draw, 50 µL of blood was collected and allowed to coagulate at room temperature for 15 min. Samples were centrifuged at 2000 × g for 10 min at 4 °C. The supernatant was removed and stored at -20 °C.


For the enzyme-linked immunosorbent assay (ELISA), 10 pmol of Gp2, ovalbumin, or buffer was adsorbed to a Nunc 96-well plate at 4 °C overnight. Wells were washed 3 times with 200 µL Tris-buffered saline (TBS) and tapped dry. Wells were blocked with 400 µL of 2% milk in TBS for 2 h and washed 3 times with TBS. Serum was diluted 250x and 1250x in TBS, and 100 µL of diluted serum was added to each well and incubated for 1 h at 4 °C. Wells were washed 3 times with TBS plus Tween 20. Anti-mouse HRP was diluted 1:10,000, and 100 µL was added to each well and incubated for 30 min. Wells were washed 3 times with TBS plus Tween 20. TMB solution was prewarmed and 100 µL was added to each well. Color was allowed to develop for 5-15 min. The reaction was stopped by adding 100 µL of 1 N HCl to each well, and absorbance was measured at 450 nm.

A.3 Results and Discussion

Mice were injected with a non-radioactive analogue of an epidermal growth factor receptor-binding Gp2 imaging agent (Gp2-EGFR) by tail vein weekly for 3 weeks to test whether anti-Gp2 antibodies would develop. Ovalbumin was used as a positive control, as it has previously been shown to elicit an immune response [258]. An enzyme-linked immunosorbent assay (ELISA) was used to detect serum levels of antibodies able to bind to the injected protein (Figure A1). After week 3, mice injected with either Gp2 or ovalbumin have detectable amounts of anti-protein antibody compared to mice injected with sodium acetate only (Figure A1). Although nominally lower, the Gp2-EGFR response is not significantly different from the ovalbumin response except at week 4 (p = 0.02). The more dilute serum also suggests only a small difference between the Gp2-EGFR and ovalbumin responses in weeks 5 and 6 (p < 0.05).
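The group comparisons above rest on p-values for two sets of per-mouse absorbance readings. The text does not state which statistical test was used, so the following is only an illustrative sketch of one option, an exact two-sided permutation test on the difference of group means. The function name `permutation_p_value` and the absorbance values in the usage example are hypothetical placeholders, not data from this study.

```python
# Sketch: exact two-sided permutation test for a difference in group means.
# Assumption: this is NOT necessarily the test used in the study; it is one
# simple, distribution-free way to compare two small groups of readings.
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Fraction of all relabelings whose |mean difference| is at least
    as large as the observed |mean difference|."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point round-off.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical absorbance readings (A450) for two groups of mice.
    gp2_example = [0.41, 0.38, 0.52, 0.47]
    ovalbumin_example = [0.66, 0.71, 0.59, 0.80]
    print(permutation_p_value(gp2_example, ovalbumin_example))
```

With very small groups the smallest attainable p-value is limited by the number of possible relabelings, which is worth keeping in mind when interpreting thresholds such as p < 0.05.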


Figure A1. Immunogenic response to Gp2 and controls. Mice were injected with Gp2-EGFR or controls weekly for 3 weeks. Blood was drawn immediately prior to injections. Relative serum levels of anti-Gp2 (for vehicle and Gp2) or anti-ovalbumin antibodies were measured by ELISA. Serum was diluted by 250x (left) and 1250x (right).

The immune response of mice to repeated injections of Gp2-EGFR could be problematic if the same trend translates to humans. Neutralizing antibodies would strongly hinder binding, and non-neutralizing antibodies could modify the beneficial pharmacokinetics of the small Gp2-EGFR. Further experiments are needed to see if the response is similar in humans and if the response affects efficacy. If needed, immunogenicity engineering can be used to lower the immune response to Gp2-EGFR, as has been done successfully for other proteins [259].


Appendix B: Gp2-EGFR framework hydrophilicity engineering

B.1 Introduction

Surface hydrophobicity is known to impact the solubility of proteins [260]. Low protein

solubility can lead to complications during production, shipping, and storage of protein

pharmaceuticals [261]. Site-directed mutagenesis has been previously used to successfully increase protein solubility in multiple cases [262–264]. However, not all hydrophobic-to-

hydrophilic substitutions increase solubility [265]. Increasing the solubility of Gp2 will

ease development and handling, especially in large scale production and storage during

clinical translation. Additionally, although Gp2 has been successfully used in in vivo murine imaging [248], renal clearance due to its small size resulted in high kidney retention, complicating imaging of tissues in that region. High renal signal is typical of many small scaffolds (cite). Multiple studies have provided evidence that hydrophobicity can affect biodistribution, but no clear trend has emerged: some studies indicate decreased kidney uptake with increased hydrophobicity [196, 266], whereas others have shown the opposite trend [267] or no correlation [268]. Reducing the renal retention of Gp2 would improve imaging performance for tissues near the kidney. Here, we use rational design to introduce site-directed mutations that increase the hydrophilicity of Gp2.

B.2 Methods

B.2.1 Generating Hydrophilic Mutants

Solvent accessible surface area was calculated through the GetArea webserver [92], using

a 1.4 angstrom probe. Mutants were built by ordering DNA encoding the new amino acids

(IDT) and using overlap extension to build the gene. The gene and pET plasmid were

digested at 37 °C for 2 h with the restriction enzymes NheI-HF and BamHI-HF (NEB). The digested gene was ligated into the plasmid with T4 DNA ligase (NEB), and the plasmid was transformed into E. coli.

B.2.2 Protein Production and Characterization

Protein was produced and characterized as previously described (Kruziki Gp2). Briefly,

proteins were produced in T7 express, or T7 SHuffle E. coli (NEB) when proteins

contained multiple cysteines, and were purified using immobilized metal affinity

chromatography and reverse phase high-performance liquid chromatography with a 90% buffer A (99.9% H2O, 0.1% trifluoroacetic acid (TFA))/10% buffer B (90% acetonitrile,

9.9% H2O, 0.1% TFA) to 10% A/90% B gradient over 15 minutes on a C18 column

(Waters). Affinity titrations were carried out using flow cytometry to measure the level of fluorophore-tagged anti-His6 antibody bound to A431 cells, which express epidermal growth factor receptor (EGFR), labeled with varying concentrations of Gp2. Binding curves were fit by

minimizing the sum of squared residuals. Melting temperature was measured by

monitoring ellipticity at 218 nm using circular dichroism over a temperature range of 25-

98 °C in 100 mM sodium acetate, pH 5.
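The least-squares fit of a binding curve can be sketched as follows. This is a minimal illustration, not the analysis code used in this work: it assumes a one-site equilibrium model, signal = background + span × c / (Kd + c), and the function names, the search bounds, and the golden-section search over log10(Kd) are all assumptions chosen for clarity. For each trial Kd, the background and span are obtained in closed form by linear least squares, so only Kd is searched numerically.

```python
import math

def fit_amplitudes(conc, signal, kd):
    """For a fixed Kd, solve the linear least-squares problem for background and span."""
    x = [c / (kd + c) for c in conc]  # fraction bound at each concentration
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(signal) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal))
    span = sxy / sxx
    background = mean_y - span * mean_x
    ssr = sum((background + span * xi - yi) ** 2 for xi, yi in zip(x, signal))
    return background, span, ssr

def fit_kd(conc, signal, kd_low=1e-3, kd_high=1e4, tol=1e-6):
    """Golden-section search over log10(Kd) minimizing the sum of squared residuals.

    Assumes the residual sum is unimodal in log10(Kd) between the (hypothetical) bounds.
    """
    golden = (math.sqrt(5) - 1) / 2
    lo, hi = math.log10(kd_low), math.log10(kd_high)
    a = hi - golden * (hi - lo)
    b = lo + golden * (hi - lo)
    fa = fit_amplitudes(conc, signal, 10 ** a)[2]
    fb = fit_amplitudes(conc, signal, 10 ** b)[2]
    while hi - lo > tol:
        if fa < fb:
            # Minimum lies in [lo, b]; reuse a as the new upper interior point.
            hi, b, fb = b, a, fa
            a = hi - golden * (hi - lo)
            fa = fit_amplitudes(conc, signal, 10 ** a)[2]
        else:
            # Minimum lies in [a, hi]; reuse b as the new lower interior point.
            lo, a, fa = a, b, fb
            b = lo + golden * (hi - lo)
            fb = fit_amplitudes(conc, signal, 10 ** b)[2]
    kd = 10 ** ((lo + hi) / 2)
    background, span, ssr = fit_amplitudes(conc, signal, kd)
    return kd, background, span, ssr
```

Calling `fit_kd(concentrations, fluorescence)` returns the fitted Kd in the same units as the concentrations, together with the fitted background, span, and the residual sum of squares.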

B.3 Results and Discussion

B.3.1 Designed hydrophilic mutants

Gp2 performance can also be improved through optimization of the framework amino acids. In particular, we aimed to increase the hydrophilicity of the exposed Gp2 surface.

The solvent accessible surface area of each amino acid was calculated computationally

based on the solved crystal structure of Gp2 [98], and compared to its hydrophobicity

(Figure B1). Most buried amino acids have high hydrophobicity and exposed amino acids have low hydrophobicity, as expected. However, there are several framework amino acids that are hydrophobic and moderately exposed. We hypothesized that mutating these exposed amino acids to less hydrophobic amino acids could benefit Gp2 in solubility, production, and potentially in vivo biodistribution. Mutants were selected based on hydrophilicity and accessibility with minor consideration of theoretical stability and frequency in natural homologs.
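The screening logic described above, flagging residues that are both hydrophobic and solvent exposed, can be sketched in a few lines. This is an illustrative sketch only: the hydrophobicity scale used in this work is not restated here, so the Kyte-Doolittle hydropathy scale is used as a stand-in, and the function name, the thresholds, and the ranking by hydropathy × exposure are assumptions. The input is expected to be per-residue relative solvent accessibility (0 to 1), for example as derived from a surface area calculation.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic); used here as a stand-in scale.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def candidate_sites(residues, min_hydropathy=0.0, min_exposure=0.3):
    """Return hydrophobic, exposed residues ranked by hydropathy x exposure.

    residues: iterable of (position, one-letter amino acid, relative SASA in 0..1).
    The thresholds are hypothetical defaults, not values from this work.
    """
    hits = [
        (pos, aa, KYTE_DOOLITTLE[aa], sasa)
        for pos, aa, sasa in residues
        if KYTE_DOOLITTLE[aa] > min_hydropathy and sasa >= min_exposure
    ]
    # Most hydrophobic and most exposed residues come first.
    return sorted(hits, key=lambda h: h[2] * h[3], reverse=True)

# Made-up example input (not Gp2 data): position, residue, relative exposure.
example = [(1, "L", 0.60), (2, "K", 0.90), (3, "F", 0.05), (4, "V", 0.35), (5, "S", 0.70)]
ranked = candidate_sites(example)
```

In this made-up example, the buried phenylalanine and the hydrophilic lysine and serine are filtered out, leaving the exposed leucine and valine as candidates. In practice, the ranked list would only be a starting point, since stability and frequency in natural homologs were also considered when selecting mutants.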


Figure B1. Hydrophilicity of wild type Gp2. (A) Cartoon representation of Gp2 (PDB ID: 2wnm) showing hydrophobic amino acids concentrated in the core of the protein. (B) Amino acids with high hydrophobicity and high solvent accessible surface area show potential for mutation to more hydrophilic residues.

Analysis was performed in the context of an EGFR-binding Gp2 [248] to enhance

translational potential. Site-directed mutagenesis was used to create single mutants.

Mutants and the parental Gp2 were produced in E. coli and purified. Target binding

strength was examined by labeling EGFRhigh A431 cells with 5 nM Gp2, which was detected with a fluorescently-tagged anti-His6 antibody via flow cytometry. An initial screen of production titer and binding ability was used to identify potentially effective mutants (Figure B2). Both mutations to F13 drastically reduced production and binding.

Y18N, T21N, L22S, L26T (which was shown to bind better in a more thorough affinity assay), L28S, and T41N were chosen for further exploration. L22S was added to the combination mutants without testing of the single mutant because of limited assay throughput and the desire to examine every initially chosen site. Combination mutants without the L22S mutation will be made to control for a possible detrimental effect of L22S.

Figure B2. Single hydrophilic mutant characterization. The protein yield from 100 mL E. coli shake flask productions (triangle) and fluorescence of yeast labeled with 5 nM target and detected with fluorophore tagged streptavidin were used to select the best mutants to combine together.

B.3.2 Combination hydrophilic mutants

The tolerated mutants were combined to achieve a variant with even lower surface hydrophobicity. The six lead mutations were combined into a single clone, as well as each

5-site mutant, and the most hydrophilic 4- and 3-site mutants (Figure B3A). The mutants and parental protein were produced in E. coli and purified. Yield, full affinity titration

curves, and thermal stability were measured for each clone (Figure B3). Production yield

was negatively impacted for all combination mutants, especially the 6-site mutant which

had no measurable soluble phase production. Binding affinity remained relatively stable

compared to parental for most mutants. Thermal stability dropped significantly for all

mutants, although with denaturation midpoint temperatures of approximately 50 °C, even

the least stable clones are above temperatures required in many applications. However,

partial unfolding may be observed at temperatures relevant to in vivo imaging. To verify

that these mutations decreased the hydrophobicity of Gp2, reverse phase high-performance

liquid chromatography was used to compare elution times of parental Gp2 to the mutated

variants (Figure B3E). Parental eluted at 15.4 minutes, with mutated variants eluting off

the hydrophobic column earlier, suggesting that they are less hydrophobic.

Figure B3. Characterization of hydrophilic combination mutants. (A) The full combination 6-mutant, as well as every 5-mutant, and the most hydrophilic 4- and 3-mutants were built and were given one letter names. (B) Yield of Gp2 protein in soluble fraction from 1 L E. coli shake flask production. P denotes the parental EGFR binding Gp2. A did not appreciably produce. (C) Binding dissociation constant determined by labeling of A431 (EGFRhigh) cells. Binding was detected by fluorophore tagged anti-His6 antibody. (D) Melting temperature of Gp2 proteins measured by monitoring circular dichroism at 218 nm over 25-98 °C temperature scan. (E) Retention time of Gp2 proteins on hydrophobic reverse phase HPLC column. All mutants elute significantly earlier than parental except E (p < 0.05). Error bars are standard deviation with n > 2.

Optimizing the framework of a protein scaffold is one avenue to improve performance.

Surface hydrophobicity of a protein can drastically affect biophysical properties such as

solubility and aggregation which are important factors in downstream applications [265].

Hydrophobicity has also been shown to influence biodistribution of other small protein

imaging agents [196]. Gp2 was tolerant to most single and combination rational mutations changing hydrophobic surface residues to more hydrophilic variants (Figures B2 and B3).

Although these mutations increased the hydrophilicity (Figure B3E), no drastic increases in

solubility or production were seen. These mutations can potentially be transferred to other

Gp2 clones for increased hydrophilicity in applications where necessary.
