Continuous Evolution of Proteases With Altered Specificity


Citation Packer, Michael Samuel. 2017. Continuous Evolution of Proteases With Altered Specificity. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:41141523

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Continuous Evolution of Proteases with Altered Specificity

A dissertation presented

by

Michael S. Packer

to

The Committee on Higher Degrees in Biophysics

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the subject of

Biophysics

Harvard University Cambridge, Massachusetts

April 2017

© 2017 Michael S. Packer

All rights reserved.

Dissertation Advisor: Professor David R. Liu

Michael S. Packer

Continuous Evolution of Proteases with Altered Specificity

Abstract

The following thesis work aims to establish a method for generating proteases with tailor-made specificities. This programmability will enable the design of proteases that modulate the activity of a target protein of biotechnological or therapeutic relevance. In the first phase of this project, we developed and validated a system for the phage-assisted continuous evolution (PACE) of proteases. In pilot feasibility studies, we used this system to evolve hepatitis C virus (HCV) proteases that are resistant to small-molecule protease inhibitors. We then established that this PACE selection could also evolve specificity changes in two different model proteases: human rhinovirus (HRV) protease and tobacco etch virus (TEV) protease. After these proof-of-concept experiments, we used PACE to evolve TEV protease, which canonically cleaves ENLYFQS, to cleave a very different target peptide sequence, HPLVGHM, that is present in IL-23, a cytokine implicated in inflammatory disease. A protease emerging from approximately 2,500 generations of PACE contains 20 non-silent mutations, cleaves human IL-23 at the target peptide bond, and inhibits IL-23-mediated signaling in cultured primary murine splenocytes. We characterized the substrate specificity of this evolved protease using a protease specificity profiling method, revealing a mixture of shifted and broadened specificity at the six positions at which the target sequence differed. Ongoing studies seek to expand the scope of protease PACE through: (1) the development of a negative selection scheme to ablate off-target proteolysis, (2) the adaptation of disulfide-compatible E. coli strains for the PACE of human circulating proteases, and (3) the evolution of proteases to enable the destruction of intracellular target proteins.


Table of Contents

Table of Figures
List of Tables
Acknowledgements
Introduction
Chapter 1: Design and Validation of a Positive Selection Scheme for the Continuous Evolution of Proteases
Section 1.1: Transducing protease activity into gene expression
Section 1.2: Linking protease activity to phage propagation
Section 1.3: Continuous evolution of resistance to HCV protease inhibitors
Section 1.4: Continuous evolution of model proteases with altered specificity
Section 1.5: Discussion
Section 1.6: Methods
Chapter 2: Continuous Evolution of a Model Protease to Cleave a Disease-Associated Human Protein
Section 2.1: Choice of target substrate and protease
Section 2.2: Continuous Evolution of TEV Variants that Cleave the IL-23 Target Peptide
Section 2.3: Characterization of Evolved TEV Protease Variants
Section 2.4: Substrate Specificity Profiling of an Evolved TEV Protease
Section 2.5: Specificity Profiling Reveals Functionally Independent TEV Mutation Groups
Section 2.6: Evolved TEV L2F Cleaves Human IL-23
Section 2.7: Evolved TEV Protease Deactivates IL-23 and Prevents IL-17 Secretion
Section 2.8: Discussion
Section 2.9: Methods
Chapter 3: Extending the Utility of Protease PACE
Section 3.1: Design of a negative selection scheme to ablate off-target proteolysis
Section 3.2: Validation of simultaneous positive and negative selection in protease PACE
Section 3.3: Attempts at PACE of FDA-approved therapeutic proteases containing disulfide linkages
Section 3.4: Adapting protease PACE to botulinum toxin light chain
Section 3.5: BoNT-LC can be evolved in protease PACE
Section 3.6: Evolutionary target substrates for BoNT-LC E
Section 3.7: Evolution of BoNT-LC E proteases that cleave PTEN
Section 3.8: Discussion
Section 3.9: Methods
References


Table of Figures

Figure 1: Development of a system to link protease activity to gene expression
Figure 2: Crystal structure of T7 lysozyme bound to T7 RNAP
Figure 3: Vector map of complementary plasmid (CP)
Figure 4: Vector map of expression plasmid (EP)
Figure 5: Vector map of accessory plasmid (AP)
Figure 6: Vector map of selection phage (SP)
Figure 7: Western blot showing PA-RNAP cleavage
Figure 8: PA-RNAPs link protease activity to phage propagation
Figure 9: HCV PA-RNAP response to protease inhibitors in E. coli cells
Figure 10: Vector map of mutagenesis plasmid (MP)
Figure 11: Continuous evolution of drug resistance in HCV protease
Figure 12: Vector map of accessory plasmid (AP) used in protease specificity reprogramming experiments
Figure 13: PACE of TEV proteases with altered P1 specificity
Figure 14: PACE of HRV proteases with altered P1’ specificity
Figure 15: Evolutionary trajectories and representative evolved TEV protease genotypes
Figure 16: Luciferase activity assay of clones from the middle of PACE stage 1 of trajectories 1, 2, and 3
Figure 17: Luciferase activity assay after PACE stage 2 of trajectories 1 and 2
Figure 18: Luciferase activity assay after PACE stage 2 of trajectory 3
Figure 19: Luciferase activity assay of clones after PACE stage 4
Figure 20: Validation of protease PACE stringency modulation
Figure 21: Luciferase activity assay of clones after PACE stage 8
Figure 22: Epistatic interactions with TEV protease residue N177
Figure 23: Protein cleavage assay to identify the most active clone
Figure 24: HPLC assay of TEV protease kinetics
Figure 25: Evolved TEV protease cleaves wild-type, intermediate, and target substrates
Figure 26: Protease specificity profiling
Figure 27: Specificity profiles generated from libraries with three randomized substrate amino acids
Figure 28: TEV protease variants containing subsets of TEV L2F mutations are all active
Figure 29: Specificity profiles of TEV variants possessing wild-type-like specificity
Figure 30: Identification of IL-23 cleavage sites by Western blot and LC-MS
Figure 31: Identification of the cleavage site within IL-23 heterodimer by mass spectrometry
Figure 32: Identification of two cleavage sites within IL-23 monomer by mass spectrometry
Figure 33: Protease-mediated attenuation of IL-17 secretion in mouse splenocytes
Figure 34: Western blot of pre-mixed additives to splenocyte cell culture
Figure 35: Western blot of pre-mixed additives to splenocyte cell culture
Figure 36: TEV L2F catalytically deactivates IL-23 and prevents IL-17 secretion in mouse splenocytes
Figure 37: TEV L2F must be pre-incubated with IL-23 to prevent IL-17 secretion in mouse splenocytes
Figure 38: TEV L2F is unaffected by the addition of FBS to in vitro cleavage assays
Figure 39: Schematic representation of negative selection components
Figure 40: Luciferase assay screen of polymerase variants for an orthogonal PA-RNAP
Figure 41: Vector map of negative selection AP
Figure 42: Simultaneous selection for cleavage of ENLYAQS and against cleavage of ENLYFQS
Figure 43: Luciferase assay of disulfide-containing proteases in the presence of Erv1 and DsbC
Figure 44: SNARE-derived PA-RNAPs exhibit protease-dependent gene expression with cognate BoNT-LC
Figure 45: SNAP25 residues 187-206 disrupt PA-RNAP, leading to high background transcription
Figure 46: PACE-evolved BoNT-LC B and F variants with high apparent activity on wild-type substrate
Figure 47: Luciferase assay of PACE-evolved BoNT-LC E clones exhibiting apparent proteolysis of SNAP23
Figure 48: Luciferase assay of wild-type BoNT-LC E on a panel of point mutant substrates
Figure 49: Luciferase assay of evolved BoNT-LC E exhibiting apparent proteolysis of stepping-stones two and three


List of Tables

Table 1: Target peptide scoring matrix based upon wild-type TEV specificity
Table 2: Protease target substrates identified by specificity scoring
Table 3: TEV protease mutations after PACE on HNLYFQS substrate
Table 4: Target peptide scoring matrix based upon known evolutionary potential
Table 5: Target substrates identified by TEV protease evolutionary potential
Table 6: PACE stage 1 mutations
Table 7: PACE stage 2 mutations
Table 8: PACE stage 2 mutations
Table 9: PACE stage 3 mutations
Table 10: PACE stage 4 mutations
Table 11: PACE stage 5 mutations
Table 12: PACE stage 6 mutations
Table 13: PACE stage 7 mutations
Table 14: PACE stage 8 mutations
Table 15: Kinetic parameters of wild-type and evolved TEV proteases
Table 16: Phage display enrichment values from selections on single site libraries
Table 17: Phage display enrichment values from selections on libraries with three randomized residues
Table 18: Plasmids used for PACE and protein expression
Table 19: Sanger and Illumina sequencing DNA primers
Table 20: Overnight propagation assay of simultaneous positive and negative selection
Table 21: Simultaneous positive and negative selection is compatible with protease PACE
Table 22: BoNT-LC B genotypes after 72 h PACE on AP encoding VAMP1 60-amino-acid substrate
Table 23: BoNT-LC F genotypes after 72 h PACE on AP encoding VAMP1 60-amino-acid substrate
Table 24: BoNT-LC E genotypes after 72 h PACE on AP encoding SNAP25 substrates
Table 25: BoNT-LC E genotypes after PACE on AP encoding SNAP25 D179K or SNAP23
Table 26: Target peptide scoring matrix based upon wild-type BoNT-LC E specificity
Table 27: Candidate target substrates identified via wild-type BoNT-LC E specificity scoring
Table 28: Target peptide scoring matrix based upon BoNT-LC E specificity and evolutionary potential
Table 29: PTEN target substrate identified via revised scoring method
Table 30: Evolutionary stepping-stone substrates for lagoon 1
Table 31: Evolutionary stepping-stone substrates for lagoon 2
Table 32: Evolutionary stepping-stone substrates for lagoon 3
Table 33: Evolutionary stepping-stone substrates for lagoon 4
Table 34: BoNT-LC E genotypes after 72 h PACE on AP encoding SNAP25 D179C or R180S substrates
Table 35: BoNT-LC E genotypes after 72 h PACE on AP encoding SNAP25 D179C and R180S substrate
Table 36: BoNT-LC E genotypes after 72 h PACE with mixing-transition from stepping-stone three to four


Acknowledgements

Alon Rivel, my husband, for seeing me through the entire PhD and beyond.

My mom, dad, and sister for raising me to be curious, creative, and brave (A.K.A. a Scientist).

David R. Liu for mentoring me and serving as a role model. He has combined a group of fantastic researchers with ample scientific resources to create an environment where anything seems possible.

Aleks Markovic for keeping the Liu Lab running like a well-oiled machine.

Bryan Dickinson for taking me on as a rotation student, and then graciously allowing me to continue his work in setting up a protease PACE system. Without his pioneering efforts, my thesis studies could not have transpired.

Travis Blum: it has been my honor to have someone even interested in continuing where my work left off. It is an even greater pleasure to see the amazing directions only you could take it.

Holly Rees for assistance both technical and emotional in the execution of primary splenocyte cell culture assays.

Sunia Trauger for saving me from LC-MS purgatory and helping me see the light at the end of the tunnel.

Jacob Carlson, Kevin Esvelt, Aaron Leconte (PACE forefathers whom I have never met), for all the hard work that went into this technology before my time in the Liu Lab.

9AM Millis for being an incredible PACE troubleshooting resource (Jeff Bessen, David Bryson, Liwei Chen, Kevin Davis, Bryan Dickinson, Sherry Gao, Johnny Hu, Basil Hubbard, Tim Roth, Ning Sun, David Thompson, Travis Blum, Nicole Gaudelli, Tina Wang, Ben Thuronyi, Mary Morrison, Ahmed Badran, Lena Afeyan).

Ahmed Badran for answering all of my molecular biology questions and being a guiding force for all PACErs.

Nicole Gaudelli, Alexis Komor, Bill Kim for being amazing friends and coworkers. I am forever grateful for being included in your genius Cas9 projects. It was a welcome distraction and a great learning experience.

Current and former Liusers whom I have not yet mentioned all played a critical role in making the Liu Lab a fantastic workplace: Chihui An, Brian Chaikind, Ryan Hili, Rick McDonald, John Zuris, Brent Dorr, John Guilinger, Margie Li, Lynn McGregor, Jia Niu, Adrian Berliner, Alix Chan, Jonathan Chen, Luke Koblan, Jon Levy, Phillip Lichtor, JP Maianti, Chris Podracky, Weixin Tang, Dmitry Usanov, Ariel Yeh, Christina Zeina, Manda Arbab, James Nelson.

Jim Hogle and Michele Jakoulov for running Harvard Biophysics. You have created a PhD program with unparalleled scientific and academic independence.

Funding from NIH, DARPA, Broad Institute, NSF Graduate Research Fellowship and IPSEN Pharmaceuticals.


Introduction

Parts adapted from: Dickinson, Packer, Badran, Liu, Nature Communications. 5 (2014).


Among the more than 600 naturally occurring proteases that have been described1 are enzymes that have proven to be important catalysts of industrial processes, essential tools for proteome analysis, and life-saving pharmaceuticals2-5. Recombinant human proteases including , factor VIIa, and tissue plasminogen activator are widely used drugs for the treatment of blood clotting diseases4. Researchers have engineered or evolved industrial proteases with enhanced thermostability and solvent tolerance6, 7. Similarly, a handful of therapeutic proteases have been engineered with improved kinetics and prolonged activity8-10. The potential of proteases to serve as a broadly useful platform for degrading proteins implicated in disease, however, is greatly limited by the native substrate scope of known proteases. In contrast to the highly successful generation of therapeutic monoclonal antibodies with tailor-made binding specificities11, the generation of proteases with novel protein cleavage specificities has proven to be a major challenge. For example, efforts to engineer trypsin variants that exhibit the substrate specificity of the closely related protease chymotrypsin were unsuccessful until researchers grafted the entire substrate-binding pocket, multiple surface loops, and additional residues from chymotrypsin12-15. This approach of replacing protease residues with amino acids from related proteases to impart the latter's specificity features16, 17 cannot provide proteases with specificities not already known among natural proteases, prompting researchers to turn instead to laboratory evolution to generate proteases with novel specificities18-22. Despite several decades of effort, no evolved proteases have yet been reported with more than one position of changed substrate specificity.

The evolution of a protease that can degrade a target protein of interest will almost always require changing substrate sequence specificity at more than one position, and thus may require many generations of evolution.

Continuous evolution strategies, which require little or no researcher intervention between generations23, may therefore be well suited to evolving proteases capable of cleaving a target protein that differs substantially in sequence from the preferred substrate of a wild-type protease. In phage-assisted continuous evolution (PACE), a population of evolving selection phage (SP) is continuously diluted in a fixed-volume vessel (the lagoon) by an incoming culture of host E. coli24. The SP is a modified M13 bacteriophage genome in which the evolving gene of interest has replaced gene III, a gene essential for phage infectivity. If the evolving gene of interest possesses the desired activity, it triggers expression of gene III from an accessory plasmid (AP) in the host cell, thus producing infectious progeny encoding active variants of the evolving gene. The mutation rate of the SP is controlled using an inducible mutagenesis plasmid (MP) such as MP6, which upon induction increases the mutation rate of the SP by >300,000-fold25. Because the rate of continuous dilution is slower than phage replication but faster than E. coli replication, host cells are replaced by fresh, unmutagenized cells from the incoming culture faster than they can divide, so mutations accumulate only in the SP.
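The dilution argument above can be made concrete with a minimal growth-versus-washout sketch. It assumes simple exponential growth in a well-mixed lagoon with no resource limits. The doubling times below are hypothetical values, chosen only to reproduce the regime described in the text: phage replicate faster than they are diluted, and host cell lineages divide more slowly. They are not measurements from this work.

```python
import math


def net_rate(doubling_time_h: float, dilution_per_h: float) -> float:
    """Net exponential rate (per hour) of a lineage in a continuously diluted lagoon.

    Growth at ln(2) / doubling_time competes with first-order washout at the
    dilution rate, expressed in lagoon volumes per hour.
    """
    return math.log(2) / doubling_time_h - dilution_per_h


def fold_change(doubling_time_h: float, dilution_per_h: float, hours: float) -> float:
    """Fold change of a lineage after `hours`, ignoring resource limits."""
    return math.exp(net_rate(doubling_time_h, dilution_per_h) * hours)


# Hypothetical, illustrative parameters (not values reported in this work).
DILUTION_PER_H = 1.0  # lagoon volumes per hour
DOUBLING_TIMES_H = {
    "selection phage": 0.25,   # replicates faster than it is diluted
    "host cell lineage": 1.0,  # divides more slowly than it is diluted
}

if __name__ == "__main__":
    for name, td in DOUBLING_TIMES_H.items():
        r = net_rate(td, DILUTION_PER_H)
        fate = "persists and can accumulate mutations" if r > 0 else "washes out"
        print(f"{name}: net rate {r:+.2f}/h, 24 h fold change "
              f"{fold_change(td, DILUTION_PER_H, 24):.2g} -> {fate}")
```

The inflow keeps the lagoon supplied with cells. What washes out is any particular, potentially mutagenized, host lineage, whereas phage lineages carrying active genes persist.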

Here we describe the development and application of a system for the continuous evolution of proteases. This system uses an engineered protease-activated RNA polymerase (PA-RNAP) to transduce polypeptide cleavage events into changes in gene expression that support phage propagation during PACE. We validate that this system successfully links the phage life cycle to protease activity for three distinct proteases. When performed in the presence of danoprevir or asunaprevir, two hepatitis C virus (HCV) protease inhibitor drug candidates currently in clinical trials, protease PACE rapidly evolved HCV protease variants that are resistant to each drug candidate. The PACE-evolved HCV protease variants are dominated by mutations previously observed in patients treated with these drug candidates. Encouraged by these results, we then used PACE to evolve both human rhinovirus (HRV) and tobacco etch virus (TEV) proteases that cleave substrates containing single amino acid substitutions at critical positions in their substrate recognition sequences. Together, these findings establish protease PACE as a platform to reveal the vulnerability of protease inhibitors to the evolution of drug resistance and to rapidly generate proteases with novel specificities.

We then applied this platform to dramatically reprogram the specificity of TEV protease, which natively cleaves the consensus substrate sequence ENLYFQS, to cleave a target sequence, HPLVGHM, that differs at six of seven positions from the consensus substrate and is present in an exposed loop of the pro-inflammatory cytokine IL-23. After constructing a pathway of evolutionary stepping-stones and performing ~2,500 generations of evolution using PACE, the resulting proteases contain up to 20 amino acid substitutions, cleave human IL-23 at the intended target peptide bond, and block the ability of IL-23 to stimulate IL-17 production in a murine splenocyte assay. These results establish a strategy for generating proteases with substrate specificities changed at several positions and the ability to cleave proteins implicated in human disease.
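The claim that the target differs from the consensus at six of seven positions can be checked directly. The sketch below assumes the two peptides are aligned so that the scissile bond falls between the sixth and seventh residues of each: Q|S in ENLYFQS and, by assumption, H|M in HPLVGHM. Positions are labeled with the Schechter-Berger convention (P6 through P1').

```python
# Schechter-Berger labels for a 7-residue substrate cleaved between residues 6 and 7.
POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'"]

TEV_CONSENSUS = "ENLYFQS"  # wild-type TEV protease consensus substrate
IL23_TARGET = "HPLVGHM"    # target peptide present in IL-23


def differing_positions(reference: str, target: str) -> list[tuple[str, str, str]]:
    """Return (position label, reference residue, target residue) for each mismatch."""
    if not (len(reference) == len(target) == len(POSITIONS)):
        raise ValueError("expected two aligned 7-residue peptides")
    return [
        (label, ref, tgt)
        for label, ref, tgt in zip(POSITIONS, reference, target)
        if ref != tgt
    ]


if __name__ == "__main__":
    diffs = differing_positions(TEV_CONSENSUS, IL23_TARGET)
    print(f"{len(diffs)} of {len(POSITIONS)} positions differ")
    for label, ref, tgt in diffs:
        print(f"  {label}: {ref} -> {tgt}")
```

Under this alignment, the only residue shared between the two peptides is the P4 leucine.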

After such dramatic progress with a model enzyme (TEV protease), we sought to expand the purview of protease PACE to a number of FDA-approved protease therapeutics. Although we were unable to adapt PACE to the evolution of enzymes containing disulfide linkages, which comprise the majority of therapeutic proteases, the FDA-approved C. botulinum toxins contain a metalloprotease domain that lacks disulfide linkages. These proteases have proven amenable to substrate specificity reprogramming in PACE, and because these toxins mediate their own intracellular delivery, they open up novel therapeutic avenues that we are only beginning to explore.


Chapter 1: Design and Validation of a Positive Selection Scheme for the Continuous Evolution of Proteases

Parts adapted from: Dickinson, Packer, Badran, Liu, Nature Communications. 5 (2014).


Section 1.1: Transducing protease activity into gene expression

PACE requires that a target activity be linked to changes in the expression of an essential phage gene such as gene III (gIII). To couple the cleavage of a polypeptide substrate to increases in gene expression, we engineered a PA-RNAP that transduces proteolytic activity into changes in gene expression that are sufficiently strong and rapid to support PACE. T7 RNA polymerase (T7 RNAP) is naturally inhibited when bound to T7 lysozyme26. We envisioned that T7 lysozyme could be tethered to T7 RNAP through a flexible linker containing a target protease cleavage site. Ideally, the effective concentration of the tethered T7 lysozyme with respect to T7 RNAP would be sufficiently high that the T7 RNAP subunit would exist predominantly in the T7 lysozyme-bound, RNAP-inactive state. Proteolysis of the target sequence would disfavor the bound T7 RNAP:T7 lysozyme complex, liberating active T7 RNAP and driving expression of gIII placed downstream of a T7 promoter (Figure 1A).
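The design logic above rests on effective concentration. The following minimal two-state sketch makes two assumptions: binding is simple 1:1, and lysozyme-bound RNAP is fully inactive. It compares the unbound (active) RNAP fraction when lysozyme is tethered, governed by an effective concentration, with the fraction after cleavage, when RNAP and lysozyme are free and equimolar. The dissociation constant, effective concentration, and protein concentration are hypothetical placeholders, not values measured in this work.

```python
import math


def free_fraction_tethered(c_eff_um: float, kd_um: float) -> float:
    """Intact PA-RNAP: intramolecular 1:1 binding at an effective concentration.

    The fraction of RNAP left unbound is 1 / (1 + c_eff / Kd).
    """
    return 1.0 / (1.0 + c_eff_um / kd_um)


def free_fraction_cleaved(total_um: float, kd_um: float) -> float:
    """Cleaved PA-RNAP: untethered RNAP and lysozyme at equal total concentration C.

    Solves Kd * x = (C - x)**2 for the complex concentration x (taking the
    physically meaningful root, x <= C) and returns the unbound fraction (C - x) / C.
    """
    b = 2.0 * total_um + kd_um
    complex_um = (b - math.sqrt(b * b - 4.0 * total_um ** 2)) / 2.0
    return (total_um - complex_um) / total_um


# Hypothetical parameters, chosen only to illustrate the trend.
KD_UM = 1.0       # lysozyme:RNAP dissociation constant
C_EFF_UM = 100.0  # effective concentration of the tethered lysozyme
TOTAL_UM = 1.0    # intracellular PA-RNAP concentration

if __name__ == "__main__":
    intact = free_fraction_tethered(C_EFF_UM, KD_UM)
    cleaved = free_fraction_cleaved(TOTAL_UM, KD_UM)
    print(f"active fraction intact:  {intact:.3f}")
    print(f"active fraction cleaved: {cleaved:.3f}")
    print(f"fold increase on cleavage: {cleaved / intact:.0f}")
```

In this model, a higher effective concentration lowers background activity. Cleavage releases much activity only if Kd is not far below the cellular RNAP concentration.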

N-terminal fusions to T7 RNAP are known to be well tolerated, and in the crystal structure of T7 RNAP bound to T7 lysozyme, the C-terminus of T7 lysozyme is only 32 Å from the N-terminus of T7 RNAP, separated by a solvent-exposed channel27 (Figure 2). In light of this structural information, we linked the two proteins through these proximal termini. Since T7 lysozyme activity is toxic to host E. coli cells, we characterized catalytically inactive lysozyme variants and found that the inactive C131S lysozyme mutant retained its ability to inhibit T7 RNAP without impairing host cell viability.
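As a rough geometric check on the 32 Å gap, the sketch below estimates the fewest residues that could bridge it. It assumes each residue contributes at most about 3.8 Å, the Cα-Cα distance in a fully extended chain. The result is only a lower bound: a fully extended linker would be strained, so practical linkers need considerable slack.

```python
import math

# Approximate Calpha-Calpha distance between consecutive residues; a fully
# extended chain cannot span more than this per residue.
MAX_SPAN_PER_RESIDUE_A = 3.8


def min_linker_residues(gap_a: float, span_per_residue_a: float = MAX_SPAN_PER_RESIDUE_A) -> int:
    """Rough lower bound on the number of residues needed to bridge `gap_a` angstroms."""
    return math.ceil(gap_a / span_per_residue_a)


if __name__ == "__main__":
    print(min_linker_residues(32.0))
```

The nine-residue estimate is far below the linkers of at least 28 residues that gave strong inhibition in the screen. This fits the expectation that the tether must be long enough for lysozyme to reach its binding site without strain.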

To identify T7 RNAP–T7 lysozyme linkers that promote complex formation and result in an inactive polymerase subunit yet permit efficient proteolysis, we screened a small set of linkers consisting of Gly, Ser, and Ala ranging in length from three to ten residues flanking each side of a target protease substrate. We designed PA-RNAP constructs containing linker peptide sequences known to be cleaved by tobacco etch virus (TEV) protease, HCV protease, or human rhinovirus-14 3C (HRV) protease. We assayed T7 RNAP activity using a luciferase reporter and observed that T7 lysozyme linked to T7 RNAP through at least 28 residues including the target protease substrate resulted in significant inhibition of RNAP activity (Figure 1B). To assay RNAP activation, we coexpressed each PA-RNAP variant from a plasmid (the complementary plasmid or CP, Figure 1C and Figure 3) together with each of the three proteases (expressed from the expression plasmid or EP, Figure 4) in E. coli cells that also harbored a plasmid encoding gIII and luciferase under control of the T7 promoter (the accessory plasmid or AP, Figure 1C and Figure 5).
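The linker screen can be pictured as a small combinatorial library. The sketch below enumerates symmetric flank-substrate-flank linkers around the TEV substrate ENLYFQS. Both the repeating G-S-A flank pattern and the symmetric design are hypothetical: the text specifies only that flanks of three to ten Gly, Ser, and Ala residues were used.

```python
from itertools import cycle, islice

# Hypothetical repeat unit; the text states only that flanks used Gly, Ser, and Ala.
FLANK_UNIT = "GSA"


def flank(length: int) -> str:
    """Gly/Ser/Ala flank of the given length built by repeating FLANK_UNIT."""
    return "".join(islice(cycle(FLANK_UNIT), length))


def linker_library(substrate: str, min_flank: int = 3, max_flank: int = 10) -> list[str]:
    """Symmetric flank + substrate + flank linkers for each flank length in range."""
    return [flank(n) + substrate + flank(n) for n in range(min_flank, max_flank + 1)]


if __name__ == "__main__":
    for linker in linker_library("ENLYFQS"):  # TEV protease consensus substrate
        print(f"{len(linker):2d}  {linker}")
```

Asymmetric flanks could be added by iterating over the two flank lengths independently. That would grow the library from 8 to 64 designs per substrate.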


Expression of a protease that is not known to cleave the target amino acid sequence in a coexpressed PA-RNAP did not result in enhanced gene expression as measured by luciferase activity (Figure 1D). In contrast, expression of a protease known to cleave the target sequence within the PA-RNAP resulted in an 18- to 49-fold increase in gene expression for all three cognate combinations of protease and substrate. These data indicate that PA-RNAPs are capable of transducing specific proteolytic cleavage activities into large changes in target gene expression.
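Fold activation here is taken as the ratio of mean luminescence with the cognate protease to mean luminescence with a non-cognate control. The choice of baseline is an assumption, because the text does not state which control the reported values use. The minimal sketch below uses hypothetical triplicate readings, not data from this work, and applies first-order error propagation for a ratio of independent means.

```python
from math import sqrt
from statistics import mean, stdev


def fold_activation(cognate: list[float], control: list[float]) -> tuple[float, float]:
    """Return (fold change, propagated standard deviation) of mean(cognate) / mean(control).

    Uses first-order propagation for a ratio of independent quantities:
    relative error = sqrt((sd_a / mean_a)**2 + (sd_b / mean_b)**2).
    """
    m_a, m_b = mean(cognate), mean(control)
    ratio = m_a / m_b
    rel_err = sqrt((stdev(cognate) / m_a) ** 2 + (stdev(control) / m_b) ** 2)
    return ratio, ratio * rel_err


# Hypothetical triplicate luminescence readings (arbitrary units).
cognate_rlu = [2.0e5, 2.2e5, 2.1e5]
control_rlu = [5.0e3, 5.5e3, 4.5e3]

if __name__ == "__main__":
    fold, sd = fold_activation(cognate_rlu, control_rlu)
    print(f"fold activation: {fold:.1f} +/- {sd:.1f}")
```

The Figure 1 error bars report the standard deviation of replicates. Whether the original analysis propagated error into the fold change in this way is not stated.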


Figure 1: Development of a system to link protease activity to gene expression. (a) Protease-activated RNA polymerase (PA-RNAP). T7 RNAP is fused to the natural inhibitor T7 lysozyme through a linker containing a protease target substrate sequence. While the linker is intact, the complex preferentially adopts the lysozyme-bound, RNAP-inactive state. Proteolysis of the target sequence favours dissociation of the complex, freeing active T7 RNAP to transcribe downstream of the T7 promoter. This study used an accessory plasmid (AP) in which the T7 promoter drives a tandem gIII-luciferase (lux) cassette. (b) Sequences of the protein linkers containing a target protease substrate used for each PA-RNAP, with T7 lysozyme residues in blue, protease substrates in red, T7 RNAP residues in green and linker regions in black. (c) Plasmids used for protease PACE. An AP that has gIII and luciferase (lux) under the control of the T7 promoter serves as the source of gIII in the cells. A complementary plasmid (CP) constitutively expresses a PA-RNAP variant with a protease target substrate sequence embedded in the linker. (d) PA-RNAP gene expression response in E. coli cells. Host cells were transformed with (i) an AP containing the T7 promoter driving gIII-lux; (ii) a CP that constitutively expresses a PA-RNAP including the TEV protease substrate, the HCV protease substrate, or the HRV protease substrate; and (iii) a plasmid that expresses TEV protease (orange bars), HCV protease (purple bars), or HRV protease (grey bars). Gene expression is activated only when the expressed protease cleaves the amino-acid sequence on the PA-RNAP sensor. The luminescence experiment was performed in triplicate with error bars indicating the standard deviation.


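[Figure 2 image: T7 lysozyme bound to T7 RNAP, with the 32 Å separation between the lysozyme C-terminus and the RNAP N-terminus marked]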

Figure 2: Crystal structure of T7 lysozyme bound to T7 RNAP. Generated from PDB-1ARO27. The crystal structure of T7 RNAP in complex with the transcriptional inhibitor T7 lysozyme reveals that the carboxy-terminus of T7 lysozyme is approximately 32 angstroms away from the amino-terminus of T7 RNAP. This proximity makes possible the generation of an auto-inhibited fusion protein linked through a small flexible polypeptide linker.


[Vector map: Complementary Plasmid (CP), 5628 bp; T7 lysozyme, linker, protease recognition sequence, linker, T7 RNAP]

Figure 3: Vector map of complementary plasmid (CP). This plasmid was used in HCV protease inhibitor resistance experiments. The complementary plasmid supplies constitutive expression of the protease-activated RNA polymerase (lysozyme-linker-T7 RNAP).


[Vector map: Expression Plasmid (EP), 3542 bp; HCV protease under the arabinose-inducible pBAD promoter]

Figure 4: Vector map of expression plasmid (EP). This plasmid was used for arabinose-inducible protease expression during luminescence assays as well as for the purification of recombinant HCV protease.


[Vector map: Accessory Plasmid (AP), 6533 bp; T7 promoter, rrnB T1 terminator, gene III, luxAB]

Figure 5: Vector map of accessory plasmid (AP). This plasmid was used in HCV protease inhibitor resistance experiments. The accessory plasmid supplied gene III and luxAB under control of the T7 promoter. The rrnB1 T1 terminator was placed downstream of the T7 promoter to lower the background transcription in the absence of protease that would activate the PA-RNAP.


[Vector map: Selection Phage (SP), 5892 bp; M13 phage genes with HCV protease in place of gene III, followed by the J23102 promoter]

Figure 6: Vector map of selection phage (SP). This construct was used in HCV protease inhibitor resistance experiments. In this selection phage, gene III has been replaced by HCV protease followed by a J23102 promoter to replace the portion of the gene III coding sequence that also contains the gene VI promoter.


Section 1.2: Linking protease activity to phage propagation

Next we sought to use PA-RNAPs to link the life cycle of M13 bacteriophage to protease activity. We generated selection phage (SP) in which gIII was replaced by a gene encoding TEV protease, HCV protease, or HRV protease (Figure 6). Without pIII, these phage are unable to propagate on wild-type E. coli cells. We engineered host E. coli cells containing two plasmids: (i) an AP that contains gIII and luciferase under the control of the T7 promoter, and (ii) a CP that constitutively expresses a PA-RNAP (Figure 1C). To confirm that the PA-RNAP sensor is cleaved as intended, we analyzed its processing by Western blot. We observed loss of the lysozyme-RNAP fusion and the appearance of a new protein corresponding to the size of T7 RNAP exclusively in the presence of protease phage that recognize the host-encoded PA-RNAP (Figure 7). To assay whether the host cells could support phage propagation in a protease-dependent manner, we performed activity-dependent plaque assays. We observed that plaque formation, a consequence of phage replication on solid media, occurred only with phage encoding a protease that can cleave the PA-RNAP within the host cells. Phage with mismatched protease/PA-RNAP combinations did not form plaques, indicating that phage encoding non-cognate proteases do not replicate, or replicate at a significantly reduced rate. Together, these observations establish that the PA-RNAP system is capable of transducing the activity of a phage-encoded protease into phage production.

[Figure 7 image: Western blot of cells carrying the HCV, HRV, or TEV PA-RNAP, each infected with no phage or with HCV, HRV, or TEV protease phage; bands marked PA-RNAP (118 kDa) and cleaved T7 RNAP (99 kDa)]

Figure 7: Western blot showing PA-RNAP cleavage. The PA-RNAP is cleaved only by the protease that is known to recognize the target sequence. Band sizes of ~120 kDa and ~100 kDa correspond to the full PA-RNAP construct and the cleaved RNAP, respectively.
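The band sizes can be sanity-checked with the common approximation of about 110 Da per residue. The sketch assumes lengths of 883 residues for T7 RNAP and 150 residues for T7 lysozyme. It also assumes a hypothetical 28-residue linker, the lower bound quoted in Section 1.1, and ignores any linker residues left on the cleaved RNAP.

```python
# Rule-of-thumb average mass of an amino acid residue, in daltons.
AVG_RESIDUE_DA = 110.0

# Assumed lengths (residues) used for this estimate.
T7_RNAP = 883
T7_LYSOZYME = 150
LINKER = 28  # hypothetical; the text reports linkers of at least 28 residues


def approx_kda(n_residues: int) -> float:
    """Approximate molecular weight in kDa from a residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    full = approx_kda(T7_LYSOZYME + LINKER + T7_RNAP)
    cleaved = approx_kda(T7_RNAP)
    print(f"intact PA-RNAP ~{full:.0f} kDa, cleaved T7 RNAP ~{cleaved:.0f} kDa")
```

Both estimates fall within about 2 kDa of the labeled bands (118 and 99 kDa), which is roughly the precision to expect from this approximation.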

We next tested whether the PA-RNAP-based selection supports the continuous propagation of phage encoding active proteases in the continuous liquid culture format required for PACE (Figure 8A). We maintained three host cell cultures, each harboring a CP expressing a PA-RNAP containing one of the three protease cleavage sites (TEV, HCV, or HRV protease substrates), using chemostats diluted with fresh growth media at a fixed rate28. Each of these host cell cultures continuously diluted lagoons seeded with various combinations of phage containing TEV, HCV, or HRV protease. Lagoons seeded with phage encoding cognate proteases that can cleave the PA-RNAP within the host cells propagated robustly (10^8 to 10^10 pfu/mL after 72 hours of continuous dilution at 1.0 lagoon volume per hour), while lagoons seeded with phage encoding proteases that do not match the PA-RNAP of incoming host cells washed out (<10^4 pfu/mL), demonstrating protease activity-dependent propagation in continuous liquid culture.
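Dilution alone explains how quickly non-propagating phage disappear. The minimal sketch below assumes a well-mixed lagoon and phage that do not replicate at all. It uses the 1.0 lagoon volume per hour dilution rate from the experiment described above; the starting titer of 10^8 pfu/mL is hypothetical.

```python
import math


def washout_titer(initial_pfu_per_ml: float, dilution_per_h: float, hours: float) -> float:
    """Titer of non-replicating phage lost only to continuous dilution: N0 * exp(-D t)."""
    return initial_pfu_per_ml * math.exp(-dilution_per_h * hours)


def hours_to_fall_below(threshold_pfu_per_ml: float, initial_pfu_per_ml: float,
                        dilution_per_h: float) -> float:
    """Time for a non-replicating population to fall from the initial titer to the threshold."""
    return math.log(initial_pfu_per_ml / threshold_pfu_per_ml) / dilution_per_h


DILUTION_PER_H = 1.0  # lagoon volumes per hour, as in the experiment described above
INITIAL_TITER = 1e8   # hypothetical starting titer, pfu/mL

if __name__ == "__main__":
    print(f"hours to drop below 1e4 pfu/mL: "
          f"{hours_to_fall_below(1e4, INITIAL_TITER, DILUTION_PER_H):.1f}")
    print(f"titer after 72 h: {washout_titer(INITIAL_TITER, DILUTION_PER_H, 72):.1e} pfu/mL")
```

In this model the titer falls by a factor of e each hour. A non-replicating phage population therefore drops about 10,000-fold in roughly 9 hours. Phage that retain some replication would wash out more slowly.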

To determine whether this system can selectively replicate phage carrying protease genes with a desired activity at the expense of phage encoding proteases that are unable to cleave the host-cell PA-RNAP, we performed protease phage enrichment experiments in a PACE format. We seeded a lagoon with a 1,000:1 ratio of TEV SP to HCV SP, then allowed the phage to propagate in the lagoon while it was continuously diluted with host cells containing a PA-RNAP with the HCV protease recognition site. We periodically sampled the waste line of the lagoon and PCR-amplified the region of the phage genome containing the protease gene. The TEV protease and HCV protease genes are readily distinguishable as PCR amplicons of distinct lengths. At the start of the experiment, HCV protease phage were virtually undetectable by PCR amplification of the starting population and gel electrophoresis, while TEV protease phage dominated the lagoon (Figure 8B). After just 24 h of continuous propagation on host cells containing the HCV PA-RNAP, the TEV protease SPs were undetectable, while the HCV protease SPs were strongly enriched (≥ 100,000-fold enrichment over 24 hours).
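The ≥ 100,000-fold figure follows from comparing the cognate:non-cognate ratio before and after selection. A minimal sketch, assuming (hypothetically) that an undetectable TEV band implies an HCV:TEV ratio of at least 100:1, i.e. that the minor amplicon drops below roughly 1% of the mixture when it disappears from the gel:

```python
def fold_enrichment(start_target: float, start_other: float,
                    end_target: float, end_other: float) -> float:
    """Fold change in the target:other phage ratio between two time points."""
    if min(start_target, start_other, end_target, end_other) <= 0:
        raise ValueError("all abundances must be positive")
    return (end_target / end_other) / (start_target / start_other)


if __name__ == "__main__":
    # Start: 1 HCV phage per 1,000 TEV phage. End: at least 100 HCV per TEV (assumed).
    print(f"{fold_enrichment(1, 1000, 100, 1):.0f}-fold")
```

A lower assumed detection limit would raise the lower bound proportionally; the bound is set by the assay's sensitivity, not by the selection itself.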

We repeated this experiment with a 1,000-fold excess of HCV protease phage over TEV protease phage using host cells containing the TEV protease PA-RNAP (Figure 8C), and a third time using a 1,000-fold excess of TEV protease phage over HRV protease phage and host cells containing the HRV protease PA-RNAP (Figure 8D). In all three enrichment experiments, continuous propagation rapidly and dramatically enriched phage encoding each cognate protease from a minute fraction of the starting phage mixture, while phage encoding non-cognate proteases washed out of the lagoon (Figure 8). Collectively, these results indicate that this protease PACE system successfully links specific protease activity to the phage life cycle in a continuous flow format and can strongly and rapidly enrich phage that encode proteases able to cleave a target polypeptide substrate.


Figure 8: PA-RNAPs link protease activity to phage propagation. (a) The protease PACE system. Fixed-volume vessels (lagoons) contain phage in which gIII is replaced with a gene encoding an evolving protease. The lagoon is fed with host cells that contain an AP with the T7 promoter driving gIII and a CP that expresses a PA-RNAP. Phage infect incoming cells and inject their genome containing a protease variant. Only if the protease variant activates the PA-RNAP by cleaving the linker encoding the target protease substrate is gIII expressed, allowing that SP to propagate. (b–d) Enrichment of active proteases from mixed populations using PACE. At time 0, a lagoon was seeded with a 1,000-fold excess of non-cognate protease-encoding phage over cognate protease-encoding phage. The lagoon was continuously diluted with host cells containing a PA-RNAP with either the HCV (b), TEV (c) or HRV (d) protease substrate. Lagoon samples were periodically analysed by PCR. In all three cases, phage encoding the cognate protease were rapidly enriched in the lagoon while phage encoding the non-cognate protease were depleted.

Section 1.3: Continuous evolution of resistance to HCV protease inhibitors

As an initial application of protease PACE, we continuously evolved HCV protease to rapidly assess the susceptibility of small-molecule protease inhibitors to drug resistance. Several HCV protease inhibitors are in late-stage clinical trials or are awaiting FDA approval29, 30. For some HCV protease inhibitor drug candidates, clinically isolated drug resistance mutations are known31. First, we tested whether small-molecule HCV protease inhibitors can modulate protease activity in the protease PACE system. We observed that incubating host cells with either danoprevir (IC50 = ~0.3 nM)32 or asunaprevir (IC50 = ~1.0 nM)33, two second-generation HCV protease inhibitors, inhibited in a dose-dependent manner the cellular gene expression arising from the activity of HCV protease on the HCV PA-RNAP (Figure 9). These observations suggest that protease inhibitors can create selection pressure during PACE favoring the evolution of protease mutants that retain their ability to cleave a cognate substrate despite the presence of the drug candidates.


Based on the relationship between protease inhibitor concentration and gene expression in our system (Figure 9) and initial trial PACE experiments, we selected 20 µM danoprevir as the final concentration to use in the culture media during attempts to continuously evolve drug-resistant HCV proteases. We inoculated two separate lagoons with HCV protease SP and propagated the phage on host cells containing the HCV protease PA-RNAP in the absence of any inhibitor for 6 h to allow mutations to accumulate in the HCV protease genes. Next, we added 20 µM danoprevir to the media feeding the host cell culture, and thereby to each of the two replicate lagoons. As a control, we propagated two replicate lagoons of HCV protease phage on HCV protease PA-RNAP host cells with no added protease inhibitor for the same time period. Throughout all of these experiments, we induced enhanced mutagenesis of the phage genome by activating a second-generation mutagenesis plasmid (MP) in the host cells with 0.5% arabinose (Figure 10).

Phage populations at 6 and 28 h from replicate lagoons were analyzed by high-throughput DNA sequencing. No mutations were substantially enriched in the control lagoons propagated in the absence of any drug candidate (Figure 11C). In contrast, several mutations rapidly evolved in both replicate lagoons in the presence of danoprevir, predominantly at position D168. By 28 h, lagoon 1 with danoprevir contained 38.8% D168E, 8.3% D168Y, 2.1% D168A, and 1.1% D168V, while lagoon 2 with danoprevir contained 40.3% D168E and 10.7% D168Y (Figure 11C). Other genetic differences between the SPs of these two replicate populations, such as R130C (5.1% in lagoon 1, undetectable in lagoon 2) and T72I (10.8% in lagoon 2, undetectable in lagoon 1), suggest that the observed protease variants did not arise from cross-contamination between lagoons. These findings reveal that the presence of danoprevir caused the population of continuously evolving proteases to rapidly acquire mutations at D168.

To assay whether the PACE-evolved mutations confer danoprevir resistance on HCV protease, we purified recombinant HCV protease variants containing either of the two most highly enriched mutations, D168E and D168Y. Each of these mutations increases the IC50 of danoprevir roughly 26- to 30-fold (wild-type HCV protease IC50 = 1.3 ± 0.1 nM; HCV protease D168E IC50 = 38.9 ± 2.4 nM; HCV protease D168Y IC50 = 34.4 ± 2.8 nM; IC50 ± standard deviation) (Figure 11D). Importantly, the D168E, D168A, and D168V mutations emerging from protease PACE have been previously identified as common drug-resistance mutations in HCV isolated from patients treated with danoprevir31, 34.


To test whether protease PACE in the presence of a different HCV protease inhibitor can also rapidly evolve drug-resistance mutations, we repeated PACE of HCV protease in the presence of asunaprevir, an HCV protease inhibitor in phase III clinical trials, instead of danoprevir. We selected 75 µM asunaprevir as the final target concentration in the culture media based on dose-dependent gene expression assays (Figure 9). To allow diversity to emerge in the protease population, we first propagated HCV protease phage for 24 h without any inhibitor. Next, to give the populations sufficient time to evolve mutations that confer drug resistance, we propagated them for 24 h with 10 µM asunaprevir. Finally, we raised the asunaprevir concentration to 75 µM for 27 h to enrich mutations that confer robust drug resistance. For comparison, HCV protease phage were also propagated for an identical amount of time without any added drug candidate. High-throughput DNA sequencing of phage populations at the end of the experiment revealed that mutations evolved to substantial levels in the asunaprevir-treated lagoons but not in the control samples (Figure 11C). Under this condition as well, mutations at position D168 were highly enriched. In the case of asunaprevir, however, the only substitution at this position to emerge at substantial levels from protease PACE was D168Y, in contrast with the evolution of both D168E and D168Y during protease PACE with danoprevir.

In vitro assays of HCV proteases containing either mutation offer an explanation for the strong apparent preference for D168Y over D168E within asunaprevir-treated lagoons. D168Y increases the IC50 of asunaprevir ~30-fold, while D168E increases it only ~8-fold (wild-type HCV protease IC50 = 6.9 ± 0.6 nM; HCV protease D168E IC50 = 53.5 ± 3.4 nM; HCV protease D168Y IC50 = 214.8 ± 31.9 nM; IC50 ± standard deviation) (Figure 11E). Mutations at position D168 have been previously identified in replicon-based asunaprevir resistance experiments35, and the specific D168Y mutation has been observed to arise in hepatitis C patients treated with asunaprevir36. Collectively, these results establish that protease PACE in the presence of protease inhibitor drug candidates can very rapidly (1-3 days) reveal clinically relevant mutants that confer strong resistance to the drug candidates, without requiring extensive laboratory or clinical experiments.
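The fold-changes quoted above are ratios of mutant to wild-type IC50 values. The sketch below recomputes them from the reported means and standard deviations and attaches an approximate uncertainty by first-order propagation of independent relative errors; that error model is an assumption for illustration, not necessarily how the original analysis treated uncertainty.

```python
from math import sqrt

# Reported IC50 values in nM as (mean, standard deviation).
IC50_NM = {
    "danoprevir": {"WT": (1.3, 0.1), "D168E": (38.9, 2.4), "D168Y": (34.4, 2.8)},
    "asunaprevir": {"WT": (6.9, 0.6), "D168E": (53.5, 3.4), "D168Y": (214.8, 31.9)},
}


def ic50_fold_change(mutant, wild_type):
    """Return (ratio, approximate SD) of mutant IC50 over wild-type IC50."""
    (m_mean, m_sd), (w_mean, w_sd) = mutant, wild_type
    ratio = m_mean / w_mean
    # Relative errors of independent quantities add in quadrature for a ratio.
    sd = ratio * sqrt((m_sd / m_mean) ** 2 + (w_sd / w_mean) ** 2)
    return ratio, sd


if __name__ == "__main__":
    for drug, table in IC50_NM.items():
        for variant in ("D168E", "D168Y"):
            ratio, sd = ic50_fold_change(table[variant], table["WT"])
            print(f"{drug:12s} {variant}: {ratio:5.1f} +/- {sd:.1f}-fold")
```

This gives roughly 30- and 26-fold for D168E and D168Y against danoprevir, but roughly 8- and 31-fold against asunaprevir, consistent with the strong selection for D168Y in the asunaprevir lagoons.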


Figure 9: HCV PA-RNAP response to protease inhibitors in E. coli cells. Host cells expressing the HCV PA-RNAP were incubated with the HCV protease inhibitors danoprevir (a) or asunaprevir (b) for 90 min, followed by inoculation with HCV protease-encoding phage. After 3 h, luminescence assays were used to quantify relative gene activation resulting from the PA-RNAP. Luminescence experiments were performed in triplicate with error bars depicting the standard deviation.


[Figure 10 image: vector map of mutagenesis plasmid pAB086k8 (5,183 bp), showing the CloDF13 origin, cat resistance gene, and pBAD-driven dnaQ926, dam, and seqA.]

Figure 10: Vector map of mutagenesis plasmid (MP). This plasmid was used in HCV protease inhibitor resistance experiments. This plasmid provides arabinose-inducible expression of mutator genes dnaQ926, dam, and seqA.


Figure 11: Continuous evolution of drug resistance in HCV protease. (a,b) PACE condition timeline for evolution in the presence of danoprevir (a) or asunaprevir (b). The blue arrows indicate arabinose-induced enhanced mutagenesis, and the red arrow shows the timing and dosing of HCV protease inhibitors. (c) High-throughput sequencing data from phage populations in replicate lagoons (L1 and L2) subjected to danoprevir treatment at 28 h, asunaprevir treatment at 75 h, and no drug at 72 h. All mutations with frequencies >1% above the allele-specific error rate are shown. (d) In vitro analysis of danoprevir inhibition of mutant HCV proteases that evolved during PACE. (e) In vitro analysis of asunaprevir inhibition of mutant HCV proteases that evolved during PACE. For (d,e), evolved HCV protease variants were expressed and purified, then assayed using an internally quenched fluorescent substrate (Anaspec). In vitro analyses were performed in triplicate with error bars representing the standard deviation.


Section 1.4: Continuous evolution of model proteases with altered specificity

Encouraged by these results, we next attempted to evolve proteases that cleave substrates containing single amino acid substitutions at critical positions in their substrate recognition sequences. In our first attempt at reprogramming substrate specificity, we sought to recapitulate the results of a previously published test case, in which a yeast-display screen was used to evolve TEV protease variants that cleave the single-mutant substrate ENLYFE/S, a substrate known not to be accepted by wild-type TEV protease22. For these studies and all subsequent studies involving specificity reprogramming, we used a single accessory plasmid (as opposed to the aforementioned CP + AP combination) to supply both the protease-activated RNAP and gIII under control of the T7 promoter (Figure 12). To reprogram protease substrate specificity, we used a three-phase mixing strategy: (i) 24 h of PACE on host cells expressing PA-RNAPs containing the wild-type substrate (ENLYFQ/S), allowing the protease population to diversify and increase in activity; (ii) 24 h on a 1:1 mixture of two host strains expressing a PA-RNAP containing either the wild-type (ENLYFQ/S) or the mutant (ENLYFE/S) substrate linker; and (iii) 24-48 h of PACE on host cells expressing only the PA-RNAP containing the mutant substrate linker (Figure 13A). This protocol yielded TEV protease variants containing mutations at residues D148 and N177, both of which had previously been reported as crucial determinants of TEV protease specificity at the modified P1 substrate position (Figure 13B). These TEV protease variants showed apparent activity on the ENLYFE/S substrate in luciferase assays and in vitro using purified proteins (Figure 13C and Figure 13D).
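The three-phase mixing protocol can be written as a schedule giving the fraction of incoming host cells that carry the mutant-substrate PA-RNAP. This is a sketch with hypothetical names, using the 96 h end point of Figure 13. The `permissive_fraction` helper adds a simplifying assumption (protease activity on each substrate scaled from 0 to 1) to show why the mixing phase eases the transition: a protease that still cleaves only the wild-type substrate can propagate on half of the incoming cells rather than on none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    start_h: float
    end_h: float
    mutant_fraction: float  # share of host cells expressing the mutant-substrate PA-RNAP


# TEV P1 reprogramming (ENLYFQ/S -> ENLYFE/S).
TEV_P1_SCHEDULE = (
    Phase(0, 24, 0.0),   # wild-type substrate only: diversify and increase activity
    Phase(24, 48, 0.5),  # 1:1 mix of wild-type and mutant-substrate host strains
    Phase(48, 96, 1.0),  # mutant substrate only
)


def mutant_fraction(schedule, t_h: float) -> float:
    """Fraction of incoming host cells carrying the mutant-substrate PA-RNAP at time t_h."""
    for phase in schedule:
        if phase.start_h <= t_h < phase.end_h:
            return phase.mutant_fraction
    raise ValueError(f"t = {t_h} h is outside the schedule")


def permissive_fraction(mutant_frac: float, activity_on_wt: float,
                        activity_on_mut: float) -> float:
    """Simplified share of incoming cells in which a protease can activate the PA-RNAP."""
    return (1 - mutant_frac) * activity_on_wt + mutant_frac * activity_on_mut


if __name__ == "__main__":
    for t in (12, 36, 60):
        f = mutant_fraction(TEV_P1_SCHEDULE, t)
        # A protease active only on the wild-type substrate:
        print(t, f, permissive_fraction(f, 1.0, 0.0))
```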

In a second preliminary study, we sought to evolve HRV protease to cleave a target substrate (LEVLFQ/Y) that is known not to be cleaved by wild-type HRV protease37. Using the aforementioned mixing strategy, PACE rapidly evolved HRV protease mutants that can cleave LEVLFQ/Y (wild-type substrate = LEVLFQ/G) (Figure 14A). Virtually all evolved clones contained either mutation T143A or T143P (Figure 14B). In the NMR structure of HRV protease bound to substrate, residue T143 is positioned directly next to the substrate P1’ residue38. We hypothesize that the T143A and T143P mutations alleviate a steric clash with the larger mutant P1’ residue (Tyr, as opposed to the wild-type Gly). The evolved HRV protease variants exhibit apparent proteolytic activity on both the wild-type and mutant target substrates in luciferase assays, whereas wild-type protease has no detectable activity on the mutant substrate (Figure 14C). These results collectively validate the ability of PACE to access protease mutants that cleave non-cognate substrates.
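The P1 and P1’ labels used above follow the standard Schechter-Berger nomenclature: residues N-terminal to the scissile bond are numbered P1, P2, and so on moving away from the bond, and residues C-terminal to it P1’, P2’, and so on. A small hypothetical helper makes explicit which position each single-substitution target changes, using the slash notation of the text to mark the scissile bond.

```python
from typing import Dict, List


def schechter_berger(substrate: str) -> Dict[str, str]:
    """Label residues of a substrate written with '/' at the scissile bond."""
    if substrate.count("/") != 1:
        raise ValueError("substrate must contain exactly one '/' at the scissile bond")
    n_side, c_side = substrate.split("/")
    # P1 is the residue immediately N-terminal to the bond, P1' immediately C-terminal.
    labels = {f"P{i}": aa for i, aa in enumerate(reversed(n_side), start=1)}
    labels.update({f"P{i}'": aa for i, aa in enumerate(c_side, start=1)})
    return labels


def substituted_positions(wild_type: str, mutant: str) -> List[str]:
    """Positions present in both substrates whose residues differ."""
    wt, mut = schechter_berger(wild_type), schechter_berger(mutant)
    return [pos for pos, aa in wt.items() if pos in mut and mut[pos] != aa]


if __name__ == "__main__":
    print(substituted_positions("ENLYFQ/S", "ENLYFE/S"))  # TEV target: ['P1']
    print(substituted_positions("LEVLFQ/G", "LEVLFQ/Y"))  # HRV target: ["P1'"]
```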


[Figure 12 image: vector map of accessory plasmid MSP955.]

Figure 12: Vector map of accessory plasmid (AP) used in protease specificity reprogramming experiments. A single accessory plasmid encodes constitutively expressed (proB promoter) PA-RNAP (Lysozyme-Linker- TEVcutsite-Linker-T7RNAP) as well as T7 promoter-controlled expression of gIII and the translationally coupled luciferase reporter (xluxAB). A lysozyme-dependent terminator (T1 terminator) is placed downstream of the T7 promoter to lower transcription of gIII-xluxAB in the absence of active protease. This plasmid encodes the low copy pSC101 origin of replication and is maintained with carbenicillin.


[Figure 13 image. (A) Timeline, 0-96 h: ENLYFQS; 50% ENLYFQS + 50% ENLYFES; ENLYFES. (B) Clone genotypes: L1A E107D/D148A/N177K; L1B, L1D, L1E N177K; L1C D148A/N177K; L1F Q73R/S135F/stop223; L1G, L1H S135F/stop223. (C) Fold change in luminescence vs. TEV protease genotype (wild type, N177K, N177K/D148A). (D) Cleavage of GENLYFQSA and GENLYFESA substrates by wild-type and N177K/D148A/E107D protease.]

Figure 13: PACE of TEV proteases with altered P1 specificity. (A) PACE experimental condition timeline. Time in hours is depicted above the arrow, while protease substrate sequences used within the PA-RNAP are depicted beneath the arrow. The mixing period is indicated as a 50:50 mixture of two host strains expressing PA-RNAPs containing either the wild-type or the mutant substrate linker. (B) Genotypes after 96 h of PACE. Each row represents the genotype of TEV protease sequenced from a single SP clone. Mutations at S135, D148, and N177 are strongly enriched. (C) Luminescence assays of PACE-evolved SP clones. SP encoding TEV proteases bearing both the N177K and D148A mutations exhibit enhanced apparent activity on the single-mutant ENLYFE/S substrate. (D) Protein cleavage assays of PACE-evolved TEV proteases. Approximately 1 µg of protease was incubated with 5 µg of a fusion protein in which MBP is linked to GST through a cleavable substrate linker containing the peptide ENLYFQ/S or ENLYFE/S. The PACE-evolved TEV protease variant cleaves both the wild-type and mutant substrates.


[Figure 14 image. (A) Timeline, 0-96 h: LEVLFQG; 50% LEVLFQG + 50% LEVLFQY; LEVLFQY. (B) Genotypes of clones L1A-L1H: T143P in L1A, L1D, L1E, L1G, and L1H; T143A in L1C and L1F; no mutation listed for L1B. (C) Fold change in luminescence vs. HRV protease genotype (WT, T143P, T143A) on LEVLFQGP (WT) and LEVLFQYP substrates.]

Figure 14: PACE of HRV proteases with altered P1’ specificity. (A) PACE experimental condition timeline. Time in hours is depicted above the arrow, while protease substrate sequences used within the PA-RNAP are depicted beneath the arrow. The mixing period is indicated as a 50:50 mixture of two host strains expressing PA-RNAPs containing either the wild-type or the mutant substrate linker. (B) Genotypes after 96 h of PACE. Each row represents the genotype of HRV protease sequenced from a single SP clone. Mutations at T143 are strongly enriched. (C) Luminescence assays of PACE-evolved SP clones. SP encoding HRV proteases bearing either the T143P or the T143A mutation exhibit enhanced apparent activity on the single-mutant LEVLFQ/Y substrate.


Section 1.5: Discussion

Previous efforts to use laboratory evolution to study HCV protease inhibitor resistance have relied on time- and labor-intensive approaches such as viral replication in mammalian cell culture or conventional protein evolution methods, which typically require months to complete39. By comparison, the continuous evolution of proteases can reveal key resistance mutations in as little as ~1 day of PACE. PACE is fast, can be multiplexed across many lagoons in parallel (each receiving a different drug candidate), and can be analyzed by high-throughput DNA sequencing of barcoded lagoon samples. Together, these features raise the possibility of screening early-stage hit or lead compounds for their vulnerability to the evolution of drug resistance before more resource-intensive optimization of in vivo properties or clinical trials take place. Rapid and cost-effective assessment of drug resistance susceptibility by PACE may enable more informed selection of promising early-stage drug candidates for further development. This technique could also be applied to quickly screen a drug candidate across many distinct genotypic variants of a protease target (such as the six major HCV genotypes) to reveal each target variant’s potential to evolve mutations that abrogate the effectiveness of the drug candidate. As HCV patient isolates and replicon assays have already demonstrated differing drug resistance profiles among HCV genotypes40, this capability could also be used to rapidly identify patient-specific drug treatments that are more likely to offer long-term therapeutic benefit to patients infected with specific HCV strains, even in the absence of previous data relating strain genotypes to drug effectiveness.

The development of protease PACE also expands the scope of PACE to diverse biochemical activities. Prior to the publication of these results, PACE studies had evolved only RNA polymerases, whose activities can be directly linked to changes in gIII expression. This study demonstrates how other types of enzymatic activity with no obvious direct connection to gene expression can nevertheless be evolved using PACE by establishing an indirect but robust linkage between the activity of interest and gIII expression. Since the publication of this work, the Liu Lab has developed and/or published PACE selection schemes for evolving aminoacyl-tRNA synthetases, high-affinity protein binding domains (Bt toxins, monobodies, etc.), DNA-binding domains (TALEs, Cas9, etc.), CRISPR endoribonucleases, and enzymatic pathways that biosynthesize polyhydroxyalkanoates.

The protease PACE system provides a strong foundation for the continuous evolution of proteases with reprogrammed specificities. Previous work on reprogramming the DNA substrate selectivity of T7 RNAP enzymes24, 28, 41, 42 demonstrated the ability of PACE to rapidly evolve enzymes that accept substrates very different from the native substrate. These reprogramming experiments relied on a “stepping-stone” strategy in which selection phage are transitioned through a series of intermediate substrates42, and they have been enhanced by the recent development of modulated selection stringency and negative selection during PACE28. In proof-of-principle experiments, we have shown that these strategies, coupled with protease PACE, also enable the continuous evolution of model proteases with altered specificity for substrates containing a single amino acid substitution. These studies establish a foundation for subsequent research programs in which proteases are evolved with the tailor-made ability to selectively cleave proteins implicated in human diseases.


Section 1.6: Methods

PA-RNAP gene expression response in vivo. All plasmids were constructed by Gibson Assembly 2x Master Mix (NEB); all PCR products were generated using Q5 Hot Start 2x Master Mix (NEB). E. coli S1030 cells28 were transformed by electroporation with three plasmids: (i) a complementary plasmid (CP) that constitutively expresses a PA-RNAP with one of the three protease cut sites (Figure 3), (ii) an accessory plasmid (AP, Figure 5) that encodes gIII-luciferase (translationally coupled) under control of the T7 promoter, and (iii) an arabinose-inducible expression plasmid for one of the three proteases (EP, Figure 4). The HRV protease gene was purchased as an IDT gBlock and cloned into the expression vector. The MBP-TEV fusion protein was amplified by PCR from the pRK793 plasmid43; the MBP fusion was necessary for expression and solubility. We used a constitutively active HCV protease construct that includes the NS4a peptide44. Cells were grown to saturation in 2xYT media in the presence of antibiotics and 1 mM glucose, then inoculated into 1 mL of fresh media containing 1 mM glucose and antibiotics in a 96-well culture plate. After 4.5 h, 150 µL of each culture was transferred to a black-wall, clear-bottom assay plate, and luminescence and OD600 measurements were taken using a Tecan Infinite Pro plate reader. Luminescence data were normalized to cell density by dividing by OD600.
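The normalization step divides each well's luminescence by its OD600; fold changes such as those plotted in Figures 13C and 14C can then be expressed relative to a reference condition. A minimal sketch, assuming replicate wells given as (luminescence, OD600) pairs and a hypothetical reference set (for example, wild-type protease on the same substrate); the exact reference used for the published plots is not specified here, and the numbers in the example are invented.

```python
from statistics import mean, stdev


def normalized(luminescence: float, od600: float) -> float:
    """Luminescence per unit cell density."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return luminescence / od600


def fold_change(samples, reference):
    """Mean and SD of OD600-normalized signal relative to the mean of a reference set.

    samples, reference: iterables of (luminescence, od600) replicate pairs;
    at least two sample replicates are needed for the standard deviation.
    """
    ref = mean(normalized(lum, od) for lum, od in reference)
    folds = [normalized(lum, od) / ref for lum, od in samples]
    return mean(folds), stdev(folds)


if __name__ == "__main__":
    evolved = [(2000.0, 0.5), (4000.0, 1.0), (1000.0, 0.25)]  # hypothetical wells
    wild_type = [(100.0, 0.5), (110.0, 0.5), (90.0, 0.5)]
    print(fold_change(evolved, wild_type))
```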

Western blot of PA-RNAP sensor activation. E. coli cells transformed with an AP and a CP were grown to log phase, then infected with a 10-fold excess of protease-encoding phage. After 4.5 h, the cells were harvested by centrifugation at 5,000 × g for 10 min and resuspended in LDS Sample Buffer (Life Technologies). Samples were heated to 95 °C for 5 min and vortexed to shear genomic DNA. A 4 µL aliquot of each sample was loaded onto a protein gel electrophoresis system (Bolt gel system, Life Technologies), and proteins were transferred to a PVDF membrane (iBlot 2 system, Life Technologies). The membrane was blocked with 5% BSA in TBST, then incubated overnight with the primary antibody (5% BSA, TBST, 1:5000 anti-T7 RNAP mouse monoclonal, Novagen #70566). The membrane was washed three times, incubated with the secondary antibody (5% BSA, TBST, 1:5000 donkey anti-mouse, IR-dye conjugate, LI-COR #926-32212) for 60 min, washed three times, then visualized on a LI-COR Odyssey at 800 nm. As seen in Figure 7, the PA-RNAP sensor is proteolyzed to a smaller band of the anticipated molecular weight only in the presence of a cognate protease that can cleave the peptide sequence in each PA-RNAP linker.


Protease activity-dependent plaque assays. Protease phage were cloned using Gibson assembly with the aforementioned expression plasmids as templates. E. coli S1030 cells were transformed by electroporation with an AP and a CP. After the transformed host cells were grown in 2xYT to OD600 ~1.0, 100 µL of cells were added to 50 µL of serial dilutions of protease-encoding phage. After one minute, 800 µL of top agar (7 g/L agar in 2xYT) was added, mixed, and transferred to quarter-plates containing bottom agar (15 g/L agar in 2xYT). After overnight incubation at 37 °C, the plates were examined for plaques, which represent zones of slowed growth and diminished turbidity due to phage propagation.
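Titers reported throughout (pfu mL-1) are back-calculated from plaque counts. A sketch of the standard calculation, assuming plaques are counted on a plate that received 50 µL of a single dilution, as in the protocol above; the plaque count and dilution in the example are hypothetical.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, plated_volume_ul: float = 50.0) -> float:
    """Titer of the undiluted phage stock.

    dilution: fraction of stock in the plated sample (1e-4 for a 10^4-fold dilution).
    """
    if plaques < 0 or dilution <= 0 or plated_volume_ul <= 0:
        raise ValueError("invalid plaque count, dilution, or volume")
    # plaques / (volume in mL * dilution), written to avoid dividing by a tiny float.
    return plaques * 1000.0 / (plated_volume_ul * dilution)


if __name__ == "__main__":
    # 42 plaques from 50 uL of a 1e-4 dilution
    print(f"{titer_pfu_per_ml(42, 1e-4):.2e} pfu/mL")
```

Counting the plate with a few dozen to a few hundred plaques keeps both Poisson noise and plaque overlap manageable.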

PACE propagations and enrichment experiments. E. coli S1030 cells were transformed by electroporation with an AP, a CP (one for each of the three PA-RNAPs), and a mutagenesis plasmid (MP, Figure 10) encoding arabinose-inducible expression of a dominant-negative mutator variant of dnaQ, wild-type dam, and wild-type seqA45-47.

Starter cultures were grown overnight in 2xYT supplemented with antibiotics and 1 mM glucose to prevent induction of mutagenesis prior to the PACE experiment. Host cell culture chemostats containing 80 mL of Davis rich media28 were inoculated with 2 mL of starter culture and grown at 37 °C with magnetic stir-bar agitation. At approximately OD600 1.0, fresh Davis rich media was pumped in at 80-100 mL h-1, with a chemostat waste needle set at 80 mL. This fixed dilution rate maintains the chemostat culture in late log phase growth, at which point it can be flowed into lagoons seeded with protease phage (initial titers were ~10^5 pfu mL-1). For these experiments, lagoon waste needles were set to maintain a lagoon volume of 15 mL, and host cell cultures were flowed in at 15-17 mL h-1.

Arabinose (10% w/v in water) was added directly to lagoons via syringe pump at 0.7 mL h-1 to induce mutagenesis.
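The flow rates above set both the lagoon dilution rate and the arabinose concentration the cells experience. A back-of-the-envelope sketch, assuming a well-mixed lagoon at steady state with a fixed 15 mL volume and ignoring any arabinose consumed by the cells: the dilution rate is total inflow divided by volume, and the steady-state arabinose level is the stock concentration weighted by its share of the inflow.

```python
def dilution_rate(total_inflow_ml_per_h: float, volume_ml: float) -> float:
    """Lagoon volumes exchanged per hour."""
    return total_inflow_ml_per_h / volume_ml


def steady_state_percent(stock_percent: float, additive_ml_per_h: float,
                         culture_ml_per_h: float) -> float:
    """Inflow-weighted concentration of an additive in a well-mixed, fixed-volume vessel."""
    return stock_percent * additive_ml_per_h / (additive_ml_per_h + culture_ml_per_h)


if __name__ == "__main__":
    lagoon_ml, arabinose_ml_per_h = 15.0, 0.7
    for culture_ml_per_h in (15.0, 17.0):
        total = culture_ml_per_h + arabinose_ml_per_h
        print(f"host flow {culture_ml_per_h} mL/h: "
              f"{dilution_rate(total, lagoon_ml):.2f} volumes/h, "
              f"{steady_state_percent(10.0, arabinose_ml_per_h, culture_ml_per_h):.2f}% arabinose")
```

With these numbers the lagoon turns over about 1.0-1.2 volumes per hour and arabinose settles near 0.40-0.45% w/v, of the same order as the 0.5% quoted in Section 1.3.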

Test propagations were conducted with cognate protease phage as well as non-cognate protease phage. Enrichment experiment lagoons were seeded with a 1,000-fold excess of non-cognate protease phage. Lagoon samples were sterile-filtered at least every 24 h, and titers were assessed by plaque assay. Plaque assays were performed with S1030 carrying pJC175e, a plasmid that supplies gIII under control of the phage-shock promoter28. Mock selections were monitored by PCR of the protease genes with forward primer BCD582 (5'-TGTTTTAGTGTATTCTTTCGCCTCTTTCGTT-3') and reverse primer BCD578 (5'-CCCACAAGAATTGAGTTAAGCCCAATAATAAGAGC-3'), using filtered samples as templates. The distinct sizes of amplicons containing protease genes enabled evaluation of the relative abundance of cognate and non-cognate protease-encoding phage.


Inhibition of PA-RNAP response in host E. coli cells. Host cells were prepared by electroporation with an AP and the CP encoding the HCV-site PA-RNAP. We prepared 2xYT media containing serial dilutions of inhibitors (danoprevir and asunaprevir, MedChemExpress) from stock solutions in DMSO and inoculated it with a saturated starter culture of host cells. Cultures (150 µL) in a 96-well assay plate were incubated at 37 °C for 1.5 h to allow uptake of inhibitors, then infected with ~10 µL HCV protease phage (multiplicity of infection ~10). After 3 h of incubation at 37 °C, the luminescence of each culture was measured on a Tecan Infinite Pro plate reader and normalized to OD600. In the absence of inhibitor, the phage-encoded protease activates the PA-RNAP, leading to robust production of luciferase. Dose responses relative to control cells without drug were measured in triplicate.
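A multiplicity of infection of ~10 corresponds to a phage volume that depends on cell density and phage titer. A sketch, assuming the common rule of thumb that an OD600 of 1.0 corresponds to roughly 8 × 10^8 E. coli cells per mL (an approximation that varies with strain and spectrophotometer); the OD600 and titer in the example are hypothetical.

```python
CELLS_PER_ML_PER_OD600 = 8e8  # rule-of-thumb conversion; calibrate for your strain/instrument


def phage_volume_ul(moi: float, culture_volume_ul: float, od600: float,
                    titer_pfu_per_ml: float,
                    cells_per_ml_per_od: float = CELLS_PER_ML_PER_OD600) -> float:
    """Volume of phage stock (uL) needed to reach a target multiplicity of infection."""
    cells = od600 * cells_per_ml_per_od * culture_volume_ul / 1000.0
    pfu_needed = moi * cells
    return pfu_needed / titer_pfu_per_ml * 1000.0


if __name__ == "__main__":
    # 150 uL culture at OD600 0.5, phage stock at 6e10 pfu/mL, target MOI 10
    print(f"{phage_volume_ul(10, 150, 0.5, 6e10):.1f} uL")
```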

Evolution of drug resistance in HCV protease using PACE. Host cells were the same as those in the HCV test propagation and enrichment experiments. Host cell culture chemostats containing 40 mL of Davis rich media were inoculated with 2 mL of starter culture and grown at 37 °C with magnetic stir-bar agitation. At approximately OD600 1.0, fresh Davis rich media was pumped in at 50 mL h-1, with a chemostat waste needle set at 40 mL. This adjustment was made to provide enough cell culture to feed two lagoons while conserving media containing small-molecule inhibitors. Lagoons were seeded with HCV protease phage and run in duplicate. As before, lagoon waste needles were set to maintain a lagoon volume of 15 mL, and host cell cultures were flowed in at 15-17 mL h-1.

Arabinose (10% w/v in water) was added directly to lagoons via syringe pump at 0.7 mL h-1 to induce mutagenesis.

After 6 h of propagation without any inhibitor, a filtered lagoon sample was taken, and danoprevir was added directly to the chemostat media at 20 µM with 2.5% DMSO to enhance solubility. A final time point was taken after 22 additional hours, and titers were measured by plaque assay on strain S1030 carrying pJC175e.

For the asunaprevir experiment, samples were taken every 12 h. After 24 h of propagation with no inhibitor, asunaprevir was added directly to the chemostat media at 10 µM with 2.5% DMSO. After an additional 24 h, the asunaprevir dose was increased to 75 µM with 5% DMSO. Titers were measured by plaque assay on strain S1030 carrying pJC175e.


High-throughput sequencing of evolved populations. S1030 cells carrying pJC175e were grown to saturation and used to inoculate fresh media. Host cells were infected with phage samples from the above PACE experiments and incubated for 5 h at 37 °C. DNA was extracted from infected cells using miniprep kits (Epoch Life Science) to yield concentrated phage template DNA. PCR reactions were performed using Q5 Hot Start 2x Master Mix (NEB) with a set of tiled primers. The product of the first PCR was diluted ten-fold, and 1 µL served as the template for a second PCR, which added Illumina adapters and barcodes. PCR products were purified from agarose gel (Qiagen) and quantified using the Quant-iT PicoGreen assay (Invitrogen). Samples were normalized and pooled to create a sequencing library at approximately 4 nM. The library was quantified by qPCR (KapaBiosystems) and sequenced on an Illumina MiSeq using the MiSeq Reagent Kit v3 and the 2x300 paired-end protocol. A single 2x300 read pair (600 bp combined) is sufficient to cover the entire HCV protease gene.
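Normalizing barcoded amplicons before pooling requires converting PicoGreen mass concentrations to molarity. A sketch using the standard approximation of ~660 g mol-1 per double-stranded base pair; the amplicon length (including adapters), sample concentrations, and pool volume in the example are hypothetical, since the text does not give them.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def dsdna_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of a dsDNA fragment from its mass concentration."""
    # ng/uL equals mg/L; dividing by g/mol gives mM, and 1 mM = 1e6 nM.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def equimolar_pool_volumes(concs_ng_per_ul, length_bp: int, final_nM: float,
                           final_volume_ul: float):
    """Volume (uL) of each sample so that all contribute equally to a pool at final_nM."""
    per_sample_nM = final_nM / len(concs_ng_per_ul)
    return [per_sample_nM * final_volume_ul / dsdna_nM(c, length_bp)
            for c in concs_ng_per_ul]


if __name__ == "__main__":
    concs = [10.0, 20.0, 5.0]  # hypothetical PicoGreen readings, ng/uL
    vols = equimolar_pool_volumes(concs, 700, final_nM=4.0, final_volume_ul=50.0)
    print([round(v, 2) for v in vols], "uL; make up the remainder with buffer")
```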

Data were analyzed in MATLAB using custom scripts. FASTQ files were automatically generated by the Illumina MiSeq; these files were already binned by sample barcode and ready for transfer to a desktop computer. Each read was aligned to the wild-type HCV protease gene in the expected orientation using the Smith-Waterman algorithm. Base calls with Q-scores below a threshold of 31 were converted to ambiguous bases, and the resulting ambiguous codons were converted into a series of three dashes for computationally efficient translation. Ambiguous codons were translated into X’s, which were ignored when tabulating allele counts into a matrix. The script automatically cycled through each FASTQ file and saved the resulting allele count matrix in a separate subdirectory. At this stage, matrices for paired-end reads were added together and normalized to yield allele frequencies for each sample.
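The masking, translation, and tallying steps can be sketched in a few lines. This is a hypothetical Python illustration, not the original MATLAB scripts: it assumes reads are already aligned and trimmed to start at the first codon of the gene, that quality strings use the Phred+33 encoding of current Illumina FASTQ files, and it translates masked codons directly to X rather than passing through an intermediate dash representation.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mask_low_quality(seq: str, qual: str, min_q: int = 31, offset: int = 33) -> str:
    """Replace base calls with Phred quality below min_q by 'N'."""
    return "".join(b if ord(q) - offset >= min_q else "N" for b, q in zip(seq, qual))


def translate(seq: str) -> str:
    """Translate in frame; any codon containing a non-ACGT character becomes 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def tally_alleles(proteins):
    """Per-position amino acid counts across reads, ignoring ambiguous 'X' calls."""
    counts = []
    for protein in proteins:
        for pos, aa in enumerate(protein):
            if pos == len(counts):
                counts.append(Counter())
            if aa != "X":
                counts[pos][aa] += 1
    return counts


if __name__ == "__main__":
    read = mask_low_quality("ATGGAC", "IIIII#")  # 'I' = Q40, '#' = Q2
    print(translate(read))  # MX
```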

We relied on a wild-type control sample to assess PCR and sequencing bias. For this sample, we calculated the frequency of non-wild-type alleles at each position to yield the locus-specific error rate. We added 0.01 (1%) to the locus-specific error rate to yield our variant call threshold. The allele frequency matrix for each sample was then scanned for mutant alleles above the variant call threshold.
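The variant-calling rule can be sketched as below. The sketch assumes allele frequencies are stored as one dictionary per codon position (a hypothetical layout, not necessarily the matrix format of the original scripts) and treats "above the threshold" as strictly greater than it.

```python
def error_rates(wt_freqs, wt_seq):
    """Locus-specific error rate: total frequency of non-wild-type alleles in the WT control."""
    return [sum(f for aa, f in pos.items() if aa != wt)
            for pos, wt in zip(wt_freqs, wt_seq)]


def call_variants(sample_freqs, wt_freqs, wt_seq, margin=0.01):
    """Return (1-based position, allele, frequency) for mutant alleles above error rate + margin."""
    thresholds = [e + margin for e in error_rates(wt_freqs, wt_seq)]
    calls = []
    for i, (pos, wt, thr) in enumerate(zip(sample_freqs, wt_seq, thresholds), start=1):
        for aa, f in sorted(pos.items()):
            if aa != wt and f > thr:
                calls.append((i, aa, f))
    return calls


# Hypothetical three-residue example
wt_seq = "MAS"
wt_freqs = [{"M": 1.0}, {"A": 0.995, "V": 0.005}, {"S": 1.0}]
sample = [{"M": 0.99, "L": 0.01}, {"A": 0.7, "V": 0.3}, {"S": 0.992, "T": 0.008}]
print(call_variants(sample, wt_freqs, wt_seq))
```

In this toy case, V at position 2 is called because it exceeds that locus's threshold (0.005 + 0.01), whereas L at position 1 sits exactly at its threshold and T at position 3 falls below it.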

Purification and in vitro assays of evolved HCV variants. HCV protease variants were subcloned by Gibson assembly from the phage genome into the previously mentioned EP. EPs were transformed into NEB BL21 (DE3) chemically competent cells. Starter cultures were grown to saturation, and 2 mL was used to inoculate 500 mL LB. At OD600 = 0.6, cultures were transferred to 20 °C and induced with 0.5% arabinose for 6 h. Cells were harvested by centrifugation at 5,000 g for 10 min and resuspended in lysis/binding buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 10% glycerol, 5 mM imidazole). Cells were lysed by sonication for a total of 2 min and then centrifuged for 20 min at 18,000 g to clarify the lysate. The supernatant was passed through 0.2 mL HisPur nickel resin spin columns (Pierce-Thermo) that had been equilibrated with binding buffer. The resin was washed with 4 column volumes of wash buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 10% glycerol, 20 mM imidazole). HCV protease was eluted in 4 column volumes of 50 mM Tris-HCl pH 8.0, 500 mM NaCl, 10% glycerol, 200 mM imidazole. Samples were further purified by size-exclusion chromatography on a Superdex 75 10/300 GL column (GE Healthcare) in 50 mM Tris-HCl pH 8.0, 100 mM NaCl, 10% glycerol, 1 mM DTT. Protein concentrations were determined by absorbance at 280 nm on a NanoDrop spectrophotometer and calculated using an extinction coefficient of 19,000 M-1 cm-1 and a molecular weight of 23 kDa.
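The concentration calculation follows the Beer-Lambert law, c = A280 / (ε·l). A minimal sketch using the extinction coefficient and molecular weight given above, assuming absorbance values already normalized to a 1 cm path length:

```python
EXT_COEFF = 19_000.0     # M^-1 cm^-1, from the text
MW_G_PER_MOL = 23_000.0  # 23 kDa, from the text


def protein_concentration(a280, path_cm=1.0, eps=EXT_COEFF, mw=MW_G_PER_MOL):
    """Beer-Lambert: c = A / (eps * l). Returns (micromolar, mg/mL)."""
    molar = a280 / (eps * path_cm)  # mol/L
    return molar * 1e6, molar * mw  # uM, and g/L (numerically equal to mg/mL)


uM, mg_per_ml = protein_concentration(0.95)
print(f"{uM:.1f} uM, {mg_per_ml:.2f} mg/mL")
```

For instance, an A280 of 0.95 corresponds to 50 µM, or 1.15 mg/mL, for this protein.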

In vitro assays were performed using the commercial HCV RET Substrate 1 (Anaspec), an internally quenched probe that fluoresces upon proteolytic cleavage, according to the manufacturer’s instructions. Protease and inhibitors were incubated in assay buffer at room temperature for 5 min prior to addition of substrate. Fluorescence was measured every 30 s for 20 min on a Tecan Infinite Pro plate reader (excitation/emission = 355 nm/495 nm).

Assays were performed at 30 °C with 40 nM protease, 7.5 µM substrate, and varying concentrations of inhibitor in a final volume of 100 µL per well in black-walled, clear-bottom assay plates. The assay buffer contained 50 mM Tris-HCl pH 8.0, 100 mM NaCl, 20% glycerol, and 5 mM DTT. Assays were performed in triplicate, and initial reaction velocities were calculated and normalized to controls without inhibitor. The data were fit to the Hill equation using Igor Pro with the base and max parameters fixed at one and zero, respectively. The resulting fits yielded IC50 values and the standard deviations of the estimates.
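Assuming Igor Pro's Hill form y = base + (max - base)/(1 + (x1/2/x)^rate), fixing base at 1 and max at 0 reduces the model to y = 1/(1 + (x/IC50)^n), where y is the activity remaining relative to the no-inhibitor control. The sketch below estimates IC50 and n from a linearized Hill plot. This is a simpler substitute for, not a reproduction of, the nonlinear least-squares fit performed in Igor Pro, and it weights noisy points differently; the data in the example are hypothetical and noise-free.

```python
import math


def hill_fraction(x: float, ic50: float, n: float) -> float:
    """Hill form with base = 1 and max = 0: fractional activity remaining at inhibitor x."""
    return 1.0 / (1.0 + (x / ic50) ** n)


def fit_ic50_linearized(conc, frac_activity):
    """Estimate (IC50, n) by least squares on the linearized Hill plot.

    From y = 1 / (1 + (x/IC50)^n):  ln(1/y - 1) = n*ln(x) - n*ln(IC50).
    Only points with x > 0 and 0 < y < 1 can be transformed.
    """
    pts = [(math.log(x), math.log(1.0 / y - 1.0))
           for x, y in zip(conc, frac_activity) if x > 0 and 0 < y < 1]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    n = sxy / sxx              # slope = Hill coefficient
    intercept = my - n * mx    # intercept = -n * ln(IC50)
    return math.exp(-intercept / n), n


# Hypothetical noise-free data: IC50 = 50 nM, n = 1.2
conc_nM = [1, 3, 10, 30, 100, 300, 1000]
activity = [hill_fraction(c, 50.0, 1.2) for c in conc_nM]
print(fit_ic50_linearized(conc_nM, activity))
```

With real triplicate data, a nonlinear fit on the untransformed velocities (as done here in Igor Pro) is generally preferable, because the log transform inflates the influence of points near 0 or 1.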


Chapter 2: Continuous Evolution of a Model Protease to Cleave a Disease-Associated Human Protein

Parts adapted from Packer, Rees, and Liu (2017).


Section 2.1: Choice of target substrate and protease

The biochemistry, substrate specificity48, and structure49 of TEV protease have been extensively studied, making it an ideal target for directed evolution. Our initial search for TEV protease target substrates was designed to find substrates similar in sequence to those that wild-type TEV protease can cleave. We populated a matrix with activity scores for each of the 20 possible amino acids at each of the seven positions within the TEV recognition motif. These scores roughly correspond to the published in vitro activity levels of TEV protease on mutant peptides48. This “specificity” matrix (Table 1) was then used to rank all possible heptapeptides within all human extracellular proteins. From this ranking, we manually curated a handful of disease-associated proteins in which the target peptide sequences were predicted to be solvent-exposed based on their crystal structures (Table 2). Target peptides HFPYSQY from CCR5 (an HIV co-receptor whose degradation could provide immunity) and HSSYRQR from PDL1 (an immunosuppressive protein often overexpressed in cancers, whose degradation could prompt an anti-cancer immune response) share a common His substitution at P6.

Amino acid  P6    P5   P4    P3    P2    P1    P1'
A           0     0    0.1   0     0     0     0.5
R           0     0    0     0     0     0     0
N           0     0    0     0     0     0     1
D           0     0    0.1   0     0     0     0
C           -1    -1   -1    -1    -1    -1    1
Q           0     0    0     0     0     1     0
E           1     0    0     0     0     0.5   0
G           0     0    0.1   0     0     0     1
H           0.5   0    0.5   0     0     0.5   0.25
I           0     0    1     0     0     0     1
L           0     0    1     0     0.1   0     0
K           0     0    0     0     0     0     0
M           0     0    0     0     0     0     0.25
F           0     0    0     0     1     0     0
P           -0.5  0    -0.5  -0.5  -0.5  -0.5  -0.5
S           0     0    0.1   0     0.1   0     1
T           0     0    0.5   0     0.1   0     0
W           0     0    0     0     0     0     0
Y           0     0    0     1     0     0     0
V           0     0    0.5   0.25  0     0     0

Table 1: Target peptide scoring matrix based upon wild-type TEV specificity. We created a subjective rating matrix based upon our knowledge of wild-type TEV protease substrate specificity as assessed by published in vitro peptide cleavage assays. Key features include high ratings for consensus residues ENLYFQS as well as substitutions with high biochemical activity. We also introduced penalties for Cys residues due to disulfide formation in mammalian target proteins and for Pro due to its unique structural properties.


Position    P6 P5 P4 P3 P2 P1 P1'
TEV site    E  X  L  Y  F  Q  S
CCR5        H  F  P  Y  S  Q  Y
PDL1        H  S  S  Y  R  Q  R
TNFα        L  G  G  V  F  Q  L

Black = Matches consensus; Red = Requires evolution

Table 2: Protease target substrates identified by specificity scoring. Target substrates were identified from the human extracellular proteome based upon ratings calculated using the above scoring matrix (Table 1). These three substrates were manually curated based upon the disease relevance of the target protein and the solvent accessibility of the target peptide.

For this reason, we introduced the P6 His substitution first; at the time, this substitution had no precedent in the literature. Using the aforementioned mixing strategy, we transitioned SP populations from the wild-type ENLYFQS substrate to the single mutant HNLYFQS substrate. In this experiment, we enriched for variants containing mutation N171D or N176T (Table 3). After we performed these experiments, it was independently reported that mutations N171D and N176T promote TEV protease tolerance of uncharged P6 residues such as threonine and proline18.

Lagoon L1
A  N176S
B  Q226Stop
C  S135F
D  (no mutations listed)
E  S135F
F  (no mutations listed)
G  (no mutations listed)
H  S135F
Lagoon L2
A  D90G N185S
B  N176T
C  N171D N177Y
D  N171D N177Y
F  (no mutations listed)
G  D136E N176D

Table 3: TEV protease mutations after PACE on HNLYFQS substrate. This table contains the clonal sequencing data from the end of the PACE experiment in which the SP population was transitioned from hosts expressing the ENLYFQS substrate to hosts expressing the HNLYFQS substrate. Each row corresponds to a single clonal sequence of TEV protease from the SP.

We then attempted mixing experiments to access double mutant substrates (HNPYFQS, HNSYFQS, HNLYRQS, HNLYFQR) that would bring our protease populations closer to activity on the evolutionary target peptides HFPYSQY and HSSYRQR. These PACE attempts resulted in phage washout, indicating that these evolutionary challenges were too difficult. Because we were concerned that two unprecedented substitutions within the substrate might have a synergistic effect on activity, we sought to decouple the mutations and access the corresponding single mutant substrates (ENPYFQS, ENSYFQS, ENLYFQR). Mixing experiments attempting to evolve recognition of these three single mutant substrates ended either in extremely low SP titers with no enriched mutations or in complete washout. For the P4 Pro and P4 Ser substitutions, we also attempted NNK site-saturation mutagenesis of TEV protease residues 139, 169, 178, 214, and 216. This library washed out of the lagoons and did not yield TEV variants with enhanced activity on either mutant substrate. Although no broad statement about evolutionary potential can be made, it appears to be far more challenging to use protease PACE to alter TEV protease specificity at the P4 position than at the P1 and P6 positions (which we successfully achieved in Sections 1.4 and 2.1, respectively).

Using wild-type protease specificity to identify target substrates may fail to identify targets with high evolutionary potential. Although wild-type TEV protease is extremely specific for P6 Glu and P1 Gln, our studies and those of others show that it is relatively easy to evolve TEV variants that recognize substrates mutated at these positions. Because specificity scoring penalizes such substrates despite their directed evolution precedent18, 19, 21, 22, we instead created an “evolvability” scoring matrix (Table 4) that assigns an evolution difficulty score to each of the 20 possible amino acids at each of the seven positions recognized by TEV protease. We used this matrix to rank all possible heptapeptides within all human extracellular proteins and created a new short list of candidate target substrates (Table 5).

Amino acid  P6   P5   P4   P3   P2   P1   P1'
A           -1   0    1    0    2    -1   0
R           0    0    0    -1   0    0    0
N           0    0    0    0    0    0    0
D           0    0    1    0    0    0    0
C           -1   -1   -1   -1   -1   -1   -1
Q           0    0    0    0    0    5    0
E           3    0    0    0    0    4    0
G           -1   0    1    0    0    -1   0
H           3    0    1    0    0    4    0
I           -1   0    1    0    1    -1   0
L           -1   0    1    0    1    -1   0
K           0    0    0    1    0    0    0
M           0    0    0    0    0    0    0
F           -1   0    0    0    3    -1   0
P           -1   0    -1   -1   -1   -1   -1
S           0    0    1    0    1    0    0
T           0    0    1    0    1    0    0
W           0    0    0    0    0    -1   0
Y           0    0    0    4    0    0    0
V           -1   0    1    1    1    -1   0

Table 4: Target peptide scoring matrix based upon known evolutionary potential. We created a subjective rating matrix based upon our knowledge of TEV protease substrate specificity and evolution of TEV proteases that accept single substrate changes. Key features include high ratings for consensus residues ENLYFQS as well as substitutions with known evolutionary solutions such as P6 His, P1 His, and P1 Glu. We also introduced penalties for Cys residues due to disulfide formation in mammalian target proteins and for Pro due to its unique structural properties.
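The ranking step can be sketched as a sliding-window sum over the Table 4 matrix. This is a minimal illustration with hypothetical function names; the toy sequence used in the check is not a real human protein, and the subsequent filtering for solvent accessibility and disease relevance was done manually rather than in code.

```python
# Table 4 "evolvability" scores; columns are P6, P5, P4, P3, P2, P1, P1'
TABLE4 = {
    "A": (-1, 0, 1, 0, 2, -1, 0),
    "R": (0, 0, 0, -1, 0, 0, 0),
    "N": (0, 0, 0, 0, 0, 0, 0),
    "D": (0, 0, 1, 0, 0, 0, 0),
    "C": (-1, -1, -1, -1, -1, -1, -1),
    "Q": (0, 0, 0, 0, 0, 5, 0),
    "E": (3, 0, 0, 0, 0, 4, 0),
    "G": (-1, 0, 1, 0, 0, -1, 0),
    "H": (3, 0, 1, 0, 0, 4, 0),
    "I": (-1, 0, 1, 0, 1, -1, 0),
    "L": (-1, 0, 1, 0, 1, -1, 0),
    "K": (0, 0, 0, 1, 0, 0, 0),
    "M": (0, 0, 0, 0, 0, 0, 0),
    "F": (-1, 0, 0, 0, 3, -1, 0),
    "P": (-1, 0, -1, -1, -1, -1, -1),
    "S": (0, 0, 1, 0, 1, 0, 0),
    "T": (0, 0, 1, 0, 1, 0, 0),
    "W": (0, 0, 0, 0, 0, -1, 0),
    "Y": (0, 0, 0, 4, 0, 0, 0),
    "V": (-1, 0, 1, 1, 1, -1, 0),
}


def score_heptapeptide(pep: str, matrix=TABLE4) -> int:
    """Sum the position-specific scores of a 7-residue peptide (P6 ... P1')."""
    if len(pep) != 7:
        raise ValueError("expected a heptapeptide")
    return sum(matrix[aa][i] for i, aa in enumerate(pep))


def rank_windows(protein_seq: str, matrix=TABLE4, top: int = 5):
    """Score every 7-residue window; return (score, 0-based start, peptide), best first."""
    hits = []
    for start in range(len(protein_seq) - 6):
        pep = protein_seq[start:start + 7]
        if all(aa in matrix for aa in pep):  # skip windows with non-standard residues
            hits.append((score_heptapeptide(pep, matrix), start, pep))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top]
```

Under this matrix, the consensus ENLYFQS scores 16 and the IL-23 peptide HPLVGHM scores 9. Running `rank_windows` over every extracellular protein and merging the hits would give a proteome-wide ranking of the kind used to build Table 5.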


Position    P6 P5 P4 P3 P2 P1 P1'
TEV site    E  X  L  Y  F  Q  S
IL2RA       H  F  V  V  G  Q  M
IL23A       H  P  L  V  G  H  M
SELE        H  L  V  A  I  Q  N
SAA1/2      S  D  K  Y  F  H  A

Black = Matches consensus; Green = Previously evolved; Yellow = Accepted; Red = Requires evolution

Table 5: Target substrates identified by TEV protease evolutionary potential. Target substrates were identified from the human extracellular proteome based upon ratings calculated using the above scoring matrix (Table 4). These four substrates were manually curated based upon the disease relevance of the target protein and the solvent accessibility of the target peptide.

The resulting candidate target sequences included HPLVGHM, a peptide found in human IL-23. IL-23 is a pro-inflammatory cytokine secreted by macrophages and dendritic cells in response to pathogens and tissue damage, ultimately promoting an innate immune response at the site of injury or infection. This immune response is mediated by IL-23-dependent stabilization of Th17 cells, a class of T helper cells that produce pro-inflammatory cytokines IL-17, IL-6, and TNFα50. Hyperactivity of this pathway can lead to a variety of autoimmune disorders including psoriasis and rheumatoid arthritis51. Monoclonal antibodies that neutralize IL-23 are FDA-approved drugs for the treatment of psoriasis and show promise in late-stage clinical trials for other autoimmune disorders52. These studies suggest that a protease that catalyzes IL-23 degradation may have anti-inflammatory activity.

The target peptide HPLVGHM differs from the TEV consensus substrate sequence, ENLYFQS, at six of seven positions. Two of these substitutions are not predicted to substantially affect TEV protease activity because of its low specificity at positions P5 and P1’, while the other four occur at positions known to be crucial specificity determinants of wild-type TEV protease (P6 Glu, P3 Tyr, P2 Phe, and P1 Gln). Indeed, substitution of TEV substrate P2 Phe or P1 Gln with the corresponding IL-23 substrate residue (P2 Gly or P1 His) has been shown to reduce TEV protease activity by more than an order of magnitude in each case48. Encouragingly, we had previously introduced the P6 His substitution, and other researchers have used site-saturation mutagenesis and an elegant yeast display screen to identify TEV mutants that accept P1 His in place of P1 Gln22.


Section 2.2: Continuous Evolution of TEV Variants that Cleave the IL-23 Target Peptide

We verified that SP expressing wild-type TEV protease propagate robustly on host cells expressing a PA-RNAP containing the TEV consensus substrate, ENLYFQS, in the linker connecting T7 RNAP and T7 lysozyme. In contrast, replacing the PA-RNAP linker with the target peptide HPLVGHM results in a failure of phage to propagate and in rapid phage washout, consistent with the inability of wild-type TEV protease, or of TEV variants containing a handful of immediately accessible mutations, to cleave the target IL-23 peptide.

Given the lack of previous success evolving proteases with specificity changes at more than one substrate residue, we anticipated that successful evolution of TEV protease variants that cleave the IL-23 target would require multiple evolutionary stepping-stones28, 42, 53, 54 to guide evolving gene populations through points in the fitness landscape that bring them successively closer to activity on the final target substrate. We designed three evolutionary trajectories such that substrate changes known to strongly disrupt the activity of wild-type TEV protease, including P6 His, P2 Gly, and P1 His,48 were introduced in the earliest stepping-stones (Figure 15A). We confronted these challenging substitutions first while the evolving protease populations had access to variants with wild-type-like levels of activity, reasoning that the likelihood of success was higher while proteases had sufficient activity to exchange for altered specificity without falling below a minimum activity threshold needed to survive selection. We introduced these challenging substrate changes one stepping-stone at a time to minimize the risk of washout and to illuminate at each stage how mutations within TEV protease altered substrate specificity.

We began all three evolutionary trajectories (Figure 15A) by introducing the P6 His substitution (HNLYFQS) into the PA-RNAP and expressing a site-saturation mutagenesis library of TEV protease from the SP. Using NNK codons, we randomized TEV protease residues N171, N176, and Y178, all of which are proximal to the P6 substrate residue. This first PACE stage yielded variants with enhanced apparent activity on the HNLYFQS substrate (Figure 16), and genotypes were highly enriched for either D127A + S135F + N176I or I138T + N171D + N176T (Table 6). Mutations N171D and N176T have previously been characterized as allowing TEV protease to tolerate uncharged P6 residues such as threonine and proline18.
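The size and composition of the NNK libraries used here follow directly from the genetic code. A short sketch (function names hypothetical) showing that NNK covers all 20 amino acids with 32 codons and a single stop codon (TAG), and how theoretical library size scales with the number of randomized residues:

```python
from itertools import product

# Standard genetic code, ordered T, C, A, G at each codon position
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nnk_codons():
    """All NNK codons: N = A/C/G/T at positions 1 and 2, K = G/T at position 3."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def nnk_library_stats(n_sites: int) -> dict:
    """Theoretical diversity of a library randomized with NNK at n_sites residues."""
    codons = nnk_codons()
    encoded = [CODON_TABLE[c] for c in codons]
    stop_fraction = encoded.count("*") / len(codons)
    return {
        "codons_per_site": len(codons),
        "amino_acids_encoded": len(set(encoded) - {"*"}),
        "stop_codons": sorted(c for c, aa in zip(codons, encoded) if aa == "*"),
        "codon_combinations": len(codons) ** n_sites,
        "protein_variants": 20 ** n_sites,
        "fraction_without_stop": (1 - stop_fraction) ** n_sites,
    }


print(nnk_library_stats(3))
```

For the three residues randomized in stage 1, this gives 32^3 = 32,768 codon combinations encoding 20^3 = 8,000 protein variants, about 91% of which lack a stop codon; randomizing four residues (as in later stages) raises the codon space to 32^4 = 1,048,576.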


[Figure 15A: schematic of the eight PACE stages along trajectories 1, 2, and 3 (later branching into 1a/1b, 2a/2b, and 3a/3b), showing the stepping-stone substrates (HNLYFQS, ENLYGQS, HNLYFHS, HNLYGHS, HNLVGHS, HPLVGHM), the TEV protease residues targeted by NNK libraries (171, 176, 178; 209, 211, 216, 218; 146, 148, 167, 177), and the stringency modifications (proB/proA, IL-23 (38-66), Q649S).]

B

Trajectory 1
Stage 1: D127A S135F N176I
Stage 2: D127A S135F N176I V209M W211I M218F
Stage 3: D127A S135F T146A D148P N176I N177R V209M W211I M218F
Stage 4: D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
Stage 5: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
Stage 6: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
Stage 7: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
Stage 8: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop

Trajectory 2
Stage 1: I138T N171D N176T
Stage 2: D127A S135F N171D N176T V209M W211I M218F
Stage 3: D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
Stage 4: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
Stage 5: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
Stage 6: R50G E107D D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
Stage 7: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E
Stage 8: T17S H28L T30A N68D E107D D127A F132L S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F K229E

Trajectory 3
Stage 1: D127A S135F N176I
Stage 2: D127A S135F D148A N176I
Stage 3: E107D D127A S135F T146A D148A N176I V209E W211L V216I M218W
Stage 4: E107D D127A S135F T146A D148A N176I V209F W211C M218L
Stage 5: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
Stage 6: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
Stage 7: H28Y T30A E107D D127A S135F T146A D148A S153N N176I V209F W211C K215E M218L Q226P P227A V228S K229stop
Stage 8: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C K215E M218L Q226P P227A V228S K229stop

Figure 15: Evolutionary trajectories and representative evolved TEV protease genotypes. (A) Across the eight stages of PACE along three diverging trajectories (shown in purple, blue, and orange), each arrow represents a PACE experiment with the corresponding substrate peptide and selection stringency parameters listed beneath the arrow. Increased selection stringency annotations are: Q649S (a T7 RNAP mutant with decreased transcriptional activity), proA (lower expression of substrate PA-RNAP), and IL-23 (38-66) (native IL-23 sequence in place of GGS linker). Numbers above the arrows denote TEV protease residues that were targeted in site-saturation mutagenesis libraries used to initiate that PACE experiment. In the first PACE experiment, wild-type TEV protease was mutagenized at the positions shown. All other libraries were generated using the protease genes emerging from the previous PACE stage as the PCR template. For PACE stages with no targeted mutagenesis, lagoons were inoculated with an aliquot of the phage population from the preceding experiment. (B) For each stage of each trajectory, a representative evolved protease clone emerging at the end of the stage (labeled by stage number) is shown. The mutations within each clone are listed as an individual row, illustrating the enrichment of new mutations during successive stages of PACE as well as genotypic differences between trajectories.


Trajectories 1, 2, and 3
Lagoon L1
A  N171D N176T
B  R159I N171D N176T
C  S120N N176I E230A
D  D127A S135F N176I
E  D127A S135F N176I
F  D127A F132I S135F N176I
G  D127A S135F N176I
H  D127A S135F N176I
Lagoon L2
A  S3R I138T N171D N176T
B  N12T I138T N171D N176T
C  I138T N171D N176T
D  I138T N171D N176T
E  I138T N171D N176T
F  I138T N171D N176T
G  I138T N171D N176T
H  V36I I138T N171D N176T

Table 6: PACE stage 1 mutations. Clonal sequencing data from the end of PACE stage 1 in Figure 15A (84 cumulative hours of evolution). Each row corresponds to a single clonal sequence of TEV protease from the SP. Lagoons 1 and 2 (L1 and L2) correspond to two separate biological replicate populations that underwent the same selection in PACE stage 1.

[Figure 16 plot: Luminescence/OD600 on ENLYFQS and HNLYFQS substrates for WT and four evolved clones (N176I, Y178F; R159T, N171D, N176T; R159T, N171D, N176T; I138T, N176T).]

Figure 16: Luciferase activity assay of clones from the middle of PACE stage 1 of trajectories 1, 2, and 3. TEV protease clones (corresponding genotypes are shown beneath the x-axis) after 36 h of evolution on the first stepping-stone substrate show apparent proteolytic activity on both the wild-type substrate and the single mutant substrate HNLYFQS. Error bars represent the standard deviation of three technical replicates.

Next, we pursued two parallel lines of PACE using either ENLYGQS (trajectories 1 and 2) or HNLYFHS (trajectory 3) as the second stepping-stone substrate. For trajectories 1 and 2, we diversified the population that emerged from PACE on the first stepping-stone (HNLYFQS) with NNK codons at TEV protease residues 209, 211, 216, and 218, which line the hydrophobic pocket occupied by P2 Phe, and performed PACE using host cells expressing ENLYGQS. The resulting population of TEV mutants is typified by the mutations N176I, V209M, W211I, and M218F (Table 7), which confer apparent cleavage activity on both the HNLYFQS and ENLYGQS substrates (Figure 17).


Trajectories 1 and 2, lagoon L6
A  T70C T114P D127A S135F N176I V209A W211W V216I M218F
B  N176I V209M W211V V216V M218W
C  D127A S135F N176I V209M W211I V216V M218F
E  K65R D127A S135F N176I V209E W211L V216I M218W
F  Y11C D127A S135F N176I V209M W211I V216V M218W
G  D127V S135F N176I V209E W211L V216I M218W
H  D127A S135F N176I V209M W211I V216V M218W

Table 7: PACE stage 2 mutations. Clonal sequencing data from trajectories 1 and 2 after PACE stage 2 in Figure 15A (168 cumulative hours of evolution).

[Figure 17 plot: Luminescence/OD600 on ENLYFQS, HNLYFQS, and ENLYGQS substrates for uninfected cells, WT, and clones L6A, L6C, L6G, and L6H.]

Figure 17: Luciferase activity assay after PACE stage 2 of trajectories 1 and 2. TEV protease clones from trajectories 1 and 2 (corresponding genotypes can be found in Table 7) after evolution on the second stepping-stone substrate, ENLYGQS, show activity on the wild-type substrate and both single mutant substrates (HNLYFQS/ENLYGQS). Error bars represent the standard deviation of three technical replicates.

In trajectory 3, we used a mixing strategy to access TEV proteases that could cleave the HNLYFHS stepping-stone double mutant substrate. Unlike PACE experiments initiated from a site-saturation mutagenesis library, a mixing strategy relies on a transitional period of phage propagation on a mixture of two host cell populations, one expressing an accepted substrate (HNLYFQS) and the other expressing the next stepping-stone substrate (HNLYFHS). Following this transitional period, the SP is propagated exclusively on hosts expressing the next stepping-stone substrate (HNLYFHS). The variants that emerged from this stage of trajectory 3 showed weak apparent activity on the double mutant substrate HNLYFHS (Figure 18) and carried only a single additional enriched mutation, D148A (Table 8). Encouragingly, mutation of residue D148 has previously been reported to enable activity on ENLYFHS22.


Trajectory 3
Lagoon L1
A  D127A S135F N176I
B  D127A S135F N176I
C  S135F N176I
D  S120N S135F D148A N176I
E  D127A S135F N176I
F  D127A S135F N176I
G  D127A S135F D148A N176I R203Q
H  S120N S135F D148A N176I
Lagoon L2
A  I138T N176T
B  I138T D148A N176T
C  I138T D148A N176T
D  I138T D148A N176T
E  I138T D148A N176T
F  I138T D148A N176T
G  I138T D148A N176T
H  I138T D148A N176T

Table 8: PACE stage 2 mutations. Clonal sequencing data from trajectory 3 after PACE stage 2 in Figure 15A (168 cumulative hours of evolution).

[Figure 18 plot: Luminescence/OD600 of trajectory 3 clones on ENLYFQS and HNLYFHS substrates.]

Figure 18: Luciferase activity assay after PACE stage 2 of trajectory 3. TEV protease clones from trajectory 3 (corresponding genotypes can be found in Table 8) after evolution on the second stepping-stone substrate, HNLYFHS, show apparent activity on the wild-type substrate and the double mutant substrate, HNLYFHS. Error bars represent the standard deviation of three technical replicates.

Because of the low apparent activity of proteases emerging from this mixing experiment, we relied on the site-saturation mutagenesis strategy to evolve activity on the third stepping-stone, HNLYGHS. The TEV protease populations from trajectory 3 were randomized at sites implicated in P2 recognition (209, 211, 216, and 218), while for trajectories 1 and 2 the TEV protease population was randomized at sites 146, 148, 167, and 177, as previously described for the reprogramming of TEV specificity at the P1 position22 (Figure 15A). The primers used to randomize TEV protease residues 167 and 177 must also encode the identities of the intervening amino acids N171 and N176. Although the population appeared to converge on N176I (Table 7), we reasoned that it was best to preserve genetic diversity at N176 by constructing one library with primers encoding N176I (trajectory 1) and another with N171D + N176T (trajectory 2). Libraries constructed for all three trajectories were then subjected to PACE on host cells expressing the triple mutant substrate HNLYGHS. The variants emerging at this stage of trajectories 1 and 2 were enriched for mutations at residues 146, 148, and 177, consistent with acceptance of the newly introduced P1 substitution22. Similarly, clones from trajectory 3 exhibit mutations at residues 209, 211, and 218 that may promote acceptance of the newly added P2 Gly substitution. Regardless of trajectory, all clones emerging at this stage exhibit at least one mutation from each of the three targeted mutagenesis libraries (Table 9), suggesting that they have evolved activity on the triple mutant substrate.

Given the known tolerance of TEV protease for diverse amino acids at positions P5 and P1’, we speculated that proteases evolved to recognize the triple mutant substrate HNLYGHS might already exhibit activity on the final target substrate (HPLVGHM). Indeed, the populations arising from evolution on the triple mutant substrate successfully propagate in PACE on host cells producing the HPLVGHM substrate, and the resulting variants display weak apparent activity on the final target substrate (Figure 19). To evolve high levels of activity on the final target substrate from these weakly active mutants, we applied three strategies to increase selection stringency on all three trajectories: (1) express a lower concentration of the PA-RNAP substrate by using a weaker constitutive promoter (proA instead of proB)55; (2) substitute the flexible GGS linker that flanks our substrate with the native sequence from IL-23 (human IL-23 residues 38-66); and (3) introduce a mutation in the T7 RNA polymerase portion of the PA-RNAP that decreases transcriptional activity (Q649S)56. We confirmed that all three strategies indeed increased selection stringency (Figure 20).

[Figure 19 plot: Luminescence/OD600 of PACE stage 4 clones on ENLYFQS and HPLVGHM substrates.]

Figure 19: Luciferase activity assay of clones after PACE stage 4. PACE-evolved TEV SP clones (corresponding genotypes can be found in Table 10) from stage 4 of the evolutionary trajectories show proteolysis of the HPLVGHM substrate within a protease-activated RNA polymerase, as measured by downstream luciferase signal. These data indicate that the evolved enzymes were acquiring the desired phenotype, but that higher selection stringency would be necessary to achieve catalytic activity similar to that of wild-type TEV protease. Error bars represent the standard deviation of three technical replicates.


[Figure 20 plot: Luminescence/OD600 for uninfected cells and clone L2A 080215 on five APs: HPLVGHM proB; IL23(38-66) proB; HPLVGHM proA; HPLVGHM proB Q649S; IL23(38-66) proA Q649S. Significance by unpaired t-test: * p < 0.005, ** p < 0.0005.]

Figure 20: Validation of protease PACE stringency modulation. Using the highest-activity TEV variant prior to stringency modulation in PACE, we performed protease-induced luminescence assays on APs expected to exert higher selection stringency. Prior to stringency modulation, the HPLVGHM proB AP exhibits robust protease-induced luminescence and 4.7-fold activation. When the flexible GGS linkers in the PA-RNAP of the standard AP are replaced with the native sequence of IL-23 (amino acids 38-66), protease-induced luminescence is diminished (2.8-fold activation). When expression of the HPLVGHM PA-RNAP is lowered by a weaker constitutive promoter (proA instead of proB), both background and protease-induced luminescence are much lower (3.3-fold activation). Introducing the deactivating mutation Q649S into the T7 RNAP portion of the PA-RNAP also decreases background and protease-induced luciferase signal (2.7-fold activation). When all three strategies are combined in a single AP, an even greater decrease in luciferase signal is observed (2.1-fold activation). Lower fold activation corresponds with higher selection stringency. Error bars represent the standard deviation of three technical replicates.
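Fold activation in this figure can be read as the ratio of protease-induced luminescence to background. A minimal sketch, assuming fold activation is the mean signal with protease divided by the mean uninfected signal, together with the pooled-variance t statistic underlying an unpaired (Student's) t-test. The triplicate values below are hypothetical and are not data from this work; converting t to a p-value would additionally require the t distribution with len(a) + len(b) - 2 degrees of freedom.

```python
import math
from statistics import mean, stdev


def fold_activation(with_protease, uninfected):
    """Mean protease-induced signal divided by mean background signal."""
    return mean(with_protease) / mean(uninfected)


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical triplicate Luminescence/OD600 readings
infected = [3000.0, 3100.0, 2900.0]
uninfected = [1000.0, 980.0, 1020.0]
print(round(fold_activation(infected, uninfected), 2), round(unpaired_t(infected, uninfected), 1))
```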

We first applied the lowered-substrate-concentration strategy, using a mixing experiment to transition from proB to proA expression of the PA-RNAP; this experiment yielded modest changes in genotypes. Exploiting the ease of performing PACE on multiple lagoons in parallel, we then implemented the other two strategies simultaneously on all three trajectories. The resulting six populations (trajectories 1a, 1b, 2a, 2b, 3a, and 3b; see Figure 15A) were carried forward into PACE on hosts expressing a PA-RNAP with both the IL-23 (38-66) linker and the attenuated T7 RNAP mutant Q649S. In the final stage of PACE for all six populations, we used the proA promoter to produce less of the PA-RNAP containing the IL-23 (38-66) linker and the Q649S mutation. This series of stringency modulation experiments produced variants with higher levels of apparent activity on the final HPLVGHM substrate (Figure 21).

[Figure 21 plot: Luminescence/OD600 of PACE stage 8 clones on ENLYFQS and HPLVGHM substrates.]

Figure 21: Luciferase activity assay of clones after PACE stage 8. After multiple PACE stages with increasing positive selection stringency, many TEV protease variants (corresponding genotypes can be found in Table 14) exhibit markedly stronger apparent activity on the HPLVGHM substrate when compared with clones from previous PACE experiments such as those seen in Figure 19. Error bars represent the standard deviation of three technical replicates.


Trajectory 1, lagoon L1
A  D127A S135F T146A D148P N176I N177M V209M W211I M218F
B  E106G D127A S135F T146A D148P N176I N177R V209M W211I M218F
C  D127A S135F T146A D148P N176I N177R V209M W211I M218F
D  D127A S135F T146R D148C H167P N176I N177G V209M W211I M218F
E  D127A S135F T146A D148P N176I N177W V209M W211I M218F
F  D127A S135F T146C D148P N176I N177M V209M W211I M218F
G  D127A S135F T146C D148P N177M S200G V209M W211I M218F
H  D127A S135F T146C D148P N176I N177M V209M W211I M218F
Trajectory 2, lagoon L2
A  D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
B  D127A S135F T146C D148P N171D N176T N177M V209E W211L V216I M218W E223stop
C  D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
D  D127A S135F T146C D148P N171D N176T N177M V209E W211L V216I M218W
E  V63I D127A F132S S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
F  D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
G  D127A S135F T146C D148P N171D N176T N177M V209E W211L V216I M218W E223stop
H  D127A S135F T146C D148P N171D N176T N177M V209M W211I M218F K229E
Trajectory 3, lagoon L3
C  E107D D127A S135F T146A D148A N176I V209S W211I M218W
D  E107D D127A S135F D148A N176I V209E W211L V216I M218W Q226stop
E  E107D D127A S135F D148A N176I V209S W211I M218W
F  E107D D127A S135F T146A D148A N176I V209E W211L V216I M218W
G  G32R E107D D127A S135F T146A D148A N176I V209E W211L V216I M218W Q226stop
H  E107D D127A S135F D148A N176I V209F W211C M218L

Table 9: PACE stage 3 mutations. Clonal sequencing data from trajectories 1, 2, and 3 after PACE stage 3 in Figure 15A (264 cumulative hours of evolution).

L1 (trajectory 1a)
A: D127A S135F T146A D148P N176I N177W V209M W211I M218F K229E Q233stop
B: D127A S135F T146A D148P N176I N177W V209M W211I M218F K229E
C: P8L D127A S135F T146A D148P N176I N177W V209M W211I M218F K229E
D: D127A S135F T146A D148P N176I N177W V209M W211I M218F K229E Q233stop
E: D127A S135F T146A D148P N176I N177W V209M W211I M218F K229E Q233stop
F: D127A S135F T146A D148P N176I N177W V209M W211I M218F K229E Q233stop
G: D127A S135F T146A D148P R159K N176I N177R V209M W211I M218F K229E
H: D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E

L2 (trajectory 2a)
A: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
B: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
C: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
D: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
E: D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
F: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
G: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F Q226stop K229E
H: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E

L3 (trajectory 3a)
A: N12H E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226stop
B: E107D D127A S135F T146A D148A N176I V209F W211C M218L P227S
D: E107D D127A S135F T146A D148A N176I V209F W211C M218L
E: K67N E107D D127A S135F T146A D148A S152N N176I V209F W211C M218L
F: E107D D127A S135F T146A D148A N176I V209F W211C M218L
G: E107D D127A S135F T146A D148A N176I V209F W211C M218L
H: E107D D127A S135F T146A D148A N176I V209F W211C M218L

L4 (trajectory 1b)
A: E106G D127A S135F T146S D148P N176I N177F V209M W211I M218F K229E
B: T118S D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
C: D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
D: T118S D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
E: D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
F: D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
G: N12T D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E
H: R80G D127A S135F T146A D148P N176I N177R V209M W211I M218F K229E

L5 (trajectory 2b)
A: D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
B: D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
D: D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
E: K6E D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
F: T17A D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E
G: N68D D127A S135F T146S D148P N171D N176T N177M V209M W211I M218F K229E

L6 (trajectory 3b)
A: E106G E107D D127A S135F T146A D148A N176I V209F W211C M218L
B: E107D D127A S135F T146A D148A N176I M218F Q226stop
C: T30A R50K E107D D127A S135F T146A D148A N176I V209F W211C M218L
D: E107D D127A S135F T146A D148A N176I M218F Q226stop
E: E107D D127A S135F T146A D148A N176I M218F Q226stop
F: E107D D127A S135F T146A D148A N176I M218F Q226stop
G: R9C E107D D127A S135F T146A D148A N176I V209F W211C M218L
H: E106G E107D D127A S135F T146A D148A N176I

Table 10: PACE stage 4 mutations. Clonal sequencing data from trajectories 1, 2, and 3 after PACE stage 4 in Figure 15A (336 cumulative hours of evolution).

L1 (trajectory 1)
A: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
B: E106G D127A S135F T146A D148P S153N S170A N176I N177R K184T V209M W211I M218F K229E
C: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
D: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
E: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
F: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
G: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
H: D127A S135F T146A D148P S153N S170A N176I N177R V209M W211I M218F K229E

L2 (trajectory 2)
A: D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
B: H28Y D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
C: H28L D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
D: K6E D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
E: D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
F: H28L D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
G: K6E D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
H: D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E

L3 (trajectory 3)
A: E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop
B: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
D: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
F: E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop
H: H28Y E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop

Table 11: PACE stage 5 mutations. Clonal sequencing data from trajectories 1, 2, and 3 after PACE stage 5 in Figure 15A (456 cumulative hours of evolution).

L1 (trajectory 1a)
A: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
B: E2K P39S D127A S135F T146C D148P F162S S170A N176I N177S R203Q V209M W211I M218F Q226stop
C: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
D: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
E: E2K H28Y N68D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
F: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
G: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F E223G Q226stop
H: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop

L2 (trajectory 2a)
A: E107D T114A D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
B: H28L T30A E106D D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
D: K6E H28L D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F E223stop
E: K6E D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E
F: K6E D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
G: K6E T30A D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
H: K6E D127A S135F T146S D148P F162S N171D N176T N177M V209M W211I M218F K229E

L3 (trajectory 3a)
B: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
C: H28Y K89E E107D M121T D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
D: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: H28Y F40L E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
G: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
H: H28Y E107D I109V D127A S135F T146A D148A S153N N171K N176I V209F W211C M218L Q226S P227A V228S K229stop

L4 (trajectory 1b)
A: E2K D10N D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
B: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
D: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
E: K99Q E106G M124I D127A S135F T146A D148P S153N S170A N176I N177R V209M W211I M218F K229E
F: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
G: E2K H61R D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop

L5 (trajectory 2b)
A: H28L T114A D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F K229E
B: F5L T17A R50G E107D D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
C: R50G D127A C130R S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K220Q K229E
D: R50G I109V D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F E223stop K229E
E: P13S R50G D127A F132V S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E L234R
F: R50G I109V D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E
G: D127A S135F T146S D148P F162S S170A N171D N176T N177M V209M W211I M218F K229E A231E

L6 (trajectory 3b)
A: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
B: H28Y N68D E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
C: H28Y E107D M121T D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
D: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: H28Y F40L E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
F: T17A H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
G: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
H: H28Y E107D I109V D127A F132V S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop

Table 12: PACE stage 6 mutations. Clonal sequencing data from trajectories 1, 2, and 3 after PACE stage 6 in Figure 15A (528 cumulative hours of evolution).

L1 (trajectory 1a)
A: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
C: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
D: T17A E24K H28L K97N D127P S135F T146A D148P S153N S170A N176I N177R V209M W211I M218F K229E
E: E2K D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
G: T17A H28L K97N D127P S135F T146A D148P S153N S170A N176I N177R V209M W211I M218F K229E
H: T17A H28L K97N D127P S135F T146A D148P S153N S170A N176I N177R V209M W211I M218F K229E

L2 (trajectory 2a)
A: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E
B: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E
C: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F F225L K229E
D: T30A T71P K89E D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E
E: T30A T69P M124T D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211I M218F K229E
F: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E
H: T30A D127A S135F T146P D148P Q150P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E

L3 (trajectory 3a)
A: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L
B: H28Y T30A E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226P P227A V228S K229stop
C: H28Y T30A E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226P P227A V228S K229stop
D: H28Y E107D S120N D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: H28Y T30A E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L
F: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
G: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
H: I18V H28Y E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226P P227A V228S K229stop

L4 (trajectory 1b)
A: E2K I109V D127A S135F T146C D148P S153N S170A N176I N177S R203Q V209M W211I M218F Q226stop
B: E2K E106D D127A S135F T146C D148P S153N S170A N176I N177S K184E R203Q V209M W211I M218F Q226stop
C: E2K T70P E107D D127A S135F T146C D148P S153N S170A N176I N177S R203Q V209M W211I M218F Q226stop
D: E2K T30A E106G D127A S135F T146A D148P S153N S170A N176I N177R V209M W211V M218F K229E T232A
E: E2K E107D L111F M124I D127A S135F T146C D148P S153N S170A N176I N177S R203Q V209M W211I M218F Q226stop
F: E2K E107D D127A S135F T146C D148P S153N S170A N176I N177S R203Q V209M W211I M218F Q226stop
G: E2K E107D T114A D127A S135F T146C D148P S153N S170A N176I N177S R203Q V209M W211I M218F Q226stop

L5 (trajectory 2b)
A: T17A H28L D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F F225S K229E
B: H28L H61Q D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F Q226S P227A V228S K229stop
C: T17A H28L D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F Q226S P227A V228S K229stop
E: T17A H28L D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F Q226S P227A V228S K229stop L234R
F: T17A H28L N68D D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F Q226S P227A V228S K229stop
G: E2K T17A H28L D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F Q226S P227A V228S K229stop A231E
H: H28L N68D D127A S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F Q226S P227stop

L6 (trajectory 3b)
A: H28Y V66G H75L E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop
C: H28Y E107D M121T D127A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop
D: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
F: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
G: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop

Table 13: PACE stage 7 mutations. Clonal sequencing data from trajectories 1, 2, and 3 after PACE stage 7 in Figure 15A (600 cumulative hours of evolution).

L1 (trajectory 1a)
A: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
B: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
C: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
D: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
E: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
F: E2K E107D D127A S135F T146C D148P S153N S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
G: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
H: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop

L2 (trajectory 2a)
B: H28L E107D D127A T128P S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F K229E
C: R9C H28L T30A N68D E107D D127A F132L S135F T146S D148P S153N F162S S170A N171D N176T N177M V182G V209M W211I M218F K229E
F: T17S H28L T30A N68D E107D D127A F132L S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F K229E
H: H28L T30A N68D E107D D127A F132L S135F T146S D148P S153N F162S S170A N171D N176T N177M V209M W211I M218F K229E

L3 (trajectory 3a)
A: H28Y Q104R E107D D127A F132S S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
B: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
C: E24V H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
D: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: E2G H28Y E107D D127A S135F T146A D148A N176I N185D V209F W211C M218L Q226S P227A V228S K229stop
F: H28Y N68D E107D D127A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop
G: E24K H28Y N68D E107D D127A T128A S135F T146A D148A N176I V209F W211C M218L Q226S P227A V228S K229stop
H: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop

L4 (trajectory 1b)
A: E2K D26Y E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218L Q226stop
B: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
C: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
D: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
E: E2K R80G D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I K215E M218F Q226stop
F: E2K T17A E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
G: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop
H: E2K E107D D127A S135F T146C D148P S170A N176I N177S R203Q V209M W211I M218F Q226stop

L5 (trajectory 2b)
A: F5L T30A Q73H T118A D127A C130R S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop
B: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop
C: T30A V125G D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop
D: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop
E: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F K229E
F: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop
G: T30A D127A S135F T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop
H: T30A Q73H T118A D127A C130R S135F Q145P T146S D148P F162A S170A N171D N176T N177M V209M W211V K215E M218F P221T E222stop

L6 (trajectory 3b)
A: N12T H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
B: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
C: D10G H28Y K67E E107D D127A S135F T146A D148A S153N N176I Q193P V209F W211C M218L Q226S P227A V228S K229stop
D: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
E: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
F: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L Q226S P227A V228S K229stop
G: H28Y E107D D127A S135F T146A D148A S153N N176I V209F W211C M218L

Table 14: PACE stage 8 mutations. Clonal sequencing data from trajectories 1, 2, and 3 after PACE stage 8 in Figure 15A (672 cumulative hours of evolution).

Section 2.3: Characterization of Evolved TEV Protease Variants

Mutations that arise early in long evolutionary trajectories can create a cascade of contingencies, because subsequent mutations must be compatible with the preexisting genetic context. This dependence of a mutation's effect on the genetic background in which it arises is known as epistasis.

Genotypes suggest that epistasis strongly shaped the outcomes of trajectories 1 and 2, which were dominated by N176I and by N171D + N176T, respectively, prior to the third stage of PACE. During subsequent evolution, the amino acid at position 176 appears to have dictated the optimal identity of residue 177, such that the combinations N176I + N177S and N176T + N177M predominate in trajectories 1 and 2, respectively. Swapping the identity of residue 177 between clones from trajectories 1 and 2 results in a substantial loss of activity (Figure 22), further consistent with epistasis at this position. It is likely that these genetic differences between trajectories 1 and 2 also led to the later enrichment of distinct mutations outside of the substrate- (Table 6-Table 14). For example, mutations at position 203 persisted only in trajectory 1, while mutations at positions 28, 30, 68, 132, and 162 were abundant only in trajectory 2.

[Figure 22: bar chart of luminescence/OD600 for L1F, L1F N177M, L2F, L2F N177S, and uninfected control; * p < 0.0001, unpaired t-test.]

Figure 22: Epistatic interactions with TEV protease residue N177. PACE evolved clones L1F and L2F, from PACE stage 8 of trajectories 1 and 2 respectively, exhibit robust apparent activity on the HPLVGHM substrate. When the identity of residue N177 is swapped between these clones, as in variants L1F N177M and L2F N177S, we see a significant decrease in apparent activity, suggesting that the optimal substitution for N177 depends upon the identities of other mutations within TEV protease. Error bars represent the standard deviation of three technical replicates.
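To make the residue-177 swap in Figure 22 concrete, the sketch below represents each genotype as a mapping from residue number to its (wild-type, mutant) identity and builds the two swapped variants. The L1F and L2F mutation lists are transcribed from the stage 8 clonal sequencing data (Table 14). The parsing convention and the helper names (parse_genotype, swap_residue, format_genotype) are illustrative assumptions, not the software used in this work.

```python
import re

# Mutation notation used in the sequencing tables, e.g. "N177M" or "Q226stop".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z]|stop)$")


def parse_genotype(text):
    """Map residue number -> (wild-type residue, mutant residue)."""
    genotype = {}
    for token in text.split():
        match = MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        genotype[pos] = (wt, mut)
    return genotype


def swap_residue(donor, recipient, pos):
    """Return a copy of `recipient` that carries the donor's identity at `pos`."""
    swapped = dict(recipient)
    swapped[pos] = donor[pos]
    return swapped


def format_genotype(genotype):
    """Render a genotype back into space-separated mutation notation."""
    return " ".join(f"{wt}{pos}{mut}" for pos, (wt, mut) in sorted(genotype.items()))


# Stage 8 clones L1F (trajectory 1) and L2F (trajectory 2), transcribed from Table 14.
L1F = parse_genotype(
    "E2K E107D D127A S135F T146C D148P S153N S170A N176I N177S R203Q "
    "V209M W211I K215E M218F Q226stop"
)
L2F = parse_genotype(
    "T17S H28L T30A N68D E107D D127A F132L S135F T146S D148P S153N F162S "
    "S170A N171D N176T N177M V209M W211I M218F K229E"
)

L1F_N177M = swap_residue(L2F, L1F, 177)  # L1F carrying L2F's residue 177
L2F_N177S = swap_residue(L1F, L2F, 177)  # L2F carrying L1F's residue 177

print(len(L2F))
print(format_genotype(L1F_N177M))
print(format_genotype(L2F_N177S))
```

Keying genotypes by residue number makes a swap a single dictionary assignment and leaves every other mutation in the recipient untouched, which mirrors how the L1F N177M and L2F N177S variants differ from their parents at only one position.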

Unsurprisingly, we observed a dramatically different outcome in the third trajectory, which not only experienced a different schedule of stepping-stone substrates but was also subjected to a mixing experiment instead of NNK mutagenesis at residues 146, 148, 167, and 177. Our data are consistent with a model in which a lack of diversification at these critical residues trapped trajectory 3 in a local fitness maximum, as evidenced by weak apparent activity on the final target substrate (Figure 19-Figure 21) and few genotypic changes after the fifth stage of trajectory 3 (Figure 22B). Consequently, while all six populations yielded TEV variants with apparent activity on the final target, the variants with the highest apparent activity on the final target were all derived from trajectories 1 and 2.

Three representative proteases from the end of trajectories 1 and 2 were purified and assayed in vitro for their ability to cleave a model protein substrate in which MBP and GST were fused through a linker containing the final HPLVGHM substrate sequence (Figure 23). All three evolved proteases cleaved the model substrate. We selected the most active clone (TEV L2F from trajectory 2, containing 20 non-silent mutations, Table 14) for detailed characterization. We measured the kinetic parameters of this mutant enzyme on wild-type (ENLYFQS) and target (HPLVGHM) substrate peptides using a previously described HPLC method22. Unlike the wild-type enzyme, which exhibits no detectable activity on the HPLVGHM peptide, the L2F variant processes this substrate with approximately 15% of the catalytic efficiency (kcat/KM) with which wild-type TEV protease cleaves its native substrate (Table 15 and Figure 24). Compared to wild-type TEV, evolved TEV L2F appears to maintain nearly identical kinetics on the canonical ENLYFQS substrate. On the target substrate HPLVGHM, its kcat remains close to that of the wild-type enzyme on ENLYFQS, while its KM is only about 5-fold higher (920 µM vs. 170 µM). These results collectively indicate that PACE generated a highly evolved mutant protease that cleaves a target substrate containing mutations at six positions with only modestly lower efficiency (approximately 6.5-fold) than that with which wild-type TEV protease cleaves its consensus substrate.
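The efficiency comparison above can be reproduced from the fitted parameters in Table 15. The sketch below recomputes kcat/KM from the rounded kcat and KM values, so the efficiencies can differ from the tabulated ones in the last digit. The dictionary layout and the efficiency helper are illustrative choices.

```python
# Fitted Michaelis-Menten parameters from Table 15: (kcat in s^-1, KM in µM).
params = {
    ("ENLYFQS", "wild-type"): (0.23, 170.0),
    ("ENLYFQS", "L2F"): (0.27, 150.0),
    ("HPLVGHM", "L2F"): (0.19, 920.0),
}


def efficiency(kcat, km):
    """Catalytic efficiency kcat/KM in s^-1 µM^-1."""
    return kcat / km


eff = {key: efficiency(*values) for key, values in params.items()}

# Evolved enzyme on the target vs. wild-type enzyme on its native substrate.
relative_efficiency = eff[("HPLVGHM", "L2F")] / eff[("ENLYFQS", "wild-type")]
km_fold = params[("HPLVGHM", "L2F")][1] / params[("ENLYFQS", "wild-type")][1]

for (substrate, enzyme), value in eff.items():
    print(f"{enzyme:9s} on {substrate}: kcat/KM = {value:.2e} s^-1 µM^-1")
print(f"relative efficiency: {relative_efficiency:.0%}")
print(f"KM fold change: {km_fold:.1f}")
```

Recomputing from the rounded parameters gives a relative efficiency of about 15% and a KM ratio of about 5.4, consistent with the values quoted in the text.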

[Figure 23: gel image with lanes L1F, L2F, and L5B; bands labeled MBP-GST, MBP, TEV, and GST.]

Figure 23: Protein cleavage assay to identify the most active clone. TEV protease variants from the final PACE experiment were overexpressed and purified. Approximately 1 µg of protease was incubated for 3 h at 30°C with 5 µg of a fusion protein construct in which MBP is linked to GST through a cleavable substrate linker containing the peptide HPLVGHM. Note that TEV protease variants L1F and L5B encode premature stop codons leading to products with approximately the same molecular weight as GST. Consequently, the intensity of the MBP product band best reflects reaction efficiency, leading to the conclusion that TEV L2F exhibits the highest catalytic activity among the clones tested.


Substrate   Protease    kcat (s^-1)     KM (µM)      kcat/KM (s^-1 µM^-1)
ENLYFQS     Wild-type   0.23 ± 0.011    170 ± 22     1.3 × 10^-3
HPLVGHM     Wild-type   N.D.            N.D.         N.D.
ENLYFQS     TEV L2F     0.27 ± 0.019    150 ± 28     1.7 × 10^-3
HPLVGHM     TEV L2F     0.19 ± 0.010    920 ± 110    2.0 × 10^-4

Table 15: Kinetic parameters of wild-type and evolved TEV proteases. N.D. = not detected (no product formation observed after 30 min of incubation of 1 µM protease and 2 mM substrate; limit of detection, 1 nM product).

[Figure 24 plots. (A) HPLC traces, absorbance at 355 nm (mAU) vs. time (minutes), showing substrate and product peaks for ENLYFQS and HPLVGHM. (B) TEV wild-type: rate (s^-1) vs. [ENLYFQS substrate] (µM); fit kcat = 0.2338 s^-1, KM = 173.1 µM. (C) TEV L2F: rate (s^-1) vs. [HPLVGHM substrate] (µM); fit kcat = 0.1865 s^-1, KM = 925.7 µM. (D) TEV L2F: rate (s^-1) vs. [ENLYFQS substrate] (µM); fit kcat = 0.266 s^-1, KM = 152.4 µM.]

Figure 24: HPLC assay of TEV protease kinetics. (A) Synthetic peptide standards. TEV protease substrate peptides and the corresponding product peptides in a 1:1 mixture are separable by reverse-phase liquid chromatography. (B) Wild-type TEV protease (0.1 µM) was incubated for 10 min at 30 °C with ENLYFQS substrate at concentrations ranging from 50 to 800 µM. (C) TEV L2F protease (0.1 µM) was incubated for 10 min at 30 °C with HPLVGHM substrate at concentrations ranging from 50 to 2000 µM. (D) TEV L2F protease (0.05 µM) was incubated for 10 min at 30 °C with ENLYFQS substrate at concentrations ranging from 50 to 500 µM. Data were fit to a Michaelis-Menten model; error bars represent the standard deviation of three technical replicates.
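The caption states only that rates were fit to a Michaelis-Menten model; the fitting software is not specified. As one possible sketch, the pure-Python code below performs an ordinary least-squares fit of v = kcat·[S]/(KM + [S]) by profiling out kcat, which has a closed-form least-squares solution for any fixed KM, and running a golden-section search over log KM. The function names and the synthetic, noise-free data in the usage lines are hypothetical and are not the measurements shown in Figure 24.

```python
import math


def profile_kcat(conc, rates, km):
    """Best least-squares kcat for a fixed KM, plus the residual sum of squares.

    With KM fixed, v = kcat * x where x = S / (KM + S) is linear in kcat,
    so kcat = sum(v * x) / sum(x * x).
    """
    xs = [s / (km + s) for s in conc]
    kcat = sum(v * x for v, x in zip(rates, xs)) / sum(x * x for x in xs)
    sse = sum((v - kcat * x) ** 2 for v, x in zip(rates, xs))
    return kcat, sse


def fit_michaelis_menten(conc, rates, km_lo=1.0, km_hi=1e5, iters=100):
    """Golden-section search on log(KM) of the profiled residual sum of squares."""
    a, b = math.log(km_lo), math.log(km_hi)
    ratio = (math.sqrt(5) - 1) / 2
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc = profile_kcat(conc, rates, math.exp(c))[1]
    fd = profile_kcat(conc, rates, math.exp(d))[1]
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = profile_kcat(conc, rates, math.exp(c))[1]
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = profile_kcat(conc, rates, math.exp(d))[1]
    km = math.exp((a + b) / 2)
    kcat, _ = profile_kcat(conc, rates, km)
    return kcat, km


# Hypothetical noise-free rates (s^-1) generated with kcat = 0.19 s^-1, KM = 920 µM.
substrate_uM = [50, 100, 250, 500, 1000, 1500, 2000]
rates = [0.19 * s / (920 + s) for s in substrate_uM]
kcat, km = fit_michaelis_menten(substrate_uM, rates)
print(f"kcat = {kcat:.3f} s^-1, KM = {km:.0f} µM")
```

Profiling out kcat reduces the fit to a one-dimensional search, which keeps the sketch dependency-free. A general nonlinear least-squares routine, for example SciPy's `curve_fit`, would serve the same purpose and also report parameter uncertainties.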


Section 2.4: Substrate Specificity Profiling of an Evolved TEV Protease

Proteolysis assays on individual substrates reveal that evolved TEV protease L2F maintains the ability to detectably cleave starting and intermediate substrates while acquiring activity on the final IL-23 target (Figure 25).

A more comprehensive understanding of the substrate specificity of this evolved enzyme requires an unbiased protease specificity profile generated from a large number of substrate variants. To obtain such a profile, we applied a previously reported phage substrate display method (Figure 26A)57-59. M13 bacteriophage encoding pIII fused to a FLAG tag through a library of substrate linkers were immobilized on anti-FLAG magnetic beads. When the beads are incubated with a protease of interest, phage encoding cleaved substrates are released from the solid support, while phage encoding intact substrates remain immobilized and are subsequently eluted with excess FLAG peptide. The abundance of each substrate in the cleaved and eluted populations was measured by high-throughput DNA sequencing, yielding enrichment values (Table 16) and sequence logos (Figure 26B-E) that convey protease substrate specificity across all 20 amino acids60.
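Table 16 defines the enrichment value as the frequency of an amino acid in the cleaved population divided by its frequency in the eluted population, minus one. A minimal sketch of that calculation for a single randomized position follows. The toy peptide lists stand in for decoded sequencing reads and are hypothetical. Skipping residues absent from the eluted pool, where the ratio is undefined, is an assumption rather than a documented step of the original analysis.

```python
from collections import Counter


def position_counts(peptides, pos):
    """Count the amino acid found at 0-based index `pos` in each decoded substrate."""
    return Counter(peptide[pos] for peptide in peptides)


def enrichment(cleaved, eluted, pos):
    """Enrichment = freq(cleaved) / freq(eluted) - 1 for each amino acid at `pos`.

    Values above zero indicate acceptance by the protease, values below zero
    indicate rejection. Residues absent from the eluted pool are skipped
    because the ratio is undefined for them.
    """
    c_counts = position_counts(cleaved, pos)
    e_counts = position_counts(eluted, pos)
    c_total = sum(c_counts.values())
    e_total = sum(e_counts.values())
    return {
        aa: (c_counts[aa] / c_total) / (n / e_total) - 1
        for aa, n in sorted(e_counts.items())
    }


# Hypothetical reads from a P1 library (index 5 of ENLYFQS randomized).
cleaved_reads = ["ENLYFQS", "ENLYFQS", "ENLYFQS", "ENLYFMS"]
eluted_reads = ["ENLYFQS", "ENLYFMS", "ENLYFMS", "ENLYFAS"]
print(enrichment(cleaved_reads, eluted_reads, pos=5))
# {'A': -1.0, 'M': -0.5, 'Q': 2.0}
```

For a real library, the peptide lists would be the translated substrate sequences recovered from each sequencing run.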

[Figure 25: gel images for wild-type TEV protease and TEV L2F protease (T17S, H28L, T30A, N68D, E107D, D127A, F132L, S135F, T146S, D148P, S153N, F162S, S170A, N171D, N176T, N177M, V209M, W211I, M218F, K229E), each assayed on substrates ENLYFQS, HNLYFQS, ENLYFHS, ENLYGQS, HNLYFHS, HNLYGHS, and HPLVGHM; bands labeled MBP-GST, MBP, TEV, and GST.]

Figure 25: Evolved TEV protease cleaves wild-type, intermediate, and target substrates. In a manner analogous to that described above for Figure 23, we assayed TEV proteases on a panel of substrate sequences. Approximately 1 µg of protease was incubated for 3 h at 30°C with 5 µg of a fusion protein construct in which MBP is linked to GST through a cleavable substrate linker containing the indicated amino acid sequence. WT TEV efficiently cleaves the wild-type substrate and, to a much lesser degree, processes single-mutant substrates (HNLYFQS, ENLYFHS, ENLYGQS). Evolved TEV protease clone L2F yields a visible product band for the target substrate HPLVGHM. This evolved protease also maintains activity on the wild-type, single-, double-, and triple-mutant substrates that were used as evolutionary stepping-stones in PACE.


We applied this specificity profiling technique to seven separate libraries, each containing a single randomized position within the canonical ENLYFQS substrate. These libraries are inherently biased by the identities of the residues that are held constant, but because of their small theoretical diversity, they are easy to construct, and a single round of selection yielded robust enrichment values. We validated this method by recovering the consensus motif EXLYFQS (where X = any amino acid) for wild-type TEV protease (Figure 26B). We also applied this substrate specificity profiling method to more complex libraries containing sets of three consecutive randomized amino acids within either the ENLYFQS or HPLVGHM substrate. The resulting specificity profiles from these larger libraries (Figure 27) did not substantially differ from those obtained with the single-site libraries. In addition, the identity of the constant residues (ENLYFQS vs. HPLVGHM) had only a modest impact on the resulting specificity profiles of TEV L2F, although in the context of HPLVGHM libraries TEV L2F shows more pronounced specificity for P1 His, the target residue, and for P6 Glu, the wild-type residue (Figure 27 and corresponding enrichment values in Table 17).
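Counting variants at the amino-acid level makes the difference in library size explicit. This back-of-the-envelope count assumes that each randomized position samples all 20 proteinogenic amino acids and ignores codon-level redundancy, which depends on the randomization scheme used.

```latex
\begin{aligned}
N_{\text{single}} &= 20 \text{ variants per library}, \qquad 7 \times 20 = 140 \text{ variants across the seven libraries},\\
N_{\text{triple}} &= 20^{3} = 8000 \text{ variants per library}.
\end{aligned}
```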

When we compare the specificity profile of evolved TEV L2F (Figure 26C) to that of wild-type TEV (Figure 26B), a number of differences are apparent: TEV L2F shows a broadening of specificity at P6, a shift in P3 specificity toward the aliphatic residues Ile and Val, a shift in P1 specificity to include His, and a shift in P1’ specificity toward the hydrophobic amino acids Ala, Ile, and Met. These changes are largely consistent with evolutionary pressure to cleave the target substrate HPLVGHM. Notably, specificity at P2 has not shifted toward Gly, the target residue at this position, suggesting that improved recognition of P2 Gly may offer the largest remaining potential gains in target substrate cleavage efficiency.
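The P3 and P1 shifts described above can be read directly from the enrichment values in Table 16. The sketch below copies the P3 and P1 rows for wild-type TEV and TEV L2F and lists, most enriched first, the residues with enrichment above zero, which the Figure 26 caption treats as acceptance. The helper name accepted_residues and the zero threshold default are illustrative choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # column order of Table 16

# Enrichment rows copied from Table 16 (single-site libraries).
ENRICHMENT = {
    ("wild-type", "P3"): [-0.50, -0.52, -0.09, -0.54, 0.84, 0.30, -0.48, -0.11, -0.50, -0.37,
                          -0.54, -0.45, -0.39, -0.52, -0.29, -0.56, -0.58, 0.01, -0.26, 4.08],
    ("wild-type", "P1"): [-0.22, -0.06, -0.14, -0.12, -0.16, -0.15, -0.08, -0.24, -0.24, -0.18,
                          1.29, -0.02, -0.22, 5.49, -0.27, -0.24, -0.17, -0.28, -0.16, -0.26],
    ("L2F", "P3"): [-0.66, -0.86, -0.58, -0.79, 1.42, -0.51, -0.13, 1.82, -0.65, 1.12,
                    1.30, -0.78, -0.88, 0.02, 0.00, -0.85, -0.36, 1.73, 1.43, 1.36],
    ("L2F", "P1"): [0.45, -0.66, 0.11, 1.74, 1.97, -0.08, 2.35, -0.70, -0.62, 1.89,
                    2.60, 0.68, -0.54, 3.04, -0.54, 0.12, 1.33, -0.76, -0.15, 2.71],
}


def accepted_residues(values, threshold=0.0):
    """Residues with enrichment above `threshold`, most enriched first."""
    pairs = [(aa, v) for aa, v in zip(AMINO_ACIDS, values) if v > threshold]
    return [aa for aa, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]


for (enzyme, position), values in ENRICHMENT.items():
    print(f"{enzyme:9s} {position}: {' '.join(accepted_residues(values))}")
```

With this threshold, wild-type TEV accepts only Gln and Met at P1, whereas L2F also accepts His, and Ile and Val become the two most enriched P3 residues for L2F, matching the shifts described above.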

Although the evolved L2F protease recognizes a shortened motif due to loss of P6 specificity, it retains the ability to reject the substantial majority of amino acids at each of the five other positions used by TEV. The overall specificity of the evolved L2F protease lies well within the range of specificities exhibited by natural proteases such as , granzymes, clotting factors, and MMPs, which typically specify strongly only one or two positions and accept mixtures of several to many amino acids at other positions, yet retain sufficient overall specificity to mediate physiological signaling roles61-64.


Figure 26: Protease specificity profiling. (A) Overview of phage substrate display. M13 bacteriophage libraries contain pIII fused to a FLAG tag through a randomized protease substrate linker. These substrate phage are bound to anti-FLAG magnetic beads and treated with a protease to release phage encoding substrates that the protease can cleave. The remaining intact substrate phage are eluted with excess FLAG peptide. The abundance of all substrate sequences within the cleaved and eluted samples is measured by high-throughput sequencing. (B-E) For all assayed proteases, phage substrate display was performed separately on seven libraries, each with a different single randomized position within the ENLYFQS motif. The resulting enrichment values are displayed as sequence logos, with enrichment values above zero indicating protease acceptance and values below zero indicating rejection. (B) Wild-type TEV protease exhibits strong enrichment for the consensus motif EXLYFQS. (C) Evolved TEV L2F has broadened specificity at P6 and shifted specificity at P3, P1, and P1’ in accordance with the HPLVGHM target substrate. (D) Mutations I138T, N171D, and N176T are sufficient to broaden P6 specificity. (E) Mutations T146S, D148P, S153N, S170A, and N177M shift specificity at both P1 and P3.


Table 16: Phage display enrichment values from selections on single-site libraries. Each sub-table within the larger table represents the amino acid enrichment values generated for the given genotype of TEV protease. Each row contains enrichment values from a selection performed on the library in which the corresponding position within the ENLYFQS motif was randomized. The enrichment value for each amino acid identity at a given position was calculated as (frequency in cleaved population)/(frequency in eluted population) - 1. The cells are shaded on a linear scale from red to blue; this color scale is normalized for each sub-table, with the lowest number in the sub-table shaded darkest red and the highest number shaded darkest blue.

Wild-type TEV
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
P6   -0.19  0.08  1.03  5.69  0.99 -0.16  0.03  0.41 -0.20  0.42  0.31 -0.10  0.04  0.23 -0.14 -0.05 -0.01  0.26  0.37 -0.01
P5    0.30 -0.25  0.45 -0.07  0.25 -0.46  0.15 -0.19 -0.06  0.21  0.10 -0.51 -0.24  0.02  0.02  0.10 -0.04  0.20  0.14  0.24
P4   -0.34 -0.29 -0.31 -0.15 -0.33 -0.31 -0.40  5.16 -0.29  3.16  0.68 -0.32 -0.37 -0.39 -0.33 -0.35 -0.31  0.39 -0.21 -0.39
P3   -0.50 -0.52 -0.09 -0.54  0.84  0.30 -0.48 -0.11 -0.50 -0.37 -0.54 -0.45 -0.39 -0.52 -0.29 -0.56 -0.58  0.01 -0.26  4.08
P2   -0.19 -0.74 -0.48 -0.66  3.81 -0.39 -0.70  3.79 -0.69  0.48  0.85 -0.59 -0.03 -0.72 -0.62 -0.38  0.19  4.36  0.92 -0.72
P1   -0.22 -0.06 -0.14 -0.12 -0.16 -0.15 -0.08 -0.24 -0.24 -0.18  1.29 -0.02 -0.22  5.49 -0.27 -0.24 -0.17 -0.28 -0.16 -0.26
P1'   2.05 -0.50 -0.03 -0.48  1.60  1.96  1.22 -0.39 -0.58 -0.44  1.38  0.07 -0.68 -0.33 -0.62  2.77 -0.33 -0.60  2.24  2.60

TEV L2F (T17S, H28L, T30A, N68D, E107D, D127A, F132L, S135F, T146S, D148P, S153N, F162S, S170A, N171D, N176T, N177M, V209M, W211I, M218F, K229E)
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
P6    0.50 -0.59 -0.04  0.02 -0.08  0.01 -0.01  0.11  0.02  0.02  0.07 -0.03 -0.03 -0.02  0.01 -0.07 -0.01  0.02  0.15 -0.72
P5    0.01 -0.57 -0.08  0.03 -0.05  0.00  0.09 -0.05  0.01  0.01  0.06 -0.23  0.07  0.03  0.09 -0.01 -0.03  0.00 -0.02  0.06
P4    0.03 -0.40 -0.21 -0.12 -0.11 -0.32 -0.41  3.80 -0.25  4.24  1.30 -0.16 -0.31 -0.38 -0.35 -0.32 -0.33  1.23 -0.20 -0.37
P3   -0.66 -0.86 -0.58 -0.79  1.42 -0.51 -0.13  1.82 -0.65  1.12  1.30 -0.78 -0.88  0.02  0.00 -0.85 -0.36  1.73  1.43  1.36
P2    0.50 -0.85 -0.83 -0.79  3.04 -0.64 -0.81  1.76 -0.82  0.53 -0.18 -0.70 -0.56 -0.80 -0.73 -0.28  0.02  1.94  2.67  2.22
P1    0.45 -0.66  0.11  1.74  1.97 -0.08  2.35 -0.70 -0.62  1.89  2.60  0.68 -0.54  3.04 -0.54  0.12  1.33 -0.76 -0.15  2.71
P1'   3.22 -0.51  0.98  1.36  2.07  2.13  1.99  3.16 -0.77  2.06  3.01 -0.80 -0.80  1.50 -0.74  2.45  2.26  2.69  2.52  2.87

TEV I138T, N171D, N176T
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
P6   -0.81 -0.50  0.59  0.78  1.41 -0.51 -0.12  1.53  1.45  1.60  1.70 -0.42  1.50  1.06 -0.08  0.28  0.48  1.46  0.42  0.19
P5    0.63 -0.19  1.66  1.25  2.49 -0.76  0.85  2.32 -0.57  2.11  2.16 -0.06  2.39  0.93 -0.48  0.07  0.72  1.97  3.04  2.81
P4   -0.20 -0.22 -0.41 -0.25 -0.12 -0.16 -0.24  1.69 -0.13  4.15 -0.01 -0.13 -0.04 -0.14 -0.18 -0.21 -0.28 -0.25 -0.09 -0.27
P3   -0.26 -0.35  0.84  0.26 -0.08  1.62 -0.12 -0.33 -0.19 -0.25 -0.39  0.00 -0.19 -0.19 -0.12 -0.23 -0.30 -0.26 -0.26  3.35
P2   -0.06 -0.40 -0.10 -0.30  5.21 -0.18 -0.34  0.05 -0.29 -0.21 -0.38 -0.33 -0.77 -0.31 -0.18 -0.21 -0.26  1.74 -0.27 -0.48
P1   -0.10 -0.12 -0.47 -0.12 -0.13 -0.01 -0.20 -0.25 -0.06 -0.17 -0.11 -0.18  0.08  4.05 -0.08 -0.05 -0.07 -0.26 -0.11 -0.24
P1'   1.24 -0.31 -0.44 -0.29  0.20  0.74  0.05 -0.25 -0.33 -0.27 -0.04 -0.37 -0.36 -0.43 -0.40  2.35 -0.41 -0.41  1.24 -0.06

TEV T146S, D148P, S153N, S170A, N177M
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
P6    0.48  0.35  1.31  2.68  0.52 -0.03  0.09  0.31 -0.16  0.26  0.36  0.18  0.15  0.34 -0.14 -0.01  0.10  0.21  0.18  0.19
P5    0.30 -0.48  0.20  0.30  0.28 -0.55 -0.06  0.42 -0.25  0.27  0.40 -0.23  0.30  0.30 -0.04 -0.08  0.17  0.32  0.30  0.17
P4   -0.34 -0.34 -0.39 -0.08 -0.35 -0.27 -0.37  4.37 -0.22  4.49  0.91 -0.20 -0.28 -0.24 -0.29 -0.33 -0.32  0.50 -0.31 -0.47
P3   -0.67 -0.73 -0.48 -0.59  0.83  0.02 -0.49  0.86 -0.72 -0.15 -0.38 -0.56 -0.74 -0.56 -0.45 -0.74 -0.69  0.81  0.66  2.27
P2   -0.36 -0.64 -0.58 -0.49  3.55 -0.47 -0.51  1.16 -0.65  0.14 -0.01 -0.42 -0.27 -0.56 -0.49 -0.37 -0.23  0.39  1.18 -0.43
P1    0.15 -0.43  0.14  1.21  1.55 -0.14  2.02 -0.78 -0.59  1.57  2.23  0.14 -0.61  2.16 -0.56  0.12  0.86 -0.78  0.08  1.99
P1'   1.00 -0.32  0.49  0.38  1.11  0.81  0.54  0.90 -0.60  0.33  0.73 -0.76 -0.81  0.13 -0.56  1.92  0.32 -0.08  1.00  2.71

TEV V209M, W211I, M218F
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
P6    0.91  0.31  0.39  2.61  0.19 -0.01  0.01  0.22  0.31  0.29  0.28  0.09  0.11  0.17 -0.12  0.05  0.10  0.18  0.05 -0.46
P5    1.06 -0.37  1.42  1.05  1.65 -0.78  1.14  2.04 -0.55  1.47  1.45  0.36  1.88  0.60 -0.45  0.19  0.84  1.90  1.78  1.94
P4   -0.31 -0.22 -0.07 -0.13 -0.23 -0.19 -0.30  2.68 -0.13  4.45  0.19 -0.11 -0.17 -0.18 -0.23 -0.27 -0.27  0.26 -0.19 -0.26
P3   -0.39 -0.46 -0.03 -0.30  0.32  0.72 -0.34 -0.10 -0.34 -0.17 -0.44 -0.11 -0.23 -0.32 -0.03 -0.45 -0.44 -0.09 -0.16  3.38
P2    0.06 -0.50 -0.42 -0.49  3.77 -0.28 -0.50 -0.26 -0.51 -0.37 -0.27 -0.27  0.41 -0.56 -0.38 -0.29 -0.21  0.05  1.41  0.49
P1   -0.03 -0.07  0.06  0.20 -0.22  0.07  0.04 -0.30 -0.08 -0.25  0.51 -0.01 -0.07  3.92 -0.12 -0.07 -0.02 -0.29 -0.23 -0.29
P1'   0.79 -0.13 -0.07 -0.16  0.39  1.08  0.56 -0.33 -0.39 -0.36 -0.08 -0.50 -0.46 -0.30 -0.44  2.44 -0.24 -0.46  0.88  0.55

TEV H28L, T30A
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
P6    0.56  0.18  0.24  0.33  0.58  0.04 -0.03  0.00  0.17  0.07  0.04  0.13  0.00 -0.05 -0.04  0.11 -0.01 -0.10  0.25  0.09
P5    0.11 -0.05  0.46  0.43  0.32 -0.39 -0.14  0.22 -0.45  0.32  0.66 -0.16  1.07  0.00 -0.38 -0.28  0.08  0.34  1.05  0.29
P4   -0.05 -0.03 -0.12  0.04 -0.11  0.03 -0.07  0.26  0.08  0.60 -0.12  0.04  0.06 -0.05  0.01 -0.01 -0.14 -0.10 -0.09 -0.20
P3    0.08 -0.07 -0.02  0.29  0.00  0.45  0.11 -0.16  0.23 -0.09 -0.20  0.22  0.11  0.03  0.04  0.03 -0.04 -0.20 -0.10  0.51
P2    0.07 -0.14 -0.13 -0.02  0.74 -0.08 -0.10  0.54  0.02 -0.09 -0.18  0.03 -0.35 -0.10  0.08  0.00  0.04  0.15 -0.18 -0.26
P1   -0.10 -0.09  0.16  0.01 -0.09  0.18 -0.07 -0.12  0.08 -0.16 -0.05  0.01  0.08  0.51  0.05 -0.02 -0.03 -0.15 -0.07 -0.14
P1'   0.02 -0.20 -0.21 -0.08  0.21  0.16 -0.04 -0.02 -0.18 -0.11  0.24 -0.21 -0.19 -0.22 -0.26  0.33 -0.07 -0.11  0.97  0.06

TEV T17S, N68D, E107D, D127A, F132L, S135F, F162S, K229E
Pos      A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y

64

Table 16 (Continued) P6 -0.85 -0.16 0.45 2.26 0.47 -0.06 -0.12 0.23 3.24 0.43 0.48 -0.09 -0.05 0.53 -0.08 -0.05 -0.02 0.16 -0.08 -0.05 P5 0.40 -0.42 0.10 0.90 0.92 -0.65 -0.31 1.30 -0.49 1.11 1.07 -0.44 1.43 0.59 -0.15 -0.30 0.53 1.13 1.47 0.92 P4 -0.21 -0.16 1.06 -0.06 -0.36 -0.06 -0.18 2.69 -0.12 2.92 0.00 -0.04 -0.10 -0.14 -0.12 -0.19 -0.21 0.00 -0.20 -0.36 P3 -0.27 -0.37 -0.34 -0.17 0.20 0.74 -0.23 -0.25 0.25 -0.26 -0.37 -0.08 0.06 -0.25 -0.09 -0.27 -0.35 -0.16 -0.15 2.34 P2 -0.25 -0.37 -0.24 -0.31 2.20 -0.33 -0.40 1.36 -0.39 -0.11 -0.17 -0.23 -0.31 -0.36 -0.21 -0.26 -0.16 0.80 0.19 -0.43 P1 -0.04 0.08 -0.45 0.08 -0.15 -0.12 0.00 -0.33 -0.12 -0.18 0.56 0.00 -0.02 2.36 -0.07 0.00 0.04 -0.36 -0.07 -0.27 P1' 0.91 -0.35 -0.10 -0.38 1.30 0.49 0.64 -0.05 -0.46 -0.26 0.58 -0.56 -0.41 -0.40 -0.50 1.69 -0.30 -0.35 2.37 1.85 TEV E107D, D127A, S135F, R203Q, K215E A C D E F G H I K L M N P Q R S T V W Y P6 -0.94 0.10 0.10 0.62 0.53 -0.01 -0.13 0.14 2.29 0.38 0.41 -0.11 -0.02 0.39 -0.04 -0.04 -0.08 0.14 0.00 0.44 P5 0.16 -0.39 -0.55 -0.44 0.12 -0.52 -0.44 0.51 0.36 0.46 0.68 -0.58 1.43 0.49 1.08 -0.37 0.19 0.33 0.24 -0.06 P4 -0.12 -0.07 0.56 -0.02 -0.20 -0.01 -0.18 1.05 0.02 0.94 0.03 0.06 -0.07 -0.06 -0.04 -0.13 -0.11 -0.06 0.00 -0.21 P3 0.00 -0.17 1.01 0.07 0.14 0.05 -0.10 -0.21 0.14 -0.02 -0.14 0.15 -0.12 -0.05 0.05 -0.09 -0.21 -0.18 0.14 0.35 P2 -0.12 -0.14 -0.10 -0.15 1.07 -0.24 -0.24 0.58 -0.11 -0.08 -0.18 -0.10 0.98 -0.17 0.01 -0.11 -0.13 0.27 0.07 -0.31 P1 -0.14 0.21 -0.49 -0.18 0.29 0.02 -0.18 -0.18 -0.07 0.12 0.13 -0.18 -0.03 0.76 -0.06 -0.11 -0.10 -0.24 0.60 -0.08 P1' 0.12 0.09 -0.27 -0.25 1.25 0.06 0.14 -0.21 -0.27 -0.15 0.14 -0.31 -0.27 -0.35 -0.30 0.72 -0.29 -0.35 1.50 1.24 TEV E107D, D127A, S135F A C D E F G H I K L M N P Q R S T V W Y P6 -0.81 0.25 0.57 2.22 1.16 -0.13 -0.08 0.90 2.59 1.14 0.94 -0.15 0.00 0.71 -0.14 -0.01 0.02 0.67 0.64 1.44 P5 0.15 -0.15 -0.08 0.20 0.42 -0.53 0.07 0.12 -0.18 0.27 0.47 -0.52 0.24 0.25 0.04 0.08 0.15 0.10 0.36 0.44 P4 -0.26 
-0.14 0.80 -0.05 -0.20 -0.15 -0.27 3.37 -0.16 3.03 0.46 -0.02 -0.18 -0.29 -0.20 -0.25 -0.24 0.24 -0.03 -0.30 P3 -0.34 -0.40 -0.28 -0.27 0.73 0.75 -0.28 -0.15 -0.31 -0.25 -0.44 -0.16 -0.45 -0.32 -0.02 -0.41 -0.44 0.03 -0.12 2.57 P2 -0.37 -0.51 -0.54 -0.52 3.67 -0.55 -0.53 2.99 -0.56 0.09 0.03 -0.35 -0.18 -0.59 -0.41 -0.40 -0.22 2.52 1.22 -0.50 P1 -0.15 0.28 -0.47 -0.02 -0.11 -0.07 -0.02 -0.29 -0.09 -0.13 0.71 0.06 -0.07 2.88 -0.12 -0.03 -0.04 -0.36 0.05 -0.23 P1' 1.45 0.19 -0.04 -0.39 1.59 1.17 0.89 -0.27 -0.46 -0.35 1.25 -0.48 -0.56 -0.28 -0.46 2.27 -0.34 -0.49 1.89 2.40
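The enrichment definition in the Table 16 caption can be made concrete with a short calculation. The Python sketch below assumes that, for one randomized position, the sequencing reads from the cleaved pool and from the reference (elution) pool have already been reduced to lists of single-letter residues; the function names and the toy reads are hypothetical and are not data from this work. Each amino acid's frequency in the cleaved pool is divided by its frequency in the reference pool and 1 is subtracted, so 0 means no enrichment, positive values mean enrichment, and -1 means the residue was never seen after cleavage.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(residues):
    """Fraction of reads carrying each of the 20 amino acids at one position.

    Symbols outside the 20 standard residues (e.g. stop codons) are ignored.
    """
    counts = Counter(residues)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids observed")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def enrichment(cleaved, reference):
    """Enrichment = frequency_cleaved / frequency_reference - 1 for each amino acid.

    Residues absent from the reference pool are reported as NaN because the
    ratio is undefined for them.
    """
    f_cleaved = frequencies(cleaved)
    f_reference = frequencies(reference)
    return {
        aa: f_cleaved[aa] / f_reference[aa] - 1.0 if f_reference[aa] > 0 else float("nan")
        for aa in AMINO_ACIDS
    }


if __name__ == "__main__":
    # Toy example (hypothetical reads): a perfectly uniform reference pool and a
    # cleaved pool dominated by glutamate at the randomized position.
    reference_reads = list(AMINO_ACIDS) * 5            # 100 reads, 5 per residue
    cleaved_reads = ["E"] * 6 + ["D"] * 2 + ["Q"] * 2  # 10 reads
    values = enrichment(cleaved_reads, reference_reads)
    for aa, value in sorted(values.items(), key=lambda item: -item[1])[:3]:
        print(f"{aa}: {value:+.2f}")
```

In this toy case glutamate makes up 60% of cleaved reads but only 5% of reference reads, giving an enrichment of 11; residues that never appear after cleavage sit at -1, the floor of the scale.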


[Figure 27 panels: sequence logos of enrichment at positions P6 to P1’ for WT TEV and TEV L2F on the ENLYFQS-based libraries XXXYFQS, EXXXFQS, ENXXXQS, ENLXXXS, and ENLYXXX, and for TEV L2F on the HPLVGHM-based libraries XXXVGHM, HXXXGHM, HPXXXHM, HPLXXXM, and HPLVXXX.]

Figure 27: Specificity profiles generated from libraries with three randomized substrate amino acids. The logos above were generated using phage substrate libraries containing windows of three randomized amino acids within either the ENLYFQS or the HPLVGHM substrate (corresponding enrichment values in Table 17). The nature of the library and the protease that was used in the selection is specified in the title above each sequence logo (with X denoting randomized substrate residues). The specificity profiles of wild-type and evolved TEV L2F protease using three-residue ENLYFQS libraries are largely similar to those seen in Figure 26 using single-site randomized substrate libraries. In the context of the HPLVGHM libraries, however, we observe that TEV L2F exhibits greater specificity for glutamate at P6 and for histidine at P1.


WT XXXYFQS
Pos A C D E F G H I K L M N P Q R S T V W Y
P6 -0.24 -0.53 1.34 2.27 0.83 -0.40 -0.10 0.30 -0.45 0.15 0.25 -0.31 -0.19 -0.27 -0.43 -0.01 -0.29 0.06 -0.15 1.28
P5 -0.13 -0.33 -0.23 -0.12 0.21 -0.50 -0.10 0.41 -0.35 0.86 0.12 -0.43 -0.34 0.09 -0.43 -0.30 -0.13 0.00 1.08 0.19
P4 -0.44 -0.37 -0.60 -0.61 -0.49 -0.56 -0.48 -0.12 -0.48 5.17 -0.42 -0.46 -0.50 -0.32 -0.39 -0.44 -0.49 -0.47 -0.36 -0.49

WT EXXXFQS
Pos A C D E F G H I K L M N P Q R S T V W Y
P5 -0.08 -0.44 0.54 0.30 -0.13 0.06 0.52 -0.17 -0.58 0.16 0.55 -0.68 0.22 -0.17 -0.21 -0.14 -0.43 -0.07 0.60 -0.38
P4 -0.69 -0.66 -0.55 -0.80 -0.69 -0.78 -0.63 2.73 -0.63 4.05 0.09 -0.70 -0.76 -0.73 -0.71 -0.73 -0.69 -0.49 -0.70 -0.72
P3 -0.73 -0.68 -0.59 -0.79 0.20 -0.76 -0.69 0.12 -0.75 -0.56 -0.66 -0.68 -0.82 -0.75 -0.67 -0.75 -0.76 -0.07 -0.56 5.51

WT ENXXXQS
Pos A C D E F G H I K L M N P Q R S T V W Y
P4 -0.32 -0.67 -0.94 -0.89 -0.67 -0.12 -0.78 0.46 -0.65 6.62 -0.14 -0.52 -0.77 -0.72 -0.73 -0.79 -0.66 -0.35 -0.59 -0.59
P3 -0.04 -0.66 -0.87 -0.14 0.31 -0.20 -0.84 -0.63 -0.69 -0.36 -0.50 -0.69 -0.85 -0.78 -0.53 -0.62 -0.71 -0.54 -0.12 2.91
P2 -0.82 -0.74 -0.93 -0.51 1.35 -0.62 -0.74 1.66 -0.81 0.03 -0.32 -0.78 -0.72 -0.74 -0.69 -0.59 -0.36 0.74 -0.50 -0.43

WT ENLXXXS
Pos A C D E F G H I K L M N P Q R S T V W Y
P3 -0.59 -0.47 -0.57 -0.66 -0.06 -0.53 -0.51 -0.50 -0.55 -0.42 -0.52 -0.64 -0.73 -0.62 -0.75 -0.70 -0.52 -0.45 -0.54 3.19
P2 -0.56 -0.29 -0.62 -0.71 -0.32 -0.49 -0.64 -0.37 -0.42 -0.19 -0.57 -0.72 -0.55 -0.69 -0.60 -0.05 -0.67 0.35 1.60 -0.33
P1 -0.57 -0.27 -0.61 -0.59 -0.54 -0.32 -0.25 -0.51 -0.51 -0.53 -0.26 1.29 -0.59 15.26 -0.55 -0.58 -0.56 -0.64 -0.68 -0.63

WT ENLYXXX
Pos A C D E F G H I K L M N P Q R S T V W Y
P2 -0.16 -0.46 -0.40 -0.27 1.83 0.55 -0.46 0.03 -0.65 -0.16 -0.27 -0.61 -0.63 -0.51 -0.51 -0.36 -0.39 0.70 -0.06 -0.29
P1 -0.31 -0.51 -0.43 -0.19 -0.47 -0.47 -0.30 -0.42 -0.59 -0.32 -0.25 -0.31 -0.63 11.84 -0.37 -0.43 -0.44 -0.42 -0.41 -0.39
P1' 0.78 -0.23 -0.56 -0.70 -0.47 0.41 1.40 -0.44 -0.43 -0.41 0.40 -0.11 -0.46 -0.15 -0.36 0.73 -0.45 -0.22 -0.06 -0.15

L2F XXXYFQS
Pos A C D E F G H I K L M N P Q R S T V W Y
P6 -0.23 -0.51 -0.16 -0.24 0.29 -0.20 0.05 -0.08 -0.34 0.31 0.03 -0.24 0.00 -0.30 -0.16 -0.03 0.09 0.06 0.26 -0.08
P5 -0.17 0.02 -0.17 -0.07 0.06 -0.44 -0.13 0.14 0.15 0.30 -0.35 -0.41 -0.33 -0.24 -0.08 -0.43 -0.21 0.18 0.86 0.82
P4 0.78 0.38 -0.81 -0.88 -0.61 -0.82 -0.87 3.19 -0.85 3.61 1.77 -0.87 -0.86 -0.88 -0.87 -0.69 -0.72 1.15 -0.76 -0.80

L2F EXXXFQS
Pos A C D E F G H I K L M N P Q R S T V W Y
P5 -0.08 0.31 0.43 0.18 0.07 0.05 -0.09 0.07 0.19 0.23 0.01 -0.77 0.05 -0.09 -0.05 -0.06 -0.39 -0.16 0.35 0.00
P4 -0.05 -0.23 -0.56 -0.78 -0.42 -0.75 -0.77 2.48 -0.76 3.20 0.75 -0.70 -0.76 -0.77 -0.76 -0.72 -0.67 -0.12 -0.51 -0.64
P3 -0.48 -0.49 -0.64 -0.55 0.22 -0.73 -0.53 0.61 -0.57 0.58 0.13 -0.64 -0.71 -0.38 -0.48 -0.64 -0.45 0.35 0.22 0.40

L2F ENXXXQS
Pos A C D E F G H I K L M N P Q R S T V W Y
P4 -0.48 -0.35 -0.88 -0.77 -0.05 -0.28 -0.81 2.31 -0.30 5.44 -0.30 -0.61 -0.65 -0.77 -0.77 -0.69 -0.72 -0.67 -0.47 -0.72
P3 -0.52 -0.57 -0.81 -0.35 -0.11 -0.35 -0.84 -0.13 -0.67 0.77 0.33 -0.52 -0.15 -0.54 -0.64 -0.48 -0.25 0.18 0.09 0.34
P2 -0.68 -0.45 -0.82 -0.61 1.41 0.18 -0.81 -0.24 -0.94 0.18 -0.60 -0.81 -0.26 -0.46 -0.59 0.15 -0.53 -0.17 0.54 -0.01

L2F ENLXXXS
Pos A C D E F G H I K L M N P Q R S T V W Y
P3 -0.73 -0.68 -0.67 -0.77 0.15 -0.70 -0.67 0.54 -0.54 0.52 -0.41 -0.74 -0.78 -0.41 -0.77 -0.77 -0.65 0.09 -0.25 0.97
P2 0.44 -0.54 -0.64 -0.71 1.06 -0.09 -0.78 0.04 -0.66 -0.30 -0.73 -0.82 -0.61 -0.78 -0.78 -0.40 -0.61 -0.08 0.81 0.23
P1 -0.35 -0.59 -0.71 0.92 0.74 -0.61 3.94 -0.77 -0.75 -0.49 2.88 1.81 -0.66 2.82 -0.63 -0.68 -0.53 -0.79 -0.74 0.27

L2F ENLYXXX
Pos A C D E F G H I K L M N P Q R S T V W Y
P2 0.14 -0.29 -0.50 -0.79 2.09 -0.39 -0.72 -0.53 -0.76 -0.16 -0.53 -0.68 -0.35 -0.69 -0.72 -0.40 -0.35 0.17 1.11 0.11
P1 -0.66 -0.67 -0.62 0.52 0.13 -0.58 2.90 -0.71 -0.71 -0.54 2.37 0.39 -0.71 2.81 -0.64 -0.70 -0.29 -0.74 -0.66 -0.02
P1' 0.51 -0.29 -0.51 -0.54 -0.22 0.18 0.40 -0.29 -0.74 0.04 1.04 -0.14 -0.75 -0.46 -0.67 0.68 -0.24 0.37 1.10 -0.41

L2F XXXVGHM
Pos A C D E F G H I K L M N P Q R S T V W Y
P6 -0.09 -0.21 -0.44 4.48 -0.23 -0.09 0.06 -0.01 0.17 -0.12 0.23 0.10 0.05 -0.28 -0.10 -0.16 0.01 0.03 0.02 0.26
P5 -0.01 -0.11 -0.11 0.05 0.38 -0.30 -0.16 0.00 1.03 0.06 -0.21 0.31 -0.43 0.03 -0.21 -0.18 -0.36 -0.03 0.12 0.25
P4 -0.14 0.13 -0.38 -0.47 -0.12 -0.16 -0.09 -0.19 0.17 1.12 -0.35 -0.23 0.18 -0.44 -0.15 -0.31 -0.20 -0.13 -0.10 -0.38

L2F HXXXGHM
Pos A C D E F G H I K L M N P Q R S T V W Y
P5 -0.12 0.32 0.18 -0.48 0.21 -0.55 -0.19 0.17 -0.50 -0.16 -0.20 -0.07 -0.17 -0.03 -0.38 -0.36 -0.34 -0.15 0.41 -0.01
P4 -0.32 -0.11 -0.69 -0.70 -0.15 -0.22 -0.32 0.03 -0.24 1.28 0.02 -0.10 -0.14 -0.34 -0.37 -0.05 -0.42 0.00 -0.21 -0.20
P3 -0.30 -0.18 -0.28 -0.12 0.16 -0.49 -0.17 0.66 -0.22 0.33 0.19 -0.25 -0.62 -0.13 -0.36 -0.34 -0.28 0.08 0.02 0.16

L2F HPXXXHM
Pos A C D E F G H I K L M N P Q R S T V W Y
P4 -0.87 -0.84 -0.91 -0.91 -0.82 -0.88 -0.91 1.28 -0.85 6.50 -0.83 -0.94 -0.86 -0.88 -0.83 -0.85 -0.89 -0.83 -0.83 -0.87
P3 -0.90 -0.76 -0.23 -0.82 -0.49 -0.88 -0.87 1.92 -0.85 0.64 -0.18 -0.85 -0.87 -0.70 -0.85 -0.89 -0.82 0.99 -0.74 1.37
P2 -0.08 -0.28 -0.75 -0.78 2.00 -0.77 -0.86 -0.78 -0.91 -0.09 -0.84 -0.86 -0.87 -0.88 -0.89 -0.66 -0.52 0.73 0.71 -0.89

L2F HPLXXXM
Pos A C D E F G H I K L M N P Q R S T V W Y
P3 -0.70 -0.60 -0.49 -0.69 0.03 -0.78 -0.72 0.18 -0.76 2.63 -0.40 -0.69 -0.74 -0.70 -0.72 -0.76 -0.70 -0.04 0.28 0.27
P2 0.01 -0.55 -0.45 -0.72 2.43 -0.69 -0.69 -0.12 -0.72 -0.26 -0.68 -0.71 -0.71 -0.76 -0.75 -0.48 -0.56 0.49 0.71 0.60
P1 -0.02 -0.56 -0.71 -0.65 0.83 -0.80 4.96 -0.65 -0.72 -0.62 0.38 -0.21 -0.78 2.19 -0.75 -0.52 -0.47 -0.62 -0.24 -0.43

L2F HPLVXXX
Pos A C D E F G H I K L M N P Q R S T V W Y
P2 0.04 0.02 -0.47 -0.45 0.83 -0.51 -0.46 -0.34 -0.43 0.06 -0.29 -0.45 -0.41 -0.38 -0.42 -0.46 -0.45 -0.17 1.46 0.00
P1 -0.34 -0.22 -0.41 -0.21 0.22 -0.49 5.04 -0.26 -0.50 -0.24 1.19 0.01 -0.44 -0.13 -0.44 -0.47 -0.38 -0.39 -0.40 -0.03
P1' 0.36 -0.17 -0.32 -0.36 -0.20 0.10 -0.27 0.28 -0.49 0.17 0.37 -0.18 -0.42 -0.30 -0.44 0.01 -0.10 0.64 0.54 0.02

Table 17: Phage display enrichment values from selections on libraries with three randomized residues. Each sub-table within the larger table represents the amino acid enrichment values generated for the given genotype of TEV protease on the specified library (with randomized residues denoted as X). Each set of three rows contains enrichment values after two rounds of selection performed on the library in which the corresponding three positions within either the ENLYFQS or HPLVGHM motif were randomized. The enrichment value for each amino acid identity at a given position was calculated as frequency_cleaved / frequency_control_selection - 1. The cells are shaded on a linear scale from red to blue; this color scale is normalized for each sub-table, with the lowest number in the sub-table being the darkest red and the highest number in the sub-table being the darkest blue.
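The shading rule stated in the captions of Tables 16 and 17 (a linear red-to-blue scale normalized separately for each sub-table) amounts to min-max scaling followed by linear color interpolation. A minimal Python sketch follows; the exact colors used in the thesis are not specified, so the pure red (255, 0, 0) and pure blue (0, 0, 255) endpoints and the direct red-to-blue interpolation are assumptions, and the function names are hypothetical. For brevity the example normalizes only two rows of the wild-type TEV sub-table from Table 16, so its minimum differs from that of the full seven-row sub-table.

```python
def normalize_subtable(rows):
    """Min-max scale every value in one sub-table to [0, 1].

    rows: dict mapping a substrate position (e.g. "P6") to its 20 enrichment
    values. The minimum over all rows maps to 0 and the maximum to 1.
    """
    values = [v for row in rows.values() for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    return {
        pos: [(v - lo) / span if span else 0.5 for v in row]
        for pos, row in rows.items()
    }


def red_to_blue(t):
    """Linearly interpolate from red (t = 0) to blue (t = 1) as an (R, G, B) tuple."""
    return (round(255 * (1 - t)), 0, round(255 * t))


# P4 and P1 rows of the wild-type TEV sub-table in Table 16 (order A C D ... Y)
WT_TEV_ROWS = {
    "P4": [-0.34, -0.29, -0.31, -0.15, -0.33, -0.31, -0.40, 5.16, -0.29, 3.16,
           0.68, -0.32, -0.37, -0.39, -0.33, -0.35, -0.31, 0.39, -0.21, -0.39],
    "P1": [-0.22, -0.06, -0.14, -0.12, -0.16, -0.15, -0.08, -0.24, -0.24, -0.18,
           1.29, -0.02, -0.22, 5.49, -0.27, -0.24, -0.17, -0.28, -0.16, -0.26],
}

if __name__ == "__main__":
    shaded = normalize_subtable(WT_TEV_ROWS)
    print(red_to_blue(shaded["P1"][13]))  # Gln at P1, the largest value: (0, 0, 255)
    print(red_to_blue(shaded["P4"][6]))   # His at P4, the smallest value: (255, 0, 0)
```

Normalizing per sub-table, as the captions describe, makes the darkest cells comparable within a genotype but not across genotypes; absolute enrichment values should be compared from the numbers themselves.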


Section 2.5: Specificity Profiling Reveals Functionally Independent TEV Mutation Groups

To illuminate the molecular basis of the evolved changes in substrate specificity, we generated TEV mutants containing small subsets of mutations and profiled their substrate specificities using substrate libraries in which a single residue of the ENLYFQS substrate was randomized. A number of mutations were predicted to influence solubility and stability based on previous reports65-67 or on their distance from the substrate in the crystal structure49. We constructed various combinations of the predicted solubility mutations (T17S, N68D, E107D, D127A, F132L, S135F, F162S, R203Q, K215E, K229E), as well as mutants that, based on the emergence of these mutations during PACE, putatively influence specificity at P1 (T146S, D148P, S153N, S170A, N177M), P6 (I138T, N171D, N176T), and P2 (V209M, W211I, M218F).

All of the tested combinations of mutations yielded proteases that retained activity to varying degrees (Figure 28), despite being removed from their PACE-evolved contexts. As expected, the variants carrying the predicted solubility-enhancing mutations exhibited no substantial change in specificity (Figure 29). The P2 variant also did not display any substantial specificity changes (Figure 29), consistent with the lack of a strong change in P2 specificity in the TEV L2F specificity profile.

[Figure 28 image: MBP–GST cleavage reactions with no protease, wild-type TEV, and the TEV variants described in the caption; labeled species: MBP–GST, MBP, TEV, and GST.]

Figure 28: TEV protease variants containing subsets of TEV L2F mutations are all active. TEV protease variants were engineered to contain groups of mutations taken from the L2F variant. These enzymes were purified and assayed in vitro on the test substrate MBP–GST, a fusion protein in which MBP is linked to GST through a cleavable linker containing the wild-type substrate motif ENLYFQS. Approximately 1 µg of each protease was incubated for 3 h at 30 °C with 5 µg of MBP–GST. All assayed variants retained proteolytic activity despite the naïve genetic dissection of mutations.


Mutations in the P6 variant (I138T, N171D, N176T) are sufficient to confer loss of glutamate specificity at P6 with no other obvious changes to substrate preferences (Figure 26D), suggesting some degree of modularity in protease-substrate interactions. Conversely, the P1 variant (T146S, D148P, S153N, S170A, N177M) not only exhibits broadened specificity at the P1 site but also shows a concurrent increase in preference for aliphatic side chains at P3 (Figure 26E). The mutations within these two variants appear to be responsible for the three largest differences in substrate specificity between wild-type TEV and TEV L2F.

[Figure 29 panels: sequence logos of enrichment at positions P6 to P1’ for WT TEV; H28L, T30A; V209M, W211I, M218F; E107D, D127A, S135F; E107D, D127A, S135F, R203Q, K215E; and T17S, N68D, E107D, D127A, F132L, S135F, F162S, K229E.]

Figure 29: Specificity profiles of TEV variants possessing wild-type-like specificity. The logos above were generated using phage substrate libraries each containing a single randomized amino acid within the ENLYFQS substrate (corresponding enrichment values in Table 16). The genotype of the protease used in each selection is specified in the title above each sequence logo.


Section 2.6: Evolved TEV L2F Cleaves Human IL-23

Next, we tested the ability of the evolved TEV L2F protease to cleave full-length human IL-23 protein. In its active form, IL-23 is a heterodimer of the IL-12p40 subunit and the IL-23p19 subunit. We incubated TEV L2F with IL-23 in either its heterodimeric or its monomeric p19 state and observed by Western blot a single cleavage product for the IL-23 heterodimer and, in the presence of excess protease, two cleavage products for the monomeric IL-23p19 substrate (Figure 30).

[Figure 30A: western blot with size markers at 20, 15, and 10 kDa and bands numbered 1 to 5. Lanes, left to right: TEV L2F (μg) -, 5, -, 5; IL-23 (μg) 5, 5, -, -; IL-23p19 (μg) -, -, 0.45, 0.45.]

[Figure 30B: IL-23p19 amino acid sequence; "_" marks the cleaved peptide bonds.]
RAVPGGSSPAWTQCQQLSQKLCTLAWSA
HPLVGH_MDLREEGDEETTNDVPHIQCGD
GCDPQGLRDNSQFCLQRIHQGLIFYEKLL
GSDIFTGEPSLLPDSPVGQLHASLLGLSQL
LQPEGHHWETQQMPSLSPSQPWQRLLLR
FKILRNLQAFVAVAARVFAH_GAATLSP

Figure 30: Identification of IL-23 cleavage sites by Western blot and LC-MS. IL-23 heterodimer (IL-23) and IL-23 monomer (IL-23p19) were incubated with and without TEV L2F. Reaction mixtures were analyzed by LC-MS and (A) visualized by western blot with an anti-IL-23p19 monoclonal antibody. Bands 1 and 3 correspond to intact IL-23p19; differences in size are due to carboxy-terminal affinity purification tags. Cleavage product bands 2 and 4 correspond to IL-23 fragments with new masses that are 3,598 Da less than the corresponding starting materials. This mass difference matches the fragment liberated by cleavage at the target site (HPLVGH//M). Cleavage of the monomer also results in a second product (band 5) with a mass that matches IL-23 cleaved at both the target site (HPLVGH//M) and an off-target site (ARVFAH//G). (B) The IL-23p19 amino acid sequence is shown with the target cleavage site in blue and the off-target site in orange.

IL-23 digestion reactions were subjected to LC-MS to identify the cleavage products. The heterodimer cleavage reaction generated a new protein with a mass 3,598 Da less than that of the starting material, matching the fragment liberated by cleavage of the target peptide bond at the HPLVGH//M sequence (Figure 31). Data from the monomer cleavage reaction in the presence of a 1.5-fold excess of TEV L2F revealed two new masses. The minor proteolytic product corresponds to a single cleavage at the on-target site (HPLVGH//M) (Figure 32). The more abundant ion matched proteolysis at both the on-target site (HPLVGH//M) and an additional off-target site (ARVFAH//G) that is consistent with the L2F specificity profile shown in Figure 26C. The absence of an ion corresponding to IL-23 cleaved only at the off-target site suggests that the on-target site is kinetically favored by TEV L2F. This off-target site was cleaved only in the monomeric substrate and not in the heterodimer, presumably because it is occluded by the IL-12p40 subunit in the heterodimer structure68.
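The mass arithmetic behind the target-site assignment can be checked directly. Hydrolysis of one peptide bond adds a water molecule that ends up split between the two fragments, so the mass lost from the observed chain equals the summed residue masses of the released N-terminal peptide (the 34 residues preceding HPLVGH//M in the Figure 30B sequence). The Python sketch below uses standard average residue masses (values from common reference tables, not from this thesis) together with the deconvoluted masses reported in Figures 31 and 32; it is an illustrative check, not the analysis software used here.

```python
# Standard average residue masses in Da (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

# Residues N-terminal to the HPLVGH//M target bond (Figure 30B). Cysteines are
# treated as free thiols because the samples were analyzed under reducing conditions.
RELEASED_PEPTIDE = "RAVPGGSSPAWTQCQQLSQKLCTLAWSAHPLVGH"

# (intact, cleaved) deconvoluted masses in Da reported in Figures 31 and 32
OBSERVED = {
    "heterodimer p19 chain (Figure 31)": (19472.96, 15874.97),
    "monomeric p19 (Figure 32)": (22324.77, 18727.03),
}


def residue_mass_sum(peptide):
    """Sum of average residue masses, i.e. the mass removed from the chain by cleavage."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide)


if __name__ == "__main__":
    predicted_shift = residue_mass_sum(RELEASED_PEPTIDE)
    print(f"predicted mass loss: {predicted_shift:.2f} Da")
    for label, (intact, cleaved) in OBSERVED.items():
        observed_shift = intact - cleaved
        print(f"{label}: observed {observed_shift:.2f} Da, "
              f"difference {observed_shift - predicted_shift:+.2f} Da")
```

With these inputs the predicted loss is about 3,598.1 Da, within 0.5 Da of both observed differences (3,597.99 Da and 3,597.74 Da). The C-terminal tags cancel out of each difference because they are present on both the intact chain and the retained fragment.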

[Figure 31 panels: (A, C) total ion current, counts (%) vs. acquisition time (min); (B, D) deconvoluted mass spectra, counts vs. deconvoluted mass (amu). Labeled peak masses (Da): 15,874.97; 19,472.96; 22,324.58; 27,768.39; 35,908.61; 36,070.58; 55,538.38. Panel annotation: deconvoluted 19,472.96, expected 18,722, Δm 751.]

Figure 31: Identification of the cleavage site within IL-23 heterodimer by mass spectrometry. IL-23 was obtained in its native heterodimeric state following expression and purification from cultured mammalian cells (PHC9321, ThermoFisher). This protein was incubated under reducing conditions either in the presence or absence of TEV L2F. These samples were analyzed by LC-MS to yield total ion current (A, C) and the corresponding deconvoluted mass spectra (B, D). Both samples exhibit a cluster of masses around 36,000 Da corresponding to the multiple glycoforms of the IL-12p40 subunit. The unreacted sample (A, B) contains a mass of 19,472 Da, which is 751 Da greater than the expected mass of IL-23p19 (18,722 Da), a difference likely caused by an unspecified C-terminal tag. The reaction mixture (C, D) contains a 27,768 Da species matching TEV L2F as well as a 15,875 Da species that matches the expected cleavage product plus the unspecified 751 Da C-terminal tag.

[Figure 32 panels: (A, C) total ion current, counts (%) vs. acquisition time (min); (B, D) deconvoluted mass spectra, counts vs. deconvoluted mass (amu). Labeled peak masses (Da): 14,526.98; 18,727.03; 22,324.77; 27,768.04. Panel annotation: predicted MW = 22,324.2.]

Figure 32: Identification of two cleavage sites within IL-23 monomer by mass spectrometry. IL-23p19 was expressed and purified from cultured HEK293T cells using a C-terminal Myc/DDK tag (TP309680, Origene). This protein was incubated under reducing conditions either in the presence or absence of TEV L2F. These samples were analyzed by LC-MS to yield total ion current (A, C) and the corresponding deconvoluted mass spectra (B, D). The unreacted sample (A, B) contains a mass of 22,324 Da, which matches the mass predicted from the IL-23p19 sequence and Myc tag in the product data. The reaction mixture (C, D) contains three additional masses: TEV L2F (27,768 Da), substrate cleaved only at the HPLVGHM target site (18,727 Da), and substrate cleaved at both the target site and the off-target site ARVFAHG (14,526 Da).


Section 2.7: Evolved TEV Protease Deactivates IL-23 and Prevents IL-17 Secretion

Finally, we tested the ability of the evolved TEV L2F protease to abrogate the biological activity of IL-23. We used a previously described IL-23 activity assay with primary isolates of mouse mononuclear splenocytes69. When cultured in the presence of IL-2 and IL-23, Th17 cells are stabilized and secrete IL-17 into the media supernatant, where it is quantified by ELISA. We observed a dose-dependent attenuation of IL-17 production when IL-23 was pre-incubated with TEV L2F (Figure 33). These pre-incubated samples were also analyzed by Western blot, which showed that the p40 subunit is unaffected by incubation with protease and that inhibition of IL-17 production is causally linked to IL-23p19 cleavage (Figure 34). Even a sub-stoichiometric dose of L2F protease (0.36 equivalents) resulted in conversion of greater than 50% of IL-23 into cleaved product (Figure 35) and the loss of nearly all IL-23-induced IL-17 secretion, consistent with catalytic deactivation of IL-23 by TEV L2F (Figure 36). In contrast, addition of a neutralizing IL-23 antibody elicits a dose-dependent attenuation of IL-17 production in which the minimum effective dose is stoichiometric with the IL-23 concentration (Figure 37). Direct addition of TEV L2F to splenocyte cultures in serum-containing media supplemented with IL-23 did not attenuate IL-17 secretion (Figure 37). While the presence of an equivalent concentration of serum did not inhibit cleavage in vitro (Figure 38), it is possible that the protease is sequestered by other secreted factors or cell-surface proteins within the complex culture media, or that IL-23 binding to IL-23R occurs faster than IL-23 proteolysis.


[Figure 33: bar chart of IL-17 (pg/ml) secreted by splenocytes (y-axis 0 to 800) under combinations of IL-23 (0.1 μg/ml), wild-type TEV (120 nM), TEV L2F (12, 60, or 120 nM), and anti-IL-23 (67 nM), with or without a 16 h premix at 4 °C; * p<0.05, ** p<0.005, unpaired t-test.]

Figure 33: Protease-mediated attenuation of IL-17 secretion in mouse splenocytes. The activity of IL-23 in vivo is mediated by stabilization of a T-helper cell lineage (Th17) that secretes IL-17, leading to downstream pro-inflammatory signals. This pathway can be assayed in a culture of mouse mononuclear splenocytes by measuring the amount of IL-17 secreted into the cell culture media using an ELISA. As a positive control, anti-IL-23 antibodies in a super-stoichiometric ratio prevent IL-17 secretion. Preincubation of IL-23 with evolved TEV L2F attenuates IL-17 secretion, demonstrating that cleavage of the HPLVGHM target site inactivates the immune signaling capabilities of IL-23. Error bars represent the standard deviation of three technical replicates.

[Figure 34 western blot lanes, left to right: IL-23 (0.54 μM) -, -, -, +, +, +, +, +; TEV -, L2F 36 μM, WT 36 μM, -, L2F 3.6 μM, L2F 18 μM, L2F 36 μM, WT 36 μM. Bands: IL-12p40, IL-23p19.]

Figure 34: Western blot of pre-mixed additives to splenocyte cell culture. IL-23 and TEV proteases were incubated for 16 h at 4 °C in the presence of BSA as a stabilizing carrier protein. Samples were prepared at 300× the concentration used in splenocyte cultures to enable detection of IL-23p19 and IL-12p40 by western blot. Neither subunit is proteolyzed by wild-type TEV protease, and IL-12p40 is also unaffected by TEV L2F. As expected, TEV L2F cleaves IL-23p19 at the HPLVGHM site in a dose-dependent manner. At the highest doses, off-target cleavage products are also observed. An aliquot of these samples was used directly in the cell culture experiments in Figure 33 to confirm that on-target proteolysis causes IL-23 loss of function.


[Figure 35 western blot lanes, left to right: IL-23 (0.54 μM) +, +, +, +; TEV -, L2F 0.22 μM, L2F 2.2 μM, L2F 22 μM. Bands: IL-12p40, IL-23p19.]

Figure 35: Western blot of pre-mixed additives to splenocyte cell culture. At approximately 0.40 molar equivalents of TEV L2F, greater than 50% of IL-23p19 is cleaved at the HPLVGHM site, consistent with the ability of TEV L2F to process substrates with multiple turnover. At the highest doses, off-target cleavage products are also observed. An aliquot of these samples was used directly in the cell culture experiments in Figure 36 to confirm that on-target proteolysis causes IL-23 loss of function.

[Figure 36: bar chart of IL-17 (pg/ml), y-axis 0 to 600. Conditions, left to right: premix 16 h at 4 °C -, +, +, +, +; TEV L2F -, -, 0.72 nM, 7.2 nM, 72 nM; IL-23 (1.8 nM) -, +, +, +, +; * p<0.05, unpaired t-test.]

Figure 36: TEV L2F catalytically deactivates IL-23 and prevents IL-17 secretion in mouse splenocytes. IL-17 is secreted by cultured mouse mononuclear splenocytes in response to human IL-23 in the media. The secretion of IL-17 can be prevented by pretreating IL-23 with TEV L2F at a dose that is less than half the molar equivalent of IL-23. Inhibition is first observed at 0.72 nM TEV L2F (compared with 1.8 nM IL-23), confirming that TEV L2F deactivates IL-23 with multiple turnover.
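The molar quantities in Figures 33 to 36 follow from simple unit conversions. The Python sketch below converts the 0.1 μg/ml IL-23 working concentration to nanomolar, scales culture concentrations to the 300-fold concentrated premix, and expresses each TEV L2F dose as molar equivalents of IL-23. The heterodimer mass used here (about 54.7 kDa, taken as the 18,722 Da expected IL-23p19 mass plus the roughly 36,000 Da IL-12p40 glycoform cluster in Figure 31) is an assumed approximation, so the nanomolar value is approximate; the function names are hypothetical.

```python
IL23P19_DA = 18_722   # expected IL-23p19 mass (Figure 31)
IL12P40_DA = 36_000   # approximate centre of the IL-12p40 glycoform cluster (Figure 31)
PREMIX_FOLD = 300     # premixes were prepared at 300x the culture concentration


def ug_per_ml_to_nM(conc_ug_per_ml, mass_da):
    """Convert a mass concentration to nanomolar (1 ug/ml = 1 mg/l = 1e-3 g/l)."""
    return conc_ug_per_ml * 1e-3 / mass_da * 1e9


def molar_equivalents(protease_nM, substrate_nM):
    """Protease dose expressed as a multiple of the substrate concentration."""
    return protease_nM / substrate_nM


if __name__ == "__main__":
    il23_nM = ug_per_ml_to_nM(0.1, IL23P19_DA + IL12P40_DA)
    print(f"0.1 ug/ml IL-23 is about {il23_nM:.2f} nM")
    print(f"IL-23 in the premix: {1.8 * PREMIX_FOLD / 1000:.2f} uM")
    print(f"120 nM protease in the premix: {120 * PREMIX_FOLD / 1000:.0f} uM")
    for dose_nM in (0.72, 7.2, 72):  # TEV L2F doses in Figure 36
        print(f"{dose_nM} nM TEV L2F = {molar_equivalents(dose_nM, 1.8):.1f} equivalents")
```

These conversions reproduce the figure values: roughly 1.8 nM IL-23 in culture, a 0.54 μM IL-23 premix, 36 μM protease in the premix for a 120 nM culture dose, and 0.4 equivalents for the lowest TEV L2F dose.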


[Figure 37: bar chart of IL-17 (pg/ml), y-axis 0 to 400, for splenocyte cultures receiving IL-23 (1.8 nM) with anti-IL-23 (0.13, 1.3, 13, or 67 nM), TEV L2F (0.72, 7.2, 72, or 360 nM), or wild-type TEV (360 nM) added directly to the media, plus controls without IL-23; * p<0.005, unpaired t-test.]

Figure 37: TEV L2F must be pre-incubated with IL-23 to prevent IL-17 secretion in mouse splenocytes. IL-17 is secreted by cultured mouse mononuclear splenocytes in response to human IL-23 in the media. This response can be prevented by adding IL-23-neutralizing antibodies directly to the cell culture media. A dose-dependent response is observed, in which the antibody neutralizes IL-23 through a stoichiometric binding mechanism. Inhibition begins at approximately 1.3 nM antibody (compared with 1.8 nM IL-23). Evolved TEV L2F, when added directly to cell culture media, is unable to prevent IL-23 from stimulating IL-17 secretion. This shortcoming is likely due to slower kinetics of IL-23 degradation by the protease compared with IL-23 receptor binding. Furthermore, the proteolysis reaction velocity at these physiological concentrations will be orders of magnitude slower than in the 300-fold concentrated pre-incubation experiments (Figure 36). Alternatively, the protease may be sequestered by other cell-surface or secreted factors, preventing IL-23 proteolysis.

[Figure 38 western blot lanes, left to right: FBS 10% -, -, -, -, -, +; IL-23 (0.54 μM) +, +, +, +, +, +; TEV -, L2F 0.22 μM, L2F 2.2 μM, L2F 22 μM, WT 22 μM, L2F 22 μM. Bands: IL-12p40, IL-23p19.]

Figure 38: TEV L2F is unaffected by the addition of FBS to in vitro cleavage assays. IL-23 and TEV proteases were incubated for 16 h at 4 °C in the presence of BSA as a stabilizing carrier protein. The addition of 10% fetal bovine serum (FBS) to the assay buffer had no effect on the efficiency of cleavage by TEV L2F. Because the same percentage of FBS was used to supplement the cell culture media, this result suggests that serum components are not responsible for the loss of TEV L2F activity when the protease is added directly to splenocyte cultures.


Section 2.8: Discussion

The generation of on-demand biochemical catalysts has been a longstanding interest of the scientific community70. Previous efforts to evolve protease specificity have succeeded in altering the substrate specificity of model proteases at a single substrate position18-22. By combining 2,500 total generations of PACE across three diverging evolutionary trajectories, evolutionary stepping-stone substrates that guided populations through long evolutionary trajectories, and both targeted and elevated random mutagenesis, we evolved TEV protease variants that cleave a substrate dramatically different from the wild-type substrate, with only a modest decrease in kinetic parameters compared with wild-type TEV on its consensus substrate. This work demonstrates for the first time that a protease can be reprogrammed through laboratory evolution to cleave and deactivate a disease-associated human protein.

The above data demonstrate that the evolved L2F protease has sufficient specificity to avoid degrading itself, the numerous proteins required for the PACE selection, MBP, GST, or essential protein components of the IL-23 assay other than IL-23. The ability of the L2F protease to cleave IL-23 is also not inhibited by the presence of 10% fetal bovine serum, which contains high concentrations of other proteins. Overall, the specificity of the evolved L2F protease resembles that of many natural proteases, which, like L2F, reject the majority of possible amino acids but accept mixtures of others at each recognized substrate position61-64.

While these results suggest that evolved proteases will be useful for many research and industrial applications, therapeutic applications pose additional challenges. The evolution of proteases under many generations of positive selection alone can result in proteases that accept the target substrate but do not reject the wild-type or intermediate substrates. In cases in which it is desirable to reject cleavage of certain non-target substrates, a PACE negative selection strategy that applies selection pressure against non-target cleavage may be useful28. Moreover, any foreign protein therapeutic that circulates in the body poses a substantial immunogenicity risk if administered repeatedly. Consequently, circulating human proteases, rather than a viral protease such as TEV, may be ideal starting points for therapeutic protease evolution.

These challenges notwithstanding, this study provides a foundation for the directed evolution of proteases with highly altered specificities. We demonstrate the feasibility of catalytically inactivating a target protein with an evolved protease, which in some cases may offer substantial benefits over stoichiometric binding by a neutralizing antibody. In addition to potency advantages, we anticipate that evolved proteases may also enable research and therapeutic applications that are unavailable to antibodies, such as proteolysis-induced gain-of-function and proteolysis-mediated alteration of a protein’s import, export, subcellular localization, half-life, or post-translational modification state.

Section 2.9: Methods

Ranking of Target Sites within Extracellular Proteins. A list of human extracellular and transmembrane proteins and their corresponding amino acid sequences was tabulated using the ProteinData functionality in Mathematica 10. These data were transferred into MATLAB for further processing by a customizable script that performed the following operations. A rating matrix 7 positions wide (for the seven sites within the TEV protease recognition motif) by 20 long (one row per amino acid) was manually populated with biochemical specificity data or subjective “evolvability” integer ratings. Each protein was converted into a binary sparse matrix with one row per residue of the protein sequence and 20 columns, one for each amino acid. For each protein matrix, each window of 7 consecutive rows was multiplied by the rating matrix, and the trace of the resulting 7x7 product matrix provided a score for the corresponding heptapeptide. For each extracellular protein, the best score and the corresponding peptide and starting-residue index were saved. Once all protein sequences had been processed, we sorted the protein names, along with their best-match candidate substrate sequences, by score.
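The scoring step above can be sketched in Python as follows. This is a hypothetical reimplementation, not the original Mathematica/MATLAB script: the illustrative rating matrix (which rewards only the canonical ENLYFQS motif) and the treatment of non-standard residues as all-zero rows are assumptions. Because only the trace of each 7x7 product is needed, the sketch computes just its diagonal, which equals the sum of the ratings of the residue found at each of the seven motif positions.

```python
# Hypothetical Python sketch of the heptapeptide scoring scheme.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
WINDOW = 7  # positions P6-P1' of the TEV recognition motif


def one_hot(sequence):
    """Binary matrix with one row per residue and one column per amino acid.

    Residues outside the 20 standard amino acids (e.g. X) get an all-zero
    row; this is an assumption, not stated in the original methods.
    """
    rows = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:
            row[AA_INDEX[residue]] = 1
        rows.append(row)
    return rows


def window_score(window, rating):
    """Trace of (7 x 20 window) @ (20 x 7 rating matrix).

    Diagonal entry i is sum_k window[i][k] * rating[k][i], i.e. the rating
    of the residue found at motif position i.
    """
    return sum(
        window[i][k] * rating[k][i]
        for i in range(WINDOW)
        for k in range(len(AMINO_ACIDS))
    )


def best_site(sequence, rating):
    """Return (score, heptapeptide, 1-based start index) of the best window."""
    rows = one_hot(sequence)
    best = None
    for start in range(len(rows) - WINDOW + 1):
        score = window_score(rows[start:start + WINDOW], rating)
        if best is None or score > best[0]:
            best = (score, sequence[start:start + WINDOW], start + 1)
    return best


def rank_proteins(proteins, rating):
    """Sort proteins by the score of their best-matching candidate substrate."""
    results = []
    for name, sequence in proteins.items():
        if len(sequence) < WINDOW:
            continue  # too short to contain a heptapeptide
        score, peptide, index = best_site(sequence, rating)
        results.append((name, score, peptide, index))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Illustrative rating matrix (20 rows x 7 columns) rewarding ENLYFQS only.
rating = [[0] * WINDOW for _ in AMINO_ACIDS]
for pos, aa in enumerate("ENLYFQS"):
    rating[AA_INDEX[aa]][pos] = 1

print(best_site("AAAENLYFQSAAA", rating))  # (7, 'ENLYFQS', 4)
```

In practice the rating matrix would hold graded values for every amino acid at each position, so that near-matches to the motif still rank above unrelated sequences.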

Cloning of Accessory Plasmids, Expression Vectors and Phage Libraries. All primers were designed to perform USER cloning71 and ordered from Integrated DNA Technologies (IDT). For the cloning of phage libraries, NNK codons were generated using hand-mixed phosphoramidite ratios to provide uniform incorporation rates. All PCR reactions were performed using Phusion U Hot Start polymerase (Thermo-Fisher).

For the assembly of APs and expression vectors, PCR products were purified using EconoSpin columns (Epoch Life Sciences) and assembled with DpnI and USER enzyme in CutSmart Buffer (New England BioLabs).

Following assembly, plasmids were transformed into NEB Turbo Competent E. coli cells (New England BioLabs).

For the assembly of phage libraries, PCR products were purified by gel electrophoresis and extracted using the MinElute kit (Qiagen). Following an assembly reaction identical to that used for the APs, the USER reaction was desalted using the MinElute PCR purification kit (Qiagen) prior to electroporation into competent E. coli S1059 cells24 (for SP libraries) or NEB Turbo Electrocompetent E. coli cells24 (for substrate display phage libraries). Phage libraries were grown overnight in 2xYT and filtered through sterile 0.22 µm membranes to eliminate host cells. The titers of phage libraries were evaluated by plaque assay using strain S1059 as the host. Briefly, phage were prepared in four 50-fold serial dilutions of 50 µL. To each dilution was added 100 µL of fresh host cell culture at approximately OD600 = 1.0, followed by 900 µL of top agar (2xYT, 6 g/L agar). The mixture was mixed by pipet, then transferred to a quarter-plate prepared with a thin layer of bottom agar (2xYT, 16 g/L agar). Plaque assays were incubated overnight at 37 °C.
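As a worked example of the plaque-assay arithmetic, the sketch below converts a plaque count into the titer of the undiluted stock. It assumes, beyond what is stated above, that the k-th dilution has been diluted 50^k-fold and that the entire 50 µL of that dilution is plated with host cells; the function name and its defaults are hypothetical.

```python
def titer_pfu_per_ml(plaques, dilution_index, fold=50, plated_ul=50.0):
    """Titer (pfu/mL) of the undiluted stock from one plate of a serial dilution.

    Assumes dilution_index k means the stock was diluted fold**k and that the
    whole plated_ul volume of that dilution was mixed into top agar.
    """
    dilution_factor = fold ** dilution_index
    return plaques * dilution_factor / (plated_ul / 1000.0)


# Hypothetical count: 20 plaques on the third 50-fold dilution.
print(f"{titer_pfu_per_ml(20, 3):.1e} pfu/mL")  # 5.0e+07 pfu/mL
```

Because each step is 50-fold, adjacent plates differ by well over an order of magnitude, so typically only one of the four dilutions gives a countable number of plaques.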

In order to assess library quality, 12 clones were sequenced to confirm diversity at the targeted amino acid positions. Briefly, individual plaques were picked with a pipet tip in order to provide template material (SP-infected E. coli) for rolling circle amplification (TempliPhi, GE Healthcare). Sanger DNA sequencing was performed using primer BCD1136 (Table 19), and results were aligned and tabulated using SeqMan (DNAStar).

PACE Experiments. PACE experiments were performed as previously described24, 28, 42, 53, 54, 72. E. coli strain S1030 was co-transformed by electroporation with a mutagenesis plasmid (MP6)25 and an accessory plasmid (plasmids are described in Table 18 and detailed in Figure 12). Chemostats containing 80 mL of Davis Rich Media with 22.5 µg/mL carbenicillin and 15 µg/mL chloramphenicol were inoculated with overnight starter cultures and grown at 37 °C while mixing at 250 rpm via a magnetic stir bar. Once the chemostat culture reached approximately OD600 = 1.0, we began dilution with fresh media at a rate of 80-100 mL/h, with the waste needle set at a height corresponding to 80 mL. At the same time, we began flowing chemostat culture at approximately 10-20 mL/h into a lagoon with a waste needle set at a height corresponding to 15 mL. The total flow rate through each lagoon was set according to the difficulty of a given experiment, with slower dilution used for more challenging evolutions. For the full duration of the experiment, 10% w/v arabinose solution was syringe-pumped into the lagoons at a rate of 0.5-1.0 mL/h.
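The flow settings above imply per-hour dilution rates, D = flow rate / culture volume. The calculation below is a sketch that assumes the waste needles hold the chemostat at 80 mL and the lagoon at 15 mL, and it counts the arabinose inflow toward the lagoon flow. In a well-mixed continuous-culture vessel, 1/D is the mean residence time, and a population must replicate at least as fast as D to avoid being washed out, which is consistent with using slower lagoon dilution for harder selections.

```python
def dilution_rate(flow_ml_per_h, volume_ml):
    """Culture volumes exchanged per hour (D = F / V)."""
    return flow_ml_per_h / volume_ml


# Chemostat: 80 mL held by the waste needle, 80-100 mL/h fresh media.
for flow in (80, 100):
    print(f"chemostat {flow} mL/h -> D = {dilution_rate(flow, 80):.2f} /h")

# Lagoon: 15 mL, 10-20 mL/h host culture plus 0.5-1.0 mL/h arabinose.
for culture, arabinose in ((10, 0.5), (20, 1.0)):
    d = dilution_rate(culture + arabinose, 15)
    print(f"lagoon {culture}+{arabinose} mL/h -> D = {d:.2f} /h, "
          f"mean residence {60 / d:.0f} min")
```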

Experiments starting with an NNK-mutagenized SP library were initiated with a lagoon inoculum of 1-2 mL of phage library containing 10^8-10^10 pfu/mL. For all other experiments, lagoons were inoculated with 50-100 µL of filtered phage population from the last time point of the previous PACE experiment. In PACE experiments using mixtures of host cell cultures (see the main text), lagoons received an influx of cell culture from two separate chemostats containing hosts bearing two different APs (combined rate of 10-20 mL/h) for a period of 24-48 h.

Phage samples were collected from lagoon waste outflow lines at 24 h intervals and passed through a 0.22 µm sterile filter to remove host cells. The titers of phage samples were evaluated by plaque assay using strain S1059 as the host.

At the end of each PACE experiment, eight individual plaques were picked with a pipet tip in order to provide template material (SP-infected E. coli) for rolling circle amplification (TempliPhi, GE Healthcare). The same pipet tip was subsequently transferred to a 96-deep-well culture plate containing 2xYT media for growth overnight at 37 °C. After a PACE experiment, enriched mutations should be present within multiple clones of this small sample of eight population members. Sanger DNA sequencing was performed using primer BCD1136 (Table 19), and results were aligned and tabulated using SeqMan (DNAStar).

Luminescence Assays of Evolved Clones. Phage from clones chosen for characterization were sterile-filtered from the corresponding position within the 96-well culture plate. Saturated overnight cultures of S1030 cells containing a substrate AP were used to initiate luciferase assays in 96-well culture plates. Approximate volumes were 500 µL of 2xYT, 50 µL of overnight starter culture, and 10 µL of filtered phage sample. All assays included a negative control (no phage), a positive control (SP encoding T7 RNAP), and wild-type TEV SP as a reference. Experimental and control conditions were performed in triplicate. After 3-5 h of growth in a 37 °C shaker, 100 µL was transferred to a clear-bottom assay plate to measure OD600 and luminescence on a Tecan Infinite Pro plate reader. Measurements were analyzed as OD-normalized values and as luminescence fold-change over the negative control.
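A minimal sketch of the two analyses named above, OD-normalized luminescence and fold-change over the no-phage negative control, is shown below. The readings are hypothetical, and computing the fold-change from OD-normalized triplicate means (rather than from raw luminescence) is an assumption.

```python
from statistics import mean, stdev


def od_normalized(replicates):
    """Luminescence / OD600 for each (luminescence, od600) replicate."""
    return [lum / od for lum, od in replicates]


def fold_change(sample, negative):
    """Mean OD-normalized signal of a sample over the no-phage control."""
    return mean(od_normalized(sample)) / mean(od_normalized(negative))


# Hypothetical triplicate readings: (luminescence, OD600).
no_phage = [(110, 0.52), (95, 0.48), (102, 0.50)]
clone = [(5200, 0.41), (4800, 0.39), (5100, 0.40)]

norm = od_normalized(clone)
print(f"clone: {mean(norm):.0f} +/- {stdev(norm):.0f} lum/OD600")
print(f"fold-change over negative: {fold_change(clone, no_phage):.1f}x")
```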

Purification of TEV Proteases and Fusion Protein Substrates. TEV protease was purified as previously described22, 65-67, with minor modifications. OneShot BL21 Star (DE3) chemically competent cells (Invitrogen) were transformed with expression vectors encoding MBP fused through a TEV cleavage site to a 6xHis-tagged TEV protease. Five mL of saturated overnight starter culture was added to 1-2 L of LB + kanamycin (40 µg/mL) and grown at 37 °C until OD600 = ~0.7. Expression was induced with 1 mM IPTG for 4 h at 30 °C, and cells were harvested by centrifugation at 6000 g for 5 min. The pellet was resuspended in 15-25 mL binding buffer (10% glycerol, 50 mM Tris pH 8.0, 1.0 M NaCl, 1 mM DTT, and 20 mM imidazole) with a Roche Complete EDTA-free protease inhibitor tablet (note that TEV protease is unaffected by conventional protease inhibitors). Cells were lysed by sonication for 4 min with a 1-s on, 1-s off cycle at medium power. Lysate was clarified by centrifugation at 18,000 g for 20 min. Clarified lysate was incubated with 1-2 mL TALON metal affinity resin (Clontech) for 1 h, mixing end-over-end at 4 °C. Resin was pelleted at 700 g for 5 min and resuspended in 10 mL of binding buffer to load onto a gravity flow column. Resin was washed with 10 column volumes of binding buffer, followed by 2 column volumes of binding buffer with imidazole increased to 50 mM. TEV protease was eluted with 4 column volumes of elution buffer (10% glycerol, 50 mM Tris pH 8.0, 0.1 M NaCl, 1 mM DTT, and 250 mM imidazole).

The purity of fractions was assessed by SDS-PAGE using precast Bolt 4-12% Bis-Tris gels (ThermoFisher), and TEV-containing fractions were pooled and concentrated to < 250 µL using an Amicon Ultra Centrifugal filter with a 10 kDa molecular weight cut-off (EMD-Millipore). The concentrated sample was further purified to > 95% purity on a Superdex 200 Increase 10/300 column (GE Healthcare) run in storage buffer (20% glycerol, 50 mM Tris pH 8.0, 0.1 M NaCl, 1 mM DTT). Proteases used in mammalian cell culture were further treated with endotoxin removal resin (Pierce) and then assayed with an LAL endotoxin quantification kit (Pierce). Protein concentrations were determined by Bradford assay (ThermoFisher), and aliquots were frozen in liquid nitrogen for storage at -80 °C.

MBP–GST test substrates were expressed, purified, and stored exactly as described above for TEV, except for the following changes. Expression was induced with 1 mM IPTG for 16 h at 20 °C, and the binding buffer was 50 mM Tris pH 8.0, 0.5 M NaCl, supplemented with a Roche Complete EDTA-free protease inhibitor tablet. After sonication and centrifugation, clarified lysate was incubated with 1 mL glutathione-linked sepharose (Clontech) for 1 h, mixing end-over-end at 4 °C. Loaded resin was washed with 40 column volumes of binding buffer, and protein was eluted with 4 column volumes of elution buffer (50 mM Tris pH 8.0, 100 mM NaCl, 10 mM glutathione). Samples were > 95% pure as assessed by SDS-PAGE and were dialyzed against storage buffer (20% glycerol, 50 mM Tris pH 8.0, 0.1 M NaCl, 1 mM DTT) using Slide-A-Lyzer Cassettes with a 10-kDa molecular weight cut-off (ThermoFisher).

Assaying Proteolysis of Fusion Protein Substrates. Protease assays consisted of 5 µg of MBP–GST substrate and 1 µg of wild-type or evolved TEV protease incubated for 3 h at 30 °C in storage buffer (20% glycerol, 50 mM Tris pH 8.0, 0.1 M NaCl, 1 mM DTT) supplemented with freshly prepared DTT to a final concentration of 2 mM.

Reactions were analyzed by SDS-PAGE and visualized with Coomassie stain.

HPLC Kinetics Assay. Protease kinetics were determined as previously described22, but with minor adjustments.

Synthetic peptide substrates (THPLVGHMGTRRW-dinitrophenol-lysine and TENLYFQSGTRRW-dinitrophenol-lysine) and synthetic standards for the cleaved products (MGTRRW-dinitrophenol-lysine and SGTRRW-dinitrophenol-lysine) were ordered from Genscript. Dinitrophenol moieties provided strong absorbance at 355 nm for more accurate quantification. Reactions and standards were analyzed by HPLC on a C18 reverse-phase column (Kinetex 5 µm C18 100 Å, Phenomenex) using a 5-50% acetonitrile gradient. Standard curves were constructed for both products (MGTRRW-dinitrophenol-lysine and SGTRRW-dinitrophenol-lysine) to enable quantification of reaction progress.
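Standard-curve quantification and the linear-range criterion (< 25% substrate consumption) can be illustrated with the sketch below. The standard concentrations and peak areas are hypothetical, and an ordinary least-squares line through the standards is assumed as the calibration model.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a standard curve."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def area_to_concentration(area, slope, intercept):
    """Invert the standard curve: peak area -> product concentration."""
    return (area - intercept) / slope


def in_linear_range(product_um, substrate_um, max_fraction=0.25):
    """True if less than max_fraction of the substrate has been consumed."""
    return product_um / substrate_um < max_fraction


standards_um = [0, 10, 20, 40]  # hypothetical product standards (uM)
areas = [5, 25, 45, 85]         # hypothetical peak areas at 355 nm
slope, intercept = fit_line(standards_um, areas)
product = area_to_concentration(45, slope, intercept)
print(product, in_linear_range(product, 100))  # 20.0 True
```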

Reactions were carried out with 0.05-0.1 µM protease and 50 µM to 2 mM substrate. Proteases (in storage buffer plus 1 mM freshly prepared DTT) and substrates (in sterile water) were prepared as 2x solutions (50 µL each) and then combined to yield a total reaction volume of 100 µL. Reactions were incubated at 30 °C for 10 min and quenched with 25 µL of 5% TFA. After quenching, protease was removed from samples using an Amicon Ultra Centrifugal filter with a 10-kDa cut-off (EMD-Millipore). Before reactions were conducted in triplicate, all conditions were tested and monitored at 5, 10, and 30 min to ensure that 10 min was within the linear range of the reaction (< 25% substrate consumption). Peak integrations were tabulated, converted into product concentrations using the standard curves, and fit to the Michaelis-Menten model using GraphPad Prism.
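The Michaelis-Menten fit was performed in GraphPad Prism; the pure-Python sketch below is only an illustrative stand-in for nonlinear least squares. It relies on the fact that, for a fixed Km, the best-fitting Vmax has a closed form, and it then searches log10(Km) by golden-section search. The substrate range mirrors the 50 µM to 2 mM used above, while the noise-free rates, the 0.1 µM enzyme concentration, and the search bracket for Km are hypothetical. kcat follows as Vmax divided by the enzyme concentration.

```python
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate."""
    return vmax * s / (km + s)


def best_vmax(s_values, v_values, km):
    """For a fixed Km the model is linear in Vmax, so Vmax has a closed form."""
    f = [s / (km + s) for s in s_values]
    return sum(v * fi for v, fi in zip(v_values, f)) / sum(fi * fi for fi in f)


def sse(s_values, v_values, km):
    """Residual sum of squares at the best Vmax for this Km."""
    vmax = best_vmax(s_values, v_values, km)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_values, v_values))


def fit_michaelis_menten(s_values, v_values, km_lo=1e-2, km_hi=1e5, iters=200):
    """Least-squares fit via golden-section search on log10(Km).

    Assumes the error surface has a single minimum inside [km_lo, km_hi].
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log10(km_lo), math.log10(km_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = sse(s_values, v_values, 10 ** c), sse(s_values, v_values, 10 ** d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = sse(s_values, v_values, 10 ** c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = sse(s_values, v_values, 10 ** d)
    km = 10 ** ((a + b) / 2)
    return best_vmax(s_values, v_values, km), km


# Hypothetical data: substrate (uM) and initial rates (uM product per min,
# i.e. product after the 10-min reaction divided by 10), noise-free.
enzyme_um = 0.1
substrate = [50, 100, 250, 500, 1000, 2000]
rates = [mm_rate(s, 20.0, 300.0) for s in substrate]
vmax, km = fit_michaelis_menten(substrate, rates)
print(f"Km = {km:.0f} uM, kcat = {vmax / enzyme_um:.0f} /min")
```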

Protease Specificity Profiling by Phage Substrate Display Selection. For each combination of library and protease, 60 µL of a 50% suspension of Anti-FLAG M2 Magnetic Beads (Sigma) was transferred into a 1.5-mL Eppendorf tube. For all subsequent manipulations, a magnetic plate was used to separate beads and allow aspiration of the supernatant. After washing with 1 mL of TBS (20 mM Tris pH 7.0, 150 mM NaCl), beads were incubated with 30-100 µL of substrate phage libraries (titers ranged from 10^8 to 10^10 pfu/mL) in 1 mL of TBS at room temperature for 2 h, rotating end-over-end. After initial binding, the supernatant was discarded and beads were washed with 1 mL of TBS. Beads with bound substrate phage were incubated in 0.5 mL TBS containing 0.5 µM protease for 2 h. The supernatant containing cleaved substrate phage was recovered, and the beads were again washed with 1 mL of TBS. The remaining bound, uncleaved substrate phage was eluted in 0.1 mL of TBS containing 0.1 mg/mL FLAG peptide (Sigma).

For substrate libraries containing a single randomized amino acid position, a single round of selection was sufficient. For substrate libraries containing windows of three randomized amino acids, a second round of selection was necessary to detect enrichment. In these cases, the round 1 cleaved substrate phage were expanded in overnight cultures consisting of 100 µL cleaved substrate phage and 900 µL S1030 culture (diluted to OD600 = ~0.1).

Following outgrowth, cultures were centrifuged to pellet E. coli, and the phage-containing supernatant was aspirated for use in a second round of phage substrate display as described above. Because of expansion biases during outgrowth, these specificity profiles were interpreted only after normalization to the second-round elution of the no-protease control experiments.

High-Throughput Sequencing and Data Analysis Generating Specificity Profiles. Samples were amplified by PCR using Q5 Hot Start 2x Master Mix (NEB) with 1 µL of template phage sample and primers (MSP819 or MSP820, and MSP824; see Table 19). Illumina barcodes were added in a second PCR using 1 µL of the first-round PCR product as template. Samples were pooled and purified by gel electrophoresis using a MinElute Gel Extraction kit (Qiagen). The concentration of the pooled library was measured with the Quant-iT PicoGreen dsDNA Assay (ThermoFisher) and diluted to approximately 4 nM. This concentration was further adjusted based on qPCR quantification (Kapa Biosystems). Samples were loaded onto an Illumina MiSeq using a v2 50-cycle kit set up to run a single-direction read of 50 nucleotides.
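Diluting the pooled library to approximately 4 nM requires converting the PicoGreen mass concentration into molarity. The sketch below uses the common approximation of about 660 g/mol per base pair of dsDNA; the 200-bp amplicon length and the 13.2 ng/µL reading are hypothetical placeholders, since the text does not give them, and the subsequent qPCR adjustment is not modeled.

```python
def library_nm(conc_ng_per_ul, length_bp, mw_per_bp=660.0):
    """dsDNA molarity (nM) from mass concentration and fragment length.

    1 ng/uL = 1e-3 g/L; dividing by (660 g/mol/bp * length) gives mol/L,
    and multiplying by 1e9 gives nM, hence the combined 1e6 factor.
    """
    return conc_ng_per_ul * 1e6 / (mw_per_bp * length_bp)


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Volumes (stock, diluent) in uL needed to reach target_nm in final_ul."""
    stock_ul = target_nm * final_ul / stock_nm
    return stock_ul, final_ul - stock_ul


# Hypothetical: 13.2 ng/uL pooled library of 200-bp amplicons.
stock = library_nm(13.2, 200)
print(f"{stock:.1f} nM stock")
print("uL stock, uL diluent for 20 uL at 4 nM:", dilution_volumes(stock, 4, 20))
```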

Data were automatically demultiplexed by MiSeq Reporter software, and the resulting fastq files were processed by a custom Python script. This script searches each sequencing read for a perfect match to the sequences flanking both sides of the proteolysis site. If the proteolysis site between these matching flanks is exactly 21 nucleotides long, it is translated into a seven-amino-acid sequence. Sequences were excluded from subsequent analysis if they contained a stop codon, matched the template material used in library cloning (HNLYGHS), or matched the FLAG tag sequence (YKDDDDK), which arises from spontaneous genetic deletion of the proteolysis site. The list of proteolysis sites was tabulated into a position-specific amino acid frequency table. For each library-protease combination, enrichment values were calculated as (freq_cleaved / freq_elution) - 1. For each protease, specificity data from the randomized position within single-site libraries were combined into a single table and converted into a sequence logo using the Seq2Logo webserver60. For libraries containing windows of three randomized positions, sequence logos were generated separately for each protease-library combination.
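The read-processing steps described above can be sketched as follows. This is not the original custom script: the flanking sequences are hypothetical placeholders, the standard genetic code is assumed for translation, and discarding reads with ambiguous bases (translated as X) is an added assumption. The enrichment function implements (freq_cleaved / freq_elution) - 1 per position and skips residues absent from the elution sample, for which the ratio is undefined.

```python
from collections import Counter

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO))  # standard genetic code, '*' = stop

EXCLUDED = {"HNLYGHS", "YKDDDDK"}  # cloning template and FLAG-tag deletions
SITE_NT = 21


def extract_site(read, left_flank, right_flank):
    """Return the 21-nt site between perfect flank matches, else None."""
    i = read.find(left_flank)
    if i < 0:
        return None
    start = i + len(left_flank)
    j = read.find(right_flank, start)
    if j < 0 or j - start != SITE_NT:
        return None
    return read[start:j]


def translate(dna):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[k:k + 3], "X") for k in range(0, len(dna), 3))


def peptides_from_reads(reads, left_flank, right_flank):
    peptides = []
    for read in reads:
        site = extract_site(read, left_flank, right_flank)
        if site is None:
            continue
        peptide = translate(site)
        # Filters from the text, plus dropping 'X' (an added assumption).
        if "*" in peptide or "X" in peptide or peptide in EXCLUDED:
            continue
        peptides.append(peptide)
    return peptides


def position_frequencies(peptides):
    """List of 7 dicts mapping amino acid -> frequency at that position."""
    table = []
    for pos in range(SITE_NT // 3):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        table.append({aa: n / total for aa, n in counts.items()})
    return table


def enrichment(cleaved, elution):
    """(freq_cleaved / freq_elution) - 1 for residues seen in the elution."""
    result = []
    for c_pos, e_pos in zip(position_frequencies(cleaved), position_frequencies(elution)):
        result.append({aa: c_pos.get(aa, 0.0) / f - 1 for aa, f in e_pos.items()})
    return result


LEFT, RIGHT = "GCTAGC", "GGTTCC"  # hypothetical flanking sequences
reads = [
    "NNNN" + LEFT + "GAAAACCTGTACTTCCAGAGC" + RIGHT,  # ENLYFQS: kept
    "NNNN" + LEFT + "CATAACCTGTATGGTCATAGC" + RIGHT,  # HNLYGHS template: dropped
    "NNNN" + LEFT + "GAATAACTGTACTTCCAGAGC" + RIGHT,  # stop codon: dropped
]
print(peptides_from_reads(reads, LEFT, RIGHT))  # ['ENLYFQS']
```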

Western Blot Visualization of Recombinant IL-23 Cleavage. IL-23 was purchased as a Myc-tagged IL-23p19 monomer (TP309680, Origene) and as a heterodimer of IL-23p19 and IL-12p40 (PHC9321, ThermoFisher). Five µg of heterodimer or 0.44 µg of monomer was incubated with 5 µg of TEV L2F for 3 h at 30 °C in storage buffer (20% glycerol, 50 mM Tris pH 8.0, 0.1 M NaCl, 1 mM DTT) supplemented with freshly prepared DTT to a final concentration of 2 mM. Samples were electrophoresed on precast Bolt 4-12% Bis-Tris gels (ThermoFisher), then transferred to a PVDF membrane using the iBlot 2 Dry Blotting System (ThermoFisher). Membranes were incubated at room temperature for 30 min in Odyssey Blocking Buffer (LiCor). Primary antibody (IL-23 Antibody (26H20L23), ABfinity™ Rabbit Monoclonal, ThermoFisher) was added to blocking buffer at a 1:1000 dilution, and the membrane was incubated on a rocker at 4 °C overnight. After 3 washes with TBST (20 mM Tris pH 7.0, 150 mM NaCl, 0.1% Tween 20), the membrane was incubated for 1 h at room temperature in blocking buffer containing a 1:1000 dilution of secondary antibody (IRDye 800CW Donkey anti-Rabbit IgG, LiCor). After three more washes with TBST, the membrane was scanned using an Odyssey Imaging System.

LC-MS Identification of Cleavage Sites. IL-23p19 and the IL-23 heterodimer (ThermoFisher) were reduced with 10 mM DTT to identify the intact masses of unreacted IL-23p19 subunits. IL-23 substrates were incubated as for the western blots, for 3 h at 30 °C using 2 to 10 µg of substrate and 4 µg of TEV L2F. All samples were analyzed using an Agilent 6220 LC-MS (ESI-TOF) equipped with an Agilent PLRP-S column. A standard protein LC method was used, with a 15-minute reverse-phase gradient (0.1% formic acid in water; 0.1% formic acid in MeCN).

IL-23-induced IL-17 Production in Mouse Mononuclear Splenocytes. The following protocol was adapted from previously published protocols69. Two male mice (C57BL/6J) were euthanized and dissected to isolate spleens. Spleens were pulverized into 10 mL of cell culture media (DMEM, Glutamax, high-glucose, penicillin, streptomycin, 10% FBS; ThermoFisher) through a 100 µm nylon mesh Falcon Cell Strainer (Corning). Cell suspensions were centrifuged for 3 min at 700 g, and the supernatant was discarded. The pellet was resuspended in 1 mL ACK lysis buffer (Gibco/ThermoFisher). After 5 min, lysis was stopped by adding 9 mL of DMEM, and cells were pelleted by centrifugation at 700 g for 3 min. If the pellet was red due to remaining red blood cells, ACK lysis was repeated; once the pellet was white, lysis was complete and cells were resuspended in 4 mL of DMEM. Cell density was quantified using a Scepter 2.0 Handheld Automated Cell Counter (Millipore). Cultures were diluted to 2x10^6 cells/mL in cell culture media supplemented with 100 units/mL recombinant human IL-2 (Roche). The outer perimeter wells of a 96-well round-bottom culture plate were filled with 100 µL of cell-free media to prevent evaporative loss from the central wells containing cell cultures. The central wells were prepared in triplicate, each filled first with 125 µL of culture followed by 25 µL of additives (see below). Cell culture supernatant was sampled after two days of growth, and 10 µL was used to perform a mouse IL-17 ELISA (R&D Systems).

Additives containing IL-23 and varying doses of protease or neutralizing antibody (MAB1510, R&D Systems) were prepared in cell culture media immediately prior to mixing with splenocytes. Additives containing doses of proteases were also prepared as pre-incubated samples at 300x final concentration. Incubation was performed at 4 °C for 16 h in storage buffer (20% glycerol, 50 mM Tris pH 8.0, 0.1 M NaCl, 1 mM DTT) supplemented with 2.5 mg/mL BSA carrier protein to enhance stability during incubation. These pre-incubated samples were prepared at high concentration so that cleavage efficiency could be confirmed by western blot as described above. This western blot, however, used two primary antibodies, Anti-IL-12p40 (ab62822, Abcam) and Anti-IL-23p19 (sc271279, Santa Cruz), and two secondary antibodies, IRDye 800CW Donkey anti-Mouse and IRDye 680RD Donkey anti-Goat (LiCor).

Shorthand Name | Type | Glycerol Stock | Origin of Replication | Resistance Marker | Description/Features
MP6 | MP | MSP513 | cloDF | Chloramphenicol | pBad dnaQ926, dam, seqA, emrR, ugi, cda1
122-1182-proB | AP | MSP955 | pSC101 | Carbenicillin | proB-lysozyme-ggs-ENLYFQS-ggs-T7RNAP//T7pro-gIII-lux
122-432-proB | AP | MSP565 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HNLYFQS-ggs-T7RNAP//T7pro-gIII-lux
122-653-proB | AP | MSP722 | pSC101 | Carbenicillin | proB-lysozyme-ggs-ENLYGQS-ggs-T7RNAP//T7pro-gIII-lux
122-683-proB | AP | MSP770 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HNLYFHS-ggs-T7RNAP//T7pro-gIII-lux
122-690-proB | AP | MSP780 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HNLYGHS-ggs-T7RNAP//T7pro-gIII-lux
122-699-proB | AP | MSP794 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HNLVGHS-ggs-T7RNAP//T7pro-gIII-lux
122-692-proB | AP | MSP782 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HPLVGHM-ggs-T7RNAP//T7pro-gIII-lux
122-692-proA | AP | MSP814 | pSC101 | Carbenicillin | proA-lysozyme-ggs-HPLVGHM-ggs-T7RNAP//T7pro-gIII-lux
122-692-proB Q649S | AP | MSP832 | pSC101 | Carbenicillin | proB-lysozyme-ggs-HPLVGHM-ggs-T7RNAP(Q649S)//T7pro-gIII-lux
122-733-proB | AP | MSP833 | pSC101 | Carbenicillin | proB-lysozyme-IL-23(38-66)-T7RNAP//T7pro-gIII-lux
122-733-proB Q649S | AP | MSP855 | pSC101 | Carbenicillin | proB-lysozyme-IL-23(38-66)-T7RNAP(Q649S)//T7pro-gIII-lux
122-733-proA Q649S | AP | MSP848 | pSC101 | Carbenicillin | proA-lysozyme-IL-23(38-66)-T7RNAP(Q649S)//T7pro-gIII-lux
pET MBPTEV WT | Expression | MSP573 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-TEV WT
pET MBPTEV 111215 L1F | Expression | MSP850 | pBR322 | Kanamycin | pET28 MBP-HPLVGHM-TEV L1F 111215
pET MBPTEV 111215 L2F | Expression | MSP851 | pBR322 | Kanamycin | pET28 MBP-HPLVGHM-TEV L2F 111215
pET MBPTEV 111215 L5B | Expression | MSP852 | pBR322 | Kanamycin | pET28 MBP-HPLVGHM-TEV L5B 111215
pET MBPTEV H28L T30A | Expression | MSP968 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-TEV H28L T30A
pET MBPTEV HisP6c | Expression | MSP577 | pBR322 | Kanamycin | pET28 MBP-HNLYFQS-TEV I138T, N171D, N176T
pET MBPTEV HisP1a | Expression | MSP969 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-TEV T146S, D148P, S153N, S170A, N177M
pET MBPTEV GlyP2a | Expression | MSP824 | pBR322 | Kanamycin | pET28 MBP-ENLYGQS-TEV V209M, W211I, M218F
pET MBPTEV solC | Expression | MSP970 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-TEV E107D, D127A, S135F
pET MBPTEV solB | Expression | MSP971 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-TEV E107D, D127A, S135F, R203Q, K215E
pET MBPTEV solA | Expression | MSP972 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-TEV T17S, N68D, E107D, D127A, F132L, S135F, F162S, K229E
pET MBPGST 447 | Expression | MSP574 | pBR322 | Kanamycin | pET28 MBP-ENLYFQS-GST
pET MBPGST 446 | Expression | MSP578 | pBR322 | Kanamycin | pET28 MBP-HNLYFQS-GST
pET MBPGST 709 | Expression | MSP805 | pBR322 | Kanamycin | pET28 MBP-ENLYFHS-GST
pET MBPGST 710 | Expression | MSP806 | pBR322 | Kanamycin | pET28 MBP-ENLYGQS-GST
pET MBPGST 711 | Expression | MSP807 | pBR322 | Kanamycin | pET28 MBP-HNLYFHS-GST
pET MBPGST 712 | Expression | MSP808 | pBR322 | Kanamycin | pET28 MBP-HNLYGHS-GST
pET MBPGST 713 | Expression | MSP809 | pBR322 | Kanamycin | pET28 MBP-HPVLGHM-GST

Table 18: Plasmids used for PACE and protein expression. Plasmids are listed with important features including origin of replication, resistance marker, and encoded proteins.

Primer | Sequence
BCD1136 | GGAATACCCAAAAGAACTGGCATG
MSP819 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNCAACAGTTTCAGCAGAACCAC
MSP820 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNAACAGTTTCAGCAGAACCACC
MSP824 | TGGAGTTCAGACGTGTGCTCTTCCGATCTGTTCCTTTCTATTCTCACTCCGAC

Table 19: Sanger and Illumina sequencing DNA primers.


Chapter 3: Extending the Utility of Protease PACE


Section 3.1: Design of a negative selection scheme to ablate off-target proteolysis.

Thus far, nearly all applications of protease PACE have evolved promiscuous variants. These range from TEV and HRV protease variants that cleave a substrate containing a single amino acid substitution to the highly evolved TEV L2F, which cleaves the IL-23 peptide HPLVGHM. These evolved proteases cleave their evolutionary target substrates in addition to the wild-type substrate. Such broadening of specificity is a common result of positive selection performed without negative selection against unwanted activities, such as activity on the wild-type substrate73.

However, evolving reprogrammed proteases that are therapeutically useful will require negative selection against cleavage of off-target endogenous proteins, to avoid the potentially harmful consequences of these off-target cleavage events. Because our positive selection scheme is well established and we possess multiple tools for stringency modulation (promoter modulation and T7 RNAP activity mutants), we reasoned that an entirely analogous and independent scheme for negative selection would be ideal. Our simultaneous positive and negative selection strategy is based on a pair of protease-activated RNA polymerases (PA-RNAPs) with orthogonal promoter recognition. When the target substrate within the positive selection PA-RNAP is cleaved, gene III is transcribed from the positive selection promoter. Similarly, when the off-target substrate within the negative selection PA-RNAP is cleaved, transcription from a negative selection promoter drives expression of a dominant-negative gene III (“gene III-neg”) that poisons phage infectivity28 (Figure 39). As a result, SPs encoding proteases with high activity on the target substrate and low activity on the off-target substrate induce the highest ratios of pIII to pIII-neg and therefore give rise to the most infectious progeny phage28.

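[Figure 39 schematic: positive selection AP (AP+), in which T7 lysozyme is fused through the positive selection substrate to T7 RNAP, with gIII under PT7; negative selection AP (AP-), in which T7 lysozyme is fused through the negative selection substrate to an evolved, T3 promoter-specific T7 RNAP, with gIII-neg under PT3.]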

Figure 39: Schematic representation of negative selection components. Simultaneous positive and negative selection pressures for sequence-specific proteolysis can be exerted through the use of two protease-activated RNA polymerases. Proteolysis of the positive selection substrate sequence frees an active T7 RNAP to transcribe gene III downstream of the T7 promoter. Similarly, proteolysis of the negative selection substrate activates a T7 RNAP variant that transcribes gene III-neg from the T3 promoter. Because the ratio of pIII to pIII-neg determines the amount of functional progeny phage, this scheme selects for the highest ratio of cleavage of the positive selection substrate to cleavage of the negative selection substrate.


We first set out to construct a negative selection PA-RNAP consisting of T7 lysozyme fused through a cleavable linker to a previously evolved orthogonal variant of T7 RNAP that transcribes from the T3 promoter but not from the T7 promoter28. In addition to exquisite T3 promoter specificity, this T7 RNAP variant must retain the ability to form a transcriptionally inhibited complex with T7 lysozyme and must exhibit wild-type levels of transcriptional activity when freed from this inhibition. We screened a panel of PACE-evolved T7 RNAP variants for these properties by placing each polymerase variant directly into a PA-RNAP construct containing the HRV protease substrate and measuring a luminescence reporter of transcription from the T7 promoter or the T3 promoter.

Because these polymerases had undergone positive selection for T3 promoter recognition and negative selection against T7 promoter recognition, they all exhibited low transcriptional activity on the T7 promoter. However, they displayed varying degrees of protease-dependent transcription from the T3 promoter (Figure 40). We selected the L2C variant for further studies because it possessed the highest fold-activation in the presence of HRV protease (the L2C variant contains the following mutations: R96L, K98R, H205Y, E207K, E222K, S397R, G542V, G675R, V728I, N748D, P759L, S776A).

[Figure 40 plot: luminescence/OD600 (axis 0-35,000) for each T7 RNAP variant within the HRV PA-RNAP (L1A, L1B, L1C, L1D, L1E, L2A, L2B, L2C, L2D); series: T3 promoter uninfected, T3 promoter with HRV SP, T7 promoter uninfected, T7 promoter with HRV SP.]

Figure 40: Luciferase assay screen of polymerase variants for an orthogonal PA-RNAP. T7 RNA polymerase variants that were evolved to transcribe from the T3 promoter but not the T7 promoter were cloned into a PA-RNAP construct containing the HRV protease substrate. Hosts were cotransformed with these PA-RNAP constructs and a luminescence reporter downstream of either the T7 or T3 promoter. In the presence of HRV protease expressed from a selection phage, many variants (L1B, L1D, L1E, L2A, L2C, L2D) exhibited protease-dependent transcription from the T3 promoter but not the T7 promoter.

[Figure 41 vector map: T3neg-1169-proD-P15a-spec, Negative Selection Accessory Plasmid. Legible labels include the p15A origin, aadA, T7 lysozyme, the negative selection substrate, the T7 RNAP variant, a proA/B/C/D promoter position, a gene III fragment, the T1 terminator (rrnB1), and the tetA/orfL bidirectional terminator.]

Figure 41: Vector map of the negative selection AP. A single plasmid supplies the negative selection PA-RNAP (constructed with a T3-specific variant of T7 RNAP) as well as gene III-neg under control of the T3 promoter.


Section 3.2: Validation of simultaneous positive and negative selection in protease PACE.

We constructed a negative selection AP that encoded the negative selection PA-RNAP and gene III-neg under control of the T3 promoter. Host cells were transformed with a positive selection AP (Figure 12) and a negative selection AP (Figure 41), and inoculated with SP encoding a protease that recognizes the positive selection substrate. When this protease could also cleave the negative selection substrate (in addition to the positive selection substrate), we observed diminished propagation, as evidenced by phage titers lower than those of controls in which the negative selection substrate cannot be cleaved by the SP-encoded protease (Table 20). This decrease in propagation was especially pronounced when more negative selection PA-RNAP was expressed from a stronger constitutive promoter (proC > proB). We then validated in the continuous-flow PACE format that negative selection could indeed block phage propagation when the evolving protease could cleave both the positive and negative selection substrates (Table 21).

Protease on SP | Positive selection site // promoter strength | Negative selection site // promoter strength | Final titer (pfu/mL)
TEV | TEV // proB | HRV // proB | 3×10^9
TEV | TEV // proB | HRV // proC | 6×10^9
TEV | TEV // proB | TEV // proB | 7×10^7
TEV | TEV // proB | TEV // proC | 1×10^6
HRV | HRV // proB | HRV // proB | 6×10^8
HRV | HRV // proB | HRV // proC | 3×10^8
HRV | HRV // proB | TEV // proB | 6×10^9
HRV | HRV // proB | TEV // proC | 1×10^9

Table 20: Overnight propagation assay of simultaneous positive and negative selection. Host cells were transformed with both a positive and a negative selection AP encoding PA-RNAPs with the indicated cut sites and constitutive promoter expression levels. Hosts were infected with 10^5 pfu/mL of selection phage encoding a protease that cleaves the positive selection substrate. Titers after overnight outgrowth were robust when the negative selection substrate could not be cleaved by the SP-encoded protease. However, titers were significantly lower when the negative selection PA-RNAP was cleaved, and this difference was more pronounced when the negative selection PA-RNAP was expressed at a higher level.

Protease phage | Positive selection site // promoter strength | Negative selection site // promoter strength | Titer at 24 h of PACE
HRV | HRV // proB | HRV // proC | 2 × 10^3
HRV | HRV // proB | TEV // proC | 2 × 10^10
TEV | TEV // proB | HRV // proC | 9 × 10^9
TEV | TEV // proB | TEV // proC | 3 × 10^3

Table 21: Simultaneous positive and negative selection is compatible with protease PACE. Titers after 24 hours of PACE were robust when the SP-encoded protease could not cleave the negative selection substrate. However, titers plummeted when the negative selection PA-RNAP contained the same substrate as the positive selection PA-RNAP.
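To put the size of the effect in Table 21 on one scale, each titer drop can be expressed as a log10 fold-reduction relative to the matched lagoon whose negative selection site cannot be cleaved. The sketch below only transcribes the four titers from Table 21. The helper name `log10_titer_drop` is ours, for illustration.

```python
import math

# Titers after 24 h of PACE, transcribed from Table 21.
# Key: (protease encoded on the phage, protease site in the negative selection PA-RNAP).
TITERS_24H = {
    ("HRV", "HRV"): 2e3,   # negative selection site is cleaved by the evolving protease
    ("HRV", "TEV"): 2e10,  # negative selection site is not cleaved
    ("TEV", "HRV"): 9e9,   # negative selection site is not cleaved
    ("TEV", "TEV"): 3e3,   # negative selection site is cleaved
}


def log10_titer_drop(protease: str, non_cognate_site: str) -> float:
    """log10(titer with uncleavable negative site / titer with cleavable negative site)."""
    uncleavable = TITERS_24H[(protease, non_cognate_site)]
    cleavable = TITERS_24H[(protease, protease)]
    return math.log10(uncleavable / cleavable)


if __name__ == "__main__":
    for protease, other in [("HRV", "TEV"), ("TEV", "HRV")]:
        print(f"{protease}: {log10_titer_drop(protease, other):.1f} log10 units")
```

Run as a script, this prints roughly 7.0 log10 units for HRV and 6.5 for TEV. In other words, a cleavable negative selection substrate costs about six to seven orders of magnitude in titer within 24 hours.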

Somewhat fortuitously, we already had an ideal test case for a proof-of-principle experiment combining simultaneous positive and negative selection. We had previously conducted a positive selection for TEV protease variants that cleave the substrate ENLYAQS. That selection used NNK site-saturation mutagenesis (TEV protease residues 209, 211, 216, and 218) rather than a mixing experiment. It therefore enriched a population that cleaved the target substrate (ENLYAQS) irrespective of activity on the wild-type substrate (ENLYFQS). Consequently, some variants possessed activity on both target and wild-type substrates. A minor fraction, however, had serendipitously lost apparent activity on the wild-type substrate while acquiring activity on the target substrate (Figure 42A).
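For intuition about the starting library, the arithmetic behind NNK site-saturation mutagenesis at the four targeted TEV residues can be sketched as follows. NNK (N = any base, K = G or T) gives 32 codons per position. These encode all 20 amino acids plus a single stop codon (TAG). The sketch assumes the standard genetic code and ignores codon bias, transformation efficiency, and library coverage. `nnk_library_summary` is an illustrative name, not a tool used in this work.

```python
from collections import Counter
from itertools import product

# Standard genetic code. Codons are enumerated with bases in the order T, C, A, G
# at each position (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, first), (j, second), (k, third) in product(enumerate(BASES), repeat=3)
}


def nnk_codons() -> list:
    """N = A/C/G/T at codon positions 1 and 2; K = G/T at position 3."""
    return [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]


def nnk_library_summary(n_positions: int) -> dict:
    """Theoretical diversity when n_positions codons are each randomized with NNK."""
    codons = nnk_codons()
    encoded = Counter(CODON_TABLE[codon] for codon in codons)
    amino_acids = [aa for aa in encoded if aa != "*"]
    stop_fraction = encoded["*"] / len(codons)
    return {
        "codons_per_position": len(codons),
        "stop_codons_per_position": encoded["*"],
        "amino_acids_per_position": len(amino_acids),
        "dna_variants": len(codons) ** n_positions,
        "protein_variants": len(amino_acids) ** n_positions,
        "fraction_free_of_stops": (1 - stop_fraction) ** n_positions,
    }


if __name__ == "__main__":
    # Four targeted TEV protease residues: 209, 211, 216, 218.
    for key, value in nnk_library_summary(4).items():
        print(key, value)
```

For four positions, this gives 32^4 = 1,048,576 codon combinations and 20^4 = 160,000 protein sequences. About 88% of variants ((31/32)^4) are free of stop codons.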

We hypothesized that our simultaneous positive and negative selection scheme could purify this population by enriching for variants that cleave the target substrate but not the wild-type substrate. We began a mixing experiment that started with hosts expressing low levels (proA) of negative selection PA-RNAP containing the ENLYFQS substrate and transitioned to hosts expressing high levels (proC). Both hosts maintained a consistent level (proB) of positive selection PA-RNAP containing the ENLYAQS substrate. By the end of the PACE experiment, all assayed clones cleaved the target substrate (ENLYAQS) but not the wild-type substrate (ENLYFQS). This specificity was most likely conferred by the mutation V216F. The added hydrophobic bulk in the P2 pocket probably occludes the larger wild-type substrate residue Phe while still permitting the target substrate residue Ala (Figure 42B). The success of this experiment validates that simultaneous positive and negative selection in protease PACE enriches for variants that cleave the target substrate but not the negative selection substrate.

[Figure 42 plots: (A) positive selection PACE from a site-saturation library at TEV residues 209, 211, 216, 218; (B) positive selection with negative selection against ENLYFQS, transitioning from proA to proC. Y-axis: Luminescence/OD600; x-axis: TEV protease clone, with wild-type TEV and uninfected controls; series: ENLYFQS, ENLYAQS.]

Figure 42: Simultaneous selection for cleavage of ENLYAQS and against cleavage of ENLYFQS. (A) We initiated a positive selection for cleavage of target substrate ENLYAQS starting from a TEV protease site-saturation library (targeted residues indicated above the arrow). After this PACE, we observed a mixture of genotypes and phenotypes: many clones retained activity on both substrates, while others had coincidentally lost activity on the wild-type substrate. (B) Starting with the previously evolved population that had undergone positive selection, we initiated a PACE experiment with simultaneous positive selection (ENLYAQS) and negative selection (ENLYFQS). This mixing experiment transitioned from low negative selection stringency (proA) to high negative selection stringency (proC). The resulting population exhibited strong genetic convergence, and all tested clones cleaved the target substrate but not the wild-type substrate.


Despite the success of this proof-of-principle experiment, our attempts to apply simultaneous positive and negative selection to the TEV variants that cleaved the IL23 target peptide (HPVLGHM) were much less successful. At high positive selection stringency and low negative selection stringency, few genotypic or phenotypic changes emerged. Conversely, at high negative selection stringency, titers sank. The emerging TEV protease variants also appeared to have diminished activity on both the positive and negative selection substrates, with no change in the ratio of on-target to off-target activity. These results may indicate that there is a sweet spot for each selection pressure individually, as well as for the ratio of positive to negative selection pressure. Alternatively, they may support the hypothesis that broadening protease substrate specificity is frequently an easier evolutionary challenge than swapping it.


Section 3.3: Attempts at PACE of FDA-approved therapeutic proteases containing disulfide linkages.

In addition to converting a model protease such as TEV into a therapeutic by altering its substrate specificity, we explored extending the utility of FDA-approved protease therapeutics through improved kinetics and/or altered substrate specificity. Many of these therapeutics are derived from circulating human proteins involved in hemostasis and coagulation 4. Clotting factors such as Factor VII and Factor IX are used to compensate for natural deficiencies in hemophilia. Similarly, thrombin is used as an acute coagulant, often applied topically during surgery. Conversely, the thrombolytics urokinase and tissue plasminogen activator are used to dissolve blood clots in stroke and heart-attack patients.

These proteases pose a low immunogenicity risk because they circulate naturally and are recognized as self by the immune system. Any improvement in their kinetic parameters would be of considerable commercial value because of the resulting gain in potency for patients, and changes in substrate specificity could enable novel therapeutic indications. However, all of these proteases contain intramolecular and/or interchain disulfide bonds. Because the E. coli cytoplasm is a reducing environment, these proteases will not express in a functional form during PACE.

A number of strategies exist for expressing disulfide-containing proteins in E. coli for purification. Expression of these proteins often yields inclusion bodies that can then be solubilized and renatured under oxidizing conditions74. Alternatively, the protein can be co-translationally exported to the naturally oxidizing environment of the periplasm75. Neither strategy is applicable to protease PACE, which requires functional protease within the cytoplasm. For soluble cytoplasmic expression of disulfide-containing proteins, researchers have generated E. coli strains that lack both thioredoxin reductase (trxB) and glutathione reductase (gor)76, 77. These knockouts yield a pool of oxidized thioredoxin and glutathione within the cytoplasm, which can then serve as oxidants for disulfide bond formation76, 77. These knockout strains grow slowly. Upon selection for faster growth, many researchers have identified suppressor mutations in ahpC, which encodes a peroxiredoxin76, 78, 79. Cytoplasmic expression of the disulfide isomerase DsbC (normally found in the periplasm of E. coli) further increases the yield of functional disulfide-containing protein by rearranging the pairing of disulfide-linked cysteines76. The resulting commercial strain is aptly named NEB SHuffle because of this disulfide shuffling. It has been shown to produce active tissue plasminogen activator (more specifically, a truncated single-chain form called vTPA).


Unfortunately, we found that this strain could not support propagation of bacteriophage M13, even after conjugation with the F plasmid. Thioredoxin (trxA) is a known host factor in the filamentous bacteriophage life cycle and must exist in its reduced form for bacteriophage assembly80, 81. A number of trxA mutants are reported to complement bacteriophage propagation in the context of a thioredoxin reductase (trxB) deletion82. However, expression of these trxA variants failed to yield an NEB SHuffle-based strain that could support M13 propagation. This suggests that NEB SHuffle may be a poor host for reasons unrelated to the trxB deletion, possibly the suppressor mutation in ahpC. Interestingly, in NEB SHuffle strains the luxAB genes also failed to generate a luminescence signal, regardless of whether the substrate decanal was supplied exogenously or produced endogenously. Consequently, we could not use NEB SHuffle to test for protease-dependent gene expression with vTPA and a PA-RNAP containing the vTPA substrate sequence.

Aside from knocking out reducing pathways, there are reports that heterologous expression of oxidases that natively catalyze disulfide bond formation can facilitate disulfide formation in the E. coli cytoplasm. One group co-expressed DsbC and the sulfhydryl oxidase Erv1, a mitochondrial protein from S. cerevisiae, alongside the target protein vTPA. They obtained active protease in amounts comparable to those from commercially available trxB gor strains 83, 84.

After adapting this strategy, we were still unable to observe a protease-dependent luminescence signal for either of two disulfide-containing proteases, vTPA or enterokinase (Figure 43). At present, PACE of disulfide-containing proteases seems infeasible despite our considerable exploratory studies.

[Figure 43 plot: y-axis Luminescence/OD600; x-axis PA-RNAP substrate (TEV, vTPA, Entero) grouped by folding plasmid (proC-erv1-dsbC, proD-erv1-dsbC); series: Uninfected, T7 RNAP, MBP-TEV, Entero, MBP-ggs-Entero, MBP-ddddk-Entero, MBP-ggs-vTPa, vTPA.]

Figure 43: Luciferase assay of disulfide-containing proteases in the presence of Erv1 and DsbC. Hosts were transformed with an AP (the protease substrate within each PA-RNAP is indicated below the axis) and a plasmid encoding a constitutively expressed sulfhydryl oxidase (Erv1) and disulfide isomerase (DsbC). These hosts were infected with selection phage encoding a panel of proteases, or T7 RNA polymerase as a positive control. We observe strong protease-dependent luminescence when hosts containing the TEV PA-RNAP are infected with MBP-TEV SP. However, there is no such protease-dependent signal for enterokinase or vTPA.


Section 3.4: Adapting protease PACE to botulinum toxin light chain.

Although we cannot conduct PACE on most therapeutic proteases, which contain disulfide linkages, one class of FDA-approved protease therapeutics does not require disulfide linkages for catalysis: the botulinum neurotoxins (BoNTs)4. BoNT is a two-component toxin secreted by C. botulinum. It consists of a heavy chain (BoNT-HC) linked through a disulfide bond to a light chain (BoNT-LC). The BoNT-HC contains a receptor-binding domain that facilitates receptor-mediated endocytosis into lower motor neurons85. Once the toxin is trafficked into endosomes, the BoNT-HC translocation domain undergoes a pH-dependent conformational shift. This shift leads to pore formation in the endosomal membrane and translocation of the BoNT-LC into the cytosol86, 87. Once in the cytosol, the light chain, a zinc metalloprotease, cleaves SNARE proteins. This prevents the release of acetylcholine into the neuromuscular junction and results in flaccid paralysis88. BoNTs are classified into seven serotypes (A-G). Serotypes A, C, and E cleave SNAP25, and serotypes B, D, F, and G cleave some or all of VAMP isoforms 1-3. Because of their efficiently targeted intracellular delivery and catalytic mechanism of intoxication, BoNTs are the most potent toxins known. Their approximate LD50 ranges from 2 ng/kg to 1 µg/kg depending on the route of administration89.

As with all toxins, "the dose makes the poison," and BoNTs (serotypes A and B) appear to have beneficial cosmetic and therapeutic properties at appropriately low doses. Perhaps most famously, Botox is used to lessen the appearance of facial wrinkles through very localized muscle paralysis90. BoNTs are also used to treat a number of motor disorders characterized by spasticity, as well as pain associated with involuntary muscle spasms90. There is mounting evidence that BoNTs can alleviate pain without loss of muscle tone. This indicates a mechanism of action beyond the canonical inhibition of acetylcholine release at the neuromuscular junction91. Furthermore, these studies have shown that BoNT can inhibit secretion of the pain mediators substance P and glutamate by afferent neurons91-93.

Research into extending the therapeutic application of BoNT beyond neurons has focused either on the BoNT-HC (to target delivery to non-neuronal cells) or on the BoNT-LC (to cleave non-neuronal SNARE proteins).

Researchers have shown that BoNT-HC can be engineered into a targeted secretion inhibitor (TSI) by replacing the receptor-binding domain with a new targeting domain (such as epidermal growth factor, growth hormone-releasing hormone, or antibodies)94-97. These TSIs undergo the same uptake and translocation processes. They ultimately deliver BoNT-LC to the cytosol of the target cell, inhibiting vesicle fusion-mediated release of any number of hormones and small molecules. However, the efficacy of any given TSI depends on the appropriate SNARE isoform being present in the target cell and being responsible for secretion. Secretion in non-neuronal cell types is frequently regulated by non-neuronal SNARE isoforms. Many TSI strategies will therefore also require a BoNT-LC engineered to cleave the target SNARE.

Fewer studies have been dedicated to generating BoNT-LC variants with altered specificity. One such study showed that a single point mutation could engineer BoNT-LC serotype E to cleave SNAP23, the non-neuronal paralog of the wild-type substrate SNAP25 98. When delivered to epithelial cells, this engineered BoNT-LC inhibited release of IL-8 and mucin. More dramatic reprogramming efforts might generate BoNT-LCs that degrade disease-associated proteins unrelated to secretion.

Toward the aim of developing next-generation TSI and BoNT therapeutics, we set out to apply the protease PACE system to BoNT-LC. Our previous model proteases (TEV, HRV, HCV) have their substrate selectivity dictated by a short (approximately 7-amino-acid) peptide sequence in an extended linear conformation. BoNT-LCs, by contrast, recognize a longer peptide sequence (30-60 amino acids) within their cognate SNARE protein substrates. In the case of BoNT-LC A, a stretch of nearly 60 amino acids of the substrate SNAP25 wraps around almost the entire circumference of the light chain, forming an extensive network of contacts99.

We were initially unsure whether a longer substrate motif, potentially containing structured protein sequences, would disrupt the properties of the PA-RNAP that are essential for protease PACE. Prior to cleavage, the PA-RNAP must form a tight, transcriptionally inhibited complex between T7 RNAP and T7 lysozyme so that background production of pIII is minimal. Background pIII production weakens selection pressure. Expression of pIII above a certain level can also make host E. coli uninfectable, presumably by preventing bacteriophage attachment and entry. We reasoned that a longer linker between T7 lysozyme and T7 RNAP would let the two components diffuse farther apart. This would lessen the fraction of molecules in the transcriptionally inhibited state.

Furthermore, any structured elements within the linker could further exacerbate the energetic and entropic costs of forming the transcriptionally inhibited complex.
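To make the linker-length argument concrete, the sketch below uses a textbook ideal-chain (Gaussian) estimate of effective concentration. This model is our simplifying assumption, not one we fit to data. In this model, the chance that the two tethered domains meet falls as the linker lengthens, so the fraction of PA-RNAP left uninhibited (and hence background transcription) rises. The 0.38 nm per-residue segment length and the 1 µM dissociation constant are illustrative assumptions. The model also ignores any structure in the linker.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def effective_concentration_molar(n_residues: int, segment_nm: float = 0.38) -> float:
    """Effective molarity of one tethered end at the other for an ideal Gaussian chain.

    Assumes a freely jointed chain with mean-square end-to-end distance
    <r^2> = n * b^2 and evaluates the end-to-end probability density at r = 0:
    c_eff = (3 / (2 * pi * <r^2>)) ** 1.5.
    """
    mean_square_m2 = n_residues * (segment_nm * 1e-9) ** 2
    molecules_per_m3 = (3.0 / (2.0 * math.pi * mean_square_m2)) ** 1.5
    return molecules_per_m3 / AVOGADRO / 1000.0  # per m^3 -> mol/m^3 -> mol/L


def fraction_uninhibited(n_residues: int, kd_molar: float = 1e-6) -> float:
    """Fraction of PA-RNAP whose tethered T7 lysozyme is unbound (free to transcribe).

    Two-state intramolecular binding with equilibrium constant K = c_eff / Kd,
    so the free fraction is 1 / (1 + K). kd_molar is a hypothetical placeholder.
    """
    c_eff = effective_concentration_molar(n_residues)
    return kd_molar / (kd_molar + c_eff)


if __name__ == "__main__":
    for linker in (7, 20, 60):
        print(f"{linker:>3} residues: c_eff = {effective_concentration_molar(linker) * 1e3:.1f} mM, "
              f"uninhibited fraction = {fraction_uninhibited(linker):.2e}")
```

In this model, c_eff scales as n^(-3/2). Going from a roughly 7-residue cut site to a 60-residue SNARE fragment therefore raises the uninhibited fraction about 25-fold ((60/7)^1.5 ≈ 25), regardless of the exact Kd, as long as Kd is much smaller than c_eff.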

We tested six serotypes of BoNT-LC for protease-dependent luciferase gene expression across a panel of PA-RNAPs. Each PA-RNAP contained 60 amino acids of one SNARE substrate, without any Gly, Ser, Ala linker. All substrates except SNAP25 were compatible with the PA-RNAP, yielding low background transcription. Nonetheless, the SNAP25 PA-RNAP exhibited a protease-dependent increase in luminescence for BoNT-LC E. We also observed strong protease-dependent luminescence for BoNT-LC F with all VAMP substrates, as well as weak luminescence for BoNT-LC B with certain VAMP substrates (Figure 44).

[Figure 44 plot: y-axis Luminescence/OD600; x-axis PA-RNAP cleavable linker (SNARE substrate); series: Positive Control, Negative Control, BoNT A, BoNT B, BoNT C, BoNT D, BoNT E, BoNT F.]

Figure 44: SNARE-derived PA-RNAPs exhibit protease-dependent gene expression with cognate BoNT-LC. Host strains were transformed with APs encoding gene III under control of the T7 promoter and PA-RNAPs containing 60 amino acids of each of the SNARE proteins (indicated beneath the x-axis). These host strains were inoculated with selection phage encoding each of the six serotypes of BoNT-LC, as well as phage encoding the T7 RNAP as a positive control. In comparing each of the BoNT-LC to the uninfected negative control, we observe protease-dependent luminescence for BoNT F and BoNT B on VAMP substrates as well as BoNT E on SNAP25.

Because BoNT A is the main serotype used in the clinic, we were keen to fix the high background signal seen with the SNAP25 substrate accessory plasmid. The 60-amino-acid substrate length was not an issue for the other SNARE-containing PA-RNAPs, so we hypothesized that SNAP25 secondary structure was responsible. We first tried to provide extra conformational flexibility to accommodate this hypothetical secondary structure. To do so, we placed the original Gly, Ser, Ala linkers on both sides of the SNAP25 substrate (residues 147-206). We nonetheless continued to observe high background signal and no BoNT-LC A-dependent luminescence (Figure 45A). Rather than accommodate this structure, we then tried to eliminate the amino acids responsible for it. Removal of residues 187-206 completely resolved the high background signal. Unfortunately, the BoNT-LC A cleavage site lies directly within residues 187-206, which makes BoNT-LC A incompatible with the current protease PACE selection. On the other hand, truncation to SNAP25 residues 147-186, and even further truncation to residues 166-186, preserves the entire substrate recognition motif for BoNT-LC E, so protease-dependent luminescence is still observed (Figure 45B).

[Figure 45 plots: (A) and (B) y-axis Luminescence/OD600; x-axis SNAP25 substrate constructs with or without ggs linkers (SNAP25 147-206, 147-186, 166-186); series include Uninfected, T7 RNAP, BoNT E, and (panel A) BoNT A.]

Figure 45: SNAP25 residues 187-206 disrupt PA-RNAP leading to high background transcription. (A) Addition of Gly, Ser, Ala linkers on each side of the SNAP25 substrate does not alleviate the high background transcription of PA-RNAP as seen in the strong luminescence of the uninfected condition. (B) Conversely, removal of SNAP25 residues 187-206 appears to resolve the issue of background transcription as seen by the low luminescence for the uninfected condition for the ggs-linker SNAP25 147-186 AP. Further truncation to residues 166-186 appears to preserve most of the BoNT-LC E protease-dependent luminescence signal.


Section 3.5: BoNT-LC can be evolved in protease PACE.

Luciferase assays had validated the use of VAMP APs for BoNT-LC B and F and of SNAP25 APs for BoNT-LC E. With these strong results, we began to test for phage propagation in continuous-flow PACE. We conducted PACE for 72 h using the VAMP1 AP (proB stringency), with lagoon two containing BoNT-LC B SP and lagoon three containing BoNT-LC F SP. Sequencing revealed enrichment of the point mutations D75N, S213R, and V431G for BoNT B, and a diverse pool of variants for BoNT F (Table 22 and Table 23, respectively).

Luminescence assays demonstrate functional improvements in both BoNT serotypes (Figure 46). These improvements may be mediated by optimization of the phage genome, by enhanced solubility or expression in E. coli, or by biochemical improvements in protease kinetics. For the BoNT-LC F mutation S166Y, the proximity of this residue to the catalytic zinc suggests that the mutation may broadly activate protease catalysis100, 101.

We were also able to conduct PACE using SNAP25 APs (proC stringency) with substrate residues 147-186 (lagoons one and two) and 166-186 (lagoons three and four). Sequencing after 72 h of PACE showed strong enrichment of either mutation E28K or mutation I18V (Table 24).

Lagoon L2, BoNT-LC B clones (- = no mutation listed):
L2: A -; B -; C -; D E363G; E -; F D75N S213R V431G; G D75N S213R V431G; H D75N S213R V431G

Table 22: BoNT-LC B genotypes after 72 h PACE on AP encoding VAMP1 60-amino-acid substrate. Genotypes of BoNT-LC B show strong enrichment for mutations D75N, S213R, and V431G after 72 h of PACE selection to cleave VAMP1.

Lagoon L3, BoNT-LC F clones (- = no mutation listed):
L3: A N276T; C T79A; D -; E S166Y; G -; H -

Table 23: BoNT-LC F genotypes after 72 h PACE on AP encoding VAMP1 60-amino-acid substrate. Genotypes show a lack of convergence after 72 h of PACE selection to cleave VAMP1. Nonetheless, residue S166 appears to lie near the catalytic zinc of BoNT-LC F, suggesting a functional role for this mutation.


[Figure 46 plot: y-axis Luminescence/OD600; x-axis Uninfected, BoNT B, BoNT B D75N S213R V431G, BoNT F, BoNT F S166Y; series VAMP1, VAMP2.]

Figure 46: PACE-evolved BoNT-LC B and F variants with high apparent activity on wild-type substrate. The strongly enriched BoNT-LC B variant D75N S213R V431G shows improved apparent activity on wild-type substrates VAMP1 and VAMP2 compared with wild-type BoNT-LC B. Although BoNT-LC F S166Y is not strongly enriched, it shows improved apparent activity compared with wild-type BoNT-LC F.

BoNT-LC E clones by lagoon (- = no mutation listed):
L1: A -; B T341A; C -; D I18V; E I18V G115D D278N; F I18V; G N383S; H -
L2: A E28K V291A; B E28K; C I302V; E E28K; F N122T N138D S284A; G T51A N149S; H -
L3: A -; B E28K G101S; C -; D N390S; E R16H; F -; G E28K; H -
L4: A E28K; B E28K; C E28K P112S; D E28K; E E28K; F E28K G101S C231R R244S N293K; G E28K; H E28K

Table 24: BoNT-LC E genotypes after 72 h PACE on AP encoding SNAP25 substrates. Genotypes of BoNT-LC E show strong enrichment for mutations I18V or E28K after 72 h of PACE selection to cleave SNAP25 substrate residues 147-186 (lagoons one and two) and 166-186 (lagoons three and four).


To validate the reprogramming of BoNT-LC, we first set out to recapitulate the results of a previous group that had engineered BoNT-LC E to cleave SNAP23. That group recognized that replacing a single SNAP25 residue with the homologous residue from SNAP23 was sufficient to prevent proteolysis by BoNT-LC E. This substrate mutation, D179K, was predicted to eliminate a salt bridge with protease residue K225. They were then able to restore protease activity on SNAP25 D179K and on SNAP23 by replacing the positively charged protease residue K225 with a negatively charged aspartate (K225D), which restored the electrostatic interaction98.

We conducted a mixing experiment to transition an evolving population of BoNT-LC E SP from the wild-type substrate (SNAP25 147-186) to either SNAP25 147-186 D179K (lagoons one and two) or SNAP23 153-192 (lagoons three and four). After 72 hours of PACE, all populations were strongly enriched for mutation K225E, a substitution likely to play the same functional role as the previously reported K225D (Table 25). As expected, all tested clones possess apparent proteolytic activity on both the SNAP25 D179K and SNAP23 substrates (Figure 47). Interestingly, populations evolved to cleave SNAP23 had enriched the additional mutations R168K, P398L, Q141K, I233T, and I396S, which may enhance catalytic activity on the SNAP23 substrate.

BoNT-LC E clones by lagoon:
L1: A I18V L89P K225E V265G; B I18V L89P K225E L404*; C I18V E154G I199T K225E N258S; D I18V L89P K225E; E E154G K225E; F I18V E154G K225E; G Q27H K225E L404*; H Q27H K225E L404*
L2: A E28K K225E C231R L404*; B E28K K225E C231R L404*; C E28K K225E C231R L404*; D E28K K225E C231R; E E28K K225E C231R N261D; F E28K K225E C231R L404*
L3: A I18V E154G R168K K225E P398L; B I18V E154G K225E; C E154G K225E D270N; D I18V E154G R168K K225E P398L; E E154G S187F K225E; F I18V E154G R168K K225E P398L; H I18V E154G R168K K225E P398L
L4: A, B, C, E, F, G, H E28K Q141K E154G K225E C231R I233T I396S

Table 25: BoNT-LC E genotypes after PACE on AP encoding SNAP25 D179K or SNAP23. Genotypes of BoNT-LC E show strong enrichment for mutation K225E after 72 h of PACE selection to cleave SNAP25 147-186 D179K (lagoons one and two) or SNAP23 153-192 (lagoons three and four). Lagoons three and four exhibit additional enriched mutations that may play a role in enhanced recognition of SNAP23.


[Figure 47 plot: y-axis Luminescence/OD600; series SNAP25, SNAP23, SNAP25_D179K.]

Figure 47: Luciferase assays show that PACE-evolved BoNT-LC E clones exhibit apparent proteolysis of SNAP23. Wild-type BoNT-LC E exhibits no apparent activity on the SNAP25 D179K and SNAP23 substrates, whereas PACE-evolved variants exhibit enhanced apparent proteolysis of both target substrates. Clones from all lagoons exhibit comparable activity on the SNAP25 D179K mutant substrate. However, clones from lagoons three and four show superior apparent activity on the SNAP23 substrate. This is likely because these populations underwent selection to cleave SNAP23 and enriched additional mutations that accommodate this substrate.


Section 3.6: Evolutionary target substrates for BoNT-LC E.

Hypothetically, an engineered BoNT-LC could be coupled to an engineered BoNT-HC to deliver a protease that cleaves any target disease-associated protein in any target cell type. Because both engineering approaches are still in their infancy, it is more prudent to alter only one chain at a time. We are therefore limiting our search for target substrates to intracellular proteins associated with disease progression in neurons, so that we can rely upon the native delivery capabilities of the BoNT-HC. We chose to evolve BoNT-LC E primarily because of its shorter substrate motif (SNAP25 167-186) and its published specificity determinants102, 103. Together, these make it significantly easier to identify disease-associated intracellular proteins containing similar amino acid sequences.

Furthermore, researchers have shown that when administered via intracranial injection, BoNT E enters neurons within the central nervous system and can block neurotransmitter release104. This property should enable us to target diseases that affect neurons other than the traditional BoNT target neuron, the lower motor neuron.

In an approach analogous to that used for TEV protease, we searched the amino acid sequences of intracellular proteins for target substrates similar to the wild-type BoNT-LC E substrate (SNAP25 167-186). We constructed a scoring matrix with weights derived from published in vitro proteolysis assays using mutant substrates (Table 26). We then used this scoring system to rank every 20-amino-acid sequence window in the intracellular proteome. From this ranking, we identified target substrates in MTOR and CDK5R1 (Table 27). Degradation of either protein would be expected to induce an autophagic response that could promote clearance of the misfolded protein aggregates central to the etiology of many neurodegenerative diseases.
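The ranking procedure amounts to a position-specific scoring scan. A 20-residue window slides along each protein, and the Table 26 weight for each residue at its SNAP25-aligned position is summed. A minimal sketch follows. Only the nonzero weights of Table 26 are transcribed (every other residue scores 0), the four peptides are the aligned sequences from Table 27, and the function names are illustrative, not the code actually used for the proteome search.

```python
# Nonzero entries of Table 26, keyed by SNAP25 position (167-186).
# Any residue/position pair not listed here scores 0.
WT_WEIGHTS = {
    167: {"M": 1},
    168: {"G": 5},
    169: {"N": 3},
    170: {"D": 1, "E": 1},
    171: {"I": 3, "A": 1, "L": 1, "V": 1},
    172: {"D": 1, "E": 1},
    173: {"T": 1},
    178: {"I": 3, "A": 1, "L": 1, "F": 1, "V": 1},
    179: {"D": 3},
    180: {"R": 4, "A": 1},
    181: {"I": 8},
}
FIRST_POSITION = 167
WINDOW = 20

# Aligned 20-residue peptides from Table 27.
PEPTIDES = {
    "SNAP25": "MGNEIDTQNRQIDRIMEKAD",
    "SNAP23": "IGNEIDAQNPQIKRITDKAD",
    "CDK5R1": "MGNEISYPLKPFLVESCKEA",
    "MTOR": "VGRLIHQLLTDIGRYHPQAL",
}


def score_window(peptide: str, weights: dict = WT_WEIGHTS) -> int:
    """Sum the weight of each residue at its SNAP25-aligned position."""
    if len(peptide) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(peptide)}")
    return sum(
        weights.get(FIRST_POSITION + offset, {}).get(residue, 0)
        for offset, residue in enumerate(peptide)
    )


def scan_protein(sequence: str, weights: dict = WT_WEIGHTS, top: int = 5) -> list:
    """Score every 20-residue window of one protein; return the best (start, window, score)."""
    hits = [
        (start, sequence[start:start + WINDOW], score_window(sequence[start:start + WINDOW], weights))
        for start in range(len(sequence) - WINDOW + 1)
    ]
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:top]


if __name__ == "__main__":
    for name, peptide in PEPTIDES.items():
        print(name, score_window(peptide))
```

With these weights, SNAP25 scores the matrix maximum of 33, SNAP23 scores 28, MTOR 15, and CDK5R1 14. A proteome-wide search would call `scan_protein` on each intracellular sequence and merge the hits.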

As with the early target substrates for TEV protease, these targets were identified solely on the basis of wild-type protease specificity, ignoring questions of evolutionary potential. All three substrate point mutations D179K, D179L, and D179G could be accommodated through mutations at protease residue K225 that enriched during PACE. However, the substrate point mutations I181E and I181Y both resulted in phage washout during mixing experiments starting from the wild-type substrate (SNAP25 167-186). Furthermore, we performed NNK mutagenesis of BoNT-LC E residues 160, 192, and 209, the three residues that form the hydrophobic pocket occupied by SNAP25 I181. These experiments also resulted in phage washout for both I181E and I181Y.

Although both D179 and I181 are crucial determinants of substrate specificity, BoNT-LC E appears to have high evolutionary potential only for substitutions at D179, not at I181. With this information in mind, we constructed a revised scoring matrix designed to reflect this new knowledge about evolutionary potential (Table 28). From the revised target substrate rankings, we identified a new target substrate within the tumor suppressor PTEN. Inhibiting PTEN could mimic the cancer-like state of uninhibited growth and thus transiently stimulate neurogenesis.

Numerous studies have shown that pharmacological or genetic blockade of PTEN activity promotes regeneration of a variety of neuron subtypes105-110.

Rows: residue; columns: SNAP25 positions 167-186, left to right.
Position: 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
SNAP25: M G N E I D T Q N R Q I D R I M E K A D
A: 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
R: 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0
N: 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D: 0 0 0 1 0 1 0 0 0 0 0 0 3 0 0 0 0 0 0 0
C: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E: 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G: 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I: 0 0 0 0 3 0 0 0 0 0 0 3 0 0 8 0 0 0 0 0
L: 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
K: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
M: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
P: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
S: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
T: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
W: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Y: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V: 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

Table 26: Target peptide scoring matrix based upon wild-type BoNT-LC E specificity. We created a subjective rating matrix based upon our knowledge of wild-type BoNT-LC E protease substrate specificity, as assessed by published in vitro protein cleavage assays. Key features include high marks for the crucial specificity determinants G168, I178, D179, and I181.

Position: 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
SNAP25: M G N E I D T Q N R Q I D R I M E K A D
SNAP23: I G N E I D A Q N P Q I K R I T D K A D
CDK5R1: M G N E I S Y P L K P F L V E S C K E A
MTOR: V G R L I H Q L L T D I G R Y H P Q A L

Table 27: Candidate target substrates identified via wild-type BoNT-LC E specificity scoring. Target substrates are aligned to wild-type substrate SNAP25 167-186. Matches at positions for which BoNT-LC E has high specificity are highlighted in green.


Rows: residue; columns: SNAP25 positions 167-186, left to right.
Position: 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
SNAP25: M G N E I D T Q N R Q I D R I M E K A D
A: 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
R: 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0
N: 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D: 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q: 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E: 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G: 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I: 0 0 0 0 3 0 0 0 0 0 0 4 0 0 8 0 0 0 0 0
L: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
K: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
M: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
P: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
S: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
T: 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
W: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Y: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

Table 28: Target peptide scoring matrix based upon BoNT-LC E specificity and evolutionary potential.

We revised the specificity-scoring matrix (Table 26) to reflect our knowledge of BoNT-LC E evolutionary potential. New features include (1) a penalty for proline at all positions because of its unique structural properties and (2) removal of any preference at position 179 because of the high evolutionary potential for substitutions at this position.

Position  167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
SNAP25     M   G   N   E   I   D   T   Q   N   R   Q   I   D   R   I   M   E   K   A   D
SNAP23     I   G   N   E   I   D   A   Q   N   P   Q   I   K   R   I   T   D   K   A   D
PTEN       N   G   S   L   C   D   Q   E   I   D   S   I   C   S   I   E   R   A   D   N
Table 29: PTEN target substrate identified via revised scoring method. The target substrate from PTEN is aligned to wild-type substrate SNAP25 167-186. Matches at positions for which BoNT-LC E has high specificity are highlighted in green.


Section 3.7: Evolution of BoNT-LC E proteases that cleave PTEN.

Based upon luminescence assays of wild-type BoNT-LC E on single-mutant SNAP25 substrates (Figure 48), we observed that the D179C and R180S mutations were the most detrimental to proteolytic activity and would therefore make ideal first stepping-stones in what we expected to be short evolutionary trajectories (Tables 30-33). For these first stepping-stones, we used NNK-codon site-saturation mutagenesis at residues 216 and 225, or at residues 159 and 161, for SNAP25 substrates containing D179C or R180S, respectively. In both evolutions, we observed mutations Q27R, E28K, and I232T. After evolution to cleave SNAP25 D179C (lagoon one, Table 34), we observed strong enrichment of K225H or K225L, whereas evolution to cleave SNAP25 R180S enriched E159A (lagoon two, Table 34). We then carried the populations from lagoons one and two forward into PACE on hosts expressing a SNAP25 substrate containing both D179C and R180S. Because we were uncertain whether random mutagenesis during this PACE experiment would suffice to access high-activity variants, we also performed NNK-codon site-saturation mutagenesis on these two populations. We targeted residues 159 and 161 in the lagoon one population to yield the population in lagoon three, and residues 216 and 225 in the lagoon two population to yield the population in lagoon four. Variants from this experiment exhibited a variety of genotypes (Table 35) and apparent cleavage activity not only on stepping-stone substrate two but also on stepping-stone substrate three (Figure 49). We therefore initiated a mixing experiment to transition the evolving SP populations from stepping-stone three, on which we had observed some proteolytic activity, to stepping-stone four, on which we had not. After this PACE experiment, already abundant mutations at positions 354 and 355 swept across lagoon two, and mutations at position 357 swept across lagoons one and three. In addition, novel mutations at positions 160 and 174 enriched to high abundance in lagoon two (Table 36).


[Figure 48 plot. Y axis: Luminescence/OD600. Substrates: SNAP25, N169S, E170L, I171C, D179C, R180S. Legend: Uninfected, BoNT-E.]

Figure 48: Luciferase assay of wild-type BoNT-LC E on a panel of point mutant substrates.

Wild-type BoNT-LC E shows apparent activity on a panel of SNAP25 point mutant substrates in which the corresponding amino acid has been replaced with the residue from the PTEN target substrate.

Stepping-stone  Strategy            Substrate
0               (wild-type)         M G N E I D T Q N R Q I D R I M E K A D
1               NNK 216, 225        M G N E I D T Q N R Q I C R I M E K A D
2               Direct propagation  M G N E I D T Q N R Q I C S I M E K A D
3               Direct propagation  M G S L C D T Q N R Q I C S I M E K A D
4               Mixing              N G S L C D Q E I D S I C S I M E K A D
5               Mixing              N G S L C D Q E I D S I C S I E R A D N
Table 30: Evolutionary stepping-stone substrates for lagoon 1. Stepping-stone substrates are aligned to wild-type substrate SNAP25 167-186. Matches at positions for which BoNT-LC E has high specificity are highlighted in green; PTEN substitutions to SNAP25 are highlighted yellow. PACE strategies are indicated as: (1) selection from NNK site-saturation mutagenesis libraries on the indicated stepping-stone substrate, (2) direct propagation of the population from the preceding experiment on the indicated stepping-stone substrate, or (3) as a mixing experiment in which populations propagate on a mixture of host cells encoding the previous stepping-stone substrate as well as the indicated next stepping-stone substrate.

Stepping-stone  Strategy            Substrate
0               (wild-type)         M G N E I D T Q N R Q I D R I M E K A D
1               NNK 159, 161        M G N E I D T Q N R Q I D S I M E K A D
2               Direct propagation  M G N E I D T Q N R Q I C S I M E K A D
3               Direct propagation  M G S L C D T Q N R Q I C S I M E K A D
4               Mixing              N G S L C D Q E I D S I C S I M E K A D
5               Mixing              N G S L C D Q E I D S I C S I E R A D N
Table 31: Evolutionary stepping-stone substrates for lagoon 2. Stepping-stone substrates are aligned to wild-type substrate SNAP25 167-186. Matches at positions for which BoNT-LC E has high specificity are highlighted in green; PTEN substitutions to SNAP25 are highlighted yellow. PACE strategies are indicated as: (1) selection from NNK site-saturation mutagenesis libraries on the indicated stepping-stone substrate, (2) direct propagation of the population from the preceding experiment on the indicated stepping-stone substrate, or (3) as a mixing experiment in which populations propagate on a mixture of host cells encoding the previous stepping-stone substrate as well as the indicated next stepping-stone substrate.


Stepping-stone  Strategy            Substrate
0               (wild-type)         M G N E I D T Q N R Q I D R I M E K A D
1               NNK 216, 225        M G N E I D T Q N R Q I C R I M E K A D
2               NNK 159, 161        M G N E I D T Q N R Q I C S I M E K A D
3               Direct propagation  M G S L C D T Q N R Q I C S I M E K A D
4               Mixing              N G S L C D Q E I D S I C S I M E K A D
5               Mixing              N G S L C D Q E I D S I C S I E R A D N
Table 32: Evolutionary stepping-stone substrates for lagoon 3. Stepping-stone substrates are aligned to wild-type substrate SNAP25 167-186. Matches at positions for which BoNT-LC E has high specificity are highlighted in green; PTEN substitutions to SNAP25 are highlighted yellow. PACE strategies are indicated as: (1) selection from NNK site-saturation mutagenesis libraries on the indicated stepping-stone substrate, (2) direct propagation of the population from the preceding experiment on the indicated stepping-stone substrate, or (3) as a mixing experiment in which populations propagate on a mixture of host cells encoding the previous stepping-stone substrate as well as the indicated next stepping-stone substrate.

Stepping-stone  Strategy            Substrate
0               (wild-type)         M G N E I D T Q N R Q I D R I M E K A D
1               NNK 159, 161        M G N E I D T Q N R Q I D S I M E K A D
2               NNK 216, 225        M G N E I D T Q N R Q I C S I M E K A D
3               Direct propagation  M G S L C D T Q N R Q I C S I M E K A D
4               Mixing              N G S L C D Q E I D S I C S I M E K A D
5               Mixing              N G S L C D Q E I D S I C S I E R A D N
Table 33: Evolutionary stepping-stone substrates for lagoon 4. Stepping-stone substrates are aligned to wild-type substrate SNAP25 167-186. Matches at positions for which BoNT-LC E has high specificity are highlighted in green; PTEN substitutions to SNAP25 are highlighted yellow. PACE strategies are indicated as: (1) selection from NNK site-saturation mutagenesis libraries on the indicated stepping-stone substrate, (2) direct propagation of the population from the preceding experiment on the indicated stepping-stone substrate, or (3) as a mixing experiment in which populations propagate on a mixture of host cells encoding the previous stepping-stone substrate as well as the indicated next stepping-stone substrate.
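To make the stepping-stone design concrete, the following Python sketch lists the substitutions that each lagoon-one substrate (Table 30) adds relative to the one before it, using SNAP25 numbering. It only restates the sequences in the table; the helper name `substitutions` is ours and not part of the original work. Lagoon two (Table 31) differs only in applying R180S before D179C.

```python
SNAP25_START = 167  # residue number of the first aligned position

# Lagoon-one stepping-stone substrates from Table 30 (SNAP25 167-186 alignment)
STEPPING_STONES = [
    "MGNEIDTQNRQIDRIMEKAD",  # 0: wild-type SNAP25
    "MGNEIDTQNRQICRIMEKAD",  # 1
    "MGNEIDTQNRQICSIMEKAD",  # 2
    "MGSLCDTQNRQICSIMEKAD",  # 3
    "NGSLCDQEIDSICSIMEKAD",  # 4
    "NGSLCDQEIDSICSIERADN",  # 5: PTEN target sequence
]


def substitutions(before, after, start=SNAP25_START):
    """Return substitutions such as 'D179C' that convert `before` into `after`."""
    if len(before) != len(after):
        raise ValueError("substrates must be aligned to the same length")
    return [f"{a}{start + i}{b}"
            for i, (a, b) in enumerate(zip(before, after)) if a != b]


for step in range(1, len(STEPPING_STONES)):
    added = substitutions(STEPPING_STONES[step - 1], STEPPING_STONES[step])
    print(f"stepping-stone {step}: {len(added)} new substitution(s): {' '.join(added)}")

total = substitutions(STEPPING_STONES[0], STEPPING_STONES[-1])
print(len(total), "positions differ from SNAP25 overall")
```

Run as written, it reports one new substitution for each of the first two steps (D179C, then R180S), three for step 3 (N169S, E170L, I171C, the same point mutants tested in Figure 48), six for step 4, and five for step 5, for 16 differences from SNAP25 in total.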

Clone  Mutations
L1A  Q27R K225L
L1B  Q27R K225H I232S
L1C  E28K K225L I352T
L1D  K225L
L1E  Q27R K225H
L1F  E28K K225H I232T
L1G  E28K K225H
L1H  K225L
L2A  E159A N161H I232S
L2B  Q27R E159A N161H
L2C  Q27R I232S
L2D  E159A I232T
L2E  E159A I232T
L2F  E159L N161Y
L2G  E28K E159S N161H
L2H  E159A N161Y
Table 34: BoNT-LC E genotypes after 72 h of PACE on hosts carrying APs encoding the SNAP25 D179C or R180S substrate. Site-saturation libraries of BoNT-LC E residues 216 and 225 (lagoon one) or 159 and 161 (lagoon two) underwent selection for variants that cleave SNAP25 D179C (lagoon one) or R180S (lagoon two). After 72 h of PACE selection, BoNT-LC E genotypes show enrichment of mutations Q27R, E28K, and I232T regardless of target substrate. We observe mutations K225H or K225L in lagoon one, and E159A and N161H or N161Y in lagoon two.


[Figure 49 plot. Y axis: Luminescence/OD600. Legend: uninfected, WT BoNT E, L4B, L4E.]

Figure 49: Luciferase assays show apparent proteolysis of stepping-stones two and three by evolved BoNT-LC E. Compared to wild-type BoNT-LC E, which exhibits no apparent activity on stepping-stone substrates two through five, variants of BoNT-LC E evolved to cleave stepping-stone two exhibit enhanced apparent proteolysis of stepping-stones two and three.


Clone  Mutations
L1A  L98P K225H
L1B  E28K K225L
L1C  Q27R L98P N197K I199M K225H
L1E  E28K K225L I227V I352T L404*
L1F  Y171C A224S K225H
L1H  K22S E28K E184G K225H Y357C
L2A  I232T
L2B  I232T Q354R Y355H
L2D  I232T V265G
L2E  V47I L98P I232T S372G
L2G  I232T Q354R Y355H N379K
L2H  N197K I232T S314A Q354R Y355H
L3A  E28K E159S N161H M172S K225H C231G
L3B  E159S N161W K225L I352T I409L
L3C  Q27R L98P E159W N161W K225H
L3D  E159Q N161Y K225H
L3E  E159A N161W M172V K225L C231R
L3G  Q27R L98P E159A N161V K225H
L3H  L98P E159C N161W K225L C231R
L4A  E28K K225L I232T
L4B  K225L I232T
L4C  E28K K225L I232T
L4D  E159A N161H K225Y I232T N242S
L4E  E28K S198G K225L I232T
L4F  K225L I232T
L4H  E159A N161H K225H
Table 35: BoNT-LC E genotypes after 72 h of PACE on hosts carrying an AP encoding the SNAP25 D179C/R180S double-mutant substrate.

Populations from lagoons one and two of the preceding PACE experiment were used to inoculate lagoons one and two for PACE on hosts encoding the SNAP25 D179C/R180S double-mutant substrate. In lagoons one and two, we see the emergence of mutations at Q354, Y355, and Y357. The population from lagoon one of the preceding PACE experiment was also used as the template for a site-saturation mutagenesis library of residues 159 and 161, which then served as the inoculum for lagoon three. The population from lagoon two of the preceding PACE experiment was also used as the template for a site-saturation mutagenesis library of residues 216 and 225, which then served as the inoculum for lagoon four. In lagoons three and four, we observe combinations of mutations at K225, E159, and N161 that may promote cleavage of SNAP25 D179C/R180S.

Clone  Mutations
L1A  K22R E28K E148G T160A Y171C K225H C231Y Y357C
L1B  E184G K225H Y357H
L1E  K22R E28K E184G K225H Y357C
L1F  K22R E28K E184G F186L K225H Y357C
L1G  E184G Y171C K225H
L2A  I21M E28K N72H Q141R E159A T160A N161H S174A I232T Q354R Y355H
L2B  E28K S99P E159A T160A N161H S174A I232T Q354R Y355H L404*
L2C  E28K E159A T160A N161H S174A I232T Q354R Y355H
L2G  E28K E159A T160A N161H S174A I232T Q354R Y355H
L2H  E28K E159A T160A N161H S174A I232T D312N Q354R Y355H A389T
L3A  E159C T160S N161W K225L C231R Y357S
L3B  E159C N161W M172R K225L C231R Y357C
L3C  Q27R E159R N161Y N197K I199M K225H K311E
L3D  E159C T160S N161W K225L C231R Y357C
L3E  E159C N161W K225L C231R Y357C
L3G  Q27R E159C T160S N161W K225L C231R
L3H  S6N E159S N161W N197K I199M K225H
L4A  E28K K225L I232T
L4B  E28K K225L I232T A266T
L4C  E28K N138D K225L I232T K329N
L4D  E28K K225L I232T
L4E  E28K K225L I232T
L4F  E28K S137R K225L I232T
Table 36: BoNT-LC E genotypes after 72 h of PACE with a mixing transition from stepping-stone three to stepping-stone four. Samples from lagoons one through four of the preceding PACE experiment were used to inoculate separate lagoons, which were propagated for 24 h on hosts encoding stepping-stone substrate three. For the next 24 h, lagoons were diluted with a 50:50 mixture of hosts encoding either stepping-stone three or four. For the final 24 h, lagoons were diluted with hosts encoding stepping-stone substrate four. We observe variants enriched for the new mutations T160A, S174A, and Y357H.

Section 3.8: Discussion.

This last phase of thesis research has sought to extend the practical utility of reprogramming protease specificity using PACE. We first aimed to develop a scheme for simultaneous positive and negative selection that would avoid the promiscuity observed when using positive selection alone. This scheme relies upon two orthogonal PA-RNAPs: the positive-selection PA-RNAP drives expression of gIII, and the negative-selection PA-RNAP drives expression of gIII-neg. As a result, SPs encoding proteases that cleave the positive-selection substrate but not the negative-selection substrate induce the highest ratio of gIII to gIII-neg and thus propagate most efficiently.

We were able to assemble all the necessary components and validate that this simultaneous positive and negative selection system could enrich variants of TEV protease that cleave the single mutant substrate ENLYAQS but not the wild-type substrate ENLYFQS.

In our proof-of-concept studies (Section 2 of this thesis), we evolved TEV protease to cleave a clinically relevant human protein. Even if ex vivo studies had shown efficacy at physiologically relevant concentrations, we would still have had major concerns about the repeated administration of a foreign and likely immunogenic biologic.

For this reason, we determined that our next evolution campaign must focus on either a protease of human origin or an FDA-approved protease therapeutic. Although we were unable to adapt protease PACE to the vast majority of FDA-approved therapeutic proteases, which contain disulfide bonds, we found that botulinum toxin proteases (BoNT-LC) were largely compatible with protease PACE. We evolved three serotypes of BoNT-LC and in some cases identified protease variants with enhanced apparent proteolytic activity on wild-type substrates. We also demonstrated that protease PACE could evolve BoNT-LC E variants that cleave the non-neuronal homolog SNAP23 (as opposed to the canonical substrate SNAP25). This PACE experiment enriched a BoNT-LC mutation similar to one previously identified via structure-guided protein design, but also identified additional protease mutations that appear to greatly enhance cleavage of SNAP23.

We have since initiated a line of research to evolve BoNT-LC E to cleave intracellular target proteins implicated in neuronal disease progression. We are currently partway through a short evolutionary trajectory to evolve BoNT-LC E variants that cleave PTEN, a protein important in cell-cycle regulation whose degradation in neurons could promote neurogenesis and other neuroprotective phenotypes. Assuming we can complete this arc of positive selection, it remains to be seen whether these evolved proteases will cleave PTEN in vitro and whether proteolysis of the target peptide bond can inactivate the phosphatase activity of PTEN. It is also unknown whether these mutated BoNT-LC will still be delivered into the neuronal cytosol by the BoNT-HC, and whether such delivery will be efficient enough to promote cleavage of PTEN and effect a change in neuronal growth. Furthermore, we will likely need to apply our newly developed simultaneous positive and negative selection to ablate cleavage of the wild-type substrate SNAP25, given that this activity is known to be toxic in lower motor neurons and would also be undesirable in the central nervous system. Despite these challenges, the work presented in this thesis represents a major advance in the directed evolution of proteases: for the first time, it provides the capability to alter specificity at many positions within a protease substrate recognition motif through directed evolution. We hope that this capability will continue to be applied to the generation of proteases with both therapeutic and research applications.


Section 3.9: Methods.

Cloning of Accessory Plasmids, Expression Vectors and Phage Libraries. All primers were designed for USER cloning71 and ordered from Integrated DNA Technologies (IDT). For the cloning of phage libraries, NNK codons were generated using hand-mixed phosphoramidite ratios to provide uniform incorporation rates. All PCR reactions were performed using Phusion U Hot Start polymerase (Thermo Fisher).
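As an illustration of why NNK codons suit site-saturation mutagenesis, the following Python sketch enumerates the 32 NNK codons (N = A/C/G/T, K = G/T) and translates them with the standard genetic code. It assumes equal incorporation of each base, which is the goal of the hand-mixed phosphoramidites, and is not part of the original cloning workflow.

```python
from collections import Counter
from itertools import product

# Standard genetic code: codons enumerated with each position in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nnk_codons():
    """All 32 NNK codons: N = A/C/G/T at positions 1 and 2, K = G/T at position 3."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def nnk_amino_acid_counts():
    """Number of NNK codons encoding each amino acid ('*' marks the stop codon)."""
    return Counter(CODON_TABLE[c] for c in nnk_codons())


counts = nnk_amino_acid_counts()
n_codons = len(nnk_codons())
print(n_codons, len(counts) - 1, counts["*"])       # codons, amino acids, stop codons

# Two randomized sites (e.g. residues 159 and 161, or 216 and 225)
print(n_codons ** 2)                                 # codon combinations
print(round((1 - counts["*"] / n_codons) ** 2, 3))   # fraction of clones with no stop
```

Under the uniform-incorporation assumption, all 20 amino acids are represented but not evenly: L, S, and R are each encoded by 3 of the 32 codons, W and M by one each, and a single stop codon (TAG) remains. A two-site library therefore spans 1,024 codon combinations, about 94% of which lack a stop codon.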

For the assembly of APs and expression vectors, PCR products were purified using EconoSpin columns (Epoch Life Sciences) and assembled with DpnI and USER enzyme in CutSmart Buffer (New England BioLabs). Following assembly, plasmids were transformed into NEB Turbo Competent E. coli cells (New England BioLabs).

For the assembly of phage libraries, PCR products were purified by gel electrophoresis and extracted using the MinElute kit (Qiagen). Following an assembly reaction identical to that used for the APs, the USER reaction was desalted using the MinElute PCR purification kit (Qiagen) prior to electroporation into competent E. coli S1059 cells24 (for SP libraries) or NEB Turbo Electrocompetent E. coli cells24 (for substrate display phage libraries). Phage libraries were grown overnight in 2xYT and filtered through sterile 0.22 µm membranes to remove host cells. Phage library titers were determined by plaque assay using strain S1059 as the host. Briefly, phage were prepared as four 50-fold serial dilutions of 50 µL each. To each dilution was added 100 µL of fresh host cell culture at approximately OD600 = 1.0, followed by 900 µL of top agar (2xYT, 6 g/L agar). The mixture was mixed by pipetting and then transferred to a quarter-plate prepared with a thin layer of bottom agar (2xYT, 16 g/L agar). Plaque assays were incubated overnight at 37 °C.
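The titer arithmetic implied by this plaque assay can be sketched in Python as follows. The sketch assumes that the first plated dilution is already 50-fold, that the whole 50 µL of a dilution is plated, and that one plaque corresponds to one infectious phage; the function name and the example count are hypothetical.

```python
DILUTION_FOLD = 50          # each serial step is a 50-fold dilution
PLATED_VOLUME_ML = 0.050    # assumed: the whole 50 µL of a dilution is plated


def titer_pfu_per_ml(plaques, dilution_step):
    """Titer of the undiluted phage sample from a plaque count.

    dilution_step: 1 for the first (50-fold) dilution, 2 for the 2,500-fold one, etc.
    """
    if dilution_step < 1:
        raise ValueError("dilution_step counts from 1")
    return plaques * DILUTION_FOLD ** dilution_step / PLATED_VOLUME_ML


# Hypothetical count: 40 plaques on the third dilution (125,000-fold)
print(f"{titer_pfu_per_ml(40, 3):.1e} pfu/mL")
```

In this example, 40 plaques on the third dilution correspond to 1.0e8 pfu/mL. By the same arithmetic, the 1-2 mL inocula of 10^8-10^10 pfu/mL libraries used to start PACE contain at least 10^8 phage, far more than the 1,024 codon combinations of a two-site NNK library.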

To assess library quality, 12 clones were sequenced to confirm diversity at the targeted amino acid positions. Briefly, individual plaques were picked with a pipet tip to provide template material (SP-infected E. coli) for rolling circle amplification (TempliPhi, GE Healthcare). Sanger DNA sequencing was performed using primer BCD1136 (Table 19), and results were aligned and tabulated using SeqMan (DNAStar).

Luminescence Assays of Wild-Type SP Constructs. Saturated overnight cultures of S1030 cells containing a substrate AP were used to initiate luciferase assays in 96-well culture plates. Approximate volumes were 500 µL 2xYT, 50 µL overnight starter culture, and 10 µL filtered phage sample. All assays included a negative control (no phage) and a positive control (SP encoding T7 RNAP). Experimental and control conditions were performed in triplicate. After 3-5 h of growth in a 37 °C shaker, 100 µL was transferred to a clear-bottom assay plate to measure OD600 and luminescence on a Tecan Infinite Pro Plate Reader. Measurements were analyzed as OD-normalized values and as luminescence fold-change over the negative control.
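The two read-outs named above can be computed as in the following Python sketch. It assumes that "OD-normalized" means raw luminescence divided by OD600 for each well, with no blank subtraction, and that fold-change is the ratio of triplicate means; the readings are hypothetical numbers chosen for illustration, not measurements from this work.

```python
from statistics import mean


def od_normalized(luminescence, od600):
    """Luminescence / OD600 for each well (no blank subtraction assumed)."""
    if len(luminescence) != len(od600):
        raise ValueError("need one OD600 reading per luminescence reading")
    return [lum / od for lum, od in zip(luminescence, od600)]


def fold_change_over_negative(sample, negative):
    """Ratio of mean OD-normalized signal, sample vs. no-phage control.

    Each argument is a (luminescence readings, OD600 readings) pair of triplicates.
    """
    return mean(od_normalized(*sample)) / mean(od_normalized(*negative))


# Hypothetical triplicate readings, for illustration only
no_phage = ([500, 750, 1000], [0.5, 0.75, 1.0])
test_sp = ([10000, 15000, 20000], [0.5, 0.75, 1.0])
print(od_normalized(*test_sp))                       # [20000.0, 20000.0, 20000.0]
print(fold_change_over_negative(test_sp, no_phage))  # 20.0
```

Normalizing each well by its own OD600 before averaging keeps wells that grew to different densities during the 3-5 h incubation comparable.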

PACE Experiments. PACE experiments were performed as previously described24, 28, 42, 53, 54, 72. E. coli strain S1030 was co-transformed by electroporation with a mutagenesis plasmid (MP6)25 and an accessory plasmid (plasmids are described in Table 18 and detailed in Figure 12). Chemostats containing 80 mL of Davis Rich Media with 22.5 µg/mL carbenicillin and 15 µg/mL chloramphenicol were inoculated with overnight starter cultures and grown at 37 °C with stirring at 250 rpm via a magnetic stir bar. Once a chemostat reached approximately OD600 = 1.0, we began dilution with fresh media at a rate of 80-100 mL/h, with the waste needle set at a height corresponding to 80 mL. At the same time, we began flowing chemostat culture at approximately 10-20 mL/h into a lagoon with its waste needle set at a height corresponding to 15 mL. The total flow rate through each lagoon was set according to the difficulty of a given experiment, with slower dilution used for more challenging evolutions. For the full duration of the experiment, 10% w/v arabinose solution was syringe-pumped into the lagoons at a rate of 0.5-1.0 mL/h.
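The flow settings above translate into dilution rates and an arabinose concentration as sketched here in Python. The sketch assumes well-mixed vessels held at the waste-needle volumes (80 mL chemostat, 15 mL lagoon), first-order washout, and, for simplicity, ignores the small arabinose stream when computing lagoon dilution; the function names are ours.

```python
import math

LAGOON_ML, CHEMOSTAT_ML = 15, 80   # volumes set by the waste-needle heights


def dilution_rate(flow_ml_per_h, volume_ml):
    """Vessel volumes exchanged per hour at constant volume."""
    return flow_ml_per_h / volume_ml


def washout_fraction(rate_per_h, hours):
    """Fraction of a non-replicating population still present after `hours`
    in a well-mixed vessel (first-order washout)."""
    return math.exp(-rate_per_h * hours)


def arabinose_percent(stock_percent, ara_ml_per_h, culture_ml_per_h):
    """Steady-state arabinose (% w/v) in a lagoon fed by culture plus an arabinose pump."""
    return stock_percent * ara_ml_per_h / (ara_ml_per_h + culture_ml_per_h)


for culture_flow in (10, 20):                  # lagoon inflow range from the text (mL/h)
    d = dilution_rate(culture_flow, LAGOON_ML)
    print(f"lagoon at {culture_flow} mL/h: {d:.2f} volumes/h, "
          f"{washout_fraction(d, 1):.2f} of non-replicating phage left after 1 h")

for media_flow in (80, 100):                   # chemostat dilution range (mL/h)
    d = dilution_rate(media_flow, CHEMOSTAT_ML)
    # hosts must double at least this fast to avoid washing out of the chemostat
    print(f"chemostat at {media_flow} mL/h: max doubling time {60 * math.log(2) / d:.0f} min")

low = arabinose_percent(10, 0.5, 20)
high = arabinose_percent(10, 1.0, 10)
print(f"{low:.2f}-{high:.2f}% w/v arabinose in the lagoon")
```

Under these assumptions, lagoons turn over roughly 0.7-1.3 times per hour, so phage that cannot propagate are substantially diluted out within hours, which is why slower dilution is the gentler setting for difficult evolutions.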

In cases of simultaneous positive and negative selection, an additional negative selection AP was also co-transformed into host cells, and spectinomycin (50 µg/mL) was added to the media in addition to carbenicillin and chloramphenicol.

Experiments starting with an NNK-mutagenized SP library were initiated with a lagoon inoculum of 1-2 mL of phage library containing 10^8-10^10 pfu/mL. For all other experiments, lagoons were inoculated with 50-100 µL of filtered phage population from the last time point of the previous PACE experiment. In PACE experiments using mixtures of host cell cultures (see the main text), lagoons received an influx of cell culture from two separate chemostats containing hosts bearing two different APs (combined rate of 10-20 mL/h) for a period of 24-48 h.
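A simple way to reason about these host-mixing transitions is to track the fraction of lagoon hosts that carry the next stepping-stone AP. The following Python sketch assumes a well-mixed lagoon in which both host types grow at the same rate, with a dilution rate set by the combined 10-20 mL/h inflow into a 15 mL lagoon. It is a back-of-the-envelope model, not an analysis performed in this work, and the function name is ours. The example schedule follows the one described for Table 36.

```python
import math


def host_fraction_schedule(stages, dilution_per_h, f0=0.0):
    """Fraction of lagoon hosts carrying the next stepping-stone AP at the end of each stage.

    stages: (hours, inflow_fraction) pairs, where inflow_fraction is the share of incoming
    culture drawn from the chemostat whose hosts encode the next stepping-stone substrate.
    Assumes a well-mixed lagoon in which both host types grow at the same rate, so the
    lagoon fraction relaxes toward the inflow fraction as f_in + (f - f_in) * exp(-D t).
    """
    f, trajectory = f0, []
    for hours, f_in in stages:
        f = f_in + (f - f_in) * math.exp(-dilution_per_h * hours)
        trajectory.append(f)
    return trajectory


# 24 h on stepping-stone three, 24 h of a 50:50 mix, then 24 h on stepping-stone four;
# D = 10 mL/h into a 15 mL lagoon (lower end of the stated range)
D = 10 / 15
schedule = [(24, 0.0), (24, 0.5), (24, 1.0)]
print([round(f, 3) for f in host_fraction_schedule(schedule, D)])

# Hours for the lagoon host composition to come within 1% of a new inflow composition
print(round(math.log(100) / D, 1))
```

At these rates the lagoon host composition tracks the inflow within about 3.5-7 h, so each 24 h stage fully replaces the hosts. How quickly the phage population adapts to the new substrate is a separate question that this model does not capture.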

Phage samples were collected from lagoon waste outflow lines at 24 h intervals and passed through a 0.22 µm sterile filter to remove host cells. Phage titers were determined by plaque assay using strain S1059 as the host.

At the end of each PACE experiment, eight individual plaques were picked with a pipet tip to provide template material (SP-infected E. coli) for rolling circle amplification (TempliPhi, GE Healthcare). The same pipet tip was subsequently transferred to a 96-deep-well culture plate containing 2xYT media for overnight growth at 37 °C. After a PACE experiment, enriched mutations should be present within multiple clones of this small sample of eight population members. Sanger DNA sequencing was performed using primer BCD1136 (Table 19), and results were aligned and tabulated using SeqMan (DNAStar).
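The heuristic above, that enriched mutations should recur among the eight sequenced clones, can be made explicit with a short Python sketch. It tallies mutations across clones (here, the lagoon-one genotypes from Table 36) and computes the binomial probability that a mutation at a given population frequency appears in at least two of eight picked plaques. Treating picked plaques as independent random draws from the lagoon population is an assumption, and the helper names are ours.

```python
from collections import Counter
from math import comb


def tally(genotypes):
    """Count how many sequenced clones carry each mutation."""
    return Counter(m for clone in genotypes.values() for m in clone.split())


def enriched(genotypes, min_clones=2):
    """Mutations seen in at least `min_clones` clones, most frequent first."""
    return [(m, n) for m, n in tally(genotypes).most_common() if n >= min_clones]


def p_detect(freq, n_clones=8, min_clones=2):
    """Binomial chance that a mutation at population frequency `freq`
    appears in at least `min_clones` of `n_clones` randomly picked plaques."""
    return sum(comb(n_clones, k) * freq ** k * (1 - freq) ** (n_clones - k)
               for k in range(min_clones, n_clones + 1))


# Lagoon-one clones from Table 36
lagoon1 = {
    "L1A": "K22R E28K E148G T160A Y171C K225H C231Y Y357C",
    "L1B": "E184G K225H Y357H",
    "L1E": "K22R E28K E184G K225H Y357C",
    "L1F": "K22R E28K E184G F186L K225H Y357C",
    "L1G": "E184G Y171C K225H",
}
print(enriched(lagoon1))
print(round(p_detect(0.5), 3), round(p_detect(0.1), 3))
```

Under this sampling assumption, a mutation carried by half the population appears in at least two of eight clones about 96% of the time, whereas one at 10% frequency does so only about 19% of the time, so an eight-clone sample reliably reveals sweeping mutations but can miss minor ones.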

Luminescence Assays of Evolved Clones. Phage from clones chosen for characterization were sterile-filtered from the corresponding wells of the 96-well culture plate. Saturated overnight cultures of S1030 cells containing a substrate AP were used to initiate luciferase assays in 96-well culture plates. Approximate volumes were 500 µL 2xYT, 50 µL overnight starter culture, and 10 µL filtered phage sample. All assays included a negative control (no phage), a positive control (SP encoding T7 RNAP), and wild-type BoNT-LC SP as a reference. Experimental and control conditions were performed in triplicate. After 3-5 h of growth in a 37 °C shaker, 100 µL was transferred to a clear-bottom assay plate to measure OD600 and luminescence on a Tecan Infinite Pro Plate Reader. Measurements were analyzed as OD-normalized values and as luminescence fold-change over the negative control.

Ranking of Target Sites within Intracellular Proteins. A list of human intracellular and transmembrane proteins with their corresponding amino acid sequences was tabulated using the ProteinData functionality in Mathematica 10. These data were transferred into MATLAB for further processing by a customizable script that performed the following operations. A rating matrix 20 positions wide (one for each of the twenty sites within the BoNT-LC E recognition motif) by 20 rows long (one for each amino acid) was manually populated with biochemical specificity data or subjective “evolvability” integer ratings. Each protein was converted into a binary sparse matrix with as many rows as the length of the protein sequence and 20 columns, one for each amino acid. For each protein matrix, each window of 20 consecutive rows was multiplied by the rating matrix, and the trace of the resulting 20x20 product matrix provided the score for that candidate substrate. For each intracellular protein, the best score and the corresponding peptide and starting-residue index were saved. Once all protein sequences had been processed, we sorted the protein names along with their best-match candidate substrate sequences by score.
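The window-scoring procedure above can be re-sketched in Python with NumPy. The rating matrix is transcribed from Table 28 (amino acids in the row order A, R, N, ..., V; columns for SNAP25 residues 167-186). Each 20-residue window is scored as the trace of its one-hot matrix multiplied by the rating matrix, which equals the sum of the per-position ratings. Sliding the window one residue at a time and the helper names are our assumptions. The original MATLAB script and the Mathematica ProteinData export are not reproduced here.

```python
import numpy as np

AA = "ARNDCQEGHILKMFPSTWYV"   # amino-acid row order used in Table 28
IDX = {aa: i for i, aa in enumerate(AA)}
FIRST_RES, WINDOW = 167, 20    # SNAP25 residues 167-186


def table28_rating_matrix():
    """20 (amino acid) x 20 (position) rating matrix transcribed from Table 28."""
    R = np.zeros((len(AA), WINDOW))
    entries = {("G", 168): 5, ("I", 171): 3, ("I", 178): 4, ("I", 181): 8,
               ("A", 171): 1, ("A", 178): 1, ("A", 180): 1, ("R", 180): 3,
               ("T", 173): 2, ("L", 178): 1, ("F", 178): 1, ("V", 178): 1}
    for aa in "NDQE":          # each scores 1 at residues 170 and 172
        entries[(aa, 170)] = entries[(aa, 172)] = 1
    for (aa, res), score in entries.items():
        R[IDX[aa], res - FIRST_RES] = score
    R[IDX["P"], :] = -1        # proline penalty at every position
    return R


def one_hot(seq):
    """Binary matrix with one row per residue and one column per amino acid."""
    M = np.zeros((len(seq), len(AA)))
    M[np.arange(len(seq)), [IDX[a] for a in seq]] = 1
    return M


def best_window(seq, R):
    """(start index, score) of the highest-scoring 20-residue window in `seq`."""
    P = one_hot(seq)
    best_start, best_score = None, -np.inf
    for start in range(len(seq) - WINDOW + 1):
        # trace(W @ R) sums R[amino acid at position i, i] over the window
        score = np.trace(P[start:start + WINDOW] @ R)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


R = table28_rating_matrix()
for name, peptide in [("SNAP25", "MGNEIDTQNRQIDRIMEKAD"),
                      ("SNAP23", "IGNEIDAQNPQIKRITDKAD"),
                      ("PTEN", "NGSLCDQEIDSICSIERADN")]:
    start, score = best_window(peptide, R)
    print(name, start, int(score))
```

With the Table 28 matrix, the SNAP25, SNAP23, and PTEN 20-mers from Tables 27 and 29 score 27, 24, and 18, respectively. Scanning a full proteome would apply `best_window` to each protein sequence and then sort the hits by score.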


References

1. Schilling, O. & Overall, C.M. Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nature biotechnology 26, 685-694 (2008).
2. Walsh, G. Biopharmaceutical benchmarks 2006. Nature biotechnology 24, 769-776 (2006).
3. Wehr, M.C. et al. Monitoring regulated protein-protein interactions using split TEV. Nature methods 3, 985-993 (2006).
4. Craik, C.S., Page, M.J. & Madison, E.L. Proteases as therapeutics. The Biochemical journal 435, 1-16 (2011).
5. Gray, D.C., Mahrus, S. & Wells, J.A. Activation of specific apoptotic caspases with an engineered small-molecule-activated protease. Cell 142, 637-646 (2010).
6. Zhao, H.M. & Arnold, F.H. Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein engineering 12, 47-53 (1999).
7. You, L. & Arnold, F.H. Directed evolution of subtilisin E in Bacillus subtilis to enhance total activity in aqueous dimethylformamide. Protein engineering 9, 77-83 (1996).
8. Persson, E., Kjalke, M. & Olsen, O.H. Rational design of coagulation factor VIIa variants with substantially increased intrinsic activity. Proc Natl Acad Sci U S A 98, 13583-13588 (2001).
9. Madison, E.L., Goldsmith, E.J., Gerard, R.D., Gething, M.J. & Sambrook, J.F. Serpin-resistant mutants of human tissue-type plasminogen activator. Nature 339, 721-724 (1989).
10. Allen, G.A. et al. A variant of recombinant factor VIIa with enhanced procoagulant and antifibrinolytic activities in an in vitro model of hemophilia. Arterioscl Throm Vas 27, 683-689 (2007).
11. Beck, A., Wurch, T., Bailly, C. & Corvaia, N. Strategies and challenges for the next generation of therapeutic antibodies. Nature reviews. Immunology 10, 345-352 (2010).
12. Hedstrom, L., Farr-Jones, S., Kettner, C.A. & Rutter, W.J. Converting trypsin to chymotrypsin: ground-state binding does not determine substrate specificity. Biochemistry 33, 8764-8769 (1994).
13. Kurth, T., Ullmann, D., Jakubke, H.D. & Hedstrom, L. Converting trypsin to chymotrypsin: structural determinants of S1' specificity. Biochemistry 36, 10098-10104 (1997).
14. Hedstrom, L., Szilagyi, L. & Rutter, W.J. Converting trypsin to chymotrypsin: the role of surface loops. Science 255, 1249-1253 (1992).
15. Hedstrom, L., Perona, J.J. & Rutter, W.J. Converting trypsin to chymotrypsin: residue 172 is a substrate specificity determinant. Biochemistry 33, 8757-8763 (1994).
16. Ridky, T.W. et al. Programming the Rous sarcoma virus protease to cleave new substrate sequences. The Journal of biological chemistry 271, 10538-10544 (1996).
17. Lin, Y.C. et al. Alteration of substrate and inhibitor specificity of feline immunodeficiency virus protease. Journal of virology 74, 4710-4720 (2000).
18. Carrico, Z.M., Strobel, K.L., Atreya, M.E., Clark, D.S. & Francis, M.B. Simultaneous selection and counter-selection for the directed evolution of proteases in E-coli using a cytoplasmic anchoring strategy. Biotechnol Bioeng 113, 1187-1193 (2016).
19. Renicke, C., Spadaccini, R. & Taxis, C. A tobacco etch virus protease with increased substrate tolerance at the P1' position. PloS one 8, e67915 (2013).
20. Varadarajan, N., Gam, J., Olsen, M.J., Georgiou, G. & Iverson, B.L. Engineering of protease variants exhibiting high catalytic activity and exquisite substrate selectivity. Proc Natl Acad Sci U S A 102, 6855-6860 (2005).
21. Verhoeven, K.D., Altstadt, O.C. & Savinov, S.N. Intracellular Detection and Evolution of Site-Specific Proteases Using a Genetic Selection System. Appl Biochem Biotech 166, 1340-1354 (2012).
22. Yi, L. et al. Engineering of TEV protease variants by yeast ER sequestration screening (YESS) of combinatorial libraries. Proc Natl Acad Sci U S A 110, 7229-7234 (2013).
23. Badran, A.H. & Liu, D.R. In vivo continuous directed evolution. Curr Opin Chem Biol 24C, 1-10 (2015).
24. Esvelt, K.M., Carlson, J.C. & Liu, D.R. A system for the continuous directed evolution of biomolecules. Nature 472, 499-503 (2011).
25. Badran, A.H. & Liu, D.R. Development of potent in vivo mutagenesis plasmids with broad mutational spectra. Nature communications 6, 8425 (2015).
26. Entus, R., Aufderheide, B. & Sauro, H.M. Design and implementation of three incoherent feed-forward motif based biological concentration sensors. Systems and synthetic biology 1, 119-128 (2007).


27. Jeruzalmi, D. & Steitz, T.A. Structure of T7 RNA polymerase complexed to the transcriptional inhibitor T7 lysozyme. EMBO J 17, 4101-4113 (1998).
28. Carlson, J.C., Badran, A.H., Guggiana-Nilo, D.A. & Liu, D.R. Negative selection and stringency modulation in phage-assisted continuous evolution. Nat Chem Biol 10, 216-222 (2014).
29. Clark, V.C., Peter, J.A. & Nelson, D.R. New therapeutic strategies in HCV: second-generation protease inhibitors. Liver international : official journal of the International Association for the Study of the Liver 33 Suppl 1, 80-84 (2013).
30. Manns, M.P. & von Hahn, T. Novel therapies for hepatitis C - one pill fits all? Nature reviews. Drug discovery 12, 595-610 (2013).
31. Romano, K.P. et al. The molecular basis of drug resistance against hepatitis C virus NS3/4A protease inhibitors. PLoS pathogens 8, e1002832 (2012).
32. Jiang, Y. et al. Discovery of danoprevir (ITMN-191/R7227), a highly selective and potent inhibitor of hepatitis C virus (HCV) NS3/4A protease. Journal of medicinal chemistry 57, 1753-1769 (2014).
33. Scola, P.M. et al. The discovery of asunaprevir (BMS-650032), an orally efficacious NS3 protease inhibitor for the treatment of hepatitis C virus infection. Journal of medicinal chemistry 57, 1730-1752 (2014).
34. Lim, S.R. et al. Virologic escape during danoprevir (ITMN-191/RG7227) monotherapy is hepatitis C virus subtype dependent and associated with R155K substitution. Antimicrobial agents and chemotherapy 56, 271-279 (2012).
35. McPhee, F. et al. Resistance analysis of the hepatitis C virus NS3 protease inhibitor asunaprevir. Antimicrobial agents and chemotherapy 56, 3670-3681 (2012).
36. McPhee, F. et al. Resistance analysis of hepatitis C virus genotype 1 prior treatment null responders receiving daclatasvir and asunaprevir. Hepatology 58, 902-911 (2013).
37. Cordingley, M.G., Callahan, P.L., Sardana, V.V., Garsky, V.M. & Colonno, R.J. Substrate requirements of human rhinovirus 3C protease for peptide cleavage in vitro. The Journal of biological chemistry 265, 9062-9065 (1990).
38. Bjorndahl, T.C., Andrew, L.C., Semenchenko, V. & Wishart, D.S. NMR solution structures of the apo and peptide-inhibited human rhinovirus 3C protease (Serotype 14): structural and dynamic comparison. Biochemistry 46, 12945-12958 (2007).
39. Verbinnen, T. et al. Tracking the evolution of multiple in vitro hepatitis C virus replicon variants under protease inhibitor selection pressure by 454 deep sequencing. Journal of virology 84, 11124-11133 (2010).
40. Imhof, I. & Simmonds, P. Genotype differences in susceptibility and resistance development of hepatitis C virus to protease inhibitors telaprevir (VX-950) and danoprevir (ITMN-191). Hepatology 53, 1090-1099 (2011).
41. Leconte, A.M. et al. A population-based experimental model for protein evolution: effects of mutation rate and selection stringency on evolutionary outcomes. Biochemistry 52, 1490-1499 (2013).
42. Dickinson, B.C., Leconte, A.M., Allen, B., Esvelt, K.M. & Liu, D.R. Experimental interrogation of the path dependence and stochasticity of protein evolution using phage-assisted continuous evolution. Proc Natl Acad Sci U S A (2013).
43. Kapust, R.B. et al. Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency. Protein engineering 14, 993-1000 (2001).
44. Romano, K.P., Ali, A., Royer, W.E. & Schiffer, C.A. Drug resistance against HCV NS3/4A inhibitors is defined by the balance of substrate recognition versus inhibitor binding. Proc Natl Acad Sci U S A 107, 20986-20991 (2010).
45. Herman, G.E. & Modrich, P. Escherichia coli K-12 clones that overproduce dam methylase are hypermutable. Journal of bacteriology 145, 644-646 (1981).
46. Fijalkowska, I.J. & Schaaper, R.M. Mutants in the Exo I motif of Escherichia coli dnaQ: defective proofreading and inviability due to error catastrophe. Proc Natl Acad Sci U S A 93, 2856-2861 (1996).
47. Yang, H., Wolff, E., Kim, M., Diep, A. & Miller, J.H. Identification of mutator genes and mutational pathways in Escherichia coli using a multicopy cloning approach. Molecular microbiology 53, 283-295 (2004).
48. Dougherty, W.G., Cary, S.M. & Parks, T.D. Molecular genetic analysis of a plant virus polyprotein cleavage site: a model. Virology 171, 356-364 (1989).
49. Phan, J. et al. Structural basis for the substrate specificity of tobacco etch virus protease. The Journal of biological chemistry 277, 50564-50572 (2002).
50. Gaffen, S.L., Jain, R., Garg, A.V. & Cua, D.J. The IL-23-IL-17 immune axis: from mechanisms to therapeutic testing. Nature reviews. Immunology 14, 585-600 (2014).
51. Langrish, C.L. et al. IL-23 drives a pathogenic T cell population that induces autoimmune inflammation. J Exp Med 201, 233-240 (2005).
52. Teng, M.W. et al. IL-12 and IL-23 cytokines: from discovery to targeted therapies for immune-mediated inflammatory diseases. Nat Med 21, 719-729 (2015).
53. Badran, A.H. et al. Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance. Nature 533, 58-63 (2016).
54. Hubbard, B.P. et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nature methods 12, 939-942 (2015).
55. Davis, J.H., Rubin, A.J. & Sauer, R.T. Design, construction and characterization of a set of insulated bacterial promoters. Nucleic acids research 39, 1131-1141 (2011).
56. Makarova, O.V., Makarov, E.M., Sousa, R. & Dreyfus, M. Transcribing of Escherichia coli genes with mutant T7 RNA polymerases: Stability of lacZ mRNA inversely correlates with polymerase speed. Proc Natl Acad Sci U S A 92, 12250-12254 (1995).
57. Ratnikov, B., Cieplak, P. & Smith, J.W. High throughput substrate phage display for protease profiling. Methods in molecular biology 539, 93-114 (2009).
58. Scholle, M.D. et al. Mapping protease substrates by using a biotinylated phage substrate library. Chembiochem 7, 834-838 (2006).
59. Matthews, D.J. & Wells, J.A. Substrate phage: selection of protease substrates by monovalent phage display. Science 260, 1113-1117 (1993).
60. Thomsen, M.C. & Nielsen, M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic acids research 40, W281-287 (2012).
61. Fuchs, J.E. et al. Cleavage entropy as quantitative measure of protease specificity. PLoS Comput Biol 9, e1003007 (2013).
62. Pop, C. & Salvesen, G.S. Human caspases: activation, specificity, and regulation. The Journal of biological chemistry 284, 21777-21781 (2009).
63. Song, J. et al. PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PloS one 7, e50300 (2012).
64. Lien, S., Pastor, R., Sutherlin, D. & Lowman, H.B. A substrate-phage approach for investigating caspase specificity. Protein J 23, 413-425 (2004).
65. Cabrita, L.D. et al. Enhancing the stability and solubility of TEV protease using in silico design. Protein science : a publication of the Protein Society 16, 2360-2367 (2007).
66. van den Berg, S., Lofdahl, P.A., Hard, T. & Berglund, H. Improved solubility of TEV protease by directed evolution. Journal of biotechnology 121, 291-298 (2006).
67. Wei, L. et al. In vivo and in vitro characterization of TEV protease mutants. Protein expression and purification 83, 157-163 (2012).
68. Desmet, J. et al. Structural basis of IL-23 antagonism by an Alphabody protein scaffold. Nature communications 5, 5237 (2014).
69. Aggarwal, S., Ghilardi, N., Xie, M.H., de Sauvage, F.J. & Gurney, A.L. Interleukin-23 promotes a distinct CD4 T cell activation state characterized by the production of interleukin-17. The Journal of biological chemistry 278, 1910-1914 (2003).
70. Lerner, R.A., Benkovic, S.J. & Schultz, P.G. At the Crossroads of Chemistry and Immunology - Catalytic Antibodies. Science 252, 659-667 (1991).
71. Nour-Eldin, H.H., Geu-Flores, F. & Halkier, B.A. USER Cloning and USER Fusion: The Ideal Cloning Techniques for Small and Big Laboratories. Plant Secondary Metabolism Engineering: Methods and Applications 643, 185-200 (2010).
72. Dickinson, B.C., Packer, M.S., Badran, A.H. & Liu, D.R. A system for the continuous directed evolution of proteases rapidly reveals drug-resistance mutations. Nature communications 5, 5352 (2014).
73. Tracewell, C.A. & Arnold, F.H. Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Curr Opin Chem Biol 13, 3-9 (2009).
74. Fischer, B., Sumner, I. & Goodenough, P. Isolation, renaturation, and formation of disulfide bonds of eukaryotic proteins expressed in Escherichia coli as inclusion bodies. Biotechnol Bioeng 41, 3-13 (1993).
75. de Marco, A. Strategies for successful recombinant expression of disulfide bond-dependent proteins in Escherichia coli. Microb Cell Fact 8, 26 (2009).
76. Bessette, P.H., Aslund, F., Beckwith, J. & Georgiou, G. Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm. Proc Natl Acad Sci U S A 96, 13703-13708 (1999).
77. Stewart, E.J., Aslund, F. & Beckwith, J. Disulfide bond formation in the Escherichia coli cytoplasm: an in vivo role reversal for the thioredoxins. EMBO J 17, 5543-5550 (1998).
78. Ritz, D., Lim, J., Reynolds, C.M., Poole, L.B. & Beckwith, J. Conversion of a peroxiredoxin into a disulfide reductase by a triplet repeat expansion. Science 294, 158-160 (2001).
79. Yamamoto, Y. et al. Mutant AhpC peroxiredoxins suppress thiol-disulfide redox deficiencies and acquire deglutathionylating activity. Mol Cell 29, 36-45 (2008).
80. Lim, C.J., Haller, B. & Fuchs, J.A. Thioredoxin is the bacterial protein encoded by fip that is required for filamentous bacteriophage f1 assembly. Journal of bacteriology 161, 799-802 (1985).
81. Russel, M. & Model, P. Thioredoxin is required for filamentous phage assembly. Proc Natl Acad Sci U S A 82, 29-33 (1985).
82. Russel, M. & Model, P. The role of thioredoxin in filamentous phage assembly. Construction, isolation, and characterization of mutant thioredoxins. The Journal of biological chemistry 261, 14997-15005 (1986).
83. Hatahet, F., Nguyen, V.D., Salo, K.E. & Ruddock, L.W. Disruption of reducing pathways is not essential for efficient disulfide bond formation in the cytoplasm of E. coli. Microb Cell Fact 9, 67 (2010).
84. Nguyen, V.D. et al. Pre-expression of a sulfhydryl oxidase significantly increases the yields of eukaryotic disulfide bond containing proteins expressed in the cytoplasm of E.coli. Microb Cell Fact 10, 1 (2011).
85. Dong, M. et al. SV2 is the protein receptor for botulinum neurotoxin A. Science 312, 592-596 (2006).
86. Hoch, D.H. et al. Channels formed by botulinum, tetanus, and diphtheria toxins in planar lipid bilayers: relevance to translocation of proteins across membranes. Proc Natl Acad Sci U S A 82, 1692-1696 (1985).
87. Koriazova, L.K. & Montal, M. Translocation of botulinum neurotoxin light chain protease through the heavy chain channel. Nat Struct Biol 10, 13-18 (2003).
88. Simpson, L.L. Identification of the major steps in botulinum toxin action. Annu Rev Pharmacol Toxicol 44, 167-193 (2004).
89. Arnon, S.S. et al. Botulinum toxin as a biological weapon: medical and public health management. JAMA 285, 1059-1070 (2001).
90. Jankovic, J. Botulinum toxin in clinical practice. J Neurol Neurosurg Psychiatry 75, 951-957 (2004).
91. Aoki, K.R. Evidence for antinociceptive activity of botulinum toxin type A in pain management. Headache 43 Suppl 1, S9-15 (2003).
92. Cui, M., Khanijou, S., Rubino, J. & Aoki, K.R. Subcutaneous administration of botulinum toxin A reduces formalin-induced pain. Pain 107, 125-133 (2004).
93. Ranoux, D., Attal, N., Morain, F. & Bouhassira, D. Botulinum toxin type A induces direct analgesic effects in chronic neuropathic pain. Ann Neurol 64, 274-283 (2008).
94. Foster, K.A. et al. Re-engineering the target specificity of Clostridial neurotoxins - a route to novel therapeutics. Neurotox Res 9, 101-107 (2006).
95. Masuyer, G., Chaddock, J.A., Foster, K.A. & Acharya, K.R. Engineered botulinum neurotoxins as new therapeutics. Annu Rev Pharmacol Toxicol 54, 27-51 (2014).
96. Somm, E. et al. A botulinum toxin-derived targeted secretion inhibitor downregulates the GH/IGF1 axis. J Clin Invest 122, 3295-3306 (2012).
97. Yeh, F.L. et al. Retargeted clostridial neurotoxins as novel agents for treating chronic diseases. Biochemistry 50, 10419-10421 (2011).
98. Chen, S. & Barbieri, J.T. Engineering botulinum neurotoxin to extend therapeutic intervention. Proc Natl Acad Sci U S A 106, 9180-9184 (2009).
99. Breidenbach, M.A. & Brunger, A.T. Substrate recognition strategy for botulinum neurotoxin serotype A. Nature 432, 925-929 (2004).
100. Agarwal, R., Binz, T. & Swaminathan, S. Structural analysis of botulinum neurotoxin serotype F light chain: implications on substrate binding and inhibitor design. Biochemistry 44, 11758-11765 (2005).
101. Agarwal, R., Schmidt, J.J., Stafford, R.G. & Swaminathan, S. Mode of VAMP substrate recognition and inhibition of Clostridium botulinum neurotoxin F. Nat Struct Mol Biol 16, 789-794 (2009).
102. Chen, S. & Barbieri, J.T. Unique substrate recognition by botulinum neurotoxins serotypes A and E. The Journal of biological chemistry 281, 10906-10911 (2006).
103. Chen, S. & Barbieri, J.T. Multiple pocket recognition of SNAP25 by botulinum neurotoxin serotype E. The Journal of biological chemistry 282, 25540-25547 (2007).
104. Costantin, L. et al. Antiepileptic effects of botulinum neurotoxin E. J Neurosci 25, 1943-1951 (2005).
105. Groszer, M. et al. Negative regulation of neural stem/progenitor cell proliferation by the Pten tumor suppressor gene in vivo. Science 294, 2186-2189 (2001).
106. Liu, K. et al. PTEN deletion enhances the regenerative ability of adult corticospinal neurons. Nat Neurosci 13, 1075-1081 (2010).
107. Ohtake, Y., Hayat, U. & Li, S. PTEN inhibition and axon regeneration and neural repair. Neural Regen Res 10, 1363-1368 (2015).
108. Park, K.K. et al. Promoting axon regeneration in the adult CNS by modulation of the PTEN/mTOR pathway. Science 322, 963-966 (2008).
109. Sun, F. et al. Sustained axon regeneration induced by co-deletion of PTEN and SOCS3. Nature 480, 372-375 (2011).
110. Zukor, K. et al. Short hairpin RNA against PTEN enhances regenerative growth of corticospinal tract axons after spinal cord injury. J Neurosci 33, 15350-15361 (2013).