Design of Potent and Selective Cathepsin G Inhibitors Based on the Sunflower Trypsin Inhibitor-1 Scaffold

Joakim E. Swedberg, Choi Yi Li, Simon J. de Veer, Conan K. Wang, David J. Craik

Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia

ABSTRACT

Neutrophils are directly responsible for destroying invading pathogens via reactive oxygen species, antimicrobial peptides and neutrophil serine proteases (NSPs). Imbalance between NSP activity and endogenous inhibitors is associated with chronic inflammatory disorders, and engineered inhibitors of NSPs are a potential therapeutic pathway. In this study we characterised the extended substrate specificity (P4-P1) of the NSP cathepsin G (CG) using a peptide substrate library. Substituting preferred cathepsin G substrate sequences into sunflower trypsin inhibitor-1 (SFTI-1) produced a potent cathepsin G inhibitor (Ki = 0.89 nM). Cathepsin G’s P2ʹ preference was determined by screening against a P2ʹ diverse SFTI-based library, and the most preferred residue at P2ʹ was combined in SFTI-1 with a preferred substrate sequence (P4-P2) and a non-proteinogenic P1 residue (4-guanidyl-L-phenylalanine) to produce a potent (Ki = 1.6 nM) and the most selective (≥ 360-fold) engineered cathepsin G inhibitor reported to date. This compound is a promising lead for further development of CG inhibitors targeting chronic inflammatory disorders.


INTRODUCTION

Neutrophils are a key component of the innate immune system and form the first line of defence against bacterial and fungal infections. After migrating to the site of injury, neutrophils eliminate invading organisms using cytotoxic granules that contain reactive oxygen species, antimicrobial peptides and several proteases, including neutrophil serine proteases (NSPs).1-3 At present, four NSPs have been identified, including neutrophil elastase (NE), proteinase 3 (PR3) and cathepsin G (CG),4 as well as the recently identified neutrophil serine protease 4 (NSP4).5 The NSPs, together with the granzymes (A, B, H, K and M), […], complement […], and the catalytically inactive pseudo-protease azurocidin, belong to the S1A family of serine proteases. The substrate specificity of each NSP has recently been characterized using combinatorial peptide libraries6 and substrates.7 NE and PR3 share an elastase-type P1 specificity with cleavage after small or branched amino acids, such as Ala, Val and Thr. By contrast, CG has an unusually broad P1 tolerance, with cleavage after aromatic, branched hydrophobic and, to some degree, basic residues.6, 7 NSP4 has been reported to have a preference for P1 Arg,6 but the protease also cleaves after aromatic residues to some degree.5

NSPs are directly responsible for destroying invading pathogens as well as degrading extracellular matrix components to allow for neutrophil infiltration during acute inflammation.8 Additionally, NSPs are involved in regulating the immune response by processing chemokines, growth factors and cell surface receptors.9 As NSPs have diverse roles that are vital for preventing infection, their activity is tightly regulated at several levels.
First, NSPs are produced as inactive zymogens that require sequential removal of the N-terminal signal peptide, followed by a two amino acid pro-peptide that is cleaved by dipeptidyl peptidase I (cathepsin C).5, 10 Second, NSP activity in the extracellular space is regulated by endogenous inhibitors, including α1-proteinase inhibitor (NE, PR3, NSP4 and CG), elafin (NE and PR3), α1-anti-chymotrypsin (CG) and secretory leukocyte protease inhibitor (CG).8 Finally, these inhibitors can also be inactivated by NSPs at high concentrations or by oxidation. For example, PR3 and NE can degrade α1-proteinase inhibitor11 whilst NE degrades α1-anti-chymotrypsin.12, 13 Additionally, the release of reactive oxygen species by activated neutrophils can inactivate secretory leukocyte protease inhibitor and α1-proteinase inhibitor by oxidising the inhibitory loop residues Met73 and Met358, respectively.14, 15

Imbalance between NSP activity and endogenous protease inhibitors is associated with chronic inflammatory disorders, such as chronic obstructive pulmonary disease (COPD), […], pulmonary fibrosis and cystic fibrosis.1, 8, 16-18 COPD correlates strongly with lifestyle factors such as smoking or exposure to pollution, but genetic factors have also been identified that predispose an individual to COPD. Patients with α1-proteinase inhibitor deficiency develop COPD, and the severity of disease correlates with polymorphisms in the SERPINA1 gene,19 validating a role for proteases, including NSPs, in the development of COPD. To date, most research has focused on the development of potent and selective inhibitors of NE, and a number of clinical trials using NE inhibitors have been performed or are ongoing.17 However, no NE inhibitors have been approved for clinical use, except for sivelestat, which is approved in Japan and Korea.17, 20 To some extent, this may be a result of redundancy in the pro-inflammatory signalling pathways mediated by NSPs.21 It is noteworthy that α1-proteinase inhibitor shows activity against all of the NSPs, but clinical trials for chronic inflammatory disorders have so far focused on NE alone.
As NE inhibitors have only seen modest success in clinical trials, there is now growing interest in developing inhibitors for other NSPs, in particular CG.22 A dual inhibitor of CG (Ki = 38 nM) and mast cell chymase (Ki = 2.3 nM) has been reported23 and shown to reduce inflammation in animal models of pulmonary inflammation.24 However, this inhibitor lacks selectivity and thus it is difficult to assess whether the decrease in inflammation could be attributed to inhibition of CG, chymase or both proteases. While there are several potent endogenous inhibitors of CG, these can be inactivated by neutrophils via several mechanisms and are therefore not ideal drug candidates. Accordingly, no truly potent and selective CG inhibitors that are suitable for therapeutic development have been reported to date.

Sunflower trypsin inhibitor-1 (SFTI-1), a 14 amino acid backbone cyclic peptide bisected by a disulfide bond, is the smallest known member of the Bowman-Birk family of serine protease inhibitors,25-27 and a potent inhibitor of trypsin (Ki = 0.0017 nM).28 We have previously shown that SFTI-1 is an excellent scaffold for engineering potent and selective inhibitors of specific serine proteases,25-27 or multi-target inhibitors for a family of closely related proteases.27 SFTI-1 can also be engineered to produce broad-range inhibitors towards both tryptic and chymotryptic serine proteases.27, 29 SFTI-1 has been reported to be an inhibitor of CG (Ki = 570 nM), and substituting the P1 residue in SFTI-1 from Lys5 to Phe5 produced a CG inhibitor with improved inhibition (Ki = 370 nM), albeit with limited selectivity over other serine proteases.30 This suggests that the SFTI-1 scaffold is suitable for developing potent and selective CG inhibitors, but this has yet to be fully explored.

In the current study we used a non-combinatorial colorimetric peptide substrate library to further define the substrate specificity of CG across the S4-S1 subsites.
Preferred CG cleavage sequences were substituted into the P4-P1 subsites of SFTI-1 to produce potent CG inhibitors with sub-nanomolar inhibition constants. Potency and selectivity of inhibition were further improved by screening CG against an SFTI-based inhibitor library to optimise the P2ʹ residue. The most potent and selective SFTI variant inhibited CG with a Ki of 1.6 nM and at least 360-fold selectivity over other NSPs and a panel of other serine proteases with chymotryptic and tryptic specificity. This compound is the most potent and specific CG inhibitor reported to date and is a promising lead for the development of CG inhibitors targeting chronic inflammatory disorders.

RESULTS

A previous study showed that CG is inhibited by SFTI-1 (compound 1, Table 1) with a Ki of 570 nM.30 To confirm this finding, we synthesized SFTI-1 by solid-phase Fmoc synthesis and tested its activity against CG, which revealed a similar, although higher, Ki (730 nM, Table 1). We previously engineered variants of SFTI-1 that were designed to display broad-range inhibition of serine proteases (i.e. compounds 2 and 3).27, 29 These variants were substituted at Arg2 (Thr), Phe12 (Asn) and Asp14 (Asn) to promote intramolecular hydrogen bonding while having minimal side chain interactions with the target protease. Further, the P1 residue (Lys5) was substituted with either Arg (compound 2) or Phe (compound 3) to target proteases with trypsin-like or chymotrypsin-like specificity, respectively.29 CG is known to have mixed trypsin- and chymotrypsin-like specificity. To determine the most preferred P1 residue for designing SFTI-based inhibitors, we screened compounds 2 and 3 against CG. Both inhibitors were more potent than SFTI-1, with the P1 Phe variant (3) being the most potent (Ki = 390 nM). Considering the broad P1 specificity of CG, we also examined SFTI variants with P1 Tyr (4) and P1 Leu (5) and found that these were similarly potent to P1 Phe or Arg, respectively.
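The Ki values reported in this section were obtained from Morrison fits (see Kinetic and Inhibitory Assays in the Experimental Section). For readers less familiar with that model, the following Python sketch evaluates the standard Morrison equation for the fractional activity of an enzyme in the presence of a tight-binding reversible inhibitor. The function and variable names are illustrative, the correction Ki,app = Ki(1 + [S]/KM) assumes competitive inhibition, and the numbers in the example loop are placeholders rather than data from this study.

```python
import math


def morrison_fractional_activity(e_total, i_total, ki_app):
    """Fractional velocity v_i/v_0 for a tight-binding reversible inhibitor.

    All three arguments must be in the same concentration unit (e.g. nM).
    """
    s = e_total + i_total + ki_app
    # Concentration of the enzyme-inhibitor complex [EI] (quadratic solution).
    bound = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - bound / e_total


def apparent_ki(ki, substrate_conc, km):
    """Apparent Ki under competitive inhibition (an assumed inhibition mode)."""
    return ki * (1.0 + substrate_conc / km)


# Placeholder values for illustration only (nM for E, I and Ki; uM for S and KM).
ki_app = apparent_ki(ki=1.0, substrate_conc=100.0, km=50.0)
for i_nm in (0.0, 1.0, 10.0, 100.0):
    frac = morrison_fractional_activity(e_total=2.0, i_total=i_nm, ki_app=ki_app)
    print(f"[I] = {i_nm:6.1f} nM -> v_i/v_0 = {frac:.3f}")
```

In a fit, Ki,app would be the free parameter adjusted by non-linear regression against measured fractional activities. Unlike the simple hyperbolic model, the quadratic form remains valid when the enzyme concentration is comparable to Ki, which is the regime of the nanomolar inhibitors described here.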


Design and Screening of a targeted CG substrate library. We have shown that serine proteases can be screened by colorimetric para-nitroanilide (-pNA) peptide substrate libraries to determine optimal P4-P1 sequences for substituting into SFTI-1 to produce potent and selective inhibitors.25, 29, 31, 32 CG has previously been screened against a combinatorial positional scanning peptide library.6 However, since combinatorial substrate libraries only probe the preference of one protease binding subsite at once, effects between adjacent residues in the substrate cannot be detected. To identify the most preferred substrate sequences, the findings from a positional scanning library need to be deconvoluted using a sparse matrix library, that is, a substrate library that contains all possible combinations of the most preferred residues for each position.25, 33 We used the information from a published positional scanning CG screen6 to design a non-combinatorial sparse matrix library of substrates to identify highly preferred CG cleavage sequences (Supporting Information, Table S1). Phe was chosen for the P1 position of the sparse matrix library since this residue produced the most potent SFTI-based inhibitor (3). Screening CG against the substrate library (Figure 2) showed that the most preferred residues overall were Asp/Thr (P4), Glu (P3) and norleucine (P2, norleucine: Nle or n). Several substrates were cleaved at significantly higher rates than the remainder of the library, and these included DTnF-pNA, TEnF-pNA and IEnF-pNA. DEnF-pNA also appeared to be cleaved with high efficiency (as indicated by the yellow colour after cleavage of the pNA moiety), but precipitation of the substrate prevented accurate spectrophotometric quantification. All four CG substrates were further characterized by determining their kinetic constants (Table 2).
DEnF-pNA had the lowest KM (54 µM) whereas the highest kcat was seen for DTnF-pNA (1.10 s⁻¹). DEnF-pNA was cleaved with the highest catalytic efficiency (kcat/KM), closely followed by DTnF-pNA.
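The sparse matrix idea described above, enumerating every combination of a short list of preferred residues at each position, can be sketched in a few lines of Python. The per-position shortlists below are illustrative assumptions, chosen only so that the four substrates named in the text appear in the output; the actual library composition is given in Table S1 and is not reproduced here.

```python
from itertools import product

# Hypothetical per-position shortlists (NOT the contents of Table S1).
# 'n' denotes norleucine, following the one-letter convention used in the text.
SHORTLISTS = {
    "P4": ["D", "T", "I"],
    "P3": ["E", "T"],
    "P2": ["n"],
    "P1": ["F"],  # P1 fixed to Phe, as stated in the text
}


def sparse_matrix_library(shortlists):
    """Return every P4-P1 tetrapeptide that combines the shortlisted residues."""
    positions = ["P4", "P3", "P2", "P1"]
    pools = [shortlists[p] for p in positions]
    return ["".join(combo) for combo in product(*pools)]


library = sparse_matrix_library(SHORTLISTS)
for seq in library:
    print(f"{seq}-pNA")
```

The library size is the product of the shortlist lengths, so it stays small as long as each shortlist is short. Because every member is a single defined sequence, pairwise effects between adjacent subsites show up directly in the measured rates, which a positional scanning library cannot reveal.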


Synthesis and screening of CG peptide-aldehyde inhibitors. To explore whether the preferred substrate sequences DEnF, DTnF, TEnF and IEnF could be used to design CG inhibitors, the corresponding peptide aldehydes DEnF-H (6), DTnF-H (7), TEnF-H (8) and IEnF-H (9) were synthesised by solid-phase Fmoc synthesis on NovaSyn® TG resin. Peptide aldehydes inhibit serine proteases by forming a transition-state mimicking covalent bond with Ser195 (chymotrypsin numbering). DEnF-H, DTnF-H and IEnF-H showed similar activity to the P1 substituted SFTI variants (2-5), whereas TEnF-H was 6-fold less potent, most likely due to the higher KM of the corresponding substrate. Although the metabolic stability of peptide aldehydes is too low for therapeutic applications, these findings supported the use of these P4-P1 sequences for developing CG inhibitors.

Substrate-guided design of potent SFTI-based CG Inhibitors. The preferred sequences DEnF, TEnF and IEnF were substituted into the P4-P1 sites of SFTI-1 as DCnF (10), TCnF (11) and ICnF (12) to preserve the vital disulfide bond (Cys3-Cys11). All variants showed improved activity compared to compounds 2-5, with compound 12 being the most potent (Ki = 3.4 nM). We have shown that Asn14 in 2 is important for maintaining the intramolecular hydrogen bond network, conformational stability and potency for […]-related proteases.25, 29 Accordingly, Asp14 in compounds 11 and 12 was substituted with Asn to produce 13 and 14, respectively. Compound 13 gained 12-fold potency (Ki = 0.89 nM) whereas 14 was slightly less potent. To investigate these findings further, we performed molecular dynamics simulations with 13 and 14 in complex with CG. Models of these complexes were constructed using the SFTI-1/trypsin complex and a CG crystal structure.
These analyses suggested that for 13 the Asn14 side chain formed two stabilising hydrogen bonds with the Thr2 backbone amide and side chain hydroxyl (Figure S3A), as we previously reported for 2.29 The same analysis for 14 indicated that steric hindrance by Ile2 reduced the hydrogen bond interaction between the Asn14 side chain and the Ile2 backbone amide (Figure S3B). The most potent inhibitor (13) was screened against the other NSPs and was found to be highly selective over NE and PR3 (>10000-fold). However, screening 13 against other proteases with chymotryptic specificity revealed that chymotrypsin and KLK7 were also potently inhibited, with Ki values in the low nanomolar range (Table 3).

Determining the P2ʹ preference of CG using a P2ʹ diverse SFTI-based library. We have shown that the P2ʹ residue is important for inhibitor specificity by screening diverse serine proteases against a cyclic peptide library based on 2, where P2ʹ was substituted by all proteinogenic residues (excluding Cys) or biphenylalanine (BiP; B).27 Screening CG against this P2ʹ diverse inhibitor library revealed that the P2ʹ residue in SFTI-1 (Ile7) was not the most preferred residue (Figure 3), and that CG has a broad P2ʹ specificity, which includes acidic (Asp/Glu), hydrophobic (Val/Met) and aromatic (Trp/BiP), but not basic, residues. Our previous study showed that P2ʹ Asp7 is not preferred by several serine proteases, including chymotrypsin and KLK7,27 and therefore Ile7 in 13 was substituted with Asp, but the resulting compound 17 was unexpectedly 2-fold less potent. To rationalise this finding, we compared the Ki for the Asp7 (18) and Ile7 (2) variants of the inhibitor library, and the Asp7 variant was indeed 40-fold more potent, confirming the results from the P2ʹ diverse SFTI-based inhibitor library. The only other difference between 2 and 17 (apart from the P1 residue) is a Phe12 to Asn substitution. Consequently, Phe12 in 13 was substituted with Asn to produce 19, and this inhibitor showed 36-fold less activity than 13, indicating that Asn12 does not contribute to high binding affinity.
These findings suggest that the P1 residue (Arg) in the P2ʹ library (Figure 3) strongly influences the P2ʹ preference, and that Asp7 is less preferred by CG when the P1 residue is Phe (17). Screening 17 against PR3 and NE revealed that this compound was slightly less selective than 13 over these NSPs (still maintaining ~5000-fold selectivity), but with improved (~80-fold) selectivity over chymotrypsin and kallikrein-related peptidase 7 (KLK7). However, mast cell chymase was also potently inhibited by 17 (Ki = 7.7 nM). Molecular modelling of 2 and 17 showed that in the complex of CG and 2, Asp7 formed hydrogen bonds with Arg41 (47) of CG (chymotrypsin numbering, with CG numbering in brackets), but not in 17 (Supporting Information, Figure S3C-F), supporting the idea that the P1 residue modulates the P2ʹ preference of CG. This suggests that substantial cooperativity occurs between the P1 and P2ʹ residues, where the nature of the P1 residue alters the positioning of the P2ʹ residue.

Design and Screening of SFTI-based CG Inhibitors with improved selectivity. With the aim of further improving potency and/or selectivity, Ile7 in 13 was substituted with Glu or BiP to produce compounds 20 and 21, respectively. The BiP7 variant (21) lost 70% potency for CG and was not further evaluated. The Glu7 variant (20) gained some 50% potency over the Asp7 variant (17), but was less selective over chymotrypsin. Molecular modelling of CG with 20 suggested that with Phe as the P1 residue Glu7 could reach into the basic S2ʹ pocket of CG to form hydrogen bonds (Supporting Information, Figure S3G-H), as seen before for 2 with P1 Arg in combination with Asp7 at the P2ʹ position. These findings further support strong cooperativity between the S1 and S2ʹ binding sites of CG.
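Fold-selectivity figures such as those quoted above are conventionally the ratio of the off-target Ki to the on-target Ki. That definition is assumed here, since the text does not spell it out. The short Python sketch below shows the arithmetic using only values stated in the text (Ki of 13 for CG = 0.89 nM, with >10000-fold selectivity over NE and PR3); the helper names are hypothetical.

```python
def fold_selectivity(ki_off_target, ki_target):
    """Selectivity as the ratio of off-target to on-target Ki (same units)."""
    if ki_target <= 0:
        raise ValueError("ki_target must be positive")
    return ki_off_target / ki_target


def implied_off_target_ki(ki_target, fold):
    """Off-target Ki implied by a stated fold-selectivity (assumed definition)."""
    return ki_target * fold


# Compound 13: Ki(CG) = 0.89 nM and >10000-fold selectivity over NE and PR3.
# Under the assumed definition this implies Ki(NE) and Ki(PR3) above ~8900 nM.
lower_bound_nm = implied_off_target_ki(ki_target=0.89, fold=10000)
print(f"Implied off-target Ki > {lower_bound_nm:.0f} nM")
```

Read this way, a high fold-selectivity can come from either a tighter on-target Ki or a weaker off-target Ki, so the two values are best reported together, as in Table 3.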
A previous study produced a dual chymotrypsin and CG (Ki = 570 nM) inhibitor by substituting the P1 residue in SFTI-1 (Lys5) with Phe.30 That study also showed that substituting the P1 residue in SFTI-1 (Lys5) with 4-guanidyl-L-phenylalanine rather than Phe reduced the inhibitor’s activity against chymotrypsin, even though the resulting inhibitor was more potent for chymotrypsin than CG.30 With the aim of reducing the inhibition of proteases with chymotryptic S1 specificity, we substituted P1 Phe in the most selective inhibitor (17) with 4-guanidyl-L-phenylalanine, and the resulting compound 22 was similarly potent for CG as 17, but with greatly improved selectivity over chymotrypsin (>6300-fold), KLK7 (460-fold) and chymase (12000-fold) (Supporting Information, Figure S4). The lowest selectivity of 22 was observed for trypsin (360-fold), but this still represents a >200000-fold reduction in activity compared to SFTI-1 (Ki = 0.0017 nM).28 There was no substantial inhibition of the other NSPs, […] or […] by 22, and this compound is the most potent and specific CG inhibitor reported to date.

To gain insights into the molecular determinants underpinning the potency and specificity of 22, molecular modelling of this compound in complex with CG was performed. This analysis indicated that the binding interaction relied mostly on intermolecular hydrogen bonds at the P4, P1 and P2ʹ positions. The Lys217 side chain formed hydrogen bonds with both the Thr2 side chain hydroxyl group and the Asn14 backbone carbonyl group (Figure 4A-B). At the S2ʹ pocket the P2ʹ carboxyl oxygen atoms of Asp7 formed hydrogen bonds with the guanidino group of Arg143 (148) of CG. In the S1 pocket the guanidino group of 4-guanidyl-L-phenylalanine formed hydrogen bonds with the carboxylic oxygens of Glu226 (225) as well as several main chain carbonyl oxygens. This is in contrast to most trypsin-like serine proteases, which recognise P1 basic residues through Asp189. NE, PR3 and NSP4 also have an acidic residue at the 226 position (Asp, chymotrypsin numbering) in combination with Gly189, but the S1 pocket is occluded by hydrophobic/aromatic residues.5
For NSP4, which is known to cleave after basic residues, the P1 Arg does not interact with Asp226, but rather with the hydroxyl groups of Ser192 and Ser216 (PDB 4Q7Z).34 Overlaying the CG/compound 22 model with the structure of NSP4 suggests that the side chain of 4-guanidyl-L-phenylalanine is too large to fit in the S1 pocket (Supporting Information, Figure S5A-B) and that the same binding mode as for the Phe-Phe-Arg-chloromethyl ketone inhibitor in the complex is unlikely. Thus, it remains to be determined to what degree compound 22 has inhibitory activity towards NSP4.

DISCUSSION

The design of potent inhibitors of CG that were specific over serine proteases with both trypsin- and chymotrypsin-like specificity required optimisation of the entire binding loop sequence (P4-P2ʹ) of SFTI-1. Substituting preferred cleavage sites into the P4-P1 positions resulted in an inhibitor (13) with improved binding affinity for CG, but without selectivity over chymotrypsin. By screening the P2ʹ preference of CG and comparing it to the corresponding preferences of other serine proteases,27 such as chymotrypsin, KLK7, trypsin, thrombin and plasmin, we identified a P2ʹ residue (Asp7) that promoted selectivity, and the resulting inhibitor (17) had improved selectivity over chymotrypsin. Since the P2ʹ preference of chymase is not known, selectivity over this protease was achieved by substituting a non-proteinogenic P1 residue (4-guanidyl-L-phenylalanine) targeting the peculiar P1 specificity of CG. The resulting compound 22 showed at least 360-fold selectivity over serine proteases with trypsin-, chymotrypsin- and elastase-like specificity.

Screening CG against SFTI-based inhibitors with diverse P1 residues (Phe, Tyr, Arg and Leu) has shown that CG has a relaxed P1 specificity for inhibitors, even more so than seen previously for substrates.6 Colorimetric tetrapeptide substrates with P1 Arg or Leu were cleaved with ≤33% of the rate of those with P1 Phe or Tyr.6 Conversely, in this study the activity of inhibitors with P1 Arg or Leu (2, 5) was ~80% of those with P1 Phe or Tyr (3, 4). This phenomenon is similar to what we have previously observed for KLK7, which did not cleave with high rates after P1 Arg in colorimetric tetrapeptide substrates,35 yet SFTI variants with P1 Arg potently inhibited KLK7 (Ki ≥ 0.8 nM).27, 29, 31 Consistent with this observation, both KLK7 and CG cleave after basic residues in […] to some degree after prolonged incubation.7, 36 Thus, for these two proteases it appears that when the P1 residue is presented in a loop that has a complementary fit to the […] and spans both the prime and non-prime sides, the contribution of the P1 residue is less important. These findings have implications for engineering of inhibitors based on […] (covalent) or standard mechanism inhibitor (reversible) protein/peptide scaffolds that rely on presenting a highly complementary binding loop.

For CG and NSP4 the ability to recognise basic residues in the S1 pocket appears to have evolved independently from each other and from other serine proteases with trypsin-like specificity. For NSP4, P1 Arg binds to the hydroxyl groups of Ser192 and Ser216 rather than to Asp189 as for other trypsin-like proteases. This allows NSP4 to cleave after post-translationally modified amino acids such as citrulline and methylarginine.34 CG can cleave after basic residues due to interaction of the Lys/Arg P1 residue with Glu226 (PDB 1AU8). CG cleaves with high catalytic efficiency after P1 4-guanidyl-L-phenylalanine37 and it is possible that CG can recognise and cleave after other post-translationally modified amino acids. For example, cathepsin B is well known to cleave after citrulline, and citrulline has been used frequently in drug conjugate linkers designed for intracellular protease processing.38
Highly activated neutrophils release neutrophil extracellular traps (NETs) that contain reactive oxygen species, antimicrobial peptides, histones, DNA and NSPs that effectively contain and eliminate pathogens locally.39 Enzymatic deimination of Arg to citrulline rapidly occurs for histones in NETs upon inflammatory stimuli40 and these modified residues may present cleavage sites for CG and/or NSP4 to accelerate the unpacking of the chromatin. Similarly, the release of reactive oxygen species by neutrophils also produces a number of chemical post-translational amino acid side chain modifications,41 and some of these may be cleavage recognition sites for CG and/or NSP4, thus limiting proteolysis to the area of neutrophil activation. Future studies are needed to determine if this is the case, and if so, such sequences including post-translationally modified amino acids could provide leads for the design of highly specific inhibitors.

Screening CG against a P2ʹ diverse SFTI-based inhibitor library revealed that CG has a broad P2ʹ specificity, which includes acidic, hydrophobic and aromatic, but not basic, residues. Substituting the most preferred P2ʹ residue Asp7 into the most potent inhibitor (13) resulted in an inhibitor with 80% reduced activity (17). Conversely, substituting the similarly preferred Glu7 residue into 13 produced an inhibitor with 24% less activity (20). Molecular dynamics simulations indicated that this may be a result of cooperativity between the P1 and P2ʹ residues, where the nature of the P1 residue subtly alters the positioning of the P2ʹ residue (Supplementary Figure S3C-H). We have previously seen that the P2ʹ preference for chymotrypsin depends on the P1 residue. Screening chymotrypsin against the SFTI-based P2ʹ library with P1 Arg revealed that P2ʹ Tyr was not preferred, but P2ʹ Tyr was preferred in another SFTI variant with P1 Phe.27 Therefore, it may be that the cooperativity between the P1 and P2ʹ positions is more pronounced than previously known, but whether this phenomenon commonly occurs for serine proteases needs to be examined in future studies.

In this study we have defined the extended substrate specificity of CG, which allowed the design of potent macrocyclic peptide inhibitors of CG based on the SFTI-1 scaffold. One of these variants (22) showed great selectivity over other NSPs and a panel of serine proteases with chymotryptic and tryptic specificity. This variant is the most potent and specific CG inhibitor described to date and is thus a promising lead compound for the further development of CG inhibitors targeting chronic inflammatory disorders resulting from an imbalance between NSP activity and endogenous protease inhibitors.

EXPERIMENTAL SECTION

Protein Expression and Protein Sources. Proteases were sourced from Molecular Innovations (CG, NE, chymase and alpha-thrombin), BioVision (PR3) and Sigma Aldrich (bovine chymotrypsin, bovine trypsin and human plasmin). Recombinant KLK7 was expressed in zymogen form in Pichia pastoris strain X-33 and purified from the culture supernatant by cation exchange chromatography as previously described.29, 42 KLK7 was activated by enterokinase (EK Max) at 37°C for 2 h (1 unit EK Max per 50 µg of pro-KLK) and repurified by cation exchange. The active protein was quantified by active site titration using α1-antitrypsin (Sigma-Aldrich) and stored in 20% glycerol (v/v) at -80°C until use.

Peptide Synthesis. Peptide para-nitroanilide substrates were synthesized manually on 2-chlorotrityl chloride resin (Chem-Impex, 0.8 mmol equiv/g) that had been derivatized with para-phenylenediamine (Sigma Aldrich), as previously described.43 Coupling reactions were performed using Fmoc N-protected amino acids (4 equiv) activated with 4 equiv O-(6-chlorobenzotriazole-1-yl)-1,1,3,3-tetramethylaminium hexafluorophosphate (HCTU) and 8 equiv N,N-diisopropylethylamine (DIPEA) in DMF (2 × 5 min reactions per residue). Fmoc protecting groups were removed using 30% piperidine in DMF (2 × 3 min). Assembled side chain protected peptides were cleaved from the resin using 1.25% (v/v) TFA in dichloromethane (DCM) and precipitated in diethyl ether before oxidation of the C-terminal para-aminoanilide group using 6 equiv Oxone® (Sigma Aldrich) in H2O:acetonitrile (1:1). Oxidized peptides were extracted using DCM:ethyl acetate (1:1) and the organic phase was dried using a rotary evaporator. Side-chain protecting groups were removed by cleavage using TFA/triisopropylsilane/H2O (96:2:2) (20 mL per 0.1 mmol peptide) and the peptides were collected by precipitation in diethyl ether.

Peptide aldehyde inhibitors were synthesized manually on H-Phe(Boc)2-H NovaSyn TG resin (0.20 mmol/g, Novabiochem) using the same reaction conditions as for peptide para-nitroanilide substrates. On-resin removal of protecting groups and peptide cleavage were performed as previously described,33 and the final products were stored under a nitrogen atmosphere at -20°C until use.

SFTI-based inhibitors were synthesized on 2-chlorotrityl chloride resin (starting at Gly1) using a Symphony automated peptide synthesizer (Protein Technologies, Inc). Peptide elongation and liberation of side-chain protected peptides from the solid support were performed as above. Head-to-tail cyclization was performed in DMF (50 mL per 0.1 mmol peptide) with 4 equiv 1-[Bis(dimethylamino)methylene]-1H-1,2,3-triazolo[4,5-b]pyridinium 3-oxid hexafluorophosphate (HATU) and 8 equiv DIPEA for 3 hours. DMF and activators were removed by adding one volume of DCM and washing with 2 volumes of H2O (2-3 times). The organic phase was recovered and the DCM was removed by rotary evaporation.
Side-chain protecting groups were removed as for peptide para-nitroanilide substrates. Formation of the disulfide bond was achieved by stirring at room temperature (3 hours) in 0.1 M ammonium bicarbonate buffer (pH 8.3) containing 10 µM oxidized glutathione (100 mL per 0.1 mmol peptide).

Peptide Purification and Mass Spectrometry Analysis. Peptides were purified by reverse phase HPLC (Shimadzu Prominence) using a 5 µm ZORBAX Extend-C18 PrepHT column (21.2 × 250 mm) and a linear gradient of 10% acetonitrile/0.05% TFA to 70% acetonitrile/0.05% TFA. SFTI variants were purified twice, both before and after formation of the disulfide bond (see above). Peptide purity (> 95%) was confirmed by UPLC using a 5 µm Agilent 300 SB C18 column (2.1 × 50 mm) at 50°C with mobile phases as above (Supporting Information, Figure S1). Peptide masses were determined by electrospray ionization mass spectrometry (Shimadzu Prominence).

Substrate Library Screening. Crude tetrapeptide-pNAs were adjusted to equal molarity as measured by absorbance at 405 nm following total hydrolysis of the pNA moiety. The substrate library was screened against CG using 150 µM substrate in 300 µL assay buffer (0.15 M NaCl, 0.1 M Tris-HCl, pH 8.0 and 0.005% (v/v) Triton X-100) with 10% acetonitrile to aid substrate solubility. Hydrolysis was monitored at λ 405 nm for 5 minutes in clear non-binding surface 96-well plates (Corning) using an Infinite® M1000 PRO microplate reader (TECAN). All assays were performed three times in triplicate.

Kinetic and Inhibitory Assays. Kinetic constants of substrates were determined using a serial dilution of substrate. Enzymatic activity was determined by monitoring the liberation of either pNA or 4-methylcoumaryl-7-amide (MCA) moieties. Assays with peptide-pNA substrates were performed as for the substrate library.
Assays with peptide-MCA substrates were performed using 200 µL assay buffer in black non-binding surface 96-well plates (Corning) monitoring the fluorescence at λex 360 nm/λem 460 nm over 10 minutes. The SFTI-based P2ʹ diverse inhibitor

library (Figure 3) was screened against CG using an inhibitor concentration of 1 µM. Protease concentrations, substrates, substrate concentrations, KM values and buffer additives used are given in Table S2 (Supporting Information). Inhibition constants for peptide aldehydes or SFTI variants were determined following equilibration of inhibitor and proteases at room temperature for 30 minutes. The kinetic (Michaelis-Menten) and inhibition (Morrison Ki) constants were calculated by non-linear regression using Prism 6 (GraphPad). All assays were performed in triplicate in three independent experiments.

1D NMR. Peptides were dissolved in 90% H2O/10% D2O (v/v) (compounds 1, 2 and 22) or 75% acetonitrile-d3/25% H2O (v/v) (compounds 3-21) at approximately 1 mg/mL at pH ranging from 3–4. 1H one- and two-dimensional TOCSY (total correlation spectroscopy) and NOESY (nuclear Overhauser effect spectroscopy) NMR experiments were carried out on a Bruker 600 MHz spectrometer at 298 K. The spectra are shown in Figure S2 (Supporting Information) and were internally referenced using 4,4-dimethyl-4-silapentane-1-sulfonic acid.

Molecular Modeling. Models of SFTI-1 variants in complex with proteases were constructed by overlaying a trypsin/SFTI-1 complex (PDB 1SFI) with CG (PDB 1T32) or NSP4 (PDB 4Q7Z) using MUSTANG (ref 44) (Cα RMSD 1.03 Å). Complexes were solvated with TIP3P water and neutralized with Na+/Cl- counter ions to a final concentration of 100 mM in VMD 1.9.2 (ref 45), producing systems of approximately 25000 atoms including 7000 water molecules. Each complex was equilibrated using a stepwise relaxation procedure over 2.5 ns (2 fs time step) using NAMD 2.11 (ref 46) and CHARMM27 force field parameters as previously described (ref 27).
Briefly, a Langevin thermostat with a damping coefficient of 0.5 ps⁻¹ was used to maintain the system temperature, and the system pressure was maintained at 1 atm using a Langevin piston barostat. The particle mesh Ewald algorithm was used to compute long-range electrostatic interactions at every second time

step and non-bonded interactions were truncated smoothly between 10.5 and 12 Å. Bonds involving hydrogen atoms were constrained by the SHAKE algorithm (or the SETTLE algorithm for water). Production runs of 10 ns were performed under NVT conditions with otherwise identical force field and simulation parameters as above using ACEMD (ref 47). Coordinates were saved every 500 simulation steps, producing 10000 frames per trajectory. The simulation frames were clustered based on Cα RMSDs using VEGA ZZ v2.3 (ref 48), and the frame that most closely aligned with the largest cluster was selected as the representative model of each complex (Figure 4; Supporting Information, Figure S3).
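The trajectory sampling stated above can be checked with a few lines of arithmetic. The sketch below assumes that the 2 fs time step quoted for equilibration also applied to the production runs (the text says only that simulation parameters were otherwise identical); the function names are illustrative and are not part of NAMD or ACEMD.

```python
def frames_saved(production_ns, timestep_fs, save_every_steps):
    """Number of coordinate frames written during a production run."""
    # 1 ns = 1e6 fs, so total steps = run length in fs / time step in fs
    total_steps = round(production_ns * 1_000_000 / timestep_fs)
    return total_steps // save_every_steps


def frame_spacing_ps(timestep_fs, save_every_steps):
    """Simulation time between consecutive saved frames, in picoseconds."""
    return timestep_fs * save_every_steps / 1000.0


# Assumed production settings: 10 ns run, 2 fs step, frame saved every 500 steps
print(frames_saved(10, 2, 500))    # frames per trajectory
print(frame_spacing_ps(2, 500))    # ps between frames
```

Under this assumption the 10 ns run corresponds to one saved frame per picosecond, which is consistent with the 10000 frames per trajectory reported.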

ASSOCIATED CONTENT

Supporting Information

The supporting material is available free of charge on the ACS Publications website (http://pubs.acs.org). Authors will release the atomic coordinates and experimental data upon article publication. Supporting information includes peptide purification and NMR characterization, kinetic substrate and inhibitor assay methods, molecular dynamics simulations and PDB coordinates for molecular models.

PDB coordinates for computational model of Figure 4A-B
PDB coordinates for computational model of Figure S3A-B
PDB coordinates for computational model of Figure S3C-D
PDB coordinates for computational model of Figure S3E-F
PDB coordinates for computational model of Figure S3G-H
PDB coordinates for computational model of the table of contents graphic

AUTHOR INFORMATION

Corresponding Author

Dr Joakim Swedberg, Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia.

Author Contributions

All authors contributed to the writing of this manuscript, and all authors have approved the final version.

ACKNOWLEDGMENTS

DJC is an Australian Research Council Laureate Fellow [FL150100146] and JES is a National Health and Medical Research Council Early Career Fellow [APP1069819].

ABBREVIATIONS USED

SFTI-1, Sunflower Trypsin Inhibitor-1; CG, cathepsin G; NE, neutrophil elastase; PR3, proteinase-3; NSP4, neutrophil serine protease-4; KLK7, kallikrein related peptidase-7; -pNA, para-nitroanilide; HATU, 1-[Bis(dimethylamino)methylene]-1H-1,2,3-triazolo[4,5-b]pyridinium 3-oxid hexafluorophosphate; DIPEA, N,N-diisopropylethylamine.

REFERENCES

(1) Amulic, B.; Cazalet, C.; Hayes, G. L.; Metzler, K. D.; Zychlinsky, A. Neutrophil function: from mechanisms to disease. Annu. Rev. Immunol. 2012, 30, 459-489.
(2) Kolaczkowska, E.; Kubes, P. Neutrophil recruitment and function in health and inflammation. Nat. Rev. Immunol. 2013, 13, 159-175.
(3) Nauseef, W. M.; Borregaard, N. Neutrophils at work. Nat. Immunol. 2014, 15, 602-611.
(4) Korkmaz, B.; Moreau, T.; Gauthier, F. Neutrophil elastase, proteinase 3 and cathepsin G: physicochemical properties, activity and physiopathological functions. Biochimie 2008, 90, 227-242.
(5) Perera, N. C.; Schilling, O.; Kittel, H.; Back, W.; Kremmer, E.; Jenne, D. E. NSP4, an elastase-related protease in human neutrophils with arginine specificity. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 6229-6234.
(6) O'Donoghue, A. J.; Jin, Y.; Knudsen, G. M.; Perera, N. C.; Jenne, D. E.; Murphy, J. E.; Craik, C. S.; Hermiston, T. W. Global substrate profiling of proteases in human neutrophil extracellular traps reveals consensus motif predominantly contributed by elastase. PLoS One 2013, 8, e75141.
(7) Heinz, A.; Jung, M. C.; Jahreis, G.; Rusciani, A.; Duca, L.; Debelle, L.; Weiss, A. S.; Neubert, R. H.; Schmelzer, C. E. The action of neutrophil serine proteases on elastin and its precursor. Biochimie 2012, 94, 192-202.
(8) Korkmaz, B.; Horwitz, M. S.; Jenne, D. E.; Gauthier, F. Neutrophil elastase, proteinase 3, and cathepsin G as therapeutic targets in human diseases. Pharmacol. Rev. 2010, 62, 726-759.
(9) Pham, C. T. Neutrophil serine proteases: specific regulators of inflammation. Nat. Rev. Immunol. 2006, 6, 541-550.

(10) Adkison, A. M.; Raptis, S. Z.; Kelley, D. G.; Pham, C. T. Dipeptidyl peptidase I activates neutrophil-derived serine proteases and regulates the development of acute experimental arthritis. J. Clin. Invest. 2002, 109, 363-371.
(11) Duranton, J.; Bieth, J. G. Inhibition of proteinase 3 by alpha1-antitrypsin in vitro predicts very fast inhibition in vivo. Am. J. Respir. Cell Mol. Biol. 2003, 29, 57-61.
(12) Rubin, H.; Plotnick, M.; Wang, Z. M.; Liu, X.; Zhong, Q.; Schechter, N. M.; Cooperman, B. S. Conversion of alpha 1-antichymotrypsin into a human neutrophil elastase inhibitor: demonstration of variants with different association rate constants, stoichiometries of inhibition, and complex stabilities. Biochemistry 1994, 33, 7627-7633.
(13) Korkmaz, B.; Poutrain, P.; Hazouard, E.; de Monte, M.; Attucci, S.; Gauthier, F. L. Competition between elastase and related proteases from human neutrophil for binding to alpha1-protease inhibitor. Am. J. Respir. Cell Mol. Biol. 2005, 32, 553-559.
(14) Boudier, C.; Bieth, J. G. Oxidized mucus proteinase inhibitor: a fairly potent neutrophil elastase inhibitor. Biochem. J. 1994, 303 (Pt 1), 61-68.
(15) Taggart, C.; Cervantes-Laurean, D.; Kim, G.; McElvaney, N. G.; Wehr, N.; Moss, J.; Levine, R. L. Oxidation of either methionine 351 or methionine 358 in alpha 1-antitrypsin causes loss of anti-neutrophil elastase activity. J. Biol. Chem. 2000, 275, 27258-27265.
(16) Meyer-Hoffert, U.; Wiedow, O. Neutrophil serine proteases: mediators of innate immune responses. Curr. Opin. Hematol. 2011, 18, 19-24.
(17) von Nussbaum, F.; Li, V. M. Neutrophil elastase inhibitors for the treatment of (cardio)pulmonary diseases: Into clinical testing with pre-adaptive pharmacophores. Bioorg. Med. Chem. Lett. 2015, 25, 4370-4381.

(18) Twigg, M. S.; Brockbank, S.; Lowry, P.; FitzGerald, S. P.; Taggart, C.; Weldon, S. The Role of Serine Proteases and Antiproteases in the Cystic Fibrosis Lung. Mediators Inflamm. 2015, 2015, 293053.
(19) Bashir, A.; Shah, N. N.; Hazari, Y. M.; Habib, M.; Bashir, S.; Hilal, N.; Banday, M.; Asrafuzzaman, S.; Fazili, K. M. Novel variants of SERPIN1A gene: Interplay between alpha1-antitrypsin deficiency and chronic obstructive pulmonary disease. Respir. Med. 2016, 117, 139-149.
(20) Lucas, S. D.; Costa, E.; Guedes, R. C.; Moreira, R. Targeting COPD: advances on low-molecular-weight inhibitors of human neutrophil elastase. Med. Res. Rev. 2013, 33 Suppl 1, E73-101.
(21) Perera, N. C.; Jenne, D. E. Perspectives and potential roles for the newly discovered NSP4 in the immune system. Expert Rev. Clin. Immunol. 2012, 8, 501-503.
(22) Kosikowska, P.; Lesner, A. Inhibitors of cathepsin G: a patent review (2005 to present). Expert Opin. Ther. Pat. 2013, 23, 1611-1624.
(23) de Garavilla, L.; Greco, M. N.; Sukumar, N.; Chen, Z. W.; Pineda, A. O.; Mathews, F. S.; Di Cera, E.; Giardino, E. C.; Wells, G. I.; Haertlein, B. J.; Kauffman, J. A.; Corcoran, T. W.; Derian, C. K.; Eckardt, A. J.; Damiano, B. P.; Andrade-Gordon, P.; Maryanoff, B. E. A novel, potent dual inhibitor of the leukocyte proteases cathepsin G and chymase: molecular mechanisms and anti-inflammatory activity in vivo. J. Biol. Chem. 2005, 280, 18001-18007.
(24) Maryanoff, B. E.; de Garavilla, L.; Greco, M. N.; Haertlein, B. J.; Wells, G. I.; Andrade-Gordon, P.; Abraham, W. M. Dual inhibition of cathepsin G and chymase is effective in

animal models of pulmonary inflammation. Am. J. Respir. Crit. Care Med. 2010, 181, 247-253.
(25) Swedberg, J. E.; Nigon, L. V.; Reid, J. C.; de Veer, S. J.; Walpole, C. M.; Stephens, C. R.; Walsh, T. P.; Takayama, T. K.; Hooper, J. D.; Clements, J. A.; Buckle, A. M.; Harris, J. M. Substrate-guided design of a potent and selective kallikrein-related peptidase inhibitor for kallikrein 4. Chem. Biol. 2009, 16, 633-643.
(26) Swedberg, J. E.; de Veer, S. J.; Sit, K. C.; Reboul, C. F.; Buckle, A. M.; Harris, J. M. Mastering the canonical loop of serine protease inhibitors: enhancing potency by optimising the internal hydrogen bond network. PLoS One 2011, 6, e19302.
(27) de Veer, S. J.; Wang, C. K.; Harris, J. M.; Craik, D. J.; Swedberg, J. E. Improving the selectivity of engineered protease inhibitors: optimizing the P2 prime residue using a versatile cyclic peptide library. J. Med. Chem. 2015, 58, 8257-8268.
(28) Quimbar, P.; Malik, U.; Sommerhoff, C. P.; Kaas, Q.; Chan, L. Y.; Huang, Y. H.; Grundhuber, M.; Dunse, K.; Craik, D. J.; Anderson, M. A.; Daly, N. L. High-affinity cyclic peptide matriptase inhibitors. J. Biol. Chem. 2013, 288, 13885-13896.
(29) de Veer, S. J.; Swedberg, J. E.; Akcan, M.; Rosengren, K. J.; Brattsand, M.; Craik, D. J.; Harris, J. M. Engineered protease inhibitors based on sunflower trypsin inhibitor-1 (SFTI-1) provide insights into the role of sequence and conformation in Laskowski mechanism inhibition. Biochem. J. 2015, 469, 243-253.
(30) Legowska, A.; Debowski, D.; Lesner, A.; Wysocka, M.; Rolka, K. Introduction of non-natural amino acid residues into the substrate-specific P1 position of trypsin inhibitor SFTI-1 yields potent chymotrypsin and cathepsin G inhibitors. Bioorg. Med. Chem. 2009, 17, 3302-3307.

(31) de Veer, S. J.; Swedberg, J. E.; Brattsand, M.; Clements, J. A.; Harris, J. M. Exploring the active site binding specificity of kallikrein-related peptidase 5 (KLK5) guides the design of new peptide substrates and inhibitors. Biol. Chem. 2016.
(32) de Veer, S. J.; Furio, L.; Swedberg, J. E.; Munro, C. A.; Brattsand, M.; Clements, J. A.; Hovnanian, A.; Harris, J. M. Selective substrates and inhibitors for kallikrein-related peptidase 7 (KLK7) shed light on KLK proteolytic activity in the stratum corneum. J. Invest. Dermatol. 2016, doi:10.1016/j.jid.2016.1009.1017.
(33) Swedberg, J. E.; Harris, J. M. Plasmin substrate binding site cooperativity guides the design of potent peptide aldehyde inhibitors. Biochemistry 2011, 50, 8454-8462.
(34) Lin, S. J.; Dong, K. C.; Eigenbrot, C.; van Lookeren Campagne, M.; Kirchhofer, D. Structures of neutrophil serine protease 4 reveal an unusual mechanism of substrate recognition by a trypsin-fold protease. Structure 2014, 22, 1333-1340.
(35) Debela, M.; Magdolen, V.; Schechter, N.; Valachova, M.; Lottspeich, F.; Craik, C. S.; Choe, Y.; Bode, W.; Goettig, P. Specificity profiling of seven human tissue kallikreins reveals individual subsite preferences. J. Biol. Chem. 2006, 281, 25678-25688.
(36) Yoon, H.; Laxmikanthan, G.; Lee, J.; Blaber, S. I.; Rodriguez, A.; Kogot, J. M.; Scarisbrick, I. A.; Blaber, M. Activation profiles and regulatory cascades of the human kallikrein-related peptidases. J. Biol. Chem. 2007, 282, 31852-31864.
(37) Wysocka, M.; Legowska, A.; Bulak, E.; Jaskiewicz, A.; Miecznikowska, H.; Lesner, A.; Rolka, K. New chromogenic substrates of human neutrophil cathepsin G containing non-natural aromatic amino acid residues in position P(1) selected by combinatorial chemistry methods. Mol. Divers. 2007, 11, 93-99.

(38) Perez, H. L.; Cardarelli, P. M.; Deshpande, S.; Gangwar, S.; Schroeder, G. M.; Vite, G. D.; Borzilleri, R. M. Antibody-drug conjugates: current status and future directions. Drug Discov. Today 2014, 19, 869-881.
(39) Brinkmann, V.; Reichard, U.; Goosmann, C.; Fauler, B.; Uhlemann, Y.; Weiss, D. S.; Weinrauch, Y.; Zychlinsky, A. Neutrophil extracellular traps kill bacteria. Science 2004, 303, 1532-1535.
(40) Neeli, I.; Khan, S. N.; Radic, M. Histone deimination as a response to inflammatory stimuli in neutrophils. J. Immunol. 2008, 180, 1895-1902.
(41) Mowen, K. A.; David, M. Unconventional post-translational modifications in immunological signaling. Nat. Immunol. 2014, 15, 512-520.
(42) Stefansson, K.; Brattsand, M.; Roosterman, D.; Kempkes, C.; Bocheva, G.; Steinhoff, M.; Egelrud, T. Activation of proteinase-activated receptor-2 by human kallikrein-related peptidases. J. Invest. Dermatol. 2008, 128, 18-25.
(43) Abbenante, G.; Leung, D.; Bond, T.; Fairlie, D. P. An efficient Fmoc strategy for the rapid synthesis of peptide para-nitroanilides. Lett. Pept. Sci. 2000, 7, 347-351.
(44) Konagurthu, A. S.; Whisstock, J. C.; Stuckey, P. J.; Lesk, A. M. MUSTANG: a multiple structural alignment algorithm. Proteins 2006, 64, 559-574.
(45) Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 1996, 14, 33-38, 27-38.
(46) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781-1802.

(47) Harvey, M. J.; Giupponi, G.; Fabritiis, G. D. ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale. J. Chem. Theory Comput. 2009, 5, 1632-1639.
(48) Pedretti, A.; Villa, L.; Vistoli, G. VEGA - an open platform to develop chemo-bio-informatics applications, using plug-in architecture and script programming. J. Comput. Aided Mol. Des. 2004, 18, 167-173.

FIGURE LEGENDS

Figure 1: Structure and sequence of SFTI-1. (A) Schematic of the SFTI-1 sequence with residues of β-strands shown in light blue, the disulfide bond shown in yellow and the scissile bond (Lys5:Ser6) marked with a dashed line. (B) Ribbon plot of the solution structure of SFTI-1 (PDB 1JBL) with β-strands shown in light blue. The disulfide bond (yellow) and P1 residue Lys5 (carbon: grey, nitrogen: dark blue) are shown in ball and stick model. (C) Model of SFTI-1 (PDB 1SFI) shown as stick model (carbon: green, nitrogen: blue, oxygen: red, sulphur: yellow, hydrogen bonds: purple dashed lines) on the electrostatic surface of CG (PDB 1T32, positive: blue, negative: red, and neutral: white).

Figure 2: Screening of CG substrate specificity against a colorimetric substrate library. Amidolytic activity of CG against a sparse matrix library of tetrapeptide-pNA substrates. The y-axis represents the relative activity of CG, normalized to the cleavage rate of the most rapidly cleaved substrate, for P4 Asp (yellow), Thr (orange), Trp (green), Ile (blue) and Nle (purple). Substrate sequences for the P3-P1 subsites are given across the x-axis in the one letter amino acid code (n denotes norleucine). Data are expressed as mean ± S.E.M. from three independent experiments performed in triplicate.

Figure 3: Inhibitory activity of a SFTI-based library with a variable P2ʹ residue against CG. The variable residue at the P2ʹ position is shown on the x-axis using single letter code for naturally occurring amino acids (B denotes biphenylalanine). All compounds were screened at an inhibitor concentration of 1 µM. The wild-type SFTI-1 residue Ile7 is highlighted in grey. The % inhibition (y-axis) for each variant was calculated by comparing kinetic rates to control assays without inhibitor. Data are expressed as mean ± S.E.M. from three independent experiments performed in triplicate.
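The % inhibition calculation described in the Figure 3 legend can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: it assumes % inhibition is 100 × (1 − rate with inhibitor / control rate), that the S.E.M. is taken across the independent experiments, and all rate values shown are hypothetical.

```python
import statistics
from math import sqrt


def percent_inhibition(rate_with_inhibitor, rate_control):
    """Percent inhibition relative to a control assay without inhibitor."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return 100.0 * (1.0 - rate_with_inhibitor / rate_control)


def mean_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / sqrt(len(values))
    return mean, sem


# Hypothetical rates (arbitrary units) from three independent experiments,
# each value being the average of one triplicate
control_rates = [100.0, 98.0, 102.0]
inhibited_rates = [25.0, 27.0, 24.0]

per_experiment = [
    percent_inhibition(v_i, v_0)
    for v_i, v_0 in zip(inhibited_rates, control_rates)
]
mean, sem = mean_sem(per_experiment)
print(f"{mean:.1f} ± {sem:.1f} % inhibition")
```

Computing the percentage within each experiment before averaging keeps day-to-day variation in enzyme activity out of the error estimate; pooling raw rates first would be an equally defensible choice.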

Figure 4: Molecular dynamics simulations of models of CG/SFTI-variant complexes. (A) Stick model of 22 (carbon: green, nitrogen: blue, oxygen: red, sulphur: yellow, hydrogen bonds: purple dashed lines) on the electrostatic surface of CG (positive: blue, negative: red, and neutral: white). Residue numbering is shown for SFTI-variants (green) and CG (grey, chymotrypsin numbering). Compound 22 and CG residues are highlighted in green and grey, respectively. X denotes the side chain of 4-guanidyl-L-phenylalanine. (B) Stick model of 22 bound to CG shown as a ribbon plot (grey) with interacting residues highlighted in stick model. PDB coordinates for the computational models are included as supporting material (Figure 4.pdb).

TABLES

Table 1: Inhibitor masses and inhibition constants for CG

Compound (a)  Inhibitor (b)       Calculated ion mass  Determined ion mass  Purity (%)  Ki (nM) ± SEM
1             GRCTKSIPPICFPD      757.9 (+2)           757.8 (+2)           99.9        730 ± 90
2             GTCTRSIPPICNPN      Reference 27         -                    -           490 ± 60
3             GTCTFSIPPICNPN      722.8 (+2)           722.8 (+2)           99.9        390 ± 60
4             GTCTYSIPPICNPN      730.8 (+2)           730.8 (+2)           99.9        410 ± 40
5             GTCTLSIPPICNPN      705.9 (+2)           705.8 (+2)           98.9        480 ± 50
6             DEnF-H              507.6 (+1)           507.4 (+1)           97.3        630 ± 50
7             DTnF-H              479.6 (+1)           479.4 (+1)           99.9        530 ± 60
8             TEnF-H              493.6 (+1)           493.4 (+1)           97.3        3000 ± 260
9             IEnF-H              505.6 (+1)           505.5 (+1)           97.0        1200 ± 80
10            GDCnFSIPPICFPD      752.9 (+2)           752.9 (+2)           99.9        160 ± 20
11            GTCnFSIPPICFPD      745.9 (+2)           745.9 (+2)           99.9        7.3 ± 0.9
12            GICnFSIPPICFPD      751.9 (+2)           751.9 (+2)           99.9        3.4 ± 0.7
13            GTCnFSIPPICFPN      745.4 (+2)           745.4 (+2)           99.9        0.89 ± 0.22
14            GICnFSIPPICFPN      751.4 (+2)           751.4 (+2)           97.3        4.9 ± 0.7
15            GTCnFnIPPICFPD      759.0 (+2)           758.8 (+2)           99.9        4.8 ± 1.0
16            GTCnFnIPPICFPN      758.5 (+2)           758.5 (+2)           99.9        12 ± 1.5
17            GTCnFSDPPICFPN      746.4 (+2)           746.4 (+2)           99.9        1.7 ± 0.4
18            GTCTRSDPPICNPN      Reference 27         -                    -           12 ± 1
19            GTCnFSIPPICNPN      728.9 (+2)           728.9 (+2)           99.9        62 ± 11
20            GTCnFSBPPICFPN      800.5 (+2)           800.5 (+2)           99.9        2.9 ± 0.5
21            GTCnFSEPPICFPN      753.4 (+2)           753.4 (+2)           99.9        1.1 ± 0.3
22            GTCnXSDPPICFPN (c)  774.9 (+2)           774.7 (+2)           99.9        1.6 ± 0.2

(a) Compounds 1-5 and 10-22 are backbone cyclic with a Cys3-Cys11 disulfide bond. n = norleucine
(b) Amino acid variations from SFTI-1 (1) are shown in bold font
(c) X = 4-guanidyl-L-phenylalanine
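The Ki values in Table 1 were obtained by Morrison fitting in Prism 6 (see Kinetic and Inhibitory Assays). For readers who want to see the shape of that model, the sketch below writes out the standard Morrison equation for tight-binding inhibition. It is not the authors' fitting code; the conversion from apparent Ki to Ki assumes competitive inhibition, and the concentrations used in the example are hypothetical.

```python
from math import sqrt


def morrison_fractional_activity(e_total, i_total, ki_app):
    """Fractional activity v_i/v_0 from the Morrison equation.

    All three arguments must share the same concentration units (e.g. nM).
    """
    if e_total <= 0:
        raise ValueError("total enzyme concentration must be positive")
    s = e_total + i_total + ki_app
    # Concentration of enzyme-inhibitor complex (smaller quadratic root)
    bound = (s - sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - bound / e_total


def ki_from_apparent(ki_app, substrate_conc, km):
    """Correct an apparent Ki for substrate competition.

    Assumes a competitive inhibitor: Ki = Ki_app / (1 + [S]/KM).
    """
    return ki_app / (1.0 + substrate_conc / km)


# Hypothetical example: 1 nM enzyme, apparent Ki of 3.2 nM
for inhibitor_nm in (0.0, 1.0, 3.2, 10.0, 100.0):
    activity = morrison_fractional_activity(1.0, inhibitor_nm, 3.2)
    print(f"[I] = {inhibitor_nm:6.1f} nM  v_i/v_0 = {activity:.3f}")

# With [S] = KM the apparent Ki is twice the true Ki under this assumption
print(ki_from_apparent(3.2, substrate_conc=100.0, km=100.0))
```

The Morrison form matters for the most potent compounds because their Ki values approach the enzyme concentration, where the usual assumption that free inhibitor equals total inhibitor no longer holds.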

Table 2: CG substrate masses and Michaelis-Menten kinetic constants

Substrate  Calculated ion mass (M+1)  Determined ion mass (M+1)  KM (µM) ± SEM  kcat (s⁻¹) ± SEM  kcat/KM (M⁻¹ s⁻¹) ± SEM
DEnF-pNA   643.8                      643.4                      54 ± 8.0       0.51 ± 0.04       9500 ± 1600
DTnF-pNA   615.8                      615.4                      120 ± 20       1.1 ± 0.09        9200 ± 1600
TEnF-pNA   629.8                      629.4                      220 ± 50       0.44 ± 0.02       2000 ± 420
IEnF-pNA   641.9                      641.4                      130 ± 40       0.31 ± 0.06       2400 ± 900

n = norleucine
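The catalytic efficiencies in Table 2 follow from kcat and KM after converting KM from µM to M. The sketch below recomputes them from the tabulated (rounded) values, so small differences from the reported kcat/KM are expected. The error estimate uses first-order propagation with independent errors, which is an assumption on our part; the manuscript does not state how the SEM of kcat/KM was derived.

```python
from math import sqrt


def catalytic_efficiency(kcat_per_s, km_micromolar):
    """kcat/KM in M^-1 s^-1, with KM supplied in micromolar."""
    return kcat_per_s / (km_micromolar * 1e-6)


def propagated_error(kcat, kcat_err, km, km_err):
    """First-order error on kcat/KM, assuming independent errors."""
    relative = sqrt((kcat_err / kcat) ** 2 + (km_err / km) ** 2)
    return catalytic_efficiency(kcat, km) * relative


# (kcat, SEM, KM in µM, SEM) as listed in Table 2
substrates = {
    "DEnF-pNA": (0.51, 0.04, 54.0, 8.0),
    "DTnF-pNA": (1.1, 0.09, 120.0, 20.0),
    "TEnF-pNA": (0.44, 0.02, 220.0, 50.0),
    "IEnF-pNA": (0.31, 0.06, 130.0, 40.0),
}

for name, (kcat, kcat_err, km, km_err) in substrates.items():
    eff = catalytic_efficiency(kcat, km)
    err = propagated_error(kcat, kcat_err, km, km_err)
    print(f"{name}: kcat/KM = {eff:.0f} ± {err:.0f} M^-1 s^-1")
```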

Table 3: Inhibition constants for off target proteases

Compound  Protease      Ki (nM) ± SEM (a)  Fold selectivity
13        NE            12000 ± 1000       13000
          PR3           >10000             >11000
          Chymotrypsin  1.9 ± 0.04         2.1
          KLK7          69 ± 6             78
17        NE            8900 ± 600         5200
          PR3           >10000             >5900
          Chymotrypsin  140 ± 4            83
          KLK7          130 ± 8            76
          Chymase       7.7 ± 0.9          4.5
20        Chymotrypsin  35 ± 6             31
22        NE            43000 ± 6000       27000
          PR3           >10000             >6300
          Chymotrypsin  >10000             >6300
          KLK7          740 ± 70           460
          Chymase       19600 ± 2100       12000
          Trypsin       580 ± 20           360
          Thrombin      >10000             >6300
          Plasmin       >10000             >6300

(a) For inhibitors with Ki >10000 nM there was less than 20% inhibition at 10000 nM
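The fold selectivity values for compound 22 in Table 3 are consistent with the ratio of the off-target Ki to the CG Ki of 1.6 nM from Table 1. The sketch below reproduces that ratio for the proteases with a measured Ki. The function name is illustrative, and treating fold selectivity as a simple Ki ratio is inferred from the tabulated numbers rather than stated in the text.

```python
def fold_selectivity(ki_off_target_nm, ki_target_nm):
    """Ratio of off-target Ki to target Ki (both in nM)."""
    if ki_target_nm <= 0:
        raise ValueError("target Ki must be positive")
    return ki_off_target_nm / ki_target_nm


KI_CG_COMPOUND_22 = 1.6  # nM, Table 1

# Off-target Ki values (nM) for compound 22 with a measured value in Table 3
off_target_ki = {
    "NE": 43000.0,
    "KLK7": 740.0,
    "Chymase": 19600.0,
    "Trypsin": 580.0,
}

for protease, ki in off_target_ki.items():
    ratio = fold_selectivity(ki, KI_CG_COMPOUND_22)
    print(f"{protease}: {ratio:.0f}-fold")
```

Trypsin gives the smallest ratio of the set, which is the basis for describing compound 22 as at least 360-fold selective.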

FIGURES

Figure 1

Figure 2

Figure 3

Figure 4