Biochemistry Page 1 of 47

A family of papain-like fungal chimerolectins with distinct Ca2+-dependent activation mechanism

Gabriele Cordara 1,†,‡,*, Dipankar Manna 1,†,‡ and Ute Krengel 1,*

1 Department of Chemistry, University of Oslo, PO Box 1033 Blindern, 0315 Oslo, Norway

Keywords: calcium-binding protein, crystal structure, cysteine protease, gating tyrosine, protease inhibitor

ACS Paragon Plus Environment

Graphical abstract

ABSTRACT

An important function of fungal lectins is to protect their host. Marasmius oreades agglutinin (MOA) is toxic to nematodes and exerts its protective effect through protease activity. Its proteolytic function is associated with a papain-like dimerization domain. The closest homolog of MOA is Polyporus squamosus lectin 1a (PSL1a). Here we probed PSL1a for catalytic activity and confirmed that it is a calcium-dependent cysteine protease, like MOA. The X-ray crystal structures of PSL1a (1.5 Å) and MOA (1.3 Å) in complex with calcium and the irreversible cysteine protease inhibitor E-64 revealed the structural basis for their mechanism of action. The comparison with other calcium-dependent proteases (calpains, LapG) reveals a unique metal-dependent activation mechanism relying on a calcium-induced backbone shift and intradimer cooperation. Intriguingly, the proteases appear to use a tyrosine-gating mechanism instead of pro-peptide processing. A search for potential MOA orthologs suggests the existence of a whole new family of fungal-specific chimerolectins with these unique features.

INTRODUCTION

Lectins are defined as proteins that “possess at least one non-catalytic domain that binds reversibly to a specific mono- or oligosaccharide”.1 They are classified into merolectins (single carbohydrate-binding domain), hololectins (two or more carbohydrate-binding domains) and chimerolectins (carbohydrate-binding module fused to one or more differently purposed domains).1

Fungi are a well-known and mostly unexploited source of chimerolectins.2 Their bioactivity is often associated with the presence of a catalytic domain.3, 4 The Marasmius oreades agglutinin (MOA) is a histo-blood group B-specific chimerolectin extracted from the fruiting bodies of the fairy ring mushroom Marasmius oreades.5 The crystal structure of MOA shows a ricin B chain-like/papain-like two-domain partition.6 Its papain-like domain exhibits calcium- or manganese(II)-dependent proteolytic activity.7 MOA exerts a cytotoxic activity, similarly to other proteins carrying the ricin B chain-like fold.8, 9 The catalytic domain is a key element in mediating toxicity against nematodes or NIH/3T3 cells.10, 11 MOA has been crystallized in either a calcium-free (PDB ID: 2IHO)6 or calcium-bound form (PDB ID: 3EF2).12 Calcium binding triggers a conformational change12 that is essential for the lectin’s proteolytic activity.7, 10, 13

The closest homolog of MOA is the Polyporus squamosus lectin 1a (PSL1a; 38% sequence identity),14, 15 which like MOA exhibits cytotoxic activity.16 Compared to MOA, this 31.2 kDa homodimeric lectin has a different ligand specificity, binding terminal sialic acid-containing glycotopes.15, 17 The only available crystal structure of PSL1a (PDB ID: 3PHZ)18 closely mimics the tertiary structure of MOA in its calcium-free state.18 In particular, the dimerization domain of PSL1a retains the papain-like fold and its hallmark L(eft)/R(ight)-domain partition (Figure S1).19 The good conservation of the papain-like domain (r.m.s.d. = 1.0 Å) extends to the components of the putative catalytic machinery. The catalytic triad of MOA (Cys215, His257, Glu274) perfectly aligns with three equivalent residues of PSL1a (Cys208, His248, Glu266). Likewise, each MOA residue involved in calcium coordination has a counterpart in PSL1a. The good conservation of structural features, and their importance for cytotoxic activity,10, 11, 16 suggests a conserved enzymatic function. However, the catalytic activity of PSL1a has not been probed to date.

Here we present a thorough characterization of PSL1a catalytic activity and its structural underpinnings, and compare PSL1a to MOA. Both lectins were crystallized in complex with the E-64 cysteine protease inhibitor and calcium. The structures suggest an unusual metal-dependent activation mechanism, involving the interplay with the symmetry-related protomer, which appears to be common to a new family of fungal chimerolectins.

MATERIALS AND METHODS

Expression and purification

For PSL1a production (UniProt reference: Q75WT9), ArcticExpress (DE3) cells (Agilent Technologies) were transformed with the pET43.1aPSL1a vector and cultivated according to the protocol suggested by the supplier. Gene expression was induced at 11 °C for 24 h with 0.1 mM IPTG. Subsequently, E. coli pellets were collected by centrifugation and stored at -80 °C overnight before lysis. After thawing, the bacterial pellets were resuspended in a lysis buffer containing 50 mM Tris-HCl pH 8.0, 0.15 M NaCl, 2 mM EDTA, 1x concentrated cOmplete protease inhibitor cocktail EDTA-free (Roche Diagnostics Ltd), 1 µl/ml Benzonase nuclease (Thermo Scientific) and 4 mg/ml hen egg white lysozyme. After incubation on a shaker for two hours at RT, the insoluble fraction was removed by two rounds of centrifugation (20000 rcf, 45 min). The clarified cell lysate was passed through a D-Gal-sepharose affinity column (Thermo Scientific), followed by extensive washing with 20 mM imidazole pH 8.0 and elution of the protein using a single-step 1.0 M D-Gal gradient. The fraction containing the eluted PSL1a was concentrated using a 10000 MWCO PES membrane (Vivaspin, Sartorius AG) to a volume of ≈500 µl. Polishing of the protein preparation was carried out by size-exclusion chromatography (SEC) using a Superdex 75 10/300 GL gel-filtration column (Tricorn, GE Healthcare Life Sciences) with 20 mM imidazole pH 8.0, 2 mM EDTA, 0.2 M D-Gal, 0.15 M NaCl and 2 mM DTT. The fractions containing the purified protein were pooled, concentrated to a final protein concentration of ≈10–15 mg/ml using concentrator tubes with a 10000 MWCO PES membrane (Vivaspin, Sartorius AG) and underwent three rounds of buffer exchange against 20 mM imidazole/HCl pH 8.0, 2 mM EDTA, 2 mM DTT.

MOA (UniProt reference: Q8X123) was expressed in E. coli BL21 (DE3) cells transformed with the pT7LOMOA expression vector, and purified following an established protocol.7, 11 The protein was stored at -80 °C at ≈10–15 mg/ml in 10 mM Tris-HCl pH 8.0, 5 mM DTT. The structure described here was obtained from the Cys63Ala variant of MOA, which was generated by site-directed mutagenesis with the QuikChange II kit (Stratagene) following the protocol provided by the manufacturer. This variant was produced as a preventive measure against possible cross-reactivity with thiol-modifying compounds, since Cys63 was found to be chemically modified by N-ethylmaleimide in an earlier experiment. Parallel experiments with wild-type MOA yielded equivalent results, although at marginally lower resolution.

Activity assay

The activity of PSL1a or MOA was tested using native α1-antitrypsin (α1AT) from human plasma (Sigma-Aldrich) as the target substrate, following an established protocol.7 A mixture containing 0.4 mg/ml α1AT, 50 mM Na-HEPES pH 7.5, 10 mM DTT, 10 mM CaCl2 and 0.2 mg/ml PSL1a or MOA was incubated at 37 °C for 24 h. Variations of the protocol to profile the enzymatic activity of PSL1a included the use of a 50 mM concentration of a buffer other than Na-HEPES pH 7.5 or a 10 mM concentration of a chloride salt other than CaCl2. The reaction was stopped by adding 5 µl of 4x SDS-PAGE sample buffer to 15 µl of the reaction mixture and boiling at 100 °C for 10 min. Samples were loaded alongside SeeBlue Plus2 Pre-Stained Standard (Invitrogen) molecular weight markers on a NuPAGE SDS-PAGE 4–12% gel for the electrophoretic run; the gels were stained with Coomassie Brilliant Blue dye (Sigma-Aldrich).

Double digestion with a peptide:N-glycanase was carried out by preincubating α1AT for 24 h at 37 °C in the presence of 2 UN of PNGase F from Elizabethkingia miricola in 50 mM Na-HEPES pH 7.5. The reaction mixture containing the deglycosylated α1AT was complemented with 10 mM DTT, 10 mM CaCl2, 0.2 mg/ml PSL1a or MOA and further incubated as described for the untreated substrate. Gel images were analyzed using the Fiji distribution of the ImageJ manipulation software.20

Crystallization

MOA crystals were obtained for the MOA Cys63Ala variant (concentrated to 5 mg/ml). The protein solution contained the Galα1,3(Fucα1,2)Gal trisaccharide (Dextra Laboratories) in a 1:20 MOA:sugar molar ratio and the proteolytic activity inhibitor E-64 in a MOA:E-64 1:3 molar ratio. Crystals grew after a two-week incubation period at 20 °C from a crystallization mixture containing 0.1 M imidazole pH 8.0, 15% PEG 8000, 7.5% DMSO and 0.2 M calcium acetate, premixed with the protein solution in a 1:1 ratio (drop volumes).

PSL1a crystals were obtained from a solution containing 5 mg/ml PSL1a, 20 mM imidazole pH 8.0, 5 mM calcium chloride and the DMSO-dissolved E-64 inhibitor in a PSL1a:E-64 1:3 molar ratio. The protein-inhibitor mixture was preincubated for 24 h before mixing in a 1:1 ratio (drop volumes) with the reservoir solution. Hanging drops were transferred after 1 h incubation from initial conditions containing 0.1 M citrate pH 5.6, 0.2 M NaSCN and 50% PEG 3350 to wells containing the same components, but a lower PEG 3350 concentration (40% w/v). All crystals were cryoprotected using the crystallization solution supplemented with 15% ethylene glycol and then frozen in liquid nitrogen for data collection.

Data collection, processing, scaling and structure determination

Diffraction data were collected at the European Synchrotron Radiation Facility (Grenoble, France) at beamlines ID29 and ID23-2. The images were processed and scaled using the XDS software package.21 Scaling statistics are given in Table 1. MOA and PSL1a crystals belong to different space groups, containing either a single protomer (MOA, P6322) or the biological dimer (PSL1a, P212121) in the asymmetric unit. The structures were solved by molecular replacement with PHASER,22 using the calcium-bound structure of MOA (PDB ID: 3EF2)12 or the calcium-free structure of PSL1a (PDB ID: 3PHZ)18 as search models, respectively. The MOA search model was modified by stripping off atoms with HETATM record (including the calcium ions), deleting the Pro54–Val56 loop, and mutating residues showing flexible or generally poorly defined side chains to Ala. The mFo-DFc difference electron density maps showed well-defined, positive density peaks at the metal binding sites, the three sugar binding sites and in the active site cleft. In PSL1a, additional positive and negative density peaks suggest a conformational change involving residues Ala165–Gly177.

Model building and refinement

Model building and refinement were carried out in cycles, using Coot23 and REFMAC5,24 respectively. In an initial step, ill-defined side chains and loops were removed from the original PHASER output. The models were subsequently rebuilt by adding the missing structural elements in a stepwise fashion as the quality of the electron density map improved, including (in this order) the metal ions, the sugar ligands, water molecules and additional buffer components (e.g. acetate, ethylene glycol). The metal ions were modeled as calcium based on previous structural data,12 the presence of 0.2 M Ca2+ in the crystallization solution and a compatible ligand coordination.25, 26 The E-64 inhibitor was modeled at the end of the refinement process, when the difference electron density at the active site allowed unambiguous tracing of the inhibitor. Ligand occupancy was determined by minimizing the residual difference electron density for the inhibitor molecule and taking the B-factors of nearby interacting atoms into account. A final refinement step included anisotropic refinement of the MOA–E-64 model.

PSL1a diffraction data beyond an Rmeas >0.6 and Ī/σ(Ī)<2 were included in the refinement process by assessing the value of the CC1/2 parameter, as suggested by Diederichs and Karplus.27-29 A CC1/2 ≈0.3 was taken as the cutoff limit, based on the suggestion by Evans and Murshudov30 that data below a CC1/2 of 0.2–0.4 contain negligible residual information. The final model was refined against additional diffraction data with a CC1/2 below the 0.3 limit, settling for a final resolution of 1.5 Å (CC1/2 ≈0.3). The final resolution limits were chosen based on a visual inspection of the electron density at different resolution cutoffs.

Validation of the refined models was carried out using the MolProbity server (http://molprobity.biochem.duke.edu),31 and additional parameters were calculated with phenix.validate.32 R.m.s.d. values were calculated using the PDBeFold server.33 Total composite OMIT maps were calculated with COMIT, which is part of the CCP4 software suite for macromolecular crystallography.34 All the figures displaying structural data were generated using PyMOL, version 1.5.0.4 (Schrödinger LLC). R-domain rotation angle measurements were performed within PyMOL with specific scripts from the ‘psico’ module, an Open Software tool developed by Thomas Holder (http://github.com/speleo3/pymol-psico; Max Planck Institute for Developmental Biology, Germany).
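The CC1/2-based choice of a resolution limit described in this section can be illustrated with a minimal sketch. It assumes per-shell statistics are available as pairs of (high-resolution limit, CC1/2); the function name, the shell values and the default threshold of 0.3 are illustrative assumptions and do not reproduce the PSL1a data or the output of any processing program. As stated above, the final limits were also checked by visual inspection of the electron density, which a script of this kind cannot replace.

```python
from typing import List, Optional, Tuple

# Each shell: (high-resolution limit in Å, CC1/2 for that shell).
Shell = Tuple[float, float]


def resolution_cutoff(shells: List[Shell], cc_half_min: float = 0.3) -> Optional[float]:
    """Return the high-resolution limit of the last consecutive shell
    (going from low to high resolution) whose CC1/2 is at or above
    cc_half_min, or None if even the first shell fails."""
    # Sort from low resolution (large d) to high resolution (small d).
    ordered = sorted(shells, key=lambda s: s[0], reverse=True)
    cutoff = None
    for d_min, cc_half in ordered:
        if cc_half < cc_half_min:
            break  # stop at the first shell that drops below the threshold
        cutoff = d_min
    return cutoff


# Hypothetical shell statistics, for illustration only (not the PSL1a data).
example_shells = [
    (3.00, 0.99), (2.20, 0.97), (1.80, 0.88),
    (1.60, 0.61), (1.50, 0.31), (1.45, 0.18),
]
print(resolution_cutoff(example_shells))
```

With these made-up shells, the 1.45 Å shell falls below the threshold, so the sketch stops at the 1.50 Å shell.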
The atomic coordinates have been submitted to the Protein Data Bank (http://www.pdb.org)35 with accession codes 5MU9 and 5MUA.

RESULTS AND DISCUSSION

PSL1a is an active protease

Primed by the significant similarities of PSL1a and MOA (regarding sequence, structure and toxicity),15, 16, 18 we tested the hypothesis that PSL1a is also an active protease. α1-Antitrypsin from human plasma (α1AT) was selected as a model substrate, since it allows discrimination between proteolytic and N-glycanase activity and has been used in previous work.7 Its three N-linked glycan chains allow it to be processed as a substrate by enzymes of the PNGase family;36 at the same time, the presence of a long, flexible loop constitutes an easy target for proteolytic cleavage.37

The incubation with PSL1a led to a shift in the molecular weight of the α1AT SDS-PAGE band of about 7 kDa, comparable to the proteolytic removal of a 6 kDa fragment, as observed for MOA (Figure 1a). The cleavage pattern is consistent with proteolytic activity and rules out N-glycolytic activity. Mass-spectrometric analysis of the digested products confirmed proteolytic cleavage (data not shown). The difference in molecular mass of α1AT digestion products upon incubation with PSL1a or MOA can be explained by different substrate specificities. Similar to MOA,7 PSL1a is enzymatically active between pH 5.5 and 10 (Figure S2a) and requires the presence of calcium in the reaction environment (Figure 1b; note that calcium is not required for protein stability). For both proteins, manganese(II) can act as a functional substitute for calcium (Figure 1b). As for other enzymes of the cysteine protease family, the proteolytic activity of PSL1a can be inhibited by thiol-modifying agents like iodoacetamide and N-ethylmaleimide (Figure S2b) and by specific thiol-reactive compounds, such as the Cys-protease inhibitor E-64 (Figure 1b), whereas pepstatin A (aspartyl proteases) and PMSF (serine proteases) did not show any inhibitory effect (not shown).7

Figure 1. Proteolytic activity of PSL1a. (a) SDS-PAGE analysis of α1-antitrypsin (α1AT) digested with PNGase F, MOA or PSL1a in the presence of calcium. Incubation of α1AT with MOA or PSL1a alone (lanes 2 and 3) yields single-band shifts toward lower molecular weight, as expected for defined proteolytic cleavage. In contrast, sequential digestion of α1AT with PNGase F and either PSL1a or MOA (lanes 5 and 6) shifts the entire band pattern observed in lane 4 towards lower molecular weight (deglycosylated products, see Hirsch et al.36). As previously established for MOA,7 this result is consistent with PSL1a proteolytic rather than N-glycolytic activity. (b) Effect of different divalent cations on the enzymatic activity of PSL1a. Activity was only observed when digesting α1AT in the presence of calcium or manganese(II) (5 mM) (compare lanes 2, 3 and 8). Other divalent cations, tested at the same concentration, could not functionally replace Ca2+ or Mn2+; E-64 inhibits PSL1a activity (lane 11). Metal ion-mediated protein degradation can be held accountable for the decrease in intensity of the α1AT band in the presence of Cu2+ or Fe2+ (lanes 6 and 9; see Stadtman38).

Calcium-bound PSL1a and MOA form a covalent complex with E-64

The crystal structures of MOA and PSL1a in complex with calcium and the irreversible protease inhibitor E-64 were solved to a resolution of 1.3 Å and 1.5 Å, respectively. Data collection and refinement statistics are summarized in Table 1. In the MOA–E-64 structure, the asymmetric unit contains a single protomer, whereas PSL1a crystals contain the biological dimer (Figure 2a). The biologically relevant homodimer of MOA is reconstituted through crystallographic symmetry. A comparison of MOA bound to E-64 with the coordinates of the calcium-bound protein in complex with Z-VAD-fmk (PDB ID: 5D61)13 or unliganded (PDB ID: 3EF2)12 shows very little variation (r.m.s.d. = 0.1 Å). The two protomers in PSL1a are essentially identical (r.m.s.d. = 0.3 Å).

A galactose molecule, one of the buffer components throughout purification, was found to bind to the ricin B chain-like lectin module of both PSL1a subunits (Figure 2a). The monosaccharide matches the orientation of the galactose moiety of the human-type influenza receptor epitope Neu5Acα2,6Galβ1,4GlcNAc, characterized previously in complex with PSL1a (PDB ID: 3PHZ).18 As already noted in the case of MOA,12 each PSL1a protomer binds two calcium ions in a binuclear cluster, with each of the ions exhibiting distinct ligand coordination (sites A, pentagonal bipyramidal and B, octahedral, respectively). In the biological dimer, the two binuclear calcium clusters are symmetrically positioned at the dimerization interface (Figure 2a).
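The r.m.s.d. values quoted in this section were obtained with the PDBeFold server, which performs its own structural alignment. As a rough illustration of the quantity itself, the following minimal sketch computes the root-mean-square deviation between two lists of matched atomic positions. It assumes the two structures are already superposed and that atoms are paired one-to-one; the function name and the coordinates are hypothetical and serve only as an example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation (same unit as the input, e.g. Å) between
    two equally sized lists of matched, already superposed positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally sized, non-empty coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates (Å): every atom is displaced by 0.3 Å.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model_b = [(0.3, 0.0, 0.0), (1.0, 0.3, 0.0), (0.0, 1.0, 0.3)]
print(round(rmsd(model_a, model_b), 3))
```

Because every atom in the example is displaced by the same distance, the r.m.s.d. equals that displacement; in real structures it is dominated by the largest deviations.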


Figure 2. E-64 inhibitor bound to PSL1a and MOA. (a) Structure of PSL1a homodimer (light/dark blue) in complex with two E-64 inhibitor molecules (green sticks), four calcium ions (magenta spheres) and two galactose molecules (yellow sticks). (b-d) Structures and composite OMIT maps of E-64 (contoured at 1σ) bound to PSL1a (b,c) or MOA (d). In one of the two PSL1a protomers (c), the arginine-like (agmatine) tail of E-64 adopts only a single conformation, stabilized by crystal contacts. (e) Chemical structure of the E-64 inhibitor, with the atom numbering suggested by Leung-Toung et al.39 (f) Reaction mechanism of E-64 with the nucleophilic cysteine of PLCPs, as proposed by Matsumoto et al.40 (g) Scheme of the PLCP catalytic cleft, first shown by Cordara et al.13 and adapted from the representation proposed by Brömme.41 Subsites S3 and S2’ are drawn as dotted lines to stress their nature of ‘binding areas’, whereas subsites S4 and S3’ are represented as shallow grooves to stress their very low conservation among PLCP members. The S2 subsite is shown as a deep pocket to highlight its prominent role in determining the substrate preference.

E-64 exhibits the same essential enzyme-substrate interactions in PSL1a and MOA

According to the literature, the catalytic cysteine of papain-like cysteine proteases (PLCPs) performs a nucleophilic attack on the oxirane ring of E-64 (Figure 2e,f), leading to the formation of a covalent adduct and the irreversible inhibition of the enzyme (Figure 2f).40 In both PSL1a and MOA, E-64 forms a covalent linkage between the C2 carbon of the inhibitor molecule and the Sγ of the catalytic cysteine (PSL1a: Cys208, MOA: Cys215; Figure 2b-d). The inhibitor is positioned at the dimerization interface, interacting with the active site of PSL1a and MOA through polar and van der Waals contacts, as detailed in Figure 3a,b.


Figure 3. Active site interactions. (a) PSL1a. Left panel: Detailed interactions between the E-64 inhibitor (green) and the catalytic cleft of PSL1a (blue). Hydrogen bonding interactions are highlighted as yellow dotted lines, calcium ions represented by magenta spheres. Right panel: Schematic representation of the PSL1a active site. (b) MOA. Left panel: E-64 interactions with the catalytic cleft of MOA (orange/wheat). Right panel: Schematic representation of the MOA active site. (c) Comparison of PSL1a, MOA and papain. Left panel: S2 binding pocket (superimposition of PSL1a, blue, and MOA, orange/wheat). Middle panel: active site (superimposition includes papain (grey/black); PDB ID: 1PPN).42 Right panel: Schematic view of the catalytic cleft of papain (adapted from Turk et al.43). The is marked.

The three most important interactions between enzyme and inhibitor involve the carboxyl group, the C4=O4 carbonyl group and the leucyl moiety of E-64. The carboxylate “head” points to a cavity on the active site of PSL1a lined by the NH group of a tryptophan side chain (PSL1a: Trp201, MOA: Trp208) and the backbone NH group of the catalytic Cys. This feature, known in proteases as the oxyanion hole,44, 45 plays a fundamental role in stabilizing a negatively charged intermediate during the proteolytic reaction.44 The C4=O4 group forms a strong interaction with the calcium ion occupying the octahedral cavity. The carbonyl oxygen completes the calcium ion coordination on the axial plane and keeps the E-64 molecule tightly bound to the catalytic cleft. The leucyl side chain of E-64 points inward, toward the surface of the active cleft, fitting into a shallow cavity lined by the side chains of two residues, each contributed by a different protomer (PSL1a: Ser175# and Phe240, MOA: Leu182# and Leu247; Figure 3). This structural feature, found in all PLCPs and referred to as the S2 binding pocket46 (Figure 2g), is a strong determinant of the substrate preference of proteases.47

The conformation of the arginine-like (agmatine) tail of E-64 differs between MOA and PSL1a, and between the two PSL1a protomers (Figure 2b-d). For the inhibitor bound to chain A of the PSL1a dimer, the residual electron density at the active site supports the assignment of the agmatine tail to three different conformations (α, β, γ), modeled with partial occupancy (0.4, 0.35, 0.25, respectively; Figure 2b), whereas it exhibits only a single conformation in PSL1a chain B, facilitated by a crystal contact (Figure 2c; modeled at full occupancy). In MOA, the agmatine tail of the inhibitor is poorly defined (occupancies α/β of 0.55/0.45, respectively; Figure 2d).

Metal-mediated activation mechanism of MOA and PSL1a

The high-resolution crystal structures of MOA and PSL1a in complexes with calcium and the irreversible E-64 inhibitor enable a direct comparison between the active sites of the two enzymes. In MOA, calcium binding triggers a conformational change,12 which is conserved in PSL1a. At the domain level, the sixth strand of the R-domain barrel motif is pushed away and forced into a different conformation (PSL1a: Ala165–Gly177, MOA: Ile173–Gly184; Figure 4a). This sliding motion results in a partial rotation of the R-domain, which is more pronounced in MOA than in PSL1a (5.2°, compared to the 1.6° measured in PSL1a; Figure 4a). The combination of the sliding motion and the R-domain rotation opens the catalytic cleft for substrate binding.
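The R-domain rotation angles reported here were measured with scripts from the psico module in PyMOL. To illustrate what such an angle represents, the following minimal sketch extracts the rotation angle from a 3x3 rotation matrix through the relation trace(R) = 1 + 2·cos(θ). It assumes that a rotation matrix relating the calcium-free and calcium-bound R-domain orientations has already been obtained from a superposition; the function names are hypothetical and this is not the psico implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotation_angle_deg(rot: Matrix) -> float:
    """Rotation angle (degrees) encoded by a proper 3x3 rotation matrix,
    using the identity trace(R) = 1 + 2*cos(theta)."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rotation_about_z(angle_deg: float) -> Matrix:
    """Helper for the demonstration: rotation matrix about the z axis."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


# Round-trip check with the two angles reported in the text (MOA, PSL1a).
for reported in (5.2, 1.6):
    print(round(rotation_angle_deg(rotation_about_z(reported)), 3))
```

The round trip only confirms that the formula recovers a known angle; it says nothing about the rotation axis, which would have to be derived separately from the same matrix.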

Figure 4. Calcium-based activation mechanism. (a) Comparison between the calcium-free (dark colors)6, 18 and the calcium-bound forms (light colors; this work and ref 12) of PSL1a (upper panels) and MOA (lower panels). Calcium binding triggers a conformational change of a stretch of residues of the R-domain antiparallel β-barrel (PSL1a: Ala165–Gly177, MOA: Ile173–Gly184). The change leads to a tilt of the R-domain relative to the L-domain (lighter colors; for R/L domain partitioning, see Figure S1). At the molecular level, the conformational change results in a ≈6 Å backbone shift of the affected residues. The hydroxyl group of a tyrosine residue (PSL1a: Tyr173, MOA: Tyr180) shifts ≈15 Å, freeing the catalytic cleft of the symmetry-related protomer (transparent). (b) The conformational change leads to the proper alignment of the catalytic machinery. The tyrosine shown in (a) acts as a gating residue, moving away from the catalytic cysteine and out of the S2 subsite (solvent-exposed surface shown in either light blue, for PSL1a, or wheat, for MOA). A key aspartate (PSL1a: Asp207; MOA: Asp214) is also realigned. PDB files: PSL1a (PDB ID: 5MUA, Ca2+-bound; PDB ID: 3PHZ,18 Ca2+-free); MOA (PDB ID: 2IHO,6 Ca2+-free; PDB ID: 3EF2,12 Ca2+-bound).

At the molecular level, calcium binding initiates a series of smaller conformational changes (Figure 4), which can be broken down into four molecular events: i) the catalytic residues realign to match the geometric arrangement observed in other PLCPs; ii) binding of the calcium ions in site B tilts an aspartate residue away from the oxyanion hole (Asp207 in PSL1a, Asp214 in MOA; Figure 4b); iii) the S2 specificity pocket forms through the correct placement of the residues lining its bottom (Ser175# in PSL1a, Leu182# in MOA); and iv) a tyrosine residue (Tyr173# in PSL1a, Tyr180# in MOA) undergoes a large shift (≈15 Å; Figure 4a). The conformational shift of this ‘gating tyrosine’, which would otherwise block the entrance to the S2 pocket, removes a constraint on the conformational freedom of the catalytic cysteine (Figure 4b).

Unique activation mechanism among PLCPs

There are only two other known classes of Ca2+-dependent PLCPs: calpains and LapG. Calpains are a group of multidomain eukaryotic proteases playing an important role in physiology and disease;48-51 LapG is a bacterial protease involved in c-di-GMP signal transduction.52 The core event in calpain activation is a calcium-induced conformational change in its papain-like domain.49, 53
Calcium binds to two nonEF hand sites, leading to a 25° rotation 55 56 between the L and Rdomain and the rearrangement of two gating loops.54 This in turn realigns 57 58 59 60 ACS Paragon Plus Environment 19 Page 21 of 47 Biochemistry

1 2 3 the catalytic machinery, positioning the catalytic His in the optimal orientation to activate the 4 5 6 nucleophilic Cys of the catalytic thiolateimidazolium pair, and removes a tryptophan residue 7 8 wedging the catalytic cleft.55 While the calcium binding sites of calpains do not superimpose 9 10 56 11 with those in MOA, the single calcium of LapG matches the calcium ion A in 12 13 MOA/PSL1a. Calcium binding in LapG does not significantly affect the catalytic triad, but rather 14 15 leads to a structural change of the lower portion of the catalytic cleft for substrate interaction. 57 16 17 18 The calciumdependent activation mechanism of PSL1a and MOA relies on both principles, 19 20 including an Rdomain rotation (calpains) and the formation/opening of the catalytic cleft 21 22 (calpains, LapG). Unique among PLCPs, is that calcium binding to MOA and PSL1a triggers the 23 24 25 conformational change of key residues. Upon calcium binding, an aspartate residue tilts to allow 26 27 the substrate access to the oxyanion hole, while a tyrosine, provided by the second protomer, 28 29 behaves as a gating element on the S2 pocket and catalytic cysteine (Figure 4b). While partially 30 31 32 similar to the removal of the active siteoccluding Trp298 in calpains, the gating tyrosine 33 34 observed in MOA and PSL1a exerts a direct inhibitory effect on the catalytic cysteine. Most 35 36 37 PLCPs are synthesized as proenzymes and subsequently cleaved, thus removing the propeptide 38 58 39 and freeing the catalytic cleft. Propeptidebased inhibition relies on both keeping the 40 41 nucleophilic cysteine in check by engaging it through an incorrectly oriented peptide bond, and 42 43 44 blocking access to the catalytic center. 
A calcium-induced backbone shift resulting in tyrosine gating, as observed in MOA and PSL1a, could represent a novel alternative to the propeptide-based activation mechanism common in other PLCPs and the mobile loop-based mechanism of calpains.⁵⁴ The role of tyrosine as a gating residue has been observed in other biological systems, e.g. aquaporin,⁵⁹ RNA polymerase,⁶⁰ CLC chloride channels,⁶¹ and even, presumably as a “key in

lock”, in proteasomal activation;⁶²,⁶³ however, tyrosine gating has, to our knowledge, not previously been observed in small proteases.

Active site of MOA and PSL1a compared to other PLCPs

Enzymes belonging to the PLCP superfamily share a common active site architecture (Figure 2g), originally defined by Schechter and Berger in 1967.⁶⁴ When bound to the active site of a cysteine protease, the E-64 molecule can be likened to a dipeptide with a carboxyl-carrying tail attached to its N-terminus, a Leu residue as the P2 moiety and a pseudo-arginine as the P3 moiety (Figure 2e). Compared to a standard PLCP substrate, E-64 binds to sites S1’ and S1–S3 of the PLCP active cleft in a reversed backbone orientation (compare Figure 2e,g).⁴⁰,⁶⁵

When bound to MOA or PSL1a, E-64 retains the inverted backbone orientation observed in other PLCPs.⁴⁰ The PSL1a– and MOA–E-64 complexes confirm Cys208 and Cys215 as the nucleophilic residues of PSL1a and MOA, respectively, in line with published cytotoxicity experiments.¹⁶ The Cys(Sγ)–E-64(C2) distance observed in different PDB-deposited PLCP–E-64 complexes ranges between 1.8 Å and 2.1 Å, in good agreement with the value of 1.9–2.0 Å determined for MOA and PSL1a (Table 3; see Supplementary Table 1 for an extended comparison with all the known PLCP–E-64 complexes).

The oxyanion hole of most PLCPs is lined by the backbone NH group of the catalytic cysteine and the side chain amide of a glutamine residue (Gln19 in papain; Figure 3c). While the NH group of cysteine shoulders most of the stabilizing effect, a substitution of the glutamine residue leads to variations ranging from a change of the catalytic efficiency⁴⁴,⁶⁶ to a change in reactivity (e.g. papain to a nitrile hydratase).
⁶⁷ In PSL1a and MOA, the glutamine residue is replaced by

tryptophan. This could explain the low measured in vitro catalytic efficiency of both enzymes, or potentially indicate the presence of an alternative catalytic activity coexisting with the proteolytic one.

At the S1’ subsite of papain and other PLCPs, E-64 directly interacts with the catalytic histidine and coordinates a tryptophan residue through a water molecule (not shown).⁶⁸⁻⁷⁴ Both residues are well conserved throughout the PLCP family, acting as the general acid-base catalyst (His159, papain numbering) and a docking point for the substrate backbone (Trp177; Figure 3c). In MOA and PSL1a, the E-64–His interaction is conserved, whereas the water-mediated interaction is to a glutamine and not a tryptophan residue (Figure 3a,b). The role of glutamine in substrate coordination is further reinforced by previous structural data of MOA in complex with Z-VAD-fmk.¹³

PLCPs coordinate their substrates through backbone-backbone interactions mediated by L- and R-domain residues. In particular, the L-domain binds the substrate through the backbone carbonyl and NH groups of a highly conserved glycine (Gly66 in papain) (Figure 3c). In MOA and PSL1a, the two glycine-mediated interactions are replaced by a single contact to the site B metal ion with octahedral configuration (Figure 3a,b).

An emerging family of chimerolectins

In order to investigate if other proteins contain the newly identified structural features, we performed a BLAST search using the sequence of the proteolytic domain of MOA. We identified several putative protein orthologs from newly sequenced fungal genomes, e.g. Moniliophthora perniciosa, Serpula lacrymans and Singulisphaera acidiphila, as well as bacterial genomes (e.g. Xenorhabdus khoisanae and Dehalococcoides mccartyi; Figure S3).

A Promals3D⁷⁵ structure- and sequence-based alignment of MOA against known and potential homologs shows the conservation of several key elements (Table 2). Residues involved in metal binding through their side chain are generally well conserved, whereas residues mediating metal interactions via their backbone carbonyl are more prone to variation. The ‘gating tyrosine’ of MOA (Tyr180) seems to be conserved in some members (Table 2), while in others it is replaced by an amino acid of approximately similar size (e.g. Phe, Arg, Glu, His). In contrast, the catalytic Cys-His-Glu triad shared by MOA and PSL1a is not universally conserved. The cysteine is replaced by a serine residue in PSL1b and AGL1, by an isoleucine or an aspartate in the putative members from Moniliophthora perniciosa, and by a glutamine residue in all the putative Serpula lacrymans members. The histidine residue is subject to variation as well, whereas the acidic residue (Glu or Asp) of the catalytic triad is rather well conserved.

MOA and PSL1a can hence be considered as founding members of an emerging family of fungal chimerolectins carrying a ricin B-chain-like fold and an unusual dimerization domain. Conservation of residues involved in metal binding and gating hints at a possible shared activation mechanism. In contrast, the catalytic triad is subject to more variation. Some PLCPs tolerate the substitution of cysteine by serine;⁷⁶ however, in MOA this substitution results in the loss of proteolytic activity (data not shown). The loss of catalytic activity and gain of a new function by mutation of the catalytic cysteine is not unprecedented for PLCPs.⁷⁷ Examples include P34 from Glycine max and related proteins, such as SPE31 from Pachyrhizus erosus.
These plant proteins have either gained or retained a fundamental role in signaling despite a mutation of their catalytic cysteine to a glycine.⁷⁸,⁷⁹ Overall, conserved and diverging structural features suggest that the conformational changes associated with metal binding, including tyrosine gating,

could play an important functional role in this group of proteins, even in the absence of enzymatic activity or in the presence of a different kind of activity.

Conclusions and perspectives

By joining a sugar-binding module with a differently purposed domain, chimerolectins can deliver a specific function at a precise cellular location. The chimerolectins MOA and PSL1a carry a papain-like proteolytic domain. The domains provide for an unusual metal-dependent activation mechanism. Unlike any other enzyme with variations of the papain fold, the metal ions directly interact with the substrate. The likely consequence is to add multiple layers of regulation to the biological activity of the two chimerolectins. As the proteolytic activities of MOA and PSL1a are the main source of their cytotoxicity,¹⁰,¹¹,¹⁶,⁸⁰ the role and nature of the available metal ions could determine the behavior in the original or the target host.⁸¹

Prospective MOA and PSL1a orthologs show a high degree of variability among putative components of the catalytic machinery, whereas residues responsible for metal binding are better conserved. This suggests that the MOA/PSL1a papain-like domain may serve as a flexible scaffold, hosting a range of different, metal-regulated enzymatic activities. The functional and structural characterization of more known and potential members of the MOA/PSL1a family will lead to a better understanding of its papain-like domain, and the different enzymatic activities associated with it.

TABLES

Table 1. Data collection and refinement statistics

                                  PSL1a–E-64               MOA–E-64
A. Data collection
  Beamline                        ESRF ID23-2              ESRF ID23-2
  Wavelength (Å)                  0.8726                   0.8726
  Space group                     P2₁2₁2₁                  P6₃22
  Cell parameters: a, b, c (Å)    75.7, 94.4, 102.8        120.9, 120.9, 99.9
  Resolution (Å)ᵃ                 47.2–1.5 (1.57–1.54)     46.4–1.3 (1.32–1.30)
  Rmerge (%)ᵃ,ᵇ                   17.8 (>100.0)            7.0 (56.4)
  Rmeasᵃ,ᶜ                        20.3 (>100.0)            7.8 (68.7)
  Rp.i.m.ᵃ,ᵈ                      9.8 (>100)               3.3 (38.2)
  CC1/2ᵃ,ᵉ                        99.1 (33.3)              99.9 (61.6)
  Mean I/σ(I)ᵃ                    6.2 (0.6)                14.0 (1.6)
  Completeness (%)ᵃ               99.0 (93.8)              94.2 (61.2)
  Multiplicityᵃ                   4.1 (3.8)                5.2 (2.5)
  Unique reflectionsᵃ             107830 (4995)            99395 (3104)

B. Refinement
  Resolution (Å)ᵃ                 47.2–1.5                 46.4–1.3
  Rwork/Rfree (%)ᶠ                22.1 / 24.8              16.0 / 19.2
  Macromolecules / a.s.u.         2                        1
  No. atoms
    Protein                       4372                     2386
    Water                         451                      331
    Ligands                       102                      147
  B-factor (Å²)
    Protein                       14.9                     11.4
    Water                         23.3                     24.2
    Ligands                       19.2                     15.1
  r.m.s.d. from ideal values
    Bond lengths (Å)              0.02                     0.02
    Bond angles (deg.)            2.0                      2.3
  Ramachandran plot
    Core region (%)               97.3                     97.4
    Outliers (%)                  0.0                      0.0
  PDB ID                          5MUA                     5MU9

ᵃ Values in parentheses refer to the highest resolution shell.

ᵇ $R_\mathrm{merge} = \sum_h \sum_j |I_{hj} - \langle I_h \rangle| \,/\, \sum_h \sum_j I_{hj}$, where $\langle I_h \rangle$ is the mean intensity of the symmetry-related observations $I_{hj}$ of reflection $h$.
ᶜ $R_\mathrm{meas} = \sum_h [N_h/(N_h - 1)]^{1/2} \sum_j |I_{hj} - \langle I_h \rangle| \,/\, \sum_h \sum_j I_{hj}$, where $N_h$ is the redundancy of reflection $h$.⁸²
ᵈ $R_\mathrm{p.i.m.} = \sum_h [1/(N_h - 1)]^{1/2} \sum_j |I_{hj} - \langle I_h \rangle| \,/\, \sum_h \sum_j I_{hj}$.⁸³
ᵉ The high-resolution cutoff was chosen despite the low CC1/2; visual inspection of the electron density map was used to ensure an acceptable signal-to-noise ratio.
ᶠ Rfree was calculated from 5% of randomly selected data for each data set.
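The merging statistics defined in footnotes b to d can be illustrated with a short Python sketch. This is not the procedure used to obtain the values in Table 1, which come from the data-processing software; the function name `merging_r_factors`, the input layout and the decision to skip reflections measured only once are assumptions made purely for illustration, and conventions for such reflections differ between programs.

```python
from collections import defaultdict
from math import sqrt


def merging_r_factors(observations):
    """Return (R_merge, R_meas, R_pim) as fractions.

    `observations` is an iterable of (hkl, intensity) pairs, where `hkl`
    is any hashable key identifying a unique reflection h and each pair
    is one symmetry-related observation I_hj.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    sum_merge = sum_meas = sum_pim = sum_intensity = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            # Assumption: singly measured reflections are left out entirely,
            # because the N_h/(N_h - 1) factors are undefined for N_h = 1.
            continue
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        sum_merge += abs_dev
        sum_meas += sqrt(n / (n - 1)) * abs_dev
        sum_pim += sqrt(1 / (n - 1)) * abs_dev
        sum_intensity += sum(intensities)

    return (sum_merge / sum_intensity,
            sum_meas / sum_intensity,
            sum_pim / sum_intensity)


if __name__ == "__main__":
    # Hypothetical intensities for two reflections, two observations each.
    demo = [((1, 0, 0), 10.0), ((1, 0, 0), 12.0),
            ((0, 1, 0), 5.0), ((0, 1, 0), 5.0)]
    r_merge, r_meas, r_pim = merging_r_factors(demo)
    print(f"R_merge={r_merge:.4f}  R_meas={r_meas:.4f}  R_pim={r_pim:.4f}")
```

Because the redundancy factor in R_meas is at least 1 and the factor in R_p.i.m. is at most 1, the sketch always yields R_p.i.m. ≤ R_merge ≤ R_meas, the same ordering seen in Table 1.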

Table 2. Conserved catalytic residues in the MOA–PSL1a family

Protein/Organism    PDB/UniProt   MOA  Cat. triad   Oxy. hole  Subst. coord.  Metal binding (bb)  Metal binding (side chain)  Gating  S2 pocket
                                  %ID

Known members
MOA                 2IHO (PDB)         C H E        W          Q A            L Q                 D D D D                     Y       L L
                                       *
MOA residue number                     215 257 274  208        276 256        182 211             183 214 216 217             180     182 247
PSL1a               3PHZ (PDB)    38   C H E        W          Q A            S E                 D D D D                     Y       S F
PSL1b               BAC87876      38   S H E        R          Q R            Y G                 D D D E                     Y       Y F
SCL/AGL1            XP_003030975  32   S I D        P          E K            P P                 D Q E A                     E       P S

Proposed members
M. perniciosa       XP_002398917  43   I ? ?        S          ? ?            A E                 D D D D                     Y       A ?
M. perniciosa       XP_002393886  71   D ? ?        R          ? ?            A P                 D N G E                     Y       A ?
S. acidiphila       ZP_09568250   33   C H E        Y          Q S            L E                 D D D D                     R       L L
S. lacrymans        EGN95167      38   Q Q D        L          Q G            L E                 D D D D                     Y       L L
S. lacrymans        EGO20674      37   Q Q D        L          Q G            M E                 N D D D                     Y       M L
S. lacrymans        EGO20678      38   Q Q D        L          Q G            L E                 D D D D                     Y       L L
D. mccartyi         WP_041331075  27   C H E        Y          Q Y            I D                 D D D N                     I       I R
D. mccartyi         WP_072555782  27   C H E        Y          Q Y            I D                 D D D N                     I       I R
D. mccartyi         OBW61112      29   C H E        Y          Q Y            I D                 D D D N                     I       I R
Peniophora          KZV64829      44   C H E        W          Q A            S E                 D D D D                     F       S F
Peniophora          KZV73639      42   Y S E        W          Q R            S E                 D D E D                     Y       S F
Peniophora          KZV65628      32   E C D        R          S R            P D                 N D D A                     F       P Y
Pleurocapsa         WP_019503841  33   C H E        W          Q P            P E                 D D D D                     H       P W
P. ostreatus        KDQ29914      35   I T R        S          L S            A E                 D D D D                     F       A F
P. crispa           KII87094      41   N H E        W          Q T            F E                 D D Q D                     Y       F F
P. umbellatus       ANC28063      50   C H E        W          Q A            T E                 D D D D                     Y       T L
P. guariconensis    SDC03422      30   C H E        Y          Q A            V E                 D D D D                     T       V F
P. monteilii        WP_070091156  33   C H E        Y          Q A            V E                 D D D D                     T       V F
P. putida           WP_043215011  29   C H E        Y          Q A            V E                 D D D D                     T       V F
P. sp. 25           WP_050705034  35   C H E        W          Q A            A E                 D D D D                     C       A A
P. sp. HMSC08G10    WP_070572439  30   C H E        Y          Q A            V E                 D D D D                     T       V F
S. vermifera        KIM22138      34   C H E        W          Q V            T E                 D D D D                     Y       T F
S. vermifera        KIM25353      34   S H E        S          Q L            V E                 E D D Y                     Y       V F
S. stellatus        KIJ30840      44   C H E        W          Q A            S N                 D D D D                     Y       S F
S. stellatus        KIJ39841      43   C H E        W          Q A            S D                 D D D D                     Y       S F
S. stellatus        KIJ35645      43   C H E        W          Q A            S D                 D D D D                     Y       S F
T. cinnabarina      CDO76517      43   C H E        W          Q G            S E                 D D D D                     Y       S F
T. cinnabarina      CDO76070      44   C H E        W          Q G            S E                 D D D D                     Y       S F
T. calospora        KIO23019      33   Y R D        S          Q P            D Y                 D D D D                     Y       D Y
T. calospora        KIO23020      31   Y K D        V          Q P            D H                 D D D N                     Y       D Y
T. calospora        KIO23021      31   Y K D        V          Q S            D G                 N D D D                     Y       D Y
T. calospora        KIO23033      34   D R D        L          Q P            D H                 S D D S                     H       D Y
T. calospora        KIO25705      35   Y K D        V          Q V            D G                 D D D D                     Y       D Y
T. calospora        KIO17420      31   Y R D        I          Q P            D H                 S D D D                     Y       D Y
T. calospora        KIO17421      34   Y K D        I          Q P            E N                 S D D D                     Y       E F
T. calospora        KIO16956      28   Y K D        V          Q R            T G                 Q D N D                     H       T Y
T. calospora        KIO20974      32   Y R D        K          Q Y            A G                 D D I D                     I       A E
X. khoisanae        WP_047961396  33   C H E        Y          Q A            V Q                 D D D D                     R       V L

Notes:
MOA %ID = % identity to the MOA sequence
cat. triad = catalytic triad
oxy. hole = oxyanion hole
subst. coord. = residues involved in substrate coordination
bb = backbone–metal ion interaction
* = column containing the catalytic nucleophile
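As a reading aid for Table 2, the Python sketch below tallies how many sequences retain the MOA-type Cys-His-Glu triad and how many retain a tyrosine at the gating position. Only a handful of rows are transcribed from the table to keep the example short; the names `ROWS` and `summarize_conservation` are hypothetical and not part of the original analysis.

```python
# Rows transcribed from Table 2:
# (label, catalytic-triad residues, residue at the gating position).
# Only a subset of the table is included; extend the list to cover all rows.
ROWS = [
    ("MOA (2IHO)", "CHE", "Y"),
    ("PSL1a (3PHZ)", "CHE", "Y"),
    ("PSL1b (BAC87876)", "SHE", "Y"),
    ("SCL/AGL1 (XP_003030975)", "SID", "E"),
    ("S. acidiphila (ZP_09568250)", "CHE", "R"),
    ("S. lacrymans (EGN95167)", "QQD", "Y"),
    ("D. mccartyi (WP_041331075)", "CHE", "I"),
    ("P. umbellatus (ANC28063)", "CHE", "Y"),
]


def summarize_conservation(rows, triad="CHE", gate="Y"):
    """Group sequence labels by whether they match the reference triad/gate."""
    with_triad = [label for label, t, _ in rows if t == triad]
    with_gate = [label for label, _, g in rows if g == gate]
    with_both = [label for label in with_triad if label in with_gate]
    return {"triad": with_triad, "gate": with_gate, "both": with_both}


if __name__ == "__main__":
    summary = summarize_conservation(ROWS)
    for key, labels in summary.items():
        print(f"{key}: {len(labels)} of {len(ROWS)} -> {', '.join(labels)}")
```

In this subset, the triad and the gating tyrosine are each retained in five of the eight sequences but coincide in only three, which shows that the two features vary independently across the table.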

Table 3. Conserved E-64–PLCP interactions

Each cell lists Residue(atom) followed by Dist.; values separated by slashes refer to the individual chains.

                        E-64
Feature                 atom  PSL1a (chain A/B)                     MOA                                   Papain⁴⁰,⁷³
                              (PDB ID: 5MUA; 1.5 Å)                 (PDB ID: 5MU9; 1.3 Å)                 (PDB ID: ; 2.4 Å)

Catalytic Cys           C2    Cys208(Sγ) 1.9/2.0                    Cys215(Sγ) 1.9                        Cys25(Sγ) 1.8
Oxyanion hole           O2    Cys208(N) 2.9/3.0                     Cys215(N) 2.8                         Cys25(N) 3.0
                        O2    Trp201(Nε1) 2.7/2.7                   Trp208(Nε1) 2.7                       Gln19(Nε2) 2.9
S1’                     O1    His248(Nδ1) 2.9/2.9                   His257(Nδ1) 3.6                       His159(Nδ1) 2.9
                        O1    HOH(Gln276 Nε1, Trp208 Nε2) 2.7/2.6   HOH(Gln276 Nε2, Trp208 Nε1) 2.6       HOH(Trp177 Nε1) 2.7
Substrate coordination  O4    Ca²⁺ 2.2/2.3                          Ca²⁺ 2.2                              Gly66(N) 2.9
                        N2    Ser175#(O) –/3.0                      –                                     Gly66(O) 3.0
                        N1    Ala247(O) 3.4/3.8                     Ala256(O) 3.5                         Asp158(O) 3.4

                        E-64
Feature                 atom  Mexicain (chain A/B/C/D)⁶⁸            Cathepsin K (chain A/B)⁷²             Staphopain⁸⁴
                              (PDB ID: 2BDZ; 2.10 Å)                (PDB ID: 3CE9; 1.8 Å)                 (PDB ID: 1CV8; 1.75 Å)

Catalytic Cys           C2    Cys25(Sγ) 2.3/2.1/2.3/2.1             Cys25(Sγ) 1.8                         Cys24(Sγ) 1.9
Oxyanion hole           O2    Cys25(N) 3.1/3.0/3.0/3.0              Cys25(N) 3.0                          Cys24(N) 2.9
                        O2    Gln19(Nε2) 2.9/2.7/2.9/2.6            Gln19(Nε2) 2.8                        Gln18(Nε2) 2.9
S1’                     O1    His159(Nδ1) 3.3/3.5/3.3/3.6           His162(Nδ1) 2.7                       His120(Nδ1) 2.7
                        O1    HOH(Trp177 Nε1) 2.6/2.5/–/–           HOH(Trp184 Nε1, Gln19 Nε2) 2.8        HOH(Trp143 Nε1) 2.6
Substrate coordination  O4    Gly66(N) 2.9/2.9/2.9/2.9              Gly66(N) 3.0                          Leu65(N) 3.0
                        N2    Gly66(O) 2.7/2.9/2.9/2.8              Gly66(O) 3.4                          Leu65(O) 3.0
                        N1    Asp158(O) 3.2/2.9/2.9/3.0             Asn161(O) 3.1                         Gly119(O) 3.5
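The distances in Table 3 are plain Euclidean distances between atomic positions. The Python sketch below shows the calculation for one atom pair and checks the result against the 1.8–2.1 Å range quoted in the text for the Cys(Sγ)–E-64(C2) bond. The coordinates are invented for illustration and are not taken from any deposited structure; `atom_distance` and `is_within` are hypothetical helper names.

```python
from math import sqrt


def atom_distance(a, b):
    """Euclidean distance (Å) between two (x, y, z) positions given in Å."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_within(value, low, high):
    """True if `value` lies in the closed interval [low, high]."""
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical coordinates (Å); NOT taken from 5MUA, 5MU9 or any PDB entry.
    cys_sg = (12.0, 8.5, 3.2)
    e64_c2 = (13.1, 9.9, 3.9)

    d = atom_distance(cys_sg, e64_c2)
    print(f"Sγ–C2 distance: {d:.2f} Å")
    print("within the 1.8–2.1 Å range quoted in the text:",
          is_within(d, 1.8, 2.1))
```

With real structures, the two positions would be read from the ATOM/HETATM records of the deposited coordinate file, once per chain for multi-chain entries.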

ASSOCIATED CONTENT

Supporting Information

The following files are available free of charge: overall structural comparison of PSL1a, MOA and papain (Figure S1), role of different agents on PSL1a enzymatic activity (Figure S2), Promals3D mixed structure- and sequence-based alignment of MOA against known and proposed homologs (Figure S3) and E-64–protein atomic distances for different PLCPs in complex with the E-64 inhibitor (Table S1, Microsoft Excel file).

AUTHOR INFORMATION

Corresponding Authors

*Gabriele Cordara. Email: [email protected]. Phone: +47 22 85 53 85. Ute Krengel. Email: [email protected]. Phone: +47 22 85 54 61.

Present Addresses

†Gabriele Cordara: Biocenter Oulu and Faculty of Biochemistry and Molecular Medicine, University of Oulu, Aapistie 7A, 90220 Oulu, Finland. Phone: +358 (0)294 481 179.

†Dipankar Manna: Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway. Phone: +47 228 51 464.

Author Contributions

The work was conceived by GC and UK; experiments were performed by GC and DM, and analyzed by all authors. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. ‡These authors contributed equally.

Funding Sources

This work was funded by the University of Oslo, the Norwegian Research Council (grant no. 216625), the Norwegian National Graduate School in Structural Biology (BioStruct) and BioStruct-X.

ACKNOWLEDGMENT

We would like to thank Dr. Harry C. Winter and Prof. Irwin J. Goldstein (University of Michigan, Ann Arbor, U.S.A.) for a very fruitful long-term collaboration on this project and Markus Künzler (ETH, Zürich, Switzerland) for providing us with the expression plasmid carrying the full-length PSL1a gene. We further acknowledge the European Synchrotron Radiation Facility staff for assistance and support in using the beamlines ID29 and ID23-2. This research project was carried out within the frame of the GlycoNor consortium.

ABBREVIATIONS

α1-AT: α1-antitrypsin from human plasma; MalNEt: N-ethylmaleimide; MOA: Marasmius oreades agglutinin; MR: molecular replacement; PLCPs: papain-like cysteine proteases; PNGase: peptide:N-glycanase; PSL1a: Polyporus squamosus lectin 1a; PSL1b: Polyporus squamosus lectin 1b; SCL: Schizophyllum commune lectin; r.m.s.d.: root mean square deviation/difference.

REFERENCES

(1) Peumans, W. J., and Van Damme, E. J. M. (1995) Lectins as plant defense proteins, Plant Physiol. 109, 347–352.

(2) Hassan, M. A. A., Rouf, R., Tiralongo, E., May, T. W., and Tiralongo, J. (2015) Mushroom lectins: specificity, structure and bioactivity relevant to human disease, Int. J. Mol. Sci. 16, 7802–7838.

(3) Khan, F., and Khan, M. I. (2011) Fungal lectins: current molecular and biochemical perspectives, Int. J. Biol. Chem. 5, 1–20.

(4) Xu, X., Yan, H., Chen, J., and Zhang, X. (2011) Bioactive proteins from mushrooms, Biotechnol. Adv. 29, 667–674.

(5) Winter, H. C., Mostafapour, K., and Goldstein, I. J. (2002) The mushroom Marasmius oreades lectin is a blood group type B agglutinin that recognizes the Galα1,3Gal and Galα1,3Galβ1,4GlcNAc porcine xenotransplantation epitopes with high affinity, J. Biol. Chem. 277, 14996–15001.

(6) Grahn, E., Askarieh, G., Holmner, Å., Tateno, H., Winter, H. C., Goldstein, I. J., and Krengel, U. (2007) Crystal structure of the Marasmius oreades mushroom lectin in complex with a xenotransplantation epitope, J. Mol. Biol. 369, 710–721.

(7) Cordara, G., Egge-Jacobsen, W., Johansen, H. T., Winter, H. C., Goldstein, I. J., Sandvig, K., and Krengel, U. (2011) Marasmius oreades agglutinin (MOA) is a chimerolectin with proteolytic activity, Biochem. Biophys. Res. Commun. 408, 405–410.

(8) Bleuler-Martínez, S., Butschi, A., Garbani, M., Wälti, M. A., Wohlschlager, T., Potthoff, E., Sabotič, J., Pohleven, J., Lüthy, P., Hengartner, M. O., Aebi, M., and Künzler, M. (2011) A lectin-mediated resistance of higher fungi against predators and parasites, Mol. Ecol. 20, 3056–3070.

(9) Bleuler-Martinez, S., Stutz, K., Sieber, R., Collot, M., Mallet, J.-M., Hengartner, M., Schubert, M., Varrot, A., and Künzler, M. (2016) Dimerization of the fungal defense lectin CCL2 is essential for its toxicity against nematodes, Glycobiology 27, 486–500.

(10) Wohlschlager, T., Butschi, A., Zurfluh, K., Vonesch, S. C., auf dem Keller, U., Gehrig, P., Bleuler-Martinez, S., Hengartner, M. O., Aebi, M., and Künzler, M. (2011) Nematotoxicity of Marasmius oreades agglutinin (MOA) depends on glycolipid binding and cysteine protease activity, J. Biol. Chem. 286, 30337–30343.

(11) Cordara, G., Winter, H. C., Goldstein, I. J., Krengel, U., and Sandvig, K. (2014) The fungal chimerolectin MOA inhibits protein and DNA synthesis in NIH/3T3 cells and may induce BAX-mediated apoptosis, Biochem. Biophys. Res. Commun. 447, 586–589.

(12) Grahn, E. M., Winter, H. C., Tateno, H., Goldstein, I. J., and Krengel, U. (2009) Structural characterization of a lectin from the mushroom Marasmius oreades in complex with the blood group B trisaccharide and calcium, J. Mol. Biol. 390, 457–466.

(13) Cordara, G., van Eerde, A., Grahn, E. M., Winter, H. C., Goldstein, I. J., and Krengel, U. (2016) An unusual member of the papain superfamily: mapping the catalytic cleft of the Marasmius oreades agglutinin (MOA) with a caspase inhibitor, PLoS One 11, e0149407.

(14) Kruger, R. P., Winter, H. C., Simonson-Leff, N., Stuckey, J. A., Goldstein, I. J., and Dixon, J. E. (2002) Cloning, expression, and characterization of the Galα1,3Gal high affinity lectin from the mushroom Marasmius oreades, J. Biol. Chem. 277, 15002–15005.

(15) Tateno, H., Winter, H. C., and Goldstein, I. J. (2004) Cloning, expression in Escherichia coli and characterization of the recombinant Neu5Acα2,6Galβ1,4GlcNAc-specific high-affinity lectin and its mutants from the mushroom Polyporus squamosus, Biochem. J. 382, 667–675.

(16) Manna, D., Pust, S., Torgersen, M. L., Cordara, G., Künzler, M., Krengel, U., and Sandvig, K. (2017) Polyporus squamosus lectin 1a (PSL1a) exhibits cytotoxicity in mammalian cells by disruption of focal adhesions, inhibition of protein synthesis and induction of apoptosis, PLoS One 12, e0170716.

(17) Mo, H., Winter, H. C., and Goldstein, I. J. (2000) Purification and characterization of a Neu5Acα2-6Galβ1-4Glc/GlcNAc-specific lectin from the fruiting body of the polypore mushroom Polyporus squamosus, J. Biol. Chem. 275, 10623–10629.

(18) Kadirvelraj, R., Grant, O. C., Goldstein, I. J., Winter, H. C., Tateno, H., Fadda, E., and Woods, R. J. (2011) Structure and binding analysis of Polyporus squamosus lectin in complex with the Neu5Acα2-6Galβ1-4GlcNAc human-type influenza receptor, Glycobiology 21, 973–984.

(19) Heinemann, U., Pal, G. P., Hilgenfeld, R., and Saenger, W. (1982) Crystal and molecular structure of the sulfhydryl protease calotropin DI at 3.2 Å resolution, J. Mol. Biol. 161, 591–606.

(20) Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., Tinevez, J. Y., White, D. J., Hartenstein, V.,

Eliceiri, K., Tomancak, P., and Cardona, A. (2012) Fiji: an open-source platform for biological image analysis, Nat. Methods 9, 676–682.

(21) Kabsch, W. (2010) XDS, Acta Crystallogr. D Biol. Crystallogr. 66, 125–132.

(22) McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C., and Read, R. J. (2007) Phaser crystallographic software, J. Appl. Crystallogr. 40, 658–674.

(23) Emsley, P., Lohkamp, B., Scott, W. G., and Cowtan, K. (2010) Features and development of Coot, Acta Crystallogr. D Biol. Crystallogr. 66, 486–501.

(24) Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F., and Vagin, A. A. (2011) REFMAC5 for the refinement of macromolecular crystal structures, Acta Crystallogr. D Biol. Crystallogr. 67, 355–367.

(25) Katz, A. K., Glusker, J. P., Beebe, S. A., and Bock, C. W. (1996) Calcium ion coordination: a comparison with that of beryllium, magnesium, and zinc, J. Am. Chem. Soc. 118, 5752–5763.

(26) McPhalen, C. A., Strynadka, N. C. J., and James, M. N. G. (1991) Calcium-binding sites in proteins: a structural perspective, Adv. Protein Chem. 42, 77–144.

(27) Karplus, P. A., and Diederichs, K. (2012) Linking crystallographic model and data quality, Science 336, 1030–1033.

(28) Diederichs, K., and Karplus, P. A. (2013) Better models by discarding data?, Acta Crystallogr. D Biol. Crystallogr. 69, 1215–1222.

(29) Karplus, P. A., and Diederichs, K. (2015) Assessing and maximizing data quality in macromolecular crystallography, Curr. Opin. Struct. Biol. 34, 60–68.

(30) Evans, P. R., and Murshudov, G. N. (2013) How good are my data and what is the resolution?, Acta Crystallogr. D Biol. Crystallogr. 69, 1204–1214.

(31) Chen, V. B., Arendall, W. B. I., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S., and Richardson, D. C. (2010) MolProbity: all-atom structure validation for macromolecular crystallography, Acta Crystallogr. D Biol. Crystallogr. 66, 12–21.

(32) Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C., and Zwart, P. H. (2010) PHENIX: a comprehensive Python-based system for macromolecular structure solution, Acta Crystallogr. D Biol. Crystallogr. 66, 213–221.

(33) Krissinel, E., and Henrick, K. (2004) Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions, Acta Crystallogr. D Biol. Crystallogr. 60, 2256–2268.

(34) Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A., and Wilson, K. S. (2011) Overview of the CCP4 suite and current developments, Acta Crystallogr. D Biol. Crystallogr. 67, 235–242.

(35) Berman, H. M. (2008) The Protein Data Bank: a historical perspective, Acta Crystallogr. A Found. Crystallogr. 64, 88–95.

(36) Hirsch, C., Misaghi, S., Blom, D., Pacold, M. E., and Ploegh, H. L. (2004) Yeast N-glycanase distinguishes between native and non-native glycoproteins, EMBO Rep. 5, 201–206.

(37) Kalsheker, N. (1989) Alpha 1-antitrypsin: structure, function and molecular biology of the gene, Biosci. Rep. 9, 129–138.

(38) Stadtman, E. R. (1993) Oxidation of free amino acids and amino acid residues in proteins by radiolysis and by metal-catalyzed reactions, Annu. Rev. Biochem. 62, 797–821.

(39) Leung-Toung, R., Li, W., Tam, T. F., and Karimian, K. (2002) Thiol-dependent enzymes and their inhibitors: a review, Curr. Med. Chem. 9, 979–1002.

(40) Matsumoto, K., Mizoue, K., Kitamura, K., Tse, W.-C., Huber, C. P., and Ishida, T. (1999) Structural basis of inhibition of cysteine proteases by E-64 and its derivatives, Biopolymers 51, 99–107.

(41) Brömme, D. (2001) Papain-like cysteine proteases, Curr. Protoc. Protein Sci. Chapter 21.

(42) Pickersgill, R. W., Harris, G. W., and Garman, E. (1992) Structure of monoclinic papain at 1.60 Å resolution, Acta Crystallogr. B Struct. Sci. 48, 59–67.

(43) Turk, B., Turk, V., and Turk, D. (1997) Structural and functional aspects of papain-like cysteine proteinases and their protein inhibitors, Biol. Chem. 378, 141–150.

(44) Ménard, R., Plouffe, C., Laflamme, P., Vernet, T., Tessier, D. C., Thomas, D. Y., and Storer, A. C. (1995) Modification of the electrostatic environment is tolerated in the oxyanion hole of the cysteine protease papain, Biochemistry 34, 464–471.

(45) Robertus, J. D., Kraut, J., Alden, R. A., and Birktoft, J. J. (1972) Subtilisin; a stereochemical mechanism involving transition-state stabilization, Biochemistry 11, 4293–4303.

(46) Kamphuis, I. G., Drenth, J., and Baker, E. N. (1985) Thiol proteases. Comparative studies based on the high-resolution structures of papain and actinidin, and on amino acid sequence information for cathepsins B and H, and stem bromelain, J. Mol. Biol. 182, 317–329.

(47) Turk, D., Gunčar, G., Podobnik, M., and Turk, B. (1998) Revised definition of substrate binding sites of papain-like cysteine proteases, Biol. Chem. 379, 137–147.

(48) Suzuki, K., Hata, S., Kawabata, Y., and Sorimachi, H. (2004) Structure, activation, and biology of calpain, Diabetes 53 Suppl. 1, S12–S18.

(49) Campbell, R. L., and Davies, P. L. (2012) Structure-function relationships in calpains, Biochem. J. 447, 335–351.

(50) Low, K. E., Partha, S. K., Davies, P. L., and Campbell, R. L. (2014) Allosteric inhibitors of calpains: re-evaluating inhibition by PD150606 and LSEAL, Biochim. Biophys. Acta 1840, 3367–3373.

(51) Ono, Y., Saido, T. C., and Sorimachi, H. (2016) Calpain research for drug discovery: challenges and potential, Nat. Rev. Drug Discov. 15, 854–876.

(52) Newell, P. D., Boyd, C. D., Sondermann, H., and O'Toole, G. A. (2011) A c-di-GMP effector system controls cell adhesion by inside-out signaling and surface protein cleavage, PLoS Biol. 9, e1000587.

(53) Moldoveanu, T., Hosfield, C. M., Lim, D., Elce, J. S., Jia, Z., and Davies, P. L. (2002) A Ca²⁺ switch aligns the active site of calpain, Cell 108, 649–660.

(54) Moldoveanu, T., Campbell, R. L., Cuerrier, D., and Davies, P. L. (2004) Crystal structures of calpain-E64 and -leupeptin inhibitor complexes reveal mobile loops gating the active site, J. Mol. Biol. 343, 1313–1326.

(55) Hosfield, C. M., Elce, J. S., Davies, P. L., and Jia, Z. (1999) Crystal structure of calpain reveals the structural basis for Ca²⁺-dependent protease activity and a novel mode of enzyme activation, EMBO J. 18, 6880–6889.

(56) Partha, S. K., Ravulapalli, R., Allingham, J. S., Campbell, R. L., and Davies, P. L. (2014) Crystal structure of calpain-3 penta-EF-hand (PEF) domain - a homodimerized PEF family member with calcium bound at the fifth EF-hand, FEBS J. 281, 3138–3149.

(57) Chatterjee, D., Boyd, C. D., O'Toole, G. A., and Sondermann, H. (2012) Structural characterization of a conserved, calcium-dependent periplasmic protease from Legionella pneumophila, J. Bacteriol. 194, 4415–4425.

(58) Cygler, M., and Mort, J. S. (1997) Proregion structure of members of the papain superfamily. Mode of inhibition of enzymatic activity, Biochimie 79, 645–652.

(59) Fischer, G., Kosinska-Eriksson, U., Aponte-Santamaría, C., Palmgren, M., Geijer, C., Hedfalk, K., Hohmann, S., de Groot, B. L., Neutze, R., and Lindkvist-Petersson, K. (2009)

1 2 3 Crystal structure of a yeast aquaporin at 1.15 Å reveals a novel gating mechanism, PLoS Biol. 7, 4 5 6 e1000130. 7 8 9 (60) Cheung, A. C. M., and Cramer, P. (2011) Structural basis of RNA polymerase II 10 11 backtracking, arrest and reactivation, Nature 471 , 249253. 12 13 14 15 (61) Basilio, D., Noack, K., Picollo, A., and Accardi, A. (2014) Conformational changes 16 17 required for H +/Cl exchange mediated by a CLC transporter, Nat. Struct. Mol. Biol. 21 , 456463. 18 19 20 (62) Smith, D. M., Chang, S.C., Park, S., Finley, D., Cheng, Y., and Goldberg, A. L. (2007) 21 22 23 Docking of the proteasomal ATPases' carboxyl termini in the 20S proteasome's a ring opens the 24 25 gate for substrate entry, Mol. Cell. 27 , 731744. 26 27 28 (63) Dange, T., Smith, D., Noy, T., Rommel, P. C., Jurzitza, L., Cordero, R. J. B., Legendre, 29 30 31 A., Finley, D., Goldberg, A. L., and Schmidt, M. (2011) Blm10 protein promotes proteasomal 32 33 substrate turnover by an active gating mechanism, J. Biol. Chem. 286 , 4283042839. 34 35 36 (64) Schechter, I., and Berger, A. (1967) On the size of the active site in proteases. I. Papain, 37 38 Biochem. Biophys. Res. Commun. 27 , 157162. 39 40 41 42 (65) Yamamoto, D., Ishida, T., and Inoue, M. (1990) A comparison between the binding 43 44 modes of a substrate and inhibitor to papain as observed in complex crystal structures, Biochem. 45 46 Biophys. Res. Commun. 171 , 711716. 47 48 49 50 (66) Ménard, R., Carrière, J., Laflamme, P., Plouffe, C., Khouri, H. E., Vernet, T., Tessier, D. 51 52 C., Thomas, D. Y., and Storer, A. C. (1991) Contribution of the glutamine 19 side chain to 53 54 transitionstate stabilization in the oxyanion hole of papain, Biochemistry 30 , 89248928. 55 56 57 58 59 60 ACS Paragon Plus Environment 39 Page 41 of 47 Biochemistry

1 2 3 (67) Dufour, E., Storer, A. C., and Ménard, R. (1995) Engineering nitrile hydratase activity 4 5 6 into a cysteine protease by a single mutation, Biochemistry 34 , 1638216388. 7 8 9 (68) Gavira, J. A., GonzálezRamírez, L. A., OliverSalvador, M. C., SorianoGarcía, M., and 10 11 GarcíaRuiz, J. M. (2007) Structure of the mexicainE64 complex and comparison with other 12 13 14 cysteine proteases of the papain family, Acta Crystallogr. D Biol. Crystallogr. 63 , 555563. 15 16 17 (69) Gomes, M. T. R., Teixeira, R. D., Lopes, M. T. P., Nagem, R. A. P., and Salas, C. E. 18 19 (2012) Xray crystal structure of CMS1MS2: a high proteolytic activity cysteine proteinase from 20 21 22 Carica candamarcensis , Amino Acids 43 , 23812391. 23 24 25 (70) GonzálezPáez, G. E., and Wolan, D. W. (2012) Ultrahigh and high resolution structures 26 27 and mutational analysis of monomeric Streptococcus pyogenes SpeB reveal a functional role for 28 29 30 the glycinerich Cterminal loop, J. Biol. Chem. 287 , 2441224426. 31 32 33 (71) Katerelos, N. A., Taylor, M. A. J., Scott, M., Goodenough, P. W., and Pickersgill, R. W. 34 35 (1996) Crystal structure of a caricain D158E mutant in complex with E64, FEBS Lett. 392 , 35 36 37 38 39. 39 40 41 (72) Li, Z., Kienetz, M., Cherney, M. M., James, M. N. G., and Brömme, D. (2008) The crystal 42 43 and molecular structures of a :chondroitin sulfate complex, J. Mol. Biol. 383 , 7891. 44 45 46 (73) Varughese, K. I., Ahmed, F. R., Carey, P. R., Hasnain, S., Huber, C. P., and Storer, A. C. 47 48 49 (1989) Crystal structure of a papainE64 complex, Biochemistry 28 , 13301332. 50 51 52 (74) Varughese, K. I., Su, Y., Cromwell, D., Hasnain, S., and Xuong, N.H. (1992) Crystal 53 54 structure of an actinidinE64 complex, Biochemistry 31 , 51725176. 55 56 57 58 59 60 ACS Paragon Plus Environment 40 Biochemistry Page 42 of 47

1 2 3 (75) Pei, J., Kim, B. H., and Grishin, N. V. (2008) PROMALS3D: a tool for multiple protein 4 5 6 sequence and structure alignments, Nucleic Acids Res. 36 , 22952300. 7 8 9 (76) Hodder, A. N., Drew, D. R., Epa, V. C., Delorenzi, M., Bourgon, R., Miller, S. K., Moritz, 10 11 R. L., Frecklington, D. F., Simpson, R. J., Speed, T. P., Pike, R. N., and Crabb, B. S. (2003) 12 13 14 Enzymic, phylogenetic, and structural characterization of the unusual papainlike protease 15 16 domain of Plasmodium falciparum SERA5, J. Biol. Chem. 278 , 4816948177. 17 18 19 (77) Berti, P. J., and Storer, A. C. (1995) Alignment/phylogeny of the papain superfamily of 20 21 22 cysteine proteases, J. Mol. Biol. 246 , 273283. 23 24 25 (78) Li, Q.G., and Zhang, Y.M. (2013) The origin and functional transition of P34, Heredity 26 27 (Edinb.) 110 , 259266. 28 29 30 31 (79) Zhang, M., Wei, Z., Chang, S., Teng, M., and Gong, W. (2006) Crystal structure of a 32 33 papainfold protein without the catalytic residue: a novel member in the cysteine proteinase 34 35 family, J. Mol. Biol. 358 , 97105. 36 37 38 (80) Juillot, S., Cott, C., Madl, J., Claudinon, J., van der Velden, N. S. J., Künzler, M., 39 40 41 Thuenauer, R., and Römer, W. (2016) Uptake of Marasmius oreades agglutinin disrupts 42 43 integrindependent cell adhesion, Biochim Biophys. Acta 1860 , 392401. 44 45 46 (81) Sueldo, D. J., and van der Hoorn, R. A. L. (2017) Plant life needs cell death, but does 47 48 49 plant cell death need Cys proteases?, FEBS J. (in press) . 50 51 52 (82) Diederichs, K., and Karplus, P. A. (1997) Improved Rfactors for diffraction data analysis 53 54 in macromolecular crystallography, Nat. Struct. Biol. 4, 269275. 55 56 57 58 59 60 ACS Paragon Plus Environment 41 Page 43 of 47 Biochemistry

1 2 3 (83) Evans, P. (2006) Scaling and assessment of data quality, Acta Crystallogr. D Biol. 4 5 6 Crystallogr. 62 , 7282. 7 8 9 (84) Hofmann, B., Schomburg, D., and Hecht, H. J. (1993) Crystal structure of a thiol 10 11 proteinase from Staphylococcus aureus V8 in the E64 inhibitor complex, Acta Crystallogr. A 12 13 14 Biol. Crystallogr. 2, c102. 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment 42 Biochemistry Page 44 of 47

Figure 1. Proteolytic activity of PSL1a. (a) SDS-PAGE analysis of α1-antitrypsin (α1-AT) digested with PNGase F, MOA or PSL1a in the presence of calcium. Incubation of α1-AT with MOA or PSL1a alone (lanes 2 and 3) yields single-band shifts toward lower molecular weight, as expected for defined proteolytic cleavage. In contrast, sequential digestion of α1-AT with PNGase F and either PSL1a or MOA (lanes 5 and 6) shifts the entire band pattern observed in lane 4 towards lower molecular weight (deglycosylated products, see Hirsch et al.).36 As previously established for MOA,7 this result is consistent with PSL1a having proteolytic rather than N-glycolytic activity. (b) Effect of different divalent cations on the enzymatic activity of PSL1a. Activity was only observed when digesting α1-AT in the presence of calcium or manganese(II) (5 mM) (compare lanes 2, 3 and 8). Other divalent cations, tested at the same concentration, could not functionally replace Ca2+ or Mn2+; E-64 inhibits PSL1a activity (lane 11). Metal ion-mediated protein degradation can account for the decrease in intensity of the α1-AT band in the presence of Cu2+ or Fe2+ (lanes 6 and 9; see Stadtman38).

Figure 2. E-64 inhibitor bound to PSL1a and MOA. (a) Structure of PSL1a homodimer (light/dark blue) in complex with two E-64 inhibitor molecules (green sticks), four calcium ions (magenta spheres) and two galactose molecules (yellow sticks). (b–d) Structures and composite OMIT maps of E-64 (contoured at 1σ) bound to PSL1a (b,c) or MOA (d). In one of the two PSL1a protomers (c), the arginine-like (agmatine) tail of E-64 adopts only a single conformation, stabilized by crystal contacts. (e) Chemical structure of the E-64 inhibitor, with the atom numbering suggested by Leung-Toung et al.38 (f) Reaction mechanism of E-64 with the nucleophilic cysteine of PLCPs, as proposed by Matsumoto et al.39 (g) Scheme of the PLCP catalytic cleft, first shown by Cordara et al.13 and adapted from the representation proposed by Brömme.40 Subsites S3 and S2’ are drawn as dotted lines to stress their nature as ‘binding areas’, whereas subsites S4 and S3’ are represented as shallow grooves to stress their very low conservation among PLCP members. The S2 subsite is shown as a deep pocket to highlight its prominent role in determining the substrate preference.


Figure 3. Active site interactions. (a) PSL1a. Left panel: Detailed interactions between the E-64 inhibitor (green) and the catalytic cleft of PSL1a (blue). Hydrogen bonding interactions are highlighted as yellow dotted lines, calcium ions represented by magenta spheres. Right panel: Schematic representation of the PSL1a active site. (b) MOA. Left panel: E-64 interactions with the catalytic cleft of MOA (orange/wheat). Right panel: Schematic representation of the MOA active site. (c) Comparison of PSL1a, MOA and papain. Left panel: S2 binding pocket (superimposition of PSL1a, blue, and MOA, orange/wheat). Middle panel: active site (superimposition includes papain (grey/black); PDB ID: 1PPN).41 Right panel: Schematic view of the catalytic cleft of papain (adapted from Turk et al.).42 The oxyanion hole is marked.

Figure 4. Calcium-based activation mechanism. (a) Comparison between the calcium-free (dark colors)6,18 and the calcium-bound forms (light colors; this work and 12) of PSL1a (upper panels) and MOA (lower panels). Calcium binding triggers a conformational change of a stretch of residues of the R-domain antiparallel β-barrel (PSL1a: Ala165–Gly177, MOA: Ile173–Gly184). The change leads to a tilt of the R-domain relative to the L-domain (lighter colors; for R/L domain partitioning, see Figure S1). At the molecular level, the conformational change results in a ≈6 Å backbone shift of the affected residues. The hydroxyl group of a tyrosine residue (PSL1a: Tyr173, MOA: Tyr180) shifts ≈15 Å, freeing the catalytic cleft of the symmetry-related protomer (transparent). (b) The conformational change leads to the proper alignment of the catalytic machinery. The tyrosine shown in (a) acts as a gating residue, moving away from the catalytic cysteine and out of the S2 subsite (solvent-exposed surface shown in either light blue, for PSL1a, or wheat, for MOA). A key aspartate (PSL1a: Asp207; MOA: Asp214) is also realigned. PDB files: PSL1a (PDB ID: 5MUA, Ca2+-bound; PDB ID: 3PHZ,18 Ca2+-free); MOA (PDB ID: 2IHO,6 Ca2+-free; PDB ID: 3EF2,12 Ca2+-bound).

SUPPLEMENTARY INFORMATION

A family of papain-like fungal chimerolectins with distinct Ca2+-dependent activation mechanism

Gabriele Cordara1,†,‡,*, Dipankar Manna1,†,‡ and Ute Krengel1,*

1Department of Chemistry, University of Oslo, PO Box 1033 Blindern, 0315 Oslo, Norway

Supplementary Figure S1

Figure S1. Overall structural comparison of PSL1a, MOA and papain. (a) Superimposition of the homodimers of MOA (orange, PDB ID: 2IHO)1 and PSL1a (blue, PDB ID: 3PHZ)2. (b) Superimposition of the papain-like domain of PSL1a and MOA with papain (PDB ID: 1PPN)3 aligned according to the ‘standard view’.4,5 The canonical L- (‘left’) and R-domain (‘right’) partition found in all PLCPs is well preserved in the dimerization domains of MOA and PSL1a.

Supplementary Figure S2

Figure S2. Effect of different agents on PSL1a enzymatic activity. (a) The active pH range of PSL1a was probed by digesting α1-AT at a range of pH values (from 3 to 11), using a different buffering agent for each pH point. These include citrate (pH 3), formate (pH 3.5), succinate (pH 4), acetate (pH 4.5), propionate (pH 5), citrate (pH 5.5), MES (pH 6), bistris (pH 6.5), bistris propane (pH 7), HEPES (pH 7.5), Tris (pH 8), bicine (pH 8.5), bistris propane (pH 9), CHES (pH 9.5), piperazine (pH 10) and 1,3-diaminopropane (pH 11). (b) Thiol-modifying agents such as iodoacetamide or N-ethylmaleimide alkylate the catalytic cysteine (Cys208), inhibiting proteolytic activity.
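The buffer series listed for panel (a) can be written down as a simple lookup table, which makes it easy to see which pH points were covered. The following is a minimal Python sketch transcribed from the legend; the names `BUFFER_SERIES`, `buffer_for` and `missing_half_steps` are illustrative assumptions and not part of the original work.

```python
# Buffering agent used at each pH point of the activity screen, transcribed
# from the Figure S2 legend. All names here are illustrative only.
BUFFER_SERIES = {
    3.0: "citrate",
    3.5: "formate",
    4.0: "succinate",
    4.5: "acetate",
    5.0: "propionate",
    5.5: "citrate",
    6.0: "MES",
    6.5: "bistris",
    7.0: "bistris propane",
    7.5: "HEPES",
    8.0: "Tris",
    8.5: "bicine",
    9.0: "bistris propane",
    9.5: "CHES",
    10.0: "piperazine",
    11.0: "1,3-diaminopropane",
}


def buffer_for(ph: float) -> str:
    """Return the buffering agent listed in the legend for a given pH point."""
    try:
        return BUFFER_SERIES[ph]
    except KeyError:
        raise ValueError(f"pH {ph} is not one of the listed screen points") from None


def missing_half_steps() -> list[float]:
    """List 0.5-unit steps between the lowest and highest pH that are not listed."""
    lo, hi = min(BUFFER_SERIES), max(BUFFER_SERIES)
    steps = [lo + 0.5 * i for i in range(int((hi - lo) / 0.5) + 1)]
    return [s for s in steps if s not in BUFFER_SERIES]
```

Calling `missing_half_steps()` on this table shows that the series runs in 0.5-unit steps except for pH 10.5, which the legend does not list.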

Supplementary Figure S3

[Multiple sequence alignment of 42 sequences; the column layout is not recoverable from the extracted text. Entries, in order of appearance, with percentage identity to MOA: MOA; PSL-1a (38%); PSL-1b (38%); SCL-AGL1 (32%); XP_002398917 (43%); XP_002393886 (71%); ZP_09568250 (33%); EGN95167.1 (38%); EGO20674 (37%); EGO20678 (38%); WP_041331075 (27%); WP_072555782 (27%); OBW61112 (29%); KZV64829 (44%); KZV73639 (32%); KZV65628 (42%); WP_019503841 (33%); KDQ29914 (35%); KII87094 (41%); ANC28063 (50%); SDC03422 (30%); WP_070091156 (33%); WP_043215011 (29%); WP_050705034 (35%); WP_070572439 (30%); KIM22138 (34%); KIM25353 (34%); KIJ30840 (44%); KIJ39841 (43%); KIJ35645 (43%); CDO76517 (44%); CDO76070 (43%); KIO23019 (33%); KIO23020 (31%); KIO23021 (31%); KIO23033 (34%); KIO25705 (35%); KIO17420 (31%); KIO17421 (32%); KIO16956 (28%); KIO20974 (32%); WP_047961396 (33%).]

Figure S3. PROMALS3D (6) mixed structure- and sequence-based alignment of MOA against known and proposed homologs. All proposed members were identified by a BLAST search using the C-terminal domain sequence of MOA (residues 157-293). Each entry is marked with its percentage identity to the MOA sequence. The list includes both full and partial sequences (e.g., MPER proteins). Legend: yellow: components of the catalytic triad; cyan: residues involved in metal binding; green: oxyanion hole; magenta: gating tyrosine; blue: residues lining the S2 binding pocket; grey: residues involved in substrate coordination.
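As an illustration of the percentage identity values quoted for each entry, the following minimal Python sketch computes identity between two rows of an alignment. It is not the procedure used to generate the figure: the function name `percent_identity`, the choice of denominator (columns in which both sequences have a residue), and the two short example strings are assumptions made for illustration only. BLAST and other tools may normalize differently.

```python
def percent_identity(ref: str, other: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: gap-containing columns are excluded from the denominator.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ref, other):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # no comparable columns
    return 100.0 * matches / compared


# Hypothetical toy rows, not taken from the alignment above.
toy_ref = "VYKA-EVAKWG"
toy_other = "VFKAGEIA-WG"
identity = percent_identity(toy_ref, toy_other)
print(f"{identity:.1f}% identity")
```

In this toy case, nine columns are compared and seven are identical. Applying the same function to real rows requires the full, gap-preserving alignment file.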


Supplementary References

(1) Grahn, E., Askarieh, G., Holmner, Å., Tateno, H., Winter, H. C., Goldstein, I. J., and Krengel, U. (2007) Crystal structure of the Marasmius oreades mushroom lectin in complex with a xenotransplantation epitope, J. Mol. Biol. 369, 710-721.

(2) Kadirvelraj, R., Grant, O. C., Goldstein, I. J., Winter, H. C., Tateno, H., Fadda, E., and Woods, R. J. (2011) Structure and binding analysis of Polyporus squamosus lectin in complex with the Neu5Acα2-6Galβ1-4GlcNAc human-type influenza receptor, Glycobiology 21, 973-984.

(3) Pickersgill, R. W., Harris, G. W., and Garman, E. (1992) Structure of monoclinic papain at 1.60 Å resolution, Acta Crystallogr. B Struct. Sci. 48, 59-67.

(4) Heinemann, U., Pal, G. P., Hilgenfeld, R., and Saenger, W. (1982) Crystal and molecular structure of the sulfhydryl protease calotropin DI at 3.2 Å resolution, J. Mol. Biol. 161, 591-606.

(5) Kamphuis, I. G., Drenth, J., and Baker, E. N. (1985) Thiol proteases. Comparative studies based on the high-resolution structures of papain and actinidin, and on amino acid sequence information for cathepsins B and H, and stem bromelain, J. Mol. Biol. 182, 317-329.

(6) Pei, J., Kim, B. H., and Grishin, N. V. (2008) PROMALS3D: a tool for multiple protein sequence and structure alignments, Nucleic Acids Res. 36, 2295-2300.
