Recognition of protein-linked glycans as a determinant of peptidase activity

Ilit Noach^a, Elizabeth Ficko-Blean^a,1, Benjamin Pluvinage^a, Christopher Stuart^a, Meredith L. Jenkins^a, Denis Brochu^b, Nakita Buenbrazo^c, Warren Wakarchuk^c, John E. Burke^a, Michel Gilbert^b, and Alisdair B. Boraston^a,2

^aDepartment of Biochemistry and Microbiology, University of Victoria, Victoria, BC V8W 3P6, Canada; ^bHuman Health Therapeutics, National Research Council Canada, Ottawa, ON K1A 0R6, Canada; and ^cDepartment of Chemistry and Biology, Ryerson University, Toronto, ON M5B 2K3, Canada

Edited by Chi-Huey Wong, Academia Sinica, Taipei, Taiwan, and approved December 20, 2016 (received for review September 9, 2016)

The vast majority of proteins are posttranslationally altered, with the addition of covalently linked glycans (glycosylation) being one of the most abundant modifications. However, despite the hydrolysis of protein peptide bonds by peptidases being a process essential to all life on Earth, the fundamental details of how peptidases accommodate posttranslational modifications, including glycosylation, have not been addressed. Through biochemical analyses and X-ray crystallographic structures we show that to hydrolyze their substrates, three structurally related metallopeptidases require the specific recognition of O-linked glycan modifications via carbohydrate-specific subsites immediately adjacent to their peptidase catalytic machinery. The three peptidases showed selectivity for different glycans, revealing protein-specific adaptations to particular glycan modifications, yet always cleaved the peptide bond immediately preceding the glycosylated residue. This insight builds upon the paradigm of how peptidases recognize substrates and provides a molecular understanding of glycoprotein degradation.

O-glycopeptidase | metallopeptidase | glycoprotein | O-glycosylation

Significance

Protein glycosylation is one of the most abundant and important posttranslational modifications where the protein-linked glycans can impart specific physiochemical properties to the glycoprotein and/or the glycans themselves can mediate particular biological functions. The degradation of glycosylated proteins in normal or pathogenic processes, therefore, is an important biological process. This study reveals the molecular basis of how peptidases can use the O-glycans present on glycoproteins as a critical determinant of peptidase activity and, in doing so, provides unique insight into how peptidases may directly use posttranslational modifications present on their substrates to influence recognition and peptide bond cleavage.

Author contributions: I.N., E.F.-B., and A.B.B. designed research; I.N., E.F.-B., B.P., C.S., M.L.J., D.B., N.B., W.W., J.E.B., M.G., and A.B.B. performed research; D.B., N.B., W.W., and M.G. contributed new reagents/analytic tools; I.N., E.F.-B., B.P., M.L.J., J.E.B., and A.B.B. analyzed data; and I.N. and A.B.B. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org (PDB ID codes 5KD2 for BT4244_E575Q_l; 5KD5 for BT4244_m; 5KD8 for BT4244-TnAg; 5KDJ for ZmpB_l; 5KDN for ZmpB_m; 5KDS for ZmpB-BSMfrg; 5KDU for ZmpB-STAg; 5KDV for IMPa-6His; 5KDW for IMPa; and 5KDX for IMPa-TAg).

^1Present address: Sorbonne Universités, UPMC Université Paris 06, UMR 8227, Integrative Biology of Marine Models, Station Biologique de Roscoff, CS 90074, 29688 Roscoff CEDEX, France.

^2To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1615141114/-/DCSupplemental.

Peptidases catalyze the hydrolysis of the peptide bonds in proteins and, by this activity, play fundamental roles in an enormous variety of critical biological processes across all of the taxonomic kingdoms. These enzymes can be classified by their catalytic mechanisms into seven groupings: serine (S), cysteine (C), threonine (T), aspartic (A), glutamic (G), (N), metallo (M), or asparagine lyase (N) catalytic types (1). Each grouping of catalytic types has been broken into a robust classification of >240 families, which are based on amino acid sequence identity, whereas the categorization of >50 clans reflects the structural similarities between the families (https://merops.sanger.ac.uk) (1). This classification contains >0.9 million putative peptidase sequences but only <0.5% of these have been experimentally verified, suggesting that we are far from a complete understanding of how this vast sequence space translates into variable substrate specificities and biological outcomes.

With billions of sequences for known or putative proteins in the protein databases, the variety of possible peptidase substrates is vast. For example, the human genome encodes ∼600 peptidases, yet over 17,000 genes are known to be transcribed and translated into unique protein products that are possible substrates for the encoded peptidases (2, 3). This highlights one of the major challenges in characterizing peptidase specificity and identifying specific substrates (4). Furthermore, the diversity in substrates is substantially increased by posttranslational modifications. Indeed, upwards of 50% of proteins are thought to be posttranslationally modified with glycosylation being one of the most abundant alterations (5). Typically, protein glycosylation constitutes glycans attached to asparagine side chains (N linked) or serine/threonine side chains (O linked). The structures of glycans attached to a given site can be variable, giving glycoforms of a protein. For example, there are eight distinct core O-glycan structures, all covalently α-linked via an N-acetylgalactosamine (GalNAc) moiety to the hydroxyl group of serine or threonine side chains (6). Furthermore, the degree of glycosylation is protein dependent, but the average protein is ∼50% O-linked glycans by mass, and the glycans tend to be very heterogeneous by virtue of differing extensions and branching of the core structures (6). The abundance and frequency of posttranslational modification to proteins, particularly large additions like glycans, suggests a need for peptidases to accommodate more than just the peptide component of the substrate. Indeed, there is evidence that some peptidases are selective for heavily O-glycosylated proteins; however, the molecular basis of how these recognize substrates is entirely unknown.

Peptidases with known selectivity for heavily O-glycosylated proteins include O-sialylglycoprotease from Mannheimia haemolytica (formerly classified into the deleted MEROPS family M22) (7), StcE from Escherichia coli (family M66) (8), viral enhancins (family M60) (9), and some serine protease autotransporters produced by enteric pathogens (SPATEs proteins; family S6) (10). Similarly, BT4244 from Bacteroides thetaiotaomicron (11), IMPa from Pseudomonas aeruginosa (12), and SslE from E. coli (13) also display selectivity for glycosylated proteins. These three latter extracellular peptidases are members of the recently defined Pfam family PF13402 (also annotated as “peptidase M60, enhancin, and enhancin-like” or “M60-like”) (11). This family, which is in part defined by the possession of a conserved HEXXH(8,24)E gluzincin metallopeptidase motif (14), is distributed widely among and and was proposed by Nakjang et al. to

comprise glycoprotein-specific peptidases (11). Although the Pfam classification of these three peptidases indicates an evolutionary relationship between them, BT4244, IMPa, and SslE are classified into the separate MEROPS families M60, M88, and M98, respectively. Through their ability to cleave O-glycoproteins, many of these peptidases have important roles in cell biology (10, 12, 15), have been implicated in bacterial pathogenesis (16–18), or have found use as reagents (7). Despite the functional significance of O-glycoprotein degrading peptidases, however, their mode of action on O-glycoproteins is unidentified. To obtain insight into how peptidases select for glycosylated substrates, we focused our studies on the model examples BT4244, IMPa, and ZmpB (locus tag CPF_1489) from Clostridium perfringens (strain ATCC 13124), the latter of which is a member of the Pfam M60-like family but is not classified into a MEROPS family. Biochemical analysis of their peptidase activity revealed the absolute requirement of a glycosylated substrate, whereas the first structural studies of these enzymes by X-ray crystallography in complex with O-glycosylated amino acids and an O-glycopeptide product of catalysis uncovered specific carbohydrate-binding subsites immediately adjacent to the catalytic machinery, thus providing unprecedented insight into the molecular details of how the posttranslational modification of a peptidase substrate can determine specificity.

www.pnas.org/cgi/doi/10.1073/pnas.1615141114 PNAS | Published online January 17, 2017 | E679–E688

Results

In Vitro Peptidase Activity. Our target proteins possessed complex multimodularity (SI Appendix, Fig. S1); thus, to advance their study, gene fragments encoding the M60-like domains of BT4244 and ZmpB (referred to as BT4244_m and ZmpB_m, respectively) were expressed in E. coli. IMPa was similarly produced but as a full-length protein. Treatment of highly O-glycosylated bovine submaxillary mucin (BSM) with BT4244_m resulted in a clear change in the electrophoretic mobility of BSM, whereas IMPa and ZmpB_m gave nearly complete clearing of the BSM (Fig. 1A). We also used a microtiter plate-based mucinase assay using immobilized biotinylated BSM as a substrate and all three peptidases also displayed activity in this assay (SI Appendix, Fig. S2). These results were consistent with the reported activity of BT4244 and IMPa on glycoprotein substrates (11, 12), while revealing comparable properties for ZmpB_m. In all cases, EDTA treatment of the enzymes abolished activity, supporting their assignment as metallopeptidases, as did mutation of the putative catalytic glutamate in the gluzincin motif (Fig. 1A and SI Appendix, Fig. S2). When bovine fetuin, also a glycosylated protein, was used as a substrate, only BT4244_m displayed no obvious effect on the substrate, suggesting some differential selectivity among the three enzymes (Fig. 1A). When chemically desialylated bovine fetuin (asialofetuin) was used as a substrate, only IMPa retained activity on the

[Fig. 1 image not reproduced. Panel labels: (A) gels for BT4244_m, ZmpB_m, and IMPa with lanes Wt, EDTA, EA_mut, and Substrate, alongside molecular weight (MW) markers; (B) F15, F15-TnAg, F15-TAg, and F15-S2,3TAg (peptide GAEAEAPSAVPDAAG); (C) BDPF15-S2,6TAg (BDP-GAEAEAPSAVPDAAG); (D) MUC-1 and MUC-1(Tn) (peptide PAPGSTAPPAHGVTSAPDTRPAPG). Lanes in B–D are labeled IMPa, ZmpB_m, and BT4244_m.]

Fig. 1. In vitro peptidase activity. (A) SDS/PAGE visualized by periodic acid-Schiff stain. BSM (Upper), Fetuin (Middle), and Asialofetuin (Bottom). Native peptidase is denoted as Wt and peptidase with mutation of the putative catalytic glutamate to alanine as EA_mut. (B–D) TLC analysis of the synthetic peptides and O-glycopeptides treated with or without (−) each of the peptidases. In each panel the sequence and glycan structure present in the peptides is indicated. Yellow squares and circles represent GalNAc and galactose, respectively, and magenta diamonds represent Neu5Ac. The appropriate linkages are depicted. Reactions were separated using a silica gel TLC plate in a solvent comprising butanol:acetic acid:H2O (45:35:30, v:v:v). TLC plates in B and D were developed with ninhydrin. Due to limiting quantities of the BODIPY-labeled glycopeptide, BDPF15-S2,6TAg, a concentration too low to develop with ninhydrin had to be used; however, the BODIPY label was visible to the eye and therefore the TLC in C did not undergo any development procedure.

substrate. Furthermore, the peptidases displayed different abilities to cleave a variety of additional glycoproteins, which was also abolished by the addition of EDTA into the reactions (SI Appendix, Fig. S3). This observation of differential activity on glycoproteins points to further selectivity among the enzymes for different glycoprotein substrates. Most notably, the lack of ZmpB_m activity on asialofetuin versus fetuin suggested a dependence of its activity on the presence of sialic acid, which led us to postulate that glycans are a critical feature in the recognition of substrate by this enzyme and possibly BT4244_m and IMPa.

Peptidase Activity on Defined Synthetic O-Glycopeptides. Given the ability of these enzymes to degrade highly O-glycosylated proteins, we examined their potential O-glycan dependence by assessing activity on defined chemoenzymatically prepared O-glycopeptide substrates. A 15mer synthetic peptide (F15: GAEAEAPSAVPDAAG) containing the S271 O-glycosylation site in fetuin was sequentially glycosylated using specific glycosyltransferases to install an α-linked GalNAc on the serine (also called Tn-antigen; glycopeptide referred to as F15-TnAg), then extend the glycan to the core-1 Galβ1–3GalNAc (T-antigen; F15-TAg), and finally to modify the disaccharide to the α-2,3-Neu5Ac monosialylated core-1 O-glycopeptide (F15-S2,3TAg). As determined by TLC, none of the three enzymes were able to cleave the unglycosylated F15 peptide (Fig. 1B). ZmpB_m had no apparent effect on any glycosylated peptide. In contrast, treatment of the three glycosylated peptides with IMPa resulted in complete conversion of the peptides into two new products, whereas BT4244_m treatment resulted in partial conversion of F15-TnAg into the same products observed from IMPa activity on this O-glycopeptide (Fig. 1B).

Given the inactivity of ZmpB_m on F15-S2,3TAg, yet its apparent dependence on sialic acid for activity on fetuin, we also generated BDPF15-S2,6TAg, which is the F15 peptide bearing Galβ1–3(Neu5Acα2–6)GalNAc glycan and labeled with BODIPY at the N terminus. Due to limiting amounts of this peptide, we were unable to use ninhydrin to visualize all of the peptide fragments generated by the peptidases; however, the BODIPY label was visible to the eye, allowing facile visualization of peptide fragments possessing this label. Both ZmpB_m and IMPa were able to convert BDPF15-S2,6TAg peptide to a new product, indicating activity on this peptide, whereas the peptide treated with BT4244_m remained unchanged (Fig. 1C). Finally, we assessed the activity of these enzymes on the Tn-modified MUC-1 peptide [MUC-1(Tn): PAPGS(GalNAcα1-)TAPPAHGV(GalNAcα1-)TSAPDTRPAPG]. None of the enzymes had any activity on the nonglycosylated peptide (Fig. 1D). MUC-1(Tn) displayed no mobility on the TLC plates under the solvent conditions used, and this was unchanged after treatment of the peptide with ZmpB_m (Fig. 1D). Treatment of MUC-1(Tn) with IMPa, however, generated a single new and resolvable product, indicating activity on this peptide. The activity of BT4244_m resulted in two new products that had mobilities distinct from the product produced by IMPa (Fig. 1D).

To interrogate the point of cleavage on these structurally defined and homogeneous peptides, we used liquid chromatography (LC)-MS to examine the specific cleavage of F15-TnAg, BDPF15-S2,6TAg, and MUC-1(Tn) (Fig. 2A). Consistent with the TLC results, ZmpB_m showed no activity on F15-TnAg or MUC-1(Tn), whereas BT4244_m had no activity on BDPF15-S2,6TAg (Fig. 2B). In contrast, treatment of F15-TnAg with IMPa or BT4244_m revealed the same set of two product peaks for both enzymes. Their masses were within 0.01 Da of the masses predicted for the peptide fragments GAEAEAP7 and GalNAcα1-S8AVPDAAG (Fig. 2C), clearly revealing the identity of these peptides and the cleavage point at the peptide bond immediately preceding the glycosylated residue S8. Using LC-tandem MS (LC-MS/MS) we confirmed the identity of these cleavage products (SI Appendix, Fig. S4). Similarly, treatment of BDPF15-S2,6TAg with IMPa and ZmpB_m resulted in the same set of two product peaks for each enzyme, which again gave accurate masses within 0.02 Da of those predicted for cleavage of the peptide bond immediately preceding the glycosylated amino acid (Fig. 2C). Finally, the hydrolysis of MUC-1(Tn) by BT4244_m resulted in three product peaks, the masses of which again indicate cleavage of the peptide bonds immediately preceding the glycosylated threonine residues (before residues T6 and T19). In contrast, the hydrolysis of MUC-1(Tn) by IMPa yielded fragments indicating cleavage of the peptide bond preceding the first glycosylated threonine residue (before residue T6) but no cleavage at the second glycosylation site. The single and double cleavage of MUC-1(Tn) by IMPa and BT4244_m, respectively, are generally consistent with the release of the different products visualized by TLC.

These results therefore indicate that the ability of the three peptidases to hydrolyze the O-glycopeptides is carbohydrate dependent. Furthermore, the peptidases showed differential requirements for glycosylation. BT4244_m consistently cleaved only the O-GalNAcylated peptides, but did not tolerate extended sugars. In contrast, IMPa required at least a GalNAc but tolerated the various extended glycans. ZmpB_m appears to be the most specific of the three peptidases by having an absolute requirement for the α-2,6-Neu5Ac present on the peptide-linked GalNAc, likely explaining the sialic acid dependence of this enzyme’s activity on fetuin, which commonly bears this sialic acid modification on its O-glycans (19).

Mammalian Glycan Array Screening. Using mammalian glycan arrays to explore the potential for selective glycan binding by ZmpB_m, BT4244_m, and IMPa resulted in suggestive but inconclusive results for ZmpB_m and BT4244_m, which was likely due to low-affinity interactions with the array. However, IMPa screening resulted in identifying 31 glycans that displayed a clear interaction with IMPa when using a minimum threshold of 10% of the maximum signal (SI Appendix, Fig. S5). Notably, the hits were all glycans built on core-2 [Galβ1–3(GlcNAcβ1–6)GalNAc-Thr], core-4 [GlcNAcβ1–3(GlcNAcβ1–6)GalNAc-Thr], or core-6 [GlcNAcβ1–6GalNAc-Thr] O-glycan motifs and the glycans were all O-linked to threonine residues (Note: the glycoamino acid O-glycan ligands contained only threonine and were not present as the serine versions). Although IMPa was clearly able to hydrolyze a minimally O-glycosylated substrate with only the Tn- or TAg-glycans, the enzyme appeared to have the highest affinity when this core modification is extended with a β-1,6-linked GlcNAc (GlcNAcβ1–6GalNAcα1-Thr). Furthermore, the strict recognition of glycosylated threonine indicates the necessity of the amino acid component for optimal affinity.

X-Ray Crystallographic Analysis of the Peptidase Structures. To examine the molecular basis of substrate selectivity in these enzymes, we determined the high-resolution crystal structures of BT4244_m, ZmpB_m, and IMPa (SI Appendix, Tables S1–S3 for all X-ray data collection and model statistics). BT4244_m and ZmpB_m had very similar architectures comprising two Ig-like fold (Ig) domains separated by a central all-α catalytic domain (Fig. 3 A and B). This central domain houses the gluzincin motif that forms the characteristic metallopeptidase catalytic machinery containing the catalytic glutamate and metal binding site (Fig. 3 A and B) and is similar to the core fold of the gluzincin representative enzyme, thermolysin (SI Appendix, Fig. S6). The core catalytic domain of IMPa has the same general characteristics as BT4244_m and ZmpB_m; however, IMPa lacks the C-terminal Ig domain, whereas the N terminus has an α/β-fold domain linked through an α-helix bundle to the Ig domain that precedes the catalytic domain (Fig. 3C).

Structures of the Peptidases in Complex with O-Glycan Ligands. Using the knowledge that BT4244_m appeared selective for the Tn-antigen and that IMPa had broader specificity, we were able to determine the structures of BT4244_m and IMPa in complex


Fig. 2. Elution profile of glycosylated peptide and cleavage products. (A) Sequences of peptides analyzed in the study, and the predicted exact masses of peptides resulting from peptidase cleavage. The peptides are labeled. (B) Extracted ion traces of different peptides [Left to Right: F15-TnAg, BDPF15-S2,6TAg, MUC-1(Tn)] for samples where the peptidases lacked activity on the peptide. (C) Extracted ion traces of different peptides [Left to Right: F15-TnAg, BDPF15-S2,6TAg, MUC-1(Tn)] for samples where the peptidases displayed activity on the peptide. In B and C the starting material is indicated as a solid line (denoted as S), and all fragment traces are shown as a dotted line, with the indicated fragment labeled according to A. Experimental masses for each peptide are shown for every experiment. The hydrolases used are colored according to the legend. The identity of the peptides and digest products was confirmed by tandem MS/MS for F15-TnAg (SI Appendix, Fig. S4) and by exact mass measurements for BDPF15-S2,6TAg and MUC-1(Tn).
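The cleavage rule and predicted fragment masses described for F15-TnAg can be illustrated with a minimal sketch. It assumes standard monoisotopic residue masses and a HexNAc increment for the O-linked GalNAc; these constants, and the helper names `peptide_mass` and `cleave_before`, are illustrative and are not taken from the paper. The values are neutral monoisotopic masses, not the m/z values of any particular charge state.

```python
# Standard monoisotopic residue masses in Da (textbook constants, not from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "E": 129.04259,
    "H": 137.05891, "R": 156.10111,
}
WATER = 18.01056   # added once per free peptide (N-terminal H plus C-terminal OH)
HEXNAC = 203.07937  # mass increment of one O-linked GalNAc (a HexNAc residue)


def peptide_mass(seq, n_hexnac=0):
    """Neutral monoisotopic mass of a linear peptide carrying n_hexnac HexNAc residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER + n_hexnac * HEXNAC


def cleave_before(seq, site):
    """Split seq at the peptide bond preceding the 1-based residue number `site`."""
    return seq[:site - 1], seq[site - 1:]


F15 = "GAEAEAPSAVPDAAG"
# In F15-TnAg the GalNAc sits on S8, so cleavage is modeled at the bond before residue 8.
n_frag, c_frag = cleave_before(F15, 8)
print(n_frag, round(peptide_mass(n_frag), 4))
print(c_frag, round(peptide_mass(c_frag, n_hexnac=1), 4))
```

Because hydrolysis adds one water, the two fragment masses should sum to the mass of the intact glycopeptide plus 18.01056 Da, which gives a quick consistency check on any such calculation.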

with serinyl Tn-antigen (GalNAcα1-Ser; BT4244_m-TnAg) and serinyl T-antigen (Galβ1–3GalNAcα1-Ser; IMPa-TAg), respectively. Both compounds could be easily modeled into electron density maps, although the terminal galactose in the serinyl T-antigen bound to IMPa displayed some disorder as shown by the partial electron density for this (SI Appendix, Fig. S7 A and B). These complexes revealed a pattern of specific interactions between the GalNAc residues and active site amino acids that was very similar for the two enzymes (Fig. 4 A and B). The additional galactose residue on the T-antigen of the IMPa complex was oriented into solvent and made no interactions with the protein (Fig. 4B). Although a phosphate molecule was fortuitously trapped in the IMPa complex, it was not positioned appropriately to participate in substrate recognition. A key difference in the active site architectures of the two enzymes was the presence and positioning of Y538 and D567 in BT4244_m, whereas these residues are absent in IMPa. Y538 and D567 of BT4244_m close in on the C6-hydroxyl of the GalNAc residue and hydrogen bond with it, thus sterically blocking the ability of the protein to recognize substituents linked to O6 of the GalNAc (Fig. 4 A and C). In contrast, this region of the active site is open in IMPa and contains a tryptophan sidechain (W685; Fig. 4 B and D) that is ideally positioned to make potential CH–π interactions with a GlcNAc residue β-1,6-linked to the core GalNAc (SI Appendix, Fig. S7C). Although the structure of the Tn-antigen BT4244_m complex suggests the protein cannot accommodate a modification on the O6 of the GalNAc residue, it provides little insight into the apparent inability of this protein to recognize the T-antigen; however, its seemingly strict requirement for the Tn-antigen likely explains the qualitatively limited activity of BT4244_m on BSM and fetuin, where the Tn-antigen is comparatively rare or nonexistent, respectively (19, 20).

The determination of the ZmpB_m structure after crystals of the protein were soaked in a solution of BSM yielded a complex with a product of catalysis (ZmpB_m-BSMfrg). A ligand in the active site with clear electron density was readily identifiable as an O-glycopeptide that was modeled as the trisaccharide GlcNAcβ1–3(Neu5Acα2–6)GalNAcα1 (α2,6-sialylated core-3 O-glycan) linked to the threonine residue of a pentapeptide with the proposed sequence TAPGG (SI Appendix, Fig. S7D). The positioning of the glycosylated N-terminal threonine in this complex was consistent with both the anticipated hydrolytic mechanism of a metallopeptidase and our observation that all of the enzymes hydrolyzed the peptide bond immediately preceding the glycosylated residue (Fig. 5A). Apart from hydrogen bonding between the peptidase active site and the ligand threonine residue that bears the glycan, the interface between ZmpB_m and the pentapeptide portion of the ligand suggested little to no specificity for the peptide (Fig. 5A). Indeed, this also revealed ample space in the active site to accommodate the methyl group of the glycosylated threonine, which also appears to be the case with

[Fig. 3 image not reproduced. Panel labels: (A) BT4244_m, residues 322–857, domains IG–C–IG, ~85 Å; (B) ZmpB_m, residues 496–1002, domains IG–C–IG, ~90 Å; (C) IMPa, residues 42–924, α/β-fold domain, helix bundle linker (H), IG, and C domains, ~95 Å. Inset residue labels: BT4244_m H574, E575, H578, E591, Zn; ZmpB_m H756, E757, H760, E771, Zn; IMPa H696, E697, H700, E716, Zn.]

Fig. 3. Peptidase structures. Domain boundaries and secondary structure cartoon representation of BT4244_m (A), ZmpB_m (B), and IMPa (C). The catalytic central all-α fold domain (C) is illustrated with the active site helix colored in red. Ig-like domains (IG) and additional N-terminal α/β-fold domain and helix bundle linker (H) in IMPa are also depicted. Insets show the catalytic site of each peptidase with the gluzincin motif residues as sticks, the Zn ion as a gold sphere, and metal coordination bonds as black dashes.
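The HEXXH(8,24)E gluzincin motif named in the text can be sketched as a regular expression. This is a minimal illustration that assumes "(8,24)" means 8 to 24 residues between the second histidine and the third zinc ligand; the toy sequence is hypothetical, and the residue pairs are simply the inset labels listed for Fig. 3 (second His, third Glu), not an independent annotation.

```python
import re

# HEXXH, then 8-24 residues of any kind, then E. The spacer reading is an assumption.
GLUZINCIN = re.compile(r"HE..H(.{8,24}?)E")


def find_motif(seq):
    """Return (1-based position of the first His, spacer length), or None if absent."""
    match = GLUZINCIN.search(seq)
    if match is None:
        return None
    return match.start() + 1, len(match.group(1))


def spacer(second_his, third_glu):
    """Number of residues lying strictly between the second His and the third Glu."""
    return third_glu - second_his - 1


# Hypothetical toy sequence, for illustration only.
toy = "MKAAAAA" + "HEAAH" + "G" * 12 + "E" + "KL"
print(find_motif(toy))  # (8, 12)

# Residue numbers as labeled in the Fig. 3 insets.
FIG3_LABELS = {"BT4244_m": (578, 591), "ZmpB_m": (760, 771), "IMPa": (700, 716)}
for name, (his2, glu3) in FIG3_LABELS.items():
    print(name, spacer(his2, glu3))
```

Under this reading, the spacers implied by the figure labels (12, 10, and 15 residues) all fall inside the 8 to 24 window.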

BT4244_m and IMPa, in particular suggesting that the enzymes do not discriminate between serine and threonine at this position.

The pattern of interactions between ZmpB_m and the ligand was sugar specific, including hydrogen bonds between active site amino acids and all three sugar residues (Fig. 5B). A similar mode of recognition was observed in the subsequently determined structure of ZmpB_m in complex with the α2,6-sialylated core-1 O-glycan, Galβ1–3(Neu5Acα2–6)GalNAcα1-Ser (ZmpB_m-STAg; Fig. 5C

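[Fig. 4 image not reproduced. Residue labels: (A) BT4244_m-TnAg: GalNAc, Y538, D567, W570, G571, H574, N595, R611; (B) IMPa-TAg: Gal, GalNAc, PO4, W685, W692, G693, H696, Q720, D739, R742; (C and D) surface representations with Y538 and D567 (C) and W685 (D) labeled.]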

Fig. 4. BT4244_m-TnAg and IMPa–TAg interfaces. The glycan-binding site of (A) BT4244_m-TnAg and (B) IMPa-TAg (molecule A), with peptidase residues shown as gray sticks, GalNAc and galactose as yellow sticks, the glycosylated threonine as orange lines, phosphate (PO4) as salmon lines, water molecules as red spheres, and hydrogen bonds as black dashes. The tryptophan side chain putatively involved in IMPa-glycan recognition is shown in B as gray lines. (C and D) The corresponding electrostatic potential surface representations of A and B, respectively, are colored from −8 kT/e (red, negative) to +8 kT/e (blue, positive).
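For readers less used to the kT/e unit on the electrostatic color scale, the conversion to millivolts can be sketched as below. The temperature of 298.15 K is an assumption made only for illustration, since the caption does not state the temperature used for the surface calculation; the function name is likewise hypothetical.

```python
# Exact SI values of the Boltzmann constant and the elementary charge.
K_B = 1.380649e-23          # J/K
E_CHARGE = 1.602176634e-19  # C


def kt_over_e_to_mv(value_kt_e, temperature_k=298.15):
    """Convert a potential given in units of kT/e to millivolts at the given temperature."""
    return value_kt_e * K_B * temperature_k / E_CHARGE * 1000.0


# Limits of the color scale in Fig. 4, assuming 298.15 K.
print(round(kt_over_e_to_mv(8), 1))
print(round(kt_over_e_to_mv(-8), 1))
```

At this assumed temperature one kT/e is roughly 25.7 mV, so the ±8 kT/e scale spans roughly ±206 mV.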

[Fig. 5 image not reproduced. Labels in panels A–D: ligand residues T, A, P, G, G (pentapeptide), GalNAc, GlcNAc, Gal, Sia, and EG; ZmpB_m residues F727, M728, N741, N749, W752, G753, V754, H756, E757, H760, E771, N775, R790, Q794, and T822.]

Fig. 5. The peptidase–substrate interface of ZmpB_m. ZmpB_m-BSMfrg complex structure demonstrating the interface with (A) the pentapeptide (TAPGG) fragment as orange sticks and (B) the trisaccharide glycan moiety [yellow for GalNAc, blue for GlcNAc, and magenta for Neu5Ac (Sia)]. (C) The glycan binding interface of ZmpB_m-STAg complex structure. (D) The corresponding electrostatic potential surface representation of B, colored from −8 kT/e (red, negative) to +8 kT/e (blue, positive). The cryoprotectant ethylene glycol (EG) is depicted as salmon lines and other color schemes are as in Fig. 4.

and SI Appendix, Fig. S7E). In this complex, however, the β1,3-linked galactose of the ligand results in only a single hydrogen bond between the galactose and the enzyme active site that is mediated by an ethylene glycol molecule, which likely displaced a water molecule during the cryoprotection procedure and acts as a surrogate for this water. In contrast, in the ZmpB_m-BSMfrg complex the presence of the acetamido group of the β1,3-linked GlcNAc in the bound ligand enables one direct and one water-mediated hydrogen bond between this residue and the enzyme active site. This suggests that ZmpB_m may favor β1,3-linked GlcNAc over β1,3-linked galactose in this position. Both structures revealed the sialic acid to be bound in a pocket with a notably basic surface charge that is complementary to the acidic carboxylate on the sialic acid (Fig. 5D). In contrast, the analogous pocket in IMPa, which is near W685, possesses a relatively neutral or even slightly acidic surface charge (Fig. 4D). Additional specificity for the α2,6-linked sialic acid is provided by a series of four direct hydrogen bonds between the glycerol sidechain of the sugar and residues in the enzyme active site. Thus, the structural role of Neu5Acα2–6GalNAc in substrate recognition by ZmpB_m is consistent with the sialic acid dependence of its activity on fetuin, which is abundant with this modification, and its activity on the BDPF15-S2,6TAg, but lack of activity on the other tested O-glycopeptides.

Discussion

The M60-like domains of BT4244, ZmpB, and IMPa display low pairwise amino acid sequence identities that vary between 12% and 19%, consistent with their classification into different MEROPS families or, in the case of ZmpB, lacking a classification. Despite the divergent sequences, yet consistent with their larger classification into the Pfam M60-like family, the M60-like catalytic domains of these proteins all share the thermolysin-like fold, complete with the metal binding site housing the catalytic zinc atom, found in the MEROPS metallopeptidase clan MA. Remarkably, these enzymes rely on the recognition of O-linked glycans present on the protein substrate. The O-glycan modifications are accommodated in specific subsites present in the catalytic site such that the peptide bond targeted for hydrolysis immediately precedes the site of glycosylation.

BT4244, ZmpB, and IMPa have a primary GalNAcα1-Ser/Thr binding subsite that is largely conserved among the enzymes (Fig. 6A) and that we consider consistent with the fact that the majority of O-glycan modifications possess the same core GalNAc residue that is α-linked to a serine or threonine residue. Indeed, the active site of BT4244 appears equipped to recognize only this GalNAc residue. In contrast, the catalytic site of ZmpB possesses this primary GalNAc binding site and at least two additional subsites that accommodate the α2,6-Neu5Ac linked to the core GalNAc, which we postulate is a key recognition determinant in this enzyme, and another subsite that binds the β1,3-linked galactose or GlcNAc found in core-1 and core-3 O-glycans, respectively. IMPa has a catalytic site that also accepts branched O-glycans. The structure of IMPa in complex with the serinyl T-antigen indicated no specific interactions between the β1,3-galactose and the enzyme, suggesting this modification is more likely tolerated than a key determinant of recognition, although it is possible that, like in ZmpB, the acetamido group of the β1,3-linked GlcNAc may afford additional interactions with the enzyme. Contrary to this proposal, however, the microarray binding studies revealed that IMPa accommodates both β1,3-linked galactose and GlcNAc, but suggested no preference between them. A clear pattern from the microarray results was that every glycan that IMPa interacted with possessed a β1,6-GlcNAc linked to the GalNAc residue, implying the importance of this moiety for optimal enzyme affinity. Based on the structure of IMPa, we postulate a second subsite that is defined largely by the apolar face of W685 and that binds


Fig. 6. Comparison of the peptidase glycan-binding subsites. Superposition of BT4244_m-TnAg (green), IMPa-TAg (magenta), and ZmpB_m-STAg (cyan) active sites, focusing on (A) the conserved primary GalNAc glycan-binding subsite and (B) the lack of conserved binding residues in the additional glycan binding subsites. A GalNAcα1-Ser molecule, positioned as in the ZmpB_m-STAg structure, is shown for reference and depicted as yellow and orange lines, respectively. The conserved gluzincin motif is shown as gray sticks with the Zn ion as a gold sphere.

β1,6-linked GlcNAc (Fig. 4D and SI Appendix, Fig. S7C); occupation of this subsite would likely translate into increased affinity for the substrate and more efficient enzyme activity. Overall, therefore, these peptidases share a common scaffold that contains the catalytic machinery and a generally conserved primary GalNAc-specific glycan-binding subsite. However, differential recognition of posttranslational O-glycan modifications is imparted by the presence (or lack) of additional glycan-binding subsites that are not conserved among the enzymes (Fig. 6B) and serve to bind specific extensions to the core GalNAc residue, thus providing specificity to the enzymes.

The paradigm of peptidase–substrate recognition is based on the model of Schechter and Berger (21), which defines specific recognition of amino acids of the substrate (P and P′ residues) by subsites in the enzyme active site (S and S′ subsites) where the scissile bond of the substrate between P1 and P1′ spans the S1 and S1′ subsites (3). This paradigm, although elegant and simple, does not adequately apply to the glycoprotein-specific peptidases studied here. Like the peptide backbone, O-glycans can be composed of multiple different building blocks but, unlike peptides, the monosaccharide building blocks can be linked in multiple different ways to provide substantial structural and chemical diversity in the resulting glycans, which in turn impart to the protein important physiochemical properties and/or biological functions. Thus, the amino acids to which O-glycans are attached are considered more than just modified amino acids. Here we have shown that the diversity of O-glycans can be mirrored in the glycan-binding subsites of M60-like peptidases, where each subsite recognizes a unique building block of the O-glycan. As the interaction of the glycan subsites with substrate are a critical component of substrate recognition by this class of peptidases, which we refer to as O-glycopeptidases, the glycan-specific subsites bear explicit consideration. Therefore, in the case of O-glycopeptidases, we promote an update to the paradigm of peptidase–substrate recognition to include what we refer to as “G sites” that accommodate the O-glycan (Fig. 7A). This model incorporates the potential presence of specific peptide recognition elements (S and S′ subsites) with the G subsites that can recognize a simple O-linked sugar modification, as in the case of BT4244 (Fig. 7B), or more complex glycans, as in the cases of ZmpB (Fig. 7C) and IMPa (Fig. 7D).

The importance of the substrate’s amino acid sequence to the activity of BT4244, ZmpB, and IMPa remains unclear. However, other than the frequent proximity of a to the site of


Fig. 7. Proposed subsite nomenclature for O-glycopeptidase substrate recognition. (A) General model for O-glycopeptidase substrate recognition presenting the common subsite paradigm for peptidases with the substrate residues (P) and the corresponding peptidase subsites (S) numbered outwards from the cleaved bond, which is indicated by a red triangle. The proposed glycan sites are depicted as G, with G1′ as GalNAc, which can be branched via its O6 (G2′) or O3 (G2′′). Specific representations using this model to show the subsites of (B) BT4244, (C) ZmpB, and (D) IMPa. Question marks represent the unknown presence of S subsites in the enzyme and P residues in the substrate.
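The numbering convention summarized in Fig. 7 can be made concrete with a small sketch. It assumes, as reported for the three enzymes studied here, that cleavage occurs at the peptide bond immediately preceding the glycosylated residue, so that residue sits at P1′ and its core GalNAc occupies G1′. The function names, the data class, and the example peptide are hypothetical and serve only to illustrate the labeling; they are not part of the published work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledResidue:
    residue: str      # one-letter amino acid code
    position: str     # Schechter-Berger label, e.g. "P2" or "P1'"
    glycan_site: str  # "G1'" if this residue carries the core GalNAc, else ""


def label_substrate(peptide: str, glyco_index: int) -> list[LabeledResidue]:
    """Label residues around a cleavage N-terminal to the glycosylated residue.

    glyco_index is the 0-based index of the O-glycosylated Ser/Thr, assumed
    here to sit at P1' (the scissile bond immediately precedes it).
    """
    if not 0 < glyco_index < len(peptide):
        raise ValueError("glycosylated residue needs a residue on its N-terminal side")
    if peptide[glyco_index] not in "ST":
        raise ValueError("this sketch only handles O-glycans on Ser or Thr")

    labeled = []
    for i, aa in enumerate(peptide):
        if i < glyco_index:
            label = f"P{glyco_index - i}"       # P1 is adjacent to the scissile bond
        else:
            label = f"P{i - glyco_index + 1}'"  # P1' is the glycosylated residue
        site = "G1'" if i == glyco_index else ""
        labeled.append(LabeledResidue(aa, label, site))
    return labeled


def cleavage_products(peptide: str, glyco_index: int) -> tuple[str, str]:
    """Split the peptide at the P1-P1' bond."""
    label_substrate(peptide, glyco_index)  # reuse the input validation
    return peptide[:glyco_index], peptide[glyco_index:]
```

For the made-up peptide `"APGSTAP"` with the threonine at index 4 glycosylated, `label_substrate` assigns P4 through P1 to `APGS` and P1′ through P3′ to `TAP`, and `cleavage_products` returns those two fragments. Branches on the GalNAc (G2′ via O6, G2′′ via O3) are not modeled here.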

Noach et al. PNAS | Published online January 17, 2017 | E685

glycosylation we note that there is a lack of an unambiguous consensus motif for O-glycosylation sites (22). Whereas this lack does not preclude preference of a specific amino acid sequence in the substrate of these peptidases, their reliance on the presence of glycosylation combined with recognition of a specific amino acid sequence would likely result in ultrarare cleavage sites, which given the strong depolymerizing activity of these enzymes on BSM seems unlikely. Some support for this argument is given by the limited interaction observed between ZmpB_m and the peptide portion of a bound O-glycopeptide, which indicated only an S1′ subsite that accommodated the glycosylated residue and no clearly definable additional S′ subsites (Fig. 5A). The presence of any S subsites is unknown, as these were not occupied in any of the crystal structures we obtained. Furthermore, as an example, the enzyme BT4244_m cleaved two different peptides at a total of three different glycosylation sites that had unique surrounding amino acid sequences, suggesting promiscuity toward the peptide backbone. In this light, the present evidence promotes the concept that for these peptidases the glycan composition, and therefore the presence of specific G sites, rather than the substrate’s peptide sequence serves to be a major determinant of O-glycopeptidase specificity. However, we do note that IMPa was unable to cleave at the second glycosylation site in the MUC-1(Tn) peptide, whereas BT4244_m was able to cleave at this site, despite this site possessing an appropriate sugar. This finding implies that there may be some features in the peptide sequence that can influence activity. In this case, the glycosylated residue is followed by a bulky arginine residue. The active site of IMPa in the space this residue would occupy is quite closed in by virtue of the loop structure surrounding the active site, thus potentially causing steric hindrance with the arginine residue (SI Appendix, Fig. S8A). In contrast, the active site of BT4244_m (and ZmpB_m) is quite open, likely explaining how it can accommodate a large arginine residue that is C terminal to the glycosylated residue (SI Appendix, Fig. S8B and C). Therefore, there is evidence that the amino acid sequence around the O-glycopeptidase cleavage site can influence activity, but evidence for the presence of specific subsites to accommodate a particular amino acid sequence is presently lacking.

The M60-like family as presented by the InterPro database (23) contains over 2,200 sequence entries spanning hundreds of species that include viruses, eukaryotes (including humans), prokaryotes, and archaea. The amino acid sequence signature upon which this family is built includes the tryptophan and asparagine residues that are present in the core GalNAc binding subsite of BT4244, ZmpB, and IMPa. This finding suggests the potential for most, if not all, of the family members to recognize O-glycans and generally supports the proposal of Nakjang et al. that this is a glycoprotein-specific family of peptidases (11). Indeed, this family, which is better described as a superfamily, includes three MEROPS families (M60, M88, and M98) and likely contains several additional unclassified families. Furthermore, StcE from E. coli is also a glycoprotein-specific protease that is classified into family M66, which is not part of the M60-like superfamily but belongs to the same peptidase clan, MA. Based on the uncomplexed structure of this protein, Yu et al. proposed a sugar-binding site on the surface of the protein adjacent to the catalytic machinery (8). Our observations with the fold-related M60-like peptidases support the sugar-binding site hypothesized for StcE and suggest a mechanism of using glycans as a determinant of substrate recognition analogous to that used by the M60-like peptidases. Thus, there is evidence that the recognition of glycans by peptidases may be quite widespread, implying that glycoprotein turnover is not necessarily a random or untargeted process, or a process that requires preremoval of the glycans by hydrolases. Organisms or cells that deploy O-glycopeptidases may do so to target specific proteins bearing only the correct glycan modification or even portions of proteins with the appropriate glycans, which in turn implies tailored biological functions for these enzymes. Indeed, the specific targeting to glycan structures, as observed for the three M60-like O-glycopeptidase members studied here, lends itself to the idea of using such enzymes as unique molecular tools to study glycoprotein structure. Furthermore, O-glycopeptidases may find application as glycopeptiligases where their native specificity for O-glycosylated peptides could be harnessed to ligate peptides and glycopeptides to generate designed glycosylated products without the need to significantly engineer the active site environment of the peptidase (24, 25).

The mechanism we have described by which these three M60-like peptidases recognize substrate reveals the intimate utilization of posttranslationally added glycans as a substrate recognition determinant. The participation of posttranslational modification in protein degradation is not unusual, with the most common being the targeting of proteins for proteolytic destruction by ubiquitinylation. The specialized processing of N-linked glycans on misfolded proteins in the endoplasmic reticulum also triggers a pathway of proteolytic destruction (26, 27). The role of the posttranslational modifications in these cases, however, is thought to occur upstream of proteolysis, rather than by the direct recognition of the modification by a protease active site. The mechanism of O-glycopeptidase substrate recognition described here is different from that previously defined and therefore reveals unique insight into how peptidases may directly use posttranslational modifications present on their substrates to influence recognition and peptide bond cleavage.

Materials and Methods

Materials. All reagents and chemicals were from Sigma, unless otherwise stated.

Cloning. The gene fragments encoding BT4244_l (accession code: Q89ZX7, residues 274–857) and BT4244_m (residues 322–857) were amplified from B. thetaiotaomicron VPI-5482 genomic DNA (ATCC 29148). The gene fragments encoding ZmpB_l (accession code: Q0TR08, residues 434–1084) and ZmpB_m (residues 497–1003) were amplified from C. perfringens genomic DNA (ATCC 13124), and the gene fragment encoding IMPa (accession code: Q9I5W4, residues 42–923) was amplified from P. aeruginosa PAO1 genomic DNA (ATCC 47085), using the oligonucleotide primers listed in SI Appendix, Table S4.

PCR amplified gene fragments were cloned into pET28a (Novagen) via NheI and XhoI restriction sites with the exception of the IMPa gene fragment that was cloned into pET22b (Novagen) through NcoI and XhoI restriction sites according to standard procedures.

In-Fusion HD Cloning Kit (Clontech) was used to introduce BT4244_l-E575Q and BT4244_m-E575A mutants as well as ZmpB_m-E757A and IMPa-E697A mutants to create catalytically inactive peptidases, using primers described in SI Appendix, Table S4. The resulting BT4244 and ZmpB plasmids encoded the desired polypeptide fused to an N-terminal six-histidine tag by a thrombin protease cleavage site. The resulting IMPa plasmids encoded an N-terminal pelB signal sequence fused to the desired polypeptide, for potential periplasmic localization, followed by a C-terminal six-histidine tag. The DNA sequence fidelity of all constructs was verified by bidirectional sequencing.

Protein Expression and Purification. E. coli strain BL21 (DE3) cells (Invitrogen) transformed with the appropriate recombinant expression plasmid were grown in 2 L of sterile 2xYT media supplemented with the suitable antibiotic at 37 °C until the culture reached an optical density of ∼0.8 at 600 nm. Cells were cooled at 16 °C for 1 h and recombinant protein expression was induced by the addition of isopropyl-β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM; incubation was continued overnight at 16 °C with shaking. Cells were harvested by centrifugation at 5,000 × g for 10 min at 10 °C and disrupted by lysozyme-chemical lysis (28). Cell lysate was centrifuged at 15,000 × g for 30 min at 10 °C and proteins were purified from the cleared cell lysate by Ni2+-NTA immobilized metal affinity chromatography. Increased imidazole concentrations (5–500 mM) in binding buffer (20 mM Tris·HCl pH 8.0, 500 mM NaCl) were used to elute the His-tagged protein from the column. Fractions containing pure protein as judged by SDS/PAGE analysis were pooled and concentrated using a stirred ultrafiltration unit (Amicon) with a 10-kDa cutoff membrane (EMD Millipore). The concentrated protein was further purified by size exclusion chromatography using a Sephacryl S-200 HR column (GE Healthcare) in 20 mM Tris·HCl pH 8.0 and 300 mM NaCl. Fractions containing the purified protein were pooled and concentrated using the ultrafiltration unit. The same protocol was used for the periplasmic expression of the IMPa construct and the purification of the protein.

Protein concentration was determined by measuring the absorbance at 280 nm and using the calculated molar extinction coefficients of 125,140 cm−1·M−1 for BT4244_l_E575Q, 122,160 cm−1·M−1 for BT4244_m/BT4244_m_E575A, 97,180 cm−1·M−1 for ZmpB_l, 88,240 cm−1·M−1 for ZmpB_m/ZmpB_m_E757A, and 174,680 cm−1·M−1 for IMPa/IMPa_E697A (29).

Selenomethionine-labeled proteins were produced using either E. coli BL21 (DE3) cells or the methionine auxotrophic strain E. coli B834 (DE3) (Invitrogen) as the expression strain. The defined media (4 L) containing selenomethionine was prepared according to the instructions of the manufacturer (AthenaES). The procedure for protein expression and purification was otherwise performed as for the native protein.

In Vitro Mucin Degradation Assays. BSM type I-S, bovine fetuin, and bovine asialofetuin (ASF) were used as substrates in the absence or presence of 50 mM EDTA. Enzymes were incubated with BSM in a 1:175 (wt/wt) ratio for 20 h and with fetuin or ASF in a 1:25 (wt/wt) ratio for 3 h, in 20 mM Tris·HCl pH 7.5 at 37 °C. The samples were separated on 10–15% SDS/PAGE and stained for specific glycoprotein detection with the periodic acid-Schiff (PAS) stain (30). Briefly, gels were fixed in 50% (vol/vol) methanol for 30 min, washed with 5% (vol/vol) acetic acid (3 × 10 min), and incubated with 0.7% (wt/vol) periodic acid in 5% acetic acid solution for 3 h at room temperature (RT). The periodic acid solution was removed, gels were washed with 5% acetic acid (3 × 5 min), and incubated in Schiff reagent (Fisher) overnight at RT. Reddish-pink bands of stained glycoprotein would then be visible.

Activity on Defined O-Glycopeptides. The details of the synthesis of the O-glycopeptides are given in SI Appendix. Peptidase activity on defined O-glycopeptides was analyzed by TLC. The 15mer synthetic peptide (8 mM) and the synthetic O-glycopeptides (4 mM each) were incubated with each of the peptidases (10 μM) in 20 mM Tris·HCl pH 7.5 for 3 h at 37 °C. Reactions (3 μL each) were spotted onto a silica gel TLC plate and separated in a solvent comprising butanol:acetic acid:H2O (45:35:30, v:v:v). Ninhydrin solution (0.1 g in 0.5 mL acetic acid and 100 mL acetone) was used to develop the TLC plate at 65 °C for 15 min.

Mass Spectrometry. Mass spectrometry experiments were carried out on a Bruker Impact HD mass spectrometer acquiring over a mass range from 150 to 2,200 m/z using an electrospray ionization (ESI) source operated at a temperature of 200 °C and a spray voltage of 4.5 kV. The peptides were separated using a Dionex ultimate 3000 UPLC system. In brief, peptides were trapped and desalted on a VanGuard precolumn trap (Waters). The trap was subsequently eluted in line with an Acquity 1.7 μm particle, 100 mm × 1 mm C18 UPLC column (Waters), using a gradient of 3–80% B (buffer A 0.1% formic acid, buffer B 100% acetonitrile) over 5 min, with a 2-min wash method afterward. MS/MS was run in data-dependent acquisition mode with a 0.5-s precursor scan from 150 to 2,200 m/z, followed by 12 fragment scans from 150 to 2,200 m/z of 0.25 s (Impact). MS/MS fragments were identified with a mass tolerance of 20 ppm, with a minimal signal of 300 counts. Extracted ion chromatograms were generated with an extraction window of 0.1 Da from the monoisotopic peak.

Crystallization. Crystals of the different peptidases were obtained at 18 °C using sitting-drop vapor diffusion for screening and hanging drop vapor diffusion for optimization. In all cases 1:1 protein to crystallization solution ratios were used. The details of the crystallization conditions and cryoprotection conditions are given in SI Appendix.

Data Collection and Processing. All data collections were done at 100 K. X-ray diffraction data from both BT4244_l-E575Q-Br and BT4244_l-E575Q crystals were collected at the Stanford Synchrotron Radiation Lightsource (SSRL, Stanford) on beamline BL12-2. Diffraction data from both BT4244_m and BT4244-TnAg crystals were collected at the Canadian Light Source (Saskatoon) beamline 08ID-1 (CMCF-ID). In particular, datasets from a bromide-soaked crystal of BT4244_l-E575Q were collected at the peak, inflection, and remote wavelengths, which were determined by fluorescence scan to optimize for the anomalous scattering from bromide.

Diffraction data from a crystal of SeMet-IMPa were collected at the Canadian Light Source beamline 08B1-1 at a wavelength optimized for anomalous scattering from the incorporated selenium atoms. The same beamline was used to collect diffraction data from IMPa-6His and IMPa-TAg crystals. Beamline CMCF-ID 08ID-1 at the Canadian Light Source was used to collect diffraction data from the IMPa crystal.

Diffraction data from a selenomethionine-labeled ZmpB_l crystal were collected at the Canadian Light Source beamline 08B1-1 (CMCF-BM) at the peak wavelength determined by fluorescence scan to optimize for the anomalous scattering from selenium. This beamline was also used to collect the diffraction data from ZmpB_m crystals, whereas diffraction data from the ZmpB-STAg crystal were collected at beamline 08ID-1 (CMCF-ID). Diffraction data from both ZmpB_l and ZmpB-BSMfrg crystals were collected at the SSRL beamline BL9-2.

Diffraction data were integrated using MOSFLM (31) or XDS (32). POINTLESS was used to identify the space group and SCALA was used to scale and merge the data (33), both as implemented in CCP4 (34). Data collection and processing statistics are shown in SI Appendix, Tables S1–S3.

Structure Solution and Refinement. The structure of the bromide derivative of BT4244_l-E575Q was determined using the multiwavelength anomalous dispersion (MAD) method (35). Initial automated heavy atom substructure determination, phasing, and density modification were done using autoSHARP (36). A total of 31 bromide sites were found by ShelxD (37) in the single monomer present in the asymmetric unit and used in the phasing process. BUCCANEER (38) was able to build an incomplete model. This partial model was used in PHASER-EP (39) along with the 31 bromide sites for MR/SAD phasing and phase improvement by Parrot density modification. The cycle of automated model building and phase improvement was repeated until a final run of ARP/wARP (40) resulted in a model of sufficient quality to result in a nearly complete structure when submitted to ARP/wARP as a starting model along with the 2.15-Å resolution BT4244_l-E575Q native dataset. This BT4244_l-E575Q model, with one molecule in the asymmetric unit, was completed by successive rounds of model building with COOT (41) and refinement with REFMAC (42) and was used to solve the structures of both BT4244_m (at 1.65-Å resolution) and BT4244_m-TnAg (at 2.3-Å resolution) by molecular replacement using MOLREP (43). The final model of each of these structures, with one molecule in the asymmetric unit, was completed as well by successive rounds of model building with COOT and refinement with REFMAC.

The structure of selenomethionine-labeled IMPa-6His (SeMet-IMPa-6His) was solved using the single wavelength anomalous dispersion (SAD) method (44). ShelxC/D (37) was used with data extending to 2.7 Å to find the six expected selenium atoms present in the single molecule of IMPa-6His in the asymmetric unit. Initial phases were generated using ShelxE (37) and further improved by solvent flattening using DM (34). BUCCANEER was able to build a model of ∼80% completeness, which was further improved by ARP/wARP to a model with 99% completeness. The resulting model was used to solve the structure of IMPa-6His (at 1.93-Å resolution) by molecular replacement using MOLREP. This model, consisting of a single molecule in the asymmetric unit, was completed by successive rounds of model building with COOT and refinement with REFMAC and used as a starting model to solve both IMPa (at 1.85-Å resolution) and IMPa-TAg (at 2.4-Å resolution) by molecular replacement using MOLREP. The final model of each of these structures, with two molecules in the asymmetric unit, was completed by successive rounds of model building with COOT and refinement with REFMAC.

The structure of selenomethionine-labeled ZmpB_l (SeMet-ZmpB_l) was also determined using the SAD method. Using data to 4.0 Å, ShelxC/D was used to find 20 of the 22 expected selenium atoms present in the two molecules of SeMet-ZmpB_l in the asymmetric unit. Initial phasing and refinement was performed with autoSHARP. PROFESS (34) was used to determine the noncrystallographic symmetry (NCS) operators relating the two SeMet-ZmpB_l molecules in the asymmetric unit. Phase improvement was achieved by solvent flattening with NCS averaging using DM. The resulting electron maps were of relatively low quality but did have regions of interpretable secondary structure. BUCCANEER (fast build) was able to build a partial model comprising 50% of the polypeptide and backbone only. This partial model was used in PHASER-EP along with the 20 selenium sites for MR/SAD phasing. The resulting phases were improved by solvent flattening with NCS averaging using DM and gave a substantial improvement in the electron density maps. The cycle of automated model building, MR/SAD phasing, and density improvement was repeated an additional two times. A final run of BUCCANEER resulted in a model of ∼60% completeness with 30% of the amino acid sidechains docked. This partial model was of sufficient quality to result in an automatically built model of 85% completeness when submitted to ARP/wARP as a starting model along with the 2.15-Å resolution ZmpB_l native dataset. This model, consisting of two molecules in the asymmetric unit, was completed by successive rounds of model building with COOT and refinement with REFMAC. The final model of the ZmpB_l protein backbone was used to solve the structures of ZmpB_m (at 1.66-Å resolution), ZmpB_m-BSMfrg (at 1.60-Å resolution), and ZmpB_m-STAg (at 2.0-Å resolution) by molecular replacement using MOLREP. The final model of each of these structures, with one molecule in the asymmetric unit, was completed by successive rounds of model building with COOT and refinement with REFMAC.
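As a worked illustration of the concentration measurement described earlier in these methods, the absorbance at 280 nm can be converted to a molar concentration with the Beer–Lambert relation, c = A / (ε·l). The sketch below uses the extinction coefficients quoted in the text; the function name, the dictionary keys, the 1-cm default path length, and the dilution parameter are assumptions made for illustration and are not taken from the original protocol.

```python
# Molar extinction coefficients at 280 nm (cm^-1 M^-1), as quoted in the text.
# The catalytic-site mutants share the value of their parent construct where
# the text lists them together (e.g. BT4244_m / BT4244_m_E575A).
EXTINCTION_280 = {
    "BT4244_l_E575Q": 125_140,
    "BT4244_m": 122_160,
    "ZmpB_l": 97_180,
    "ZmpB_m": 88_240,
    "IMPa": 174_680,
}


def molar_concentration(a280: float, construct: str,
                        path_cm: float = 1.0, dilution: float = 1.0) -> float:
    """Return the protein concentration in mol/L from an A280 reading.

    Beer-Lambert: A = epsilon * c * l, so c = A / (epsilon * l).
    `dilution` is the fold-dilution of the measured sample (assumed
    parameter; 1.0 means the stock was measured undiluted).
    """
    if a280 < 0 or path_cm <= 0 or dilution <= 0:
        raise ValueError("absorbance must be >= 0; path length and dilution must be > 0")
    epsilon = EXTINCTION_280[construct]
    return a280 * dilution / (epsilon * path_cm)
```

For example, an undiluted ZmpB_m sample reading A280 = 0.8824 in a 1-cm cuvette corresponds to 0.8824 / 88,240 = 1.0 × 10−5 M, that is, 10 μM.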

In all cases, the final addition of water molecules was performed in COOT with FINDWATERS and manually checked after refinement. In all datasets, refinement procedures were monitored by flagging 5% of all observations as “free” (45). Model validation was performed with MOLPROBITY (46). Refinement statistics are shown in SI Appendix, Tables S1–S3. Structure figures were generated using PyMOL (the PyMOL Molecular Graphics System, version 1.6.0.0, Schrödinger). Electrostatic potential surfaces were generated using APBS (47). The core-2 glycan model was generated using GLYCAM (48).

ACKNOWLEDGMENTS. We thank Dr. Richard Cummings and Dr. David Smith from the Protein–Glycan Interaction Resource (supported by Grant R24 GM098791) of the Consortium for Functional Glycomics for performing the glycan microarray analysis; Hayden McClure for technical assistance; and the staff of the Canadian Macromolecular Crystallography Facility beamlines at the Canadian Light Source (CLS) and the Stanford Synchrotron Radiation Lightsource (SSRL) beamlines where diffraction data were collected. This research was supported by operating grants (to A.B.B.) from the Canadian Institutes of Health Research (CIHR, MOP-130305) and the GlycoNet National Centre of Excellence. J.E.B. is supported by a New Investigator grant from CIHR and a Discovery Research grant from the Natural Sciences and Engineering Research Council of Canada (NSERC 2014-05218). The CLS is supported by NSERC, the National Research Council Canada, the Canadian Institutes of Health Research, the Province of Saskatchewan, Western Economic Diversification Canada, and the University of Saskatchewan. SSRL is a Directorate of SLAC National Accelerator Laboratory and an Office of Science User Facility operated for the US Department of Energy (DOE) Office of Science by Stanford University. The SSRL Structural Molecular Biology Program is supported by the DOE Office of Biological and Environmental Research, and by the National Institutes of Health, National Center for Research Resources, Biomedical Technology Program Grant P41RR001209, and the National Institute of General Medical Sciences.

1. Rawlings ND, Barrett AJ, Finn R (2016) Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 44(D1):D343–D350.
2. Kim MS, et al. (2014) A draft map of the human proteome. Nature 509(7502):575–581.
3. Hooper NM (2002) Proteases: A primer. Essays Biochem 38:1–8.
4. Vizovišek M, Vidmar R, Fonović M, Turk B (2016) Current trends and challenges in proteomic identification of protease substrates. Biochimie 122:77–87.
5. Khoury GA, Baliban RC, Floudas CA (2011) Proteome-wide post-translational modification statistics: Frequency analysis and curation of the swiss-prot database. Sci Rep 1:1.
6. Hang HC, Bertozzi CR (2005) The chemistry and biology of mucin-type O-linked glycosylation. Bioorg Med Chem 13(17):5021–5034.
7. Abdullah KM, Udoh EA, Shewen PE, Mellors A (1992) A neutral glycoprotease of Pasteurella haemolytica A1 specifically cleaves O-sialoglycoproteins. Infect Immun 60(1):56–62.
8. Yu AC, Worrall LJ, Strynadka NC (2012) Structural insight into the bacterial mucinase StcE essential to adhesion and immune evasion during enterohemorrhagic E. coli infection. Structure 20(4):707–717.
9. Wang P, Granados RR (1997) An intestinal mucin is the target substrate for a baculovirus enhancin. Proc Natl Acad Sci USA 94(13):6977–6982.
10. Ayala-Lujan JL, et al. (2014) Broad spectrum activity of a -like bacterial serine protease family on human leukocytes. PLoS One 9(9):e107920.
11. Nakjang S, Ndeh DA, Wipat A, Bolam DN, Hirt RP (2012) A novel extracellular metallopeptidase domain shared by animal host-associated mutualistic and pathogenic microbes. PLoS One 7(1):e30287.
12. Bardoel BW, et al. (2012) Identification of an immunomodulating metalloprotease of Pseudomonas aeruginosa (IMPa). Cell Microbiol 14(6):902–913.
13. Nesta B, et al. (2014) SslE elicits functional that impair in vitro mucinase activity and in vivo colonization by both intestinal and extraintestinal Escherichia coli strains. PLoS Pathog 10(5):e1004124.
14. Hooper NM (1994) Families of zinc metalloproteases. FEBS Lett 354(1):1–6.
15. Szabady RL, Yanta JH, Halladin DK, Schofield MJ, Welch RA (2011) TagA is a secreted protease of Vibrio cholerae that specifically cleaves mucin glycoproteins. Microbiology 157(Pt 2):516–525.
16. Bhullar K, et al. (2015) The serine protease autotransporter Pic modulates Citrobacter rodentium pathogenesis and its innate recognition by the host. Infect Immun 83(7):2636–2650.
17. Pillai L, et al. (2006) Molecular and functional characterization of a ToxR-regulated lipoprotein from a clinical isolate of Aeromonas hydrophila. Infect Immun 74(7):3742–3755.
18. Popham HJR, Bischoff DS, Slavicek JM (2001) Both Lymantria dispar nucleopolyhedrovirus enhancin genes contribute to viral potency. J Virol 75(18):8639–8648.
19. Windwarder M, Altmann F (2014) Site-specific analysis of the O-glycosylation of bovine fetuin by electron-transfer dissociation mass spectrometry. J Proteomics 108:258–268.
20. Tsuji T, Osawa T (1986) Carbohydrate structures of bovine submaxillary mucin. Carbohydr Res 151:391–402.
21. Schechter I, Berger A (1967) On the size of the active site in proteases. I. Papain. Biochem Biophys Res Commun 27(2):157–162.
22. Halim A, Rüetschi U, Larson G, Nilsson J (2013) LC-MS/MS characterization of O-glycosylation sites and glycan structures of human cerebrospinal fluid glycoproteins. J Proteome Res 12(2):573–584.
23. Mitchell A, et al. (2015) The InterPro protein families database: The classification resource after 15 years. Nucleic Acids Res 43(Database issue):D213–D221.
24. Doores KJ, Davis BG (2005) “Polar patch” proteases as glycopeptiligases. Chem Commun (Camb) (2):168–170.
25. Matsumoto K, Davis BG, Jones JB (2002) Chemically modified “polar patch” mutants of subtilisin in peptide synthesis with remarkably broad substrate acceptance: Designing combinatorial biocatalysts. Chemistry 8(18):4129–4137.
26. Määttänen P, Gehring K, Bergeron JJ, Thomas DY (2010) Protein quality control in the ER: The recognition of misfolded proteins. Semin Cell Dev Biol 21(5):500–511.
27. Olivari S, Molinari M (2007) Glycoprotein folding and the role of EDEM1, EDEM2 and EDEM3 in degradation of folding-defective glycoproteins. FEBS Lett 581(19):3658–3664.
28. Noach I, et al. (2016) The details of glycan hydrolysis by the structural analysis of a family 123 from Clostridium perfringens. J Mol Biol 428(16):3253–3265.
29. Gasteiger E, et al. (2003) ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31(13):3784–3788.
30. Doerner KC, White BA (1990) Detection of glycoproteins separated by nondenaturing polyacrylamide using the periodic acid-Schiff stain. Anal Biochem 187(1):147–150.
31. Leslie AGW (2006) The integration of macromolecular diffraction data. Acta Crystallogr D Biol Crystallogr 62(Pt 1):48–57.
32. Kabsch W (2010) XDS. Acta Crystallogr D Biol Crystallogr 66(Pt 2):125–132.
33. Evans P (2006) Scaling and assessment of data quality. Acta Crystallogr D Biol Crystallogr 62(Pt 1):72–82.
34. Winn MD, et al. (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr 67(Pt 4):235–242.
35. Hendrickson WA (1991) Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254(5028):51–58.
36. Vonrhein C, Blanc E, Roversi P, Bricogne G (2007) Automated structure solution with autoSHARP. Methods Mol Biol 364:215–230.
37. Sheldrick GM (2010) Experimental phasing with SHELXC/D/E: Combining chain tracing with density modification. Acta Crystallogr D Biol Crystallogr 66(Pt 4):479–485.
38. Cowtan K (2006) The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr D Biol Crystallogr 62(Pt 9):1002–1011.
39. McCoy AJ, et al. (2007) Phaser crystallographic software. J Appl Cryst 40(Pt 4):658–674.
40. Langer G, Cohen SX, Lamzin VS, Perrakis A (2008) Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat Protoc 3(7):1171–1179.
41. Emsley P, Cowtan K (2004) Coot: Model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60(Pt 12 Pt 1):2126–2132.
42. Murshudov GN, Vagin AA, Dodson EJ (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 53(Pt 3):240–255.
43. Vagin A, Teplyakov A (2010) Molecular replacement with MOLREP. Acta Crystallogr D Biol Crystallogr 66(Pt 1):22–25.
44. Dodson E (2003) Is it jolly SAD? Acta Crystallogr D Biol Crystallogr 59(Pt 11):1958–1965.
45. Brünger AT (1992) Free R value: A novel statistical quantity for assessing the accuracy of crystal structures. Nature 355(6359):472–475.
46. Chen VB, et al. (2010) MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66(Pt 1):12–21.
47. Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA (2001) Electrostatics of nanosystems: Application to microtubules and the ribosome. Proc Natl Acad Sci USA 98(18):10037–10041.
48. Kirschner KN, et al. (2008) GLYCAM06: A generalizable biomolecular force field. J Comput Chem 29(4):622–655.
