bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Trefoil factors share a lectin activity that remove terminating α-GlcNAc (pMucinred+GH89). ELISA per- formed using these reagents (Fig. 1c) detected robust binding of defines their role in mucus mTFF1bio and mTFF3bio to both pMucin and pMucinred, while no Michael Järvå1,2, James P. Lingford1,2, Alan John1,2, binding was observed for pMucinred+GH89. This established that Nichollas E. Scott3, and Ethan D. Goddard-Borger1,2* all TFF– interactions are independent of mucin structure but do require α-GlcNAc. 1 The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia. We next sought to determine the minimal carbohydrate structure 2 Department of Medical Biology, University of Melbourne, Parkville, required for TFF binding and the affinity of this interaction for Victoria 3010, Australia. each TFF. Isothermal calorimetry (ITC) demonstrated that the simple monosaccharides D-GlcNAc, benzyl α-D-GlcNAc and 3 Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, 4-nitrophenyl α-D-GlcNAc had no detectable affinity for TFF1- Parkville, Victoria 3010, Australia. 3. The larger GlcNAc-α-1,4-Gal disaccharide, which is the term- inating structure unique to type-III , bound to mTFF1, * Correspondence and requests for materials should be addressed to TFF2 and mTFF3 with a Kd of 49 ± 4 µM, 44 ± 8 µM and E.D.G.-B. ([email protected]). 65 ± 5 µM, respectively (Fig. 1d, Supplementary Table 2 and The trefoil factors (TFFs) are disulfide-rich mucosal Supplementary Fig. 1). Other α-1,4-linked disaccharides, such peptides that protect the epithelium by promoting cell as maltose (Glc-α-1,4-Glc) and Gal-α-1,4-Gal, failed to show migration and increasing the viscoelasticity of the mucosa. any affinity for the TFFs by ITC. These results were corr- Here we show that all TFFs are divalent lectins that oborated by a tryptophan fluorescence quenching assay, which recognise the GlcNAc-α-1,4-Gal disaccharide, which term- gave Kd values for mTFF1 and mTFF3 of 37 ± 2 µM, and inates type-III mucin-like O-glycans. Structural, mutagenic 58 ± 1 µM, respectively (Fig. 1e). and biophysical data support a model of mucus viscoelasticity that features non-covalent cross-linking of Initial attempts to co-crystallise TFFs and their shared ligand glycoproteins by TFFs. (GlcNAc-α-1,4-Gal) were confounded by the exceptional solubility of these . Eventually we obtained mTFF1 The three human TFFs (Fig. 1a) (TFF1, TFF2 and TFF3)1 are crystals that only yielded an apo structure (Supplementary ubiquitous in mucosal environments. They share substantial Table 3 and Supplementary Fig. 2). Following mTFF3 surface sequence similarity, suggestive of a conserved function, yet their lysine methylation, a crystal of the mTFF3–GlcNAc-α-1,4-Gal biological roles are not redundant2. They protect the mucosal complex was obtained and the structure determined to a reso- epithelium by increasing mucus viscoelasticity3,4 and enhancing lution of 1.55Å (Fig. 1f, Supplementary Table 3). The di- epithelial restitution5–9. TFF overexpression is a hallmark of saccharide is accommodated within a hydrophobic cleft of TFF3 chronic inflammatory diseases of the respiratory tract10–12. The with ligand binding driven by sterical fitting and hydrogen contribution of TFFs to the progression of these diseases is bonds to the peptide backbone (Fig. 1g,h). Only two side chains, unclear, though any increase in mucus viscoelasticity might be Asp20 and Trp47, make direct contacts with the disaccharide: expected to facilitate the formation of mucus plug obstructions Asp20 makes a bidentate hydrogen-bonding interaction between in the airways. Exploring the role of TFFs in these and other O-4 and O-6 of the non-reducing α-GlcNAc, while C-H–π inter- biological contexts is confounded by an incomplete under- actions are made between the indole ring of Trp47 and the standing of how they modulate the physical properties of mucus reducing Gal. Trp47 undergoes considerable motion between and a dearth of tools for inhibiting TFF activity2. liganded and unliganded forms (Fig. 1i). These two residues are conserved in TFF1-3: though TFF1 and TFF2 feature an Asn in An extensive list of TFF binding partners has accumulated place of Asp20 (Fig. 2k). This binding mode is remarkably within the literature and includes many cell surface receptors similar to that of the bacterial carbohydrate-binding module and mucosal glycoproteins2,13. In the context of the mucus- (CBM) family 32 protein that binds the GlcNAc-α-1,4-Gal thickening properties of the TFFs, the soluble mucins are the glycan (Kd of 72 µM) in the same conformational pose, despite most relevant binding partners. This is complemented by the sharing no sequence, structural or ancestral commonalities with observation that TFF2 interacts with the α-D-GlcNAc- the TFFs (Fig. 2j)15. terminated O-glycans of mucins (Fig. 1b)14. It remains unknown what the simplest glycan structure required for TFF2 binding is, Mutagenesis of the ligand-binding Asn/Asp and Trp in dimeric how tightly it binds, if TFF1 and TFF3 share this property and, TFF1 and TFF3 constructs (dTFF1bio and dTFF3bio) enabled an if so, how the different biological activities of these closely- examination of their role in mucin binding by ELISA. The related peptides arise. dTFF1bio-N14A and dTFF3bio-D20A mutants had dramatically reduced affinity for pMucinred, while dTFF1bio-W41A and To establish that TFF1 and TFF3 possessed the same α-GlcNAc- dTFF3bio-W47A had no detectable affinity for pMucinred dependent binding activity as TFF2, we started by preparing (Fig. 2a). ITC was unable to detect any binding of GlcNAc-α- site-selectively biotinylated monomeric TFF1 and TFF3 1,4-Gal to the TFF mutants (Supplementary Fig. 1), suggesting (mTFF1bio and mTFF3bio) (Supplementary Table 1) and three that the weak residual binding observed for dTFF1bio-N14A and mucin preparations: commercially-available type-III porcine dTFF3bio-D20A arises from avidity effects. Curiously, all gastric mucin (pMucin), reduced and alkylated pMucin known mammalian TFF1s utilise a disaccharide-binding Asn, (pMucinred), and pMucinred digested with an α-D-N-acetyl- while all TFF3s retain an Asp (Supplementary Fig. 3). TFF1 glucosaminidase from Clostridum perfringens (CpGH89) to operates in the low pH gastric mucosa, while TFF3 does not, and

1

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

so the Asn/Asp difference may relate to pH tolerances in glycan Acknowledgements binding. To supplement binding data collected at pH 7.4 We thank A/Prof. Alisdair Boraston for providing the plasmid 19 (Fig. 1e), the Kd for the disaccharide–TFF complexes were for CpGH89 production , Dr. Richard Birkinshaw for determined at pH 5.0 and pH 2.6 using the same tryptophan suggesting surface lysine methylation for protein crystallisation, fluorescence quenching assay (Supplementary Fig. 4). At the MX2 beamline staff at the Australian Synchrotron for help pH 2.6, the Kd for mTFF1 increased only slightly to 67 ± 1 µM, with X-ray data collection, as well as Dr. Janet Newman and Dr. while for mTFF3 the Kd was 350 ± 60 µM; six-fold higher than Bevan Marshall at the Commonwealth Scientific and Industrial at pH 7.4. Like TFF1, TFF2 is abundant in gastric mucus, has a Research Organisation (CSIRO) Collaborative Crystallisation conserved Asn (Supplementary Fig. 3) and also binds mucins Centre (C3) for assistance in protein crystallization. This work at low pH14. The non-ionisable disaccharide-binding Asn side was supported by a National Health and Medical Research chain is thus important for TFF activity at low pH. Council of Australia (NHMRC) project grant (GNT1139549).

To demonstrate that the TFFs are capable of cross-linking Author contributions mucosal glycoproteins into larger particles in a glycan- M.J. performed all biophysical, bioinformatic and structural dependent manner, we performed agglutination assays using studies; M.J., J.P. and A.J. produced recombinant protein; pMucin and our monomeric, dimeric and mutant TFF1/3 N.E.S. performed all mass spectrometry experiments; E.D.G.-B. constructs (Fig. 2b, Supplementary Fig. 5). Dimeric TFF1 and conceived the project; M.J. and E.D.G.-B. co-wrote the TFF3 induced a dose-dependent increase in light scattering over manuscript. time at concentrations as low as 4 µM. Monomeric TFFs had no detectable influence on light scattering at concentrations up to Competing interests 32 µM, commensurate with divalency being required for glyco- We declare no competing financial interests. protein cross-linking. The mutant dimers, which had greatly diminished or no detectable disaccharide-binding activity, References mirrored the results of the monomers in that they failed to 1. Thim, L. A new family of growth factor-like peptides ‘Trefoil’ disulphide agglutinate pMucins. We also demonstrated using ELISA that loop structures as a common feature in breast cancer associated peptide (pS2), pancreatic spasmolytic polypeptide (PSP), and frog skin peptides mucin-bound dTFF1bio and dTFF3bio could be displaced by the (spasmolysins). FEBS Lett. 250, 85–90 (1989). CpGH89 enzyme-catalysed removal of α-GlcNAc (Fig. 2c). The 2. Braga Emidio, N., Hoffmann, W., Brierley, S. M. & Muttenthaler, M. efficacy of CpGH89 under these conditions differed between the Trefoil factor family: unresolved questions and clinical perspectives. Trends Biochem. Sci. 44, 387–390 (2019). two TFFs, with EC50 values of 0.8 ± 0.1 nM for TFF1 and 3. Thim, L., Madsen, F. & Poulsen, S. S. Effect of trefoil factors on the 0.4 ± 0.1 nM for TFF3: the slightly greater resilience of TFF1 viscoelastic properties of mucus gels. Eur. J. Clin. Invest. 32, 519–527 may arise from its higher affinity for the disaccharide. This (2002). enzyme may be a useful tool for perturbing TFF function in 4. Bastholm, S. K. et al. Trefoil factor peptide 3 is positively correlated with other systems. the viscoelastic properties of the cervical mucus plug. Acta Obstet. Gynecol. Scand. 96, 47–52 (2017). 5. Mashimo, H., Wu, D.-C., Podolsky, D. K. & Fishman, M. C. Impaired Finally, we re-evaluated structures of other human proteins with defense of intestinal mucosa in mice lacking intestinal trefoil factor. trefoil domains in the PDB to uncover further evidence that Science (80-. ). 274, 262–265 (1996). these domains evolved as lectins in other contexts: the trefoil 6. Oertel, M. et al. Trefoil factor family–peptides promote migration of human bronchial epithelial cells. Am. J. Respir. Cell Mol. Biol. 25, 418– domain of human lysosomal a-glucosidase is bound to 424 (2001). isomaltose (Glc-α-1,6-Glc) (PDB ID: 5KZW) (Supplementary 7. Rinnert, M. et al. Synthesis and localization of trefoil factor family (TFF) Fig. 6). Phylogenetic analysis of all mammalian proteins with a peptides in the human urinary tract and TFF2 excretion into the urine. Cell Tissue Res. 339, 639–647 (2010). trefoil domain revealed three clades of putative lectins 8. Hung, L. Y. et al. Macrophages promote epithelial proliferation following associated with either amylose-processing enzymes, the infectious and non-infectious lung injury through a - GlcNAc-α-1,4-Gal-binding mucosal TFFs, or the glycoprotein- dependent mechanism. Mucosal Immunol. 12, 64–76 (2019). binding zona pellucida proteins (Fig. 3). Our work provides a 9. Hoffmann, W. TFF (Trefoil Factor Family) peptides and their potential roles for differentiation processes during airway remodeling. Curr. Med. basis for probing the impact of these domains on the function of Chem. 14, 2716–2719 (2007). their respective proteins. 10. Curran, D. R. & Cohn, L. Advances in mucous cell metaplasia. Am. J. Respir. Cell Mol. Biol. 42, 268–275 (2010). The data presented here reveals that the mucin-binding activity 11. Viby, N.-E. et al. Trefoil factors (TFFs) are increased in bronchioalveolar lavage fluid from patients with chronic obstructive lung disease (COPD). of all TFFs is contingent on their glycoprotein binding-partners Peptides 63, 90–95 (2015). possessing α-GlcNAc-terminated O-glycans. These glycan 12. Doubková, M., Karpíšek, M., Mazoch, J., Skřičková, J. & Doubek, M. structures are assembled in the Golgi by α-1,4-N-acetylglucos- Prognostic significance of surfactant protein A, surfactant protein D, Clara aminyltransferase (α4GnT)16,17, which is constitutively cell protein 16, S100 protein, , and prostatic secretory 18 protein 94 in idiopathic pulmonary fibrosis, sarcoidosis, and chronic expressed in gastric mucous and Brunner’s gland cells but not pulmonary obstructive disease. Sarcoidosis, Vasc. Diffus. lung Dis. Off. J. in lung tissues. As such, future studies that focus on the role of WASOG 33, 224–234 (2016). the TFF lectins in health and disease must consider quantitation 13. Belle, N. M. et al. TFF3 interacts with LINGO2 to regulate EGFR of α4GnT and α-GlcNAc-terminated O-glycans in cells and activation for protection against colitis and gastrointestinal helminths. Nat. Commun. 10, 4408 (2019). mucus, respectively, in addition to the abundance of the TFFs 14. Hanisch, F.-G., Bonar, D., Schloerer, N. & Schroten, H. Human trefoil themselves. factor 2 is a lectin that binds α-GlcNAc-capped mucin glycans with antibiotic activity against Helicobacter pylori. J. Biol. Chem. 289, 27363– 27375 (2014).

2

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

15. Ficko-Blean, E. et al. Carbohydrate recognition by an architecturally (2001). complex α-N-acetylglucosaminidase from Clostridium perfringens. PLoS 18. Nakamura, N. et al. Histochemical reactivity of normal, metaplastic, and One 7, (2012). neoplastic tissues to α-linked N-acetylglucosamine residue-specific 16. Nakayama, J. Alpha-1,4-N-Acetylglucosaminyltransferase (A4GNT). in monoclonal antibody HIK1083. J. Histochem. Cytochem. 46, 793–801 Handbook of Glycosyltransferases and Related 1–2, 379–391 (1998). (Springer Japan, 2014). 19. Ficko-Blean, E., Stubbs, K. A., Nemirovsky, O., Vocadlo, D. J. & 17. Zhang, M. X. et al. Immunohistochemical demonstration of α1,4-N - Boraston, A. B. Structural and mechanistic insight into the basis of acetylglucosaminyltransferase that forms GlcNAcα1,4Galβ Residues in mucopolysaccharidosis IIIB. Proc. Natl. Acad. Sci. U. S. A. 105, 6560– Human gastrointestinal mucosa. J. Histochem. Cytochem. 49, 587–596 6565 (2008).

Figure 1. (a) Schematics and protein topology of human TFF1-3. (b) The a-GlcNAc-capped Core 2 O-glycan identified as a ligand for TFF214. (c) ELISA data demonstrating the ability of mTFF1bio and mTFF3bio to bind to pMucin and pMucinred but not pMucinred+GH89. Error bars indicate standard deviation of three replicates. ns = p>0.05, **** = p<0.0001 (d) Representative ITC isotherms of mTFF1 (blue), TFF2 (orange) and mTFF3 (green) titrated with GlcNAc-a-1,4-Gal. (e) mTFF1 and mTFF3 binding to GlcNAc-a-1,4-Gal as determined by a tryptophan fluorescence quenching assays. Error bars indicate standard deviation of three replicates. (f) Structure of TFF3–GlcNAc-a-1,4-Gal. (g) Interactions between GlcNAc-a-1,4-Gal and TFF3 (distances in Å) and Fo-Fc omit map contoured at 1.5σ around the ligand. (h) Schematic representation of the interactions between TFF3 and GlcNAc-a-1,4-Gal. (i) Superposition of TFF3–GlcNAc-a-1,4-Gal (green), TFF1 (blue) and the trefoil domains from porcine TFF2 (orange and yellow) (PDBID: 2PSP). The glycan binding side chains are shown as sticks. (j) A CBM32 domain (PDBID: 4A6O) in complex with GlcNAc-a-1,4-Gal and their interactions (distances in Å). (k) Sequence alignment of the trefoil domains in human TFFs. Numbers indicate disulfide connectivity, and vertical arrows indicate conserved ligand-binding residues.

3

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2 (a) The ability of dimeric WT or mutant dTFF1bio (blue), or dTFF3bio (green) to bind to immobilized pMucinred assessed by ELISA. Error bars indicate standard deviations of three replicates. ns = p>0.05, **** = p<0.0001. (b) pMucin agglutination assays using monomeric wild-type, dimeric wild-type or mutant TFF1 (top) and TFF3 (bottom) with optical density at 405 nm monitored with respect to time. (c) CpGH89-mediated displacement of dimeric TFF1 (blue circles) and TFF3 (green squares) from immobilized pMucin, as determined by ELISA. Error bars represent standard deviations from three replicates.

4

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 3. Phylogenetic tree of the mammalian trefoil domains color coded based on their - and enzyme-association, including structural representations where available (PDB Accession IDs listed under protein names). The large clade of trefoil domains found in amylose-processing enzymes likely bind maltose or isomaltose, as observed for lysosomal α-glucosidase. The clade of TFF trefoil domains bind the GlcNAc-α-1,4- Gal disaccharide that is unique to type-III mucin-like O-glycans. It is unclear what ligand the third clade, found in zona pellucida protein 1 and 4, might have affinity for.

5

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Methods media + 0.2% glucose + 50 µg.ml-1 Kan. The culture was Cloning, expression and purification of untagged monomeric incubated at 30°C and 220 rpm until it reached an OD600 of 1.0. TFF1/3 IPTG was added to a final concentration of 0.4 mM and the A series of dsDNA oligonucleotides encoding human TFF1 culture was incubated for 4 h at 30°C and 220 rpm. The cells (UniProt ID: P04155) and TFF3 (UniProt ID: Q07654) with an were harvested by centrifugation (8,000×g, 25 min, 4 °C) and N-terminal PelB signal peptide, no interchain Cys, and a C- frozen (-80°C) until further use. terminal His6-tag codon-harmonised for E. coli (Supple- mentary Table 4) were synthesised (IDT) and cloned into the The cell pellet from 1.0 l of culture was resuspended in 20 ml pET29b(+) vector (Novagen) using the NdeI and XhoI ice-cold PBS (50 mM NaPi, 150 mM NaCl, pH 7.5) + 40 mM restriction sites. The resulting plasmids were verified using ImH + 0.5 mM EDTA + 2 µL Benzonaze (Millipore) + 1 tablet Sanger sequencing. Each plasmid was transformed into T7 of cOmplete EDTA-free protease inhibitor cocktail (Roche). Express cells (NEB) and plated onto LB-agar + 2% glucose + The suspension was sonicated in an ice bath in bursts of 5 50 µg.ml-1 Kan and incubated at 37 °C for 16 h. Single colonies seconds, followed by 10 seconds pause, at 50% amplitude until were picked to generate overnight cultures, which were used to total energy spent reached 10 kW. The lysate was centrifuged inoculate SB media + 0.2% glucose + 50 µg.ml-1 Kan. The (18,000×g, 30 min, 4°C) and the supernatant filtered (0.45 µm). culture was incubated at 37°C and 220 rpm until it reached an The supernatant was applied to a Ni-affinity column (HisTrap OD600 of 1.0. IPTG was added to a final concentration of 0.1 Excel, 1 ml), the column washed with 10 CV of 50 mM Tris, mM for mTFF1 or 0.4 mM for mTFF3 and the culture was 300 mM NaCl, 40 mM ImH, pH 7.5, and the protein eluted with incubated for 4 h at 30°C and 220 rpm. The cells were harvested 50 mM Tris, 300 mM NaCl, 400 mM ImH, pH 7.5. Fractions by centrifugation (8,000×g, 25 min, 4 °C) and frozen (-80°C) containing product, as judged by SDS-PAGE, were pooled and until further use. further purified by size exclusion chromatography (Superdex® 75 10/300, GE Healthcare) using 50 mM Tris-HCl, 150 mM Thawed cells were resuspended in ice-cold high-osmolyte NaCl, pH 7.5 as buffer. Non-reducing SDS-PAGE and intact buffer (0.5 M sucrose, 0.2 M Tris, pH 8.0) at 4°C for 15 min. ESI-MS data suggested that these proteins had one free cysteine Three volumes of ice-cold milliQ H2O was added and the each: the interchain disulfide had not formed within the cell. mixture nutated at 4°C for 30 min. The suspension was adjusted These were biotinylated (see below) to generate mTFF1bio and to 150 mM NaCl, 2 mM MgCl2, 20 mM ImH, pH 8.0, mTFF3bio. centrifuged (18,000×g, 30 min, 4°C) and the supernatant filtered (0.45 µm). The supernatant was applied to a Ni-affinity column To make the native dimers with an interchain disulfide bond, (HisTrap Excel, 1 ml), the column washed with 10 CV of 50 protein samples at a concentration of 5 mg.ml-1 in 100 mM Tris- mM Tris, 300 mM NaCl, 40 mM ImH, pH 7.5, and the protein HCl, pH 8.3 were treated with 20 mM K3Fe(CN6) for 16 h at 25 eluted with 50 mM Tris, 300 mM NaCl, 400 mM ImH, pH 7.5. ºC. The reaction mixture was filtered (0.22 µm) and purified by size exclusion chromatography (Superdex® 75 10/300, GE Fractions containing product, as judged by SDS-PAGE, were Healthcare) using 50 mM Tris-HCl, pH 7.5 as buffer, and anion pooled and further purified by size exclusion chromatography exchange chromatography (MonoQ 5/50 GL, GE Healthcare) (Superdex® 75 10/300, GE Healthcare) using 50 mM Tris-HCl, using a gradient over 30 CV from 100% binding buffer (50 mM 150 mM NaCl, pH 7.5 as buffer. The C-terminal His6-tag was Tris-HCl, pH 7.5) to 60% elution buffer (binding buffer + 1 M removed by the addition of 1:100 molar ratio of NaCl). SDS-PAGE and intact ESI-MS data confirmed the carboxypeptidase A (Sigma) with incubation at 25 ºC for 24 h. homogeneity and number of disulfide bonds in each sample The solution was filtered (0.22 µM) and passed through a Ni- (Supplementary Fig. 7 and 8). affinity column (HisTrap Excel, 1 ml) with the flow-through being further purified by size exclusion chromatography Cloning, expression and purification of TFF2 (Superdex® 75 10/300, GE Healthcare) using 50 mM HEPES, A dsDNA oligonucleotide encoding human TFF2 (UniProt ID: 50 mM NaCl, pH 7.4 (ITC-Buffer). SDS-PAGE and intact ESI- Q03403) with an N-terminal gp67 signal peptide and a C- MS data confirmed the homogeneity of mTFF1 and mTFF3, the terminal His10 sequence codon-harmonised for Spodoptera removal of the His6-tags and the presence of three disulfide frugiperda (Supplementary Table 4) was synthesised (IDT) bonds (Supplementary Fig. 7 and 8). and cloned into the pFastBac vector (ThermoFisher) using the SpeI and XhoI restriction sites. The resulting plasmid was Cloning, expression and purification of avi-tagged-TFF1/3 verified using Sanger sequencing. Expression of TFF2 was and their mutants achieved in Sf21 cells using the ‘Bac-to-Bac Baculovirus A series of dsDNA oligonucleotides encoding human TFF1 Expression System’ (ThermoFisher) in accordance with the (UniProt ID: P04155), TFF3 (UniProt ID: Q07654) and mutants manufacturer’s instructions. One litre of cell culture at a density 6 -1 thereof with an N-terminal His6-Avi-tag and Factor Xa cleavage of 1×10 cells.ml was infected with 30 ml of P3 baculovirus site codon-harmonised for E. coli (Supplementary Table 4) and cultured at 27 °C for 72 h. The culture was centrifuged were synthesised (IDT) and cloned into the pET28b(+) vector (8,000×g, 20 min, 4 °C) and the supernatant collected, adjusted (Novagen) using the NcoI and XhoI restriction sites. The to pH 7.5 and filtered (0.45 µm). The supernatant was applied to resulting plasmids were verified using Sanger sequencing. Each a Ni-affinity column (HisTrap Excel, 5 ml), the column washed plasmid was transformed into SHuffle T7 cells (NEB) and with 10 CV of 50 mM Tris, 300 mM NaCl, 40 mM ImH, pH 7.5, plated onto LB-agar + 2% glucose + 50 µg.ml-1 Kan and and the protein eluted with 50 mM Tris, 300 mM NaCl, 400 mM incubated at 37 °C for 16 h. Single colonies were picked to Imidazole, pH 7.5. Fractions containing product, as judged by generate overnight cultures, which were used to inoculate SB SDS-PAGE, were pooled and further purified by size exclusion

6

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

chromatography (Superdex® 75 10/300, GE Healthcare) using GlcNAc-α-1,4-Gal, 25×1.58 µL injections were used. All runs 50 mM Tris-HCl, 150 mM NaCl, pH 7.5 as buffer. SDS-PAGE were performed at 25 °C in a MicroCal iTC200 (Malvern). and intact ESI-MS data confirmed that TFF2 was homogenous and had seven disulfide bonds (Supplementary Fig. 7 and 8). Tryptophan fluorescence quenching assay Tryptophan fluorescence quenching assays were performed Biotinylation of Avi-tagged TFFs at 25 °C on an EnVision 2105 Multimode Plate Reader Biotinylation of Avi-tagged TFFs was accomplished by (PerkinElmer) with excitation at 280 nm and emission measured combining 952 μl of 100 µM TFF in PBS, 5 μl of 1 M MgCl2, at 340 nm. The assay was performed in a 384-well plate 20 μl of 100 mM ATP, 20 μl of 40 μM BirA (Sigma) and 3 μl (Corning, 384-well low volume black round bottom polystyrene of 50 mM D-biotin. The solution was incubated for 1 h at 30°C non-binding surface) using 15 µl per well. Each sample cont- with gentle agitation. An additional 20 μl of 100 mM ATP, 20 μl ained mTFF1 or mTFF3 (5 µM) and GlcNAc-α-1,4-Gal (1 µM– of 40 μM BirA and 3 μl of 50 mM D-biotin was added and the 4.1 mM) in 50 mM BTP/citric acid, 25 mM NaCl, at either pH solution incubated for an additional hour. The sample was 2.6, 5.0, or 7.4. For each dilution series, Kd was calculated by filtered (0.22 µm) and purified by size exclusion chroma- fitting the data to a one-site binding curve in Prism 8 (GraphPad) tography (Superdex® 75 10/300, GE Healthcare) using 50 mM and the mean calculated from three independent experiments. Tris-HCl, 150 mM NaCl, pH 7.5 as buffer. The degree of biotinylation was assessed by intact ESI-MS and was found to Crystallisation and structure determination of apo-mTFF1 be near 100% in all cases (Supplementary Fig. 8). Sitting drops comprised of 1 µl well solution (1 M ammonium sulfate, 0.1 M Tris-HCl, pH 8.5) and 1 µL mTFF1 solution (10 Preparation of mucin samples mg.ml-1) supplemented with GlcNAc-α-1,4-Gal (5 mM) A 50 mg.ml-1 stock of pMucin was prepared by resuspending afforded small clusters of rod-like crystals after one month at porcine gastric mucin type III (Sigma) in PBS + 50 mM EDTA 20°C. The crystals were cryo-protected by supplementing the + 0.02% NaN3. This pMucin (1 ml) was dialyzed (100 kDa mother liquor with 3 M ammonium sulfate before being NMWL) against 4 M guanidium hydrochloride (2× 2.0 l) over collected on a crystal loop and cryogenically stored in liquid 48 h at 4 ºC. Precipitate was removed by centrifugation nitrogen. Data was collected at the Australian Synchrotron (8,000×g, 20 min, 4 °C) and the soluble supernatant dialyzed (MX2 beamline)20 and processed using XDS21. The structure (100 kDa NMWL) against PBS (2× 2.0 l) over 48 h at 4 ºC. Half was solved by molecular replacement using PHASER22 and a of this sample was treated with 50 µg/mL CpGH89 at 30°C for truncated version of the mTFF3 crystal structure described 16 h: the CpGH89 was produced as previously described19. Both below as a search model. No density commensurate with a mucin samples were reduced with 10 mM DTT (30 min, 50°C) GlcNAc-α-1,4-Gal ligand was observed. The final model of then alkylated by the addition of 30 mM iodoacetamide (30 min, TFF1 in its apo-form was built in Coot23 and refined with 50°C). These samples were dialyzed (100 kDa NMWL) against Phenix24 to a resolution of 2.4 Å. Data collection and refinement PBS (2× 2.0 l) over 48 h at 4 ºC. The concentration of the final statistics are summarized in Supplementary Table 3. The Rwork pMucinred and pMucinred+GH89 stock solutions were standardised and Rfree after the final refinement was 0.1939 and 0.2259, to A280=1.5 using PBS. These samples were aliquoted, flash respectively. The coordinates have been deposited in the Protein frozen in liquid N2 and stored at -80 °C. Data Bank (accession code: 6V1D). Figures were prepared using PyMOL. TFF-mucin ELISA -1 100 µl of either pMucin (10 µg.ml ), 1:1000 pMucinred, or Crystallisation and structure determination of mTFF3 in 1:1000 pMucinred+GH89 in 100 mM carbonate-bicarbonate buffer, complex with GlcNAc-α-1,4-Gal pH 9.4, was added to each well of a 96-well plate (flat-bottom Surface lysine methylation of mTFF3 was conducted using the Nunc MaxiSorp, Thermo Scientific) and the plate gently nutated protocol described by Kim et al25. Sitting drops comprised of for 16 h at 4°C. The wells were rinsed four times with 200 µl 0.15 µL well solution (2 M ammonium sulfate, 0.2 M potassium PBS + 0.2% Tween (PBS-T) then blocked with 200 µL PBS + sodium tartrate, and 0.1 M trisodium citrate-citric acid, pH 5.6) -1 5% (w/v) skim milk powder (blocking buffer) for 1 h at 25 ºC. and 0.15 µL of permethylated mTFF3 solution (15 mg.ml ) The wells were rinsed four times with 200 µl PBS-T then incu- supplemented with GlcNAc-α-1,4-Gal (5 mM) afforded a single bated with 50 µl blocking buffer + TFF (various concentrations) crystal after 14 days at 20°C. The crystal was cryo-protected by and 1:1000 Strep-HRP (Thermo) for 1 h at 25 ºC. The wells were supplementing the mother liquor with 3 M sodium malonate- rinsed four times with 200 µl PBS-T then developed by adding malonic acid, pH 5.6, before being collected on a crystal loop 100 µl of TMB-Turbo (ThermoFisher) and incubated for 30 min and cryogenically stored in liquid nitrogen. Data was collected at 25 ºC. The reaction was quenched by the addition of 100 µl at the Australian Synchrotron (MX2 beamline)20 and processed 21 of 2 M H2SO4. Absorbance at 450 nm was read within 20 min using XDS . The structure was solved by molecular replace- using an EnVision 2105 Multimode Plate Reader ment using PHASER22 with a truncated version of the porcine (PerkinElmer). Data was plotted using Prism 8 (GraphPad). TFF2 crystal structure as a search model (PDB ID: 2PSP)26. The final model was built in Coot23 and refined with Phenix24 to a Isothermal calorimetry resolution of 1.55 Å. Data collection and refinement statistics All TFF and ligand stock solutions were prepared in the same are summarized in Supplementary Table 3. The Rwork and Rfree buffer (50 mM HEPES, 50 mM NaCl, pH 7.4) and filtered (0.22 after the final refinement was 0.1738 and 0.1882, respect- µm) prior to titration. All control and non-binding ligand ively. The coordinates have been deposited in the Protein Data experiments were performed as 12×3.18 µl injections while for Bank (accession code: 6V1C). Figures were prepared using PyMol and LigPlot+27.

7

bioRxiv preprint doi: https://doi.org/10.1101/859785; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Intact protein MS analysis were annotated as belonging to one of ten groups (glucoamylase, Intact analysis was performed using a 6520 Accurate mass Q- isomaltase, lysosomal α-glucosidase, maltase, sucrase, TFF1, TOF mass spectrometer (Agilent). Protein samples were re- TFF2-1, TFF2-2, TFF3, or zona pellucida) based on their suspended in 2% acetonitrile, 0.1% TFA and 5 μg loaded onto a UniProt annotation. The remaining alignment was used as input C5 Jupiter column (5 µm, 300 Å, 50 mm × 2.1 mm, to calculate phylogenetic distances with phylogeny.fr33. The Phenomenex) using an Agilent 1200 series HPLC system. phylogenetic tree was visualized with iTol34 and colour-coded Samples were desalted by washing the column with buffer A according to the described metadata. Where available, a (2% acetonitrile, 0.1% formic acid) for 4 min and then separated structural representation of each clade of TFF-domains was with a 12 min linear gradient from 2 to 100% buffer B (80% created using PyMOL and included for eight of the ten groups acetonitrile, 0.1% formic acid) at a flow rate of 0.250 ml.min-1. mentioned above. MS1 mass spectra were acquired at 1 Hz between a mass range of 300–3,000 m/z. Intact mass analysis and deconvolution was Methods References performed using MassHunter B.06.00 (Agilent). 20. Aragão, D. et al. 2018. MX2: a high-flux undulator microfocus beamline serving both the chemical and Mucin agglutination assay macromolecular crystallography communities at the Mucin agglutination assays were performed at 25 °C on an Australian Synchrotron. J. Synchrotron Radiat. 25 (3), EnVision 2105 Multimode Plate Reader (PerkinElmer) to 885–891 monitor transmission at 405 nm over time. The assay was 21. Kabsch, W. 2010. XDS. Acta Crystallogr. Sect. D Biol. performed in a 96-well plate (Corning, 96-well round bottom Crystallogr. 66 (2), 125–132 polystyrene non-binding surface) with 25 µl per well. pMucin 22. Storoni, L. C. et al. 2004. Likelihood-enhanced fast concentration was kept consistent at 0.02% w/v and TFF conc- rotation functions. Acta Crystallogr. Sect. D Biol. entration varied (0, 1, 2, 4, 8, 16, or 32 µM). The resulting curves Crystallogr. 60 (3), 432–438 were normalized to percent decrease in signal, averaged over 23. Emsley, P. & Cowtan, K. 2004. Coot : model-building tools three replicates and plotted using Prism 8 (GraphPad). for molecular graphics. Acta Crystallogr. Sect. D Biol. Crystallogr. 60 (12), 2126–2132 TFF displacement assay -1 24. Adams, P. D. et al. 2010. PHENIX : a comprehensive 100 µl of pMucin (5 µg.ml ) in 100 mM carbonate-bicarbonate Python-based system for macromolecular structure buffer, pH 9.4, was added to each well of a 96-well plate (flat- solution. Acta Crystallogr. Sect. D Biol. Crystallogr. 66 (2), bottom Nunc MaxiSorp, Thermo Scientific) and the plate gently 213–221 nutated for 1 h at 30°C. The wells were rinsed four times with 25. Kim, Y. et al. 2008. Large-scale evaluation of protein 200 µl PBS + 0.2% Tween (PBS-T) then blocked with 200 µL reductive methylation for improving protein crystallization. PBS + 5% (w/v) skim milk powder (blocking buffer) for 1 h at Nat. Methods 5 (10), 853–854 30 ºC. The wells were rinsed four times with 200 µl PBS-T then 26. Petersen, T. N. et al. 1996. Structure of Porcine Pancreatic incubated with 50 µl blocking buffer + dTFF1/3bio (16 nM) for Spasmolytic Polypeptide at 1.95 Å Resolution. Acta 1 h at 25 ºC. The wells were rinsed four times with 200 µl PBS- Crystallogr. Sect. D Biol. Crystallogr. 52 (4), 730–737 T then treated with 100 µl GH89 (0-512 nM) in blocking buffer 27. Laskowski, R. A. & Swindells, M. B. 2011. LigPlot+: for 20 h at 30 ºC. The wells were rinsed six times with 200 µl Multiple ligand–protein interaction diagrams for drug PBS-T then probed with 100 µl Strep-HRP diluted 1:1000 in discovery. J. Chem. Inf. Model. 51 (10), 2778–2786 blocking buffer for 1 h at 25 ºC. The wells were rinsed six times 28. Robert, X. & Gouet, P. 2014. Deciphering key features in with 200 µl PBS-T then developed by adding 100 µl of TMB- protein structures with the new ENDscript server. Nucleic Turbo (ThermoFisher) and incubated for 30 min at 25 ºC. The Acids Res. 42 (W1), W320–W324 reaction was quenched by the addition of 100 µl of 2 M H2SO4. 29. Crooks, G. et al. 2004. WebLogo: a sequence logo Absorbance at 450 nm was read within 20 min using an generator. Genome Res 14 1188–1190 EnVision 2105 Multimode Plate Reader (PerkinElmer) and an 30. 2019. UniProt: a worldwide hub of protein knowledge. EC50 calculated using Prism 8 (GraphPad) and the mean from Nucleic Acids Res. 47 (D1), D506–D515 three independent experiments 31. Madeira, F. et al. 2019. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. Trefoil domain sequence analysis 47 (W1), W636–W641 An annotated sequence alignment of the trefoil domains of 32. El-Gebali, S. et al. 2019. The Pfam protein families human TFF1, TFF2, and TFF3, was created by ESPript 28 database in 2019. Nucleic Acids Res. 47 (D1), D427–D432 (espript.ibcp.fr) . Sequence logos for each of the four mucosal 33. Dereeper, A. et al. 2008. Phylogeny.fr: robust phylogenetic TFF domains in the three human trefoil factors were generated 29 analysis for the non-specialist. Nucleic Acids Res. 36 (Web by WebLogo (weblogo.berkeley.edu) using full length protein Server), W465–W469 sequences annotated as mammalian TFF1, TFF2, or TFF3 34. Letunic, I. & Bork, P. 2019. Interactive tree of life (iTOL) retrieved from the UniProt database30 and aligned with Clustal 31 v4: recent updates and new developments. Nucleic Acids Omega . The phylogenetic tree of all mammalian trefoil Res. 47 (W1), W256–W259 domains was created as follows. All trefoil domain sequences corresponding to PF00088 were retrieved aligned according to their hidden Markov model logo generated by the Pfam database32. Based on UniProt30 metadata entries marked as obsolete or truncated were discarded. The remaining sequences

8