On the relationship of viral adhesins to mammalian bridging factors

Michael L. Klein (a), Antonio Romero (b), Herbert Kaltner (c), Virgil Percec (d), and Hans-Joachim Gabius (c)

(a) Institute of Computational Molecular Science, Temple University, Philadelphia, PA 19122; (b) Structural and Chemical Biology, Centro Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas, Ramiro de Maeztu 9, 28040 Madrid, Spain; (c) Institute of Physiological Chemistry, Faculty of Veterinary Medicine, Ludwig-Maximilians-University Munich, 80539 Munich, Germany; (d) Roy & Diana Vagelos Laboratories, Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323

Significance The sugary (carbohydrate) coating of biological surfaces is important for binding and recognition. Lectins, a ubiquitous superfamily of proteins defined by the presence of a carbohydrate recognition domain, enable bridging between biological surfaces. Evolution has led to distinct receptor classes. Galectins, β-galactoside-binding proteins, form a family present in all animals. The discovery of structural similarity between viral attachment proteins, named adhesins, and mammalian galectins, most recently for the coronavirus spike protein, crucial for COVID-19 infection, indicates that virus adhesins may have originated from galectins captured from the host genome. This hypothesis suggests that, to counter viral threats, one could develop therapeutics that inhibit contact between the viral (galectin-like) adhesion proteins and their cellular counterreceptors, inspired by experience gleaned from studying galectins.

Abstract Glycan-lectin recognition is vital to processes that impact human health. It plays a fundamental role in diverse pathological and physiological functions as well as in infectious diseases. Carbohydrate recognition domains of endogenous lectins are versatile platforms for creating receptors with wide-ranging specificity that includes peptide motifs. In the quest to find ways to ward off viral threats, unveiling similarities and differences between adeno-, corona-, and rotaviral spike proteins and mammalian galectins, known to be potent bridging factors, has rekindled interest in lectins and their counterreceptors. Herein, we discuss the known structural versatility of galectins to compare and contrast it with that of the related viral adhesins. Our findings and observations are offered to inspire lectin-based approaches to deny viral entry, and thereby thwart infection by present and future viral pathogens.

coronavirus | glycan | lectins | infection | pathogen

Footnotes Author contributions: M.L.K. and H.-J.G. designed research; A.R., H.K., M.L.K., and H.-J.G. performed research; A.R. and H.K. contributed analytic tools; A.R., H.K., V.P., M.L.K., and H.-J.G. analyzed data; M.L.K. and H.-J.G. wrote the paper.

To whom correspondence may be addressed. Email: [email protected] or [email protected]

Biological surfaces contain a “sugary coating” (1) of complex carbohydrates or heterosaccharides (2-4). The structural diversity of these glycan chains, which are attached to proteins and/or lipid anchors, makes “glycoconjugates much more complex, variegated, and difficult to study than proteins and nucleic acids” (5). Carbohydrates, the letters of the third alphabet of life, form the messages, and ‘readers’ and ‘translators’ of glycan-encoded information have developed that make functional coupling of surfaces possible (6). Indeed, their ability to ‘select’ (Latin, legere) is the origin of the term ‘lectin’ (7). Communication via lectins has emerged as a theme that underlies many processes in pathology and physiology, with relevance for pandemic threats. The latter insight, which is the focus of this article, may even point to new ways to fight infections. Lectins have enormous structural versatility, with large families of more or less related proteins. This toolbox serves Nature’s demand for multi-purpose effectors. Strategically, the tips of bacterial pili and viral spike proteins provide bacteria and viruses with the needed “molecular tentacles” for adhesive bridging or docking. The detection of shared sequence and/or folding characteristics offers telltale evidence that certain viral adhesins originated from host lectins. Supporting this hypothesis, sequence alignments of the respective viral proteins that initiate the infection process have revealed similarity to carbohydrate recognition domains (CRDs) of mammalian lectins. Specifically, as stated for C-type lectin-like domains: “parasitic bacteria and viruses, … are involved in interactions with the animal host and are either hijacked host proteins or their imitations” (8).
Of relevance to the current coronavirus pandemic, the rotavirus outer capsid spike protein (VP4), which is responsible for cell attachment and membrane penetration, was proposed to have arisen “by the insertion of a host-derived, galectin-like CRD into an ancestral membrane interaction protein. Subsequently, VP4 acquired a new carbohydrate-binding site and specificity .... These molecular events were probably accompanied by significant changes in viral tropism and pathogenicity” (9). Intriguingly, sequence analysis of the putative fiber protein of a porcine adenovirus isolate, NADC-1, also disclosed “significant homology” to galectins (10), a class of multifunctional β-sandwich proteins present in all animals (11-15). Herein, we first illustrate the extent of natural diversity among galectins using the human proteins as an example. These intrafamily comparisons affirm that sequence deviations of the galectin CRD structural platform led to a wide panel of homologous glycan- and peptide-binding proteins. Next, we document evidence for an evolutionary connection between galectins and viral spike proteins. Notably, scores for similarity among certain galectins do not differ much from those of galectin-adhesin pairs. Finally, we argue that systematic consideration of the general glycobiology of virus-host interplay could prove helpful in detecting a possible Achilles heel in the virus invasion strategy. This, in turn, might offer a direction for finding ways to interfere with viral infection.

Fig. 1 Illustration of the crystal structure (top, PDB code: 6FOF) and sequence (bottom, UniProt ID: P17931) of the human galectin-3 (hGal-3) CRD (aa 107-250). The ribbon diagram is shown, top left. The galectin signature sequence of highly conserved amino acids, which are involved in binding cognate β-galactosides, is highlighted with red letters on a yellow background. The boxed image, top right, shows the CRD with higher-level magnification.

Results

Versatility of the Galectin Fold. The canonical glycan specificity for lactose and its derivatives is based on a sequence signature. Its hallmark is a central Trp for C-H/π-bonding with the B-face of galactose in the core of cognate β-galactosides (Fig. 1). Sequence diversification at neighboring sites or within this set of amino acids lets family members acquire individual glycan specificity profiles (Fig. 1, SI Appendix, Fig. S1). A phylogenetic tree demonstrates that even the loss of carbohydrate-binding activity and a gain of new receptor activity for peptide motifs are natural phenomena within the galectin family (SI Appendix, Fig. S2). In mammalian galectins, affinity for distinct (glyco)proteins as counterreceptors has also developed at sites different from the canonical contact region (Fig. 2). The examples of human galectins-1, -3 and -8 (Gal-1, -3 and -8) show that the galectin fold has plasticity with respect to the nature of a binding partner and a capacity to recognize proteins. The key issue for multifunctionality is whether galectin pairing with its counterreceptor(s) is a selective and context-dependent process.


Fig. 2 Multiple amino acid sequence alignment (Clustal Omega software, top) of human galectin-1 (hGal-1, UniProt ID: P09382), hGal-3 CRD (P17931) and human galectin-8C (hGal-8C) (O00214). Strictly conserved (red background) and homologous residues (>70% conservation; boxed red letters) are highlighted by coloring. Amino acids in contact with the canonical ligand are marked by asterisks. Illustrated in the middle and bottom, respectively, are hGal-3 CRD in interaction with Bcl-2 or CXCL12 at different sites, hGal-1 with preBCR and the recognition site of hGal-8C for the autophagic receptor NDP52 (PDB code: 4HAN). The locations of the carbohydrate binding sites of hGal-1 and of hGal-8C in the proteins’ S-face are indicated by lactose. Loss of carbohydrate-binding capacity is highlighted by coloring.

Physiologically, the emerging evidence suggests that each galectin has its own set of counterreceptor glycoconjugates and/or proteins in a context-dependent manner (4, 15). In addition to sequence variations, the galectin architecture, i.e., the modularity of CRD presentation, is a factor for specific effector activity, as illustrated in Fig. 3.

Fig. 3 Modular architecture of galectins. The galectin CRD (β-sandwich fold) in the center, hGal-3 CRD (PDB code: 6FOF), is presented in five types of architectural display. Shown clockwise are: a non-covalently associated homodimer (top), a chimera with collagen-like repeats mediating self-association (nine in man) and an N-terminal peptide with two sites of serine phosphorylation, an N-tailed proto-type-like (monomeric) protein, a linker-connected dimer of two different domains as well as the heterotetrameric design found in oysters.

CRD-counterreceptor complementarity, galectin design, and binding-site presentation, e.g., in microdomains, seem to be relevant for homing in on certain glycoconjugates, as well as for triggering distinct effects. Examples include Gal-1-specific binding to complex-type N-glycans of the α5β1-integrin for induction of anoikis in tumor cells, and Gal-4 association with carcinoembryonic antigen (CEA), dipeptidylpeptidase-IV (DPP-IV) or aminopeptidase-N (APN) and sulfatide for apical transport and delivery in enterocytes (SI Appendix, Fig. S3). The ability to form firm contacts, e.g. with nanomolar affinity, and the scope for evolution to create new binding sites on a structurally stable fold make the galectin CRD an attractive candidate for a virus to exploit for attachment to host cells.


Galectin-Like Viral Proteins. Discovery of the galectin fold in the N-terminal part of the rhesus rotavirus outer capsid spike protein (9) was followed by further detection of such viral proteins. Notably, for this spike protein domain, the sugar-binding site has shifted spatially from its position in a galectin to a cleft region that accommodates sialic acids, and the domain has adapted to allow docking of a potent ligand of human Gal-3, namely the histo-blood group A antigen (Fig. 4). The corresponding sequence alignments are given in SI Appendix, Fig. S4. Notably, Gal-12C is closest in similarity score, with a secondary structure similarity of 44% (rhesus) and 55% (human), not much different from the 62% between Gal-1 and -13. The fiber head of porcine adenovirus (type 4) presents two galectin-like domains separated by a linker of 23 residues, a display mimicking the host tandem-repeat-type galectin shown in Fig. 3. In this case, N-acetyllactosamine (LacNAc) and its oligomers bind to the N-terminal CRD at the canonical site (Fig. 4); a conserved signature is seen in sequence alignment, with Gal-3 closest (Figs. 1 & 2; SI Appendix, Fig. S4). Since the adenovirus fiber is trimeric, as are endocytic tissue lectins, e.g., the hepatic asialoglycoprotein receptor, and 12 fibers populate the surface, there are multiple sticky docking options to a host cell surface. In analogy to certain mammalian lectins, the receptor-binding domain (RBD) of the trimeric coronaviral prefusion spike is located at the tip. As is the case for the aforementioned adenovirus, it consists of two structural units referred to as the N-terminal (most distal) and C-terminal domains (NTD and CTD). Strategically, both can act alone or in concert and acquire individual features by dynamic diversification. The α- and β-coronaviral NTD is most similar to the galectin fold, binding glycans in a species-specific manner, i.e., 9-O-acetylated sialic acid/α2,6-sialylated mucin-type core 1 (TF antigen, CD176) O-glycan, and also a peptide motif of a cell adhesion molecule, i.e., CEA-related cell adhesion molecule 1 (Fig. 4, SI Appendix, Fig. S4). The second (CTD) domain, more distant from galectins, as shown in SI Appendix, Fig. S4, harbors the infamous affinity for glycoproteins, namely APN, DPP-IV and angiotensin-converting enzyme 2 (ACE2) (Fig. 4). Of note, the high degree of homology among mammalian galectins and the availability of crystal structures for human galectins allow one to set a human galectin as the reference point instead of a lectin of the true host of recent pandemic malefactors, whose genome is the assumed origin of the capture (e.g. bat). In fact, human Gal-3 CRD has 86.7% (110/146) sequence identity and an additional 10.4% (28/146) sequence similarity to its homolog in Rhinolophus ferrumequinum (greater horseshoe bat), which is in the same genus as the coronavirus hosts R. macrotis and R. sinicus. Despite these high scores, the current and future coronaviral threat nevertheless gives a powerful incentive to supply crystallographic information on bat galectins.

Fig. 4 Carbohydrate recognition domains (CRDs) and receptor-binding domains (RBDs) (sites for binding drawn in semi-transparent surface presentation with crucial amino acids presented as sticks; contact region between RBD and its (sugar or protein) binding partner highlighted as colored curved line). Human galectin-3 (hGal-3, PDB code: 6FOF), in a complex with lactose, is in the center of the figure. Viral proteins are arranged clockwise around the human galectin, starting from the top: rhesus rotavirus VP4 sialic acid-binding domain (rhesus rotavirus VP8*, 1KQR) in complex with 2-O-methyl-α-D-N-acetylneuraminic acid, human Wa rotavirus VP8* carbohydrate-recognizing domain (human rotavirus VP8*, 2DWR) bound to methyl-α-D-N-acetylneuraminic acid, porcine adenoviral NADC-1 isolate fiber (NTD, 2WSV) in complex with lactose, the lectin domain of bovine coronavirus spike protein (bovine-CoV S1-NTD, 4H14), mouse coronavirus receptor-binding domain (mouse hepatitis-CoV S1-NTD, 3R4D) complexed with its receptor mCEACAM1a, Middle East respiratory syndrome-related MERS-CoV spike RBD complexed with its receptor DPP-IV (MERS-CoV S1-CTD, 4KR0), severe acute respiratory syndrome (SARS) coronavirus spike receptor-binding domain (SARS-CoV S1-CTD, 2AJF) complexed with its receptor ACE2 and 2019-nCoV receptor-binding domain (SARS-CoV-2 S1-CTD, 6M0J) in complex with ACE2. Given in boxes are the quantitative measures: Z-score, the statistical parameter for the extent of fold similarity; rmsd, the root mean square deviation; Seq id, the sequence identity calculated with the BLAST (www.uniprot.org/blast/) and Align (www.uniprot.org/align/) tools of the UniProt database; Sec Struc, the secondary structure similarity using the SSM tool (65). The scoring matrix was PAM (point accepted mutation) 250; identity indicates positions that have a single fully conserved residue, and similarity indicates conservation between groups of strongly (scoring > 0.5) and/or weakly (scoring ≤ 0.5) similar properties.

In support of a gene-capture (or gene emigration) process, sequence comparisons among members of the family of human galectins reveal cases with an extent of deviation comparable to that seen in comparisons between human galectins and viral adhesins (SI Appendix, Fig. S4), as noted above for secondary structure elements. Specifically, Gal-1 and GRP share fewer identity/similarity positions (47.1%) than Gal-1 and SARS-CoV-2 NTD (50.3%) or CTD (52.6%) (SI Appendix, Fig. S4). The Gal-12C CRD is a similar case, with scores in this range in comparison to the tested set of galectins (see SI Appendix, Fig. S4F).
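To make the identity/similarity percentages concrete, the following is a minimal sketch of how such scores can be computed from two pre-aligned sequences. It is not the authors' pipeline (they used the UniProt BLAST and Align tools). The strong and weak residue groups are the ones commonly used for Clustal-style conservation marks and are assumed here; the function name `identity_similarity`, the choice of denominator (full alignment length) and the toy fragments are illustrative assumptions.

```python
# Residue groups for "strongly" and "weakly" similar properties, as commonly
# used for Clustal-style conservation marks (assumed; the tools used in the
# paper may apply different settings).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def _share_group(a, b, groups):
    """Return True if residues a and b occur together in any group."""
    return any(a in g and b in g for g in groups)


def identity_similarity(aligned_a, aligned_b, gap="-"):
    """Percent identity and percent similarity of two pre-aligned sequences.

    Both percentages use the full alignment length as denominator, so
    gapped columns count against the score (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are neither identical nor similar
        if a == b:
            identical += 1
        elif (_share_group(a, b, STRONG_GROUPS)
              or _share_group(a, b, WEAK_GROUPS)):
            similar += 1
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not taken from the paper).
    ident, simil = identity_similarity("WGAEK-LS", "WGSDKALT")
    print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Summing the two returned values gives a combined identity/similarity figure of the kind quoted for the galectin and adhesin pairs; a different denominator (for example, ungapped columns only) would shift the numbers.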

Discussion

CRDs of cell surface lectins are frequently presented on protein stalks to ensure spatial accessibility, with galectin CRDs being a “class of bridging molecules”, ideal for recognition between surfaces (16). Similarly docking on glycans, these viral adhesins thus harbor lectin capacity (17). Recruitment of galectin-like domains to the initial step of coronavirus infection identifies the “critical determinants of the host range and tissue tropism of coronaviruses” and explains the origin of the clinical consequences of the proposed “gene capture”, which makes these domains a relevant target for vaccination by inducing blocking antibodies (18, 19). Intriguingly, the presence of anti-galectin autoantibodies is known in association with certain disorders, e.g. autoimmune diseases, providing a likely source of information on antigenic sites on the galectin CRD platform (20, 21). Moreover, this insight invites a systematic theoretical consideration of (fatal) bridging by galectin-like domains and of the glycosylation of virus and host proteins as sites at which undesirable contact building could be interrupted. After all, endogenous lectins are among the effectors of innate immunity protecting us against microbial challenges by recognizing pathogen-associated molecular patterns (22-27). Tactically, an invader should be confronted with restrictions to its entry or lured into binding to smart decoy receptors.

A possible in vitro strategy is to saturate the viral receptor domain with a specific high-affinity compound, thereby precluding docking. Conceptually, this anti-adhesion approach is inspired by the classical hemagglutination inhibition assay (28, 29), and is discussed in the context of clinically harmful activities of endogenous galectins (30). Sialic acid, for rhesus rotavirus VP8* (31); histo-blood group A tetrasaccharide, for the human rotaviral receptor (32, 33); LacNAc oligomers, for the porcine adenovirus type 4 fiber head (34); N-acetyl(glycolyl)neuraminic acid and its 9-O-acetylated derivative, for β- and likely some α- and γ-genus coronaviruses (35-39), also for a δ-coronavirus NTD binding mucin (40); as well as sialyl Tn antigen (CD175s), for the feline infectious peritonitis alphacoronavirus (41), have all emerged as respective candidates and starting points for structural optimization to minimize cross-reactivity to tissue lectins.

Intriguingly, if the binding partner is a protein as shown in Fig. 2 or Fig. 4, then this principle can swiftly be adapted: conformationally restricted peptides, instead of the sugar, may fulfill the same purpose by occupying sites of contact with cellular protein counterreceptors, e.g. either APN, CEACAM1, DPP-IV or ACE2. The characterization of contact points and generators of affinity, e.g. in the case of ACE2 recognition by SARS-CoV-2 (42), is a source for gaining optimal complementarity, and this principle can be readily adapted to cases of cell adhesion molecules as viral entry site (43). Supramolecular chemistry also offers an attractive concept, i.e. intercepting the virus on its way to a cell by a high-density display of cognate compounds, presented either as soluble dendrimeric constructs or suitably tailored nanoscale dendrimersomes (44, 45).

Based on the nature of the mutual contact sites, as illustrated in Fig. 4, epitope masking of the cellular determinants deserves attention. Toward this end, the viral receptor domain (that may even also evoke an immune response) or minimally sized peptides with receptor properties, as identified for human galectins-1 and -3 by proteolytic on-site excision and mass spectrometric mapping (46), could be used. Making the cellular binding partner invisible to the viral receptor can alternatively be accomplished by a tissue lectin that is resident before a virus moves in. Candidates are human galectins-1, -3 or -9 for LacNAc oligomers, or the CRD of the macrophage galactose-type lectin (named MGL, CD301, or CLEC10A) (47) and siglecs-2, -3, -5 and -6 (48) for the sialyl Tn antigen. Combination with defense molecules working at a different level of protection, such as a θ-defensin (retrocyclin 2), which builds “a protective barricade of immobilized surface proteins” (49), may result in additive effects.

In an indirect manner, the nature of the cellular glycoprotein docking sites warrants attention. In principle, glycan chains can attract lectins, and their association can then automatically impede other contacts in their vicinity. Further exploiting glycans as markers along with the target specificity of tissue lectins, N- and O-glycans of cellular glycoprotein counterreceptors for viral adhesins may direct receptor proteins to strategic sites at the cellular surface, thereby blocking viral attachment by homing in on their preferred target, as Gal-4 does with APN, CEA, and DPP-IV (50-52). Thus, it is crucial to map the glycan profiles of cellular counterreceptors of coronaviruses, ideally obtained from in situ material to exclude source-dependent variations of glycan structures. Conversely, lectins from either the host’s defense system (22, 24-27) or from external sources (53) can become guardians against viruses by connecting to virus glycans.

To identify suitable lectins, it is necessary to determine viral glycan signatures, often an essential shield against immune surveillance. Interestingly, in the special case of a cross-reactive antibody for SARS-CoV and SARS-CoV-2, affinity was found to be increased by an N-glycan at N370 (54). Since the glycan chains of viral surface glycoproteins are assembled by the host’s enzymatic machinery, they are much less likely to serve as vaccines than bacterial glycocompounds. Lectins, however, may find an Achilles heel on the virus surface. Stepwise refinements of tracing viral surface features through tissue lectins by mutational tuning, architecture engineering, and chimera design have the potential to create variants with favorable properties (55-57).

The realization that coronavirus adhesins are likely derived from mammalian galectins has led us to view the initial step of infection from the perspective of glycobiology. Doing so suggests exploring lectins or lectin-like domains and glycans or peptides as both targets and tools in efforts toward the ambitious aim of inhibiting cell-virus contact.

Materials and Methods

Alignment of Sequences and Construction of the Phylogenetic Tree. Multiple alignments of amino acid sequences (human galectins, viral spike proteins) were performed using the (EMBL-EBI) Clustal Omega software (58): www.ebi.ac.uk/Tools/msa/clustalo/. For ID entries in the UniProt Knowledgebase (UniProtKB; Expasy Proteomics Server, www.expasy.org), see figure legends. Analysis of evolutionary relationships and construction of the phylogenetic tree for the members of the galectin family were done using the Maximum Likelihood method implemented in the MEGA X software package (59, 60). Initial tree(s) for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using the Jones–Taylor–Thornton substitution model. The test of phylogeny was performed by applying bootstrap analysis (with 1000 replicates) and calculating a bootstrap consensus tree. The percentage of trees in which the associated sequences clustered together is shown next to branches.
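The bootstrap idea can be illustrated with a deliberately simplified sketch: alignment columns are resampled with replacement, and the percentage of replicates in which two sequences group together is reported. This is not the MEGA X procedure. Instead of Maximum Likelihood trees under the Jones–Taylor–Thornton model, the sketch assumes a plain p-distance and treats "mutual nearest neighbours" as a crude proxy for clustering; all function names and the toy alignment are hypothetical.

```python
import random


def p_distance(a, b):
    """Fraction of differing residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_neighbor(name, seqs):
    """Name of the sequence closest to `name` (ties broken by name)."""
    others = [n for n in seqs if n != name]
    return min(others, key=lambda n: (p_distance(seqs[name], seqs[n]), n))


def bootstrap_support(seqs, a, b, replicates=1000, seed=0):
    """Percentage of column-resampled replicates in which a and b are
    mutual nearest neighbours (a stand-in for 'cluster together')."""
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to equal length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(s[c] for c in cols) for n, s in seqs.items()}
        if (nearest_neighbor(a, resampled) == b
                and nearest_neighbor(b, resampled) == a):
            hits += 1
    return 100.0 * hits / replicates


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not galectins).
    toy = {
        "A": "ACDEFGHIKL",
        "B": "ACDEFGHIKL",
        "C": "MNPQRSTVWY",
        "D": "MNPQRSTVWY",
    }
    print(bootstrap_support(toy, "A", "B", replicates=100))
```

In a real analysis each replicate would be passed through the full tree-building method, and support would be read per branch of the consensus tree rather than per pair of sequences.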

Images of 3D Structures. Coordinates of the different vertebrate galectins and viral proteins were taken from the Protein Data Bank (61); for PDB entries, see figure legends. Analyses of the protein-ligand interactions were performed using the PISA web server (62). The structures were visually inspected using COOT (63) and PyMOL programs (64). Structural superpositions were performed with the secondary-structure matching (SSM) routine (65) incorporated in COOT. Figures were generated applying the PyMOL Molecular Graphics System.
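For readers unfamiliar with the rmsd values reported alongside the superpositions, the sketch below shows the quantity itself: the root mean square deviation over matched atom pairs. It assumes the coordinates have already been matched and superposed (the step that SSM performs and that is not reproduced here); the function name `rmsd` and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates that are already matched and superposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("coordinate lists are empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance for one matched atom pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical C-alpha traces, offset by 1 Å along z.
    trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    trace_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(trace_a, trace_b))
```

The value depends on which residues are paired and how the structures are superposed, so rmsd figures are comparable only when they come from the same matching procedure.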

ACKNOWLEDGEMENTS. We gratefully acknowledge inspiring discussions with Drs. B. Friday, A. Leddoz and A. W. L. Nose, the valuable input by the reviewers and support from COST Action CA18103 (InnoGly). V.P. and M.L.K. are respectively thankful for generous support of the P. Roy Vagelos Chair at UPenn and the Sheikh Saqr Foundation, RAK.

References

1. H. S. Bennett, Morphological aspects of extracellular polysaccharides. J. Histochem. Cytochem. 11, 14-23 (1963).
2. V. Ginsburg, E. F. Neufeld, Complex heterosaccharides of animals. Annu. Rev. Biochem. 38, 371-388 (1969).
3. N. Sharon (1975) Complex Carbohydrates. Their chemistry, biosynthesis, and functions (Addison-Wesley Publ. Co., Reading, MA, USA).
4. H. Kaltner, et al., The sugar code: letters and vocabulary, writers, editors and readers and biosignificance of functional glycan-lectin pairing. Biochem. J. 476, 2623-2655 (2019).
5. S. Roseman, Reflections on glycobiology. J. Biol. Chem. 276, 41527-41542 (2001).
6. H.-J. Gabius, J. Roth, An introduction to the sugar code. Histochem. Cell Biol. 147, 111-117 (2017).
7. H. Kaltner, et al., From glycophenotyping by (plant) lectin histochemistry to defining functionality of glycans by pairing with endogenous lectins. Histochem. Cell Biol. 149, 547-568 (2018).
8. A. N. Zelensky, J. E. Gready, The C-type lectin-like domain superfamily. FEBS J. 272, 6179-6217 (2005).
9. P. R. Dormitzer, Z. Y. Sun, G. Wagner, S. C. Harrison, The rhesus rotavirus VP4 sialic acid binding domain has a galectin fold with a novel carbohydrate binding site. EMBO J. 21, 885-897 (2002).
10. S. B. Kleiboeker, Sequence analysis of the fiber genomic region of a porcine adenovirus predicts a novel fiber protein. Virus Res. 39, 299-309 (1995).
11. S. H. Barondes, Soluble lectins: a new class of extracellular proteins. Science 223, 1259-1264 (1984).
12. H. Kaltner, et al., Galectins: their network and roles in immunity/tumor growth control. Histochem. Cell Biol. 147, 239-256 (2017).
13. K. Godula, Following sugar patterns in search of galectin function. Proc. Natl. Acad. Sci. USA 115, 2548-2550 (2018).
14. J. Hirabayashi (ed), Special issue on galectins. Trends Glycosci. Glycotechnol. 30, SE1-SE223 (2018).
15. G. García Caballero, et al., How galectins have become multifunctional proteins. Histol. Histopathol. 35, 509-539 (2020).
16. F. L. Harrison, C. J. Chesterton, Factors mediating cell-cell recognition and adhesion. Galaptins, a recently discovered class of bridging molecules. FEBS Lett. 122, 157-165 (1980).
17. A. J. Thompson, R. P. de Vries, J. C. Paulson, Virus recognition of glycan receptors. Curr. Opin. Virol. 34, 117-129 (2019).
18. F. Li, Receptor recognition mechanisms of coronaviruses: a decade of structural studies. J. Virol. 89, 1954-1964 (2015).
19. F. Li, Structure, function, and evolution of coronavirus spike proteins. Annu. Rev. Virol. 3, 237-261 (2016).
20. D. Lutomski, et al., Anti-galectin-1 autoantibodies in serum of patients with neurological diseases. Clin. Chim. Acta 262, 131-138 (1997).
21. K. Sarter, et al., Autoantibodies against galectins are associated with antiphospholipid syndrome in patients with systemic lupus erythematosus. Glycobiology 23, 12-22 (2013).
22. J. K. van de Wetering, L. M. G. van Golde, J. J. Batenburg, Collectins: players of the innate immune system. Eur. J. Biochem. 271, 1229-1249 (2004).
23. R. I. Lehrer, et al., Multivalent binding of carbohydrates by the human α-defensin, HD5. J. Immunol. 183, 480-490 (2009).
24. S. Sato, C. St-Pierre, P. Bhaumik, J. Nieminen, Galectins in innate immunity: dual functions of host soluble beta-galactoside-binding lectins as damage-associated molecular patterns (DAMPs) and as receptors for pathogen-associated molecular patterns (PAMPs). Immunol. Rev. 230, 172-187 (2009).
25. W. van Breedam, et al., Bitter-sweet symphony: glycan-lectin interactions in virus biology. FEMS Microbiol. Rev. 38, 598-632 (2014).
26. Y. Endo, M. Matsushita, T. Fujita, New insights into the role of ficolins in the lectin pathway of innate immunity. Int. Rev. Cell Mol. Biol. 316, 49-110 (2015).
27. D. A. Wesener, A. Dugan, L. L. Kiessling, Recognition of microbial glycans by soluble human lectins. Curr. Opin. Struct. Biol. 44, 168-178 (2017).
28. N. Sharon, Bacterial lectins, cell-cell recognition and infectious disease. FEBS Lett. 217, 145-157 (1987).
29. N. Sharon, Carbohydrates as future anti-adhesion drugs for infectious diseases. Biochim. Biophys. Acta 1760, 527-537 (2006).
30. A. Romero, H.-J. Gabius, Galectin-3: is this member of a large family of multifunctional lectins (already) a therapeutic target? Expert Opin. Ther. Targets 23, 819-828 (2019).
31. P. R. Dormitzer, et al., Specificity and affinity of sialic acid binding by the rhesus rotavirus VP8* core. J. Virol. 76, 10512-10517 (2002).
32. L. Hu, et al., Cell attachment protein VP8* of a human rotavirus specifically interacts with A-type histo-blood group antigen. Nature 485, 256-259 (2012).
33. X. Sun, et al., Human group C rotavirus VP8*s recognize type A histo-blood group antigens as ligands. J. Virol. 92, e00442-00418 (2018).
34. P. Guardado-Calvo, et al., Crystallographic structure of porcine adenovirus type 4 fiber head and galectin domains. J. Virol. 84, 10558-10568 (2010).
35. R. Vlasak, W. Luytjes, W. Spaan, P. Palese, Human and bovine coronaviruses recognize sialic acid-containing receptors similar to those of influenza C viruses. Proc. Natl. Acad. Sci. USA 85, 4526-4529 (1988).
36. G. Peng, et al., Crystal structure of bovine coronavirus spike protein lectin domain. J. Biol. Chem. 287, 41931-41938 (2012).
37. W. Li, et al., Identification of sialic acid-binding function for the Middle East respiratory syndrome coronavirus spike glycoprotein. Proc. Natl. Acad. Sci. USA 114, E8508-E8517 (2017).
38. R. J. G. Hulswit, et al., Human coronaviruses OC43 and HKU1 bind to 9-O-acetylated sialic acids via a conserved receptor-binding site in spike protein domain A. Proc. Natl. Acad. Sci. USA 116, 2681-2690 (2019).
39. M. A. Tortorici, et al., Structural basis for human coronavirus attachment to sialic acid receptors. Nat. Struct. Mol. Biol. 26, 481-489 (2019).
40. J. Shang, et al., Cryo-electron microscopy structure of porcine deltacoronavirus spike protein in the prefusion state. J. Virol. 92, e01556-01517 (2018).
41. T. J. Yang, et al., Cryo-EM analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging glycans. Proc. Natl. Acad. Sci. USA 117, 1438-1446 (2020).
42. J. Shang, et al., Structural basis of receptor recognition by SARS-CoV-2. Nature, doi: 10.1038/s41586-020-2179-y (2020).
43. D. Bhella, The role of cellular adhesion molecules in virus attachment and entry. Phil. Trans. R. Soc. B 370, 20140035 (2015).
44. V. Percec, et al., Modular synthesis of amphiphilic Janus glycodendrimers and their self-assembly into glycodendrimersomes and other complex architectures with bioactivity to biomedically relevant lectins. J. Am. Chem. Soc. 135, 9055-9077 (2013).
45. S. Zhang, et al., Glycodendrimersomes from sequence-defined Janus glycodendrimers reveal high activity and sensor capacity for the agglutination by natural variants of human lectins. J. Am. Chem. Soc. 137, 13334-13344 (2015).
46. A. Moise, et al., Toward bioinspired galectin mimetics: identification of ligand-contacting peptides by proteolytic-excision mass spectrometry. J. Am. Chem. Soc. 133, 14844-14847 (2011).
47. N. Mortezai, et al., Tumor-associated Neu5Ac-Tn and Neu5Gc-Tn antigens bind to C-type lectin CLEC10A (CD301, MGL). Glycobiology 23, 844-852 (2013).
48. E. C. M. Brinkman-van der Linden, A. Varki, New aspects of siglec binding specificities, including the significance of fucosylation and of the sialyl-Tn epitope. J. Biol. Chem. 275, 8625-8632 (2000).
49. E. Leikina, et al., Carbohydrate-binding molecules inhibit viral fusion and entry by crosslinking membrane glycoproteins. Nat. Immunol. 6, 995-1001 (2005).
50. E. M. Danielsen, B. van Deurs, Galectin-4 and small intestinal brush border enzymes form clusters. Mol. Biol. Cell 8, 2241-2251 (1997).
51. D. Delacour, et al., Galectin-4 and sulfatides in apical membrane trafficking in enterocyte-like cells. J. Cell Biol. 169, 491-501 (2005).
52. H. Ideo, A. Seko, K. Yamashita, Galectin-4 binds to sulfated glycosphingolipids and carcinoembryonic antigen in patches on the cell surface of human colon adenocarcinoma cells. J. Biol. Chem. 280, 4730-4737 (2005).
53. C. A. Mitchell, K. Ramessar, B. R. O'Keefe, Antiviral lectins: selective inhibitors of viral entry. Antiviral Res. 142, 37-54 (2017).
54. M. Yuan, et al., A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV. Science 368, 630-633 (2020).
55. M. R. White, E. Crouch, D. Chang, K. L. Hartshorn, Increased antiviral and opsonic activity of a highly multimerized collectin chimera. Biochem. Biophys. Res. Commun. 286, 206-213 (2001).
56. J. Kopitz, et al., Reaction of a programmable glycan presentation of glycodendrimersomes and cells with engineered human lectins to show the sugar functionality of the cell surface. Angew. Chem. Int. Ed. 56, 14677-14681 (2017).
57. A.-K. Ludwig, et al., Design-functionality relationships for adhesion/growth-regulatory galectins. Proc. Natl. Acad. Sci. USA 116, 2837-2842 (2019).
58. F. Sievers, et al., Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
59. S. Kumar, et al., MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549 (2018).
60. G. García Caballero, et al., Galectin-related protein: an integral member of the network of chicken galectins. 1. From strong sequence conservation of the gene confined to vertebrates to biochemical characteristics of the chicken protein and its crystal structure. Biochim. Biophys. Acta 1860, 2285-2297 (2016).
61. M. Varadi, et al., PDBe-KB: a community-driven resource for structure and functional annotations. Nucleic Acids Res. 48, D344-D353 (2020).
62. E. Krissinel, K. Henrick, Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774-797 (2007).
63. P. Emsley, B. Lohkamp, W. G. Scott, K. Cowtan, Features and development of Coot. Acta Crystallogr. D66, 486-501 (2010).
64. W. DeLano (2002) The PyMOL Molecular Graphics System (Delano Scientifics, Palo Alto, CA, USA).
65. E. Krissinel, K. Henrick, Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D60, 2256-2268 (2004).

Supplementary Information

The relationship of viral adhesins to mammalian bridging factors suggests an approach to tackle pandemic threats

Michael L. Kleina, Antonio Romerob, Herbert Kaltnerc, Virgil Percecd, and Hans-Joachim Gabiusc aInstitute of Computational Molecular Science, Temple University, Philadelphia, PA 19122; bStructural and Chemical Biology, Centro Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas, Ramiro de Maeztu 9, 28040 Madrid, Spain; cInstitute of Physiological Chemistry, Faculty of Veterinary Medicine, Ludwig-Maximilians-University Munich, 80539 Munich, Germany; dRoy & Diana Vagelos Laboratories, Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323;

To whom correspondence may be addressed. Email: [email protected], or [email protected]


Fig. S1 Illustration of the structural homology and the contact pattern in ligand binding based on the signature sequence of galectins. (a) hGalectin-1 (hGal-1) (PDB code: 1GZW), (b) hGal-2 (5DG2), (c) hGal-3 (4LBN), (d) hGal-4N (5DUW), (e, f) hGal-4C with 3ʹ-sulfated lactose (4YM2) or lacto-N-neotetraose (4YL7) as ligand, (g) hGal-7 (4XBQ), (h) hGal-8N (3AP9), (i) hGal-9N with the N-acetyllactosamine (LacNAc) dimer (DiLacNAc) (2ZHK) as ligand, (j) Gal-9C (3NV2), (k) the chicken galectin-related inter-fiber protein (C-GRIFIN) (5NLD), and (l) the chicken galectin-related protein (C-GRP) (5IT6).


Fig. S2 Phylogenetic tree (bootstrap consensus tree, shown at the top) of human galectins, GRIFIN and GRP, inferred with the Maximum Likelihood method implemented in the MEGA X software package (v. 10.1.7, released Jan. 2020). The tree with the highest likelihood was derived from a multiple alignment of the amino acid sequences (shown at the bottom; red background: strictly conserved; light green: >70% conservation; yellow: amino acids in contact with the canonical ligand lactose). Calculations involved a bootstrap analysis (with 1000 replicates) and the Neighbor-Joining method applied to a matrix of pairwise distances estimated using a Jones-Taylor-Thornton model. UniProt IDs: galectin-1 (hGal-1) P09382, galectin-2 (hGal-2) P05162, galectin-3 (hGal-3) P17931, galectin-4 (hGal-4) P56470, galectin-7 (hGal-7) P47929, galectin-8 (hGal-8) O00214, galectin-9 (hGal-9) O00182, galectin-10 (hGal-10) Q05315, galectin-12 (hGal-12) Q96DT0, galectin-13 (hGal-13) Q9UHV8, galectin-related inter-fiber protein (GRIFIN) A4D1Z8, galectin-related protein (GRP) Q3CW2.
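
The bootstrap step named in the legend to Fig. S2 can be illustrated with a minimal Python sketch. It shows only the resampling of alignment columns with replacement; the tree inference itself, performed with the MEGA software in the legend, is not reproduced. The function name `bootstrap_replicates`, the dictionary input format, and the toy sequences are assumptions made for illustration and are not part of the original analysis.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield pseudo-alignments built by resampling columns with replacement.

    `alignment` maps a sequence name to its aligned sequence; all aligned
    sequences must have the same length. (Hypothetical helper, for
    illustration only.)
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Draw as many column indices as the alignment has columns.
        columns = [rng.randrange(length) for _ in range(length)]
        # The same columns are taken from every sequence, so each
        # resampled column is still a column of the original alignment.
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


if __name__ == "__main__":
    # Toy alignment (made-up sequences, not galectin data).
    toy = {"seqA": "ACDEF", "seqB": "ACNEY", "seqC": "GCDEW"}
    for replicate in bootstrap_replicates(toy, n_replicates=3, seed=1):
        print(replicate)
```

In a full analysis, a tree would be inferred from each replicate, and the fraction of replicates supporting a given branch would be reported as its bootstrap value.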

Fig. S3 Schematic illustration of examples for functional consequences of galectin-glycan (protein) pairing.


Fig. S4 (A) Multiple alignment of human galectin-3 CRD (UniProt ID: P17931, hGal-3: aa 17-250), rhesus rotavirus VP4 sialic acid-binding domain (P12473; rhesus rotavirus VP8*: aa 64-224), human rotavirus VP4 sialic acid-binding domain (Q86169; human rotavirus VP8*: aa 64-224), porcine adenovirus type 4 NADC-1 isolate fiber galectin domain (Q83467, N-terminal CRD: PAdV4 CRDn: aa 394-523; C-terminal CRD: PAdV4 CRDc: aa 548-683), bovine coronavirus spike glycoprotein NTD (Q1HLC5, BoCoV NTD: aa 8-284), murine hepatitis virus spike glycoprotein NTD (Q9J3E7, MHV NTD: aa 16-290), and feline coronavirus UU4 spike glycoprotein (C6GHB7, FIPV UU8 domain0: aa 1-275/domain A: aa 276-540).


Fig. S4 (B) Multiple alignment of Middle East respiratory syndrome-related (MERS) coronavirus spike glycoprotein (A0A0D3MU71, MERS-CoV NTD: aa 18-353/CTD: aa 382-503), human severe acute respiratory syndrome (SARS) coronavirus spike glycoprotein (P59594, SARS-CoV NTD: aa 14-292/CTD: aa 317-507), and SARS coronavirus 2 (2019-nCoV) spike glycoprotein (P0DTC2, SARS-CoV-2 NTD: aa 16-305/CTD: aa 330-521) with reference to hGal-3 and BoCoV NTD. Residues were colored according to their physicochemical properties (Clustal Omega color code): red (AVFPMILW, small and hydrophobic (incl. aromatic, but not Y)), blue (DE, acidic), magenta (RK, basic, but not H), green (STYHCNGQ, hydroxyl, sulfhydryl, amine and G). Highly conserved amino acids in hGal-3 involved in binding cognate β-galactosides and equivalent positions in aligned sequences were marked with a yellow background.
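
The residue color code listed in the legend to Fig. S4 (B) can be written as a small lookup table. The Python sketch below is a minimal illustration: the four residue classes are copied from the legend, whereas the names `COLOR_CLASSES`, `RESIDUE_COLOR` and `color_residues`, as well as the grey default for gaps and unlisted symbols, are assumptions made here.

```python
# Residue classes as listed in the legend (Clustal Omega color code).
COLOR_CLASSES = {
    "red": "AVFPMILW",      # small and hydrophobic (incl. aromatic, but not Y)
    "blue": "DE",           # acidic
    "magenta": "RK",        # basic, but not H
    "green": "STYHCNGQ",    # hydroxyl, sulfhydryl, amine and G
}

# Invert the table: one-letter residue code -> color name.
RESIDUE_COLOR = {
    residue: color
    for color, residues in COLOR_CLASSES.items()
    for residue in residues
}


def color_residues(sequence, default="grey"):
    """Return one color name per alignment position.

    Gap characters and any symbol not listed in the legend receive
    `default` (an assumption; the legend does not specify this case).
    """
    return [RESIDUE_COLOR.get(symbol.upper(), default) for symbol in sequence]


if __name__ == "__main__":
    print(color_residues("ADKS-"))
```

The four classes together cover all 20 standard amino acids, so only gaps and non-standard symbols fall back to the default.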


Fig. S4 (C) Illustration of the structural homology of NTDs/CTDs from the spike glycoproteins of SARS-CoV-2 (PDB code: 6VXX), SARS-CoV (6CRZ) and MERS-CoV (6Q05) shown separately (top) and as parts of the respective ectodomains (S1/S2, bottom).


Fig. S4 (D-F) Sequence identity and similarity between human galectins-1/-3 and viral adhesins (galectin/viral adhesin versus viral adhesin; D, E) as well as a checkerboard-style comparison between the carbohydrate-recognition domains of eight human galectins (F; *GRIFIN = galectin-related inter-fiber protein, **GRP = galectin-related protein). Above the diagonal of grey boxes (on the right): identities (%, given above the line in each rectangle) and similarities (%, given below the line in each rectangle), with the numbers of matches/sequence stretch covered in parentheses. Below the diagonal of grey boxes (on the left): sum (%) of identities and similarities, with the sum of matches/sequence stretch covered in parentheses. Calculations were done using the BLAST (www.uniprot.org/blast/) and Align (www.uniprot.org/align/) tools of the UniProt database. The scoring matrix was PAM (point accepted mutation) 250. Identity indicates positions that have a single, fully conserved residue; similarity indicates conservation between groups of strongly (scoring > 0.5) and/or weakly (scoring ≤ 0.5) similar properties (please see also the legend to main text Fig. 4).
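
How the identity and similarity percentages of such a comparison are obtained from a pairwise alignment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of the UniProt tools: the list `STRONG_GROUPS` is the set of strongly similar residue groups commonly quoted for Clustal-style alignments and is assumed here because the legend does not list the groups; weakly similar groups are omitted for brevity; the denominator (all columns except those where both sequences have a gap) is likewise an assumption; the function names are hypothetical.

```python
# Strongly similar residue groups (assumed; commonly quoted for Clustal output).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def is_similar(a, b, groups=STRONG_GROUPS):
    """True if two different residues share at least one similarity group."""
    return a != b and any(a in group and b in group for group in groups)


def identity_similarity(seq_a, seq_b, gap="-"):
    """Count identities and similarities between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if not (a == gap and b == gap)
    ]
    covered = len(columns)
    identities = sum(1 for a, b in columns if a == b)
    similarities = sum(1 for a, b in columns if is_similar(a, b))

    def percent(count):
        return 100.0 * count / covered if covered else 0.0

    return {
        "covered": covered,
        "identities": identities,
        "similarities": similarities,
        "identity_pct": percent(identities),
        "similarity_pct": percent(similarities),
        "sum_pct": percent(identities + similarities),
    }


if __name__ == "__main__":
    # Toy aligned pair (made-up sequences, not galectin or adhesin data).
    print(identity_similarity("ACDE-F", "ACNEKY"))
```

The three percentages correspond to the values shown above the line, below the line, and below the diagonal of the checkerboard, respectively; the published numbers depend on the alignment produced by the UniProt tools and would not be reproduced by this simplified grouping.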
