CH-π interactions play a central role in protein recognition of carbohydrates

by Roger Christopher Diehl B.S. Biochemistry University of Wisconsin-Madison, 2012 M.S. Biochemistry University of Wisconsin-Madison, 2017

SUBMITTED TO THE DEPARTMENT OF CHEMISTRY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN CHEMISTRY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY February 2021

©2021 Massachusetts Institute of Technology. All rights reserved.

Signature of Author:______Department of Chemistry January 15, 2021

Certified by:______Laura L. Kiessling Novartis Professor of Chemistry Thesis Supervisor

Accepted by:______Adam Willard Associate Professor Graduate Officer


This doctoral thesis has been examined by a committee of the Department of Chemistry as follows:

Professor Barbara Imperiali, Thesis Committee Chair, Class of 1922 Professor of Chemistry and Biology

Professor Laura L. Kiessling, Thesis Supervisor, Novartis Professor of Chemistry

Professor Matthew D. Shoulders, Thesis Committee Member, Assistant Professor of Chemistry


CH-π interactions play a central role in protein recognition of carbohydrates

by Roger Christopher Diehl

Submitted to the Department of Chemistry in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology

Abstract

Carbohydrate-protein interactions play a central role in biology, but knowledge of the forces underlying them is limited. Carbohydrates are generally hydrophilic and therefore present unique challenges in their recognition. One underappreciated force involved in carbohydrate-protein interactions is the CH-π interaction, an attractive interaction between the aliphatic protons of a carbohydrate and the π system of an aromatic ring. In this thesis, I examine the fundamental nature, strength, and biological significance of this interaction, largely in the context of a family of carbohydrate-binding proteins known as galectins.

In Chapter 1, I review previous knowledge of carbohydrate-binding proteins and the forces they utilize to bind their ligands. In particular, I focus on CH-π interactions and galectins.

In Chapter 2, I examine the forces that contribute to CH-π interactions in the context of carbohydrates and aromatic compounds in aqueous solution. I find the CH-π interaction to be electronic in nature, and demonstrate its selectivity between different carbohydrates.

In Chapter 3, I determine the contribution of the CH-π interaction to the binding of galectin-3, a human carbohydrate-binding protein of medical significance. The data demonstrate that the CH-π interaction accounts for a majority of the binding energy.

In Chapter 4, I explore the biological implications of the CH-π interaction in galectin-3. I demonstrate that the CH-π interaction is critical for the biological activities of galectin-3.

In Chapter 5, I propose several directions future researchers could take to extend this work. For three of the four directions, I present the progress I have made during my studies.

The work contained within this thesis demonstrates that CH-π interactions play a central role in protein-carbohydrate interactions at both a molecular level and a biological level. Understanding the CH-π interaction is key to explaining and predicting the activity of carbohydrate-binding proteins.

Thesis Supervisor: Laura L. Kiessling Title: Novartis Professor of Chemistry


Acknowledgements

This work was only possible due to the many wonderful mentors I have had throughout my growth and development as a scientist. I cannot name all of them here, but I will name some.

To my parents Tim and Sita Diehl, thank you for all your support in raising me to be the man I am today. Of relevance to this work, thank you for encouraging my curiosity as a child and enabling me to pursue my dreams.

To Professor Nancy A Rice at Western Kentucky University, thank you for giving me my first experience in a modern biochemistry laboratory at VAMPY 2005. Science has changed since then but my passion for it has not.

To Ms. Lucy Organ at Hillsboro High School, thank you for helping me realize that chemistry in particular is my field of interest. I am still a proud chemist, and still a proud Burro.

Professor Lauren Buchanan, thank you for making my first chemistry class of college enjoyable, and for helping me handle the stress of my first first-author paper.

Professor M Thomas Record, thank you for taking me into your lab as a freshman and teaching me techniques and principles that serve me well to this day. Chapters 2 and 3 in particular greatly benefitted from your perspective.

Dr. Emily J. Guinn, thank you for being the best graduate student mentor an undergraduate researcher could ask for. You knew of my potential as a scientist when I doubted it, and helped me build the confidence I needed to make it through graduate school. Your example is one I look to whenever I am called upon to teach a fellow scientist.

Last but definitely not least, thank you Professor Laura L. Kiessling. Thank you for seven excellent years in your lab, and in particular for giving me the opportunity to design the projects in this thesis. You have placed great trust in me over these years, and I sincerely hope you feel that I have rewarded that trust. Thank you also for ensuring a positive workplace environment in Kiessling Lab, even as students and postdocs have come and gone and the lab has moved from Wisconsin to MIT. Science is never easy and it is of the utmost importance that it takes place in a supportive context.

Finally, I would like to thank all of the members of Kiessling Group that I have worked with during my time as a graduate student. All of you have been wonderful colleagues, and have helped me learn a broad array of disciplines and techniques ranging from vaccine development to reverse phase high-pressure liquid chromatography. I would particularly like to thank two subsets of Kiessling Group members. First, I would like to thank Dr. Christine Isabella, Dr. Alex Justen, Dr. Cassie Jarvis, and Professor Caitlin McMahon for organizing the move to MIT. It was an unforgettable experience and you four were key in making sure it proceeded as smoothly as possible. Second, I would like to thank Stephen Early, Melanie Halim, Alan Carter, Dr. Robert Brown, Dr. Mohammad Murshid Alam, Dr. R. Lyle McPherson, and Dr. Amanda Dugan for their contributions to the work in this thesis. Again, you have all been wonderful to work with.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables and Schemes
List of Figures

Chapter 1: Molecular Features of Lectin-Carbohydrate Interactions
  Introduction
  Previously described features of carbohydrate-binding sites
  Previously described features of CH-π interactions
  Biophysical background regarding galectins
  Biological background regarding galectins
  Conclusions and present work
  References

Chapter 2: Electronic Nature of CH-π Interactions
  Background and significance
  Experimental section
  1H NMR assay shows CH-π interactions in aqueous solution
  CH-π interactions between carbohydrates and aromatic groups are electronic in nature
  CH-π interactions display selectivity between monosaccharides and anomers
  Conclusions and future directions
  Acknowledgements
  References

Chapter 3: A CH-π interaction drives glycan-binding to human galectin-3
  Background and significance
  Galectin-3 bears a conserved tryptophan centrally located in the binding site
  Materials and methods
  Galectin-3 variants are stable at room temperature
  Galectin-3 variants have reduced binding affinity towards lactose
  CH-π interactions account for the majority of binding energy in galectin-3
  Acknowledgements
  References

Chapter 4: Biological functions of the CH-π interaction in galectin-3
  Roles and structures of human galectins
  Full-length galectin-3 variants have reduced agglutination activity towards mouse red blood cells
  Wild-type galectin-3C binds to secreted mucins, but variants at W181 do not
  Wild-type galectin-3 is exported from HEK-293 cells, while variants at W181 are retained in the cell
  Conclusions
  Acknowledgements
  References

Chapter 5: Further Avenues for Investigation
  Hammett series to determine electronic nature of CH-π interactions in a protein
  Design of glycomimetic ligands with enhanced CH-π interactions
  Conferral of CH-π interactions on existing antiretroviral lectins
  Characterization of tandem repeat galectin binding profiles to mammalian and microbial glycans
  Acknowledgements
  References


List of Tables and Schemes

Chapter 2: Electronic Nature of CH-π Interactions
  Scheme 1: Deprotection of methyl 3,4,6-tri-O-benzyl-β-D-mannopyranoside

Chapter 3: A CH-π interaction drives glycan-binding to human galectin-3
  Table 1: Strength of CH-π interactions with lactose at position 181 by variant

Chapter 5: Further Avenues for Investigation
  Scheme 1: Two-step synthesis of tryptophan analogs from indole precursors
  Scheme 2: One-step chemoenzymatic synthesis of substituted tryptophan analogs


List of Figures

Chapter 1: Molecular Features of Lectin-Carbohydrate Interactions
  Figure 1: Classes of carbohydrate-binding proteins
  Figure 2: Aromatic residues are overrepresented in carbohydrate binding sites
  Figure 3: Electronic and dispersive forces contribute to the strength of CH-π interactions between carbohydrates and aromatic groups
  Figure 4: Galectin subfamilies are based on domain structure and mode of oligomerization
  Figure 5: Conserved residues of the galectin binding site subsite B
  Figure 6: The galectin-glycoprotein lattice

Chapter 2: Electronic Nature of CH-π Interactions
  Figure 1: Methyl β-galactoside and indole form a geometrically defined complex in aqueous solution
  Figure 2: Stacking geometry is consistent for all indole concentrations tested
  Figure 3: Carbohydrate-aromatic interactions in aqueous solution are electronic in nature
  Figure 4: CH-π stacking geometries in solution resemble patterns found in carbohydrate binding sites
  Figure 5: CH-π interactions involving fucose and N-acetylglucosamine resemble those involving galactose and glucose respectively

Chapter 3: Mutational tuning of the CH-π interaction in galectin-3
  Figure 1: Structure of the galectin-3 binding site
  Figure 2: Most galectin-3 variants at W181 are stable at room temperature
  Figure 3: Galectin-3C variants show decreased binding to lactose

Chapter 4: Biological functions of the CH-π interaction in galectin-3
  Figure 1: The W181 CH-π interaction is necessary for galectin-3 induced hemagglutination
  Figure 2: Binding of galectin-3 to mucins is a carbohydrate-protein interaction dependent on the W181 CH-π interaction
  Figure 3: Wild-type galectin-3 is robustly exported from HEK-293 cells, while variants are retained within the cell

Chapter 5: Further Avenues for Investigation
  Figure 1: Fluorinated analogs have similar lactose binding to wild-type galectin-3C
  Figure 2: The CH3 bond of β-galactose is polarized by overlap with the antiperiplanar CO4 hydroxy group
  Figure 3: Proposed monosaccharide analogs to test the relative importance of contributors to CH-π interaction strength
  Figure 4: Algal and bacterial lectins recognize diverse motifs on HIV gp120 Man9 glycans
  Figure 5: Cyanovirin-N variants at T25 are stable at room temperature
  Figure 6: Tandem repeat N- and C-terminal domains are similar to each other
  Figure 7: Tandem repeat galectin domains are stably folded at room temperature


Chapter 1: Molecular features of lectin-carbohydrate interactions

This chapter is reproduced in part with permission from Hudson, K. L.; Bartlett, G. J.; Diehl, R. C.; Agirre, J.; Gallagher, T.; Kiessling, L. L.; Woolfson, D. N. Carbohydrate-Aromatic Interactions in Proteins. J. Am. Chem. Soc. 2015, 137 (48), 15152–15160. Copyright 2015 American Chemical Society.


Introduction

Glycans are found in all cells1 and play crucial roles in many facets of biology. Some primarily play structural roles, such as cellulose in wood, chitin in fungi and arthropods, and chondroitin in cartilage.2 Others, such as starch and glycogen, act as readily accessible energy sources. However, one of the most diverse and necessary roles glycans play is as conveyors of information. Carbohydrates excel in this role because they vary not only in their linear sequence but also in the nature of the linkages between each monomer. Additionally, the multiple allowed linkages in a carbohydrate allow for branched structures, adding yet another element of diversification even in a small oligosaccharide.3 Oligosaccharides bear a narrower range of functional groups than proteins and do not appear to form intricate secondary structures as RNA does, so most biological functions of carbohydrates require a protein to recognize the involved carbohydrate.4–6 Herein lies another challenge: carbohydrates are highly hydrophilic and afford little binding energy through the hydrophobic effect, a source of binding energy that is essential to many other classes of ligand. Proteins that recognize carbohydrates must therefore bring other forces to bear in order to bind their ligand.

Carbohydrate-binding proteins generally fall into four classes: antibodies, carbohydrate-modifying enzymes, GAG-binding proteins, and lectins (Figure 1). Given that pathogens, like all cellular organisms, are coated in a carbohydrate-based glycocalyx,7 it is little surprise that the human immune system generates antibodies against glycans.8,9 Due to the difficulty of strongly binding a carbohydrate ligand, carbohydrates tend to be poorly immunogenic, but carbohydrate-binding antibodies do occur, especially when the carbohydrate is presented on a peptide scaffold as would be the case with an enzymatically digested glycoprotein.10 Unlike other carbohydrate-binding proteins, carbohydrate-binding antibodies can often bind peptides in the same site, though generally by taking advantage of different features of the binding site than are used to bind carbohydrates.11 While relatively low in affinity,12 carbohydrate-binding antibodies display many similar features to other carbohydrate-binding proteins, such as a heavy reliance on aromatic residues13 and extended binding sites.12 Due to the low monovalent affinity, multivalency is crucial for antibody binding to carbohydrate targets.14

Figure 1: Classes of carbohydrate-binding proteins. A) Antibodies generally bind carbohydrates loosely due to the hydrophilic nature of the ligand. B) Enzymes bind carbohydrates tightly in order to modify them. C) Lectins have evolved for recognition of specific binding epitopes on glycans. D) GAG-binding proteins bind anionic glycosaminoglycans with heavily cationic motifs.

Another class of proteins that must recognize carbohydrates includes the enzymes that modify them. While carbohydrate-modifying enzymes have a range of specificities,15 they generally have tighter, more encompassing binding sites in order to position the sugar residues properly for catalysis.16 In many cases, this binding is aided by the presence of a nucleotide attached to the sugar, where the nucleobase and phosphate can afford substantial affinity.17 Covalent binding and transition metals can help to position the substrate for catalysis.18 Also, the fine specificities of these enzymes are often conferred by a lectin-like carbohydrate-binding module that does not possess catalytic activity itself.19 Otherwise, many of the trends observed in other carbohydrate-binding proteins also apply to enzymes.

Highly anionic carbohydrates known as glycosaminoglycans (GAGs) play key roles in mammalian development via cell-cell and cell-matrix interactions.20 These interactions are mediated by a broad range of GAG-binding proteins.21 A common feature of such proteins is their reliance on cationic amino acids for recognition, often in the pattern XBBXBX or XBBBXXBX, where X is an uncharged residue and B a basic residue (cationic at neutral pH).22 These cationic motifs bear some similarity to those found in antimicrobial peptides, and there are a number of peptides with both GAG-binding and antimicrobial activity for this reason.23 These interactions are particularly strong: displaying a peptide consisting of the immediate binding site alone on a surface can allow adhesion of cells via their proteoglycans,24 whereas with other classes of carbohydrate-binding protein, the entire protein is generally needed to yield a high-affinity interaction. The central principle involved is an electronic interaction between the positively charged GAG binding site and the negatively charged GAG.
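As an illustration of the consensus patterns XBBXBX and XBBBXXBX mentioned above, the following minimal sketch scans a peptide sequence for both motifs. It rests on assumptions that are not taken from the cited work: B is restricted to lysine and arginine, X is any standard residue that is uncharged at neutral pH (histidine is counted as uncharged), and the function names and test peptides are hypothetical.

```python
import re

# Residue classes (assumptions, not from the source): B = Lys/Arg only;
# X = any standard residue that is uncharged at neutral pH (His counted as uncharged).
BASIC = "KR"
UNCHARGED = "ACFGHILMNPQSTVWY"


def motif_to_regex(motif):
    """Translate a B/X motif string into a compiled regex.

    The pattern is wrapped in a lookahead so overlapping occurrences are found.
    """
    classes = {"B": f"[{BASIC}]", "X": f"[{UNCHARGED}]"}
    body = "".join(classes[symbol] for symbol in motif)
    return re.compile(f"(?=({body}))")


MOTIFS = {name: motif_to_regex(name) for name in ("XBBXBX", "XBBBXXBX")}


def find_gag_motifs(sequence):
    """Return (motif name, 0-based start, matched segment) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence.upper()):
            hits.append((name, match.start(), match.group(1)))
    return hits


# Made-up test peptides, not sequences from any real protein.
print(find_gag_motifs("AKKARA"))
print(find_gag_motifs("GARKKAAKA"))
```

A motif hit of this kind is only a sequence-level hint; whether a given segment actually binds a GAG depends on its structural context.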

Finally, lectins are a class of proteins evolved to bind carbohydrates, usually without any direct enzymatic activity. While the earliest work on lectins focused on the abundant and readily isolable plant lectins,25 lectins exist in all domains of life26 in a variety of roles. Some are toxic, such as those in rattlesnake venom27 or cholera toxin,28 while others are essential for life, such as XEEL, a lectin that forms a protective gel around Xenopus laevis eggs.29 To better understand such a diverse category of proteins, lectins are classified into families based on their fold and generalizations regarding their binding specificity.30 Due to their relatively strong affinities and relatively flexible specificities, I focused on a family of lectins known as the galectins in my studies of the mechanisms behind carbohydrate-protein interactions. That said, many of the conclusions reached here are likely applicable to enzymes and antibodies as well, and additional facets of GAG-binding, enzyme-carbohydrate, and antibody-carbohydrate interactions warrant further study.

Previously described features of carbohydrate binding sites

Given the challenges of glycan recognition, I analyzed data on how lectins typically bind carbohydrates. Weis and Drickamer published seminal work in this field in 1996,16 which identified several forces that play a key role in binding carbohydrates, with a focus on the binding activities of lectins. The first force they identified, and perhaps the most obvious, is direct hydrogen bonding. Interestingly, this typically does not involve protein hydroxy groups such as those of serine, threonine, or tyrosine, likely due to the entropic cost of simultaneously fixing the rotamers of the protein and sugar hydroxy groups. A noted exception is the frequent occurrence of serine hydroxy groups in binding sites for sialic acid, where the hydroxy groups act as hydrogen-bond donors to the sialic acid carboxylate. In a number of cases where highly structured water molecules are present, water-mediated protein-carbohydrate hydrogen bonds can be found. In isolation, hydrogen bonds are strong, with a bond energy of 3 to 7 kcal/mol.31 A central challenge for lectins is that hydrogen bonding is also present in bulk water, and lectins only function if they can generate a preference for the ligand binding over its retention in bulk water.3 Hydrogen bonding plays a far greater role when carbohydrates are bound in organic solvents,32 or for generating selectivity between different carbohydrate ligands.33 Hydrogen bonding can also be crucial for restricting the mobility of the ligand, as in the case of xylan carbohydrate-binding modules.34 A related effect, the displacement of water from a preorganized binding site, can provide additional affinity for some lectins.35,36

A second group of forces available is those involving cations, especially bound divalent metals such as calcium and manganese. C-type lectins and intelectins use calcium ions, with the ion accepting coordinate bonds from lone pairs on two vicinal hydroxy groups on the sugar.29,37–40 Arginine residues can play a similar role, with the electronic effects of proximity to partially negative hydroxy oxygen atoms complementing the attractive strength of hydrogen bonds via the guanidinium protons.41 Cationic residues also, as noted, play a large role in binding glycosaminoglycans;42 interestingly, divalent metals are not as common in these cases.

Weis and Drickamer also examined the role of nonpolar residues in carbohydrate binding sites.16 Here, they noted that this form of recognition was most prevalent for galactose thanks to its axial 4-hydroxy group, which provides a continuous aliphatic patch on the α-face of the sugar. On the face of it, this seems to be a surprising way to recognize such a hydrophilic ligand as galactose, and it bears mentioning that the residue involved is typically an aromatic residue (Figure 2).43 Similarly, aromatic residues excel at binding N-acetylgalactosamine (GalNAc).44 These interactions are also particularly prominent in cellulases, as the highly ordered β-glucosides can be sandwiched by tyrosine and tryptophan residues to separate them from water or other polysaccharide chains.45 Likewise, human α-galactosidase uses tyrosine and tryptophan residues to bind the α-face of α-galactose away from the anomeric position while the enzyme distorts the geometry of the sugar to allow for cleavage.46

Figure 2: Aromatic residues are overrepresented in carbohydrate binding sites. The horizontal axis is ordered with increasing hydrophobicity to the right, while the vertical axis shows the ratio between the occurrence of a given amino acid within 4 Å of a noncovalently bound glycan compared to the frequency in the UniProt database. Figure adapted from Hudson et al. 2015.
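The ratio plotted on the vertical axis of Figure 2 can be sketched as the fraction of glycan-proximal residues of a given type divided by that residue's background frequency. The snippet below is only an illustration of that arithmetic: the function name is hypothetical, and the counts and background frequencies are toy numbers chosen for round results, not values from Hudson et al. or from UniProt.

```python
def enrichment_ratios(site_counts, background_freq):
    """Ratio of each residue's frequency near glycans to its background frequency.

    site_counts: residue -> number of occurrences within the distance cutoff.
    background_freq: residue -> fractional frequency in the reference database.
    """
    total = sum(site_counts.values())
    return {
        residue: (count / total) / background_freq[residue]
        for residue, count in site_counts.items()
    }


# Toy numbers for illustration only (NOT real data).
toy_site_counts = {"W": 10, "L": 5, "other": 85}
toy_background = {"W": 0.0125, "L": 0.10, "other": 0.8875}

ratios = enrichment_ratios(toy_site_counts, toy_background)
for residue, ratio in ratios.items():
    print(f"{residue}: {ratio:.2f}")
```

A ratio above 1 indicates overrepresentation near bound glycans, and a ratio below 1 indicates underrepresentation.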

Previously described features of CH-π interactions

Based on the observed requirement for an aromatic residue, it is likely that the interaction at work when tryptophan, phenylalanine, or tyrosine is in the binding site is a CH-π interaction. Thus, in order to comprehend the role of aromatic residues in carbohydrate binding sites, one must understand the features of CH-π interactions. In the 1980s, it became apparent that under certain circumstances, aliphatic and aromatic CH protons could act as hydrogen bond donors to oxygen or nitrogen-based hydrogen bond acceptors.47 By the same token, canonical hydrogen bond donors consisting of OH or NH protons were observed interacting with π systems, especially those of aromatic compounds, with the π system playing the role of a hydrogen bond acceptor.48 This led to the notion that a CH proton could bond with a π system in an analogous manner, as Motohiro Nishio demonstrated computationally in 1995.49 Further computational work by the Chandrasekhar group showed that this interaction could occur between two aromatic systems, and that CH-π interactions are somewhat stronger for heterocyclic aromatic systems than , and that the orientation of a single hydrogen directly towards the aromatic ring is preferred over two hydrogens angled towards opposite edges of the ring.50

Earlier computational works generally cite dispersion forces as providing the majority of the interaction energy, aided by the high polarizability of the C-H bond.51 However, the CH bonds of an aliphatic hydrocarbon such as methane grant only a modest interaction with an aromatic π system.50,52 Partial substitution of methane with chlorine and fluorine, by contrast, results in a considerably stronger interaction between the remaining hydrogens and the benzene π system,52–54 suggesting that a strong dipole moment leads to a large electronic contribution (Figure 3).55 That there would be a strong electronic contribution was also investigated in the context of π-π interactions mediated through aromatic CH bonds, in the case of substituted , where an electron-withdrawing substituent on the edge-on aromatic ring resulted in a stronger interaction.56

Figure 3: Electronic and dispersive forces contribute to the strength of CH-π interactions between carbohydrates and aromatic groups. A) Dispersive forces rely on the high polarizability of C-H bonds to create transient partial charges in both molecules, which oscillate to maintain an attractive interaction. B) Electronic forces take the form of a dipole-quadrupole interaction between the partially positive face of the sugar and the partially negative face of the aromatic group.

In recent years, the CH-π interaction has been explained in terms of hard/soft acid-base theory, where it represents a hydrogen bond between a soft acid (the CH group) and a soft base (the π system).57 The ability of CH bonds to act as soft acids is demonstrated by their preference for softer chloride over harder oxyanions in synthetic anionophores.58 It is clear that both electronic forces and the hydrophobic effect are important in mediating these interactions in water,59 and furthermore that the interaction is highly solvent-dependent60,61 and often cooperative.62,63 CH-π interactions are useful in achieving fine control over organic synthesis reactions via shifting the balance between transition states.64 The balance of electronic and dispersive effects has been an active area of debate. In the case of nonpolar CH groups63,65 and to a lesser extent weakly polarized carbohydrates such as β-GlcNAc66 the dispersive component is dominant, but strongly polarized carbohydrates such as β-Gal are more likely to be found in proximity to an aromatic group43 and in these cases the electronic component is likely large.67 In the case of carbohydrates, the cooperativity of the interaction means that aromatic rings have some selectivity in favor of monosaccharides that can make three CH-π interactions.68 By contrast, furanosides, which have flexible ring structures with poor complementarity with the aromatic systems, are unlikely to make CH-π interactions.69 Computational studies suggest that the indole group of tryptophan, due to its greater electron richness,70 should be a better CH-π acceptor than a benzene ring as found on phenylalanine,71 which is reflected in the overrepresentation of tryptophan in carbohydrate binding sites.43 Histidine, by contrast, has an especially electron-poor π system and generally serves as a hydrogen donor when it is found in a carbohydrate binding site.43,72

Biophysical background regarding galectins

Galectins are a family of proteins found across animal taxa, distinguished by their affinity for β-galactosides and the distinctive “jelly roll” fold of their carbohydrate recognition domain.73 Fifteen galectins are described in mammals, of which eleven are found in humans.74,75 Galectins are found in a broad range of tissue types, and in humans galectins-1 and 3 are ubiquitously expressed.76 Mammalian galectins consist of three subfamilies, based on their domain structure and oligomerization (Figure 4). Prototype galectins, such as galectins-1, 2, and 7, contain a carbohydrate recognition domain and no other domain.76 These galectins are capable of noncovalent dimerization with the two monomers oriented in opposite directions, but generally exist in a monomer-dimer equilibrium in solution, favoring the dimer upon binding to ligand.77 Tandem-repeat galectins, such as galectins-4, 8, and 9, possess two distinct carbohydrate recognition domains and a linker of variable length and composition.78 The domains have differing carbohydrate specificity, the implications of which are an active area of study.79

Figure 4: Galectin subfamilies are based on domain structure and mode of oligomerization. A) Prototype galectins consist of a single carbohydrate-recognition domain and form noncovalent dimers. B) Chimeric galectins have a proline-rich N-terminal domain that forms flexible oligomers. C) Tandem repeat galectins consist of two carbohydrate recognition domains connected by a disordered linker.

Galectin-3, the focus of my studies, exists in a subfamily all its own as a chimera-type galectin. It possesses a C-terminal carbohydrate recognition domain highly similar to other galectins in the active site, but its N-terminal domain is a loosely ordered proline-rich helix that engages in numerous protein-protein interactions inside and outside of the cell.80 These interactions include a highly flexible form of oligomerization; galectin-3 exists as a monomer in solution,81 but oligomerizes upon binding multivalent ligands,82 creating a galectin-glycan lattice that can force receptor clustering or mediate host-pathogen interactions.83 Prototype galectin dimers and bivalent tandem repeat galectins can form similar lattices as well.

Galectins are generally unable to benefit from multivalent binding to a single ligand, as their carbohydrate binding domains are oriented away from each other. In order to maximize binding affinity for their hydrophilic and usually uncharged ligand, galectins instead utilize extended binding sites, where as many as four monosaccharide units are recognized by the binding site, in subsites A, C, and D situated on either side of the required β-galactose or β-GalNAc residue,74,84 which binds in subsite B. Like other cases of extended binding sites in lectins, the peripheral monosaccharides (often Glc or GlcNAc, Fuc, or another Gal/GalNAc) need not be able to bind as monosaccharides, as their binding affinity is individually weaker than the primary determinant β-Gal/β-GalNAc.16 In the case of galectin-3, β-galactose is able to bind as a monomer, but its affinity is a meager 5-20 mM.84 The canonical ligand traditionally used to block galectin-3 binding in biological assays is lactose (Gal-β1-4-Glc), for which galectin-3 has a binding affinity of approximately 100 μM.85 However, further elaboration of the glycan can provide additional affinity approaching two orders of magnitude.86 In the case of an extended LacNAc epitope, the stronger binding provided by a tetrasaccharide epitope is nearly as beneficial as arranging for divalent binding via an alkyl linker.87 This additional affinity and the widespread occurrence of lactose and lactosamine epitopes in mammalian glycans render galectins potent binders of carbohydrate self-epitopes, but in a far more selective manner than plant lectins such as wheat germ agglutinin.88

Figure 5: Conserved residues of the galectin binding site subsite B. Bifurcated hydrogen bond donors and acceptors bind the hydroxy groups of β-galactose, while tryptophan makes a CH-π interaction with the α-face of the sugar.
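To put the affinities quoted above on an energy scale, the following sketch converts them to standard binding free energies using ΔG° = RT ln(Kd / 1 M). It assumes that the quoted values are dissociation constants, that the temperature is 25 °C, and that the standard state is 1 M; none of these are stated in the text, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol: RT * ln(Kd / 1 M).

    Assumes kd_molar is a dissociation constant expressed in mol/L.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


# Affinities quoted in the text, interpreted as Kd values (an assumption).
quoted = [
    ("beta-galactose, 5 mM", 5e-3),
    ("beta-galactose, 20 mM", 20e-3),
    ("lactose, 100 uM", 100e-6),
]
for label, kd in quoted:
    print(f"{label:>22}: {binding_free_energy(kd):6.2f} kcal/mol")

# Free energy corresponding to a 100-fold (two orders of magnitude) gain in affinity.
gain = binding_free_energy(1e-4) - binding_free_energy(1e-6)
print(f"100-fold tighter binding: {gain:.2f} kcal/mol more favorable")
```

Under these assumptions, each tenfold improvement in affinity corresponds to roughly 1.4 kcal/mol at room temperature, which is a convenient yardstick when comparing ligands.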

All vertebrate galectins possess a tryptophan residue in the β-galactose-binding subsite (Figure 5).89 In galectin-3, this residue is Trp181; notably, it is quite mobile in apo-galectin-3 but far more ordered in lactose-bound galectin-3.81 A conserved arginine, R186 in galectin-3, near the β-face of the sugar hydrogen bonds with O4 and O6 of the β-galactose residue and may help polarize the molecule to aid the CH-π interaction. Arginine and lysine are also heavily represented in the other subsites, especially subsite A at the non-reducing end of the binding site,90 and these arginines are heavily targeted for inhibitor development via cation-π interactions with aromatic substituents on the inhibitor.91 Due to the high conservation of subsite B, efforts at mutational tuning of galectin-3 to alter its specificity have focused on the other subsites.92 As such, the precise importance of the subsite B residues remains underexplored.

Biological background regarding galectins

Galectins in general, and galectin-3 in particular, are heavily targeted for inhibitor development due to their myriad biological roles.93 First of all, galectin-3 tends to promote tumor progression by a number of activities, most of which revolve around inducing clustering in N-glycosylated cell surface receptors.94 Through binding and clustering integrins, galectin-3 can promote cell mobility.95 Galectin-3 plays a key role in cell-matrix signaling,96,97 including polymerizing hensin through its carbohydrate-binding activity.98 In tumors, strong galectin-3 ligands are upregulated, allowing galectin-3 to participate in the remodeling of the extracellular matrix that allows for invasion and metastasis.99 Galectin-3, while soluble, is able to act as a receptor for transferrin possessing tri-antennary and tetra-antennary N-glycans, aiding the recycling of bound transferrin to the cell surface.100 Galectin-3 also promotes angiogenesis through acting as a chemoattractant for endothelial cells101, and promotes metastasis through its interaction with MUC1102 and via promotion of cytokine secretion.103 Overall, these activities link upregulation of galectin-3 to increased tumor progression and worse prognoses.104

Secondly, galectin-3 is a potent immunomodulator.105 All immune cells express galectin-3, and just as in cancer, its ability to form a galectin-receptor lattice (Figure 6) is key to its function.106 Galectin-3 binding to the T cell receptor can prevent the T cell receptor from clustering around CD8, leading to anergy in cytotoxic T lymphocytes.107 Extracellular galectin-3 can also induce apoptosis in T cells through binding to CD45 and clustering of CD71,108 even as intracellular galectin-3 has the opposite effect, likely through protein-protein interactions mediated by the N-terminal domain.109 Galectin-3 activates the JAK-STAT pathway, likely via its N-terminal domain, as this activity cannot be inhibited by lactose.110 Through recognition of lactosamine-containing N-glycans, galectin-3 serves as a signal to the immune system that self-epitopes are present.111,112

Thirdly, galectin-3 can interact directly with pathogens and the microbiome.113 Through its recognition of self-like lactosamine-containing epitopes, galectin-3 can bind bacteria that express these epitopes as a mimicry mechanism.114 This allows galectin-3 to bind a variety of bacterial pathogens from K. pneumoniae to H. pylori.115 In the case of H. pylori, some studies have suggested a direct cytotoxic effect resulting from binding to Lewis antigens on the bacterial cell membrane.116 For LPS-displaying bacteria, galectin-3 can bind the LPS and reduce resulting inflammation.117 Even in cases where galectin-3 does not bind bacteria or pathogenic protists, as with Shigella and T. cruzi, it is associated with phagocytosis and vacuole lysis.118,119 Galectin-3 can also bind to some commensal microbiota such as B. longum, and it does not appear to harm these bacteria.120 In fact, as galectin-3 also binds strongly to intestinal mucins,121 it may serve to retain the commensal microbiota.

Figure 6: The galectin-glycoprotein lattice. Galectin dimers or oligomers (brick) bind lactosamine-containing glycans (yellow/blue) on glycoproteins (grey) with multiple N- or O-glycosylation sites. This draws the glycoproteins and galectins into an ordered arrangement, inducing clustering of the glycoproteins.

One question these studies leave open is quality control. In canonical protein export, secreted proteins are N-glycosylated in the ER and Golgi,122 and structurally defective proteins are routed to lectin chaperones for refolding.123 Like other galectins, human galectin-3 is translated on free polysomes in the cytoplasm75 and is not trafficked via the Golgi,124 so it cannot benefit from this quality control pathway. That said, export of misfolded galectin-3 variants could be highly deleterious, considering both its biological role and the fact that galectin-3 bears an unpaired cysteine that is buried when the protein is properly folded.97 For this reason, my work explored how W181 mutations that impair binding affect the export of galectin-3 from HEK-293 cells.

Galectin-3 has six known polymorphisms in humans that occur with a frequency of 0.1% or greater.125 A225 is replaced with aspartate or valine 46% of the time, though no study has yet identified any medical consequences of this variation. Likewise, no medical relevance has been found for the polymorphisms leading to the variations A53D, R183K, R212L, or R212Q. Polymorphisms of medical relevance are found at two sites. First, T98 is replaced by proline in 43% of humans; this is associated with a modestly increased rate of gastric cancer.126 Second, P64 is replaced with histidine 29% of the time; this polymorphism is associated with considerably worse cancer prognoses.127 The only known variation of W181 is a somatic mutation to arginine that was found in a gastric tumor.128

Conclusions and present work

To investigate the importance of CH-π interactions in carbohydrate binding sites of lectins, I first examined them in the simplest relevant context available: the interaction between a single monosaccharide and indole. I chose indole because it is the functional group of tryptophan, which in turn is heavily overrepresented in carbohydrate binding sites.43 In Chapter 2, I measured the strength of these interactions using a ¹H NMR-based assay, in which the change in chemical shift of glycoside protons involved in CH-π interactions indicates the likelihood that they stack on the aromatic ring.129,130 I repeated this process for the α- and β-anomers of mannose, glucose, and galactose, and for a Hammett series of substituted indoles.43,131

As the results of these experiments showed a strong electronic contribution to CH-π interactions in a biologically relevant small molecule context,43 I sought in Chapter 3 to examine the interaction in the context of a protein. After considering several β-galactose binding lectins, I chose to work with human galectin-3 due to its ease of bacterial expression,132 its medical relevance,133–136 and the availability of sub-Ångstrom crystal structures showing three β-galactose protons favorably oriented for a CH-π interaction with the invariant tryptophan residue W181.137,138 I generated a range of variants at W181 to test the importance of this CH-π interaction, measuring affinity by isothermal titration calorimetry with a procedure optimized for measuring low affinity interactions.139 I chose phenylalanine and tyrosine because they present a smaller aromatic π system than tryptophan, and to probe why phenylalanine is less overrepresented than tyrosine in carbohydrate binding sites.43 I chose histidine because it bears a particularly electron-poor π system that would have weaker electronic interactions with the β-galactose residue,43 yet is still aromatic. I chose methionine as a model non-aromatic residue, as modeling the mutation in PyMOL140 indicated better shape complementarity than any other non-aromatic residue. I generated an alanine variant but soon found that it was unfolded at room temperature. Finally, I cloned and expressed an arginine variant, as the W181R somatic mutation has been found in a gastric cancer,128 and I wished to understand the effect that this mutation would have on the folding and activity of the resulting protein. In addition to calorimetric assays, I used differential scanning fluorimetry141 to determine the denaturation temperature (Tm) of each variant as a measure of its stability. For better handling, I conducted these studies with a C-terminal construct that is commonly used for crystallography91,137,138 and is present biologically as an enzymatically produced competitive inhibitor of biological activities of galectin-3.111
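For readers unfamiliar with how a denaturation temperature is read from a thermal melt, the sketch below shows one common approach: taking Tm as the temperature at which the fluorescence signal rises most steeply. It is an illustration only and not necessarily the analysis used in this work. The melt curve is synthetic (an idealized two-state sigmoid), the Tm values are arbitrary rather than measured, and the function names `simulate_melt_curve` and `estimate_tm` are hypothetical.

```python
import math


def simulate_melt_curve(tm, slope=1.5, t_start=25.0, t_end=95.0, step=0.5):
    """Synthetic two-state unfolding curve (Boltzmann sigmoid), for illustration only."""
    n = int(round((t_end - t_start) / step)) + 1
    temps = [t_start + i * step for i in range(n)]
    signal = [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]
    return temps, signal


def estimate_tm(temps, signal):
    """Return the temperature at which the central-difference slope dF/dT is largest."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_slope:
            best_t, best_slope = temps[i], d
    return best_t


# Arbitrary example values, not data from this thesis.
for label, true_tm in [("hypothetical variant A", 55.0), ("hypothetical variant B", 48.3)]:
    temps, signal = simulate_melt_curve(true_tm)
    print(f"{label}: estimated Tm = {estimate_tm(temps, signal):.1f} C")
```

The estimate is limited by the temperature step of the scan; real data would also need smoothing or curve fitting before differentiation, which this sketch omits.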

With these biophysical data in hand, I then explored the importance of the galectin-3 CH-π interaction in a biological context in Chapter 4, in order to determine whether the CH-π interaction still plays a crucial role with a larger ligand and to highlight its importance to health. The first activity I chose to investigate was the well-established ability of galectin-3 to agglutinate red blood cells through binding to their Lewis antigens.142,143 I tested this activity for wild type and W181 variant galectin-3 through both a microscopy-based aggregation assay and a plate-based hemagglutination titer assay. Additionally, I sought to determine the role of the CH-π interaction in more recently described galectin-3 activities. To this end, I applied wild type and variant galectin-3 to a panel of mucins purified by the Ribbeck Group144 in a nitrocellulose dot-blot assay to determine if CH-π interactions are essential for binding to the lactosamine-containing O-glycans that mucins bear.121 Finally, I expressed wild type and variant galectin-3 in HEK-293 cells to determine if the CH-π interaction is necessary for export of the protein from a human cell.

While these results answer many questions about the nature of CH-π interactions and their role in carbohydrate binding by proteins, they also open up a number of new avenues for further study. Among these are three projects that extend beyond the scope of my doctoral research, and one that I have envisioned but not initiated. These projects are outlined in Chapter 5. First, the logical synthesis of my small molecule Hammett series and my mutagenesis of galectin-3 would be to use non-natural amino acid incorporation within the context of a lectin. While this is a more challenging effort than I had originally anticipated, my preparatory research and experience bear lessons for those who would bring such a project to completion. Second, varying groups on the carbohydrate to make it a stronger CH-π donor could extend the small molecule portion of my project. This could be further elaborated into developing stronger galectin inhibitors, as current inhibitors mainly vary outlying areas of the binding site rather than the β-galactose residue itself.145 Third, a future researcher could demonstrate the potency of CH-π interactions by conferring them upon a lectin that does not use them in nature. I have explored the possibility of doing so for cyanovirin-N, a lectin that can bind the high-mannose glycans of HIV protein gp120, but does so without recognizing the β-mannose residue at the core of the glycan that would be an excellent CH-π donor.146,147 Finally, the biological portion of the project could be extended to the tandem repeat galectins, namely galectins-4, -8, and -9. These galectins have noted antimicrobial activity,148 so assaying their binding to pathogens or isolated microbial glycans would shed light on the mechanism by which they operate. These projects offer many paths of investigation for future researchers.

References

(1) Varki, A. Evolutionary Forces Shaping the Golgi Glycosylation Machinery: Why Cell Surface Glycans Are Universal to Living Cells. Cold Spring Harb. Perspect. Biol. 2011, 3 (6), 1–14.
(2) Koch, B. E.; Stougaard, J.; Spaink, H. P. Keeping Track of the Growing Number of Biological Functions of Chitin and Its Interaction Partners in Biomedical Research. Glycobiology 2015, 25 (5), 469–482.
(3) Jayaraman, N. Multivalent Ligand Presentation as a Central Concept to Study Intricate Carbohydrate-Protein Interactions. Chem. Soc. Rev. 2009, 38 (12), 3463–3483.
(4) Zhuo, L.; Kanamori, A.; Kannagi, R.; Itano, N.; Wu, J.; Hamaguchi, M.; Ishiguro, N.; Kimata, K. SHAP Potentiates the CD44-Mediated Leukocyte Adhesion to the Hyaluronan Substratum. J. Biol. Chem. 2006, 281 (29), 20303–20314.
(5) Pipirou, Z.; Powlesland, A. S.; Steffen, I.; Pöhlmann, S.; Taylor, M. E.; Drickamer, K. Mouse LSECtin as a Model for a Human Ebola Virus Receptor. Glycobiology 2011, 21 (6), 806–812.
(6) Wang, S.-F.; Tsao, C.-H.; Lin, Y.-T.; Hsu, D. K.; Chiang, M.-L.; Lo, C.-H.; Chien, F.-C.; Chen, P.; Arthur Chen, Y.-M.; Chen, H.-Y.; et al. Galectin-3 Promotes HIV-1 Budding via Association with Alix and Gag P6. Glycobiology 2014, 24 (11), 1022–1035.
(7) Varki, A. Biological Roles of Glycans. Glycobiology 2017, 27 (1), 3–49.
(8) Muthana, S. M.; Xia, L.; Campbell, C. T.; Zhang, Y.; Gildersleeve, J. C. Competition between Serum IgG, IgM, and IgA Anti-Glycan Antibodies. PLoS One 2015, 10 (3), 1–17.
(9) Shilova, N.; Navakouski, M.; Khasbiullina, N.; Blixt, O.; Bovin, N. Printed Glycan Array: Antibodies as Probed in Undiluted Serum and Effects of Dilution. Glycoconj. J. 2012, 29 (2–3), 87–91.
(10) Bennett, N. R.; Jarvis, C. M.; Alam, M. M.; Zwick, D. B.; Olson, J. M.; Nguyen, H. V. T.; Johnson, J. A.; Cook, M. E.; Kiessling, L. L. Modular Polymer Antigens to Optimize Immunity. Biomacromolecules 2019, 20 (12), 4370–4379.
(11) Harris, S. L.; Craig, L.; Mehroke, J. S.; Rashed, M.; Zwick, M. B.; Kenar, K.; Toone, E. J.; Greenspan, N.; Auzanneau, F. I.; Marino-Albernas, J. R.; et al. Exploring the Basis of Peptide-Carbohydrate Crossreactivity: Evidence for Discrimination by Peptides between Closely Related Anti-Carbohydrate Antibodies. Proc. Natl. Acad. Sci. U. S. A. 1997, 94 (6), 2454–2459.
(12) Deng, S. J.; MacKenzie, C. R.; Sadowska, J.; Michniewicz, J.; Young, N. M.; Bundle, D. R.; Narang, S. A. Selection of Antibody Single-Chain Variable Fragments with Improved Carbohydrate Binding by Phage Display. J. Biol. Chem. 1994, 269 (13), 9533–9538.
(13) Cygler, M.; Rose, D. R.; Bundle, D. R. Recognition of a Cell-Surface Oligosaccharide of Pathogenic Salmonella by an Antibody Fab Fragment. Science 1991, 253 (5018), 442–445.
(14) Zhang, Y.; Campbell, C.; Li, Q.; Gildersleeve, J. C. Multidimensional Glycan Arrays for Enhanced Antibody Profiling. Mol. Biosyst. 2010, 6 (9), 1583–1591.
(15) Franceus, J.; Desmet, T. Sucrose Phosphorylase and Related Enzymes in Glycoside Hydrolase Family 13: Discovery, Application and Engineering. Int. J. Mol. Sci. 2020, 21 (7).
(16) Weis, W. I.; Drickamer, K. Structural Basis of Lectin-Carbohydrate Recognition. Annu. Rev. Biochem. 1996, 65, 441–473.
(17) Lairson, L. L.; Withers, S. G. Mechanistic Analogies amongst Carbohydrate Modifying Enzymes. Chem. Commun. 2004, No. 20, 2243–2248.
(18) Frandsen, K. E. H.; Simmons, T. J.; Dupree, P.; Poulsen, J. C. N.; Hemsworth, G. R.; Ciano, L.; Johnston, E. M.; Tovborg, M.; Johansen, K. S.; Von Freiesleben, P.; et al. The Molecular Basis of Polysaccharide Cleavage by Lytic Polysaccharide Monooxygenases. Nat. Chem. Biol. 2016, 12 (4), 298–303.
(19) Hettle, A.; Fillo, A.; Abe, K.; Massel, P.; Pluvinage, B.; Langelaan, D. N.; Smith, S. P.; Boraston, A. B. Properties of a Family 56 Carbohydrate-Binding Module and Its Role in the Recognition and Hydrolysis of β-1,3-Glucan. J. Biol. Chem. 2017, 292 (41), 16955–16968.
(20) Carey, D. J. Syndecans: Multifunctional Cell-Surface Co-Receptors. Biochem. J. 1997, 327 (1), 1–16.
(21) Esko, J. D.; Prestegard, J. H.; Linhardt, R. J. Proteins That Bind Sulfated Glycosaminoglycans. In Essentials of Glycobiology; Varki, A., Cummings, R. D., Esko, J. D., Stanley, P., Hart, G. W., Aebi, M., Darvill, A. G., Kinoshita, T., Packer, N. H., Prestegard, J. H., et al., Eds.; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 2015; pp 493–502.
(22) Capila, I.; Linhardt, R. J. Heparin - Protein Interactions. Angew. Chemie - Int. Ed. 2002, 41, 390–412.
(23) Andersson, E.; Rydengård, V.; Sonesson, A.; Mörgelin, M.; Björck, L.; Schmidtchen, A. Antimicrobial Activities of Heparin-Binding Peptides. Eur. J. Biochem. 2004, 271 (6), 1219–1226.
(24) Klim, J. R.; Li, L.; Wrighton, P. J.; Piekarczyk, M. S.; Kiessling, L. L. A Defined Glycosaminoglycan-Binding Substratum for Human Pluripotent Stem Cells. Nat. Methods 2010, 7 (12), 989–994.
(25) Showalter, A. M. Structure and Function of Plant Cell Wall Proteins. Plant Cell 1993, 5 (1), 9–23.
(26) Lis, H.; Sharon, N. Lectins: Carbohydrate-Specific Proteins That Mediate Cellular Recognition. Chem. Rev. 1998, 98 (2), 637–674.
(27) Polgár, J.; Clemetson, J. M.; Kehrel, B. E.; Wiedemann, M.; Magnenat, E. M.; Wells, T. N. C.; Clemetson, K. J. Platelet Activation and Signal Transduction by Convulxin, a C-Type Lectin from Crotalus Durissus Terrificus (Tropical Rattlesnake) Venom via the P62/GPVI Collagen Receptor. J. Biol. Chem. 1997, 272 (21), 13576–13583.
(28) MacKenzie, C. R.; Hirama, T.; Lee, K. K. Quantitative Analysis of Bacterial Toxin Affinity and Specificity for Glycolipid Receptors by Surface Plasmon Resonance. J. Biol. Chem. 1997, 272 (9), 5533–5538.
(29) Wangkanont, K.; Wesener, D. A.; Vidani, J. A.; Kiessling, L. L.; Forest, K. T. Structures of Xenopus Embryonic Epidermal Lectin Reveal a Conserved Mechanism of Microbial Glycan Recognition. J. Biol. Chem. 2016, 291 (11), 5596–5610.
(30) Taylor, M. E.; Drickamer, K.; Schnaar, R. L.; Etzler, M. E.; Varki, A. Discovery and Classification of Glycan-Binding Proteins. In Essentials of Glycobiology; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 2017.
(31) Biedermann, F.; Schneider, H. J. Experimental Binding Energies in Supramolecular Complexes. Chem. Rev. 2016, 116 (9), 5216–5300.
(32) Davis, A. P.; Wareham, R. S. Carbohydrate Recognition through Noncovalent Interactions: A Challenge for Biomimetic and Supramolecular Chemistry. Angew. Chemie - Int. Ed. 1999, 38 (20), 2978–2996.
(33) Boraston, A. B.; Nurizzo, D.; Notenboom, V.; Ducros, V.; Rose, D. R.; Kilburn, D. G.; Davies, G. J. Differential Oligosaccharide Recognition by Evolutionarily-Related β-1,4 and β-1,3 Glucan-Binding Modules. J. Mol. Biol. 2002, 319 (5), 1143–1156.
(34) Xie, H.; Bolam, D. N.; Nagy, T.; Szabó, L.; Cooper, A.; Simpson, P. J.; Lakey, J. H.; Williamson, M. P.; Gilbert, H. J. Role of Hydrogen Bonding in the Interaction between a Xylan Binding Module and Xylan. Biochemistry 2001, 40 (19), 5700–5707.
(35) Schwefel, D.; Maierhofer, C.; Beck, J. G.; Seeberger, S.; Diederichs, K.; Möller, H. M.; Welte, W.; Wittmann, V. Structural Basis of Multivalent Binding to Wheat Germ Agglutinin. J. Am. Chem. Soc. 2010, 132 (25), 8704–8719.
(36) Li, Z.; Lazaridis, T. The Effect of Water Displacement on Binding Thermodynamics: Concanavalin A. J. Phys. Chem. B 2005, 109 (1), 662–670.
(37) Botos, I.; Wlodawer, A. Proteins That Bind High-Mannose Sugars of the HIV Envelope; 2005; Vol. 88.
(38) Lee, R. T.; Hsu, T. L.; Huang, S. K.; Hsieh, S. L.; Wong, C. H.; Lee, Y. C. Survey of Immune-Related, Mannose/Fucose-Binding C-Type Lectin Receptors Reveals Widely Divergent Sugar-Binding Specificities. Glycobiology 2011, 21 (4), 512–520.
(39) Holla, A.; Skerra, A. Comparative Analysis Reveals Selective Recognition of Glycans by the Dendritic Cell Receptors DC-SIGN and Langerin. Protein Eng. Des. Sel. 2011, 24 (9), 659–669.
(40) Wesener, D. A.; Wangkanont, K.; McBride, R.; Song, X.; Kraft, M. B.; Hodges, H. L.; Zarling, L. C.; Splain, R. A.; Smith, D. F.; Cummings, R. D.; et al. Recognition of Microbial Glycans by Human Intelectin-1. Nat. Struct. Mol. Biol. 2015, 22 (8), 603–610.
(41) Sun, G.; Zhao, H.; Kalyanaraman, B.; Dahms, N. M. Identification of Residues Essential for Carbohydrate Recognition and Cation Dependence of the 46-KDa Mannose 6-Phosphate Receptor. Glycobiology 2005, 15 (11), 1136–1149.
(42) Esquivies, L.; Blackler, A.; Peran, M.; Rodriguez-Esteban, C.; Izpisua Belmonte, J. C.; Booker, E.; Gray, P. C.; Ahn, C.; Kwiatkowski, W.; Choe, S. Designer Nodal/BMP2 Chimeras Mimic Nodal Signaling, Promote Chondrogenesis, and Reveal a BMP2-like Structure. J. Biol. Chem. 2014, 289 (3), 1788–1797.
(43) Hudson, K. L.; Bartlett, G. J.; Diehl, R. C.; Agirre, J.; Gallagher, T.; Kiessling, L. L.; Woolfson, D. N. Carbohydrate-Aromatic Interactions in Proteins. J. Am. Chem. Soc. 2015, 137 (48), 15152–15160.
(44) Bernardi, A.; Arosio, D.; Potenza, D.; Sánchez-Medina, I.; Mari, S.; Cañada, F. J.; Jiménez-Barbero, J. Intramolecular Carbohydrate-Aromatic Interaction and Intermolecular van Der Waals Interactions Enhance the Molecular Recognition Ability of GM1 Glycomimetics for Cholera Toxin. Chem. - A Eur. J. 2004, 10 (18), 4395–4406. (45) Payne, C. M.; Bomble, Y. J.; Taylor, C. B.; McCabe, C.; Himmel, M. E.; Crowley, M. F.; Beckham, G. T. Multiple Functions of Aromatic-Carbohydrate Interactions in a Processive Cellulase Examined with Molecular Simulation. J. Biol. Chem. 2011, 286 (47), 41028– 41035. (46) Guce, A. I.; Clark, N. E.; Salgado, E. N.; Ivanen, D. R.; Kulminskaya, A. A.; Brumer, H.; Garman, S. C. Catalytic Mechanism of Human α-Galactosidase. J. Biol. Chem. 2010, 285 (6), 3625–3632. (47) Taylor, R.; Kennard, O. Crystallographic Evidence for the Existence of C-H⋯O, C-H⋯N, and C-H⋯Cl Hydrogen Bonds. J. Am. Chem. Soc. 1982, 104 (19), 5063–5070. (48) Pereira Silva, P. S.; Cardoso, C.; Ramos Silva, M.; Paixão, J. a.; Matos Beja, a.; Nogueira, F. Density Functional and X-Ray Diffraction Studies of Two Polymorphs of N,N′,N″- Triphenylguanidine. J. Mol. Struct. 2008, 888 (1–3), 92–98. (49) Nishio, M.; Umezawa, Y.; Hirota, M.; Takeuchi, Y. The CH/π Interaction: Significance in Molecular Recognition. Tetrahedron 1995, 51 (32), 8665–8701. (50) Samanta, U. .; Chakrabarti, P. .; Chandrasekhar, J. . Ab Initio Study of Energetics of X-H ‚‚‚ π ( X ) N , O , and C ) Interactions Involving a Heteroaromatic Ring. J. Phys. Chem. A 1998, 8964–8969. (51) Oki, M.; Takano, S.; Toyota, S. Benzene-Ethene Interactions as Studied by Ab Initio Calculations. Bull. Chem. Soc. Jpn. 2000, 73, 2221–2230. (52) Tsuzuki, S.; Honda, K.; Uchimaru, T.; Mikami, M.; Tanabe, K. The Interaction of Benzene with Chloro- and Fluoromethanes: Effects of Halogenation on CH/π Interaction. J. Phys. Chem. A 2002, 106 (17), 4423–4428. (53) Ams, M. R.; Fields, M.; Grabnic, T.; Janesko, B. 
G.; Zeller, M.; Sheridan, R.; Shay, A. Unraveling the Role of Alkyl F on CH-π Interactions and Uncovering the Tipping Point for Fluorophobicity. J. Org. Chem. 2015, 80 (15), 7764–7769. (54) Dey, R. C.; Seal, P.; Chakrabarti, S. CH/π Interaction in Benzene and Substituted Derivatives with Halomethane: A Combined Density Functional and Dispersion- Corrected Density Functional Study. J. Phys. Chem. A 2009, 113 (37), 10113–10118. (55) Ugozzoli, F.; Arduini, A.; Massera, C.; Pochini, A.; Secchi, A. CH/π Interaction between Benzene and Model Neutral Organic Molecules Bearing Acid CH Groups. New J. Chem. 2002, 26 (12), 1718–1723. (56) Sinnokrot, M. O.; Sherrill, C. D. Substituent Effects in π-π Interactions: Sandwich and t- Shaped Configurations. J. Am. Chem. Soc. 2004, 126 (24), 7690–7697. (57) Nishio, M. The CH/π Hydrogen Bond in Chemistry. Conformation, Supramolecules, Optical Resolution and Interactions Involving Carbohydrates. Phys. Chem. Chem. Phys. 2011, 13 (31), 13873–13900. (58) Lisbjerg, M.; Valkenier, H.; Jessen, B. M.; Al-Kerdi, H.; Davis, A. P.; Pittelkow, M. Biotin[6]Uril Esters: Chloride-Selective Transmembrane Anion Carriers Employing C-

28

H···anion Interactions. J. Am. Chem. Soc. 2015, 137 (15), 4948–4951. (59) Jiménez-Moreno, E.; Jiménez-Osés, G.; Gómez, A. M.; Santana, A. G.; Corzana, F.; Bastida, A.; Jiménez-Barbero, J.; Asensio, J. L. A Thorough Experimental Study of CH/π Interactions in Water: Quantitative Structure-Stability Relationships for Carbohydrate/Aromatic Complexes. Chem. Sci. 2015, 6 (11), 6076–6085. (60) Stanca-Kaposta, E. C.; Çarçabal, P.; Cocinero, E. J.; Hurtado, P.; Simons, J. P. Carbohydrate-Aromatic Interactions: Vibrational Spectroscopy and Structural Assignment of Isolated Monosaccharide Complexes with p-Hydroxy Toluene and N- Acetyl L-Tyrosine Methylamide. J. Phys. Chem. B 2013, 117 (27), 8135–8142. (61) Salonen, L. M.; Ellermann, M.; Diederich, F. Aromatic Rings in Chemical and Biological Recognition: Energetics and Structures. Angew. Chemie - Int. Ed. 2011, 50 (21), 4808– 4842. (62) Zhao, C.; Li, P.; Smith, M. D.; Pellechia, P. J.; Shimizu, K. D. Experimental Study of the Cooperativity of CH-π Interactions. Org. Lett. 2014, 16 (13), 3520–3523. (63) Baggioli, A.; Meille, S. V.; Raos, G.; Po, R.; Brinkmann, M.; Famulari, A. Intramolecular CH/π Interactions in Alkylaromatics: Monomer Conformations for Poly(3- Alkylthiophene) Atomistic Models. Int. J. Quantum Chem. 2013, 113 (18), 2154–2162. (64) Krenske, E. H.; Houk, K. N. Aromatic Interactions as Control Elements in Stereoselective Organic Reactions. Acc. Chem. Res. 2013, 46 (4), 979–989. (65) Ninković, D. B.; Vojislavljević-Vasilev, D. Z.; Medaković, V. B.; Hall, M. B.; Brothers, E. N.; Zarić, S. D. Aliphatic-Aromatic Stacking Interactions in Cyclohexane-Benzene Are Stronger than Aromatic-Aromatic Interaction in the Benzene Dimer. Phys. Chem. Chem. Phys. 2016, 18 (37), 25791–25795. (66) Chen, W.; Enck, S.; Price, J. L.; Powers, D. L.; Powers, E. T.; Wong, C.; Dyson, H. J.; Kelly, W. Structural and Energetic Basis of Carbohydrate−Aromatic Packing Interactions in Proteins. JACS 2013, 135, 9877–9884. (67) Hsu, C. 
H.; Park, S.; Mortenson, D. E.; Foley, B. L.; Wang, X.; Woods, R. J.; Case, D. A.; Powers, E. T.; Wong, C. H.; Dyson, H. J.; et al. The Dependence of Carbohydrate- Aromatic Interaction Strengths on the Structure of the Carbohydrate. J. Am. Chem. Soc. 2016, 138 (24), 7636–7648. (68) Bautista-Ibañez, L.; Ramírez-Gualito, K.; Quiroz-García, B.; Rojas-Aguilar, A.; Cuevas, G. Calorimetric Measurement of the CH/Pi Interaction Involved in the Molecular Recognition of Saccharides by Aromatic Compounds. J. Org. Chem. 2008, 73 (3), 849– 857. (69) Asensio, J. L.; Arda, A.; Canada, F. J.; Jiménez-Barbero, J. Carbohydrate-Aromatic Interactions. Acc. Chem. Res. 2013, 46 (4), 946–954. (70) Nishio, M.; Umezawa, Y.; Fantini, J.; Weiss, M. S.; Chakrabarti, P. CH-π Hydrogen Bonds in Biological Macromolecules. Phys. Chem. Chem. Phys. 2014, 16 (25), 12648–12683. (71) Tsuzuki, S.; Uchimaru, T.; Mikami, M. Magnitude and Nature of Carbohydrate-Aromatic Interactions in Fucose-Phenol and Fucose-Indole Complexes: CCSD(T) Level Interaction Energy Calculations. J. Phys. Chem. A 2011, 115 (41), 11256–11262. (72) Houser, J.; Kozmon, S.; Mishra, D.; Hammerová, Z.; Wimmerová, M.; Koča, J. The CH–π

29

Interaction in Protein–Carbohydrate Binding: Bioinformatics and In Vitro Quantification. Chem. - A Eur. J. 2020, 26 (47), 10769–10780. (73) Barondes, S. H. Stumbling on Galectins. In Galectins; 2008; pp 1–8. (74) Johannes, L.; Jacob, R.; Leffler, H. Galectins at a Glance. J. Cell Sci. 2018, 131 (9), 1–9. (75) Cummings, R. D.; Liu, F.-T.; Vasta, G. R. Galectins. In Essentials of Glycobiology; Varki, A., Cummings, R. D., Esko, J. D., Stanley, P., Hart, G. W., Aebi, M., Darvill, A. G., Kinoshita, T., Packer, N. H., Prestegard, J. H., et al., Eds.; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 2015; pp 469–480. (76) Klyosov, A. a. Galectins and Their Functions in Plain Language. In Galectins; 2008; pp 9–31. (77) Nesmelova, I. V.; Dings, R. P. M.; Mayo, K. H. Understanding Galectin Structure- Function Relationships to Design Effective Antagonists. In Galectins; Klyosov, A. a., Witczak, Z. J., Platt, D., Eds.; John Wiley & Sons, Inc.: Hoboken, NJ, 2008; pp 33–69. (78) Troncoso, M. F.; Ferragut, F.; Bacigalupo, M. L.; Cardenas Delgado, V. M.; Nugnes, L. G.; Gentilini, L.; Laderach, D.; Wolfenstein-Todel, C.; Compagno, D.; Rabinovich, G. a.; et al. Galectin-8: A Matricellular Lectin with Key Roles in Angiogenesis. Glycobiology 2014, 24 (10), 907–914. (79) Cagnoni, A. J.; Troncoso, M. F.; Rabinovich, G. A.; Mariño, K. V.; Elola, M. T. Full-Length Galectin-8 and Separate Carbohydrate Recognition Domains: The Whole Is Greater than the Sum of Its Parts? Biochem. Soc. Trans. 2020, 48 (3), 1255–1268. (80) Pugliese, G.; Iacobini, C.; Pesce, C. M.; Menini, S. Galectin-3: An Emerging All-out Player in Metabolic Disorders and Their Complications. Glycobiology 2015, 25 (2), 136–150. (81) Diehl, C.; Genheden, S.; Modig, K.; Ryde, U.; Akke, M. Conformational Entropy Changes upon Lactose Binding to the Carbohydrate Recognition Domain of Galectin-3. J. Biomol. NMR 2009, 45 (1–2), 157–169. (82) Lepur, A.; Salomonsson, E.; Nilsson, U. J.; Leffler, H. 
Ligand Induced Galectin-3 Protein Self-Association. J. Biol. Chem. 2012, 287 (26), 21751–21756. (83) Sato, S.; Rabinovich, G. a. Galectins As Danger Signals in Host-Pathogen and Host- Tumor Interactions: New Members of the Growing Group of “Alarmins”? In Galectins; 2008; pp 115–146. (84) Salomonsson, E.; Carlsson, M. C.; Osla, V.; Hendus-Altenburger, R.; Kahl-Knutson, B.; Oberg, C. T.; Sundin, A.; Nilsson, R.; Nordberg-Karlsson, E.; Nilsson, U. J.; et al. Mutational Tuning of Galectin-3 Specificity and Biological Function. J. Biol. Chem. 2010, 285 (45), 35079–35091. (85) Cumpstey, I.; Salomonsson, E.; Sundin, A.; Leffler, H.; Nilsson, U. J. Double Affinity Amplification of Galectin-Ligand Interactions through Arginine-Arene Interactions: Synthetic, Thermodynamic, and Computational Studies with Aromatic Diamido Thiodigalactosides. Chemistry 2008, 14 (14), 4233–4245. (86) Bachhawat-Sikder, K.; Thomas, C. J.; Surolia, A. Thermodynamic Analysis of the Binding of Galactose and Poly-N-Acetyllactosamine Derivatives to Human Galectin-3. FEBS Lett. 2001, 500 (1–2), 75–79. (87) Šimonová, A.; Kupper, C. E.; Böcker, S.; Müller, A.; Hofbauerová, K.; Pelantová, H.;

30

Elling, L.; Křen, V.; Bojarová, P. Chemo-Enzymatic Synthesis of LacdiNAc Dimers of Varying Length as Novel Galectin Ligands. J. Mol. Catal. B Enzym. 2014, 101, 47–55. (88) Gabius, H.-J.; Wu, A. M. Galectins As Regulators of Tumor Growth and Invasion by Targeting Distinct Cell Surface Glycans and Implications for Drug Design. In Galectins; 2008; pp 71–85. (89) Nesmelova, I. V.; Dings, R. P. M.; Mayo, K. H. Understanding Galectin Structure- Function Relationships to Design Effective Antagonists. In Galectins; 2008; pp 33–69. (90) Sörme, P.; Qian, Y.; Nyholm, P. G.; Leffler, H.; Nilsson, U. J. Low Micromolar Inhibitors of Galectin-3 Based on 3′-Derivatization of N-Acetyllactosamine. ChemBioChem 2002, 3 (2–3), 183–189. (91) Bum-Erdene, K.; Gagarinov, I. a; Collins, P. M.; Winger, M.; Pearson, A. G.; Wilson, J. C.; Leffler, H.; Nilsson, U. J.; Grice, I. D.; Blanchard, H. Investigation into the Feasibility of Thioditaloside as a Novel Scaffold for Galectin-3-Specific Inhibitors. Chembiochem 2013, 14 (11), 1331–1342. (92) Lundquist, J. J.; Kiburz, B. M.; Wu, J. K.; Gibbs, K. D.; Toone, E. J. Towards High Affinity Carbohydrate-Binding Proteins: Directed Evolution of Murine Galectin-3. Can. J. Chem. 2002, 80 (8), 999–1009. (93) Salameh, B. a.; Cumpstey, I.; Sundin, A.; Leffler, H.; Nilsson, U. J. 1H-1,2,3-Triazol-1-Yl Thiodigalactoside Derivatives as High Affinity Galectin-3 Inhibitors. Bioorganic Med. Chem. 2010, 18 (14), 5367–5378. (94) Funasaka, T.; Raz, a.; Nangia-Makker, P. Galectin-3 in Angiogenesis and Metastasis. Glycobiology 2014, 24 (10), 886–891. (95) Saravanan, C.; Liu, F. T.; Gipson, I. K.; Panjwani, N. Galectin-3 Promotes Lamellipodia Formation in Epithelial Cells by Interacting with Complex N-Glycans on Α3β1 Integrin. J. Cell Sci. 2009, 122 (20), 3684–3693. (96) Friedrichs, J.; Torkko, J. M.; Helenius, J.; Teräväinen, T. P.; Füllekrug, J.; Muller, D. J.; Simons, K.; Manninen, A. 
Contributions of Galectin-3 and -9 to Epithelial Cell Adhesion Analyzed by Single Cell Force Spectroscopy. J. Biol. Chem. 2007, 282 (40), 29375– 29383. (97) Woo, H. J.; Lotz, M. M.; Jung, J. U.; Mercurio, A. M. Carbohydrate-Binding Protein 35 (Mac-2), a Laminin-Binding Lectin, Forms Functional Dimers Using Cysteine 186. J. Biol. Chem. 1991, 266 (28), 18419–18422. (98) Hikita, C.; Vijayakumar, S.; Takito, J.; Erdjument-Bromage, H.; Tempst, P.; Al-Awqati, Q. Induction of Terminal Differentiation in Epithelial Cells Requires Polymerization of Hensin by Galectin 3. J. Cell Biol. 2000, 151 (6), 1235–1246. (99) Lagana, A.; Goetz, J. G.; Cheung, P.; Raz, A.; Dennis, J. W.; Nabi, I. R. Galectin Binding to Mgat5-Modified N-Glycans Regulates Fibronectin Matrix Remodeling in Tumor Cells. Mol. Cell. Biol. 2006, 26 (8), 3181–3193. (100) Carlsson, M. C.; Bengtson, P.; Cucak, H.; Leffler, H. Galectin-3 Guides Intracellular Trafficking of Some Human Serotransferrin Glycoforms. J. Biol. Chem. 2013, 288 (39), 28398–28408. (101) Nangia-Makker, P.; Honjo, Y.; Sarvis, R.; Akahani, S.; Hogan, V.; Pienta, K. J.; Raz, A.

Galectin-3 Induces Endothelial Cell Morphogenesis and Angiogenesis. Am. J. Pathol. 2000, 156 (3), 899–909. (102) Yu, L. G.; Andrews, N.; Zhao, Q.; McKean, D.; Williams, J. F.; Connor, L. J.; Gerasimenko, O. V.; Hilkens, J.; Hirabayashi, J.; Kasai, K.; et al. Galectin-3 Interaction with Thomsen-Friedenreich Disaccharide on Cancer-Associated MUC1 Causes Increased Cancer Cell Endothelial Adhesion. J. Biol. Chem. 2007, 282 (1), 773–781. (103) Chen, C.; Duckworth, C. A.; Zhao, Q.; Pritchard, D. M.; Rhodes, J. M.; Yu, L. G. Increased Circulation of Galectin-3 in Cancer Induces Secretion of Metastasis-Promoting Cytokines from Blood Vascular Endothelium. Clin. Cancer Res. 2013, 19 (7), 1693–1704. (104) Barrow, H.; Guo, X.; Wandall, H. H.; Pedersen, J. W.; Fu, B.; Zhao, Q.; Chen, C.; Rhodes, J. M.; Yu, L. G. Serum Galectin-2, -4, and -8 Are Greatly Increased in Colon and Breast Cancer Patients and Promote Cancer Cell Adhesion to Blood Vascular Endothelium. Clin. Cancer Res. 2011, 17 (22), 7035–7046. (105) Thiemann, S.; Baum, L. G. Galectins and Immune Responses-Just How Do They Do Those Things They Do? Annu. Rev. Immunol. 2016, 34, 243–264. (106) Rabinovich, G. A.; Toscano, M. A. Turning “sweet” on Immunity: Galectin-Glycan Interactions in Immune Tolerance and Inflammation. Nat. Rev. Immunol. 2009, 9 (5), 338–352. (107) Demotte, N.; Stroobant, V.; Courtoy, P. J.; Van Der Smissen, P.; Colau, D.; Luescher, I. F.; Hivroz, C.; Nicaise, J.; Squifflet, J. L.; Mourad, M.; et al. Restoring the Association of the T Cell Receptor with CD8 Reverses Anergy in Human Tumor-Infiltrating Lymphocytes. Immunity 2008, 28 (3), 414–424. (108) Stillman, B. N.; Hsu, D. K.; Pang, M.; Brewer, C. F.; Johnson, P.; Liu, F.-T.; Baum, L. G. Galectin-3 and Galectin-1 Bind Distinct Cell Surface Glycoprotein Receptors to Induce T Cell Death. J. Immunol. 2006, 176 (2), 778–789. (109) Matarrese, P.; Tinari, N.; Semeraro, M. L.; Natoli, C.; Iacobelli, S.; Malorni, W. 
Galectin-3 Overexpression Protects from Cell Damage and Death by Influencing Mitochondrial Homeostasis. FEBS Lett. 2000, 473 (3), 311–315. (110) Jeon, S.-B.; Yoon, H. J.; Chang, C. Y.; Koh, H. S.; Jeon, S.-H.; Park, E. J. Galectin-3 Exerts Cytokine-Like Regulatory Actions through the JAK–STAT Pathway. J. Immunol. 2010, 185 (11), 7037–7046. (111) Sato, S.; Nieminen, J. Seeing Strangers or Announcing “Danger”: Galectin-3 in Two Models of Innate Immunity. Glycoconj. J. 2004, 19 (7–9), 583–591. (112) Varki, A. Letter to the Glyco-Forum: Since There Are PAMPs and DAMPs, There Must Be SAMPs? Glycan “Self-Associated Molecular Patterns” Dampen Innate Immunity, but Pathogens Can Mimic Them. Glycobiology 2011, 21 (9), 1121–1124. (113) Baum, L. G.; Garner, O. B.; Schaefer, K.; Lee, B. Microbe-Host Interactions Are Positively and Negatively Regulated by Galectin-Glycan Interactions. Front. Immunol. 2014, 5 (JUN), 1–8. (114) Mandrell, R. N.; Apicella, M. A.; Lindstedt, R.; Leffler, H. Possible Interaction between Animal Lectins and Bacterial Carbohydrates. In Methods in enzymology; 1994; Vol. 236, pp 231–254.

(115) Chen, H. Y.; Weng, I. C.; Hong, M. H.; Liu, F. T. Galectins as Bacterial Sensors in the Host Innate Response. Curr. Opin. Microbiol. 2014, 17 (1), 75–81. (116) Park, A. M.; Hagiwara, S.; Hsu, D. K.; Liu, F. T.; Yoshie, O. Galectin-3 Plays an Important Role in Innate Immunity to Gastric Infection by Helicobacter Pylori. Infect. Immun. 2016, 84 (4), 1184–1193. (117) Li, Y.; Komai-Koma, M.; Gilchrist, D. S.; Hsu, D. K.; Liu, F.-T.; Springall, T.; Xu, D. Galectin-3 Is a Negative Regulator of Lipopolysaccharide-Mediated Inflammation. J. Immunol. 2008, 181 (4), 2781–2789. (118) Paz, I.; Sachse, M.; Dupont, N.; Mounier, J.; Cederfur, C.; Enninga, J.; Leffler, H.; Poirier, F.; Prevost, M. C.; Lafont, F.; et al. Galectin-3, a Marker for Vacuole Lysis by Invasive Pathogens. Cell. Microbiol. 2010, 12 (4), 530–544. (119) Machado, F. C.; Cruz, L.; Da Silva, A. A.; Cruz, M. C.; Mortara, R. A.; Roque-Barreira, M. C.; Da Silva, C. V. Recruitment of Galectin-3 during Cell Invasion and Intracellular Trafficking of Trypanosoma Cruzi Extracellular Amastigotes. Glycobiology 2014, 24 (2), 179–184. (120) Kavanaugh, D.; Kane, M.; Joshi, L.; Hickey, R. M. Detection of Galectin-3 Interaction with Commensal Bacteria. Appl. Environ. Microbiol. 2013, 79 (11), 3507–3510. (121) Leclaire, C.; Lecointe, K.; Gunning, Patrick A. Tribolo, S.; Wittmann, Alexandra Latousakis, Dimitrios MacKenizie, Donald A. Kawasaki, N.; Juge, N. Molecular Basis for Intestinal Mucin Recognition by Galectin-3 and C-Type Lectins. FASEB J. 2018, 32 (6), 3301–3320. (122) Helenius, A.; Aebi, M. Roles of N-Linked Glycans in the Endoplasmic Reticulum. Annu. Rev. Biochem. 2004, 73, 1019–1049. (123) Hebert, D. N.; Lamriben, L.; Powers, E. T.; Kelly, J. W. The Intrinsic and Extrinsic Effects of N-Linked Glycans on Glycoproteostasis. Nat. Chem. Biol. 2014, 10 (11), 902–910. (124) Bänfer, S.; Schneider, D.; Dewes, J.; Strauss, M. T.; Freibert, S. A.; Heimerl, T.; Maier, U. G.; Elsässer, H. P.; Jungmann, R.; Jacob, R. 
Molecular Mechanism to Recruit Galectin-3 into Multivesicular Bodies for Polarized Exosomal Secretion. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (19), E4396–E4405. (125) Yates, A. D.; Achuthan, P.; Akanni, W.; Allen, J.; Allen, J.; Alvarez-Jarreta, J.; Amode, M. R.; Armean, I. M.; Azov, A. G.; Bennett, R.; et al. Ensembl 2020 https://www.ensembl.org/Homo_sapiens/Gene/Variation_Gene/Table?db=core;g=ENS G00000131981;r=14:55124110-55145423 (accessed Dec 18, 2020). (126) Shi, Y.; Lin, X.; Chen, G.; Yan, J.; Ying, M.; Zheng, X. Galectin-3 Rs4652 A>C Polymorphism Is Associated with the Risk of Gastric Carcinoma and P-Glycoprotein Expression Level. Oncol. Lett. 2017, 14 (6), 8144–8149. (127) Furuya, T. K.; Jacob, C. E.; Tomitão, M. T. P.; Camacho, L. C. C.; Ramos, M. F. K. P.; Eluf- Neto, J.; Alves, V. A. F.; Zilberstein, B.; Cecconello, I.; Ribeiro, U.; et al. Association between Polymorphisms in Inflammatory Response-Related Genes and the Susceptibility, Progression and Prognosis of the Diffuse Histological Subtype of Gastric Cancer. Genes (Basel). 2018, 9 (12), 1–22. (128) Tate, J. G.; Bamford, S.; Jubb, H. C.; Sondka, Z.; Beare, D. M.; Bindal, N.; Boutselakis, H.; Cole, C. G.; Creatore, C.; Dawson, E.; et al. COSMIC: The Catalogue Of Somatic Mutations

In Cancer. Nucleic Acids Res. 2019, 47 (D1), D941–D947. (129) del Carmen Fernández-Alonso, M.; Cañada, F. J.; Jiménez-Barbero, J.; Cuevas, G. Molecular Recognition of Saccharides by Proteins. Insights on the Origin of the Carbohydrate-Aromatic Interactions. J. Am. Chem. Soc. 2005, 127 (20), 7379–7386. (130) Vandenbussche, S.; Díaz, D.; Fernández-Alonso, M. C.; Pan, W.; Vincent, S. P.; Cuevas, G.; Cañada, F. J.; Jiménez-Barbero, J.; Bartik, K. Aromatic-Carbohydrate Interactions: An NMR and Computational Study of Model Systems. Chemistry 2008, 14 (25), 7570– 7578. (131) Hammett, L. P. The Effect of Structure upon the Reactions of Organic Compounds. Benzene Derivatives. J. Am. Chem. Soc. 1937, 59 (1), 96–103. (132) Seetharaman, J.; Kanigsberg, A.; Slaaby, R.; Leffler, H.; Barondes, S. H.; Rini, J. M. X- Ray Crystal Structure of the Human Galectin-3 Carbohydrate Recognition Domain at 2.1- A Resolution. J. Biol. Chem. 1998. (133) Radosavljevic, G.; Volarevic, V.; Jovanovic, I.; Milovanovic, M.; Pejnovic, N.; Arsenijevic, N.; Hsu, D. K.; Lukic, M. L. The Roles of Galectin-3 in Autoimmunity and Tumor Progression. Immunol. Res. 2012, 52 (1–2), 100–110. (134) Ochieng, J.; Green, B.; Evans, S.; James, O.; Warfield, P. Modulation of the Biological Functions of Galectin-3 by Matrix Metalloproteinases. Biochim. Biophys. Acta - Gen. Subj. 1998. (135) Sarter, K.; Janko, C.; André, S.; Muñoz, L. E.; Schorn, C.; Winkler, S.; Rech, J.; Kaltner, H.; Lorenz, H. M.; Schiller, M.; et al. Autoantibodies against Galectins Are Associated with Antiphospholipid Syndrome in Patients with Systemic Lupus Erythematosus. Glycobiology 2013, 23 (1), 12–22. (136) Wang, Y.; Balan, V.; Raz, A. Galectin-3 and Cancer. In Introduction to Animal Lectins; Vasta, G. R., Ahmed, H., Eds.; CRC press: Boca Raton, FL, 2009; pp 195–206. (137) Collins, P. M.; Hidari, K. I. P. J.; Blanchard, H. 
Slow Diffusion of Lactose out of Galectin- 3 Crystals Monitored by X-Ray Crystallography: Possible Implications for Ligand- Exchange Protocols. Acta Crystallogr. D. Biol. Crystallogr. 2007, 63 (Pt 3), 415–419. (138) Saraboji, K.; Hakansson,̊ M.; Genheden, S. The Carbohydrate-Binding Site in Galectin-3 Is Preorganized to Recognize a Sugarlike Framework of Oxygens: Ultra-High-Resolution Structures and Water Dynamics. Biochemistry 2012, 51, 296–306. (139) Turnbull, W. B.; Daranas, A. H. On the Value of c: Can Low Affinity Systems Be Studied by Isothermal Titration Calorimetry? J. Am. Chem. Soc. 2003, 125 (48), 14859–14866. (140) DeLano, W. L. Pymol: An Open Source Molecular Graphics Tool. CCP4 Newsl. Protein Crystallogr. 2002, No. 40, 82–92. (141) Niesen, F. H.; Berglund, H.; Vedadi, M. The Use of Differential Scanning Fluorimetry to Detect Ligand Interactions That Promote Protein Stability. Nat. Protoc. 2007, 2 (9), 2212–2221. (142) Liu, F. T.; Hsu, D. K.; Zuberi, R. I.; Hill, P. N.; Shenhav, A.; Kuwabara, I.; Chen, S. S. Modulation of Functional Properties of Galectin-3 by Monoclonal Antibodies Binding to the Non-Lectin Domains. Biochemistry 1996, 35 (19), 6073–6079. (143) Denecke, J.; Kranz, C.; Nimtz, M.; Conradt, H. S.; Brune, T.; Heimpel, H.; Marquardt, T.

Characterization of the N-Glycosylation Phenotype of Erythrocyte Membrane Proteins in Congenital Dyserythropoietic Anemia Type II (CDA II/HEMPAS). Glycoconj. J. 2008, 25 (4), 375–382. (144) Lieleg, O.; Lieleg, C.; Bloom, J.; Buck, C. B.; Ribbeck, K. Mucin Biopolymers as Broad-Spectrum Antiviral Agents. Biomacromolecules 2012, 13 (6), 1724–1732. (145) Salameh, B. A.; Leffler, H.; Nilsson, U. J. 3-(1,2,3-Triazol-1-Yl)-1-Thio-Galactosides as Small, Efficient, and Hydrolytically Stable Inhibitors of Galectin-3. Bioorganic Med. Chem. Lett. 2005, 15 (14), 3344–3346. (146) Francois, K. O.; Balzarini, J. Potential of Carbohydrate-Binding Agents As Therapeutics Against Enveloped Viruses. Med. Res. Rev. 2012, 32 (2), 349–387. (147) Balzarini, J. Inhibition of HIV Entry by Carbohydrate-Binding Proteins. Antiviral Res. 2006, 71 (2-3 SPEC. ISS.), 237–247. (148) Stowell, S. R.; Arthur, C. M.; Dias-Baruffi, M.; Rodrigues, L. C.; Gourdine, P.; Heimburg-Molinaro, J.; Ju, T.; Molinaro, R. J.; Xia, B.; Smith, D. F.; et al. Innate Immune Lectins Kill Bacteria Expressing Blood Group Antigen. Nat. Med. 2010, 16 (3), 295–301.

Chapter 2: Electronic nature of CH-π interactions

This chapter is reproduced in part with permission from Hudson, K. L.; Bartlett, G. J.; Diehl, R. C.; Agirre, J.; Gallagher, T.; Kiessling, L. L.; Woolfson, D. N. Carbohydrate-Aromatic Interactions in Proteins. J. Am. Chem. Soc. 2015, 137 (48), 15152–15160. Copyright 2015 American Chemical Society.

Background and significance

Carbohydrate-protein interactions pervade every aspect of biology, from embryonic development1 to immune responses to pathogens2 and cancer.3 However, these interactions must contend with the fact that carbohydrates are often poor ligands due to their hydrophilic nature. Additionally, selectivity is crucial, as differences as small as the position of the 4-hydroxyl group, which is equatorial in glucose and axial in galactose, can have profound biological consequences.

For this reason, it is crucial to elucidate the physical forces involved in protein-carbohydrate interactions with respect to their effects on affinity and specificity. Given that carbohydrates have a large number of hydrogen bonding groups, satisfying all hydrogen bonds is essential if an interaction is to take place; failure to do so imposes an often prohibitive penalty of 2-4 kcal/mol of binding energy.4 This can be an important source of specificity in carbohydrate-binding interactions, as maintaining a geometrically fixed array of hydrogen bond donors and acceptors allows a carbohydrate-binding protein to prevent non-ligand carbohydrates from binding. To this end, hydrogen-bonding residues in carbohydrate-binding sites are generally residues with sp2-hybridized and bifurcated donors and acceptors,5 such that they are preorganized to satisfy only the hydrogen bonding pattern of the ligand, while excluding non-ligand carbohydrates.

This preorganization also means that when the carbohydrate ligand is not in the binding site, water molecules are present instead. The displacement of this water results in a favorable contribution to the enthalpy of binding, balancing the entropic penalty incurred by restricting the motion of the ligand. In some carbohydrate-binding proteins, especially lectins, this can be an important source of binding energy. Lectins that bind negatively charged carbohydrates, such as glycosaminoglycans or sialic acid-containing motifs, generally utilize salt bridges, whether with cationic residues on the protein or with bound metal ions, as in the case of the C-type lectins. Harder cations like Ca2+ are more commonly found than softer cations like Cu+, as the anionic nature of the carbohydrate generally arises from hard oxyanions such as carboxylates, phosphates, and sulfates. These interactions can be highly beneficial to binding, though they are much stronger if salt concentrations are low in the solution where binding takes place.6 These charged groups might also be involved in binding the hydroxyl groups of neutral carbohydrates via charge-dipole interactions.

An important note about carbohydrate-protein interactions is that they generally occur in an aqueous environment. As such, binding energy can only be obtained from interactions that are weaker or unavailable in bulk water. This generally means that while satisfying hydrogen bonds is essential in a carbohydrate-protein interaction, it does not provide a large binding energy contribution. Interactions that are unavailable in aqueous solution therefore gain special salience in carbohydrate-protein recognition. Most notable among these is the CH-π interaction, a stacking interaction that occurs between an aliphatic CH bond and a π system. Aromatic π systems and highly polarized CH bonds, in which the carbon has nearby electron-withdrawing groups, are especially favorable.7,8 Such is the case in many carbohydrate binding sites, which often feature tyrosine or tryptophan in a face-on orientation to the aliphatic-rich α-face of β-linked saccharide residues, especially β-galactose.9

Experimental Section

For NMR experiments, indole, 5-substituted indoles, and deuterium oxide were obtained from Sigma-Aldrich and TCI. 4,4-Dimethyl-4-silapentane-1-sulfonic acid (DSS) was obtained from Uvasol. Glycosides (other than methyl-β-D-mannopyranoside, synthesis below) were obtained from Pfanstiehl, Sigma-Aldrich, and TCI America. All chemicals were of at least 97% purity. Solutions were prepared on a weight per volume basis. Proton NMR spectra were acquired in D2O on a Bruker Avance-500 500 MHz spectrometer with a DCH cryoprobe. Experiments used a spectral window from 11 to −1 ppm, a 4 s acquisition time, a 2 s relaxation delay, and 64 scans. NMR experiments with a relaxation delay of 15 s were run to verify indole concentration. The shift of the trimethyl peak of DSS was normalized to δDSS = 0 ppm. For the data points shown, three series of experiments were conducted at the same glycoside and indole concentrations: indole only, glycoside only, and mixed samples. The chemical shifts of glycoside protons were averaged over three replicates, and the chemical-shift perturbations were reported as Δδ = δindole − δindole‑free.

Analytical thin-layer chromatography (TLC) was carried out on Silicycle (Quebec) TLC plates precoated with silica gel 60 F254 (250 μm layer thickness). Analyte visualization was accomplished using a UV lamp and by charring with phosphomolybdic acid solution. Flash column chromatography was performed with Silicycle flash silica gel (40–63 μm, 60 Å pore size) using ACS grade methanol and CH2Cl2. 1H and 13C nuclear magnetic resonance (NMR) spectra were recorded on a Bruker Avance 500 MHz spectrometer (acquired at 500 MHz for 1H NMR and 125 MHz for 13C NMR), with COSY and HSQC spectra used for assignments. Chemical shifts are reported relative to 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) as an internal reference standard, in parts per million ((Me)3SiR: 1H, 0.0 ppm, 13C, 0.0 ppm). Peak multiplicity is reported as singlet (s), doublet (d), triplet (t), quartet (q), or some combination of these. High-resolution mass spectra (HRMS) were obtained on an electrospray ionization-time of flight (ESI-TOF) mass spectrometer. Methyl-β-D-mannopyranoside 2 was obtained by catalytic hydrogenation of methyl 3,4,6-tri-O-benzyl-β-D-mannopyranoside 1, which was synthesized according to literature procedures.10,11

Scheme 1: Deprotection of methyl 3,4,6-tri-O-benzyl-β-D-mannopyranoside

To a stirred solution of methyl 3,4,6-tri-O-benzyl-β-D-mannopyranoside 1 (123 mg, 0.27 mmol) in EtOH (2.5 mL) was added 10% Pd on carbon (13 mg, 0.12 mmol). The reaction vessel was purged with N2 and then equipped with an H2-filled balloon. The reaction mixture was stirred at room temperature until TLC indicated complete consumption of the starting material (16 h, Rf = 0.15 EtOAc:n-hexane 2:3) and subsequently filtered through a plug of Celite with EtOH. The filtrate was concentrated under reduced pressure and the resulting residue was purified by silica gel column chromatography (MeOH:CH2Cl2 1:9 → 1:4), then lyophilized to yield methyl-β-D-mannopyranoside 2 as a white foam (45 mg, 0.23 mmol, 88%): Rf = 0.20 (MeOH:CH2Cl2 1:4); 1H NMR (500 MHz, D2O) δ 4.57 (s, H1), 3.98 (d, J = 3.2 Hz, H2), 3.93 (dd, J = 12.2, 2.2 Hz, H6a), 3.73 (dd, J = 12.2, 6.6 Hz, H6b), 3.63 (dd, J = 9.6, 3.3 Hz, H3), 3.56 (d, J = 9.7 Hz, H4), 3.54 (s, Me), 3.37 (ddd, J = 9.3, 6.5, 2.3 Hz, H5) ppm; 13C NMR (125 MHz, D2O) δ 101.04 (C1), 76.29 (C5), 72.98 (C3), 70.32 (C2), 66.91 (C4), 61.11 (C6), 56.83 (Me) ppm; HRMS (ESI-TOF+) calculated for C7H18NO6 (M + NH4+) 212.1129, found 212.1124.
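The calculated m/z reported above can be checked from the elemental composition of the ammonium adduct. The following is a minimal sketch rather than part of the original analysis: the function names are hypothetical, and the monoisotopic masses are standard reference values rounded to six decimal places.

```python
# Monoisotopic masses (u) of the most abundant isotopes (standard reference
# values, rounded). The helper names below are hypothetical.
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}
ELECTRON_MASS = 0.000549


def cation_mz(formula, charge=1):
    """Return m/z for a cation whose full composition is {element: count}."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    # A cation has lost `charge` electrons relative to the summed atomic masses.
    return (neutral - charge * ELECTRON_MASS) / charge


def ppm_error(found, calculated):
    """Signed mass error in parts per million."""
    return (found - calculated) / calculated * 1e6


# C7H18NO6 is the [M + NH4]+ adduct of methyl mannopyranoside (C7H14O6).
adduct = {"C": 7, "H": 18, "N": 1, "O": 6}
calculated = cation_mz(adduct)
print(f"calculated m/z = {calculated:.4f}")
print(f"mass error     = {ppm_error(212.1124, calculated):.1f} ppm")
```

Under these assumptions the computed value agrees with the calculated m/z quoted above to four decimal places, and the found value lies within a few ppm of it.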

1H NMR assay shows CH-π interactions in aqueous solution

The exact strength and properties of this interaction are poorly understood. Many previous studies, both computational and experimental, have had considerable limitations in modeling the interaction in a biologically relevant context, whether due to the complexities of a synthetic system or to limits on the modeling of solvent effects and dispersive forces in computational systems.12 What I sought was a system that was both easily controllable and allowed most of the effects involved in a protein-carbohydrate interaction to take place. Drawing inspiration from Fernandez-Alonso et al.,13 I chose to examine the interaction of indole, the side-chain functional group of tryptophan, with methyl glycosides by 1H NMR in deuterium oxide.

This assay utilizes a property known as the anisotropic effect, which results from the conductive nature of the aromatic ring. Due to the orientation of the ring in the magnetic field of the NMR spectrometer, protons along the edge of the aromatic ring will be deshielded, with an increase in chemical shift (δ), while those near the face of the ring will be shielded, with a decrease in chemical shift. In systems where quantitative CH-π stacking can be obtained, usually involving an intramolecular CH-π interaction, this shielding is approximately 1 ppm for the interacting protons and decreases rapidly with distance from the aromatic ring.14 This property is very useful in measuring relative binding affinity. Even though indole is not soluble enough to form a saturated complex with any monosaccharide represented in mammalian glycans, it is possible to measure the relative likelihood that each proton will be near the face of the aromatic ring in a stacking interaction. This is because fast exchange dynamics mean that the measured chemical shift is a weighted average of the free and complexed chemical shifts for the proton in question.15 A key advantage of this approach is that the resulting chemical shift change depends only on the indole concentration, not on the carbohydrate concentration. Furthermore, as this shift is specific to protons that are in contact with the aromatic ring, the technique can determine the average geometry of the stacking interaction.
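The fast-exchange averaging described above can be illustrated with a short numerical sketch. It is not the analysis used in this work. It assumes simple 1:1 binding with indole in large excess over glycoside, so that free indole is approximately equal to total indole, and the function names and all parameter values (Kd, free and bound shifts) are hypothetical placeholders.

```python
def fraction_bound(indole_mM, kd_mM):
    """Fraction of glycoside complexed, assuming 1:1 binding and excess indole."""
    return indole_mM / (kd_mM + indole_mM)


def observed_shift(delta_free, delta_bound, indole_mM, kd_mM):
    """Fast exchange: population-weighted average of free and bound shifts."""
    f = fraction_bound(indole_mM, kd_mM)
    return (1.0 - f) * delta_free + f * delta_bound


# Hypothetical placeholder values, chosen only to show the trend.
DELTA_FREE = 3.70    # ppm, proton in the free glycoside
DELTA_BOUND = 2.70   # ppm, about 1 ppm of shielding in a fully stacked complex
KD_MM = 1000.0       # a weak interaction, Kd far above indole solubility

for indole_mM in (0.0, 2.5, 5.0, 10.0):
    shift = observed_shift(DELTA_FREE, DELTA_BOUND, indole_mM, KD_MM)
    print(f"{indole_mM:5.1f} mM indole: change = {shift - DELTA_FREE:+.4f} ppm")
```

In this regime the glycoside concentration does not appear in the expression at all, and because Kd is much larger than the indole concentration, the shift change grows almost linearly with indole concentration.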

As such, the assay I designed consists first of measuring a 1H NMR spectrum of a 0.5-10 mM methyl glycoside solution in deuterium oxide. A second solution is prepared in parallel containing 10 mM indole alone, and a third containing both glycoside and 10 mM indole. As the saturating concentration of indole in D2O was experimentally determined to be slightly above 10 mM, higher indole concentrations were not used due to the risk of precipitation or aggregation interfering with shimming of the NMR field.

Proton signals were assigned based on reference spectra from SDBS,16 as well as COSY (1H-1H) and HSQC (13C-1H) spectra taken on a 500 MHz Avance II spectrometer with a DCH probe (Bruker Corporation, Billerica, MA). For each of the seven indole protons, chemical shifts were compared between the indole-only measurement and the measurement with indole and glycoside. For the eight methyl glycoside signals (OMe, H1, H2, H3, H4, H5, H6a, H6b), the comparison was made instead between the glycoside-only spectra and those with both indole and glycoside. Three replicate samples were prepared for each condition (9 total), and the comparisons above were analyzed by a Student’s t-test. Indole signals showed no significant shift, as carbohydrates do not produce an anisotropic shift and therefore proximity to them will not alter chemical shift values.
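The replicate comparison described above can be sketched as follows. This is an illustration only: the text does not state whether the test was paired or unpaired, or one- or two-tailed, so the sketch assumes an unpaired, two-tailed, pooled-variance Student's t-test. The chemical shift values are hypothetical placeholders, not measured data.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical value of Student's t at alpha = 0.05 with 4 degrees of
# freedom. It applies only to two groups of three replicates (3 + 3 - 2 = 4).
T_CRIT_DF4 = 2.776


def student_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))


# Hypothetical chemical shifts (ppm) of one glycoside proton, three replicates.
glycoside_only = [3.700, 3.701, 3.699]
with_indole = [3.680, 3.681, 3.679]

t_stat = student_t(with_indole, glycoside_only)
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.2f}, significant at 0.05: {significant}")
```

A negative t statistic here corresponds to shielding, that is, a lower chemical shift in the presence of indole.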

Figure 1: Methyl β-galactoside and indole form a geometrically defined complex in aqueous solution. A) 1H NMR spectra of 10 mM methyl β-galactoside alone (blue line) or with 10 mM indole (red line). A decrease in chemical shift, or shielding, is observed for α-face protons such as H3 and H5 (enlarged for clarity at top left), while no change is observed for the H2 or methyl protons on the β-face. B) Geometry of the CH-π interaction based on observed chemical shift changes, with the protons labeled in the NMR spectrum circled.

This assay was originally carried out with methyl β-galactoside, representing the residue that bioinformatic analysis identified as the most likely to participate in a CH-π interaction.9 An especially strong shielding was seen for H5 (Figure 1), which is anti-periplanar to O4 and in close proximity to the particularly electronegative ring oxygen. Smaller shifts were seen for the anomeric proton, H3, and H4, all of which are on the α-face. Shifts for H6 protons were small and inconsistent, and no shielding of the methyl protons or the β-face H2 was observed. For all protons examined, a linear dependence of shielding strength on indole concentration was observed up to 12 mM, the upper limit of indole solubility (Figure 2). This indicates that the dissociation constant (Kd) of this interaction far exceeds 12 mM. This linear dependence also means that values taken at lower concentrations of less soluble substituted indoles can be scaled to 10 mM.
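The scaling to 10 mM mentioned above follows directly from the linear dependence. A minimal sketch is given below, assuming that Δδ is strictly proportional to the concentration of the aromatic compound (a line through the origin, since Δδ is zero by definition when no indole is present). The function names and example numbers are hypothetical.

```python
def scale_to_reference(delta_ppm, conc_mM, reference_mM=10.0):
    """Scale a shift change measured at conc_mM to the reference concentration."""
    if conc_mM <= 0:
        raise ValueError("concentration must be positive")
    return delta_ppm * reference_mM / conc_mM


def slope_through_origin(conc_mM, delta_ppm):
    """Least-squares slope (ppm per mM) of a line constrained through zero."""
    numerator = sum(c * d for c, d in zip(conc_mM, delta_ppm))
    denominator = sum(c * c for c in conc_mM)
    return numerator / denominator


# Hypothetical example: a substituted indole soluble only to 4 mM.
print(scale_to_reference(-0.010, 4.0))

# Hypothetical titration used to check proportionality.
print(slope_through_origin([2.0, 4.0, 8.0], [-0.004, -0.008, -0.016]))
```

The slope from a titration can also be multiplied by 10 mM to obtain the same normalized value from several concentrations at once.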

Figure 2: Stacking geometry is consistent for all indole concentrations tested. Change in chemical shift varies linearly with indole concentration for all protons, which are indicated on the chemical structure of methyl β-galactoside at bottom left. This assay was conducted in deuterium oxide at 25°C, where 13 mM is the observed concentration of a saturated solution.

CH-π interactions between carbohydrates and aromatic groups are electronic in nature

Next, I repeated the assay using a Hammett series of 5-substituted indoles ranging from the electron-rich 5-aminoindole to the electron-poor 5-nitroindole. For each shielded proton, and most prominently for H5, shielding strength was larger for more electron-rich indoles and smaller for more electron-poor indoles (Figure 3). In fact, 5-nitroindole appeared to show no shielding of β-galactoside protons at all, implying the elimination of the CH-π interaction. This strong and linear dependence of interaction strength on Hammett parameter indicates that the CH-π interaction between a carbohydrate and an aromatic system is primarily electronic in nature, instead of being a mainly dispersive interaction as had been previously proposed.14 Specifically, the interaction occurs between the partially positive dipole on the α-face of the carbohydrate and the partially negative quadrupole on the face of the aromatic ring. This has profound consequences for designing ligands for lectins and other carbohydrate-binding proteins, as it suggests that enhancing the ground-state polarization of the interacting CH bonds would enhance CH-π interactions with the protein, leading to a more potent probe.
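A linear dependence on Hammett parameter of the kind described above is typically assessed by ordinary least-squares regression. The sketch below shows one way to do this with the standard library only. The substituent constants and shielding values are synthetic placeholders constructed to lie on an exact line; they are not the measured data, and the choice of which Hammett constant to use for a 5-substituted indole is left open here.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept; also returns r^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic placeholders: substituent constants (x) and shielding magnitudes
# in ppm (y), generated from y = 0.016 - 0.020 * x purely for illustration.
sigma = [-0.6, -0.2, 0.0, 0.3, 0.8]
shielding = [0.016 - 0.020 * s for s in sigma]

slope, intercept, r_squared = linear_fit(sigma, shielding)
print(f"slope = {slope:.4f} ppm per sigma unit, r^2 = {r_squared:.3f}")
```

A negative slope corresponds to weaker shielding as the substituent becomes more electron-withdrawing, which is the trend described in the text.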

Figure 3: Carbohydrate-aromatic interactions in aqueous solution are electronic in nature. A) Favored geometry of methyl β-galactoside-indole stacking interaction with H5 and methyl protons circled. The H5 proton is positioned close to the aromatic ring, while the methyl protons are too far away to experience an anisotropic shift. B) Indoles with electron-donating substituents display strengthened CH-π stacking, while indoles with electron-withdrawing substituents display weakened or absent CH-π stacking. Values for indoles with a solubility under 10 mM are normalized to 10 mM using the observed linear dependence of shielding on aromatic concentration.

This electronic dependence differs from the primarily dispersive CH-π interactions predicted for simpler systems such as the methane-benzene complex.17 This is likely because the electronic portion of the interaction, which this assay suggests is dominant, requires a polarized aliphatic group and an electron-rich π system such as that of indole. Furthermore, these data suggest that increasing either the partial negative charge on the face of the aromatic group or the partial positive charge on the interacting hydrogens on the carbohydrate should enhance the CH-π interaction and lead to stronger binding. The benefit derived from a more electron-rich aromatic explains the preference for tryptophan over tyrosine or phenylalanine in carbohydrate binding sites. Two notable features present in biologically relevant carbohydrate-binding proteins could enhance the polarization of the carbohydrate. First, acetyl groups on some carbohydrates such as N-acetylgalactosamine could withdraw electrons from the interacting protons to a greater extent than the hydroxy groups present on unsubstituted carbohydrates.

Second, many carbohydrate-binding sites use cationic residues or divalent cations to hydrogen bond with or coordinate hydroxy groups on the α-face of the carbohydrate, which could serve to withdraw electron density from the interacting protons, rendering the protons more partially positive and the CH-π interaction more favorable.

CH-π interactions display selectivity between monosaccharides and anomers

Next, I examined the question of selectivity between different carbohydrates. Bioinformatic data gathered by Hudson et al.9 suggest that β-galactose was most often found in CH-π interactions in carbohydrate binding sites, followed by α-galactose and β-glucose. To test the biophysical basis of this preference, I repeated the 1H NMR assay described above using the following methyl glycosides: α-galactoside, α-glucoside, β-glucoside, α-mannoside, and β-mannoside. Due to a strong preference for the α-anomer in the Fischer glycoside synthesis of methyl mannoside, methyl β-mannoside was not commercially available at the time of the experiments. Instead, Dr. Rob Brown and I synthesized methyl β-mannoside as previously described,10 adding a Pd/C hydrogenation step to deprotect the final product. NMR assays were performed on an Avance III 500 MHz spectrometer with 10 mM indole and 0.5 mM glycoside.

Figure 4: CH-π stacking geometries in solution resemble patterns found in carbohydrate binding sites. Inner panels show the change in chemical shift for each proton when 10 mM indole is added to a 0.5 mM solution of the indicated glycoside, with larger and redder circles indicating stronger shielding from CH-π stacking. Outer panels show the probability of a binding site for the indicated monosaccharide having an aromatic group positioned closest to the indicated carbon. Figure adapted from Hudson et al. 2015.

Among the six glycosides tested, β-galactose and β-mannose displayed the strongest interactions with indole (Figure 4). These two sugars, uniquely among those tested, have four α-face CH protons, and neither has a hydroxyl group in an axial position on the α-face. Both likewise have two protons in a position to benefit from orbital overlap with an antiperiplanar CO bond: H1 and H3 for β-mannose, and H3 and H5 for β-galactose. Interestingly, H5 on β-mannose and H1 on β-galactose, neither of which is antiperiplanar to a hydroxyl group, nonetheless display strong CH-π activity, likely due to the strongly electron-withdrawing nature of the ring oxygen and the possibility of a cooperative CH-π interaction with the other two α-axial protons. α-Galactose and β-glucose also display CH-π interactions in this assay, though much reduced from those of β-galactose and β-mannose. Both are still able to make a trivalent CH-π interaction, but have substantial deficiencies compared to the two strongest CH-π donors. Notably, as all hydroxyl groups on β-glucose are equatorial, no proton can benefit from orbital overlap with an antiperiplanar hydroxyl group. Meanwhile, α-galactose has a large methoxy group on its α-face, preventing engagement of the anomeric proton and introducing a partial negative charge near the CH-π interaction. It can still make a CH-π interaction with H3, H4, and H5, all of which benefit from antiperiplanar hydroxyl groups. Due to α-face hydroxyl groups and lack of favorable orbital overlap, α-glucose and α-mannose do not make CH-π interactions in this assay. These patterns are similar to the distributions of aromatic residues in carbohydrate binding sites for each monosaccharide unit,9 indicating that the geometry induced by the aromatic group in aqueous solution is energetically favorable enough to persist in a protein where other forces are present.

Figure 5: CH-π interactions involving fucose and N-acetylglucosamine resemble those involving galactose and glucose, respectively. Isopropyl β-thiogalactoside displays interactions similar to those of β-galactose, but stacking is weighted more strongly to the anomeric proton. As in Figure 4, circles show change in chemical shift upon addition of 10 mM indole, and larger and redder circles indicate stronger shielding and therefore higher propensity to stack on the aromatic ring.


In addition to these six monosaccharides, I repeated the assay with α- and β-linked D-GlcNAc, α- and β-linked L-Fuc, and isopropyl β-thiogalactopyranoside (IPTG). N-acetylglucosamine appears to have largely similar CH-π interactions to the corresponding glucose anomer, with little stacking observed from the acetyl group (Figure 5). This is a highly significant finding in the case of N-glycosylation in particular, as it suggests CH-π interactions could occur with the GlcNAc2 core, either in cis (as an attached glycan) or in trans (as a ligand).

In the case of fucose, the lack of a 6-hydroxy group appears to aid binding somewhat, as a diaxial interaction prevents the CO6 bond in galactose from being antiperiplanar to the CH5 bond, but not to the extent of making α-fucose as favorable a CH-π donor as β-galactose. While β-fucose is an extremely strong CH-π donor, suggesting the loss of the electronegative and effectively equatorial CO6 does aid the interaction, it is not found in mammalian glycans. IPTG, like methyl β-galactoside, is a strong CH-π donor. While its isopropyl group is large and hydrophobic enough that it could engage in dispersive interactions with the aromatic ring, the galactose CH protons on the α-face dominate interactions with the aromatic ring instead, illustrating the strength and electronic nature of the CH-π interaction.

Conclusions and future directions

These results lead to a number of exciting conclusions. First, the presence of a CH-π interaction in a carbohydrate binding site is not only a source of affinity but also a source of specificity. Not only are aromatic rings in binding sites able to distinguish between different monosaccharides, such groups can even differentiate between anomers, best illustrated in the case of α- and β-mannose. Second, CH-π interactions in the context of carbohydrates and aromatic residues are largely electronic in nature. This is demonstrated by the strong dependence of indole-glycoside interactions on the electron-donating nature of indole substituents, and has profound implications. Primarily, this creates a clear distinction with simple hydrophobic interactions, which are characterized mainly by dispersive forces and the avoidance of unfavorable interactions between water and hydrocarbons. Third, orbital overlap and a partially positive α-face are key factors in CH-π donor strength when it comes to carbohydrates. For less common or less understood monosaccharides, this knowledge could prove useful in understanding which lectins will bind them. Such insights are also useful in the design of carbohydrate-based or glycomimetic probes and inhibitors.

This work yields important insights regarding carbohydrate-protein interactions, but also opens several new avenues of research. With regard to binding specificities, future work could examine other monosaccharides, especially those found in non-mammalian glycans. This would help decipher the role of CH-π interactions in binding to microbial glycans, a central process both in microbiome homeostasis and pathogen invasion. Another, more biophysical direction would be to tease apart the factors that make a carbohydrate or glycomimetic a strong CH-π donor. Three factors stand out as likely contributors. First, α-face hydrophobicity may play a role, though it does not appear to be the only factor. 2-fluoro-2-deoxygalactose would be a good test of the importance of hydrophobicity, as it shares the electronic distribution of galactose but has a more hydrophobic α-face. Second, overall α-face electronics are likely to play a role, especially with regard to the weakening effects that α-face hydroxyl groups have on the interaction. This could be tested using 2-thio-2-deoxygalactose, which has a milder partially negative charge in place of the charge on O2. Last but not least, the electronics of the individual CH proton, which are enhanced by induction and orbital overlap, bear consideration as a contributing factor. Direct inductive effects can be tested using 4-fluoro-4-deoxygalactose, which should have an especially positive CH4 proton, while orbital overlap effects can be tested using 4-thio-4-deoxygalactose, which has especially positive CH3 and CH5 protons due to excellent overlap between CH bonds and CS antibonds. These possibilities are explored further in Chapter 5.

Another question this work raises is how the effects I measured in these experiments would manifest in the carbohydrate-binding site of a protein. The experiments described in this chapter highlight important features of CH-π interactions, but they occurred in the homogeneous and highly polar environment of bulk deuterium oxide. Protein binding sites, by contrast, take advantage of a heterogeneous environment, with regions ranging from charged to polar to nonpolar. In addition, CH-π interactions do not occur in isolation: the glycan is often restrained in a favorable orientation by bifurcated hydrogen bonds, stabilized by water displacement, and polarized by salt bridges or charge-dipole interactions between protein cations and hydroxyl groups on the β-face of the carbohydrate. For this reason, the next phase of my studies, covered in Chapter 3, focused on a CH-π interaction in a protein context, specifically the case of galectin-3.

Acknowledgements

I completed this work in collaboration with Dr. Robert Brown, who led the synthesis of methyl β-mannoside.

References:

(1) Furukawa, J.-i.; Okada, K.; Shinohara, Y. Glycomics of Human Embryonic Stem Cells and Human Induced Pluripotent Stem Cells. Glycoconj. J. 2016, 33 (5), 707–715.

(2) Hug, I.; Feldman, M. F. Analogies and Homologies in Lipopolysaccharide and Glycoprotein Biosynthesis in Bacteria. Glycobiology 2011, 21 (2), 138–151.

(3) D’Haene, N.; Maris, C.; Rorive, S.; Decaestecker, C.; Le Mercier, M.; Salmon, I. Galectins and Neovascularization in Central Nervous System Tumors. Glycobiology 2014, 0 (0), 1–7.

(4) Weis, W. I.; Drickamer, K. Structural Basis of Lectin-Carbohydrate Recognition. Annu. Rev. Biochem. 1996, 65, 441–473.

(5) He, X.; Hatcher, E.; Eriksson, L.; Widmalm, G.; MacKerell, A. D. Bifurcated Hydrogen Bonding and Asymmetric Fluctuations in a Carbohydrate Crystal Studied via X-Ray Crystallography and Computational Analysis. J. Phys. Chem. B 2013, 117 (25), 7546–7553.

(6) Hoersch, D.; Otto, H.; Joshi, C. P.; Borucki, B.; Cusanovich, M. A.; Heyn, M. P. Role of a Conserved Salt Bridge between the PAS Core and the N-Terminal Domain in the Activation of the Photoreceptor Photoactive Yellow Protein. Biophys. J. 2007, 93 (5), 1687–1699.

(7) Raju, R. K.; Bloom, J. W. G.; An, Y.; Wheeler, S. E. Substituent Effects on Non-Covalent Interactions with Aromatic Rings: Insights from Computational Chemistry. ChemPhysChem 2011, 12 (17), 3116–3130.

(8) Nishio, M. The CH/π Hydrogen Bond in Chemistry. Conformation, Supramolecules, Optical Resolution and Interactions Involving Carbohydrates. Phys. Chem. Chem. Phys. 2011, 13 (31), 13873–13900.

(9) Hudson, K. L.; Bartlett, G. J.; Diehl, R. C.; Agirre, J.; Gallagher, T.; Kiessling, L. L.; Woolfson, D. N. Carbohydrate-Aromatic Interactions in Proteins. J. Am. Chem. Soc. 2015, 137 (48), 15152–15160.

(10) Mayato, C.; Dorta, R. L. Experimental Evidence on the Hydroxymethyl Group Conformation in Alkyl β-D-Mannopyranosides. 2004, 15, 2385–2397.

(11) Cheshev, P.; Marra, A.; Dondoni, A. Direct Epoxidation of D-Glucal and D-Galactal Derivatives with in Situ Generated DMDO. Carbohydr. Res. 2006, 341 (16), 2714–2716.

(12) Toukach, F. V.; Ananikov, V. P. Recent Advances in Computational Predictions of NMR Parameters for the Structure Elucidation of Carbohydrates: Methods and Limitations. Chem. Soc. Rev. 2013, 42 (21), 8376–8415.

(13) del Carmen Fernández-Alonso, M.; Cañada, F. J.; Jiménez-Barbero, J.; Cuevas, G. Molecular Recognition of Saccharides by Proteins. Insights on the Origin of the Carbohydrate-Aromatic Interactions. J. Am. Chem. Soc. 2005, 127 (20), 7379–7386.

(14) Chen, W.; Enck, S.; Price, J. L.; Powers, D. L.; Powers, E. T.; Wong, C.; Dyson, H. J.; Kelly, W. Structural and Energetic Basis of Carbohydrate−Aromatic Packing Interactions in Proteins. J. Am. Chem. Soc. 2013, 135, 9877–9884.

(15) Vandenbussche, S.; Díaz, D.; Fernández-Alonso, M. C.; Pan, W.; Vincent, S. P.; Cuevas, G.; Cañada, F. J.; Jiménez-Barbero, J.; Bartik, K. Aromatic-Carbohydrate Interactions: An NMR and Computational Study of Model Systems. Chemistry 2008, 14 (25), 7570–7578.

(16) National Institute of Advanced Industrial Science and Technology. Spectral Database for Organic Compounds https://sdbs.db.aist.go.jp.

(17) Dey, R. C.; Seal, P.; Chakrabarti, S. CH/π Interaction in Benzene and Substituted Derivatives with Halomethane: A Combined Density Functional and Dispersion-Corrected Density Functional Study. J. Phys. Chem. A 2009, 113 (37), 10113–10118.


Chapter 3: A CH-π interaction drives glycan binding to human galectin-3


Background and significance

Glycans play myriad roles in biology, from stem cell differentiation1 to viral entry into cells.2 In these roles, glycans store information regarding cell identity and state. To act on this information, proteins must recognize relevant glycans. Small differences in oligosaccharide structure, such as the position of a single hydroxyl group, reflect different avidities in multivalent binding3 and biological messages.4 Thus, carbohydrate-binding interactions must be specific.

Most carbohydrates have numerous hydrogen bond donor and acceptor sites and therefore interact favorably with water. As a result, the entropic benefit of complexation obtained from desolvating the ligand is limited.5 Carbohydrate-protein binding must rely on interactions other than the hydrophobic effect. While hydrogen bonds must be satisfied to desolvate the hydroxyl groups of the carbohydrate, this contributes minimally to the binding energy as such interactions are also present in bulk solution. One source of binding energy is the displacement of water from the preorganized binding sites present in many carbohydrate-binding proteins, which renders the entropy of binding more favorable.6,7

A key contribution to binding of many carbohydrates is the interaction of the carbohydrate with aromatic residues.8 This interaction generally involves a face-on contact between the α-face of the carbohydrate and the face of the aromatic ring.9 Aromatic residues are overrepresented near the faces of carbohydrates in published carbohydrate-binding protein structures,10 suggesting that mere shape complementarity with the aliphatic CH protons of the carbohydrate is insufficient for recognition. Tryptophan and tyrosine are the most overrepresented in binding sites, and they participate in CH-π interactions. Histidine is also enriched in carbohydrate binding sites, but it is almost exclusively acting as a hydrogen bond donor.10 The CH-π interactions are especially prevalent for some carbohydrates, particularly β-linked galactose, and much rarer for others such as α-mannose, likely due to the more electropositive α-face that β-galactose and other strongly interacting carbohydrates present towards the partially negative face of the aromatic ring.10 In the case of β-galactose, the closest proton to the center of the aromatic ring is likely to be H5, which is particularly electropositive due to the inductive effect of the ring oxygen and overlap of the σ(C5-H5) bond with the σ*(C4-O4) antibond.

The above tendencies suggest that CH-π interactions drive interactions between carbohydrates and aromatic residues. Such interactions are related to π-stacking and cation-π interactions, but instead of a π-system or cation interacting with a π-system, an aliphatic CH hydrogen plays an analogous role.11 These interactions are considerably more common when the CH group shares a carbon with an electronegative atom,12 as is the case with the hydroxyl and ring oxygens of a carbohydrate. In these cases, CH-π interactions are characterized primarily by electrostatics, with a secondary dispersive component.8 While earlier data suggest that these interactions would likely occur in many carbohydrate-binding pockets due to the presence of stacking between CH protons of the carbohydrate and the face of an aromatic ring, previous studies have not characterized the importance and strength of such interactions.

Galectin-3 bears a conserved tryptophan centrally located in the binding site

One of the lectins with an aromatic residue prominently located in the binding site is human galectin-3. This lectin, expressed in most tissues of the body, has a variety of roles, including mediating inflammation,13 clustering receptors to aid in signaling and endocytosis,14 and maintaining mucosal barriers.15,16 These activities result from oligomerization and protein-protein interactions mediated by the collagen-like N-terminal domain and carbohydrate recognition by the C-terminal lectin domain. Its binding site, like those of many lectins, is primarily hydrophilic, has a number of geometrically defined hydrogen bond donors and acceptors, and does not bind a metal ion. As galectin-3 has a likely CH-π interaction, is amenable to recombinant expression, and is of considerable medical relevance,17 we chose it as a system in which to investigate the importance of CH-π interactions in carbohydrate binding.


In this study, we examined the effect of replacing W181, the CH-π acceptor residue in galectin-3, with another aromatic residue or an aliphatic residue. We tested the stability of the resulting protein by differential scanning fluorimetry (DSF), and then measured the ability of the variant to bind lactose by isothermal titration calorimetry (ITC). Additionally, we tested the ability of the full-length variant protein to bind its ligands on mammalian cells by a hemagglutination assay. We show that the presence of a CH-π interaction accounts for the clear majority of the binding affinity of galectin-3 for lactose. Furthermore, the binding energy provided by the trivalent CH-π interaction between the β-galactose residue of lactose and W181 is 3.9 kcal/mol, within the 3-7 kcal/mol range for the strength of a conventional hydrogen bond.8

Materials and methods

The coding sequence of human galectin-3 (Uniprot ID P17931) residues 113-250, a truncated construct known as galectin-3C (Gal3C), was obtained from NCBI (Gene ID: 3958). This sequence was ordered as a gene product from Integrated DNA Technologies (Coralville, IA) and inserted into a pET24a vector (Novagen, Madison, WI) by the Gibson assembly method.15 This vector was amplified in Escherichia coli DH5α cells, then used to transform DE3 Tuner cells for protein expression. To add a noncleavable hexahistidine (His6) tag and generate variants at W181, we used an inverse PCR-based strategy. Primers were designed to amplify the entire construct, with the desired modification included as a 5’ overhang. After completion of a PCR reaction on the wild type plasmid using these primers, the product was circularized with T4 polynucleotide kinase and T4 DNA ligase, both obtained from New England Biolabs (Ipswich, MA). The ligation reaction product was digested with the restriction enzyme DpnI to remove the template plasmid, and then the cleaned product was electroporated into DH5α cells. Mutants were verified by Sanger DNA sequencing. In later experiments, an N-terminal SUMO tag was added to the W181H and W181M mutants by Gibson cloning to improve solubility, as well as to the wild type gene for comparison. For hemagglutination experiments, a full-length construct bearing an N-terminal hexahistidine tag was generated with E. coli codon optimization on residues 1-112.

Galectin-3C, full-length galectin-3, and all variants were expressed in Escherichia coli DE3 from a pET-24a vector. Liter-scale cultures were grown to mid-log phase (OD600 = 0.3-0.5) and induced with 0.4 mM isopropyl β-thiogalactoside (IPTG). Cultures were then grown for 4 hours at 37°C and pelleted by centrifugation (20 min at 3000 × g). After storage at -80°C, pellets were resuspended in Ni-NTA Loading Buffer (20 mM sodium phosphate, 500 mM sodium chloride, 20 mM imidazole, pH 7.4), lysed by sonication (3 × 40 s) or French press (two cycles, maximum pressure 1500 bar), and cell debris was removed by centrifugation (30 min at 33000 × g). Filtered supernatants were then loaded onto a Ni-NTA affinity column (BioRad, Hercules, CA), washed in Ni-NTA Loading Buffer, and then eluted on a gradient to Ni-NTA Elution Buffer (20 mM sodium phosphate, 500 mM sodium chloride, 500 mM imidazole, pH 7.4). Fractions were analyzed on a tris-glycine gel stained with Coomassie Brilliant Blue, and pooled and injected onto a Superdex 75 size exclusion column (GE Healthcare) in Galectin-3 Assay Buffer (60 mM sodium phosphate, 8 mM potassium phosphate, 390 mM sodium chloride, 5.5 mM potassium chloride, pH 7.4).

In isothermal titration calorimetry assays, we titrated a 10-50 mM solution of lactose into a 0.05-1 mM solution of galectin-3C on a MicroCal VP-ITC calorimeter (Malvern Panalytical, Malvern, UK) with a 1.4 mL cell volume with 2 μL, 5 μL, and 10 μL injection sizes until the titrant comprised 15% of the total volume. We used a ligand-buffer titration to normalize measured binding enthalpies. Galectin-3 Assay Buffer was used for both protein and ligands to ensure that all variants remained in their native state. Experiments were conducted in triplicate with separate preparations of protein to ensure reproducibility.
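The titration endpoint described above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes simple additive volumes (it ignores the overflow design of the calorimeter cell), a 1:1 binding model, and a hypothetical starting protein concentration of 0.1 mM chosen from within the stated 0.05-1 mM range. The Kd values are taken from Table 1.

```python
import math

def titrant_volume_ul(cell_volume_ul, titrant_fraction):
    """Titrant volume at which the titrant makes up `titrant_fraction`
    of the total volume, assuming volumes are simply additive."""
    return cell_volume_ul * titrant_fraction / (1.0 - titrant_fraction)

def fraction_bound(protein_mm, ligand_mm, kd_mm):
    """Fraction of protein occupied by ligand for 1:1 binding
    (exact quadratic solution; all concentrations in mM)."""
    b = protein_mm + ligand_mm + kd_mm
    complex_mm = (b - math.sqrt(b * b - 4.0 * protein_mm * ligand_mm)) / 2.0
    return complex_mm / protein_mm

CELL_UL = 1400.0         # 1.4 mL cell volume, from the text
TITRANT_FRACTION = 0.15  # titrant is 15% of the total volume at the end

added_ul = titrant_volume_ul(CELL_UL, TITRANT_FRACTION)

# Hypothetical run: 50 mM lactose stock, 0.1 mM protein before dilution.
final_lactose_mm = 50.0 * TITRANT_FRACTION
final_protein_mm = 0.1 * (1.0 - TITRANT_FRACTION)

for variant, kd_mm in [("Wild type", 0.11), ("SUMO-W181M", 75.0)]:
    occupancy = fraction_bound(final_protein_mm, final_lactose_mm, kd_mm)
    print(f"{variant}: {occupancy:.1%} of protein bound at the end of the titration")
```

Under these assumptions the wild type is nearly saturated at the end of the titration while the weakest variants are mostly unbound, which is consistent with the use of the most concentrated lactose stocks for weak binders.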


Differential scanning fluorimetry (DSF)16 assays were conducted in Galectin-3 Assay Buffer. Experiments were conducted with 10 µM Gal3C or a variant thereof, and 15x Sypro Orange dye (Thermo Fisher Scientific, Waltham, MA). Samples were heated at a rate of 1.5 °C/min from 15-95°C in a CFX96 Touch Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA), and the melting temperature (Tm) of the protein was measured using a Boltzmann analysis. Measurements were repeated in triplicate with separate stocks of protein to verify reproducibility of the results. For variants where SUMO tags were used for ITC experiments, both tagged and untagged constructs were tested in the DSF assay.
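As an illustration of how a melting temperature can be read from a DSF curve, the sketch below generates a synthetic two-state Boltzmann sigmoid and locates the temperature of steepest fluorescence increase. It is not the instrument software's procedure: the curve parameters (baseline, plateau, Tm of 55 °C, slope factor) are hypothetical, and the derivative-based estimate is a simple stand-in for a full Boltzmann fit. Instrument software often plots -dF/dT, in which case the same point appears as a negative peak.

```python
import math

def boltzmann(temp, f_min, f_max, tm, slope):
    """Two-state Boltzmann sigmoid: fluorescence rises from f_min to f_max
    with its midpoint at tm; `slope` sets the width of the transition."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - temp) / slope))

def tm_from_derivative(temps, fluorescence):
    """Return the temperature at which dF/dT, estimated by central
    differences, is largest."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_slope:
            best_temp, best_slope = temps[i], d
    return best_temp

# Synthetic melt from 15 to 95 degC in 0.5 degC steps (hypothetical parameters).
temps = [15.0 + 0.5 * i for i in range(161)]
curve = [boltzmann(t, 100.0, 1000.0, 55.0, 2.0) for t in temps]

estimated_tm = tm_from_derivative(temps, curve)
print(f"Estimated Tm: {estimated_tm:.1f} degC")
```

With real data, noise usually makes smoothing or a nonlinear fit of the sigmoid preferable to a raw finite-difference derivative.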

Galectin-3 variants are stable at room temperature

The carbohydrate-recognition domain of galectin-3 contains a single tryptophan, W181, located centrally in the binding pocket. As indicated by a sub-Ångstrom crystal structure (Figure 1A),6 the indole ring of this residue is located adjacent to the α-face of the galactose residue in lactose, and galactose C3, C4, and C5 are approximately 4 Å away from the aromatic ring. This orientation places the aliphatic hydrogens of the β-galactose residue in Van der Waals contact with the carbons of the indole ring in a face-on orientation, ideal for a CH-π interaction.8 W181 is invariant in all mammalian galectin carbohydrate recognition domains,18 suggesting it plays a key role in binding glycans. As galectin-3 has a likely CH-π interaction, is amenable to recombinant expression, and is of considerable medical relevance,17 we chose it as a system in which to investigate the importance of CH-π interactions in carbohydrate binding.

Figure 1: Structure of the galectin-3 binding site. A) Crystal structure of the human galectin-3 binding site at 0.89 Å resolution (PDB Code 3ZSJ) with lactose ligand colored dark grey and W181 colored ruby. The aliphatic hydrogens attached to C3, C4, and C5 of the galactose residue in lactose bind in a face-on orientation to the indole ring of W181 at a distance of approximately 4 Å between heavy atoms, a favorable arrangement for the formation of three CH-π interactions. B-G) Close-up views of the interaction between β-galactose and residue 181 in the following variants of galectin-3: B) wild type, C) W181F, D) W181H, E) W181Y, F) W181M, G) W181R. Images were generated using PyMol.37

Phenylalanine and tyrosine were chosen as smaller, less electron-rich alternatives to tryptophan, and tyrosine is also heavily represented in carbohydrate binding sites. Histidine was chosen as it is aromatic and similar in size to tyrosine or phenylalanine, but bears an electron-poor π system. While histidine is heavily represented in carbohydrate binding sites, it usually serves as a hydrogen bond donor.10 Methionine was chosen as a non-aromatic residue of similar size to phenylalanine to afford shape complementarity but disallow CH-π interactions. Arginine is the amino acid in a corresponding position to W181 in human galectin-related protein,19 and has also been found as a somatic variation in gastric cancer,20 so we wished to determine the effect of the W181R variation on the stability and activity of galectin-3.

The stability of wild-type galectin-3C and all variants was measured by differential scanning fluorimetry (DSF). Briefly, this technique measures the denaturation temperature (Tm) of a protein through visualizing the fluorescence of a dye that has high affinity for the hydrophobic surfaces exposed upon protein unfolding.21 As shown in Figure 2, wild-type galectin-3C and all variants tested except W181R have a Tm considerably exceeding 25°C, the temperature at which isothermal titration calorimetry (ITC) experiments were conducted, so protein used in ITC experiments was in the native state. In all cases, melting curves showed a single melting event, consistent with the unfolding of a single-domain protein for constructs lacking a SUMO tag. SUMO-tagged constructs also displayed a single melting event (Figure 2B), likely because the signal from exposing the hydrophobic interior of the galectin-3 jelly roll fold is much larger than that of unfolding the SUMO domain. Wild type galectin-3C was significantly more stable than all variants save W181M, and the trend in stability otherwise mirrored the trend in size of the residue at position 181, with larger residues resulting in a more stable protein. Even so, the loss of stability incurred by replacing W181 with histidine is quite notable, even at pH 7.4 where the histidine should not be protonated.

In our DSF assays to determine the stability of variants of galectin-3 at W181, we found that all variants except W181R were stable at room temperature, but all except W181M incurred a decrease in melting temperature greater than 10°C. The significant loss in stability incurred by even conservative mutations of W181 indicates that W181 plays a key role in maintaining the structure of the galectin-3 carbohydrate-binding domain even in the absence of ligand. While all variants necessarily have a smaller residue in place of W181, another factor that could explain the structural importance of W181 is its flexibility in the absence of ligand. The carbohydrate-binding domain of galectin-3 generally has a rigid jelly-roll fold, but both the loop containing W181 and the W181 sidechain itself show significant mobility in the absence of ligand.29

Figure 2: Most galectin-3 variants at W181 are stable at room temperature. A) Denaturation temperature (Tm) of wild-type galectin-3C (Gal3C) and variants used in this study, as stability, all of the variants used have a Tm exceeding the temperature used for ITC experiments (25°C) by at least 10°C, such that the protein is stable against thermal denaturation under such conditions. Asterisks represent significant differences from 25 °C (p-value <0.05 for *, <0.01 for **, and <0.001 for ***) as determined by a one-sample t-test, and pound signs (#) represent significant differences from untagged wild type Gal3C, as determined by a two-sample t-test. W181R Gal3C was not stable at room temperature with or without a SUMO tag. B) Representative melting curves for SUMO-tagged wild type, W181H, and W181M Gal3C, with melting temperatures determined by a negative peak in the first derivative of fluorescence with respect to temperature.


Galectin-3 variants have reduced binding affinity towards lactose

Due to the high conservation of W181 and the excellent characteristics of tryptophan for CH-π interactions, we hypothesized that its interaction with β-galactose would account for most of the binding energy of the galectin-3 interaction with lactose. Accordingly, we hypothesized that variants at W181 would have less binding affinity for lactose than the wild type, and that less conservative mutations would decrease binding affinity to a greater extent. At the same time, we sought to discover the size of the energetic contribution the CH-π interaction made to the lactose-galectin-3 interaction.

The ITC data (Figure 3) indicate that replacement of W181 with any other residue indeed detracts from binding affinity, as we hypothesized. The phenylalanine and tyrosine variants, which differ mainly by the size of the aromatic residue and the lack of a hydrogen bond donor in phenylalanine, display a roughly 10-fold loss in affinity, corresponding to a loss in binding energy of 1.4 kcal/mol (Table 1). The histidine variant, which bears an electron-poor aromatic system, displays a 200-fold loss in affinity, or a 3.6 kcal/mol loss in binding energy. Replacing tryptophan with methionine or arginine, which eliminates the aromatic system, allows only trace binding. While data with histidine and methionine variants used a SUMO fusion tag, the lack of a significant difference between wild-type galectin-3 with and without a SUMO tag indicates that the tag has little effect on binding.

Figure 3: Galectin-3C variants show decreased binding to lactose. A) Binding affinity of galectin-3C variants to lactose, as measured by isothermal titration calorimetry (ITC). A 10-50 mM lactose stock was used for these experiments, and error bars represent standard error from three independent experiments. Replacement of W181 with any other residue results in a significant decrease in binding affinity, and this effect is especially pronounced with electron-poor histidine and non-aromatic methionine and arginine. Uncleaved SUMO tags were used where indicated. B) Representative ITC trace for SUMO-tagged wild type galectin-3C binding to lactose (Kd=90 μM). C) Representative ITC trace for SUMO-tagged W181M galectin-3C binding to lactose (Kd=84 mM).

Previous studies have shown that phenol and benzene, the functional groups of tyrosine and phenylalanine, respectively, show interactions with several carbohydrates, including β-galactose, that are measurable by 1H NMR.30 These studies were complemented with similar experiments using indole, the functional group of tryptophan, which displayed stronger interactions with β-galactose than did benzene or phenol.10,31 These complexes had a reproducible geometry, with the α-face nearest the face of the aromatic system as shown by shielding of its protons. One salient feature of β-galactose that makes it particularly suitable for CH-π interactions is its axial hydroxyl group on the 4-position, which draws electron density away from the aliphatic hydrogens on C3 and C5, increasing their acidity and enhancing the interaction with an aromatic group.

As we expected some variants to bind very weakly to lactose, we made two modifications to enhance the precision of our calorimetry. First, three injection sizes (2, 5, and 10 µL) were used to afford denser data at low lactose concentrations in some experiments. Second, experiments were analyzed primarily to determine Kd rather than ΔH, as dissociation constant is more robust to errors caused by low binding affinity relative to protein concentration.32
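The difficulty with weak binders can be quantified with the dimensionless c parameter commonly used in ITC experiment design, c = n[M]/Kd, where [M] is the protein concentration in the cell and n the number of binding sites. The text does not name this parameter, so its use here is an assumption, as is n = 1. The sketch below evaluates c at the two ends of the 0.05-1 mM protein range for three Kd values from Table 1.

```python
def c_value(protein_mm, kd_mm, n_sites=1.0):
    """Dimensionless ITC c parameter, c = n * [protein] / Kd (both in mM)."""
    return n_sites * protein_mm / kd_mm

# Kd values (mM) from Table 1; protein concentrations span the stated range.
KD_MM = {"Wild type": 0.11, "W181F": 0.66, "SUMO-W181M": 75.0}
PROTEIN_RANGE_MM = (0.05, 1.0)

for variant, kd_mm in KD_MM.items():
    low, high = (c_value(p, kd_mm) for p in PROTEIN_RANGE_MM)
    print(f"{variant}: c between {low:.4f} and {high:.4f}")
```

For the weakest variants c stays far below 1 even at 1 mM protein, a regime in which the binding isotherm is nearly featureless and enthalpy and stoichiometry are poorly determined, in line with the decision to analyze the data primarily for Kd.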

While W181 does make a Van der Waals contact with the aliphatic surface on the galactose residue of the ligand, the binding data in this study show that simple completion of the binding pocket for galactose does not provide enough binding affinity to recognize lactose to a measurable degree. Of note, phenylalanine, tyrosine, histidine, and methionine are nearly isosteric, but the histidine variant exhibits markedly reduced lactose binding when compared to the phenylalanine and tyrosine variants, and the methionine variant has negligible binding affinity as measured by isothermal titration calorimetry. This result strongly suggests that it is indispensable for an aromatic residue to recognize the aliphatic, electropositive face of the galactose residue, and furthermore that it is important for the aromatic residue to be electron-rich. These trends indicate that W181 recognizes its galactose ligand by a CH-π interaction involving the partially positive α-face of the galactose residue and the partially negative face of the aromatic π system.

These data indicate that, in general, trends observed in small molecule assays of carbohydrate-aromatic interactions are replicated in carbohydrate binding sites of proteins. Just as benzene and phenol show similar CH-π interactions with carbohydrates but make weaker interactions than indole, phenylalanine and tyrosine are similarly effective in galectin-3 but both are inferior to tryptophan. Histidine seems to be a particularly poor acceptor due to its electronics, and methionine is not tolerated, consistent with the lack of preferential interactions between carbohydrates and aliphatic compounds in aqueous solution. Size and electron richness both appear to correlate with stronger CH-π interactions, but these variables are confounded enough in the case of tryptophan that it is not yet possible to determine which is more important.

Table 1: Strength of CH-π interactions with lactose at position 181 by variant

Variant       Kd (mM)        Binding affinity (kcal/mol)   CH-π interaction energy (kcal/mol)
Wild type     0.11 ± 0.01    -5.41 ± 0.06                  3.91
W181Y         0.71 ± 0.05    -4.29 ± 0.04                  2.79
W181F         0.66 ± 0.14    -4.34 ± 0.15                  2.84
SUMO-W181H    45 ± 9.1       -1.83 ± 0.13                  0.33
SUMO-W181M    75 ± 45        -1.53 ± 0.53                  0.03
W181R         79 ± 50        -1.50 ± 0.60                  0


CH-π interactions account for the majority of binding energy in galectin-3

Based on the association constants observed and the relation between free energy and association constant, ΔG=-RT ln Ka, where R=1.987 cal/mol/K, the CH-π interaction between tryptophan and β-galactose contributes 3.9 kcal/mol to the binding energy of galectin-3 to lactose (Table 1). This value is slightly greater than the strength of a hydrogen bond in bulk aqueous solution, and represents an energetic contribution that is exclusively available in the binding site. While this represents a particularly strong CH-π interaction due to the electron richness of tryptophan, even tyrosine and phenylalanine allow a 2.8 kcal/mol CH-π interaction.
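The arithmetic behind these values can be sketched as follows. The code applies ΔG = -RT ln Ka = RT ln Kd to the dissociation constants in Table 1 and takes the CH-π interaction energy of each variant as the difference from W181R. Two points are assumptions rather than statements from the text: T = 298.15 K (25 °C, the ITC temperature), and W181R as the zero reference, inferred from its entry of 0 in Table 1.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K), as quoted in the text
T_KELVIN = 298.15  # assumed: 25 degC, the temperature of the ITC experiments

def delta_g_from_kd(kd_molar, temperature=T_KELVIN):
    """Binding free energy in kcal/mol: dG = -RT ln Ka = RT ln Kd (Kd in mol/L)."""
    return R_KCAL * temperature * math.log(kd_molar)

# Dissociation constants in mM, from Table 1.
KD_MM = {
    "Wild type": 0.11,
    "W181Y": 0.71,
    "W181F": 0.66,
    "SUMO-W181H": 45.0,
    "SUMO-W181M": 75.0,
    "W181R": 79.0,
}

def ch_pi_energies(kd_mm, reference="W181R"):
    """CH-pi interaction energy per variant: how much more favorable its
    binding free energy is than that of the reference variant."""
    dg = {name: delta_g_from_kd(kd * 1e-3) for name, kd in kd_mm.items()}
    return {name: dg[reference] - value for name, value in dg.items()}

for name, energy in ch_pi_energies(KD_MM).items():
    dg = delta_g_from_kd(KD_MM[name] * 1e-3)
    print(f"{name:12s} dG = {dg:6.2f} kcal/mol   CH-pi = {energy:5.2f} kcal/mol")
```

Because the tabulated Kd values are rounded, the recomputed energies can differ from Table 1 in the second decimal place.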

The strong preference of tyrosine over phenylalanine in carbohydrate binding sites is not explained by energetic considerations in this study; it is more likely that the preference results from the possibility of conformationally restraining the phenol group of tyrosine through a hydrogen bond, thus decreasing the entropic cost of the CH-π interaction. As galectins invariably possess tryptophan as the CH-π acceptor residue, they lack the hydrogen bonds needed to benefit from this restraint if the tryptophan is substituted with tyrosine.

The results of this study highlight both the importance of aromatic residues in binding carbohydrates and the factors that determine the strength of such interactions. As aliphatic residues do not allow for binding even in the presence of shape complementarity, it is clear that CH-π interactions are involved. The inability of histidine to yield an interaction on par with tyrosine or phenylalanine, meanwhile, indicates the importance of an aromatic system with partially negative faces to interact with the partially positive aliphatic protons of the carbohydrate ligand. With a strength similar to a hydrogen bond, this CH-π interaction plays a central role in galectin-3 and likely many other carbohydrate-binding proteins.

Of particular note is the fact that, unlike hydrogen bonds, CH-π interactions are not available to the carbohydrate ligand in solution. This means that the entirety of the CH-π interaction energy contributes to the binding energy of the protein-carbohydrate interaction. This is a key point in understanding both the binding of natural carbohydrate ligands to proteins through CH-π interactions and the design of inhibitors or probes for such proteins. Targeting the CH-π interaction by further polarizing the CH bond or engaging in edge-on stacking with the tryptophan π-system could greatly enhance the ability of a small-molecule ligand to bind to a lectin with tyrosine or tryptophan prominently positioned in the binding site.

From a protein engineering perspective, CH-π interactions are a valuable tool for binding saccharide residues that have strongly polarized CH bonds, particularly β-galactose and β-mannose, and secondarily α-galactose, α-fucose, or β-glucose. Care should be taken to make sure the aromatic residue is structurally tolerated by the protein, likely involving the use of protein modeling software. If properly conducted, this approach could lead to an improvement in binding of up to 1000-fold at 25°C. Because this interaction relies on a partially positive CH proton, a CH-π based strategy is more likely to be effective for neutral carbohydrates than anionic carbohydrates, as is observed with galectin-3.35
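The link between an interaction energy and a fold improvement in binding follows from inverting ΔG = -RT ln Ka. The short sketch below, with a hypothetical function name and the assumption that the full interaction energy translates into affinity, converts the energies quoted in this chapter into fold changes in Ka at 25°C. For 3.9 kcal/mol it gives a value in the hundreds, the same order of magnitude as the upper bound stated above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change(ddg_kcal, temp_c=25.0):
    """Fold change in Ka corresponding to a free energy gain of ddg_kcal (kcal/mol)."""
    temp_k = temp_c + 273.15
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


# Interaction energies quoted in the text (kcal/mol): Tyr/Phe and Trp.
for ddg in (2.8, 3.9):
    print(f"{ddg} kcal/mol corresponds to roughly {fold_change(ddg):.0f}-fold in Ka")
```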

Acknowledgements

I conducted this work in collaboration with Stephen A. Early, who performed some of the DSF assays showing stability of the galectin-3 variants at room temperature.

References:

(1) Smith, R. A. A.; Meade, K.; Pickford, C. E.; Holley, R. J.; Merry, C. L. R. Glycosaminoglycans as Regulators of Stem Cell Differentiation. Biochem. Soc. Trans. 2011, 39 (1), 383–387.

(2) Francois, K. O.; Balzarini, J. Potential of Carbohydrate-Binding Agents As Therapeutics Against Enveloped Viruses. Med. Res. Rev. 2012, 32 (2), 349–387.

(3) Mortell, K. H.; Weatherman, R. V.; Kiessling, L. L. Recognition Specificity of Neoglycopolymers Prepared by Ring-Opening Metathesis Polymerization. J. Am. Chem. Soc. 1996, 118 (9), 2297–2298.

(4) Williams, C. Galactose. Encyclopedia of Food Sciences and Nutrition (Second Edition); 2003; pp 2843–2846.

(5) Klein, E.; Ferrand, Y.; Barwell, N. P.; Davis, A. P. Solvent Effects in Carbohydrate Binding by Synthetic Receptors: Implications for the Role of Water in Natural Carbohydrate Recognition. Angew. Chemie - Int. Ed. 2008, 47 (14), 2693–2696.


(6) Saraboji, K.; Håkansson, M.; Genheden, S. The Carbohydrate-Binding Site in Galectin-3 Is Preorganized to Recognize a Sugarlike Framework of Oxygens: Ultra-High-Resolution Structures and Water Dynamics. Biochemistry 2012, 51, 296–306.

(7) Dam, T. K.; Brewer, C. F. Thermodynamic Studies of Lectin-Carbohydrate Interactions by Isothermal Titration Calorimetry. Chem. Rev. 2002, 102 (2), 387–429.

(8) Nishio, M. The CH/π Hydrogen Bond in Chemistry. Conformation, Supramolecules, Optical Resolution and Interactions Involving Carbohydrates. Phys. Chem. Chem. Phys. 2011, 13 (31), 13873–13900.

(9) Asensio, J. L.; Arda, A.; Canada, F. J.; Jiménez-Barbero, J. Carbohydrate-Aromatic Interactions. Acc. Chem. Res. 2013, 46 (4), 946–954.

(10) Hudson, K. L.; Bartlett, G. J.; Diehl, R. C.; Agirre, J.; Gallagher, T.; Kiessling, L. L.; Woolfson, D. N. Carbohydrate-Aromatic Interactions in Proteins. J. Am. Chem. Soc. 2015, 137 (48), 15152–15160.

(11) Brandl, M.; Weiss, M. S.; Jabs, A.; Sühnel, J.; Hilgenfeld, R. C-H...π Interactions in Proteins. J. Mol. Biol. 2001, 307, 357–377.

(12) Tsuzuki, S.; Honda, K.; Uchimaru, T.; Mikami, M.; Tanabe, K. The Interaction of Benzene with Chloro- and Fluoromethanes: Effects of Halogenation on CH/π Interaction. J. Phys. Chem. A 2002, 106 (17), 4423–4428.

(13) Hsu, D. K.; Chen, H. Galectin-3 Regulates T-Cell Functions. 2009, 230, 114–127.

(14) Pugliese, G.; Iacobini, C.; Pesce, C. M.; Menini, S. Galectin-3: An Emerging All-out Player in Metabolic Disorders and Their Complications. Glycobiology 2015, 25 (2), 136–150.

(15) Gibson, D. G.; Young, L.; Chuang, R.; Venter, J. C.; Hutchison, C. A., III; Smith, H. O. Enzymatic Assembly of DNA Molecules up to Several Hundred Kilobases. Nat. Methods 2009, 6 (5), 12–16.

(16) Niesen, F. H.; Berglund, H.; Vedadi, M. The Use of Differential Scanning Fluorimetry to Detect Ligand Interactions That Promote Protein Stability. Nat. Protoc. 2007, 2 (9), 2212–2221.

(17) Nesmelova, I. V.; Dings, R. P. M.; Mayo, K. H. Understanding Galectin Structure-Function Relationships to Design Effective Antagonists. In Galectins; Klyosov, A. A., Witczak, Z. J., Platt, D., Eds.; John Wiley & Sons, Inc.: Hoboken, NJ, 2008; pp 33–69.

(18) Zhou, D.; Ge, H.; Sun, J.; Gao, Y.; Teng, M.; Niu, L. Crystal Structure of the C-Terminal Conserved Domain of Human GRP, a Galectin-Related Protein, Reveals a Function Mode Different from Those of Galectins. Proteins Struct. Funct. Genet. 2008, 71 (3), 1582–1588.

(19) Tate, J. G.; Bamford, S.; Jubb, H. C.; Sondka, Z.; Beare, D. M.; Bindal, N.; Boutselakis, H.; Cole, C. G.; Creatore, C.; Dawson, E.; et al. COSMIC: The Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019, 47 (D1), D941–D947.

(20) Diehl, C.; Genheden, S.; Modig, K.; Ryde, U.; Akke, M. Conformational Entropy Changes upon Lactose Binding to the Carbohydrate Recognition Domain of Galectin-3. J. Biomol. NMR 2009, 45 (1–2), 157–169.


(21) del Carmen Fernández-Alonso, M.; Cañada, F. J.; Jiménez-Barbero, J.; Cuevas, G. Molecular Recognition of Saccharides by Proteins. Insights on the Origin of the Carbohydrate-Aromatic Interactions. J. Am. Chem. Soc. 2005, 127 (20), 7379–7386.

(22) Vandenbussche, S.; Díaz, D.; Fernández-Alonso, M. C.; Pan, W.; Vincent, S. P.; Cuevas, G.; Cañada, F. J.; Jiménez-Barbero, J.; Bartik, K. Aromatic-Carbohydrate Interactions: An NMR and Computational Study of Model Systems. Chemistry 2008, 14 (25), 7570–7578.

(23) Turnbull, W. B.; Daranas, A. H. On the Value of c: Can Low Affinity Systems Be Studied by Isothermal Titration Calorimetry? J. Am. Chem. Soc. 2003, 125 (48), 14859–14866.

(24) Stowell, S. R.; Arthur, C. M.; Mcbride, R.; Berger, O.; Razi, N.; Rodrigues, L. C.; Gourdine, J.; Noll, A. J.; Gunten, S. Von; Smith, D. F.; et al. Microbial Glycan Microarrays Define Key Features of Host-Microbial Interactions. Nat. Chem. Biol. 2014, 10 (6), 470–476.


Chapter 4: Biological functions of the CH-π interaction in galectin-3


Roles and structures of human galectins

Galectins are a family of proteins with myriad roles and activities in biology, many of which center around their carbohydrate-binding function. Two key features differentiate the galectins from most other mammalian lectins. The first is their carbohydrate specificity. While galectins canonically recognize β-galactosides in general and lactose and lactosamine in particular,1 their specificities are by no means binary. Often, galectins have a wide range of ligands with binding affinities varying across several orders of magnitude.2 This allows a high level of competition among ligands and a strong dependence on the context of the interaction, leading to the multifaceted biological roles galectins play.

The second is the orientation of the carbohydrate recognition domains (CRDs) of galectins with respect to each other. All galectins are capable of forming a complex presenting multiple CRDs, and unlike many other mammalian lectins, the domains are oriented in different directions instead of at a single surface. This allows the formation of a net-like arrangement of galectins and glycoconjugates known as a galectin lattice, which is of particular importance in regulating cell surface signaling.3 However, it also means that in many cases galectins do not benefit from multivalent interactions with a single surface or ligand, placing a premium on the monovalent affinity of galectin-glycan interactions.

With these properties in mind, it seemed likely that CH-π interactions play a central role in the biological functions of galectins, particularly galectin-3. The central question of this portion of the project was whether the longer glycan structures involved in biological activities of galectin-3 could still be bound even if the interaction with the central β-galactose was impaired by mutation of W181. To best complement my previous work using ITC, shown in Chapter 3, I chose to focus on biological measurements of these interactions, specifically hemagglutination and binding to mucins. I additionally examined the export mechanism of galectin-3 and its possible dependence on the W181 CH-π interaction.


Full-length galectin-3 variants have reduced agglutination activity towards mouse red blood cells

Like other galectins, galectin-3 has the ability to agglutinate red blood cells when present at sufficient concentration.4,5 This activity is dependent on both the oligomerization provided by the N-terminal domain6 and the interaction of the C-terminal domain with lactosamine-containing N-glycans on erythrocyte membrane proteins.7 While there is some variation in the residues present on erythrocyte membrane proteins of different mammals, both human and murine erythrocyte glycans contain lactosamine epitopes that galectin-3 can bind.8 As such, we examined the hemagglutination activity of galectin-3 as a measure of its ability to bind lactosamine-containing glycans in a biological context. We used two assays to measure the hemagglutination ability of wild-type and variant galectin-3: a microscopy-based assay and a titer assay in a 96-well plate.

For hemagglutination microscopy assays, blood was collected from C57BL/6 mice through retro-orbital bleeding. The blood was heparinized and then washed three times with PBS, centrifuging for 5 min at 100 × g after each resuspension. A 6.25% suspension of the final mouse red blood cell (RBC) pellet was prepared in PBS, and 1 µL of this suspension was treated with 9 µL of a solution of wild-type or variant full-length galectin-3 in PBS on a glass sample dish, to afford a 160-fold final dilution of the packed cells. The dish was covered and incubated for 30 minutes at room temperature, after which images were taken at 20x magnification with a Nikon A1R confocal microscope (Tokyo, Japan). Percentage agglutination was quantified by manually counting single and agglutinated RBCs in the central 317 × 317 µm region of the drop.
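The arithmetic behind the microscopy assay can be sketched as follows. The function names and the cell counts are hypothetical and serve only to illustrate the calculation; the 6.25% suspension, the 1 µL and 9 µL volumes, and the use of the standard error over three independent experiments come from the text and the caption of Figure 1.

```python
import statistics


def final_dilution(stock_percent, sample_ul, protein_ul):
    """Return (final RBC percentage, fold dilution relative to packed cells)."""
    final_percent = stock_percent * sample_ul / (sample_ul + protein_ul)
    return final_percent, 100.0 / final_percent


def percent_agglutination(single_cells, agglutinated_cells):
    """Percentage of counted cells that are part of an agglutinate."""
    total = single_cells + agglutinated_cells
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * agglutinated_cells / total


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


# 1 uL of a 6.25% suspension mixed with 9 uL of protein solution.
print(final_dilution(6.25, 1.0, 9.0))

# Hypothetical (single, agglutinated) counts from three experiments.
counts = [(120, 280), (150, 250), (100, 300)]
percents = [percent_agglutination(s, a) for s, a in counts]
print(mean_and_sem(percents))
```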

Figure 1: The W181 CH-π interaction is necessary for galectin-3 induced hemagglutination. A) Percentage agglutination of mouse red blood cells by full-length galectin-3 and variants at W181 after 30 minutes of treatment at the indicated concentration. Error bars represent standard error from three independent experiments. Treatment of mouse red blood cells with the W181H, W181R, or W181M variants of galectin-3 (Gal3) did not result in hemagglutination at any concentration, nor did treatment with PBS. B) Galectin-3 agglutinates red blood cells by binding multiple blood cells per pentamer. C-F) Representative images of (C) wtGal3, (D) W181Y Gal3, (E) PBS, and (F) W181M Gal3, all using 200 μg/mL of protein. G) Agglutination of a 0.5% suspension of mouse red blood cells in 96-well plates, using dilutions of galectin-3 and its variants in PBS. Titer is shown in inverse concentration units, and all points represent at least three independent experiments. H) Representative titer assay plate, showing wt, W181F, W181M, W181R, and W181Y Gal3 in order from left to right. An even reddish color throughout the solution was scored as agglutination, while any settling in the bottom of the well was scored as non-agglutination.

For hemagglutination plate assays, blood collected and processed as above was diluted to 1%, and 50 μL was applied to a U-bottomed 96-well plate containing 50 μL of serial dilutions of 400 μg/mL protein, as measured by absorbance at 280 nm. Protein used in this assay was filtered through a 100 kDa molecular weight cutoff centrifugal filter immediately before use. Wells containing 50 μL PBS and 50 μL 1% RBC suspension were used as controls. Plates were incubated overnight at room temperature, and then wells were scored visually. A uniform suspension in the well was scored as a positive reading (agglutination) and any sign of a pellet on the bottom of the well was scored as a negative reading (no agglutination). The lowest concentration of each variant sufficient to cause agglutination was recorded.
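The scoring of a plate row can be illustrated with a short sketch. The two-fold dilution factor, the number of wells, the function names, and the example scores are all assumptions made for illustration, since the text specifies only the 400 μg/mL starting concentration and that the lowest agglutinating concentration was recorded. The titer is expressed as the inverse of that concentration, following the caption of Figure 1.

```python
def serial_dilution(start_ug_ml, n_wells, factor=2.0):
    """Concentrations across a row, assuming a constant dilution factor."""
    return [start_ug_ml / factor ** i for i in range(n_wells)]


def minimum_agglutinating_concentration(concentrations, agglutinated):
    """Lowest concentration scored positive, or None if no well agglutinated."""
    positives = [c for c, hit in zip(concentrations, agglutinated) if hit]
    return min(positives) if positives else None


def titer(concentrations, agglutinated):
    """Inverse of the minimum agglutinating concentration (0 if none)."""
    mac = minimum_agglutinating_concentration(concentrations, agglutinated)
    return 0.0 if mac is None else 1.0 / mac


# Hypothetical row: True = uniform suspension, False = pellet observed.
concentrations = serial_dilution(400.0, 8)
scores = [True, True, True, False, False, False, False, False]

print(minimum_agglutinating_concentration(concentrations, scores))
print(titer(concentrations, scores))
```

Returning a titer of zero for a row with no positive wells is a design choice for plotting; a variant that never agglutinates could equally be reported as "not detected".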

As shown in Figure 1A, wild-type galectin-3 shows higher hemagglutination activity than any variant at W181. Following the trends observed in calorimetric measurements, W181F and W181Y Gal3 displayed noticeable hemagglutination activity at 200 μg/mL, but such activity was weaker than that of the wild type. Agglutination was absent for W181H, W181M, and W181R Gal3, indicating the inability of such variants to bind ligands on red blood cells and subsequently oligomerize to cause agglutination. These results show that in addition to preventing binding to lactose, interference with the CH-π interaction between W181 and the β-galactose residue prevents binding to the lactosamine-containing hybrid and complex-type N-glycans coating mammalian red blood cells. Similar trends in hemagglutination activity were observed in the plate assay (Fig. 1G), supporting the conclusion that the ability to form a CH-π interaction is crucial for hemagglutination.

Wild-type galectin-3C binds to secreted mucins, but variants at W181 do not

Another facet of biology where galectin-3 plays a key role is in the gut. There, galectin-3 is expressed on the tips of the villi9 and serves a variety of roles from immunomodulation to barrier maintenance. In the latter role, galectin-3 can bind and lattice mucins via their lactosamine-containing N- and O-glycans.10 Mucins are a family of very high molecular weight glycoproteins that coat epithelial surfaces that must remain hydrated. Some mucins, such as Muc1 and Muc16, are anchored to cell membranes, while others, such as Muc2 and Muc5ac, are secreted into luminal areas.11 In both cases, much of the mucin’s molecular weight consists of PTS repeat domains, which carry numerous O-glycans. Other domains provide crosslinking and protein-protein interaction sites, and often carry N-glycans.12

For this study, lyophilized Muc2, Muc5ac, and Muc5b were provided by Prof. Katharina Ribbeck, whose group purified them from porcine sources as previously described.13 We reconstituted these mucins to 10 μg/μL overnight at 4°C in Mucin Assay Buffer (20 mM HEPES, 150 mM sodium chloride, pH 7.4), then prepared dilutions of 5, 1, and 0.5 μg/μL in Mucin Assay Buffer. We spotted 1 μL of each of the four concentrations (10, 5, 1, and 0.5 μg/μL) on dry nitrocellulose membranes and allowed the spots to air dry. We then blocked membranes with 5% bovine serum albumin (BSA) and 0.1% Tween 20 in Mucin Assay Buffer for 1 hour at room temperature, then incubated membranes overnight at 4°C with 10 μg/mL SUMO-galectin-3C (wild type or variant) bearing a C-terminal Strep-tag in Mucin Assay Buffer with 0.1% BSA and 0.05% Tween 20. For each variant, one membrane was incubated with the above solution plus 50 mM lactose, while the other was incubated with a solution not containing lactose.
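The dilution series and the resulting mucin mass per spot follow from C1V1 = C2V2. The sketch below is illustrative only: the 20 μL final volume and the function name are assumptions, while the 10 μg/μL stock, the target concentrations, and the 1 μL spot volume are taken from the text.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Volumes of stock and buffer needed to reach target_conc (C1*V1 = C2*V2)."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


STOCK_UG_PER_UL = 10.0   # reconstituted mucin stock
SPOT_VOLUME_UL = 1.0     # volume spotted on the membrane
FINAL_VOLUME_UL = 20.0   # assumed working volume, not stated in the text

for target in (10.0, 5.0, 1.0, 0.5):
    stock_ul, buffer_ul = dilution_volumes(STOCK_UG_PER_UL, target, FINAL_VOLUME_UL)
    mass_ug = target * SPOT_VOLUME_UL
    print(f"{target} ug/uL: {stock_ul} uL stock + {buffer_ul} uL buffer, "
          f"{mass_ug} ug per spot")
```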

After incubation and washing, we incubated the blots for 90 minutes at room temperature with 1:5000 anti-Strep antibody (Strep-MAB Classic HRP, BioRad, Hercules, CA) and developed for 5 minutes with Clarity ECL reagent (BioRad). We visualized binding using chemiluminescence on a ChemiDoc imager (BioRad), then quantified the volume of spots using the Fiji image suite,14 expressing the volume of each spot as a percentage of the highest volume spot on any blot in a given replicate. We conducted three replicates of this assay.
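The normalization described above, in which each spot is expressed as a percentage of the brightest spot in its replicate before averaging, can be sketched as follows. The data structure, the keys, and the intensity values are hypothetical; the spot volumes themselves would come from the image analysis software.

```python
import statistics


def normalize_replicate(spot_volumes):
    """Express each spot as a percentage of the largest spot in the replicate."""
    top = max(spot_volumes.values())
    if top <= 0:
        raise ValueError("replicate contains no positive signal")
    return {key: 100.0 * value / top for key, value in spot_volumes.items()}


def summarize(replicates):
    """Mean and standard error of normalized spot volumes across replicates."""
    normalized = [normalize_replicate(r) for r in replicates]
    summary = {}
    for key in normalized[0]:
        values = [n[key] for n in normalized]
        sem = statistics.stdev(values) / len(values) ** 0.5
        summary[key] = (statistics.mean(values), sem)
    return summary


# Hypothetical raw spot volumes keyed by (protein, mucin, lactose present).
replicates = [
    {("wt", "Muc2", False): 8000.0, ("wt", "Muc2", True): 400.0},
    {("wt", "Muc2", False): 6000.0, ("wt", "Muc2", True): 600.0},
    {("wt", "Muc2", False): 9000.0, ("wt", "Muc2", True): 450.0},
]

for key, (mean, sem) in summarize(replicates).items():
    print(key, round(mean, 1), round(sem, 1))
```

Normalizing within each replicate before averaging removes differences in exposure between blots, at the cost of tying every value to a single reference spot.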

The lactoside binding activity of galectin-3 allows it to bind mucins in a carbohydrate-dependent manner, including both cell surface10 and secreted mucins.15 This function is important for the maintenance of mucosal barriers,10 so we tested the ability of wild-type and variant galectin-3C to bind three secreted porcine mucins found in the gut:15,16 Muc2, Muc5ac, and Muc5b. We expected wild-type galectin-3C to bind all three mucins, and that lactose would inhibit such binding. Furthermore, we hypothesized that variants would bind more weakly if they retained a weaker CH-π interaction (as in the cases of W181F and W181Y) and would not bind at all if they lacked a CH-π interaction (as in the cases of W181H, W181M, and W181R).


As shown in Figure 2, wild-type galectin-3 does indeed bind all three mucins, and application of 50 mM lactose prevents this binding. This indicates that the binding is a protein-carbohydrate interaction involving the mucin glycans, rather than a protein-protein interaction. Variants that preserve a weak CH-π interaction show weak carbohydrate-dependent binding to mucins, while those without aromatic groups show no carbohydrate-dependent binding at all. W181R galectin-3C shows modest binding, but this is not lactose dependent and such binding is accompanied by a broad staining of the blot, indicating it is likely nonspecific and related to aggregation of the protein. These findings indicate that the interaction between galectin-3 and mucin glycans, like other galectin-3 interactions with its ligands, relies on the central role of W181 and its CH-π interaction with β-galactose.

Figure 2: Binding of galectin-3 to mucins is a carbohydrate-protein interaction dependent on the W181 CH-π interaction. A) Binding of Strep-tagged galectin-3C and variants at W181 to porcine mucins Muc2, Muc5ac, and Muc5b. Dot blots with 10 μg of each mucin were incubated overnight at 4°C with 10 μg/mL of the indicated galectin bearing a C-terminal Strep-tag, then washed and treated with 1:5000 Strep-MAB HRP antibody (BioRad, Hercules, CA). Blots were incubated for 5 minutes with Clarity ECL substrate (BioRad) and visualized by chemiluminescence and colorimetry in a ChemiDoc Imager. The volume of each spot was measured using ImageJ, and normalized to the highest value of any spot in the replicate. Error bars represent standard error of the mean (n=3). B) Representative image of mucin dot blot assay. Blots were spotted with 10 μg, 5 μg, 1 μg, and 0.5 μg of each mucin, then probed and developed as described above.

Wild-type galectin-3 is exported from HEK-293 cells, while variants at W181 are retained in the cell

While galectins are secreted proteins, none possess the canonical N-terminal signal sequence for export through the Golgi.1,17 Galectin-3 in particular is still exported when the canonical export pathway is inhibited by brefeldin A or monensin.18 Intriguingly, galectin-3 transfers between stromal cells and leukemia-derived B cell precursors in exosomes, as demonstrated through a fluorophore labeling experiment.19 In contrast to Golgi-based export, this pathway proceeds via the multivesicular bodies and the ESCRT machinery. In these multivesicular bodies, which are derived from late endosomes, vesicles are formed in the lumen such that their interiors are topologically related to the cytoplasm.20 HIV and other viruses can hijack the ESCRT machinery to aid in viral budding,21,22 and in the case of HIV this process is aided by galectin-3.23 One ESCRT-I component, Tsg101, plays a central role in this, recognizing the viral Gag protein via a PTAP motif.24 As such, it would be reasonable to suspect that such a pathway may reflect the normal way that galectin-3 exits cells to perform its functions as a secreted protein. Indeed, a recent study by Bänfer et al. showed that such a motif is involved in the export of galectin-3.25

To determine if galectin-3 export has a quality control mechanism that disallows export of inactive or unstable variants, I cloned the coding sequence of galectin-3 (derived from the pET-24a-Gal3 expression vector previously cloned) into a pcDNA-4 vector (Thermo Fisher Scientific, Waltham, MA) with a C-terminal Strep-II tag.26 This vector allowed its transfection into HEK-293 cells and subsequent overexpression. Additionally, I generated W181F, W181H, W181M, and W181R variants by site-directed mutagenesis as described in Chapter 3.

Galectin-3 is produced and secreted by the kidney epithelium,27 so HEK-293 cells should be able to secrete wild-type galectin-3 when it is overexpressed from a pcDNA-4 vector. For variants, three possibilities arise. If the mutation is too minor to affect export, or export is controlled by a distant sequence such as a motif in the N-terminal domain,25 the variant could be secreted at a level similar to that of the wild type. If export is impeded, galectin-3 could be retained and accumulate in the cytosol. If the mutation is drastic enough to activate degradation pathways, the variant could be mostly or entirely absent. If W181 variants are exported differently than the wild type, the trend could be based on thermodynamic stability or carbohydrate-binding activity. If the trend is based on stability, the quality control mechanism likely depends on chaperones, similar to Golgi-based protein export. If the trend is based on binding activity, the export pathway likely relies on galectin-3 binding a glycoprotein with a signal sequence, like the putative mechanism for galectin-9 export via Tim-3.28

To test these hypotheses, I transfected adherent HEK-293 cells in 6-well plates with 2 μg pcDNA-Gal3 (wild type and each variant cloned above) per well at 75% confluency (~1 million cells per well) in OptiMEM media using Lipofectamine 2000. After 4 hours, I replaced the media with F17 media to induce expression of galectin-3, and collected samples every day for four days. On the fourth day, I removed the media after collecting a sample, then resuspended the cells in 1x SDS-PAGE loading dye with DTT. I measured the galectin-3 content of media and cell samples by western blotting, using 1:5000 rabbit anti-galectin-3 (Reference number PA5-34819, Invitrogen, Carlsbad, CA) and 1:10000 goat anti-rabbit HRP (Reference number 1858415, Pierce Biotechnology, Rockford, IL), developed with Clarity Max ECL reagent (BioRad, Hercules, CA). To allow both media and cells to be visualized on the same blot, I loaded 1 μL of cell samples and 10 μL of media samples.

As shown in Figure 3, wild-type galectin-3 is robustly exported from cells transfected with pcDNA-Gal3, but all variants are largely retained within the cell. This suggests the existence of a stringent quality control mechanism in the galectin-3 export pathway. As neither the relatively stable W181M variant nor the partially functional W181F variant is detectably exported, it is not possible in this assay to determine whether thermodynamic stability or carbohydrate-binding activity is more important in this quality control mechanism.

Figure 3: Wild-type galectin-3 is robustly exported from HEK-293 cells, while variants are retained within the cell. This western blot shows media and cell samples collected after four days of galectin-3 expression, run on a 4-15% TGX gel and visualized with 1:5000 anti-galectin-3 and 1:10000 goat anti-rabbit HRP. A ladder and positive controls of purified W181H and W181M galectin-3 are shown at left, and the band corresponding to galectin-3 (~30 kDa) is boxed in red.


Further investigation of this question could proceed either by adapting the current assay to be more sensitive or by using other assays to more directly interrogate the export pathway. Building on the current assay, one could enrich galectin-3 from the media using anti-Strep beads, making it possible to detect smaller amounts of exported protein. Meanwhile, co-immunoprecipitation using wild-type and variant galectin-3 could allow detection of binding partners that are lost on mutation and may be needed for export. Based on these results, it seems that while the Tsg101 binding sequence is necessary for galectin-3 export, it is not sufficient. Additional binding partners presumably exist that require either a highly stable fold or a galectin-glycan interaction to allow export.

Conclusions

Previous studies have shown that galectin-3 binds more strongly to extended lactose- or lactosamine-containing glycans than to lactose due to its extended binding site.29,30 As such, it would theoretically be possible for a loss in affinity from the W181-β-galactose interaction to be offset by interactions elsewhere in the binding site. A key test of that hypothesis was carried out in our hemagglutination assays, which test the biological function of galectin-3 binding to blood group antigens. As our data show, disrupting the CH-π interaction at W181 prevents hemagglutination even with an extended ligand. This is likely because of the central role β-galactose plays in the binding site, and because the CH-π interaction W181 makes is necessary for galectin-3 to bind β-galactose. Furthermore, galectin-3 variants without the strong CH-π interaction that W181 provides cannot bind the elaborated O-glycans of secreted mucins, demonstrating the importance of CH-π interactions in gut health and the maintenance of mucosal barriers.

In conclusion, CH-π interactions are crucial for the binding of galectin-3 to its ligands, including extended ligands such as blood group antigens. Due to the particularly favorable nature of the CH-π interaction involved, with an electron-rich aromatic and a highly polarized carbohydrate, the energy of the overall CH-π interaction is roughly equal to that of a hydrogen bond and makes up a majority of the binding affinity of the protein. This effect is likely to be recapitulated in other proteins, and its consideration could be of much use in engineering carbohydrate-binding proteins and their ligands.

Acknowledgements

Purified mucins were provided by Prof. Katharina Ribbeck of the Massachusetts Institute of Technology. I conducted the hemagglutination assays in collaboration with Dr. Mohammad Murshid Alam. I conducted the mucin binding assay in collaboration with Dr. Amanda Dugan. I conducted the galectin-3 export assay in collaboration with Dr. Robert Lyle McPherson.

References

(1) Klyosov, A. A. Galectins and Their Functions in Plain Language. In Galectins; 2008; pp 9–31.

(2) Stowell, S. R.; Arthur, C. M.; Mcbride, R.; Berger, O.; Razi, N.; Rodrigues, L. C.; Gourdine, J.; Noll, A. J.; Gunten, S. Von; Smith, D. F.; et al. Microbial Glycan Microarrays Define Key Features of Host-Microbial Interactions. Nat. Chem. Biol. 2014, 10 (6), 470–476.

(3) Nabi, I. R.; Shankar, J.; Dennis, J. W. The Galectin Lattice at a Glance. J. Cell Sci. 2015, 128 (13).

(4) Liu, F. T.; Hsu, D. K.; Zuberi, R. I.; Hill, P. N.; Shenhav, A.; Kuwabara, I.; Chen, S. S. Modulation of Functional Properties of Galectin-3 by Monoclonal Antibodies Binding to the Non-Lectin Domains. Biochemistry 1996, 35 (19), 6073–6079.

(5) Zhang, T.; Miller, M. C.; Zheng, Y.; Zhang, Z.; Xue, H.; Zhao, D.; Su, J.; Mayo, K. H.; Zhou, Y.; Tai, G. Macromolecular Assemblies of Complex Polysaccharides with Galectin-3 and Their Synergistic Effects on Function. Biochem. J. 2017, 474 (22), 3849–3868.

(6) Ochieng, J.; Green, B.; Evans, S.; James, O.; Warfield, P. Modulation of the Biological Functions of Galectin-3 by Matrix Metalloproteinases. Biochim. Biophys. Acta - Gen. Subj. 1998.

(7) Denecke, J.; Kranz, C.; Nimtz, M.; Conradt, H. S.; Brune, T.; Heimpel, H.; Marquardt, T. Characterization of the N-Glycosylation Phenotype of Erythrocyte Membrane Proteins in Congenital Dyserythropoietic Anemia Type II (CDA II/HEMPAS). Glycoconj. J. 2008, 25 (4), 375–382.

(8) Aoki, T. A Comprehensive Review of Our Current Understanding of Red Blood Cell (RBC) Glycoproteins. Membranes (Basel). 2017, No. 7, 56–75.

(9) Nio-Kobayashi, J. Tissue- and Cell-Specific Localization of Galectins, β-Galactose-Binding Animal Lectins, and Their Potential Functions in Health and Disease. Anat. Sci. Int. 2017, 92 (1), 25–36.

(10) Argüeso, P.; Guzman-Aranguez, A.; Mantelli, F.; Cao, Z.; Ricciuto, J.; Panjwani, N. Association of Cell Surface Mucins with Galectin-3 Contributes to the Ocular Surface Epithelial Barrier. J. Biol. Chem. 2009, 284 (34), 23037–23045.


(11) Ridley, C.; Thornton, D. J. Mucins: The Frontline Defence of the Lung. Biochem. Soc. Trans. 2018, 46, 1099–1106.

(12) Ridley, C.; Lockhart-Cairns, M. P.; Collins, R. F.; Jowitt, T. A.; Subramani, D. B.; Kesimer, M.; Baldock, C.; Thornton, D. J. The C-Terminal Dimerization Domain of the Respiratory Mucin MUC5B Functions in Mucin Stability and Intracellular. J. Biol. Chem. 2019, 294, 17105–17116.

(13) Kavanaugh, N. L.; Zhang, A. Q.; Nobile, C. J.; Johnson, A. D.; Ribbeck, K. Mucins Suppress Virulence Traits of Candida Albicans. MBio 2014, 5 (6), 1–8.

(14) Schindelin, J.; Arganda-Carreras, I.; Frise, E.; Kaynig, V.; Longair, M.; Pietzsch, T.; Preibisch, S.; Rueden, C.; Saalfeld, S.; Schmid, B.; et al. Fiji: An Open-Source Platform for Biological-Image Analysis. Nat. Methods 2012, 9 (7), 676–682.

(15) Leclaire, C.; Lecointe, K.; Gunning, P. A.; Tribolo, S.; Wittmann, A.; Latousakis, D.; MacKenzie, D. A.; Kawasaki, N.; Juge, N. Molecular Basis for Intestinal Mucin Recognition by Galectin-3 and C-Type Lectins. FASEB J. 2018, 32 (6), 3301–3320.

(16) Pelaseyed, T.; Bergström, J. H.; Gustafsson, J. K.; Ermund, A.; Birchenough, G. M.; Schütte, A.; Post, S. Van Der; Svensson, F.; Rodríguez-, A. M. The Mucus and Mucins of the Goblet Cells and Enterocytes Provide the First Defense Line of the Gastrointestinal Tract and Interact with the Immune System. Immunol. Rev. 2015, 260 (1), 8–20.

(17) Sato, S.; Nieminen, J. Seeing Strangers or Announcing “Danger”: Galectin-3 in Two Models of Innate Immunity. Glycoconj. J. 2004, 19 (7–9), 583–591.

(18) Lindstedt, R.; Apodaca, G.; Barondes, S. H.; Mostov, K. E.; Leffler, H. Apical Secretion of a Cytosolic Protein by Madin-Darby Canine Kidney Cells. J. Biol. Chem. 1993, 268 (16), 11750–11757.

(19) Fei, F.; Joo, E. J.; Tarighat, S. S.; Schiffer, I.; Paz, H.; Fabbri, M.; Abdel-Azim, H.; Groffen, J.; Heisterkamp, N. B-Cell Precursor Acute Lymphoblastic Leukemia and Stromal Cells Communicate through Galectin-3. Oncotarget 2015, 6 (13), 11378–11394.

(20) Pornillos, O.; Alam, S. L.; Rich, R. L.; Myszka, D. G.; Davis, D. R.; Sundquist, W. I. Structure and Functional Interactions of the Tsg101 UEV Domain. EMBO J. 2002, 21 (10), 2397–2406.

(21) Garrus, J. E.; Von Schwedler, U. K.; Pornillos, O. W.; Morham, S. G.; Zavitz, K. H.; Wang, H. E.; Wettstein, D. A.; Stray, K. M.; Côté, M.; Rich, R. L.; et al. Tsg101 and the Vacuolar Protein Sorting Pathway Are Essential for HIV-1 Budding. Cell 2001, 107 (1), 55–65.

(22) Dowlatshahi, D. P.; Sandrin, V.; Vivona, S.; Shaler, T. A.; Kaiser, S. E.; Melandri, F.; Sundquist, W. I.; Kopito, R. R. ALIX Is a Lys63-Specific Polyubiquitin Binding Protein That Functions in Retrovirus Budding. Dev. Cell 2012, 23 (6), 1247–1254.

(23) Wang, S.-F.; Tsao, C.-H.; Lin, Y.-T.; Hsu, D. K.; Chiang, M.-L.; Lo, C.-H.; Chien, F.-C.; Chen, P.; Arthur Chen, Y.-M.; Chen, H.-Y.; et al. Galectin-3 Promotes HIV-1 Budding via Association with Alix and Gag P6. Glycobiology 2014, 24 (11), 1022–1035.

(24) Martin-Serrano, J.; Zang, T.; Bieniasz, P. D. Role of ESCRT-I in Retroviral Budding. J. Virol. 2003, 77 (8), 4794–4804.


(25) Bänfer, S.; Schneider, D.; Dewes, J.; Strauss, M. T.; Freibert, S. A.; Heimerl, T.; Maier, U. G.; Elsässer, H. P.; Jungmann, R.; Jacob, R. Molecular Mechanism to Recruit Galectin-3 into Multivesicular Bodies for Polarized Exosomal Secretion. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (19), E4396–E4405.

(26) Schmidt, T. G. M.; Skerra, A. The Strep-Tag System for One-Step Purification and High-Affinity Detection or Capturing of Proteins. Nat. Protoc. 2007, 2 (6), 1528–1535.

(27) Chiu, M. G.; Johnson, T. M.; Woolf, A. S.; Dahm-Vicker, E. M.; Long, D. A.; Guay-Woodford, L.; Hillman, K. A.; Bawumia, S.; Venner, K.; Hughes, R. C.; et al. Galectin-3 Associates with the Primary Cilium and Modulates Cyst Growth in Congenital Polycystic Kidney Disease. Am. J. Pathol. 2006, 169 (6), 1925–1938.

(28) Yasinska, I. M.; Sakhnevych, S. S.; Pavlova, L.; Selnø, A. T. H.; Abeleira, A. M. T.; Benlaouer, O.; Silva, I. G.; Mosimann, M.; Varani, L.; Bardelli, M.; et al. The TiM-3- Galectin-9 Pathway and Its Regulatory Mechanisms in Human Breast Cancer. Front. Immunol. 2019, 10 (JULY), 1594.

(29) Krzeminski, M.; Singh, T.; André, S.; Lensch, M.; Wu, A. M.; Bonvin, A. M. J. J.; Gabius, H. Human Galectin-3 (Mac-2 Antigen): Defining Molecular Switches of Affinity to Natural Glycoproteins, Structural and Dynamic Aspects of Glycan Binding by Flexible Ligand Docking and Putative Regulatory Sequences in the Proximal Promoter Region. BBA - Gen. Subj. 2011, 1810 (2), 150–161.

(30) Bachhawat-Sikder, K.; Thomas, C. J.; Surolia, A. Thermodynamic Analysis of the Binding of Galactose and Poly-N-Acetyllactosamine Derivatives to Human Galectin-3. FEBS Lett. 2001, 500 (1–2), 75–79.


Chapter 5: Future Avenues for Investigation


Hammett series to determine electronic nature of CH-π interactions in a protein

As shown in Chapter 2, CH-π interactions between carbohydrates and the aromatic residues of proteins are largely electronic in nature, at least in bulk aqueous solution. However, the environment of a binding site differs from that of bulk water. Instead of a homogeneous protic solvent, a binding site is composed of a heterogeneous arrangement of nonpolar, polar, and charged surfaces, often acting in concert to maximize binding energy.1 It would therefore be useful from a fundamental perspective to understand how this difference in environment affects the strength of a CH-π interaction.

To answer this question, I sought to replicate my small-molecule Hammett series in the context of a glycan binding site through the use of non-canonical amino acids. For reasons described in Chapters 1 and 3, the protein I chose was galectin-3C. Conveniently, galectin-3C bears a single tryptophan residue,2 which is engaged in the CH-π interaction I wished to perturb.

This meant that I could use global incorporation techniques to add a substituted tryptophan analog in place of W181. Thus, I wished to use selective pressure incorporation3 to replace W181 with a Hammett series of analogs ranging from electron-rich 5-aminotryptophan to highly electron-depleted 5-nitrotryptophan.
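A Hammett analysis of such a series reduces to a linear fit of log10(K_X/K_H) against the substituent constant σ, with the slope ρ reporting how sensitive binding is to the electronics of the ring. The following is a minimal sketch of that fit, not the analysis used in this work: the function name is hypothetical, and the σ values and log ratios are round placeholder numbers rather than tabulated constants or measured data.

```python
def hammett_fit(sigmas, log_k_ratios):
    """Least-squares fit of log10(K_X / K_H) = rho * sigma + intercept."""
    n = len(sigmas)
    mean_sigma = sum(sigmas) / n
    mean_log = sum(log_k_ratios) / n
    sxx = sum((s - mean_sigma) ** 2 for s in sigmas)
    sxy = sum((s - mean_sigma) * (y - mean_log)
              for s, y in zip(sigmas, log_k_ratios))
    rho = sxy / sxx
    intercept = mean_log - rho * mean_sigma
    return rho, intercept


# Placeholder series running from electron-rich to electron-poor substituents.
example_sigmas = [-0.6, 0.0, 0.1, 0.2, 0.8]
# Placeholder responses generated with rho = -0.5 (hypothetical, not measured).
example_logs = [-0.5 * s for s in example_sigmas]

rho, intercept = hammett_fit(example_sigmas, example_logs)
print(f"rho = {rho:.2f}, intercept = {intercept:.2f}")
```

A negative ρ in this convention would mean that electron-withdrawing substituents weaken binding, whereas ρ near zero would indicate little electronic dependence.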

In selective pressure incorporation, one must use an auxotrophic strain to avoid competition with the endogenous canonical amino acid. This project used the RF12 strain of E. coli, a modified DE3 expression strain lacking the tryptophan synthase subunits TrpA and TrpB.4 I found this strain to strongly resemble DE3 Tuner cells when growing in rich media, but in minimal media the strain requires 1 mM tryptophan, provided as a 100 mM solution in 0.1 M NaOH due to the poor solubility of tryptophan in neutral buffers.

Scheme 1: Two-step synthesis of tryptophan analogs from indole precursors

At the time of this project, enantiomerically pure non-canonical tryptophan analogs were not commercially available, and only the L-enantiomer can be incorporated into proteins. I therefore used two different synthetic schemes to generate tryptophan analogs from L-serine and the corresponding indole derivative. In the first (Scheme 1), I dissolved the indole and L-serine in a mixture of acetic anhydride and glacial acetic acid, and stirred the mixture at 70°C for 2 hours. This converts the serine into an electrophile, allowing attack by the C3 position of the indole and formation of a racemic, acetylated tryptophan analog. The acetyl group can be removed by Amano acylase, provided the substitution on the indole is accommodated by the acylase.5 The enzyme deacetylates only the L-enantiomer, allowing separation from the D-enantiomer. In later experiments, I found that RF12 E. coli could grow when supplemented with a racemic tryptophan analog and incorporate the L-enantiomer thereof, so I instead removed the N-acetyl group using 2 M HCl for 72 hours at 95°C. These conditions were harsh but generally effective.

Scheme 2: One-step chemoenzymatic synthesis of substituted tryptophan analogs


A second synthetic scheme (Scheme 2) relies on the use of recombinant tryptophan synthase. In this procedure, a culture of E. coli expressing the pSTB7 plasmid is grown to saturation and lysed using sonication or a cell disruptor, and the crude lysate is kept in a dialysis bag within the reaction mixture for use as a catalyst.6 This reaction requires 3 days at 37°C in phosphate buffer. I used a nitrogen atmosphere in later experiments, which increased yield and decreased formation of a dark-colored particulate byproduct that is likely composed of oxidation products.

In either case, purification of the resulting tryptophan analog is nontrivial. The product is amphiphilic and poorly soluble in pH-neutral water or aprotic organic solvents, including polar aprotic solvents such as chloroform, dichloromethane, or dimethyl sulfoxide. Thankfully, most tryptophan analogs are, per my experience, readily soluble in methanol, ethanol, or solvent systems containing a high proportion of either alcohol. While the reactions above occur in water or acetic acid, the massive quantity of salts present upon concentration or workup of the reaction mixture complicates purification by reverse phase chromatography. This results in a viscous sample that flows slowly in flash chromatography and affords poor separation.

Instead, I concentrated the reaction mixture to dryness and stirred the precipitate in methanol overnight to solubilize the product while leaving the salts in solid phase. After filtration to remove salts, I dry loaded the reaction mixture on amorphous silica and purified the product by normal phase flash chromatography using an ethyl acetate-methanol solvent system.

While widespread lore states that amorphous silica will dissolve at methanol concentrations above 20%, there is little evidence to suggest that this is in fact true, except possibly when methanol and DCM are mixed to create a high-density protic solvent system. As ethyl acetate-methanol solvent systems are always low density, I observed no silica in concentrated fractions from this solvent system at any percentage, up to and including 100% methanol. Per ninhydrin-stained TLC and subsequent NMR, most substituted tryptophan analogs elute at 60-80% methanol.


Yields for 5-fluorotryptophan, 7-fluorotryptophan and 5,6-difluorotryptophan were good (75%) following Scheme 2. Yields for aminotryptophans were also good, as long as special care was taken to avoid exposure of the reaction to air. An attempt to produce 7-hydroxytryptophan by Scheme 1 resulted in rapid oxidation of the intermediate during purification, as seen by the color of the solution in the fraction tubes. Yields for methyltryptophans and chlorotryptophans were moderate (40-70%) following either scheme. Yields for 5-formyltryptophan and 5-cyanotryptophan were less than 10%.

Incorporation was by selective pressure, as outlined previously.7 Briefly, a 25 mL overnight culture of RF12 cells containing a pET-24a-galectin-3C vector was grown to saturation, then diluted into 1 L M9 media with 1 mM tryptophan. This culture was grown at 37°C until it reached late-log phase (OD600 = 0.6-0.8), and then it was pelleted by centrifugation for 15 minutes at 4000 × g. The pellet was resuspended in a 3 M NaCl rinse solution, centrifuged again, and then transferred to 1 L fresh M9 media without tryptophan. After a 30 min period of growth at 37°C to deplete cellular tryptophan, 1 mM tryptophan analog was added to the media and the culture was shaken for 15 minutes at 37°C to allow uptake of the analog. At that point, 0.4 mM IPTG was added to induce protein expression, and the culture was shaken for 4 hr at 37°C and then centrifuged for 15 min at 4000 × g to pellet cells. Purification of galectin-3C from this pellet was conducted according to the procedure in Chapter 3.

Incorporation was determined by mass spectrometry of the intact protein, using an ESI-TOF or Q-TOF mass spectrometer. Higher incorporation percentages (>90%) were seen for fluorinated tryptophans (5-fluoro, 7-fluoro, and 5,6-difluoro), while lower percentages were seen for 5-chlorotryptophan and 5-aminotryptophan. Incorporation of 5-methyltryptophan and 5-hydroxytryptophan was not observed. This was unexpected, as incorporation of 5-aminotryptophan and 5-hydroxytryptophan by selective pressure has been previously reported.3,8 I attempted to broaden the scope of analogs that I could incorporate through the use of the EMBL cell-free expression system,9 but was largely unsuccessful. Use of the PURExpress reconstituted cell-free expression system (New England Biolabs, Ipswich, MA) allowed some incorporation of 5-aminotryptophan, but did not produce protein at the scale required for calorimetry.
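The mass shift expected for each analog can be estimated before inspecting an intact-protein spectrum. The sketch below is illustrative only. It assumes one substituted tryptophan per protein (consistent with the single W181 of galectin-3C), uses rounded average atomic masses, and treats the ratio of deconvoluted peak intensities as a proxy for percent incorporation, which presumes that both species ionize with similar efficiency. All function and variable names are hypothetical.

```python
# Rounded average atomic masses in Da.
AVERAGE_MASS = {"H": 1.008, "C": 12.011, "N": 14.007,
                "O": 15.999, "F": 18.998, "Cl": 35.45}

# Atoms in each substituent that replaces one indole hydrogen.
SUBSTITUENTS = {
    "fluoro": {"F": 1},
    "chloro": {"Cl": 1},
    "amino": {"N": 1, "H": 2},
    "hydroxy": {"O": 1, "H": 1},
    "methyl": {"C": 1, "H": 3},
}


def mass_shift(substituent, count=1):
    """Mass gained (Da) when `count` ring hydrogens are replaced by the substituent."""
    gained = sum(AVERAGE_MASS[element] * n
                 for element, n in SUBSTITUENTS[substituent].items())
    return count * (gained - AVERAGE_MASS["H"])


def percent_incorporation(analog_intensity, wild_type_intensity):
    """Share of the analog-containing peak, assuming equal ionization efficiency."""
    total = analog_intensity + wild_type_intensity
    return 100.0 * analog_intensity / total


for name in SUBSTITUENTS:
    print(f"{name}: {mass_shift(name):+.2f} Da")
print(f"difluoro: {mass_shift('fluoro', count=2):+.2f} Da")
```

Note that the shifts for amino (about 15 Da) and hydroxy (about 16 Da) substitution are close to that of a single oxidation, so those analogs would likely need more careful interpretation than the fluorinated ones.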

Yield for fluorinated analogs was approximately 10 mg of purified galectin-3C per liter of culture, a sufficient quantity for ITC experiments. I conducted ITC experiments using 5F-W181, 7F-W181, and 5,6-diF-W181 galectin-3C, using lactose as the ligand in the manner outlined in Chapter 3. My hypothesis was that incorporation of a fluorinated, and therefore less electron-rich, tryptophan analog would lead to decreased binding affinity. Surprisingly, this was not the case (Figure 1), as there was no significant difference in association constant between wild-type galectin-3C and any fluorinated analog tested.
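To put a difference in association constants on an energy scale, one can convert each Ka to a standard binding free energy with ΔG° = -RT ln Ka and take the difference. The sketch below assumes a temperature of 298.15 K and uses hypothetical Ka values chosen only for illustration; they are not the measured values from this work.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def binding_free_energy(ka, temperature=298.15):
    """Standard binding free energy in kJ/mol from an association constant in M^-1."""
    return -GAS_CONSTANT * temperature * math.log(ka) / 1000.0


def delta_delta_g(ka_variant, ka_reference, temperature=298.15):
    """Free energy change (kJ/mol) of a variant relative to a reference protein."""
    return (binding_free_energy(ka_variant, temperature)
            - binding_free_energy(ka_reference, temperature))


# Hypothetical association constants, for illustration only.
ka_wild_type = 5.0e3
ka_variant = 4.5e3
print(f"ddG = {delta_delta_g(ka_variant, ka_wild_type):+.2f} kJ/mol")
```

On this scale a tenfold change in Ka corresponds to about 5.7 kJ/mol at 298.15 K, which gives a sense of how small a perturbation would need to be resolved by ITC.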

This finding suggests two possible explanations. First, CH-π interactions could be largely dispersive in the context of the binding site of a protein. This would be surprising in light of the Hammett series in Chapter 2, but would highlight the difference between bulk solution and the binding site of a protein. Second, the larger surface area of the fluorinated tryptophan residue could compensate for the decreased electron richness. This hypothesis could be tested by incorporating an analog bearing a substituent that is similar in size to or larger than fluorine but not as electron-withdrawing. Incorporation of an aminotryptophan would be ideal. However, it is likely that the tryptophanyl-tRNA synthetase either does not accept 5-aminotryptophan or has such a strong preference for unsubstituted tryptophan that any residual tryptophan present will be incorporated in place of the analog.

Figure 1: Fluorinated analogs have similar lactose binding to wild-type galectin-3C.

This is a solvable problem, should some future researcher find it worth the effort. Three strategies show special promise in broadening the scope of tryptophan analogs that can be incorporated into galectin-3. First, Lactococcus lactis has more promiscuous tRNA synthetases than E. coli, and while its suite of genetic tools is not as broad as that of E. coli, inducible expression strains are available.10,11 Second, directed evolution methods as pioneered by Peter Schultz could be used to evolve a tRNA synthetase (either an amber stop codon suppressor or a tryptophanyl-tRNA synthetase) to recognize and charge the desired non-canonical tryptophan analog.12 While Schultz employs a negative selection step to ensure that the evolved synthetase and tRNA are not charged with a canonical amino acid, this step is likely impractical due to the close similarity of the analogs to tryptophan. Instead, the use of positive selection alone would create a promiscuous synthetase that could be used for selective pressure incorporation. Finally, a different lectin could be used, especially one with tyrosine in the binding site, as tyrosine analogs would be easier to obtain.

Design of glycomimetic ligands with enhanced CH-π interactions

While the data shown in Chapter 2 strongly suggest an electrostatic basis for CH-π interactions involving carbohydrates, proper inhibitor optimization requires knowledge of the scale at which electrostatics are important in the interaction. After all, some modifications, such as placing two fluorine atoms on the same carbon, can increase the partial charge of the interacting protons but at the same time introduce a stronger partial negative charge on the interacting face.13 Studies with natural monosaccharides, where the α-face is almost always the interacting face, show that sugars with a more electropositive α-face, such as βGal and βMan, do have stronger CH-π interactions. However, natural sugars confound the effects of reduced partial negative charge in the equatorial α-face positions and increased partial positive charge in the axial α-face positions as C2 or C4 hydroxyl groups are moved from equatorial to axial.

Figure 2: The CH3 bond of β-galactose is polarized by overlap with the antiperiplanar CO4 hydroxy group. 2D natural bond orbital (NBO) diagrams of a) overlap between the CH3 bonding orbital and CO4 antibonding orbital of β-D-galactose and b) non-overlap between the CH2 bonding orbital and CO3 antibonding orbital of β-D-galactose. Note that these orbitals are antiperiplanar in a) and gauche in b).

One contributor to the partial positive charge of aliphatic hydrogens in sugars is the inductive effect of axial hydroxyl groups on neighboring carbons. These act to polarize the CH bond through an overlap of the σCH and σCO* orbitals (Figure 2a). As this is a two-electron interaction between a filled and an empty orbital, it is stabilizing and results in a delocalization of electrons from the CH bond into the CO antibond, increasing the electropositivity of the hydrogen involved.14 As efficient overlap does not occur for bonds in a gauche conformation (Figure 2b), this effect is specific to hydroxyl groups antiperiplanar to the interacting proton, which for a face-on CH-π interaction means axial β-face hydroxyls. For a hexopyranose, O2, O4, and O6 carry the possibility of such inductive effects, though O6 will rotate to equatorial if O4 is axial to avoid an unfavorable syn-pentane interaction. This could explain why both βGal and βMan are more capable of CH-π interactions than βGlc.

Another possible explanation for this disparity in CH-π interaction strengths is that the smaller number of hydroxyl groups on the α-face in βGal and βMan eliminates unfavorable electrostatic interactions between the hydroxyl oxygens and the π-system quadrupole. While these equatorial positions are farther from the quadrupole than the interacting hydrogens, the difference in partial charge between an aliphatic proton and a hydroxyl oxygen is larger than any change in partial charge of an aliphatic proton that inductive effects could reasonably produce. Again, these contributions cannot be separated using natural sugars, as the presence of a hydroxyl group in an axial β-face position implies its absence in the corresponding equatorial α-face position.

Finally, while the difference between natural sugars indicates that the hydrophobicity of the molecule as a whole is not the dominant factor in determining CH-π strength, it is still plausible that the hydrophobicity of the α-face may play a role. After all, the aromatic partner in the interaction is quite hydrophobic, so while hydrophobic effects should not be dominant considering the substituted indole data, they are likely still a factor.

In order to gauge the importance of these three effects, it would be beneficial to test the interaction strength of select monosaccharide analogs with indole and phenol, so as to assess the effect of the size of the π system. The aqueous 1H NMR assay used in Chapter 2 would be convenient for this project, but aromatic concentrations should be as high as can be achieved while maintaining a linear dependence of chemical shift perturbation on aromatic concentration. Monosaccharide analogs should be used at 2 mM, as this concentration was shown to give clear line shape in Chapter 2 and will keep the scale of synthesis modest.
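One simple way to judge whether a titration is still in the linear regime is to fit the chemical shift perturbations to a line through the origin and inspect the coefficient of determination. The sketch below is only one possible check, under stated assumptions: the R² cutoff of 0.99 is arbitrary, R² is computed against the mean-centered total sum of squares, and both data sets are invented for illustration.

```python
def fit_through_origin(concentrations, shifts):
    """Fit shift = slope * concentration; return (slope, R^2)."""
    sxx = sum(c * c for c in concentrations)
    sxy = sum(c * d for c, d in zip(concentrations, shifts))
    slope = sxy / sxx
    mean_shift = sum(shifts) / len(shifts)
    ss_res = sum((d - slope * c) ** 2 for c, d in zip(concentrations, shifts))
    ss_tot = sum((d - mean_shift) ** 2 for d in shifts)
    return slope, 1.0 - ss_res / ss_tot


def is_linear(concentrations, shifts, r2_cutoff=0.99):
    """True when the through-origin fit meets the (arbitrary) R^2 cutoff."""
    _, r_squared = fit_through_origin(concentrations, shifts)
    return r_squared >= r2_cutoff


# Invented data: aromatic concentration (mM) and perturbation of one proton (ppm).
aromatic_mm = [0.0, 50.0, 100.0, 200.0]
linear_ppm = [0.0, -0.010, -0.020, -0.040]
saturating_ppm = [0.0, -0.020, -0.030, -0.033]

print(is_linear(aromatic_mm, linear_ppm))
print(is_linear(aromatic_mm, saturating_ppm))
```

The highest aromatic concentration that still passes such a check for every proton of interest would be a reasonable upper limit for the assay.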

Figure 3: Proposed monosaccharide analogs (1-3) to test the relative importance of contributors to CH-π interaction strength, all variants of methyl β-D-galactopyranoside.

The first monosaccharide analog to use is methyl 2-deoxy-2-fluoro-β-D-galactopyranoside (1). As the fluoro substituent is similar in partial charge to a hydroxyl group but cannot donate a hydrogen bond, this compound should have similar electrostatics to βGal but be more hydrophobic on the α-face.15 If hydrophobicity is an important factor in CH-π strength, this compound should have stronger CH-π interactions than MeβGal as shown by chemical shift perturbations.

To test the influence of increased inductive effects, one could synthesize and use methyl 4-deoxy-4-thio-β-D-galactopyranoside (2). This compound retains some of the hydrogen bond donating and accepting ability at the 4-position and does not alter the α-face hydrophobicity with respect to MeβGal, but the CS antibond overlaps very well with the CH bonds, leading to a large delocalization and polarization of the CH bonds at C3 and C5. If this inductive effect is important in determining CH-π strength, this sugar should display considerably stronger CH-π interactions than MeβGal, especially at H3 and H5.

The thiol group has similar hydrophobicity to the fluoro substituent as estimated by fragmentation methods, but carries a far smaller partial negative charge. To test the effect of decreased partial negative charge on the α-face, it would be useful to synthesize and use methyl 2-deoxy-2-thio-β-D-galactopyranoside (3). If the partial charge of the entire α-face is important, the reduced electronegativity at the equatorial 2-position should give this compound stronger CH-π interactions than methyl 2-deoxy-2-fluoro-β-D-galactopyranoside, or for that matter MeβGal.

The syntheses of these compounds are non-trivial, but all have published methods,16,17 which should be followed and optimized if needed. Due to the high sensitivity of the 1H NMR assay, only 20-40 mg of each compound is required, allowing the synthesis to be carried out on a reasonable scale.


Depending on the results with the sugars mentioned above, this project is amenable to extension in an iterative fashion, with the results of one experiment guiding the design of the next. One logical extension would be to add chloro substituents to C2 and/or C4 while keeping a galactose configuration. Chlorine is less electronegative and more hydrophobic than fluorine, and C-Cl antibonds overlap very well with CH bonds, leading to strong inductive effects. Due to the steric similarity between a chlorine atom and a hydroxyl group, chlorosugars already find use in glycomimetics, including the sugar substitute sucralose.

Conferral of CH-π interactions on existing antiretroviral lectins

HIV and its associated disease AIDS constitute a global pandemic. AIDS was the second most common infectious cause of death in 2019 after tuberculosis, and despite decades as a pandemic it still lacks a vaccine or cure. While much focus has been dedicated to small molecule therapeutics, another line of research focuses on preventing infection using lectins that bind the glycans on the gp120 protein, which is required for fusion with CD4 and entry into T cells.18,19 Such agents can furthermore prevent the virus from interacting with DC-SIGN, preventing its transmission from dendritic cells to T cells.20 The gp120 glycoprotein is densely N-glycosylated, leaving very little exposed protein surface for recognition. Furthermore, this glycosylation is so dense as to interfere with canonical glycan processing in the Golgi, leading to a drastic overabundance of oligomannose glycans on the protein, particularly the unprocessed Man9 glycan. Several proteins, isolated from mammals, plants, and algae, bind strongly to the Man9 glycan and show promise in preventing HIV entry into T cells.21

Some of these proteins, such as Oscillatoria agardhii agglutinin (OAA) and Microcystis viridis lectin (MVL), bind principally to the β-mannose containing core of the Man9 glycan, while others such as cyanovirin-N (CVN) and griffithsin bind principally to the non-reducing ends, which principally consist of α-mannose (Figure 4).22 As shown in Chapter 2, α-mannose is a poor CH-π donor, while β-mannose is an excellent CH-π donor. Thus, we would expect that a lectin recognizing the core would bear an aromatic residue in the binding site, while a lectin recognizing the non-reducing ends would not. In the case of CVN and OAA, this is indeed true: OAA has a tryptophan residue recognizing the β-mannose residue of the glycan,23 while CVN lacks aromatic residues in the binding site.24 As extended binding sites are common in carbohydrate recognition and generally lead to enhanced affinity, and in order to demonstrate the utility of CH-π interactions in binding carbohydrates,25 I sought to confer a CH-π interaction on CVN through protein engineering. This would extend its binding site to recognize the core β-mannose, which should greatly increase its affinity for the Man9 glycan and make it a more potent prophylactic.

Figure 4: Algal and bacterial lectins recognize diverse motifs on HIV gp120 Man9 glycans. O. agardhii agglutinin recognizes the core β-mannose residue, while cyanovirin-N binds to the outer α1,2 mannose residues.

Cyanovirin-N (CVN) is a lectin produced and secreted by the cyanobacterium Nostoc ellipsosporum.26 It bears two carbohydrate recognition domains, both with specificity for α-mannose.27 While the domains have 60% identity with each other, cyanovirin-N as a whole has little homology with any eukaryotic lectin. In solution, it is in equilibrium between a monomeric form and a domain-swapped dimer, though the latter form is always found in crystallography.24

Cyanovirin-N can inactivate HIV at nanomolar concentrations through binding to and crosslinking gp120 on HIV Env spikes,18 preventing the formation of the six-helix bundle required for viral entry. While the therapeutic concentration is two orders of magnitude below the lethal concentration for human cells,18 cyanovirin-N is also a potent mitogen at the required concentrations.28 Enhancing its binding affinity through adding a CH-π interaction could decrease the needed concentration to a level that does not result in mitogenic activity, making it a usable prophylactic.

Cyanovirin-N can be problematic to express in E. coli due to its intricate arrangement of disulfide bonds.29 However, its proper folding can be ensured by expression with a SUMO tag,30 and is retained upon cleavage of the tag. This allows a researcher to take advantage of the plentiful yields and diverse genetic tools that E. coli expression provides. As such, I ordered a SUMO-CVN construct gene as a gBlock from Integrated DNA Technologies (Coralville, IA), and cloned it into a pET24a vector as described for galectin-3 in Chapter 3. This construct expresses readily in E. coli and remains soluble when the SUMO tag is removed. In order to make the first step towards a CH-π enabled cyanovirin-N construct, I mutated T25 to phenylalanine, tyrosine, and tryptophan by site-directed mutagenesis as described for galectin-3 in Chapter 3. Once I had expressed and cleaved all three variants, I used DSF (as in Chapter 3) to determine their stability and to check for any drastic structural consequence of these mutations. While there is a slight loss in stability in each of these variants, all are clearly stable at room temperature (Figure 5), and thus provide a good starting point for optimization.

Figure 5: Cyanovirin-N variants at T25 are stable at room temperature. Melting temperatures were measured by differential scanning fluorimetry. [Bar chart: Tm (°C) for CVN variants wt, T25F, T25W, and T25Y.]

It is highly likely that the residues surrounding T25 are not optimal to position an aromatic variant of T25 for a CH-π interaction with the β-mannose residue of the Man9 glycan.


Directed evolution techniques provide a path to superior accommodation of the CH-π interaction and a higher affinity lectin. Directed evolution is a protein-level artificial selection method consisting of an iterative cycle of diversification, screening, and sequencing, such as to optimize a protein for the selected function. Two diversification approaches are potentially useful for this project. First, an unbiased approach such as error prone rolling circle amplification31 would allow generation of mutants at any point in the protein, including allosteric sites that could be important for proper positioning of the aromatic residue at position 25. Second, a more targeted approach using deep mutational scanning32 could selectively mutate each amino acid in or adjacent to the loop that T25 is on (positions 22 to 30) to all possible single mutants, allowing local optimization specifically to accommodate and enhance the CH-π interaction. To best enable selection for Man9 on gp120, the target should be recombinant gp120 expressed in HEK-293 cells, which strongly expresses Man9.33 Due to the required disulfide bonds, bacterial display is likely the best technology, particularly with a bacterium such as S. gordonii that has already been shown to display functional cyanovirin-N.34 The central challenge will be in maintaining a highly stringent screen, due to the high baseline affinity of cyanovirin-N for Man9 glycans.35 If successful, this project could offer both a demonstration of the strength and importance of CH-π interactions and a powerful anti-HIV prophylactic.
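The size of the targeted library follows directly from the loop length: nine positions with 19 alternative residues each give 171 single mutants. The sketch below enumerates such a library. The segment string is a placeholder and NOT the real cyanovirin-N sequence; only the threonine at position 25 is taken from the text, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(wild_type_segment, first_position):
    """Return labels such as 'T25W' for every single substitution in a segment."""
    mutants = []
    for offset, wt_residue in enumerate(wild_type_segment):
        position = first_position + offset
        for residue in AMINO_ACIDS:
            if residue != wt_residue:
                mutants.append(f"{wt_residue}{position}{residue}")
    return mutants


# Placeholder residues for positions 22-30; only T at position 25 is from the text.
placeholder_loop = "GSGTGSGSG"
library = single_mutants(placeholder_loop, first_position=22)
print(len(library))
print([m for m in library if m.startswith("T25")][:3])
```

A library of this size is small enough that every variant could plausibly be sampled many times over in a display experiment, which is one argument for the targeted approach.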

Characterization of tandem repeat galectin binding profiles to mammalian and microbial glycans

While prototype galectins form dimers and galectin-3 forms pentamers, tandem repeat galectins such as galectins -4, -8, and -9 do not oligomerize. Instead, they bear two carbohydrate-recognition domains on opposite ends of a linker, a mode of organization distinct from other mammalian lectins. Intriguingly, these domains are distinct from each other in both sequence and binding mode: among galectins -4, -8, and -9, the C-terminal domains are more closely related to each other than to their corresponding N-terminal domains (Figure 6).36 This arrangement of two different carbohydrate recognition domains on a flexible linker is similar to many constructs designed for use in molecular biology, but the biological role of such an arrangement is poorly studied. More is known about the functions of the tandem repeat galectins in general.

Figure 6: Tandem repeat N- and C-terminal domains are similar to each other. This phylogenetic plot, generated using BLAST, compares the protein sequence of the individual tandem repeat domains to each other and to the C-terminal domain of galectin-3.

Galectin-4 is strongly expressed in the mammalian intestinal epithelia, and has an anti-inflammatory role in the intestinal mucosa, inhibiting T cell activation through selective binding to activated T cells.37 Through its binding to lactosamine-containing glycans, it is able to drive trafficking of glycoproteins to the apical side of the intestinal epithelia, selectively enriching heavily N-glycosylated proteins.38 Its crosslinking ability also strengthens adherens junctions and lipid rafts in the digestive tract epithelia.39 Galectin-4 also plays an anti-mimicry role, binding and killing bacteria that express blood group antigens.40,41

Galectin-8 can act as a danger receptor by binding to the glycans of damaged vesicles and associating with NDP52, thus inducing autophagy. This provides a defense against Salmonella typhimurium and other vesicle-damaging pathogens.42 While sialylation detracts from binding for many galectins, galectin-8 binds particularly strongly to sialylated glycans.43

Biophysically, galectin-8 is distinguished by its variable linker length between isoforms; three lengths known as 8S, 8M, and 8L are most common.44 It plays a key role in angiogenesis in development and tumor progression, and also activates platelets to mediate thrombogenesis.45

Through its crosslinking activity, galectin-8 can enhance the adhesion of tumor cells, making high expression a marker of poor prognosis in cancer.46


Galectin-9, also known as Tim-3 ligand, is well known to play a strong role in both innate and adaptive immunity.47 It localizes with the mucin-like protein Tim-3 due to its binding to the glycans on Tim-3, and may in fact use such binding as an export pathway, given that like other galectins, galectin-9 lacks a signal sequence.48 This binding to Tim-3 decreases antiviral response, but allows galectin-9 to opsonize some bacterial pathogens such as P. aeruginosa, targeting them for neutrophil-mediated killing.49 While most galectin-9 activities are mediated by N-glycan binding, it is capable of crosslinking protein disulfide isomerase through binding to O-glycans, causing reduction of disulfide bonds in surface-displayed and secreted proteins on T cells.50

With these disparate functions in mind, the two-domain architecture raises a number of intriguing possibilities. One domain could bind a host cell, such as an epithelial cell or an immune cell, and the other a microbe, whether a commensal or a pathogen. The domains could have differing specificities to recognize a broad variety of organisms with the same protein. The domains could even be very similar in their binding profiles, such as to best produce a galectin lattice, aided by the fact that no oligomerization is necessary to form such a lattice. To find out which of these possibilities could be true, one must determine the binding partners of the individual domains of each of these tandem repeat galectins.


Figure 7: Tandem repeat galectin domains are stably folded at room temperature. The negative peak on each DSF trace shown represents the melting temperature of each tandem repeat galectin domain. [Plot: DSF assay results for tandem repeat galectin domains; -d(RFU)/dT versus temperature (°C) for Gal4N, Gal4C, Gal8 full, Gal8N, Gal8C, Gal9N, and Gal9C, with no-protein and no-dye controls.]
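Reading a melting temperature from a trace of this kind amounts to differentiating the raw fluorescence with respect to temperature and locating the most negative value of -d(RFU)/dT. The sketch below illustrates that step with a central finite difference and no smoothing, which real data would likely require. The function name is hypothetical and the melt curve is a synthetic sigmoid centered at 45 °C, not measured data.

```python
import math


def melting_temperature(temps, rfu):
    """Return the temperature at which -d(RFU)/dT is most negative."""
    best_temp = None
    best_value = None
    for i in range(1, len(temps) - 1):
        # Central finite difference approximates d(RFU)/dT at temps[i].
        slope = (rfu[i + 1] - rfu[i - 1]) / (temps[i + 1] - temps[i - 1])
        neg_derivative = -slope
        if best_value is None or neg_derivative < best_value:
            best_temp, best_value = temps[i], neg_derivative
    return best_temp


# Synthetic unfolding curve (hypothetical): sigmoid with a midpoint of 45 °C.
temps = list(range(20, 91))
rfu = [1000.0 / (1.0 + math.exp(-(t - 45) / 2.0)) for t in temps]
print(melting_temperature(temps, rfu))
```

For a trace with more than one transition, or with dye artifacts at high temperature, restricting the search to a temperature window would be a sensible refinement.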

In order to interrogate the functions of the domains, purified samples of the domains expressed as single-domain constructs are needed. To this end, I have cloned galectin-4N (residues 1–152), galectin-4C (residues 179–323),51 galectin-8N (residues 1–156), galectin-8C (residues 184–317),52 galectin-9N (residues 1–149), and galectin-9C (residues 220–355)53 into the pET24a vector for expression in E. coli. All constructs lack structural disulfide bonds or glycosylation, making them amenable to bacterial expression. In addition, I have cloned full length galectins 4, 8, and 9 in the same manner so as to allow comparison to their individual domains. All constructs bear a C-terminal hexahistidine tag downstream of a sortagging site, such that sortagged products can be removed by inverse NiNTA chromatography. Initial purification of all six domains by NiNTA chromatography was effective, though some domains such as galectin-4N were obtained in far greater yield than others, such as galectin-8C. All six domains are highly amenable to stability measurement by DSF as described in Chapter 3, and all purified samples were stably folded (Figure 7). The domains of galectin-9 have a strong tendency to aggregate under oxidizing conditions and should be stored with reducing agent whenever possible.

Since tandem repeat galectins, like other galectins, bind strongly to both mammalian and microbial glycans, a variety of techniques present appealing opportunities to assay the binding profiles of their individual domains. Glycan arrays, which are available for a range of mammalian and bacterial glycans,54 allow lectins to be assayed for binding to hundreds of glycans at once. While most lectins tested on these arrays also benefit from multivalency, monovalent lectins such as a tandem repeat galectin domain should still be capable of a robust glycan array response, as the likewise monovalent galectin-3C yields strong signals. The principal advantage of these glycan arrays is that individual glycan binding partners can be identified for the lectin tested. The principal disadvantage is that many glycans, particularly those displayed by the commensal microbiota, are not found on any currently available array.

Another class of assays, which complements the glycan arrays described above, uses microscopy to visualize the binding profiles of the tandem repeat galectin domains. As the individual domains are monovalent and therefore unlikely to induce agglutination or lysis, microscopic assays require labeling the domain with a fluorophore. This can be accomplished either with a fluorescent antibody that recognizes an epitope tag on the lectin or by sortagging the lectin with a peptide containing a small-molecule fluorophore such as fluorescein or eosin.55 Direct conjugation eliminates the issue of nonspecific antibody recognition, but may be challenging for domains that express poorly in E. coli, such as galectin-8C and galectin-9N. Microscopy is useful for determining binding to isolated or cultured pathogen strains, cultured mammalian cells (especially immune cells), or murine gut sections as described in Chapter 4. The principal advantage of microscopy is the broad range of potential analytes, as anything that can be visualized under a microscope without the introduction of harsh conditions can be investigated as a binding partner. The principal disadvantage is that it can be extremely difficult to determine the exact glycans involved in the binding.

Another assay involving whole mammalian and bacterial cells is flow cytometry, in which a sample is treated with a DNA dye and a fluorescently labeled lectin, such as a tandem repeat galectin domain. The cytometer gathers the sample into a focused stream, and as the stream passes the detector, each cell is measured individually for forward scatter (related to size), side scatter (related to internal complexity), and any fluorophores and dyes applied. Tandem repeat galectins may also bind fecal microbiota, so this assay should be repeated for galectins 4, 8, and 9 and their individual domains. In addition, salivary or oral microbiota and white blood cell samples are of particular interest. The principal advantages of flow cytometry are its convenience for measuring binding to commensal microbes and the potential to isolate binding populations through fluorescence-activated cell sorting (FACS). The principal disadvantage is that samples must be very carefully prepared to ensure reproducibility.
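The per-cell measurements described above lend themselves to a simple gating calculation. The sketch below is illustrative only: the `Event` record, the function name, and both thresholds are hypothetical, and in practice the thresholds would normally be set from unstained and no-lectin control samples. It shows one way to estimate the fraction of DNA-positive events (taken here to be cells) that are also bound by the labeled lectin.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One detector event; field names are hypothetical."""
    fsc: float     # forward scatter (related to size)
    ssc: float     # side scatter (related to internal complexity)
    dna: float     # DNA dye intensity
    lectin: float  # fluorescence of the labeled lectin


def fraction_lectin_positive(events, dna_threshold, lectin_threshold):
    """Fraction of DNA-positive events whose lectin signal meets the threshold."""
    # Gate 1: keep events with enough DNA dye signal to count as cells.
    cells = [e for e in events if e.dna >= dna_threshold]
    if not cells:
        return 0.0
    # Gate 2: among those cells, count the ones bound by the lectin.
    bound = [e for e in cells if e.lectin >= lectin_threshold]
    return len(bound) / len(cells)


if __name__ == "__main__":
    # Made-up events for illustration; not experimental data.
    demo = [
        Event(fsc=120.0, ssc=80.0, dna=500.0, lectin=900.0),
        Event(fsc=110.0, ssc=75.0, dna=600.0, lectin=50.0),
        Event(fsc=20.0, ssc=10.0, dna=10.0, lectin=950.0),  # debris: fails DNA gate
        Event(fsc=130.0, ssc=90.0, dna=700.0, lectin=800.0),
    ]
    print(fraction_lectin_positive(demo, dna_threshold=100.0, lectin_threshold=200.0))
```

Gating on the DNA dye first keeps debris that binds lectin nonspecifically from inflating the bound fraction. Scatter values are recorded but not used here, and a fuller analysis could add a scatter gate in the same way.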

These assays should provide a fruitful direction to explore the workings of tandem repeat galectins. While less studied than the ubiquitously expressed galectins 1 and 3, these lectins play key roles in immunity and host-microbe interactions, particularly within the gut. Deciphering the individual functions of these domains will be very valuable in understanding how the lectins are able to maintain gut integrity, defend against pathogens mimicking host glycans, and potentially retain commensal bacteria.

Acknowledgements

Expression of the individual galectin-8 domains was a collaborative effort with Melanie Halim and Alan Carter.

References

(1) Schutz, C. N.; Warshel, A. What Are the Dielectric “Constants” of Proteins and How to Validate Electrostatic Models? Proteins Struct. Funct. Genet. 2001, 44 (4), 400–417.
(2) Collins, P. M.; Hidari, K. I. P. J.; Blanchard, H. Slow Diffusion of Lactose out of Galectin-3 Crystals Monitored by X-Ray Crystallography: Possible Implications for Ligand-Exchange Protocols. Acta Crystallogr. D. Biol. Crystallogr. 2007, 63 (Pt 3), 415–419.
(3) Lepthien, S.; Wiltschi, B.; Bolic, B.; Budisa, N. In Vivo Engineering of Proteins with Nitrogen-Containing Tryptophan Analogs. Appl. Microbiol. Biotechnol. 2006, 73 (4), 740–754.
(4) Lin, M. T.; Fukazawa, R.; Miyajima-Nakano, Y.; Matsushita, S.; Choi, S. K.; Iwasaki, T.; Gennis, R. B. Escherichia Coli Auxotroph Host Strains for Amino Acid-Selective Isotope Labeling of Recombinant Proteins, 1st ed.; Elsevier Inc., 2015; Vol. 565.
(5) Blaser, G.; Sanderson, J. M.; Batsanov, A. S.; Howard, J. A. K. The Facile Synthesis of a Series of Tryptophan Derivatives. Tetrahedron Lett. 2008, 49 (17), 2795–2798.
(6) Goss, R. J. M.; Newill, P. L. A. A Convenient Enzymatic Synthesis of L-Halotryptophans. Chem. Commun. 2006, No. 47, 4924–4925.
(7) Broos, J.; Ter Veld, F.; Robillard, G. T. Membrane Protein-Ligand Interactions in Escherichia Coli Vesicles and Living Cells Monitored via a Biosynthetically Incorporated Tryptophan Analogue. Biochemistry 1999, 38 (31), 9798–9803.
(8) Zamanian-Daryoush, M.; Gogonea, V.; DiDonato, A. J.; Buffa, J. A.; Choucair, I.; Levison, B. S.; Hughes, R. A.; Ellington, A. D.; Huang, Y.; Li, X. S.; et al. Site-Specific 5-Hydroxytryptophan Incorporation into Apolipoprotein A-I Impairs Cholesterol Efflux Activity and High-Density Lipoprotein Biogenesis. J. Biol. Chem. 2020, 295 (15), 4836–4848.
(9) Kim, T. W.; Kim, D. M.; Choi, C. Y. Rapid Production of Milligram Quantities of Proteins in a Batch Cell-Free Protein Synthesis System. J. Biotechnol. 2006, 124 (2), 373–380.
(10) El Khattabi, M.; van Roosmalen, M. L.; Jager, D.; Metselaar, H.; Permentier, H.; Leenhouts, K.; Broos, J. Lactococcus Lactis as Expression Host for the Biosynthetic Incorporation of Tryptophan Analogues into Recombinant Proteins. Biochem. J. 2008, 409 (1), 193–198.
(11) Petrović, D. M.; Leenhouts, K.; Van Roosmalen, M. L.; Broos, J. An Expression System for the Efficient Incorporation of an Expanded Set of Tryptophan Analogues. Amino Acids 2013, 44 (5), 1329–1336.
(12) Chatterjee, A.; Xiao, H.; Yang, P. Y.; Soundararajan, G.; Schultz, P. G. A Tryptophanyl-tRNA Synthetase/tRNA Pair for Unnatural Amino Acid Mutagenesis in E. Coli. Angew. Chemie - Int. Ed. 2013, 52 (19), 5106–5109.
(13) Zhang, R.; McCarter, J. D.; Braun, C.; Yeung, W.; Brayer, G. D.; Withers, S. G. Synthesis and Testing of 2-Deoxy-2,2-Dihaloglycosides as Mechanism-Based Inhibitors of Alpha-Glycosidases. J. Org. Chem. 2008, 73 (8), 3070–3077.
(14) Alabugin, I. V.; Zeidan, T. A. Stereoelectronic Effects and General Trends in Hyperconjugative Acceptor Ability of Sigma Bonds. J. Am. Chem. Soc. 2002, 124 (12), 3175–3185.
(15) Ioannou, A.; Cini, E.; Timofte, R. S.; Flitsch, S. L.; Turner, N. J.; Linclau, B. Heavily Fluorinated Carbohydrates as Enzyme Substrates: Oxidation of Tetrafluorinated Galactose by Galactose Oxidase. Chem. Commun. 2011, 47 (40), 11228–11230.
(16) Haradahira, T.; Maeda, M.; Yano, Y.; Kojima, M. New Synthesis of 2-Deoxy-2-Fluoro-D-Galactose. Chem. Pharm. Bull. 1984, 32 (8), 3317–3319.
(17) Pei, Z.; Dong, H.; Caraballo, R.; Ramström, O. Synthesis of Positional Thiol Analogs of β-D-Galactopyranose. European J. Org. Chem. 2007, No. 29, 4927–4934.
(18) Botos, I.; Wlodawer, A. Proteins That Bind High-Mannose Sugars of the HIV Envelope; 2005; Vol. 88.
(19) Akkouh, O.; Ng, T. B.; Singh, S. S.; Yin, C.; Dan, X.; Chan, Y. S.; Pan, W.; Cheung, R. C. F. Lectins with Anti-HIV Activity: A Review. Molecules 2015, 20 (1), 648–668.
(20) Balzarini, J.; Van Herrewege, Y.; Vermeire, K.; Vanham, G.; Schols, D. Carbohydrate-Binding Agents Efficiently Prevent Dendritic Cell-Specific Intercellular Adhesion Molecule-3-Grabbing Nonintegrin (DC-SIGN)-Directed HIV-1 Transmission to T Lymphocytes. Mol. Pharmacol. 2007, 71 (1), 3–11.
(21) Ziółkowska, N. E.; Wlodawer, A. Structural Studies of Algal Lectins with Anti-HIV Activity. Acta Biochim. Pol. 2006, 53 (4), 617–626.
(22) Koharudin, L. M.; Gronenborn, A. M. Antiviral Lectins as Potential HIV Microbicides. Curr. Opin. Virol. 2014, 7 (1), 95–100.
(23) Koharudin, L. M. I.; Gronenborn, A. M. Structural Basis of the Anti-HIV Activity of the Cyanobacterial Oscillatoria Agardhii Agglutinin. Structure 2011, 19 (8), 1170–1181.
(24) Botos, I.; O’Keefe, B. R.; Shenoy, S. R.; Cartner, L. K.; Ratner, D. M.; Seeberger, P. H.; Boyd, M. R.; Wlodawer, A. Structures of the Complexes of a Potent Anti-HIV Protein Cyanovirin-N and High Mannose Oligosaccharides. J. Biol. Chem. 2002, 277 (37), 34336–34342.
(25) Weis, W. I.; Drickamer, K. Structural Basis of Lectin-Carbohydrate Recognition. Annu. Rev. Biochem. 1996, 65, 441–473.
(26) Boyd, M. R.; Gustafson, K. R.; McMahon, J. B.; Shoemaker, R. H.; O’Keefe, B. R.; Mori, T.; Gulakowski, R. J.; Wu, L.; Rivera, M. I.; Laurencot, C. M.; et al. Discovery of Cyanovirin-N, a Novel Human Immunodeficiency Virus-Inactivating Protein That Binds Viral Surface Envelope Glycoprotein Gp120: Potential Applications to Microbicide Development. Antimicrob. Agents Chemother. 1997, 41 (7), 1521–1530.
(27) Sandström, C.; Berteau, O.; Gemma, E.; Oscarson, S.; Kenne, L.; Gronenborn, A. M. Atomic Mapping of the Interactions between the Antiviral Agent Cyanovirin-N and Oligomannosides by Saturation-Transfer Difference NMR. Biochemistry 2004, 43 (44), 13926–13931.
(28) Férir, G.; Huskens, D.; Noppen, S.; Koharudin, L. M. I.; Gronenborn, A. M.; Schols, D. Broad Anti-HIV Activity of the Oscillatoria Agardhii Agglutinin Homologue Lectin Family. J. Antimicrob. Chemother. 2014, 69 (10), 2746–2758.
(29) Xiong, S.; Fan, J.; Kitazato, K. The Antiviral Protein Cyanovirin-N: The Current State of Its Production and Applications. Appl. Microbiol. Biotechnol. 2010, 86 (3), 805–812.
(30) Gao, X.; Chen, W.; Guo, C.; Qian, C.; Liu, G.; Ge, F.; Huang, Y.; Kitazato, K.; Wang, Y.; Xiong, S. Soluble Cytoplasmic Expression, Rapid Purification, and Characterization of Cyanovirin-N as a His-SUMO Fusion. Appl. Microbiol. Biotechnol. 2010, 85 (4), 1051–1060.
(31) Fujii, R.; Kitaoka, M.; Hayashi, K. Error-Prone Rolling Circle Amplification: The Simplest Random Mutagenesis Protocol. Nat. Protoc. 2006, 1 (5), 2493–2497.
(32) Taylor, N. D.; Garruss, A. S.; Moretti, R.; Chan, S.; Arbing, M. A.; Cascio, D.; Rogers, J. K.; Isaacs, F. J.; Kosuri, S.; Baker, D.; et al. Engineering an Allosteric Transcription Factor to Respond to New Ligands. Nat. Methods 2016, 13 (2), 177–183.
(33) Li, X.; Grant, O. C.; Ito, K.; Wallace, A.; Wang, S.; Zhao, P.; Wells, L.; Lu, S.; Woods, R. J.; Sharp, J. S. Structural Analysis of the Glycosylated Intact HIV-1 Gp120-B12 Antibody Complex Using Hydroxyl Radical Protein Footprinting. Biochemistry 2017, 56, 957–970.
(34) Giomarelli, B.; Provvedi, R.; Meacci, F.; Maggi, T.; Medaglini, D.; Pozzi, G.; Mori, T.; McMahon, J. B.; Gardella, R.; Boyd, M. R. The Microbicide Cyanovirin-N Expressed on the Surface of Commensal Bacterium Streptococcus Gordonii Captures HIV-1. AIDS 2002, 16 (10), 1351–1356.
(35) Barrientos, L. G.; Matei, E.; Lasala, F.; Delgado, R.; Gronenborn, A. M. Dissecting Carbohydrate-Cyanovirin-N Binding by Structure-Guided Mutagenesis: Functional Implications for Viral Entry Inhibition. Protein Eng. Des. Sel. 2006, 19 (12), 525–535.
(36) Altschul, S. F. A Protein Alignment Scoring System Sensitive at All Evolutionary Distances. J. Mol. Evol. 1993, 36, 290–300.
(37) Paclik, D.; Danese, S.; Berndt, U.; Wiedenmann, B.; Dignass, A.; Sturm, A. Galectin-4 Controls Intestinal Inflammation by Selective Regulation of Peripheral and Mucosal T Cell Apoptosis and Cell Cycle. PLoS One 2008, 3 (7), 1–12.
(38) Stechly, L.; Morelle, W.; Dessein, A. F.; André, S.; Grard, G.; Trinel, D.; Dejonghe, M. J.; Leteurtre, E.; Drobecq, H.; Trugnan, G.; et al. Galectin-4-Regulated Delivery of Glycoproteins to the Brush Border Membrane of Enterocyte-like Cells. Traffic 2009, 10 (4), 438–450.
(39) Bum-Erdene, K.; Leffler, H.; Nilsson, U. J.; Blanchard, H. Structural Characterisation of Human Galectin-4 N-Terminal Carbohydrate Recognition Domain in Complex with Glycerol, Lactose, 3′-Sulfo-Lactose, and 2′-Fucosyllactose. Sci. Rep. 2016, 6 (December 2015), 20289.
(40) Cao, Z. Q.; Guo, X. L. The Role of Galectin-4 in Physiology and Diseases. Protein Cell 2016, 7 (5), 314–324.
(41) Stowell, S. R.; Arthur, C. M.; Dias-Baruffi, M.; Rodrigues, L. C.; Gourdine, P.; Heimburg-Molinaro, J.; Ju, T.; Molinaro, R. J.; Xia, B.; Smith, D. F.; et al. Innate Immune Lectins Kill Bacteria Expressing Blood Group Antigen. Nat. Med. 2010, 16 (3), 295–301.
(42) Thurston, T. L. M.; Wandel, M. P.; Von Muhlinen, N.; Foeglein, Á.; Randow, F. Galectin 8 Targets Damaged Vesicles for Autophagy to Defend Cells against Bacterial Invasion. Nature 2012, 482 (7385), 414–418.
(43) Kaltner, H.; Gabius, H. J. Sensing Glycans as Biochemical Messages by Tissue Lectins: The Sugar Code at Work in Vascular Biology. Thromb. Haemost. 2019, 119 (4), 517–533.
(44) Troncoso, M. F.; Ferragut, F.; Bacigalupo, M. L.; Cardenas Delgado, V. M.; Nugnes, L. G.; Gentilini, L.; Laderach, D.; Wolfenstein-Todel, C.; Compagno, D.; Rabinovich, G. A.; et al. Galectin-8: A Matricellular Lectin with Key Roles in Angiogenesis. Glycobiology 2014, 24 (10), 907–914.
(45) Romaniuk, M. A.; Tribulatti, M. V.; Cattaneo, V.; Lapponi, M. J.; Molinas, F. C.; Campetella, O.; Schattner, M. Human Platelets Express and Are Activated by Galectin-8. Biochem. J. 2010, 432 (3), 535–547.
(46) Friedel, M.; André, S.; Goldschmidt, H.; Gabius, H. J.; Schwartz-Albiez, R. Galectin-8 Enhances Adhesion of Multiple Myeloma Cells to Vascular Endothelium and Is an Adverse Prognostic Factor. Glycobiology 2016, 26 (10), 1048–1058.
(47) Su, E. W.; Bi, S.; Kane, L. P. Galectin-9 Regulates T Helper Cell Function Independently of Tim-3. Glycobiology 2011, 21 (10), 1258–1265.
(48) Yasinska, I. M.; Sakhnevych, S. S.; Pavlova, L.; Selnø, A. T. H.; Abeleira, A. M. T.; Benlaouer, O.; Silva, I. G.; Mosimann, M.; Varani, L.; Bardelli, M.; et al. The Tim-3-Galectin-9 Pathway and Its Regulatory Mechanisms in Human Breast Cancer. Front. Immunol. 2019, 10 (JULY), 1594.
(49) Casals, C.; Campanero-Rhodes, M. A.; García-Fojeda, B.; Solís, D. The Role of Collectins and Galectins in Lung Innate Immune Defense. Front. Immunol. 2018, 9 (SEP), 1–10.
(50) Schaefer, K.; Webb, N. E.; Pang, M.; Hernandez-Davies, J. E.; Lee, K. P.; Gonzalez, P.; Douglass, M. V.; Lee, B.; Baum, L. G. Galectin-9 Binds to O-Glycans on Protein Disulfide Isomerase. Glycobiology 2017, 27 (9), 878–887.
(51) Rustiguel, J. K.; Soares, R. O. S.; Meisburger, S. P.; Davis, K. M.; Malzbender, K. L.; Ando, N.; Dias-Baruffi, M.; Nonato, M. C. Full-Length Model of the Human Galectin-4 and Insights into Dynamics of Inter-Domain Communication. Nat. Publ. Gr. 2016, No. May, 1–13.
(52) Nishi, N.; Itoh, A.; Fujiyama, A.; Yoshida, N.; Araya, S. I.; Hirashima, M.; Shoji, H.; Nakamura, T. Development of Highly Stable Galectins: Truncation of the Linker Peptide Confers Protease-Resistance on Tandem-Repeat Type Galectins. FEBS Lett. 2005, 579 (10), 2058–2064.
(53) Yoshida, H.; Teraoka, M.; Nishi, N.; Nakakita, S.; Nakamura, T.; Hirashima, M.; Kamitori, S. X-Ray Structures of Human Galectin-9 C-Terminal Domain in Complexes with a Biantennary Oligosaccharide And. J. Biol. Chem. 2010, 285 (47), 36969–36976.
(54) Stowell, S. R.; Arthur, C. M.; McBride, R.; Berger, O.; Razi, N.; Rodrigues, L. C.; Gourdine, J.; Noll, A. J.; von Gunten, S.; Smith, D. F.; et al. Microbial Glycan Microarrays Define Key Features of Host-Microbial Interactions. Nat. Chem. Biol. 2014, 10 (6), 470–476.
(55) Theile, C. S.; Witte, M. D.; Blom, A. E. M.; Kundrat, L.; Ploegh, H. L.; Guimaraes, C. P. Site-Specific N-Terminal Labeling of Proteins Using Sortase Mediated Reactions. Nat. Methods 2014, 8 (9), 1800–1807.