Carbohydrate and Bacterial Binding Specificity of Human Intelectin-1

By

Christine R. Isabella

M.S. Biochemistry University of Wisconsin – Madison, 2017 B.S. Molecular and Cellular Biology University of Puget Sound, 2012

SUBMITTED TO THE DEPARTMENT OF CHEMISTRY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY IN CHEMISTRY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2021

© 2021 Massachusetts Institute of Technology. All Rights Reserved.

Signature of Author: ______Department of Chemistry January 11, 2021

Certified By: ______Laura L. Kiessling Novartis Professor of Chemistry Thesis Supervisor

Accepted By: ______Adam P. Willard Associate Professor of Chemistry Chair, Department Committee on Graduate Students

This doctoral thesis has been examined by a committee of professors from the Department of Chemistry as follows:

Barbara Imperiali ______Thesis Committee Chair Class of 1922 Professor of Biology and Chemistry

Laura L. Kiessling ______Thesis Supervisor Novartis Professor of Chemistry

Ronald T. Raines ______Thesis Committee Member Roger and Georges Firmenich Professor of Natural Products Chemistry

Eric J. Alm ______Thesis Committee Member Professor of Biological Engineering

Carbohydrate and Bacterial Binding Specificity of Human Intelectin-1

By

Christine R. Isabella

Submitted to the Department of Chemistry on January 15, 2021 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Chemistry

ABSTRACT

The mucosal surfaces of the human body exist in close contact with complex communities of resident microorganisms termed the microbiome. The microbiome is crucial for host health, and therefore the host must discern which microbes may colonize and which must be cleared. Human soluble lectins are secreted carbohydrate-binding proteins that bind microbes by specific recognition of cell surface glycans. Many soluble lectins are important mucosal innate immune factors, as lectin binding to microbes can result in their clearance from the host. However, the glycan and microbial binding specificities of lectins are poorly defined. In this thesis, I aim to address this gap with a focus on human intelectin-1 (hItln-1). In Chapter 1, I review the recently identified class of lectins, the X-type lectins. The X-type lectins, or intelectins, are found throughout chordates and share highly conserved sequences, but their biological roles are not well understood. However, their expression patterns and microbial binding specificity suggest a role in regulation of the microbiome. In Chapter 2, I build on previous work to further define hItln-1 carbohydrate specificity. These studies reveal that carbohydrate conformation is stabilized by stereoelectronic effects, and that carbohydrates are bound by hItln-1 in their stabilized conformation. In Chapter 3, I turn to bacterial cell recognition by hItln-1 and determine that hItln-1 displays competitive binding to bacterial strains in a mixture. These studies reveal the need to assay lectin recognition against diverse microbial communities to understand lectin binding specificity in a biological context. In Chapter 4, I develop lectin-sequencing (lectin-SEQ) as a method for identifying bacterial targets of lectins in native communities. Using the human stool microbiome, I assess binding to stool bacteria by hItln-1 and surfactant protein-D (SP-D). Lectin-SEQ reveals that hItln-1 recognizes health-promoting commensal bacteria, while SP-D recognizes pathogenic bacteria. These results indicate a novel role for hItln-1 in promoting colonization of commensal bacteria.

Thesis Supervisor: Laura L. Kiessling Title: Novartis Professor of Chemistry

Acknowledgements

It turns out that a Ph.D. is true type II fun. There were countless times when I thought I wouldn’t do it, and for all of those times there were people there who supported and encouraged me, and those who ensured that, in the end, it all looked like fun. To all of you, I owe a debt of gratitude.

First, I must thank Laura. You have always encouraged me and believed in my abilities more than I believed in myself. I will always be grateful for your push to dive deep into lectin-SEQ and your interest and excitement in seeing it through to the finish line. I appreciate, too, the moments away from the science when you took the time to support me. I feel very lucky to have an advisor who truly cares about me as a person and a scientist.

The Kiessling Group has been an incredible place to learn and live and grow over the past five years. I was drawn to the lab in large part because of the people, and I still feel lucky to spend my days with you all. I have to thank Darryl, for recruiting me, for teaching me all of the hIntL secrets, for believing in my abilities, and for passing down such an awesome project. I learned so much from you in the first months of my time in the lab that set the stage for my time in the Kiessling lab, and to this day, I hear your voice in my head talking about all of the cool hIntL projects we could do. Heather, I will never forget the first time you talked to me. You offered me a pour-over coffee and I felt like I was welcomed into a club. I admire you so much as a scientist and a person, and I learned so much from you, including but not limited to how to most efficiently order Jimmy John’s to the lab. Alex—we really went through it all together. You have been a source of encouragement to me day after day throughout the years and I am appreciative of you always making the time to troubleshoot experiments, chat, and snack. I hope one day we can return to Bull Shoals and live our best lives. Sayaka, thank you for always sharing your snacks with me and reminding me to eat when I am hangry. Thank you for always taking the time to help me think through experiments and results. And thank you for just being the most kind and sincere and caring person, and for running TWO half marathons with me that I was undertrained for. Finally, I owe so much to LectinLand for embodying the “all hands on deck” motto. I am grateful for having such a wonderful team with so much engagement and excitement about all of our various projects. Thank you all for teaching me, working with me, spending hours sorting bacteria with me, and just generally being the best subgroup.

I want to next thank those who made UW – Madison feel like home. To my IPiB family: moving to a new institution made me realize how special the people and the relationships built at UW truly are. To Delia, my soul twin and little sister, thank you for everything. To Dylan (Dylz), our QQ and Dominoes study sessions got me through so much, and have been greatly missed the last three years. And I’m guessing you really need a haircut. To Dana, I am yet to meet a comparable cooking partner. To Mark, my bike has been lonely, and I haven’t fallen off of it recently. To Karl, I hope my life legacy is the flong cape, and I am so glad that I get to still see you in Boston. To Anji, you have been an inspiration to me since the day I met you and I can’t wait to see what you do next. Ian and Sue, I’m so glad that we were all Discertators together, and that it led to so many adventures in Wisconsin and Boston. From bike camping to Friday Fish Fry to candlepin bowling, you guys have been incredible friends. I am also grateful that I got to collaborate with Ian and had the opportunity to learn so much about protein crystallography from him.

When the Kiessling Group moved to MIT, I was lucky to meet incredible people who have made my experience in Cambridge unforgettable. Lisa, you were my first MIT friend and you have always made me feel welcome here. I am grateful to have you as a source of encouragement and inspiration, for teaching me everything about job hunting, and for countless runs…and pastries. Smrithi, thank you for always showing up to lab with a smile and ready to make lectins in cells that were basically dead. You brought much needed light to my days and real perseverance to our project and I will always be sad that a pandemic cut your time in the lab short. I have had so much fun working with you and watching you grow over the years, and I am excited to see what you do next. Katherine, thank you for being a great friend, only sometimes being distracting, and for always being passionate. I am grateful for the time we spent as co-presidents of WIC, for ceramics (RIP), and for always having you to talk to and knowing that you will go to bat for me and/or probably literally fight people if I need you to. Victoria, you are wise beyond your years. Thank you for always keeping the perspective that life is full of pain and that you can’t ruin a project with one experiment. Janet, we might not have a snack shelf, but we have many snack adventures. I am grateful for our quarantine walks and for your listening ear and constant encouragement. I don’t know if I would have made it through the last few months if it weren’t for both of you, Sugo Sunday, Caturday, and boba. I believe you will truly move with motion one day.

There are a few very special people who I wouldn’t have made it through without. Liz, I don’t really know how it took us until 5 years after college to know each other, but you are one of my favorite humans. Thank you for randomly moving to Boston for a year so that we could be friends forever. You seriously were the friend I needed that year. Nacey, I don’t even know how it happened but now you are my best friends. Nate, thank you for striking the appropriate balance between making fun of me for how aloof I am and also reminding me that I am smart and good at things. Because of you I can always remember that this whole PhD was basically just a really really long V2. Stacey, thank you for always being down for the beach, slumber parties, and shopping. And corn dogs. Cecie, I don’t even have words for what you mean to me. Thank you for being the best SW and being there every moment of every day for me. For all of the adventures and movie nights. And for also sometimes telling me to suck it up. Finally, I have to thank Dave for being in this with me and still putting up with me. Thank you for listening to, encouraging, and supporting me, for putting up with all of my tears, and for literally and figuratively pulling me up cliffs.

Lastly and most of all, I have to thank my family. My brother, Adam, for doing everything first in life so that I could follow in his footsteps. Thank you for being such a supportive older brother and for always believing that I could do this. And my parents, thank you for supporting me in everything over the years, for visiting me in all corners of the country, and for always wanting and finding a way to provide the best for me.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations

Chapter 1: X-type lectins: soluble lectins with microbial glycan specificity and elusive biological function

1.1 Abstract
1.2 Introduction
1.3 X-type lectins in chordates
1.4 X-lectin structure
1.5 Xenopus intelectins
1.5.1 Cortical granule lectins
1.5.2 Serum lectins
1.5.3 Intestinal lectins
1.5.4 Embryonic epidermal lectin
1.6 Mouse intelectins
1.7 Sheep intelectins
1.8 Human intelectins
1.8.1 Glycan recognition
1.8.2 Human omentin
1.8.3 Intelectins in diseases associated with microbial dysbiosis
1.9 Conclusions
1.10 Acknowledgements
1.11 References

Chapter 2: Stereoelectronic effects impact glycan recognition

2.1 Abstract
2.2 Introduction
2.3 Results
2.3.1 hItln-1 binding to microbial monosaccharides
2.3.2 Structure of hItln-1 bound to allyl-KO
2.3.3 Bioinformatic analysis of glycan conformation
2.3.4 Computational analysis of glycan conformation and recognition
2.4 Discussion
2.5 Conclusions
2.6 Materials and Methods
2.6.1 Recombinant protein expression
2.6.2 Chemical synthesis of glycans
2.6.3 Biolayer interferometry (BLItz)
2.6.4 ELISA
2.6.5 Protein X-ray crystallography
2.6.6 Bioinformatics
2.6.7 Computational Analysis
2.7 Funding Sources
2.8 Acknowledgements
2.9 References
2.10 Supplemental Information

Chapter 3: Human Intelectin-1 specificity for microbe binding in synthetic communities

3.1 Abstract
3.2 Introduction
3.3 Results
3.3.1 Recognition of microbial strains by hItln-1
3.3.2 hItln-1 binding affinity for Gram-positive and Gram-negative bacteria
3.3.3 hItln-1 binding to synthetic microbial communities
3.3.4 hItln-1 competitive binding in mixed microbial communities
3.4 Discussion
3.5 Conclusions
3.6 Materials and Methods
3.6.1 hItln-1 Expression and Purification
3.6.2 hItln-1 Binding to Bacterial Strains
3.6.3 hItln-1 Binding to Synthetic Communities
3.7 Acknowledgements
3.8 References
3.9 Supplemental Information

Chapter 4: Lectin-sequencing for analyzing microbial communities

4.1 Abstract
4.2 Introduction
4.3 Results
4.3.1 Soluble lectins bind bacteria from stool
4.3.2 16S Sequencing reveals patterns of hItln-1 binding to stool bacteria
4.3.3 Metagenomic sequencing identifies lectin-bound bacteria
4.3.4 Lectin binding levels are altered in IBD
4.3.5 HItln-1 and MBL binding to the healthy and IBD microbiota
4.4 Discussion
4.5 Materials and Methods
4.5.1 Protein expression and purification
4.5.2 Direct labeling of lectins with fluorophores
4.5.3 Preparation of human stool samples
4.5.4 Flow cytometry and FACS of donor stool samples
4.5.5 Fluorescence microscopy
4.5.6 Nucleic acid extraction
4.5.7 16S sequencing
4.5.8 Metagenomic sequencing
4.6 Acknowledgements
4.7 References
4.8 Supplemental Information

List of Figures

1-1. X-type lectin structures
1-2. Carbohydrate binding sites from select X-type lectins
1-3. Alignment of X-lectin amino acid sequences
2-1. Factors contributing to lectin–carbohydrate binding and recognition
2-2. Human intelectin-1 (hItln-1) binding to monosaccharides in a biolayer interferometry (BLI) competition assay
2-3. Evaluation of BSA-conjugated sugars as ligands for hItln-1 using ELISA
2-4. Structure of hItln-1 bound to allyl-α-KO
2-5. Bioinformatic analysis of exocyclic vicinal diol-containing glycans in the PDB
2-6. Observed and accommodated ligand conformations in hItln-1 binding site
2-7. Stabilizing stereoelectronic effects of preferred rotamers of the proximal side chain C–C bond of KO, L,D-heptose, and D,D-heptose
3-1. Binding of hItln-1 to fixed bacterial strains
3-2. hItln-1 binding affinity for microbial cell surfaces
3-3. Binding of hItln-1 to E. fergusonii is inhibited in a mixed community
3-4. Competitive inhibition of hItln-1 binding in microbial communities
3-S1. Competitive hItln-1 binding in communities is dependent on lectin concentration and washing
4-1. Lectin trimeric structures and binding ligands
4-2. Soluble lectins bind the human microbiome
4-3. 16S lectin-sequencing of hItln-1 sorted stool bacteria
4-4. Lectin-SEQ of healthy donor stool samples with metagenomic sequencing
4-5. HItln-1 and SP-D binding to the IBD microbiome
4-6. HItln-1 and MBL binding levels healthy, UC and CD donor stool microbiome
4-1S. Sequence level enrichment plot of lectin-SEQ with metagenomics

List of Tables

1-1. GenBank accession codes for aligned sequences in Figure 1-2
2-1. IC50 values of ligands and corresponding changes in free energy of binding compared to KO
2-2. Data collection and refinement statistics for the crystal structure of hItln-1 bound to allyl-α-KO
2-3. Summary of conformational analysis results
2-4. NBO Donor-acceptor interaction energies and calculated ΔENBO of bond rotation
2-S1. Conformational analysis of saccharides containing exocyclic diols in the PDB
2-S2. Cartesian coordinates of saccharides optimized at the M06-2X/6-311+G(d,p); IEFPCM:water level of theory
3-1. Summary of hItln-1 binding to microbes
3-2. Example synthetic microbial community mixtures assayed for hItln-1 binding
3-3. hItln-1-binding and non-binding strains used in synthetic communities

List of Abbreviations

AMPs Antimicrobial peptides

BA Bifidobacterium angulatum

BCSDB Bacterial Carbohydrate Structure Data Base

BLI Biolayer interferometry

BO Bacteroides ovatus

BP Bacteroides plebeius

BSA Bovine serum albumin

CD Crohn’s disease

CPS Capsular polysaccharide

CRD Carbohydrate recognition domain

D,D-heptose D-glycero-α-D-manno-heptose

DFT Density functional theory

EF Escherichia fergusonii

ELISA Enzyme-linked immunosorbent-like assay

ELLA Enzyme-linked lectin assay

FACS Fluorescence-activated cell sorting

FBG Fibrinogen-like

FSC-A Forward scatter

GlcNAc N-Acetyl-D-glucosamine

GlyP D-glycerol-1-phosphate

Gro-1-P Glycerol 1-phosphate

GWAS Genome wide association studies

hItln-1 Human intelectin-1

hItln-647 Human intelectin-1 Alexa Fluor 647

hItln-2 Human intelectin-2

HRP Horseradish peroxidase

HUVECs Human umbilical vein endothelial cells

IBD Inflammatory bowel disease

IHC Immunohistochemistry

IL Interleukin

KDO 3-deoxy-D-manno-oct-2-ulosonic acid

KEGG Kyoto Encyclopedia of Genes and Genomes

KO D-glycero-D-talo-oct-2-ulosonic acid

L,D-heptose L-glycero-α-D-manno-heptose

lectin-SEQ Lectin-sequencing

LfR Lactoferrin receptor

LPS Lipopolysaccharide

LR Lactobacillus reuteri

MBL Mannose-binding lectin

MBL-555 Mannose binding lectin Alexa Fluor 555

MCP Monocyte chemotactic protein

mItln-1 Mouse intelectin-1

mItln-2 Mouse intelectin-2

NBO Natural bond orbital

NF-κB Nuclear factor kappa-light-chain-enhancer of activated B cells

NMR Nuclear magnetic resonance

OTUs Operational taxonomic units

OVA Ovalbumin

PCoA Principal coordinate analysis

PDB Protein Data Bank

PP Proteus penneri

sMCP-1 Sheep mast cell protease

SMCs Smooth muscle cells

SP-D Surfactant protein-D

STAT6 Signal transducer and activator of transcription 6

TA Teichoic acid

TFF Trefoil factor

TFF3 Trefoil factor 3

Th2 T helper type 2

TLR Toll-like receptor

TNF-α Tumor necrosis factor alpha

UC Ulcerative colitis

V109D Human intelectin-1 V109D

XCGL Xenopus laevis cortical granule lectin

XCGL2 Xenopus laevis cortical granule lectin 2

XCL Xenopus laevis serum lectin

XCL-2 Xenopus laevis serum lectin 2

XEEL Xenopus laevis embryonic epidermal lectin

xIntl-3 Xenopus laevis Intelectin 3

xIntl-4 Xenopus laevis Intelectin 4

XL-35 Xenopus laevis lectin 35kDa

β-D-Galf β-D-Galactofuranose

β-Galf β-D-Galactofuranose

ΔΔG Relative free energy

Chapter 1

X-Type lectins: soluble lectins with microbial glycan specificity and elusive biological function

1.1 Abstract

The X-type lectins are a recently identified class of calcium-dependent lectins that lack the C-type lectin fold. The X-type lectins are found throughout the animal kingdom and show high levels of sequence homology across chordates. Still, the numbers of intelectin genes and their expression patterns vary widely between species. While X-type lectins have been suggested to function in innate immunity against microbes, the biological function of most X-type lectins remains elusive. In this review, I summarize critical features of intelectin protein structure, glycan recognition, and current understanding of biological functions. By analyzing data from multiple species, my goal is to illuminate areas where insights from individual species are either unique or broadly applicable toward understanding the intelectins. Additionally, I aim to highlight gaps in knowledge of the intelectins to guide future research.

1.2 Introduction

Animals have an integral, yet complicated, relationship with microbes. On one hand, the surfaces of the body that contact the environment must maintain a barrier to protect the animal tissues from microbial invasion. On the other hand, microbes that reside at epithelial surfaces play important roles for their animal host.1 Most notable is the mutualistic relationship between an animal host and its gut microbiome. The latter breaks down components of the diet to provide important metabolites to the host. A host must therefore have specialized defenses at the epithelial surfaces to distinguish between microorganisms that could become pathogenic, commensal, or symbiotic.2-4 To make such decisions, the immune system can exploit the carbohydrates that coat all cells on earth.5, 6 In particular, lectins, which are non-antibody carbohydrate-binding proteins, can distinguish glycan residues, and many lectins play important innate immune roles in the host.7-9

An under-studied class of lectins, the X-type lectins, was identified in chordates and proposed to function in innate immunity.10 Since the first X-type lectin was discovered in Xenopus laevis oocytes in 1982, homologous proteins have been identified in chordates from tunicates to humans.10, 11 This family of lectins has also been termed the intelectins, for the discovery of an X-type lectin in the mouse intestine (intestinal lectin).12 The expression of the X-type lectins in skin, the intestinal mucosa, and serum suggests a role in innate defense against microorganisms. Indeed, some of these lectins can bind glycans presented on the surface of microorganisms, though the exact biological function of intelectins in many species remains elusive.

The first human intelectin (hItln-1) was discovered in 2001.13-15 In the 20 years since, numerous structural, biochemical, and biological insights have added to our understanding of the range of functional roles played by the X-type lectins in chordates. In the present review, I first explore the evolutionary conservation of the X-type lectins. I then cover the structural features of X-type lectins, comparing the structures from various species. Finally, I review the literature on X-type lectins, with a focus on Xenopus laevis and the mammalian X-type lectins: Xenopus for the historical perspective, and the mammalian lectins because they are most pertinent to the results described in this thesis. I highlight glycan recognition and putative biological roles of these proteins as well as future directions for their study. Although I do not focus on them here for clarity, the invertebrate and fish X-type lectins should not be overlooked.

1.3 X-Type lectins in chordates

X-Type lectins are found from the earliest chordates, amphioxus and tunicates, to humans, with few exceptions. Homologous protein sequences are also present in the marine phyla Placozoa and Cnidaria. A recent evolutionary analysis placed Trichoplax adhaerens, a placozoan and one of the simplest animals, as the likely origin of intelectins.11 These analyses suggest the X-type lectins have been present throughout the evolution of animal species.

There is an apparent correlation between the number of intelectin genes and the evolutionary age of an organism. For example, amphioxus species have as many as 12 intelectin genes and the tunicate Ciona intestinalis has 21 intelectins.16 Many marine-dwelling vertebrates also have numerous intelectins. Xenopus laevis has eight known intelectins, which are described in this review, while zebrafish, Danio rerio, have seven intelectins.17 Land mammals have fewer intelectins. Laboratory strains of mice have one to six intelectins, depending on their numbers of duplications of the Itln locus,18 sheep have three intelectins, and humans have two. This pattern suggests that intelectins play an important role in innate immunity, where more copies were advantageous in animals without robust adaptive immunity strategies. Another potential explanation is that marine animals face additional challenges in protection from environmental microbes at their surface. Fish and frog intelectins are expressed in the intestine as well as in reproductive tissue, embryos, skin slime and gill tissue, suggesting that these intelectins could play important roles in protecting the animal from microbes in their environment. Indeed, many of the lectins expressed by marine animals can agglutinate bacteria, further suggesting a role in protection from microbes.19-24

Interestingly, animals of the order Carnivora lack intelectins. The only exception is the giant panda, which has an intelectin-1-like partial sequence identified by the Basic Local Alignment Search Tool25 in a search for sequences similar to hItln-1. The sequence aligns to positions 128-306 of hItln-1 and contains the calcium binding and aromatic residues corresponding to the hItln-1 ligand binding pocket. Though the giant panda is a member of the order Carnivora, it is actually an herbivore that consumes almost entirely bamboo. The presence of intelectin in plant-consuming organisms suggests a different microbial requirement for the host to access dietary nutrients compared to predatory carnivores.26 Birds and bats also lack intelectins. Because of the requirements for flight, birds and bats have very short digestive tracts and less reliance on a microbiome.27 The tinamou, emu, and kiwi are exceptions that have intelectin genes. These large, ground-dwelling birds from the clade Palaeognathae have omnivorous diets and do not fly, suggesting a different relationship with their microbiomes compared to flying birds. Taken together, these insights suggest that intelectins have a role in regulating the microbiome.

1.4 X-Type lectin structure

All X-type lectins characterized to date require calcium ions for carbohydrate binding; however, they lack sequence similarity to the calcium-dependent C-type lectins.28, 29 In the N-terminal region of their carbohydrate recognition domain, many X-type lectins contain a fibrinogen-like (FBG) domain consisting of about 45 amino acids (residues 37-82 in hItln-1, highlighted in Figure 1-1A, B).30 Another class of lectins, the ficolins, also contain an FBG domain and are thought to be the most similar to the intelectins. However, the FBG domain is the only conserved motif between the two classes. Moreover, the ficolins and the X-type lectins have distinct structures. Our group has solved crystal structures for hItln-1 and Xenopus embryonic epidermal lectin (XEEL), and analysis of these structures revealed that the X-type lectins are, indeed, a novel lectin class displaying a unique fold.30, 31

The 1.8-Å resolution structure of apo-hItln-1 determined by X-ray crystallography (PDB 4WMQ) revealed that hItln-1 is a disulfide-linked homotrimer (Figure 1-1A).31 Each monomer has a globular structure consisting of two highly twisted β-sheet structures surrounded by seven short α-helices. The structure also contains many random coil regions that are mainly located at the interface between monomers. Each monomer has three divalent calcium ions: two structural calcium ions are buried, and the third is solvent accessible and sits in the carbohydrate binding pocket. In addition to the intermolecular disulfide between residues C31 and C48, each monomer has four intramolecular disulfide bonds (Figure 1-1B). The cysteines that form the intermolecular disulfide between hItln-1 monomers are not conserved among all intelectins. XEEL, for example, lacks these cysteines in the carbohydrate recognition domain (CRD). However, the XEEL CRD does form trimers in solution, and additionally crystallized as a trimer.30 Mouse intelectin-1 (mItln-1) also lacks the analogous cysteines but similarly forms trimers in solution (unpublished data).

Figure 1-1. X-type lectin structures. (A) X-ray crystal structure of hItln-1 bound to allyl-β-D-Galf (β-Galf, PDB 4WMY). Each monomer is depicted in white with the fibrinogen-like (FBG) domain highlighted in teal; intermolecular disulfides in yellow spheres; intramolecular disulfides in yellow sticks. (B) Depiction of sequence features of hItln-1 showing the signal peptide, N-linked glycosylation site, intramolecular disulfides, cysteines that form intermolecular disulfides highlighted in yellow, fibrinogen domain highlighted in teal, carbohydrate binding pocket highlighted in gray. Amino acid positions are above sequence features. (C) Predicted hexameric structure of XEEL bound to glycerol 1-phosphate (Gro-1-P) adapted from Wangkanont et al.30 The hexamer is a dimer of trimers, with each trimer shown in white or light blue (CRD structure is derived from PDB 4WN0). The predicted N-terminal trimerization domain is shown as a helical bundle from PDB 2SIV. Two of the six predicted Cys-24–Cys-42 intermolecular disulfide bonds are depicted as yellow spheres. (D) Depiction of sequence features of XEEL, with the same feature representations as (B) and the additional helical domain highlighted in light blue.

The carbohydrate binding pockets of each monomer in hItln-1 and XEEL are oriented on one face of the trimer. This binding site arrangement allows the lectin to take advantage of multivalency when engaging carbohydrates displayed on a surface, such as a microbial cell glycocalyx.32 On the opposite face of the trimer, the FBG domains are exposed and are poised for potential protein–protein interactions (Figure 1-1A). In other proteins containing FBG domains, such as fibrinogen, tenascins, and angiopoietins, the FBG domain has been shown to mediate protein–protein interactions to promote tissue repair in response to injury and infection.33 The potential for the FBG domain of human intelectin to participate in protein–protein interactions suggests that it could play a role in signal transduction between the bacteria it recognizes and the host organism.

The XEEL CRD is structurally very similar to hItln-1. The protein, however, harbors an additional N-terminal domain that is not a conserved feature of the X-type lectin family (Figure 1-1C, D). This domain, consisting of residues 22-47, is predicted to be helical and to form an anti-parallel six-helix bundle with six intermolecular disulfide bonds between Cys-24 and Cys-42.30 The resulting full-length structure of XEEL is a disulfide-linked hexamer with a predicted barbell-like arrangement (Figure 1-1C, D). The barbell structure explains the ability of XEEL to agglutinate bacteria.30

The structures of hItln-1 and XEEL determined by X-ray crystallography have afforded an understanding of the X-type lectin carbohydrate recognition mode. In the complex of hItln-1 and allyl-β-D-galactofuranose (allyl-β-D-Galf, PDB 4WMY),31 the protein structure is not altered by the addition of the ligand, consistent with the lock-and-key binding model common among lectins.34 This structure also explained the binding specificity of hItln-1, as determined by glycan array, for saccharides containing exocyclic, vicinal diols. The exocyclic diol of allyl-β-D-Galf coordinates to the solvent-exposed divalent calcium ion and sits in an aromatic box formed by W288 and Y297 (Figure 1-2A). The aromatic box is a conserved feature of the binding pockets in mouse intelectin-1 (mItln-1) and XEEL, both of which have been shown to bind β-D-Galf.30, 31 In mItln-1, the Trp and Tyr residues of the aromatic box are conserved, while XEEL has Trp residues in both positions. Nevertheless, the crystal structure of XEEL bound to glycerol 1-phosphate (Gro-1-P) reveals remarkable structural similarity between the hItln-1 and XEEL binding sites (Figure 1-2).

Alignment of the amino acid sequences of intelectins from human, mouse, sheep, xenopus, zebrafish, catfish and lamprey using Clustal Omega35, 36 reveals a high conservation of calcium and ligand binding residues (Figure 1-3, Table 1-1). Analysis of the residues that directly complex the structural calcium ions and the calcium ion in the ligand binding site using ggseqlogo37 shows remarkable sequence identity across species. In contrast, the ligand binding residues are more varied (Figure 1-3B). In hItln-1, W288 and Y297 make an aromatic box (Figure 1-2A) that is preserved in many species (Figure 1-3). Notably, though, some X-type lectins including human intelectin-2 (hItln-2), mouse intelectin-2 (mItln-2) and Xenopus cortical granule lectins (XCGL and XCGL2) do not have the conserved aromatic box (Figure 1-2, Figure 1-3A), indicating divergent carbohydrate recognition profiles. While the binding specificities of hItln-2 and mItln-2 are not known, glycan array data suggest that XCGL binds Galα(1-3)GalNAc.30 In silico docking of Galα(1-3)GalNAc in the modeled XCGL binding pocket shows that, even without extensive optimization, the ligand fits remarkably well into the space created by the tryptophan to asparagine change in the binding pocket (Figure 1-2D). Additionally, the presence of a phenylalanine adjacent to the GalNAc C1 suggests that an extended saccharide could take advantage of stacking interactions with the aromatic ring.
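The per-position conservation summarized by a sequence logo can be illustrated with a minimal sketch. The code below is not the ggseqlogo implementation and omits any small-sample correction; it simply computes the information content (in bits) of each alignment column as log2(20) minus the Shannon entropy of the residues observed there. The function names and the four toy sequences are hypothetical and are not real intelectin sequences.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # A fully conserved column has zero entropy and the maximum information.
    return math.log2(alphabet_size) - entropy


def alignment_information(aligned_seqs):
    """Per-column information content for equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_information(col) for col in zip(*aligned_seqs)]


# Hypothetical toy alignment (NOT real intelectin sequences).
toy_alignment = ["WDEY", "WDNY", "WDES", "FD-Y"]
info = alignment_information(toy_alignment)
```

In this toy case the second column is invariant and therefore scores highest, which mirrors how strictly conserved calcium-coordinating positions would stand out relative to the more variable ligand-binding positions.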


Figure 1-2. Carbohydrate binding sites from select X-type lectins. (A) Carbohydrate binding site of hItln-1 bound to allyl-β-D-Galf (PDB 4WMY) and model of hItln-2 binding site. (B) Models of mItln-1 and mItln-2. (C) Carbohydrate binding site of XEEL bound to glycerol 1-phosphate (Gro-1-P, PDB 4WN0) and model of XCGL. (D) Modeled XCGL binding site complexed with Galα(1-3)GalNAc. Generated via in silico docking of Galα(1-3)GalNAc by aligning the calcium-coordinating hydroxyls with the bound Gro-1-P in XEEL (4WN0) to match coordination geometry. Side chains altered in the aromatic box are highlighted in cyan. Models were built using SWISS-MODEL. HItln-1 (PDB 4WMQ) was the template for hItln-2, mItln-1, and mItln-2. XEEL (PDB 4WMO) was the template for XCGL. All ligands are shown in grey; calcium ions in green; and ordered water molecules in the binding pocket in red.


Figure 1-3. Alignment of X-type lectin amino acid sequences. (A) Sequence alignment of human, mouse, sheep, xenopus, zebrafish, catfish and lamprey intelectins. Residues coordinating structural calcium (blue), ligand binding site calcium (green) and those interacting with the ligand (orange) are highlighted. Sequence length for each full sequence is shown on the right, and the position corresponding to hItln-1 for each highlighted residue is shown at the top. (B) Sequence logo showing amino acid conservation across the sequences shown in (A).

Table 1-1. GenBank accession codes for aligned sequences in Figure 1-3

GenBank accession code    Species                      Protein
BAD98810.1                Lethenteron camtschaticum    itlnb
X82626                    Xenopus laevis               XCGL
BF232570                  X. laevis                    XCGL2
AB061238                  X. laevis                    XCL-1
AB061238                  X. laevis                    XCL-2
NP_001085762              X. laevis                    XCL-3
BC087616                  X. laevis                    XEEL
BAL14267.1                Silurus asotus               itln-gill
BAL14266.1                S. asotus                    itln-skin/kidney
U583680                   Danio rerio                  zItln1
EU583682                  D. rerio                     zItln2
EU583681                  D. rerio                     zItln3
XP_027821289              Ovis aries                   sItln-1
XP_027821293              O. aries                     sItln-2
CAP09695                  O. aries                     sItln-3
AAU88049                  Mus musculus                 mItln-1
AAO60215                  M. musculus                  mItln-2
BC020664                  Homo sapiens                 hItln-1
AY358905                  H. sapiens                   hItln-2

The ability to produce recombinant intelectins to delineate their biological and biochemical properties is valuable. However, the structures of these proteins have revealed that their processing and modification are complex. Many intelectins possess a signal peptide that is cleaved upon secretion, numerous inter- and intramolecular disulfides, and N-linked glycosylation required for proper folding and solubility. Thus, intelectins are recalcitrant to recombinant expression in Escherichia coli. Moreover, we have observed that improperly folded intelectins can display non-specific but calcium-dependent carbohydrate binding.9 Therefore, assessment and verification of protein folding of recombinant intelectins is imperative.

1.5 Xenopus intelectins

The X-type lectin family is named for the discovery of the first proteins of this class in Xenopus laevis.10 In 1974, Jerry Hedrick and colleagues observed that contents of the X. laevis egg cortical granule caused calcium-dependent agglutination of the egg jelly coat to block polyspermy.38 This agglutination action was later ascribed to the presence of a D-galactopyranoside binding lectin found in the oocyte, embryo and the cortical granules of the egg.39-41 This lectin, named both XCGL (X. laevis cortical granule lectin) and XL-35 (X. laevis lectin 35 kDa), accounts for more than 70% of the protein in the X. laevis egg cortical granules.28, 41 To date, at least eight X-type lectins have been identified in X. laevis: the cortical granule lectins (XCGL and XCGL2), the serum lectins (XCL-1, XCL-2, and XCL-3), the embryonic epidermal lectin (XEEL), and most recently, the intestinal lectins (xIntl-3 and xIntl-4).

1.5.1 Cortical granule lectins

As mentioned previously, the cortical granule lectin was the first identified X-type lectin. Cortical granules are secretory organelles associated with preventing polyspermy, and the presence of a lectin at these sites is intriguing. XCGL was identified in the cortical granules and fertilization envelope by purification from oocytes using a melibiose affinity column.38, 42 The result was a glycosylated protein that could agglutinate rabbit erythrocytes in a calcium-dependent manner.39 Later, Quill and Hedrick determined that XCGL purified as oligomers of 10 to 12 glycosylated monomers. They then purified egg jelly to isolate large molecular weight mucin-like glycoproteins, and developed an enzyme-linked lectin assay (ELLA) to identify the native XCGL ligand. This assay revealed that α-galactosidases strongly inhibited lectin binding, indicating that XCGL binds α-galactosides of glycoproteins in the egg jelly.43 Upon release from the cortical granules at fertilization, the highly oligomeric XCGL engages in high avidity interactions with mucin glycans in the egg jelly. The result is a tightly crosslinked egg jelly that forms an impenetrable fertilization envelope.44 This fertilization envelope acts as a physical barrier to polyspermy, and may additionally protect the embryo from microorganisms present in the environment.

The second cortical granule protein, XCGL2, shares 87.5% sequence identity with XCGL. Both are expressed at the highest level in unfertilized eggs, and their expression decreases throughout embryogenesis.45 Sequence alignment of XCGL and XCGL2 shows that most variability occurs in the N-terminal domain, while identical amino acid residues are present in the ligand-binding site.30 Together, these data indicate that the two cortical granule lectins share the same ligands. Interestingly, XCGL and XCGL2 are the only Xenopus X-type lectins that differ from XEEL in the residues involved in carbohydrate binding. Where XEEL forms an aromatic box with W317 and W326, XCGL and XCGL2 have phenylalanine and asparagine residues, respectively. The difference in these two amino acid residues influences whether the lectins engage in self-carbohydrate epitope recognition, as with the cortical granule lectins, or microbial glycan recognition, as with XEEL and the other X-type lectins.30

1.5.2 Serum lectins

At least three X-type lectins have been identified in the serum of X. laevis: XCL-1, XCL-2, and XCL-3. In 1985, a serum lectin was identified by Roberson and colleagues, and shown to differ from XCGL in amino acid composition. Despite the sequence differences, the lectin retained galactoside-binding ability.46 Still, it is unclear whether this is one of the serum lectins later characterized and discussed herein. In 2007, Ishino and colleagues cloned the cDNA of a calcium-dependent lectin isolated from adult Xenopus serum (XCL-1). They then used primers targeting the regions with high conservation in other known X-type lectins to identify potential X-type lectins expressed during tail regeneration in X. laevis tadpoles. In this way, they identified Xenopus calcium-dependent serum lectin 2 (XCL-2), which shares 60% amino acid identity with XCL-1.47 Finally, XCL-3 was identified from a DNA database48 based on sequence similarity to XCL-1 and -2.49 XCL-1, -2, and -3 are differentially expressed in adult frog tissues.49 Alignment of the amino acid sequences of XCL-1, -2, and -3 with XEEL shows that XCL-1 and -2 have conserved residues at both the structural and carbohydrate-binding calcium ion coordination sites. XCL-3 has conserved residues at almost all sites; the only exception is V305, which corresponds to W317 in the ligand-binding site of XEEL (Figure 1-3).30 The consequence of this change is not clear but, taken with the additional non-conserved C-terminal residues in XCL-3, it likely indicates a different carbohydrate binding specificity. XCL-1, -2, and -3 differ most greatly from each other in their N-terminal sequences downstream of the signal peptide. These differences could affect the oligomerization states of the mature proteins.

Nagata and colleagues performed more in-depth analyses of XCL-1 using a monoclonal antibody specific to the lectin. They determined that expression of the gene encoding XCL-1 is induced in response to lipopolysaccharide (LPS) injection, and additionally observed calcium-dependent binding to Staphylococcus aureus and LPS, and to a lesser extent, to Escherichia coli. Binding of XCL-1 to bacteria was inhibited by the pentoses ribose and xylose.49 However, this binding may not be indicative of XCL-1 glycan specificity, as the carbohydrates used for this experiment had a free reducing end, allowing the carbohydrate to equilibrate between the ring-closed and linear forms in solution. Based on sequence similarity to XEEL, XCL-1 likely shares the conserved mechanism of recognition of glycans with exocyclic vicinal diol groups.30

1.5.3 Intestinal lectins

Two novel intelectins were recently identified in X. laevis intestine, xIntl-3 and xIntl-4.50 The proteins were purified by galactose-sepharose affinity and characterized by N-terminal amino acid sequencing followed by cDNA cloning. XIntl-3 is expressed most highly in the intestine, while xIntl-4 is most strongly expressed in liver, lung, and kidney. Immunohistochemistry showed xIntl-3 localized in the mucus granules of goblet cells in the intestine and rectum, and injection of LPS increased the xIntl-3 content throughout the intestine and rectum. XIntl-3 formed multimers with as many as 12 copies. Comparison of xIntl-3 and XEEL reveals highly conserved binding site residues, suggesting that xIntl-3 also functions in recognizing microbial glycans. Indeed, it can agglutinate E. coli in a calcium-dependent manner.50

1.5.4 Embryonic epidermal lectin

The Xenopus embryonic epidermal lectin (XEEL) was identified by Nagata and colleagues in 2003. Analysis of cDNA from X. laevis embryo led to the identification of a 342 amino acid protein containing a signal sequence and fibrinogen-like motif that localized to epidermal cells in the embryonic stage.51 The protein shares 62-70% identity with XCGL, mouse intelectin-1, and the human intelectins. Nagata later determined XEEL to be a disulfide-linked homohexamer with highest expression at the hatching stage of the embryo.52 Recently, Wangkanont and colleagues determined the structure of XEEL using X-ray crystallography, which was described in detail in section 1.4. This structure revealed a conserved binding site structure and ligand binding mode between XEEL and human Itln-1 for recognizing microbial glycans containing an exocyclic vicinal diol.30 Taken together, these studies suggest a role of XEEL in innate immunity for the hatching embryo. XEEL is secreted into the environmental water and can entrap microbes via agglutination, thereby regulating the microbes at the surface of the developing hatchling.

1.6 Mouse intelectins

The first mouse intelectin was identified by Komiya et al. in 1998 from a large scale in situ hybridization screen on intestinal tissue from BALB/c mice. Because the transcript showed homology to XCGL and appeared to be expressed in the intestine, it was named intelectin, for intestinal lectin.12 Since then, additional intelectins have been identified in mice and, importantly, the number of intelectins varies between common laboratory mouse strains. The aforementioned intelectin is mouse intelectin-1 (mItln-1, also called Itlna), which is present in all mouse strains. However, it is the only intelectin present in C57BL/6J mice.18, 53 Other common laboratory strains of mice have undergone expansion of the intelectin locus to contain up to six intelectin genes. 129S7 mice have six intelectins: mItln-1, -2, -5, and -6 are full-length proteins with highly conserved but not identical sequences, whereas mItln-3 and -4 contain early stop codons.18 The variation in the number of intelectin genes matters because mice are critical immunological models, and the potential function of intelectins in innate immunity may affect the interpretation of results obtained in different mouse strains.

MItln-1 and mItln-2 are the only mouse intelectins that have been further studied. MItln-1 is expressed solely in the Paneth cells of the mouse small intestine. The lectin’s binding pocket has residues identical to those of hItln-1,12, 18 and the lectin therefore likely has an identical glycan binding profile. However, mItln-1 lacks both the cysteine residues corresponding to those that form the intermolecular disulfide bonds in the hItln-1 homotrimer and the glycosylation site (N163) present in hItln-1.54 We hypothesize that mItln-1 is a non-covalent homotrimer, due to the observation that the XEEL CRD is a non-covalent homotrimer in solution (described above, Figure 1-1C),30 as well as the observation that mItln-1 binds immobilized galactofuranose with a similar affinity to hItln-1,31, 55 suggesting that it is in a multimeric form that takes advantage of multivalent binding to its ligands. MItln-1 expression in small intestinal Paneth cells was shown to increase 3.5-fold in response to microbial colonization of germ-free NMRI mice,56 suggesting that mItln-1 likely serves an innate immune function at the mucosal surface of the small intestine. In contrast to mItln-1, mItln-2 is a goblet cell product expressed in the mouse lung and intestine.18, 53 The binding site of mItln-2 is distinct from those of both hItln-1 and hItln-2 (Figure 1-2B). Where hItln-1 has the aromatic box consisting of W288 and Y297, mItln-2 has an alanine and tyrosine at the analogous positions. In hItln-2, the analogous positions are tryptophan and serine, respectively. Thus, human and mouse Itln-2 are likely to display different glycan binding specificities, and the mouse Itln-2 should not be viewed as a model for the human Itln-2. To date, there are no known ligands of mItln-2, but it is upregulated in response to parasitic nematode infection.53, 57-59 An intriguing possibility is that the lectin could recognize glycans specific to nematodes.

Studies of mouse intelectins show that the intelectin proteins are upregulated by the T helper type 2 (Th2) innate immune response to infection by pathogenic organisms.58 The Th2 response is characterized by the cytokines interleukin (IL)-4 and IL-13 and is the primary response that drives allergic asthma and parasitic helminth expulsion.60 IL-4 and IL-13 activate phosphorylation of signal transducer and activator of transcription 6 (STAT6), a transcription factor that activates genes involved in humoral immunity.60-62 By infecting BALB/c mice (resistant to nematode infection) and AKR mice (susceptible to infection, lack the Th2 response) with the intestinal nematode Trichuris muris, Datta et al. showed that mItln-1 expression is upregulated after challenge only in BALB/c mice. A later study showed mItln-1 and mItln-2 upregulation in response to Nippostrongylus brasiliensis in a STAT6-dependent manner.59 However, when both mouse intelectin genes were constitutively expressed in the lung, there was no sign of enhanced clearance of N. brasiliensis,59 indicating that the lectins themselves do not directly clear the parasite. Rather, they are either expressed alongside other STAT6-dependent genes that are responsible for clearance, or they are part of a clearance pathway whose other components must also be expressed in a STAT6-dependent manner. While the aforementioned studies implicate both mItln-1 and -2 in helminth clearance, another study pointed toward mItln-2 as the critical player. Pemberton et al. showed that when infected with Trichinella spiralis, strains that have mItln-2 (129/SvEv and BALB/c) can expel the worms, while strains that do not have mItln-2 (C57BL/6 and C57BL/10) display delayed worm expulsion.53

The mouse intelectins have also been implicated in the Th2 allergic asthma response. When FVB/NCrl mice were challenged with ovalbumin (OVA) to initiate an allergic airway response, the expression of both mItln-1 and mItln-2 genes in airway mucus cells was significantly increased. In addition, upregulation of the production of monocyte chemotactic proteins (MCP)-1 and -3 appears to depend on mItln expression. The aforementioned proteins increased with the intelectins upon OVA challenge, but when mouse lung epithelial cells were treated with mItln shRNA, MCP-1 and -3 were not upregulated in response to IL-13.63 Other genes expressed in asthma in mouse in response to IL-13 are also likely involved in glycan-protein interactions, including trefoil factor (TFF) 1, TFF2, and the mucins Muc5AC and Muc5B.64, 65

1.7 Sheep intelectins

Sheep have three intelectins with expression primarily at the mucosa of the airway, lung and gut. These intelectins, sItln-1, sItln-2, and sItln-3, were named by the order in which they were identified. The naming of sheep intelectins has no relation to their homology to other mammalian intelectins. A sequence comparison of sheep and human intelectins indicates that sItln-1 and sItln-2 have 80.4% and 86.6% identity to hItln-1, respectively. Both have identical residues corresponding to the W288 and Y297 in the hItln-1 carbohydrate binding pocket (Figure 1-3B). This analysis suggests that both sItln-1 and sItln-2 recognize glycans similar to those recognized by hItln-1, but sItln-2 is the most similar to hItln-1. Sheep Itln-3 has a serine in place of the Y297 of hItln-1; therefore, sItln-3 maps to the hItln-2 binding site residues. Alignment of the two sequences shows that sItln-3 and hItln-2 are 82.4% identical (Figure 1-3A). The glycan specificities of both sItln-3 and hItln-2 are unknown.
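Percent identity values such as those quoted above depend on how the aligned pair is scored. The sketch below shows one common convention, assumed here purely for illustration: identical residues divided by the number of aligned positions at which neither sequence has a gap. The convention actually used for the values in this chapter is not stated, and the function name and the two toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped positions (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical toy pair (NOT real intelectin sequences): 6 ungapped
# positions are compared and 5 of them are identical.
pid = percent_identity("MKW-DYS", "MKWADFS")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values computed with different tools are not always directly comparable.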

In uninfected sheep, the three sheep intelectins display distinct tissue distribution. SItln-1 is expressed in the lung, abomasum, colon, gastric lymph node and terminal rectum. SItln-2 is strongly expressed in the abomasum (a ruminant stomach component), but not detected in other tissues. SItln-3 is expressed widely in the digestive tract, including the abomasum, jejunum, colon, rectum, and ileal Peyer’s patches.66 By immunohistochemistry, the intelectins appear to be expressed and secreted with the mucus. They are localized to goblet cells in the lung, mucus neck cells in the abomasum, mucus cells in the colon, gastric mucus, and free mucus in the ileum.66-68 Indeed, sItln-2 has been shown to interact with the mucin Muc5AC from gastric mucus of sheep.69 Purified sheep gastric mucus separated by SDS-PAGE showed a high molecular weight band that, when analyzed by mass spectrometry, contained both Muc5AC and sItln-2. The purified mucin required both SDS and reducing agent, along with boiling, to dissociate sItln-2 from the Muc5AC,69 indicating a reasonably high affinity association between the two molecules. SItln-2 shares binding site residues with hItln-1, and hItln-1 does not appear to bind mammalian glycans;31 consequently, it seems unlikely that sItln-2 binds the Muc5AC O-glycans.

The focus of studies of the sheep intelectins has been the Th2 immune response to asthma, allergens, or parasitic infection. Because of the prevalence of nematode infection in livestock, a major emphasis has been on parasitic nematode infection. French and colleagues identified an intelectin in the sheep airway goblet cells that is upregulated by the Th2 cytokine IL-4.67 This finding follows on the observations that mouse strains either susceptible or resistant to nematode infection have differential expression of intelectins,53 and humans with Th2 asthmatic responses have a high level of intelectin expression.64 Shortly thereafter, the second intelectin, sItln-2, was identified in sheep abomasum and observed to be upregulated in response to infection with the nematode Teladorsagia circumcincta. SItln-2 localizes to the mucus neck cells and the gastric mucus of the abomasum, and was upregulated along with galectin-14, IL-4, and sheep mast cell protease (sMCP-1) in response to T. circumcincta infection.68 Proteomic analysis of the gastric epithelia of sheep capable of preventing the establishment of the parasite (considered immune sheep) showed that sItln-2 levels are higher than those of naïve sheep.70 When the parasite was reintroduced to previously infected sheep, they upregulated sItln-2 more rapidly than naïve sheep with initial infection.68 SItln-1 is also upregulated in the abomasum in response to T. circumcincta infection.66 Upon infection with another parasitic worm, Haemonchus contortus, sItln-2 was the most highly upregulated gene in the early stages of infection, followed by trefoil factor 3 (TFF3).71 The parallels in the expression of intelectins and trefoil factors are interesting. The trefoil factor family of proteins was recently shown to have lectin activity with binding specificity for mucin glycans.72 Members of this lectin class are known to play a role in protection of the mucosal epithelium by altering the rheological properties of mucus.73, 74 Finally, infection of the lung with the parasitic worm Dictyocaulus filaria caused upregulation of sItln-2 and sItln-3, which do not normally show expression in the lung.66

The size of parasitic nematodes explains why they cannot be phagocytosed. Rather, via the Th2 immune response, tissue repair pathways are activated to strengthen the epithelial barrier and expel the nematodes. Expulsion is achieved through multiple mechanisms including epithelial hyperproliferation and increased mucus production.75 It is unclear whether the observed increase in sItln expression in response to nematode infection is simply a byproduct of the expansion of goblet cells and increased mucus secretion or if sItlns themselves play a direct role in the defense against nematodes. However, the data implicate intelectins in the nematode response. Bolstering these studies in sheep is the finding that mice lacking mItln-2 are susceptible to parasitic infection, while those expressing mItln-2 are resistant.53 The molecular mechanism by which intelectins participate in nematode resistance is unknown. New tools such as expanded glycan microarrays to identify the carbohydrate ligands of the various intelectins and intelectin knockout model species will be crucial to understanding the role of these proteins in response to nematode infection.

1.8 Human intelectins

Humans have two intelectins, hItln-1 and hItln-2, which share 83% identity and differ in their binding pocket residues (Figures 1-2A, 1-3) and N-terminal regions. Three separate reports identified the human intelectins in 2001.13-15 Using cDNA libraries from small intestine, two homologs of XL-35 were identified and termed HL-1 and HL-2.13 In tandem, hItln-1 was identified from both placental tissue and fetal intestine tissue.14, 15 Analysis of mRNA levels in fetal tissues showed that hItln-1 is expressed very highly in the intestine.14 A proteomic analysis of a preterm infant revealed that hItln-1 was one of the 20 most abundant proteins present in stool throughout the first month of life.76 Expression patterns in adult tissues differ, with the highest mRNA expression observed in heart and lower expression observed in the small intestine, colon, thymus, salivary gland, ovary, testis and pancreas.13-15 HItln-2, in contrast, was only expressed in small intestine.13 Single cell RNA sequencing data from human intestinal biopsy tissue shows that hItln-1 is a goblet cell product, while hItln-2 is a Paneth cell product.77, 78 Similarly, in the adult lung, hItln-1 was shown to localize to goblet cells by immunohistochemistry.79 The human proteins omentin and lactoferrin receptor (LfR) are identical to hItln-1, and research has implicated this protein, under whichever name, in innate immunity, metabolic disorders, cancers, asthma and inflammatory bowel disease (IBD). In contrast, we could not find studies that focus on hItln-2.

1.8.1 Glycan recognition

Glycan recognition by hItln-1 has been extensively characterized. While early studies of hItln-1 suggested it to be a furanose binding lectin,15 further characterization revealed that it does not have specificity for furanose sugars. The lectin specificity rests in its engagement of glycans with an exocyclic vicinal diol moiety found in the microbial monosaccharides β-D-Galf, D-glycero-D-talo-oct-2-ulosonic acid (KO), 3-deoxy-D-manno-oct-2-ulosonic acid (KDO), and Gro-1-P.31, 80 A likely explanation for the confusion around glycan specificity is that the initial studies used carbohydrates that had a free anomeric hydroxyl (free reducing end). As a result, the carbohydrates tested could adopt either the linear or closed ring forms in solution. The linear form of the sugars would display diols that could be bound by hItln-1.

Although the heptose sugars contain an exocyclic vicinal diol, they are not ligands of hItln-1. Their inability to bind effectively to the lectin has been attributed to their prevalent solution conformations, stabilized by stereoelectronic effects, which are sterically incompatible with the hItln-1 binding site.80 The prevalent mammalian saccharide, sialic acid, is also not a ligand of hItln-1. In silico docking studies suggest that it too is sterically occluded from the binding pocket.31 HItln-1 therefore has evolved to have highly specific glycan recognition, despite a seemingly simple minimal binding epitope. Specifically, the exclusion of the mammalian building block sialic acid and of L,D-heptose, which is the most abundant bacterial monosaccharide,81 suggests that hItln-1 has evolved to recognize distinct bacterial species.

1.8.2 Human omentin

An adipokine identified in human omental fat was named omentin82 and then later determined to be identical in sequence to human Itln-1.83 Omentin, which I will refer to as hItln-1 for consistency, is detected in human serum and, while the levels of serum hItln-1 are highly variable, it has been measured as high as 800 ng/mL in healthy human sera.84 Serum omentin concentration has been observed to correlate inversely with body mass index, waist circumference, leptin levels, and insulin resistance;85 thus, hItln-1 is decreased in many metabolic and inflammatory conditions. Lower levels are also observed in many diseases including type 2 diabetes86-88 and polycystic ovary syndrome.89 Patients with IBD, including Crohn’s disease (CD) and ulcerative colitis (UC), have decreased omentin expression in omental tissue83 as well as decreased serum omentin levels.90 Additional conditions and diseases with altered serum hItln-1 levels are compiled in reference 84.84

Studies of omentin in humans and mice have pointed toward a potential anti-inflammatory role. In vitro studies of smooth muscle cells (SMCs) and human umbilical vein endothelial cells (HUVECs) have shown that omentin inhibits tumor necrosis factor alpha (TNF-α) activation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB), and the resultant expression of downstream effector molecules.91-93 Additionally, when human monocytes were differentiated to macrophages in the presence of recombinant omentin, the resulting macrophages were the anti-inflammatory M2-type macrophages, as indicated by expression of PPAR-γ.94 Recombinant omentin was also able to attenuate toll-like receptor (TLR)-4 stimulation by LPS and downstream activation of NF-κB in human macrophages from the U937 cell line.95 It is unclear from this study whether the effect of omentin-1 is due to omentin interaction with a receptor, or due to omentin-1 sequestering LPS, as core LPS contains the ligands of hItln-1, KO and KDO. Nevertheless, an in vivo study in mice investigating the effect of omentin in inflammatory bone disease showed that, in an omentin knockout mouse model, bone tissue has increased expression of TNF-α and the inflammatory cytokines IL-1α, IL-1β, and IL-6. Treatment with an adenovirus containing the mouse omentin gene rescued the inflammatory phenotype and increased the prevalence of M2 macrophages in bone marrow.96

While these studies provide consistent evidence of omentin’s anti-inflammatory role both in vitro and in vivo, it should be noted that all recombinant omentin used in the studies discussed here either came from suppliers that expressed the recombinant protein in E. coli or was of unspecified source. As discussed earlier, recombinant X-type lectins produced in E. coli lack N-glycosylation and would not be expected to be properly folded. Without a demonstration that the recombinant protein is functional, such studies should be interpreted cautiously. To further elucidate the anti-inflammatory mechanism of omentin, recombinant proteins produced in mammalian cells should be utilized, and additional follow-up studies in mouse models will be necessary.

1.8.3 Intelectins in diseases associated with microbial dysbiosis

Genome-wide association studies (GWAS) have linked the hItln-1 polymorphism, V109D, to asthma and CD.97, 98 Interestingly, an aspartic acid at this position is the most conserved residue across intelectins. In human Itln-1, valine is the predominant residue at position 109, suggesting that a valine at position 109 may confer some functional benefit. However, further studies to understand potential structural and biological consequences of the two variants will be required to gain insight into the contribution to disease states. With regard to hItln-1 production, high levels are observed in the mucus of the lung during allergic asthma,99 and single cell RNA sequencing of human intestinal tissue has shown hITLN1 expression to be upregulated in CD and UC.77, 78 Asthma and IBD are each associated with dysbiosis of the microbiota.100, 101 The observations provided here, along with the binding specificity of hItln-1 for microbial glycans, strongly suggest a role of hItln-1 in regulation of the mucosa-associated microbiota. To date, very few studies have examined the ability of hItln-1 to bind microbes,31, 55 and none have specifically examined binding to components of the human gut microbiota.

1.9 Conclusions

In this review, I have summarized the research to date into the Xenopus, sheep, mouse and human X-type lectins. Throughout chordates, the intelectins share high levels of sequence homology. Most of the variability between intelectin sequences occurs at the N-terminus, which governs oligomerization state, such as is observed for the XEEL extended N-terminus (Figure 1-1B).30 Additionally, there is variability in the residues of the carbohydrate recognition domain that confer glycan-binding specificity (Figure 1-2 and 1-3), suggesting that individual intelectins have evolved specificity for different glycan structures. While the glycan specificity of hItln-1 is well-defined, no glycan ligands have been identified for intelectins with varied binding site residues. Expanded glycan arrays will be crucial for determining the glycan specificities of intelectins and other soluble lectins. Missing from the current glycan arrays are representative glycans from fungi, helminths, and commensal microbes. The ability to interrogate glycans from these organisms will expand our understanding of protein–carbohydrate interactions relevant to mucosal innate immunity.

While intelectin sequences across species are highly related, intelectin expression is variable. A recurrent theme in mammals is intelectin expression at mucosal sites, including the lung and the gastrointestinal tract. At these sites, expression of the intelectins is upregulated upon infection or in disease states, strongly suggesting a role in innate immunity at mucosal surfaces.

The finding that intelectin production is upregulated by the Th2 response is intriguing. The Th2 response also results in goblet cell differentiation and increased mucin secretion, as well as expression of other proteins including trefoil factors and galectins, which have known roles in crosslinking mucins to protect the epithelial surface.72-74 Intelectins are produced by the mucus-secreting cells and have been shown to co-purify with the mucin Muc5Ac in sheep.77 While intelectin does not recognize mammalian glycans, it does have a fibrinogen-like domain that could participate in protein-protein interactions. Further, determining whether intelectins are able to alter mucus properties and whether their microbial binding abilities allow them to entrap or anchor microbes to the mucus layer will provide clues to intelectin biological function.

Finally, despite a proposed role in innate immunity, recognition of microbial glycans, and expression at mucosal surfaces, the ability of intelectins to interact with microbes resident to the human microbiome has not been explored. I address this gap with the experiments described in this thesis. Herein, I have further defined the glycan specificity of human intelectin-1, revealing that the heptoses are not ligands despite containing an exocyclic diol. This suggests a more selective microbial binding profile than previously thought, as L,D-heptose is the most abundant bacterial carbohydrate moiety.81 I then analyze the ability of hItln-1 to recognize gut commensal bacteria. To this end, I developed a new strategy, termed lectin-sequencing, as a method for identifying bacterial targets of lectins in native microbial communities such as the human gut microbiome. By lectin-sequencing, I identify bacterial species bound by hItln-1 from healthy human stool samples to reveal that hItln-1 recognizes many Gram-positive, butyrate-producing bacteria that are associated with a healthy microbiota. These findings suggest a role for hItln-1 in microbiome regulation that has not been proposed previously for lectins: that it may be a positive effector of microbial colonization.

1.10 Acknowledgements

I would like to thank Professor Laura L. Kiessling and all of LectinLand for thoughtful and stimulating conversations about intelectins, and for humoring me when I want to talk about panda intelectins, and Dr. Austin Kruger when I want to talk about coral intelectins. I would also like to thank Forrest FitzGerald, a rotation student who worked with me on the sequence alignments. Finally, I would like to thank Katherine Taylor, Melanie Halim, and Dr. Sayaka Masuko for feedback and edits in preparing this chapter.

1.11 References

1. Bäckhed, F.; Ley, R. E.; Sonnenburg, J. L.; Peterson, D. A.; Gordon, J. I., Host-Bacterial Mutualism in the Human Intestine. Science 2005, 307 (5717), 1915-1920.

2. Thaiss, C. A.; Zmora, N.; Levy, M.; Elinav, E., The microbiome and innate immunity. Nature 2016, 535 (7610), 65-74.

3. Hooper, L. V.; Gordon, J. I., Commensal Host-Bacterial Relationships in the Gut. Science 2001, 292 (5519), 1115-1118.

4. Hooper, L. V.; Littman, D. R.; Macpherson, A. J., Interactions Between the Microbiota and the Immune System. Science 2012, 336 (6086), 1268-1273.

5. Tytgat, H. L. P.; Lebeer, S., The Sweet Tooth of Bacteria: Common Themes in Bacterial Glycoconjugates. Microbiology and Molecular Biology Reviews 2014, 78 (3), 372.

6. Costerton, J. W.; Irvin, R. T.; Cheng, K. J., The bacterial glycocalyx in nature and disease. Annual Review of Microbiology 1981, 35, 299-324.

7. Lis, H.; Sharon, N., Lectins: Carbohydrate-Specific Proteins That Mediate Cellular Recognition. Chemical Reviews 1998, 98, 637-674.

8. Weis, W. I.; Drickamer, K., Structural Basis of Lectin-Carbohydrate Recognition. Annual Review of Biochemistry 1996, 65 (1), 441-473.

9. Wesener, D. A.; Dugan, A.; Kiessling, L. L., Recognition of microbial glycans by soluble human lectins. Current Opinion in Structural Biology 2017, 44, 168-178.

10. Lee, J.-K.; Baum, L. G.; Moremen, K. W.; Pierce, M., The X-Lectins: A new family with homology to the Xenopus laevis oocyte lectin XL-35. Glycoconjugate Journal 2004, (21), 443-450.

11. Chen, L.; Li, J.; Yang, G., A comparative review of intelectins. Scandinavian Journal of Immunology 2020, 92 (1), e12882.

12. Komiya, T.; Tanigawa, Y.; Hirohashi, S., Cloning of the novel gene intelectin, which is expressed in intestinal paneth cells in mice. Biochemical and Biophysical Research Communications 1998, 251 (3), 759-762.

13. Lee, J.-K.; Schnee, J.; Pang, M.; Wolfert, M.; Baum, L. G.; Moremen, K. W.; Pierce, M., Human homologs of the Xenopus oocyte cortical granule lectin XL35. Glycobiology 2001, 11 (1), 65-73.

14. Suzuki, Y. A.; Shin, K.; Lönnerdal, B., Molecular Cloning and Functional Expression of a Human Intestinal Lactoferrin Receptor. Biochemistry 2001, 40 (51), 15771-15779.

15. Tsuji, S.; Uehori, J.; Matsumoto, M.; Suzuki, Y.; Matsuhisa, A.; Toyoshima, K.; Seya, T., Human Intelectin Is a Novel Soluble Lectin That Recognizes Galactofuranose in Carbohydrate Chains of Bacterial Cell Wall. Journal of Biological Chemistry 2001, (276), 23456-23463.

16. Yan, J.; Xu, L.; Zhang, Y.; Zhang, C.; Zhang, C.; Zhao, F.; Feng, L., Comparative genomic and phylogenetic analyses of the intelectin gene family: implications for their origin and evolution. Developmental and Comparative Immunology 2013, 41 (2), 189-199.

17. Lin, B.; Cao, Z.; Su, P.; Zhang, H.; Li, M.; Lin, Y.; Zhao, D.; Shen, Y.; Jing, C.; Chen, S.; Xu, A., Characterization and comparative analyses of zebrafish intelectins: Highly conserved sequences, diversified structures and functions. Fish & Shellfish Immunology 2009, 26 (3), 396-405.

18. Lu, Z. H.; di Domenico, A.; Wright, S. H.; Knight, P. A.; Whitelaw, C. B. A.; Pemberton, A. D., Strain-specific copy number variation in the intelectin locus on the 129 mouse chromosome 1. BMC Genomics 2011, 12 (110).

19. Xue, Z.; Pang, Y.; Liu, X.; Zheng, Z.; Xiao, R.; Jin, M.; Han, Y.; Su, P.; Lv, L.; Wang, J.; Li, Q., First evidence of protein G-binding protein in the most primitive vertebrate: serum lectin from lamprey (Lampetra japonica). Developmental and Comparative Immunology 2013, 41 (4), 618-630.

20. Russell, S.; Young, K. M.; Smith, M.; Hayes, M. A.; Lumsden, J. S., Identification, cloning and tissue localization of a rainbow trout (Oncorhynchus mykiss) intelectin-like protein that binds bacteria and chitin. Fish & Shellfish Immunology 2008, 25 (1), 91-105.

21. Li, J.; Chen, Y.; Gu, W.; Xu, F.; Li, H.; Shan, S.; Sun, X.; Yin, M.; Yang, G.; Chen, L., Characterization of a common carp intelectin gene with bacterial binding and agglutination activity. Fish & Shellfish Immunology 2021, 108, 32-41.

22. Tsutsui, S.; Komatsu, Y.; Sugiura, T.; Araki, K.; Nakamura, O., A unique epidermal mucus lectin identified from catfish (Silurus asotus): first evidence of intelectin in fish skin slime. The Journal of Biochemistry 2011, 150 (5), 501-514.

23. Yan, J.; Wang, J.; Zhao, Y.; Zhang, J.; Bai, C.; Zhang, C.; Zhang, C.; Li, K.; Zhang, H.; Du, X.; Feng, L., Identification of an amphioxus intelectin homolog that preferably agglutinates gram-positive over gram-negative bacteria likely due to different binding capacity to LPS and PGN. Fish & Shellfish Immunology 2012, 33 (1), 11-20.

24. Xue, Z.; Pang, Y.; Liu, X.; Zheng, Z.; Xiao, R.; Jin, M.; Han, Y.; Su, P.; Lv, L.; Wang, J.; Li, Q., First evidence of protein G-binding protein in the most primitive vertebrate: Serum lectin from lamprey (Lampetra japonica). Developmental & Comparative Immunology 2013, 41 (4), 618-630.

25. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J., Basic local alignment search tool. Journal of Molecular Biology 1990, 215 (3), 403-410.

26. Jin, L.; Wu, D.; Li, C.; Zhang, A.; Xiong, Y.; Wei, R.; Zhang, G.; Yang, S.; Deng, W.; Li, T.; Li, B.; Pan, X.; Zhang, Z.; Huang, Y.; Zhang, H.; He, Y.; Zou, L., Bamboo nutrients and microbiome affect gut microbiome of giant panda. Symbiosis 2020, 80 (3), 293-304.

27. Song, S. J.; Sanders, J. G.; Delsuc, F.; Metcalf, J.; Amato, K.; Taylor, M. W.; Mazel, F.; Lutz, H. L.; Winker, K.; Graves, G. R.; Humphrey, G.; Gilbert, J. A.; Hackett, S. J.; White, K. P.; Skeen, H. R.; Kurtis, S. M.; Withrow, J.; Braile, T.; Miller, M.; McCracken, K. G.; Maley, J. M.; Ezenwa, V. O.; Williams, A.; Blanton, J. M.; McKenzie, V. J.; Knight, R., Comparative Analyses of Vertebrate Gut Microbiomes Reveal Convergence between Birds and Bats. mBio 2020, 11 (1), e02901-19.

28. Lee, J.-K.; Buckhaults, P.; Wilkes, C.; Teilhet, M.; King, M. L.; Moremen, K. W.; Pierce, M., Cloning and expression of a Xenopus laevis oocyte lectin and characterization of its mRNA levels during early development. Glycobiology 1997, 7 (3), 367-372.

29. Weis, W. I.; Taylor, M. E.; Drickamer, K., The C-type lectin superfamily in the immune system. Immunological Reviews 1998, 163, 19-34.

30. Wangkanont, K.; Wesener, D. A.; Vidani, J. A.; Kiessling, L. L.; Forest, K. T., Structures of Xenopus Embryonic Epidermal Lectin Reveal a Conserved Mechanism of Microbial Glycan Recognition. Journal of Biological Chemistry 2016, 291 (11), 5596-5610.

31. Wesener, D. A.; Wangkanont, K.; McBride, R.; Song, X.; Kraft, M. B.; Hodges, H. L.; Zarling, L. C.; Splain, R. A.; Smith, D. F.; Cummings, R. D.; Paulson, J. C.; Forest, K. T.; Kiessling, L. L., Recognition of microbial glycans by human intelectin-1. Nature Structural and Molecular Biology 2015, 22 (8), 603-610.

32. Kiessling, L. L.; Young, T.; Mortell, K. H., Multivalency in Protein-Carbohydrate Recognition. In Glycoscience: Chemistry and Chemical Biology I–III, Fraser-Reid, B. O.; Tatsuta, K.; Thiem, J., Eds. Springer Berlin Heidelberg: Berlin, Heidelberg, 2001; pp 1817-1861.

33. Zuliani-Alvarez, L.; Midwood, K. S., Fibrinogen-Related Proteins in Tissue Repair: How a Unique Domain with a Common Structure Controls Diverse Aspects of Wound Healing. Advances in Wound Care 2015, 4 (5), 273-285.

34. Imberty, A.; Prestegard, J. H., Structural Biology of Glycan Recognition. In Essentials of Glycobiology, 3 ed.; Varki, A.; Cummings, R. D.; Esko, J. D.; Stanley, P.; Hart, G. W.; Aebi, M.; Darvill, A. G.; Kinoshita, T.; Packer, N. H.; Prestegard, J. H.; Schnaar, R. L.; Seeberger, P. H., Eds. Cold Spring Harbor Laboratory Press: Cold Spring Harbor (NY), 2015; pp 387-400.

35. Goujon, M.; McWilliam, H.; Li, W.; Valentin, F.; Squizzato, S.; Paern, J.; Lopez, R., A new bioinformatics analysis tools framework at EMBL–EBI. Nucleic Acids Research 2010, 38 (suppl_2), W695-W699.

36. Sievers, F.; Wilm, A.; Dineen, D.; Gibson, T. J.; Karplus, K.; Li, W.; Lopez, R.; McWilliam, H.; Remmert, M.; Söding, J.; Thompson, J. D.; Higgins, D. G., Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology 2011, 7 (1), 539.

37. Wagih, O., ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 2017, 33 (22), 3645-3647.

38. Wyrick, R. E.; Nishihara, T.; Hedrick, J. L., Agglutination of Jelly Coat and Cortical Granule Components and the Block to Polyspermy in the Amphibian Xenopus laevis. Proceedings of the National Academy of Sciences of the United States of America 1974, 71 (5), 2067-2071.

39. Roberson, M. M.; Barondes, S. H., Lectin from Embryos and Oocytes of Xenopus laevis: Purification and Properties. The Journal of Biological Chemistry 1982, 257 (13), 7520-7524.

40. Roberson, M. M.; Barondes, S. H., Xenopus laevis Lectin is Localized at Several Sites in Xenopus Oocytes, Eggs, and Embryos. The Journal of Cell Biology 1983, 97 (6), 1875-1881.

41. Nishihara, T.; Wyrick, R. E.; Working, P. K.; Chen, Y.; Hedrick, J. L., Isolation and Characterization of a Lectin from the Cortical Granules of Xenopus laevis Eggs. Biochemistry 1986, (25), 6013-6020.

42. Greve, L. C.; Hedrick, J. L., An Immunocytochemical Localization of the Cortical Granule Lectin in Fertilized and Unfertilized Eggs of Xenopus laevis. Gamete Research 1978, (1), 13-18.

43. Quill, T. A.; Hedrick, J. L., The fertilization layer mediated block to polyspermy in Xenopus laevis: isolation of the cortical granule lectin ligand. Archives of Biochemistry and Biophysics 1996, 333 (2), 326-332.

44. Arranz-Plaza, E.; Tracy, A. S.; Siriwardena, A.; Pierce, J. M.; Boons, G.-J., High-Avidity, Low-Affinity Multivalent Interactions and the Block to Polyspermy in Xenopus laevis. Journal of the American Chemical Society 2002, 124 (44), 13035-13046.

45. Shoji, H.; Ikenaka, K.; Nakakita, S.; Hayama, K.; Hirabayashi, H.; Arata, Y.; Kasai, K.; Nishi, N.; Nakamura, T., Xenopus galectin-VIIa binds N-glycans of members of the cortical granule lectin family (xCGL and xCGL2). Glycobiology 2005, 15 (7), 709-720.

46. Roberson, M. M.; Wolffe, A. P.; Tata, J. R.; Barondes, S. H., Galactoside-binding Serum Lectin of Xenopus laevis. The Journal of Biological Chemistry 1985, 260 (20), 11027-11032.

47. Ishino, T.; Kunieda, T.; Natori, S.; Sekimizu, K.; Kubo, T., Identification of novel members of the Xenopus Ca2+-dependent lectin family and analysis of their gene expression during tail regeneration and development. Journal of Biochemistry 2007, 141, 479-488.

48. Klein, S. L.; Strausberg, R. L.; Wagner, L.; Pontius, J.; Clifton, S. W.; Richardson, P., Genetic and genomic tools for Xenopus research: The NIH Xenopus initiative. Developmental Dynamics 2002, 225 (4), 384-391.

49. Nagata, S.; Nishiyama, S.; Ikazaki, Y., Bacterial lipopolysaccharides stimulate production of XCL1, a calcium-dependent lipopolysaccharide-binding serum lectin, in Xenopus laevis. Developmental and Comparative Immunology 2013, 40, 94-102.

50. Nagata, S., Identification and characterization of a novel intelectin in the digestive tract of Xenopus laevis. Developmental and Comparative Immunology 2016, 59, 229-239.

51. Nagata, S.; Nakanishi, M.; Nanba, R.; Fujita, N., Developmental expression of XEEL, a novel molecule of the Xenopus oocyte cortical granule lectin family. Development Genes and Evolution 2003, 213 (7), 368-370.

52. Nagata, S., Isolation, characterization, and extra-embryonic secretion of the Xenopus laevis embryonic epidermal lectin, XEEL. Glycobiology 2005, 15 (3), 281-290.

53. Pemberton, A. D.; Knight, P. A.; Gamble, J.; Colledge, W. H.; Lee, J. K.; Pierce, M.; Miller, H. R. P., Innate BALB/c enteric epithelial responses to Trichinella spiralis: inducible expression of a novel goblet cell lectin, intelectin-2, and its natural deletion in C57BL/10 mice. The Journal of Immunology 2004, 173 (3), 1894-1901.

54. Tsuji, S.; Yamashita, M.; Nishiyama, A.; Shinohara, T.; Zhongwei, U.; Myrvik, Q. N.; Hoffman, D. R.; Henriksen, R. A.; Shibata, Y., Differential structure and activity between human and mouse intelectin-1: Human intelectin-1 is a disulfide-linked trimer, whereas mouse homologue is a monomer. Glycobiology 2007, 17 (10), 1045-1051.

55. Tsuji, S.; Yamashita, M.; Hoffman, D. R.; Nishiyama, A.; Shinohara, T.; Ohtsu, T.; Shibata, Y., Capture of heat-killed Mycobacterium bovis bacillus Calmette-Guerin by intelectin-1 deposited on cell surfaces. Glycobiology 2009, 19 (5), 518-526.

56. Cash, H. L.; Whitham, C. V.; Behrendt, C. L.; Hooper, L. V., Symbiotic Bacteria Direct Expression of an Intestinal Bactericidal Lectin. Science 2006, 313 (5790), 1126-1130.

57. Pemberton, A. D.; Knight, P. A.; Wright, S. H.; Miller, H. R. P., Proteomic analysis of mouse jejunal epithelium and its response to infection with the intestinal nematode, Trichinella spiralis. Proteomics 2004, 4 (4), 1101-1108.

58. Datta, R.; deSchoolmeester, M. L.; Hedeler, C.; Paton, N. W.; Brass, A. M.; Else, K. J., Identification of novel genes in intestinal tissue that are regulated after infection with an intestinal nematode parasite. Infection and Immunity 2005, 73 (7), 4025-4033.

59. Voehringer, D.; Stanley, S. A.; Cox, J. S.; Completo, G. C.; Lowary, T. L.; Locksley, R. M., Nippostrongylus brasiliensis: Identification of intelectin-1 and -2 as Stat6-dependent genes expressed in lung and intestine during infection. Experimental Parasitology 2007, 116 (4), 458-466.

60. Koyasu, S.; Moro, K., Type 2 innate immune responses and the natural helper cell. Immunology 2011, 132 (4), 475-481.

61. Erle, D. J.; Sheppard, D., The cell biology of asthma. Journal of Cell Biology 2014, 205 (5), 621-631.

62. Kaplan, M. H.; Schindler, U.; Smiley, S. T.; Grusby, M. J., Stat6 Is Required for Mediating Responses to IL-4 and for the Development of Th2 Cells. Immunity 1996, 4 (3), 313-319.

63. B., G. N.; Kang, G. N.; Jin, C. E.; Xu, Y. J.; Zhang, Z. X.; Erle, D. J.; Zhen, G. H., Intelectin is required for IL-13-induced monocyte chemotactic protein-1 and-3 expression in lung epithelial cells and promotes allergic airway inflammation. American Journal of Physiology - Lung Cellular and Molecular Biology 2010, 298 (3), L290-L296.

64. Kuperman, D. A.; Lewis, C. C.; Woodruff, P. G.; Rodriguez, M. W.; Yang, Y. H.; Dolganov, G. M.; Fahy, J. V.; Erle, D. J., Dissecting asthma using focused transgenic modeling and functional genomics. Journal of Allergy and Clinical Immunology 2005, 116 (2), 305-311.

65. Zhen, G.; Park, S. W.; Nguyenvu, L. T.; Rodriguez, M. W.; Barbeau, R.; Paquet, A. C.; Erle, D. J., IL-13 and epidermal growth factor receptor have critical but distinct roles in epithelial cell mucin production. American Journal of Respiratory Cell and Molecular Biology 2007, 36 (2), 244-253.

66. French, A. T.; Knight, P. A.; Smith, W. D.; Pate, J. A.; Miller, H. R. P.; Pemberton, A., Expression of three intelectins in sheep and response to a Th2 environment. Veterinary Research 2009, 40 (6), 53.

67. French, A. T.; Bethune, J. A.; Knight, P. A.; McNeily, T. N.; Wattegedera, S.; Rhind, S.; Miller, H. R. P.; Pemberton, A. D., The expression of intelectin in sheep goblet cells and upregulation by interleukin-4. Veterinary Immunology and Immunopathology 2007, 123 (1-2), 41-46.

68. French, A. T.; Knight, P. A.; Smith, W. D.; Brown, J. K.; Craig, N. M.; Pate, J. A.; Miller, H. R. P.; Pemberton, A. D., Up-regulation of intelectin in sheep after infection with Teladorsagia circumcincta. International Journal for Parasitology 2007, 38 (3-4), 467-475.

69. Pemberton, A. D.; Verdon, B.; Inglis, N. F.; Pearson, J. P., Sheep Intelectin-2 co-purifies with the mucin Muc5ac from gastric mucus. Research in Veterinary Science 2011, 91 (3), E53-E57.

70. Athanasiadou, S.; Pemberton, A.; Jackson, F.; Inglis, N.; Miller, H. R. P.; Thevenod, F.; Mackellar, A.; Huntley, J. F., Proteomic approach to identify candidate effector molecules during the in vitro immune exclusion of infective Teladorsagia circumcincta in the abomasum of sheep. Veterinary Research 2008, 39 (6), 58.

71. Rowe, A.; Gondro, C.; Emery, D.; Sangster, N., Sequential microarray to identify timing of molecular responses to Haemonchus contortus infection in sheep. Veterinary Parasitology 2009, 161 (1-2), 76-87.

72. Järvå, M. A.; Lingford, J. P.; John, A.; Soler, N. M.; Scott, N. E.; Goddard-Borger, E. D., Trefoil factors share a lectin activity that defines their role in mucus. Nature Communications 2020, 11 (1), 2265.

73. Thim, L.; Madsen, F.; Poulsen, S. S., Effect of trefoil factors on the viscoelastic properties of mucus gels. European Journal of Clinical Investigation 2002, 32 (7), 519-527.

74. Bastholm, S. K.; Samson, M. H.; Becher, N.; Hansen, L. K.; Stubbe, P. R.; Chronakis, I. S.; Nexo, E.; Uldbjerg, N., Trefoil factor peptide 3 is positively correlated with the viscoelastic properties of the cervical mucus plug. Acta Obstetricia et Gynecologica Scandinavica 2017, 96 (1), 47-52.

75. Sorobetea, D.; Svensson-Frej, M.; Grencis, R., Immunity to gastrointestinal nematode . Mucosal Immunology 2018, 11 (2), 304-315

76. Young, J. C.; Pan, C.; Adams, R. M.; Brooks, B.; Banfield, J. F.; Morowitz, M. J.; Hettich, R. L., Metaproteomics reveals functional shifts in microbial and human proteins during a preterm infant gut colonization case. Proteomics 2015, 15 (20), 3463-3473.

77. Smillie, C. S.; Biton, M.; Ordovas-Montanes, J.; Sullivan, K. M.; Burgin, G.; Graham, D. B.; Herbst, R. H.; Rogel, N.; Slyper, M.; Waldman, J.; Sud, M.; Andrews, E.; Velonias, G.; Haber, A. L.; Jagadeesh, K.; Vickovic, S.; Yao, J.; Stevens, C.; Dionne, D.; Nguyen, L. T.; Villani, A.-C.; Hofree, M.; Creasey, E. A.; Huang, H.; Rozenblatt-Rosen, O.; Garber, J. J.; Khalili, H.; Desch, A. N.; Daly, M. J.; Ananthakrishnan, A. N.; Shalek, A. K.; Xavier, R. J.; Regev, A., Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell 2019, 178 (3), 714-730.e22.

78. Martin, J. C.; Chang, C.; Boschetti, G.; Ungaro, R.; Giri, M.; Grout, J. A.; Gettler, K.; Chuang, L.-s.; Nayar, S.; Greenstein, A. J.; Dubinsky, M.; Walker, L.; Leader, A.; Fine, J. S.; Whitehurst, C. E.; Mbow, M. L.; Kugathasan, S.; Denson, L. A.; Hyams, J. S.; Friedman, J. R.; Desai, P. T.; Ko, H. M.; Laface, I.; Akturk, G.; Schadt, E. E.; Salmon, H.; Gnjatic, S.; Rahman, A. H.; Merad, M.; Cho, J. H.; Kenigsberg, E., Single-Cell Analysis of Crohn’s Disease Lesions Identifies a Pathogenic Cellular Module Associated with Resistance to Anti-TNF Therapy. Cell 2019, 178 (6), 1493-1508.e20.

79. Kerr, S. C.; Carrington, S. D.; Oscarson, S.; Gallagher, M. E.; Solon, M.; Yuan, S. P.; Ahn, J. N.; Dougherty, R. H.; Finkbeiner, W. E.; Peters, M. C.; Fahy, J. V., Intelectin-1 Is a Prominent Protein Constituent of Pathologic Mucus Associated with Eosinophilic Airway Inflammation in Asthma. American Journal of Respiratory and Critical Care Medicine 2014, 189 (8), 1005-1007.

80. McMahon, C. M.; Isabella, C. R.; Windsor, I. W.; Kosma, P.; Raines, R. T.; Kiessling, L. L., Stereoelectronic Effects Impact Glycan Recognition. Journal of the American Chemical Society 2020, 142 (5), 2386-2395.

81. Herget, S.; Toukach, P. V.; Ranzinger, R.; Hull, W. E.; Knirel, Y. A.; von der Lieth, C.-W., Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): Characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans. BMC Structural Biology 2008, 8 (1), 35.

82. Yang, R.-Z.; Lee, M.-J.; Hu, H.; Pray, J.; Wu, H.-B.; Hansen, B. C.; Shuldiner, A. R.; Fried, S. K.; McLenithan, J. C.; Gong, D.-W., Identification of omentin as a novel depot-specific adipokine in human adipose tissue: possible role in modulating insulin action. American Journal of Physiology-Endocrinology and Metabolism 2006, 290 (6), E1253-E1261.

83. Schäffler, A.; Neumeier, M.; Herfarth, H.; Fürst, A.; Schölmerich, J.; Büchler, C., Genomic structure of human omentin, a new adipocytokine expressed in omental adipose tissue. Biochimica et Biophysica Acta 2005, 1732, 96-102.

84. Watanabe, T.; Watanabe-Kominato, K.; Takahashi, Y.; Kojima, M.; Watanabe, R., Adipose Tissue-Derived Omentin-1 Function and Regulation. In Comprehensive Physiology, Terjung, T., Ed. 2017; pp 765-781.

85. de Souza Batista, C. M.; Yang, R. Z.; Lee, M. J.; Glynn, N. M.; Yu, D. Z.; Pray, J.; Ndubuizu, K.; Patil, S.; Schwartz, A.; Kligman, M.; Fried, S. K.; Gong, D. W.; Shuldiner, A. R.; Pollin, T. I.; McLenithan, J. C., Omentin plasma levels and gene expression are decreased in obesity. Diabetes 2007, 56 (6), 1655-1661.

86. Pan, H.-Y.; Guo, L.; Li, Q., Changes of serum omentin-1 levels in normal subjects and in patients with impaired glucose regulation and with newly diagnosed and untreated type 2 diabetes. Diabetes Research and Clinical Practice 2010, 88 (1), 29-33.

87. Yoo, H. J.; Hwang, S. Y.; Hong, H. C.; Choi, H. Y.; Yang, S. J.; Seo, J. A.; Kim, S. G.; Kim, N. H.; Choi, K. M.; Choi, D. S.; Baik, S. H., Association of circulating omentin-1 level with arterial stiffness and carotid plaque in type 2 diabetes. Cardiovascular Diabetology 2011, 10 (1), 103.

88. El-Mesallamy, H. O.; El-Derany, M. O.; Hamdy, N. M., Serum omentin-1 and chemerin levels are interrelated in patients with Type 2 diabetes mellitus with or without ischaemic heart disease. Diabetic Medicine 2011, 28 (10), 1194-1200.

89. Tan, B. K.; Adya, R.; Farhatullah, S.; Lewandowski, K. C.; O’Hare, P.; Lehnert, H.; Randeva, H. S., Omentin-1, a Novel Adipokine, Is Decreased in Overweight Insulin-Resistant Women With Polycystic Ovary Syndrome. Diabetes 2008, 57 (4), 801-808.

90. Yin, J.; Hou, P.; Wu, Z.; Nie, Y., Decreased Levels of Serum Omentin-1 in Patients with Inflammatory Bowel Disease. Medical Science Monitor 2015, 21, 118-122.

91. Zhong, X.; Li, X.; Liu, F.; Tan, H.; Shang, D., Omentin inhibits TNF-α-induced expression of adhesion molecules in endothelial cells via ERK/NF-κB pathway. Biochemical and Biophysical Research Communications 2012, 425 (2), 401-406.

92. Yamawaki, H.; Kuramoto, J.; Kameshima, S.; Usui, T.; Okada, M.; Hara, Y., Omentin, a novel adipocytokine inhibits TNF-induced vascular inflammation in human endothelial cells. Biochemical and Biophysical Research Communications 2011, 408 (2), 339-343.

93. Kazama, K.; Usui, T.; Okada, M.; Hara, Y.; Yamawaki, H., Omentin plays an anti-inflammatory role through inhibition of TNF-alpha-induced superoxide production in vascular smooth muscle cells. European Journal of Pharmacology 2012, 686 (1-3), 116-123.

94. Watanabe, K.; Watanabe, R.; Konii, H.; Shirai, R.; Sato, K.; Matsuyama, T. A.; Ishibashi-Ueda, H.; Koba, S.; Kobayashi, Y.; Hirano, T.; Watanabe, T., Counteractive effects of omentin-1 against atherogenesis. Cardiovascular Research 2016, 110 (1), 118-128.

95. Wang, J. Z.; Gao, Y.; Lin, F.; Han, K.; Wang, X. Z., Omentin-1 attenuates lipopolysaccharide (LPS)-induced U937 macrophages activation by inhibiting the TLR4/MyD88/NF-κB signaling. Archives of Biochemistry and Biophysics 2020, 679, 7.

96. Rao, S.-S.; Hu, Y.; Xie, P.-L.; Cao, J.; Wang, Z.-X.; Liu, J.-H.; Yin, H.; Huang, J.; Tan, Y.-J.; Luo, J.; Luo, M.-J.; Tang, S.-Y.; Chen, T.-H.; Yuan, L.-Q.; Liao, E.-Y.; Xu, R.; Liu, Z.-Z.; Chen, C.-Y.; Xie, H., Omentin-1 prevents inflammation-induced osteoporosis by downregulating the pro-inflammatory cytokines. Bone Research 2018, 6 (1), 9.

97. Barrett, J. C.; Hansoul, S.; Nicolae, D. L.; Cho, J. H.; Duerr, R. H.; Rioux, J. D.; Brant, S. R.; Silverberg, M. S.; Taylor, K. D.; Barmada, M. M.; Bitton, A.; Dassopoulos, T.; Datta, L. W.; Green, T.; Griffiths, A. M.; Kistner, E. O.; Murtha, M. T.; Regueiro, M. D.; Rotter, J. I.; Schumm, L. P.; Steinhart, A. H.; Targan, S. R.; Xavier, R. J.; the NIDDK IBD Genetics Consortium; Libioulle, C.; Sandor, C.; Lathrop, M.; Belaiche, J.; Dewit, O.; Gut, I.; Heath, S.; Laukens, D.; Mni, M.; Rutgeerts, P.; Van Gossum, A.; Zelenika, D.; Franchimont, D.; Hugot, J.-P.; de Vos, M.; Vermeire, S.; Louis, E.; the Belgian-French IBD Consortium; the Wellcome Trust Case Control Consortium; Cardon, L. R.; Anderson, C. A.; Drummond, H.; Nimmo, E.; Ahmad, T.; Prescott, N. J.; Onnie, C. M.; Fisher, S. A.; Marchini, J.; Ghori, J.; Bumpstead, S.; Gwilliam, R.; Tremelling, M.; Deloukas, P.; Mansfield, J.; Jewell, D.; Satsangi, J.; Mathew, C. G.; Parkes, M.; Georges, M.; Daly, M. J., Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nature Genetics 2008, 40 (8), 955-962.

98. Pemberton, A. D.; Rose-Zerilli, M. J.; Holloway, J. W.; Gray, R. D.; Holgate, S. T., A single-nucleotide polymorphism in intelectin 1 is associated with increased asthma risk. Journal of Allergy and Clinical Immunology 2008, 122 (5), 1033-1034.

99. Wu, J.; Kobayashi, M.; Sousa, E. A.; Liu, W.; Cai, J.; Goldman, S. J.; Dorner, A. J.; Projan, S. J.; Kavuru, M. S.; Qiu, Y.; Thomassen, M. J., Differential Proteomic Analysis of Bronchoalveolar Lavage Fluid in Asthmatics following Segmental Antigen Challenge. Molecular & Cellular Proteomics 2005, 4 (9), 1251-1264.

100. Hufnagl, K.; Pali-Schöll, I.; Roth-Walter, F.; Jensen-Jarolim, E., Dysbiosis of the gut and lung microbiome has a role in asthma. Seminars in Immunopathology 2020, 42 (1), 75-93.

101. Stappenbeck, T. S.; Rioux, J. D.; Mizoguchi, A.; Saitoh, T.; Huett, A.; Darfeuille-Michaud, A.; Wileman, T.; Mizushima, N.; Carding, S.; Parkes, M.; Xavier, R. J., Crohn disease: A current perspective on genetics, autophagy and immunity. Autophagy 2011, 7 (4), 355-374.


Chapter 2

Stereoelectronic effects impact glycan recognition

Reproduced with permission from McMahon, C.M; Isabella, C.R; Windsor, I.W.; Kosma, P.; Raines, R.T.; and Kiessling, L.L. Stereoelectronic effects impact glycan recognition. Journal of the American Chemical Society. 2020. 142, 5, 2386-2395. https://doi.org/10.1021/jacs.9b11699. Copyright 2020 American Chemical Society.

Contributions: Protein expression, purification and crystallization performed by Christine R. Isabella. Biolayer interferometry and ELISA performed by Caitlin M. McMahon. Bioinformatic analysis of the Protein Data Bank performed by Christine R. Isabella and Ian W. Windsor. Computational analysis performed by Caitlin M. McMahon and Ian W. Windsor. Glycans synthesized and provided by the laboratory of Paul Kosma. Research designed by Christine R. Isabella, Caitlin M. McMahon, Ian W. Windsor, and Laura L. Kiessling.

2.1 Abstract

Recognition of distinct glycans is central to biology, and lectins mediate this function. Lectin glycan preferences are usually centered on specific monosaccharides. In contrast, human intelectin-1 (hItln-1, also known as Omentin-1) is a soluble lectin that binds a range of microbial sugars, including β-D-galactofuranose (β-Galf), D-glycerol 1-phosphate, D-glycero-D-talo-oct-2-ulosonic acid (KO), and 3-deoxy-D-manno-oct-2-ulosonic acid (KDO). Though these saccharides differ dramatically in structure, they share a common feature: an exocyclic vicinal diol. How and whether such a small fragment is sufficient for recognition was unclear. We tested several glycans with this epitope and found that L-glycero-α-D-manno-heptose (L,D-heptose) and D-glycero-α-D-manno-heptose (D,D-heptose) possess the critical diol motif yet bind weakly. To better understand hItln-1 recognition, we determined the structure of the hItln-1·KO complex using X-ray crystallography, and our 1.59-Å resolution structure enabled unambiguous assignment of the bound KO conformation. This carbohydrate conformation was present in >97% of the KDO/KO structures in the Protein Data Bank. Bioinformatic analysis revealed that KO and KDO adopt a common conformation, while heptoses prefer different conformers. The preferred conformers of KO and KDO favor hItln-1 engagement, but those of the heptoses do not. Natural bond orbital (NBO) calculations suggest these observed conformations, including the side chain orientations, are stabilized by not only steric but also stereoelectronic effects. Thus, our data highlight a role for stereoelectronic effects in dictating the specificity of glycan recognition by proteins. Finally, our finding that hItln-1 avoids binding prevalent glycans with a terminal 1,2-diol (e.g., N-acetylneuraminic acid and L-glycero-α-D-manno-heptose) suggests the lectin has evolved to recognize distinct bacterial species.

2.2 Introduction

Lectins can selectively target specific cell types by recognizing cell-surface glycans. In this way, lectins mediate diverse processes ranging from fertilization to pathogen clearance and the immune response.1-4 An understanding of the molecular basis for glycan recognition is critical for determining lectin selectivity and therefore lectin function. Lectin-carbohydrate specificity can be elucidated using a combination of glycan-array profiling, nuclear magnetic resonance (NMR) spectroscopy, carbohydrate modeling, computational studies, bioinformatics, and X-ray crystallography.5-9 Still, glycan recognition specificity is difficult to predict.

Hydrogen bonding with sugar hydroxyl groups, CH–π interactions with aromatic amino acid side chains, and coordination to calcium ions can contribute to carbohydrate binding (Figure 2-1a).5, 7, 10-13 Although carbohydrates are often considered to be flexible, influences such as steric repulsion, torsional strain, and stereoelectronic effects can give rise to conformational preferences.14, 15 Individual saccharides bind relatively weakly because the glycan binding sites of lectins are not deep clefts but rather solvent exposed. As a result, the low-energy saccharide conformations, most commonly discussed in terms of φ, ψ, and ω dihedral angles (Figure 2-1c),16 are those typically recognized and bound by proteins.17, 18

Most studies conducted to date have focused on the recognition of hexoses in the pyranose form, as these are common building blocks of mammalian glycans. Much less is known about how lectins recognize microbial saccharides, which include furanose sugars, the ulosonic acids (D-glycero-D-talo-oct-2-ulosonic acid (KO), 3-deoxy-D-manno-oct-2-ulosonic acid (KDO)), and the heptoses (L-glycero-α-D-manno-heptose and D-glycero-α-D-manno-heptose). Our studies of human intelectin-1 (hItln-1), a soluble lectin that specifically recognizes glycan residues found on microbial cells, prompted us to explore the conformations of these bacterial saccharides.

Figure 2-1. Factors contributing to lectin–carbohydrate binding and recognition. (a) The binding site of the complex of hItln-1 and allyl-β-Galf is depicted, with hydrogen bonding and metal-ion coordination to the sugar hydroxyl groups represented. Aromatic residues play crucial roles in many sugar–protein complexes. (b) Human Itln-1 is a trimer, which can engage in multivalent binding to glycan-displaying surfaces or proteins. (c) Carbohydrate conformations are characterized by three dihedral angles (φ, ψ, ω). Flexibility at these positions results in multiple conformational possibilities, influencing overall shape and recognition. Stereoelectronic effects influence the prevalence of distinct sugar conformations.

HItln-1 is expressed at mucosal barriers in the small intestine and the lung,19-21 and has been implicated in diseases associated with dysbiosis of the microbiome, including Crohn’s disease, ulcerative colitis, asthma, and diabetes.22-24 We found that hItln-1 is a trimeric protein that binds the microbial monosaccharide β-D-galactofuranose (β-Galf) through the Ca2+-coordination of its exocyclic vicinal diol (Figure 2-1a).25 This epitope proved to be a determining feature for binding, as a glycan array revealed hItln-1 preferentially bound to polysaccharides containing residues with an exocyclic vicinal diol (e.g., β-Galf, glycerol 1-phosphate) over those without it. Moreover, the diol-recognition motif is conserved across species as the frog intelectin, XEEL, interacts with glycerol 1-phosphate through a similar binding mode.26 Still, hItln-1 does not bind all sugars with this epitope, as sialic acid, which is a common mammalian sugar, possesses such a diol but is not a ligand for hItln-1.25 These data indicate there are other determinants of recognition.

We investigated the binding of hItln-1 to saccharide residues that possess a terminal 1,2-diol, including KO, KDO, L-glycero-α-D-manno-heptose (L,D-heptose), and D-glycero-α-D-manno-heptose (D,D-heptose).25 These sugars are present in bacterial lipopolysaccharide (LPS) structures that can stimulate innate immunity, and all feature an exocyclic vicinal diol that could facilitate recognition by hItln-1.27 Our data reveal that these saccharides have conformational preferences that maximize favorable stereoelectronic effects and, in turn, dictate their ability to bind lectins. These findings provide guidelines for predicting the bound conformation of bacterial sugars.28 Because small differences in monovalent protein–carbohydrate interactions are amplified through multivalency (Figure 2-1b),29, 30 the conformational differences we observed can make critical contributions to lectin recognition and specificity.

2.3 Results

2.3.1 hItln-1 binding to microbial monosaccharides

Our previous microbial glycan array profiling identified several putative monosaccharide ligands of hItln-1, including glycerol 1-phosphate, KO, KDO, and L,D-heptose. Each of these saccharides has an exocyclic vicinal diol, the required epitope for calcium ion coordination in the hItln-1 binding site, but whether all bind hItln-1 was unclear. Specifically, glycan hits containing heptose ligands also had KO and/or KDO residues. We addressed this issue with a competition assay in which soluble monosaccharides were assessed for their ability to compete with hItln-1 binding to immobilized β-Galf. Using biolayer interferometry (BLI), we determined the half maximal inhibitory concentrations (IC50 values) for β-Galf, glycerol 1-phosphate, KO, KDO, and L,D-heptose (Figure 2-2, Table 2-1). Allyl-KO was the most effective competitor, followed by allyl-KDO and D-glycerol 1-phosphate. L,D-Heptose was a weaker binder, exhibiting an IC50 value 80-fold higher than that of allyl-KO. The IC50 values were used to estimate the relative free energy (ΔΔG) of binding between the test ligands, which revealed a 2.6 kcal/mol decrease in binding energy for L,D-heptose versus KO (Table 2-1). Because the exocyclic vicinal diol is engaged in calcium coordination, we first compared its conformation. The stereochemistry at the C6-position of L,D-heptose differs from that of the corresponding side chain of KO and KDO (C7). We therefore evaluated hItln-1 binding to D,D-heptose, as the diol configuration matches that of KO and KDO. D,D-Heptose is a less naturally abundant microbial monosaccharide than L,D-heptose31 and was less frequently represented on the glycan array. Competition of D,D-heptose with hItln-1–β-Galf binding afforded an IC50 of 6.8 mM, indicating that D,D-heptose is a better ligand than L,D-heptose but not as effective as KO, KDO, or β-Galf.

Figure 2-2. Human intelectin-1 (hItln-1) binding to monosaccharides in a biolayer interferometry (BLI) competition assay. (a) Structures of monosaccharides identified as potential hItln-1 ligands from glycan microarray and evaluated in hItln-1 binding studies. (b) Schematic of BLI competition assay used to assess IC50 values of hItln-1 monosaccharide ligands. Biotinylated galactofuranose (β-Galf) was immobilized on streptavidin-coated biosensors and incubated with hItln-1 in the presence of varying concentrations of soluble monosaccharides. (c) Real-time BLI sensorgrams of competition by each monosaccharide shown in (a) for hItln-1 binding to immobilized biotinylated β-Galf. (d) Competition of soluble monosaccharide epitopes with immobilized β-Galf using BLI. Endpoint data (710–720 s) from BLItz sensorgrams were averaged and normalized for each competitor concentration and fitted to a one site logIC50 equation (solid lines). IC50 values were determined (Table 2-1). The allyl glycoside of each compound in the anomer shown in (a) was tested in the competition assay.
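The one-site logIC50 fit applied to the normalized BLI endpoint data can be sketched as follows. This is only an illustration under stated assumptions, not the fitting routine used in this work: the function names, the fixed top and bottom plateaus (1 and 0), the coarse grid search, and the synthetic data points are all hypothetical choices made for the example.

```python
import math

def one_site_response(log_conc, log_ic50, top=1.0, bottom=0.0):
    """One-site competition model: normalized signal vs. log10(competitor concentration)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (log_conc - log_ic50))

def fit_log_ic50(log_concs, responses, lo=-3.0, hi=3.0, steps=6001):
    """Least-squares estimate of logIC50 by grid search (plateaus fixed at 1 and 0)."""
    best_log_ic50, best_sse = lo, float("inf")
    for i in range(steps):
        candidate = lo + (hi - lo) * i / (steps - 1)
        sse = sum((one_site_response(x, candidate) - y) ** 2
                  for x, y in zip(log_concs, responses))
        if sse < best_sse:
            best_log_ic50, best_sse = candidate, sse
    return best_log_ic50

# Synthetic, noise-free points generated from the model with IC50 = 2 mM
# (illustrative only; these are not measured values).
true_log_ic50 = math.log10(2.0)
log_concs = [math.log10(c) for c in (0.1, 0.3, 1.0, 3.0, 10.0, 30.0)]
responses = [one_site_response(x, true_log_ic50) for x in log_concs]

fitted_ic50_mM = 10.0 ** fit_log_ic50(log_concs, responses)
print(f"fitted IC50 = {fitted_ic50_mM:.2f} mM")
```

With real sensorgram endpoints, the plateaus would normally be fitted as well, and a nonlinear least-squares routine would replace the grid search.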

Table 2-1. IC50 values of ligands and corresponding changes in free energy of binding compared to KO

| | KO | KDO | β-Galf | glycerol 1-phosphate | D,D-heptose | L,D-heptose |
|---|---|---|---|---|---|---|
| IC50 (mM)a | 0.7 (±0.3) | 1.5 (±0.4) | 1.8 (±0.1) | 2.1 (±0.4) | 6.8 (±2.3) | 53 (±29) |
| ∆∆G (kcal/mol)b | | | | | | |
| KO | − | − | − | − | − | − |
| KDO | 0.4 | − | − | − | − | − |
| β-Galf | 0.6 | 0.1 | − | − | − | − |
| glycerol 1-phosphate | 0.7 | 0.2 | 0.1 | − | − | − |
| D,D-heptose | 1.3 | 0.9 | 0.8 | 0.7 | − | − |
| L,D-heptose | 2.4 | 2.0 | 2.0 | 1.9 | 1.2 | − |

aData is shown as mean (±standard deviation) of two independent experiments; bThe relative free energy of binding is ∆∆G = 0.593 (kcal/mol) ln(IC50,row/IC50,column).
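The relation in footnote b can be evaluated directly. The sketch below applies ∆∆G = 0.593 ln(IC50,row/IC50,column) to the mean IC50 values listed in Table 2-1; the dictionary and function names are assumptions made for the example, and because the inputs are the rounded means, a computed value can differ slightly from the corresponding tabulated entry.

```python
import math

RT_KCAL_PER_MOL = 0.593  # prefactor given in the Table 2-1 footnote

# Mean IC50 values (mM) as listed in Table 2-1
IC50_MM = {
    "KO": 0.7,
    "KDO": 1.5,
    "beta-Galf": 1.8,
    "glycerol 1-phosphate": 2.1,
    "D,D-heptose": 6.8,
    "L,D-heptose": 53.0,
}

def ddg(row, column):
    """Relative free energy of binding (kcal/mol) of `row` versus `column`."""
    return RT_KCAL_PER_MOL * math.log(IC50_MM[row] / IC50_MM[column])

# Print each ligand relative to KO, the strongest competitor
for ligand in IC50_MM:
    print(f"{ligand:>22s} vs KO: {ddg(ligand, 'KO'):.2f} kcal/mol")
```

A positive value means the row ligand is the weaker competitor; swapping row and column changes only the sign.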

HItln-1 is a trimer; therefore, like many oligomeric lectins, hItln-1 can engage in multivalent interactions at the cell surface. Monovalent carbohydrate ligands tend to bind proteins with low affinities and free energies of binding (i.e., Kd values that are often in the millimolar range), and oligomeric lectins can take advantage of multivalency to achieve functional avidity.29, 32 Differences in monovalent binding affinity are amplified through multivalency to endow glycan-binding proteins with high specificity for cell-surface recognition. We therefore tested the specificity of hItln-1 in an assay that relies on multivalency. Specifically, we conjugated multiple copies of each saccharide tested to bovine serum albumin (BSA), and assessed its ability to bind hItln-1 in an enzyme-linked immunosorbent-like assay (ELISA, Figure 2-3a).33 We observed the same general trends in monosaccharide binding as in the monovalent assay (Figure 2-3a-b), but the selectivity in this assay was much higher. We detected no measurable binding to hItln-1 by ELISA for L,D- or D,D-heptose (Figure 2-3b).

Figure 2-3. Evaluation of BSA-conjugated sugars as ligands for hItln-1 using ELISA. (a) Schematic of ELISA-like assay used to test binding of hItln-1 to immobilized monosaccharide ligands. Bovine serum albumin (BSA)-conjugated sugars were coated onto a plate and incubated with various concentrations of hItln-1. Bound strep-tagged hItln-1 is detected with an anti-strep tag antibody conjugated to horseradish peroxidase (HRP), which reacts with a chromogenic HRP substrate. (b) Evaluation of BSA-conjugated heptose sugars as ligands for hItln-1 compared to β-Galf (positive control) and Neu5Ac (negative control). (c) Evaluation of BSA-conjugated KO, KDO, β-Galf, and L,D-heptose. Kd values for KO, KDO, and β-Galf are 5.1, 18.3, and 26.2 nM, respectively. Kd values could not be determined for L,D-heptose or D,D-heptose. Data in (b) and (c) are shown as mean ± SEM (n = 2 technical replicates) and fit to a single-site binding equation (solid lines). OD, optical density.

Our data indicate KO and KDO, along with β-Galf and glycerol 1-phosphate, are the most relevant monosaccharide ligands from the observed glycan array hits. They are bound by hItln-1 with higher affinity than the heptoses. The weaker inhibitory activities of L,D-heptose and D,D-heptose point to the importance of hydroxyl group stereochemistry. We postulated that the effect of stereochemistry arises from stereoelectronic effects that dictate side chain conformation and therefore hItln-1 binding.

2.3.2 Structure of hItln-1 bound to allyl-KO

To understand the molecular mechanisms underlying saccharide selectivity by hItln-1, we solved the structure of allyl-α-KO bound to hItln-1 by X-ray crystallography (Figure 2-4, Table 2-2). As we reported for Apo-hItln-1 (PDB entry 4wmq) and allyl-β-D-Galf-bound hItln-1 (4wmy), the asymmetric unit of the allyl-α-KO-bound hItln-1 structure contains two monomers.25 We did not observe sufficient density to model the N-terminal residues and the inter-chain disulfide bridge between residues Cys31 and Cys48; however, the crystal packing of the monomers is consistent with two unique trimers arranged by a crystallographic three-fold axis (Figure 2-4a). The binding pocket of the monomer in chain A is solvent exposed, whereas that of the monomer in chain B is oriented such that the bound KO contacts surface residues of the chain A monomer. Although the ligands are bound in nearly identical conformations (0.28 Å RMSD over 19 atoms), we focused on chain A for analysis of the bound ligand. As with β-Galf, hItln-1 binds allyl-α-KO via recognition of its exocyclic vicinal diol, with the O7 and O8 hydroxyl groups of allyl-α-KO coordinating to the calcium ion present in the hItln-1 binding site (Figure 2-4b). In comparison to β-Galf, allyl-α-KO engages in an additional hydrogen bond between the KO C1 carboxylate group and the indole nitrogen of the hItln-1 Trp288 residue (Figure 2-4c), which could contribute to the higher affinity observed for KO than β-Galf. From the IC50 values, the free energy of binding for KO relative to β-Galf was estimated as ΔΔG = –0.6 kcal/mol, which is on the scale of stabilization gained by a typical hydrogen bond.

Figure 2-4. Structure of hItln-1 bound to allyl-α-KO. (a) Complex of hItln-1 trimer and allyl-α-KO. The lectin monomers are depicted in green, wheat, or light blue; the allyl-α-KO in black; calcium ions in green; intra-monomer disulfides in yellow; and ordered water molecules in the binding site in red. The trimeric structure is produced from chain A in the asymmetric unit by a three-fold crystallographic operation. (b) The carbohydrate-binding site of hItln-1 with allyl-α-KO bound. Residues involved in calcium ion coordination and ligand binding are noted. Dashed lines show heptavalent coordination of the calcium ion. (c) Rotation of the binding pocket shows a hydrogen bond between KO and W288, depicted by a dashed line (rNH···O = 2.9 Å). Electron density of the final structures of KO bound to chain A and chain B, shown with the 2Fo − Fc map contoured at 1.0σ (d) and 3.0σ (e). Maps were prepared with phenix.refine and visualized with PyMOL.

The observed ligand density of allyl-α-KO allowed unambiguous assignment of the bound ligand conformation (Figure 2-4d). The pyranose ring puckering of KO can be described by the Cremer–Pople parameters θ, φ, and Q.34 In chain A of hItln-1, KO has θ, φ, and Q values of 9.5°, 264.6°, and 0.58 Å, respectively, indicating a near ideal ⁵C₂ (⁴C₁) chair conformation. The calcium-coordinating exocyclic vicinal diol is in the gauche conformation, with a dihedral angle of 51°. The torsion angle around the C6–C7 bond is trans–gauche (tg) and that around the C7–C8 bond is gauche–trans (gt). The observed conformation of the diol side chain allows the pyranose ring of KO to fit in the hItln-1 binding pocket and engage nearby polar and aromatic side chains through hydrogen bonding and CH–π interactions.
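For readers unfamiliar with Cremer–Pople analysis, the sketch below computes the total puckering amplitude Q and the polar angles θ and φ for a six-membered ring from Cartesian coordinates, following the standard Cremer–Pople definitions. It is an illustration only: the function name and the idealized test ring are assumptions made for the example, not the software or coordinates used in this work. A chair lies near θ = 0° or 180°; which pole is obtained depends on the order and direction in which the ring atoms are numbered.

```python
import math

def cremer_pople_six(ring):
    """Return (Q, theta_deg, phi_deg) for six ring atoms listed in ring order as (x, y, z)."""
    n = 6
    center = [sum(p[k] for p in ring) / n for k in range(3)]
    pts = [[p[k] - center[k] for k in range(3)] for p in ring]

    # Reference vectors that define the mean plane of the ring
    r1 = [sum(p[k] * math.sin(2 * math.pi * j / n) for j, p in enumerate(pts)) for k in range(3)]
    r2 = [sum(p[k] * math.cos(2 * math.pi * j / n) for j, p in enumerate(pts)) for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]

    # Out-of-plane displacement of each ring atom
    z = [sum(p[k] * normal[k] for k in range(3)) for p in pts]

    q2_cos = math.sqrt(2 / n) * sum(zj * math.cos(4 * math.pi * j / n) for j, zj in enumerate(z))
    q2_sin = -math.sqrt(2 / n) * sum(zj * math.sin(4 * math.pi * j / n) for j, zj in enumerate(z))
    q3 = math.sqrt(1 / n) * sum((-1) ** j * zj for j, zj in enumerate(z))

    q2 = math.hypot(q2_cos, q2_sin)
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))            # 0 or 180 degrees for an ideal chair
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return total_q, theta, phi

# Idealized chair (hypothetical coordinates): hexagon of radius 1.4 Å with atoms
# displaced alternately by +0.25 Å and -0.25 Å from the plane.
chair = [(1.4 * math.cos(math.pi * j / 3), 1.4 * math.sin(math.pi * j / 3), 0.25 * (-1) ** j)
         for j in range(6)]
Q, theta, phi = cremer_pople_six(chair)
print(f"Q = {Q:.3f} Å, theta = {theta:.1f}°")
```

For a perfect chair the φ value is not meaningful, because the q2 component vanishes; φ becomes informative only as the ring distorts toward boat or twist forms.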

Table 2-2. Data collection and refinement statistics for the crystal structure of hItln-1 bound to allyl-α-KO

| | Allyl-α-KO bound hItln-1 |
|---|---|
| PDB code | 6USC |
| Data Collection | |
| X-ray source | 23-ID-B |
| Detector | Eiger-16m |
| Wavelength, Å | 1.033202 |
| Resolution, Å | 48.25–1.59 (1.79–1.59) |
| Space group | P2₁3 |
| a, b, c (Å) | 118.14, 118.14, 118.14 |
| α, β, γ (°) | 90, 90, 90 |
| No. of Reflections | 2,622,179 (70,740) |
| No. Unique Reflections | 73,131 (6,680) |
| Redundancy | 35.8 (10.6) |
| Mean I/σ | 16.9 (1.9) |
| Completeness | 99.2 (92.0) |
| Rmeas | 0.1384 (0.8223) |
| Rmerge | 0.1403 (0.8629) |
| Rpim | 0.02237 (0.2486) |
| CC1/2 | 0.999 (0.514) |
| Wilson B-factor | 18.3 |
| Refinement | |
| Working set | 73,130 (6,680) |
| Test set | 3,522 (338) |
| Rwork | 0.1582 (0.2666) |
| Rfree | 0.1822 (0.3098) |
| RMS deviation bond lengths (Å) | 0.007 |
| RMS deviation bond angles (°) | 0.86 |
| Protein residues | 558 |
| Total number of atoms | 4811 |
| Protein | 4438 |
| Allyl-α-KO | 47 |
| Solvent | 326 |
| Mean B-factor (Å²) | 21.94 |
| Protein | 21.35 |
| Allyl-α-KO | 29.38 |
| Solvent | 28.88 |
| Ramachandran favored, allowed, outliers (%) | 97, 3, 0 |

Values in parentheses are for highest-resolution shell.

2.3.3 Bioinformatic analysis of glycan conformation

As stated earlier, glycans generally adopt their low-energy conformation when they bind to lectins. We therefore analyzed the Protein Data Bank (PDB) to assess the bound conformations of glycans containing exocyclic vicinal diols. We performed a structure-based chemical component search of the PDB for structures with a resolution of ≤ 2.0 Å containing pyranose or furanose ligands with an exocyclic vicinal diol side chain at C5 or C4, respectively. We eliminated sialic acids from our search, as we previously found that hItln-1 does not bind N-acetylneuraminic acid due to a steric clash between the carboxyl group of the ligand and the hItln-1 E274 side chain.25 Our initial search yielded 121 hits. However, because crystallographic data on carbohydrates are notoriously prone to errors,17 we manually assessed the electron density of each glycan. We eliminated 24 structures due to incomplete electron density for the exocyclic vicinal diol or participation in a covalent linkage. The remaining 97 structures included nine unique pyranoses and two furanoses (Table 2-S1).

We next determined the dihedral angles of the exocyclic diol side chain in glycans with either an axial (KO-like) or equatorial (heptose-like) hydroxyl at the C4 position (Figure 2-5). Calcium coordination by a saccharide demands that the two side chain hydroxyl groups are gauche; therefore, we focused on the relative orientation of the first hydroxyl group of the side chain (i.e., the C6–C7 rotamer in KDO and KO or the C5–C6 rotamer in the heptoses) to the ring. We assigned the rotamer using the convention employed for hexoses in the pyranose form, in which the orientation (gauche (g) or trans (t)) of the proximal side chain hydroxyl group is listed first relative to the ring C–O bond (g or t) and then the ring C–C bond (i.e., tg, gg, or gt; Figure 2-5a). For example, the structure of KO bound to hItln-1 reported herein was resolved in the tg conformation as the C7 hydroxyl group is trans to the ring C–O bond but gauche to the C5–C6 bond. We also documented interactions between the exocyclic vicinal diol and protein side chains, ligands, or metal ions to account for potential intermolecular conformational influences.
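The rotamer convention described above can be expressed as a short routine: compute the dihedral angle of the proximal side chain hydroxyl relative to the ring C–O bond and then relative to the ring C–C bond, and label each as gauche or trans. This is a sketch under stated assumptions, not the procedure used for the PDB analysis: the function names, the 120° cutoff separating staggered gauche (about ±60°) from trans (about 180°), and the test coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) for the atom sequence p0-p1-p2-p3."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def gauche_or_trans(angle_deg):
    """'t' if the angle is closer to 180 degrees, 'g' if closer to +/-60 degrees (assumed cutoff)."""
    return "t" if abs(angle_deg) > 120.0 else "g"

def proximal_rotamer(o_side, c_side, c_ring, o_ring, c_ring_next):
    """Two-letter label: orientation relative to the ring C-O bond first, then the ring C-C bond."""
    first = gauche_or_trans(dihedral_deg(o_side, c_side, c_ring, o_ring))
    second = gauche_or_trans(dihedral_deg(o_side, c_side, c_ring, c_ring_next))
    return first + second

# Hypothetical idealized geometry for a KO-like side chain (O7, C7, C6, O6, C5):
# O7 is anti to the ring oxygen and 60 degrees from the ring carbon.
o7, c7, c6 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
o6 = (-1.0, 0.0, 1.0)
c5 = (math.cos(math.radians(60)), math.sin(math.radians(60)), 1.0)
label = proximal_rotamer(o7, c7, c6, o6, c5)
print(label)
```

For the heptoses the same call would take O6, C6, C5, O5, and C4 in place of the KO atoms.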

Figure 2-5. Bioinformatic analysis of exocyclic vicinal diol-containing glycans in the PDB. (a) Newman projections showing the gg, tg, and gt conformations of the proximal hydroxyl of the exocyclic vicinal diol (highlighted in blue, green or red) with respect to the C5–O bond, first, and the C5–C4 bond second (cf. C4–O and C4–C3 for furanoses). (b) The most prevalent proximal hydroxyl conformation of the three classes of carbohydrate in our analysis are shown. For pyranoses with an equatorial hydrogen (KO/KDO-like), 65/67 structures are tg; pyranoses with an equatorial hydroxyl (heptose-like), 21/24 structures are gg; and furanoses with an equatorial hydroxyl (Galf-like), 5/6 structures are gg.

We identified distinct differences in the favored conformations of the heptoses versus those of KDO and KO. Analysis of structures with KO or KDO indicates these saccharides share a strong conformational preference. Of 67 structures, 65 had the proximal hydroxyl group of the side chain in the tg conformation (Figure 2-5, Table 2-3, Table 2-S1). The two aberrant structures were in the gg conformation, a preference driven by the simultaneous coordination of the KDO carboxylate, the axial hydroxyl group, and the C7 side chain hydroxyl group to a calcium ion.35 The predominant tg conformation is that observed for KO bound to hItln-1. A similar analysis of the heptose sugars revealed their conformational preferences differ from those of KDO and KO. Of the 24 heptose structures, the majority (21) include L,D-heptose in a proximal gg conformation, while three had D,D-heptose in the gt conformation (Figure 2-5, Table 2-3). These observations indicate that the configuration of the side chain hydroxyl group impacts the conformation and that the predominant conformations of heptoses and KDO/KO diverge dramatically.

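Table 2-3. Summary of conformational analysis results

| Glycan class | Proximal rotamer | Number of instances | Distal rotamer | Number of instances |
|---|---|---|---|---|
| Pyranose, equatorial hydrogen (67 structures) | gg | 2 | gg | 0 |
| | | | tg | 2 |
| | | | gt | 0 |
| | tg | 65 | gg | 34 |
| | | | tg | 2 |
| | | | gt | 29 |
| | gt | 0 | | 0 |
| Pyranose, equatorial hydroxyl group (24 structures) | gg | 21 | gg | 0 |
| | | | tg | 2 |
| | | | gt | 19 |
| | tg | 0 | | 0 |
| | gt | 3 | gg | 0 |
| | | | tg | 0 |
| | | | gt | 3 |
| Furanose (6 structures) | gg | 5 | gg | 1 |
| | | | tg | 3* |
| | | | gt | 2* |
| | tg | 0 | | 0 |
| | gt | 1 | gg | 0 |
| | | | tg | 0 |
| | | | gt | 1 |

*one structure showed alternate conformations with distal hydroxyl in tg and gt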

We also analyzed furanose structures, though far fewer were available. The majority of proximal hydroxyl groups in these cases (five of six) occupied the gg conformation (Table 2-3). The one exception had three hydrogen bonds involving the proximal hydroxyl group, likely influencing the conformation.36 The preferred gg conformation is that present in the structure of hItln-1 bound to β-Galf. The bioinformatic analysis was striking in that each saccharide residue was found to adopt a preferred conformation in which the orientation of the glycan side chain and the pyranose or furanose was defined.

2.3.4 Computational analysis of glycan conformation and recognition

We examined whether the preferred conformations of the heptoses would be compatible with hItln-1 binding by employing modeling and computational methods. We extracted coordinates of L,D- and D,D-heptose bound to surfactant protein D (SP-D; 2rib and 2ria, respectively),37 a C-type lectin. The heptose-bound conformations in these structures are representative of those that predominated in our PDB analysis. We aligned the exocyclic vicinal diol with that of KO bound to hItln-1 by rigid-body superimposition. The preferred conformers of each heptose would experience significant steric repulsion from the protein surface, precluding binding to hItln-1 (Figure 2-6). By rotating around the C5–C6 exocyclic bond, we identified a single conformation of each heptose that is permissive for binding (Figure 2-6). The gt conformer of L,D-heptose is the sole rotamer that fits in the binding pocket without steric clashes. For D,D-heptose, the gg conformer is the only one accommodated. These accommodated conformations were not those that dominated in our bioinformatic analysis (i.e., gt for D,D-heptose, and gg for L,D-heptose). Thus, our analysis accounts for the weak hItln-1 binding observed for the heptoses.

Figure 2-6. Observed and accommodated ligand conformations in hItln-1 binding site. (a) KO can bind to hItln-1 in the tg/tg conformation without steric interactions, as observed in the structure of the complex (Figure 2-4). Alignment of the exocyclic vicinal diol of L,D-heptose (b) or D,D-heptose (c) affords steric interaction between each ligand and W288, depicted by red spheres. Rotation about the proximal bond of the exocyclic vicinal diol side chain gives rise to ligand conformations accommodated by the hItln-1 binding site without steric interaction. Ligands are shown in white sticks, and red spheres represent the van der Waals radii of atoms with significant interactions. Observed conformations of L,D-heptose and D,D-heptose were extracted from PDB: 2rib and 2ria, respectively.

The observed differences in side chain conformation between KO/KDO and the heptoses suggested that substituents on the saccharide ring are influential. KDO and KO have an axial hydroxyl group at the 5-position, while the corresponding C4 hydroxyl group in the heptoses is equatorial. We postulated that this position would exert both steric and stereoelectronic effects. To this end, we performed all-atom optimizations of the observed conformation of the heptose structures as well as the conformers accommodated by the hItln-1 binding site using density functional theory (DFT). We reasoned that the preferred conformers would be stabilized by hyperconjugation, so we employed natural bond orbital (NBO) analysis to assess the energies of the various rotamers. We examined whether altered donation to the ring σ*C–O antibonding orbitals might influence side chain orientation. For the heptoses, we summed the interaction energies for donation to the ring oxygen (C5–O) and the proximal exocyclic hydroxyl group (C6–O) σ*C–O orbitals for the observed and hItln-1-accommodated structures (Figure 2-7, Table 2-4). We also added the interaction energies with the exchanged donors and alternative acceptors (Table 2-4). In this way, we estimate that the gt conformer (the major rotamer observed in the PDB) and the gg conformer (hItln-1 accommodating) are similar in energy (within 0.2 kcal/mol) for D,D-heptose, but L,D-heptose prefers the gg conformer (by 3.3 kcal/mol). The computationally derived values in Table 2-4 are on par with previously published values.38 The low-energy L,D-heptose conformation is incompatible with hItln-1 complexation. This analysis is consistent with the binding data indicating D,D-heptose is a better hItln-1 ligand. We also performed all-atom optimizations with DFT and NBO analysis of the hItln-1 ligand, KO (Figure 2-7, Table 2-S1). KO benefits from a fixed equatorial C–H bond at C5, which can donate into the ring σ*C–O orbital. The side chain therefore adopts a conformation that minimizes steric interactions and is aligned for hItln-1 binding. These calculations predict the low-energy conformation found in the PDB and the hItln-1 structure.

Figure 2-7. Stabilizing stereoelectronic effects of preferred rotamers of the proximal side chain C–C bond of KO, L,D-heptose, and D,D-heptose. Preferred rotamers are based on predominant conformations in published PDB structures. Relevant orbitals involved in stabilizing these conformations are represented in blue (top). NBO renderings of significant σC–H→σ*C–O interactions are depicted with blue and yellow orbitals (bottom). Rotamers shown were atom-optimized at the M06-2X/6-311+G(d,p) level of theory employing the IEFPCM solvation model. For clarity, two separate renderings are shown for each stabilizing interaction in the preferred gg L,D-heptose conformer. Comparisons of these interactions aid in explaining the differences in affinity of each monosaccharide to hItln-1.

Table 2-4. NBO donor–acceptor interaction energies and calculated ΔENBO of bond rotation.

| Glycan conformation | Interaction | Energy (kcal/mol) |
|---|---|---|
| DD-heptose-gtgt | σC19−H20→σ*C2−O9 | 5.34 |
| | σC1−C2→σ*C19−O24 | 2.42 |
| | σC19−C21→σ*C2−H6 | 1.95 |
| | σC2−H6→σ*C19−C21 | 3.54 |
| | Sum of all interactions | 13.25 |
| DD-heptose-gggt | σC19−C21→σ*C2−O9 | 2.72 |
| | σC2−H6→σ*C19−O24 | 4.93 |
| | σC19−H20→σ*C1−C2 | 3.93 |
| | σC1−C2→σ*C19−H20 | 1.48 |
| | Sum of all interactions | 13.06 |
| DD-heptose ΔENBO | | 0.19 |
| LD-heptose-gggt | σC19−H33→σ*C2−O9 | 4.73 |
| | σC19−H20→σ*C1−C2 | 2.06 |
| | σC2−H6→σ*C19−O34 | 4.52 |
| | σC1−C2→σ*C19−C20 | 1.64 |
| | Sum of all interactions | 12.95 |
| LD-heptose-gtgt | σC19−H33→σ*C2−H6 | 2.86 |
| | σC19−C20→σ*C2−O9 | 2.15 |
| | σC2−H6→σ*C19−H33 | 2.57 |
| | σC1−C2→σ*C19−O34 | 2.12 |
| | Sum of all interactions | 9.70 |
| LD-heptose ΔENBO | | 3.25 |
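The ΔENBO values follow from simple arithmetic on the listed interaction energies: the four donor–acceptor energies for each conformer are summed, and the sums for the two rotamers of each heptose are subtracted. The sketch below repeats that arithmetic with the individual energies from Table 2-4; the variable names are assumptions made for the example.

```python
# Donor-acceptor interaction energies (kcal/mol) as listed in Table 2-4
NBO_ENERGIES = {
    "DD-heptose-gtgt": [5.34, 2.42, 1.95, 3.54],
    "DD-heptose-gggt": [2.72, 4.93, 3.93, 1.48],
    "LD-heptose-gggt": [4.73, 2.06, 4.52, 1.64],
    "LD-heptose-gtgt": [2.86, 2.15, 2.57, 2.12],
}

# Sum of the listed interactions for each conformer
totals = {name: round(sum(values), 2) for name, values in NBO_ENERGIES.items()}

# Difference between the PDB-preferred rotamer and the alternative rotamer
delta_dd = round(totals["DD-heptose-gtgt"] - totals["DD-heptose-gggt"], 2)
delta_ld = round(totals["LD-heptose-gggt"] - totals["LD-heptose-gtgt"], 2)

for name, total in totals.items():
    print(f"{name}: {total:.2f} kcal/mol")
print(f"DD-heptose dE_NBO = {delta_dd:.2f} kcal/mol")
print(f"LD-heptose dE_NBO = {delta_ld:.2f} kcal/mol")
```

The listed L,D-heptose gg energies sum to 12.95 kcal/mol, which together with the 9.70 kcal/mol total for the gt rotamer gives the tabulated ΔENBO of 3.25 kcal/mol.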

2.4 Discussion

Lectin–glycan interactions are critical in innate immunity. We are intrigued by the possibility that host lectins detect and control microbial populations. To understand and predict protein–carbohydrate recognition, we analyzed hItln-1–glycan interactions to elucidate features of sugars that influence lectin affinity. HItln-1 recognizes multiple glycans that possess glyceryl side chains. As a consequence, studying hItln-1 provides an opportunity to examine the interplay of saccharide steric and stereoelectronic effects that influence lectin binding. We determined that heptoses bind more weakly than KO, KDO, or β-Galf. That heptoses and N-acetylneuraminic acid are poor hItln-1 ligands underscores that a terminal 1,2-diol is not sufficient for binding. HItln-1 can distinguish between saccharides that possess the critical glyceryl group. Among carbohydrates exclusively utilized by microbes, L,D-heptose is the most commonly observed monosaccharide building block.31 Indeed, this saccharide is widely distributed throughout Gram-negative bacteria and is a critical component of many lipopolysaccharides. The ability of hItln-1 to discriminate against L,D-heptose could augment the lectin’s selectivity for distinct microbial species.

A molecular understanding of X-type lectin binding emerged with the determination of the structures of hItln-1 bound to allyl-β-D-Galf25 and the frog lectin XEEL complexed with glycerol 1-phosphate.26 In each of these structures, the terminal 1,2-diol of the ligand coordinates to the protein-bound calcium ion. The structure of hItln-1 bound to KO reinforces the importance of exocyclic diol coordination. This structure alone, however, does not explain why L,D-heptose binds so weakly. The molecular basis for the disparity in affinity between L,D- and D,D-heptose epimers was not apparent. Because saccharides that are pre-organized for lectin binding (i.e., can interact via their low-energy conformation) should be the most effective ligands, we hypothesized that the low-energy conformations of KO and KDO would enable binding but that of L,D-heptose would be incompatible.

To test this hypothesis, we employed a meta-analysis of high-resolution structures in the PDB. We examined the structures of candidate glycans that possess the terminal 1,2-diol, including KO, KDO, and the heptoses. Although the monosaccharide units were crystallized in a variety of different chemical environments, distinct conformations predominated. The ulosonic acids KO and KDO (and related sugars) are almost always (97%) in the conformation observed in the hItln-1·KO complex. In contrast, the prevalent heptose conformations are not those that are poised for hItln-1 binding.

Glucose and galactose, which are epimeric at the C4 position, have been observed to populate different ω angles driven by sterics, solvent interactions, and stereoelectronic effects.14, 39 The data presented herein suggest that in monosaccharides with higher substitution at the C6-equivalent position, the conformation of the proximal side chain hydroxyl group depends on the configuration of the adjacent ring hydroxyl group (C4 in heptoses; C5 in KO/KDO) and the stereochemistry of the exocyclic hydroxyl group itself. The preferences observed are reinforced not only by sterics but also by stereoelectronic effects. In carbohydrates, the ring C–O bond is the most electron deficient due to the anomeric nO→σ*C–O interaction and the inductive effect. The system is stabilized by donation into the ring σ*C–O orbital, which can come from an electron-rich σC–H orbital. In KO/KDO, the equatorial C5–H is aligned for hyperconjugation. The proximal side chain bond then adopts the sterically preferred tg conformation. Here, the alignment of the smallest group (H) with the axial hydroxyl group is preferable to a 1,3-diaxial interaction of hydroxyl groups. Thus, stereoelectronic and steric effects stabilize the hItln-1-bound conformation, and this conformation is present in 65 of 67 PDB structures.

In contrast to KDO and KO, L,D- and D,D-heptose have an equatorial hydroxyl group at C4. Accordingly, the conformation that provides hyperconjugative stabilization is one in which the C–H bond of the side chain is aligned to donate electron density into the ring σ*C–O orbital. Our analysis of PDB structures and the NBO calculations indicate that the gg rotamer is the preferred and prevalent conformation of L,D-heptose. The L,D-heptose gg conformation is stabilized further by a second orbital overlap of the C5 σC–H orbital of the ring to the C6 side chain hydroxyl σ*C–O orbital. This rotamer also avoids unfavorable syn-pentane interactions that would occur in other conformations. Superposition-based docking of the preferred conformation of L,D-heptose ligand into the binding site indicates a steric clash with Trp288 and a poor fit. Our in silico conformational analysis predicts that the heptose conformer that could accommodate hItln-1 binding is energetically disfavored.

For D,D-heptose, no single rotamer both avoids unfavorable steric interactions and capitalizes on stabilizing stereoelectronic interactions. Indeed, the gt and gg conformations are roughly equal in calculated energy. The NBO analysis accounts for the electronic contribution; however, the gg conformation of D,D-heptose would experience a steric effect similar to a 1,3-diaxial interaction and has a C–C rather than a C–H bond donating into the σ*C–O orbital. Of the two D,D-heptose rotamers favored by stereoelectronics, binding of the gt rotamer is precluded by a steric clash with Trp288, whereas the gg rotamer could be docked in the hItln-1 binding site. These analyses would predict that D,D-heptose is a better hItln-1 ligand than is L,D-heptose, a prediction consistent with the binding data. Still, D,D-heptose lacks the benefits of pre-organization intrinsic to KO and KDO. The difference in rotamer preference observed for each of the bacterial glycans studied herein is consistent with previous observations that side chain conformation influences glycosylation reactions.15

The difference in affinity between monomeric D,D-heptose and β-Galf appears small, yet we see no binding to immobilized D,D-heptose in an ELISA (Figure 2-3). The discrimination between D,D-heptose and β-Galf arises from multivalent interactions, whereby small differences in binding are amplified.30 Thus, through stereoelectronic effects and multivalent binding, hItln-1 binds selectively to surfaces displaying KDO/KO, glycerol 1-phosphate, and β-Galf.

We find that hItln-1 binding to microbial sugars depends on recognition of an exocyclic diol, but our results indicate that the conformation of the diol and its orientation relative to the saccharide ring are major determinants of selectivity. The differential recognition of monosaccharides we observe could not have been predicted from the glycan array results alone.

Many other lectins have known monosaccharide ligands or binding epitopes but a detailed analysis of the structural and conformational constraints that determine affinity and selectivity is lacking. For example, L-ficolin is proposed to bind glycans with a simple acetyl group motif, but this lectin displays a range of affinities across monosaccharides containing this epitope.40, 41 Our molecular analysis afforded a more complete ligand binding profile of hItln-1, which can guide future investigations into the function of this lectin and provide a basis for understanding the specificity of lectin–glycan interactions.

2.5 Conclusion

Our studies illuminate how saccharide conformation influences lectin specificity. Whereas carbohydrates are often viewed as conformationally flexible molecules, stereoelectronic effects, such as the gauche effect, are critical determinants of conformational preferences. Because lectins typically bind the low-energy conformations of glycans, we used X-ray crystallography and bioinformatic analysis to assess the favored rotamers of the ulosonic acids KO and KDO, β-Galf, and the heptoses. Though the differences in binding affinity were subtle on the monosaccharide level, these preferences were amplified upon the binding of the trimeric hItln-1 to a surface displaying a target ligand. In such a multivalent assay, hItln-1 bound to β-Galf, KDO, and KO but not to either L,D-heptose or D,D-heptose. We anticipate that this specificity is critical for the physiological function of hItln-1 and will aid in the prediction, analysis, and generation of synthetic lectin ligands. These advances in our understanding of lectin–glycan interactions should also facilitate the generation of effective lectin inhibitors.

2.6 Materials and Methods

2.6.1 Recombinant protein expression

Recombinant hItln-1 with an N-terminal Strep-tag II (IBA Lifesciences) was expressed via transient transfection of suspension-adapted HEK293 cells as previously described.25 Purification was performed as described using Strep-Tactin Superflow high-capacity resin (IBA Lifesciences, cat. no. 2-1208-002). The concentration of Strep-hItln-1 was determined by absorbance at 280 nm, with a calculated ε = 239,775 M⁻¹ cm⁻¹ and a molecular mass of 102,024 Da for the disulfide-linked trimer.
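As a worked illustration of the concentration determination described above, the Beer-Lambert law (A = ε·c·l) converts an absorbance reading at 280 nm into molar and mass concentrations using the stated extinction coefficient and trimer mass. The following Python sketch is illustrative only: the 1 cm path length, the example absorbance of 0.50, and the function name are assumptions and are not taken from the original protocol.

```python
# Values stated in the text for the disulfide-linked Strep-hItln-1 trimer.
EPSILON_M1_CM1 = 239_775.0   # calculated extinction coefficient, M^-1 cm^-1
TRIMER_MASS_DA = 102_024.0   # molecular mass, g/mol


def concentration_from_a280(a280, path_cm=1.0):
    """Return (molar concentration in M, mass concentration in mg/mL).

    Applies the Beer-Lambert law, A = epsilon * c * l.
    The default 1 cm path length is an assumption.
    """
    molar = a280 / (EPSILON_M1_CM1 * path_cm)
    mg_per_ml = molar * TRIMER_MASS_DA  # g/L is numerically equal to mg/mL
    return molar, mg_per_ml


# Hypothetical absorbance reading, for illustration only.
molar, mg_per_ml = concentration_from_a280(0.50)
print(f"{molar * 1e6:.2f} uM trimer, {mg_per_ml:.3f} mg/mL")
```

Because ε and the mass both refer to the trimer, the molar value returned is the concentration of trimer rather than of monomer.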

Protein for X-ray crystallography: Strep-hItln-1 was expressed and purified as previously described with minor modifications.25 Conditioned expression medium was harvested by centrifugation and sterile filtration. The culture medium was then adjusted to pH 6.7 by slow addition of 0.1 M sodium hydroxide, avidin was added per the IBA protocol, calcium chloride was added to 10 mM, and the solution was cleared by centrifugation. The protein was captured on Strep-Tactin Superflow high-capacity resin; washed with 20 mM bis-Tris, pH 6.7, 150 mM sodium chloride, and 1 mM EDTA; eluted with 5 mM d-desthiobiotin (Sigma) in 20 mM bis-Tris, pH 6.7, 150 mM sodium chloride, and 1 mM EDTA; and concentrated with a 30,000-MWCO Amicon Ultra centrifugal filter.

2.6.2 Chemical synthesis of glycans

Detailed information on synthesis and characterization of the glycans used in this work can be found in the Supplementary Information of the original published version of this chapter (https://doi.org/10.1021/jacs.9b11699).

2.6.3 Biolayer interferometry (BLItz)

IC50 values for monosaccharides with hItln-1 were determined using a BLItz instrument (ForteBio). Biotin-β-Galf was loaded (300 s) onto streptavidin biosensors as a 5 µM solution in phosphate-buffered saline (PBS). The sensor was washed for 60 s in HEPES-T + BSA buffer (20 mM HEPES, pH 7.4, 150 mM sodium chloride, 10 mM calcium chloride, 0.1% Tween-20, 0.1% bovine serum albumin [BSA]). hItln-1 was then associated (15 µg/mL in association buffer) for 300 s in the presence of various concentrations of competitor monosaccharide (0 to 100 mM), followed by dissociation in HEPES-T + BSA buffer for 90 s. The shake rate was set to 1000 rpm throughout the experiment and all reagents were used at room temperature. Data were adjusted based on a reference curve (no hItln-1 in association step). The last 10 s of association (710–720 s) were averaged, normalized, and plotted as a curve of % binding of hItln-1 to β-Galf vs. log[competitor] (mM). Data were analyzed in Prism 8 (GraphPad) and fitted to a one-site logIC50 equation.
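The normalization and curve-fitting steps above can be sketched in Python. This is a minimal illustration, not the Prism procedure used in this work: it assumes the common one-site competition form Y = Bottom + (Top − Bottom)/(1 + 10^(X − logIC50)) with Top and Bottom fixed at 100% and 0%, and it replaces nonlinear regression with a simple grid search. The function names and the synthetic data (generated with a hypothetical IC50 of 5 mM) are assumptions for illustration.

```python
import math


def one_site_logic50(log_conc, log_ic50, top=100.0, bottom=0.0):
    """Percent binding at a given log10[competitor] (assumed model form)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (log_conc - log_ic50))


def normalize(signals, no_competitor_signal):
    """Express reference-subtracted plateau signals as % of the uninhibited signal."""
    return [100.0 * s / no_competitor_signal for s in signals]


def fit_log_ic50(log_concs, pct_binding, lo=-3.0, hi=3.0, steps=6001):
    """Grid search for the logIC50 that minimizes the sum of squared residuals."""
    best, best_sse = lo, float("inf")
    for i in range(steps):
        cand = lo + (hi - lo) * i / (steps - 1)
        sse = sum((y - one_site_logic50(x, cand)) ** 2
                  for x, y in zip(log_concs, pct_binding))
        if sse < best_sse:
            best, best_sse = cand, sse
    return best


# Synthetic, noise-free data generated with a hypothetical IC50 of 5 mM.
log_concs = [math.log10(c) for c in (0.1, 0.3, 1, 3, 10, 30, 100)]
pct = [one_site_logic50(x, math.log10(5.0)) for x in log_concs]
fitted = fit_log_ic50(log_concs, pct)
print(f"fitted IC50 = {10 ** fitted:.2f} mM")
```

The 0 mM competitor point cannot be placed on a logarithmic axis; in this sketch it serves only as the uninhibited signal passed to `normalize`.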

2.6.4 ELISA

Monosaccharides conjugated to BSA were adsorbed onto a Maxisorp (Nunc) flat-bottomed 96-well plate in phosphate-buffered saline (PBS) (1.5 µM by sugar concentration) and incubated at room temperature for 1 h. After washing with PBS (3 × 5 min), wells were blocked with 5% w/v BSA in HEPES-T buffer for 2 h. The plate was washed with HEPES-T buffer (3 × 5 min), and then Strep-hItln-1 solutions prepared by serial dilution into HEPES-T + BSA buffer were added to wells for 2 h at room temperature. Wells were washed 4 times with HEPES-T buffer and then incubated with StrepMAB-Classic HRP conjugate (IBA, cat. no. 2-1509-001; 1:10,000 dilution in HEPES-T + BSA) for 2 h at room temperature for detection of the Strep-tag II of bound hItln-1. Wells were washed with HEPES-T (3 × 5 min) and hItln-1 was detected colorimetrically by addition of 1-Step Ultra TMB-ELISA and quenching with an equal volume of 2 M sulfuric acid. Plates were read at 450 nm on an ELx800 plate reader (Bio-Tek). Data were analyzed in Prism 8 (GraphPad) and fitted to a one-site binding equation.
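For readers who wish to reproduce the ELISA analysis outside Prism, the following Python sketch fits a one-site binding curve. It assumes the standard hyperbolic form Y = Bmax·X/(Kd + X), where Kd is an apparent dissociation constant; the exact Prism model options used in this work are not specified here. The fitting strategy (a logarithmic grid over Kd with Bmax solved in closed form), the function names, and the synthetic two-fold dilution series are all assumptions for illustration.

```python
def one_site_binding(conc, bmax, kd):
    """Signal predicted by a one-site (hyperbolic) binding model."""
    return bmax * conc / (kd + conc)


def fit_one_site(concs, signals, kd_min=1e-3, kd_max=1e3, steps=6001):
    """Return (bmax, kd) minimizing the sum of squared residuals.

    For each candidate Kd on a logarithmic grid, the model is linear in
    Bmax, so the least-squares Bmax is sum(y*f) / sum(f*f) with
    f = x / (Kd + x).
    """
    best_bmax, best_kd, best_sse = None, None, float("inf")
    for i in range(steps):
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        f = [c / (kd + c) for c in concs]
        bmax = sum(y * fi for y, fi in zip(signals, f)) / sum(fi * fi for fi in f)
        sse = sum((y - bmax * fi) ** 2 for y, fi in zip(signals, f))
        if sse < best_sse:
            best_bmax, best_kd, best_sse = bmax, kd, sse
    return best_bmax, best_kd


# Synthetic, noise-free absorbances for a hypothetical two-fold dilution
# series (arbitrary concentration units), generated with Bmax = 2.0, Kd = 10.
concs = [100.0 / 2 ** i for i in range(8)]
signals = [one_site_binding(c, 2.0, 10.0) for c in concs]
bmax_fit, kd_fit = fit_one_site(concs, signals)
print(f"Bmax = {bmax_fit:.2f}, apparent Kd = {kd_fit:.2f}")
```

Because hItln-1 is trimeric and the ligand is surface-immobilized, a value obtained this way reflects functional affinity in a multivalent format rather than a monovalent dissociation constant.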

2.6.5 Protein X-ray crystallography

Protein Crystallization: The Strep-hItln-1 protein that was purified with 20 mM bis-Tris, pH 6.7, was concentrated to 1.5 mg/mL, and 1 M CaCl2 was added to a final concentration of 10 mM. Crystallization was performed in 100 mM bis-Tris, pH 6.0, and 25% PEG 3350 (hanging drop) as previously described.25 Crystals grew to full size in 5 weeks, and additional crystals continued to appear over the next 2 months. The allyl-α-KO complex was formed by soaking apo-hItln-1 crystals in cryoprotection solution (100 mM bis-Tris, pH 6.0, 35% PEG 3350) supplemented with 50 mM allyl-α-KO for 7 days prior to cryopreservation.

X-Ray Diffraction: Single crystal diffraction experiments were performed at beamline 23-ID-B equipped with a Dectris Eiger 16M detector (GM/CA, Advanced Photon Source, Argonne National Laboratory). Data were indexed and integrated with DIALS and scaled with Aimless.42 Details of X-ray diffraction experiments and ensuing data are found in Table 2-2.

Structure Solution and Refinement: The monomer from chain A of the hItln-1 apo structure (4wmq) with alternative conformations removed was used as an input for molecular replacement. Molecular replacement was conducted using Phaser implemented in PHENIX.43 Model building including the fitting of protein, solvent, and ligands was conducted with COOT.44 Refinement was conducted with phenix.refine in PHENIX. The initial model of the ligand, prop-2-en-1-yl D-glycero-alpha-D-talo-oct-2-ulopyranosidonic acid (KO), was obtained from the Protein Data Bank (PDB) (ligand code: ko2). Restraints were prepared with eLBOW as implemented in PHENIX. The Protein Data Bank accession code for the deposited coordinates and structure factors of hItln-1 bound to allyl-KO is 6USC.

2.6.6 Bioinformatics

The PDB was queried for pyranoses and furanoses containing an exocyclic diol at the C5 or C4 position, respectively, using the structure-based chemical component search tool. Resulting structures resolved at ≤ 2.0 Å were analyzed manually to determine completeness of ligand density and assignment of the exocyclic diol conformation. First, the 2Fo – Fc map contoured at 1.5σ was examined for each individual carbohydrate ligand using the Electron Density Map feature of the 3D View tool found on the PDB website to ensure electron density was present for the side chain and that the ligand was accurately fit to the density. If the electron density was absent for part of the molecule, the structure was omitted from our analysis. Additionally, any ligands in which either hydroxyl of the exocyclic diol was participating in a covalent linkage were omitted. Next, the conformation of the exocyclic diol was recorded (Table 2-S1) for the proximal hydroxyl group (i.e., the one most proximal to the ring) and the distal hydroxyl group (i.e., the terminal one). Each rotamer is assigned by the gauche or trans orientation of the O-C-C-O and O-C-C-C dihedral angles using the gauche-trans (gt), gauche-gauche (gg), trans-gauche (tg) annotation. Finally, interactions automatically identified using the Ligand View feature of the 3D View tool were recorded to account for potential influences on conformation (Table 2-S1).45
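Although the rotamers in this work were assigned by manual inspection, the assignment rule described above can be expressed programmatically. The Python sketch below computes the O-C-C-O and O-C-C-C dihedral angles from Cartesian coordinates and reports each as gauche (g) or trans (t). The 120° cutoff between gauche and trans, the function names, and the idealized test geometry are assumptions for illustration and were not part of the original analysis.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def orientation(angle_deg, cutoff=120.0):
    """'g' (gauche) if |angle| is below the cutoff, otherwise 't' (trans)."""
    return "g" if abs(angle_deg) < cutoff else "t"


def rotamer(ring_o, ring_c, c_attach, c_side, o_side):
    """Two-letter rotamer: first letter from O(ring)-C-C-O(side chain),
    second letter from C(ring)-C-C-O(side chain)."""
    occo = dihedral(ring_o, c_attach, c_side, o_side)
    occc = dihedral(ring_c, c_attach, c_side, o_side)
    return orientation(occo) + orientation(occc)


# Idealized hexopyranose-like geometry (hypothetical coordinates):
# C5 at the origin, C6 along z, O6 pointing along +x from C6.
c5, c6, o6 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.5)
o5 = (math.cos(math.radians(-60)), math.sin(math.radians(-60)), -0.5)
c4 = (-1.0, 0.0, -0.5)
print(rotamer(o5, c4, c5, c6, o6))
```

For a hexopyranose, the five atoms passed to `rotamer` would be O5, C4, C5, C6, and O6; for the longer side chains of the heptoses and ulosonic acids, the same function can be applied to the proximal and distal hydroxyl groups in turn.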

2.6.7 Computational Analysis

DFT optimization with Gaussian: Full structure optimizations were conducted with Gaussian 16, Revision A.03 software from Gaussian (Wallingford, CT).46 Initial structures were prepared with GaussView 6.0 (Gaussian). Optimizations were conducted at the M06-2X/6-311+G(d,p) level of theory employing the IEFPCM solvation model. Structures with several intramolecular hydrogen bonding patterns were optimized for the sugar in the conformation found in co-crystal structures with lectins to identify the lowest energy hydrogen atom conformation. Additional models were prepared to assess the consequences of changing the conformation of non-hydrogen atoms. Again, several conformations were sampled for the hydrogens on mobile atoms to identify the minimum energy conformer. Minimized structures were confirmed to lack imaginary frequencies.

Natural Bond Orbital Analysis: NBO analysis was conducted using NBO 6.0 software.47 Reported energies for donor–acceptor interactions were calculated by second-order perturbation analysis. Changes in energy for bond rotation (∆ENBO) were calculated by summing the energies of donor–acceptor interactions between NBOs associated with atoms of the rotated bonds that have altered interactions during the rotation and finding the difference in unique stabilizing energy between the two rotamers (Table 2-4). 3D images of orbital overlaps were rendered using NBOView 1.1 software.
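The bookkeeping behind ∆ENBO can be illustrated with a short Python sketch. It sums the second-order perturbation energies of the donor–acceptor interactions recorded for each rotamer and takes the difference. The interaction labels and energy values below are hypothetical placeholders, not results from this work (see Table 2-4 for the calculated values), and the kcal/mol unit and the sign convention (first rotamer minus second) are assumptions.

```python
def stabilization(interactions):
    """Sum donor-acceptor stabilization energies (assumed kcal/mol) for one rotamer.

    `interactions` maps (donor NBO, acceptor NBO) labels to second-order
    perturbation energies; only interactions involving atoms of the
    rotated bond should be included.
    """
    return sum(interactions.values())


def delta_e_nbo(rotamer_a, rotamer_b):
    """Difference in stabilizing energy between two rotamers (a minus b)."""
    return stabilization(rotamer_a) - stabilization(rotamer_b)


# Hypothetical placeholder values for two rotamers of the same sugar.
rotamer_a = {
    ("sigma C-H (side chain)", "sigma* C-O (ring)"): 4.5,
    ("sigma C-H (ring)", "sigma* C-O (side chain)"): 3.0,
}
rotamer_b = {
    ("sigma C-C (side chain)", "sigma* C-O (ring)"): 2.5,
    ("sigma C-H (ring)", "sigma* C-C (side chain)"): 1.0,
}
print(f"dE_NBO = {delta_e_nbo(rotamer_a, rotamer_b):.1f} kcal/mol")
```

A positive value under this convention indicates that the first rotamer gains more hyperconjugative stabilization than the second; steric contributions are not captured by this sum.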

2.7 Funding Sources

Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases under grant number R01 AI055258 (LLK), the National Cancer Institute under grant number U01 CA2310789 (LLK), the National Institute of General Medical Sciences under grant number R01 GM044783 (RTR), and the Austrian Science Fund FWF (P 28826-N28, PK). We acknowledge NIGMS F32 GM125165 (CMM) and National Science Foundation Graduate Research Fellowship Program 1122374 (CRI) for fellowships.

2.8 Acknowledgements

We thank Craig Bingman [Collaborative Crystallography Core in the Department of Biochemistry, UW–Madison] for data collection and Craig Ogata [GM/CA@APS] for beamline support. This research has made use of GM/CA@APS, which has been funded by the National Cancer Institute (ACB-12002) and the NIGMS (AGM-12006). This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. The Eiger 16M detector was funded by an NIH–Office of Research Infrastructure Programs, High-End Instrumentation Grant (1S10OD012289-01A1). This work utilized the Molecular Graphics and Computation Facility at UC Berkeley, which is funded by Grant NIH S10OD023532. We also thank Prof. Barbara Imperiali for providing generous access to equipment and Bob Dass (Pall) for advice and assistance with instrumentation.

2.9 References

1. Lee, Y. C.; Lee, R. T., Carbohydrate-Protein Interactions: Basis of Glycobiology. Accounts of Chemical Research 1995, 28 (8), 321-327.

2. Bewley, C. A., Protein-Carbohydrate Interactions in Infectious Diseases. RSC Publishing: Cambridge, United Kingdom, 2006.

3. Lis, H.; Sharon, N., Lectins: Carbohydrate-Specific Proteins That Mediate Cellular Recognition. Chemical Reviews 1998, 98 (2), 637-674.

4. Varki, A., Biological roles of glycans. Glycobiology 2017, 27 (1), 3-49.

5. Weis, W. I.; Drickamer, K., Structural basis of lectin-carbohydrate recognition. Annual Review of Biochemistry 1996, 65, 441-473.

6. Elgavish, S.; Shaanan, B., Lectin-carbohydrate interactions: different folds, common recognition principles. Trends in Biochemical Sciences 1997, 22 (12), 462-467.

7. Gabius, H.-J.; André, S.; Jiménez-Barbero, J.; Romero, A.; Solís, D., From lectin structure to functional glycomics: principles of the sugar code. Trends in Biochemical Sciences 2011, 36 (6), 298-313.

8. Park, S.; Gildersleeve, J. C.; Blixt, O.; Shin, I., Carbohydrate microarrays. Chemical Society Reviews 2013, 42 (10), 4310-4326.

9. Wesener, D. A.; Dugan, A.; Kiessling, L. L., Recognition of microbial glycans by soluble human lectins. Current Opinion in Structural Biology 2017, 44, 168-178.

10. Imberty, A.; Pérez, S., Structure, Conformation, and Dynamics of Bioactive Oligosaccharides: Theoretical Approaches and Experimental Validations. Chemical Reviews 2000, 100 (12), 4567-4588.

11. Asensio, J. L.; Ardá, A.; Cañada, F. J.; Jiménez-Barbero, J., Carbohydrate-Aromatic Interactions. Accounts of Chemical Research 2013, 46 (4), 946-954.

12. Hudson, K. L.; Bartlett, G. J.; Diehl, R. C.; Agirre, J.; Gallagher, T.; Kiessling, L. L.; Woolfson, D. N., Carbohydrate-Aromatic Interactions in Proteins. Journal of the American Chemical Society 2015, 137 (48), 15152-15160.

13. Kiessling, L. L., Chemistry-driven glycoscience. Bioorganic & Medicinal Chemistry 2018, 26 (19), 5229-5238.

14. Woods, R. J., Predicting the Structures of Glycans, Glycoproteins, and Their Complexes. Chemical Reviews 2018, 118 (17), 8005-8024.

15. Moumé-Pymbock, M.; Furukawa, T.; Mondal, S.; Crich, D., Probing the Influence of a 4,6-O-Acetal on the Reactivity of Galactopyranosyl Donors: Verification of the Disarming Influence of the trans–gauche Conformation of C5–C6 Bonds. Journal of the American Chemical Society 2013, 135 (38), 14249-14255.

16. Wormald, M. R.; Petrescu, A. J.; Pao, Y.-L.; Glithero, A.; Elliott, T.; Dwek, R. A., Conformational Studies of Oligosaccharides and Glycopeptides: Complementarity of NMR, X-ray Crystallography, and Molecular Modelling. Chemical Reviews 2002, 102 (2), 371-386.

17. Agirre, J.; Davies, G.; Wilson, K.; Cowtan, K., Carbohydrate anomalies in the PDB. Nature Chemical Biology 2015, 11, 303.

18. Nivedha, A. K.; Makeneni, S.; Foley, B. L.; Tessier, M. B.; Woods, R. J., Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat from the chaff. Journal of Computational Chemistry 2014, 35 (7), 526-539.

19. Suzuki, Y. A.; Shin, K.; Lönnerdal, B., Molecular Cloning and Functional Expression of a Human Intestinal Lactoferrin Receptor. Biochemistry 2001, 40 (51), 15771-15779.

20. Tsuji, S.; Uehori, J.; Matsumoto, M.; Suzuki, Y.; Matsuhisa, A.; Toyoshima, K.; Seya, T., Human Intelectin Is a Novel Soluble Lectin That Recognizes Galactofuranose in Carbohydrate Chains of Bacterial Cell Wall. Journal of Biological Chemistry 2001, 276 (26), 23456-23463.

21. Voehringer, D.; Stanley, S. A.; Cox, J. S.; Completo, G. C.; Lowary, T. L.; Locksley, R. M., Nippostrongylus brasiliensis: Identification of intelectin-1 and -2 as Stat6-dependent genes expressed in lung and intestine during infection. Experimental Parasitology 2007, 116 (4), 458-466.

22. Barrett, J. C.; Hansoul, S.; Nicolae, D. L.; Cho, J. H.; Duerr, R. H.; Rioux, J. D.; Brant, S. R.; Silverberg, M. S.; Taylor, K. D.; Barmada, M. M.; Bitton, A.; Dassopoulos, T.; Datta, L. W.; Green, T.; Griffiths, A. M.; Kistner, E. O.; Murtha, M. T.; Regueiro, M. D.; Rotter, J. I.; Schumm, L. P.; Steinhart, A. H.; Targan, S. R.; Xavier, R. J.; the NIDDK IBD Genetics Consortium; Libioulle, C.; Sandor, C.; Lathrop, M.; Belaiche, J.; Dewit, O.; Gut, I.; Heath, S.; Laukens, D.; Mni, M.; Rutgeerts, P.; Van Gossum, A.; Zelenika, D.; Franchimont, D.; Hugot, J.-P.; de Vos, M.; Vermeire, S.; Louis, E.; the Belgian-French IBD Consortium; the Wellcome Trust Case Control Consortium; Cardon, L. R.; Anderson, C. A.; Drummond, H.; Nimmo, E.; Ahmad, T.; Prescott, N. J.; Onnie, C. M.; Fisher, S. A.; Marchini, J.; Ghori, J.; Bumpstead, S.; Gwilliam, R.; Tremelling, M.; Deloukas, P.; Mansfield, J.; Jewell, D.; Satsangi, J.; Mathew, C. G.; Parkes, M.; Georges, M.; Daly, M. J., Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nature Genetics 2008, 40 (8), 955-962.

23. de Souza Batista, C. M.; Yang, R.-Z.; Lee, M.-J.; Glynn, N. M.; Yu, D.-Z.; Pray, J.; Ndubuizu, K.; Patil, S.; Schwartz, A.; Kligman, M.; Fried, S. K.; Gong, D.-W.; Shuldiner, A. R.; Pollin, T. I.; McLenithan, J. C., Omentin Plasma Levels and Gene Expression Are Decreased in Obesity. Diabetes 2007, 56 (6), 1655-1661.

24. Kerr, S. C.; Carrington, S. D.; Oscarson, S.; Gallagher, M. E.; Solon, M.; Yuan, S.; Ahn, J. N.; Dougherty, R. H.; Finkbeiner, W. E.; Peters, M. C.; Fahy, J. V., Intelectin-1 Is a Prominent Protein Constituent of Pathologic Mucus Associated with Eosinophilic Airway Inflammation in Asthma. American Journal of Respiratory and Critical Care Medicine 2014, 189 (8), 1005-1007.

25. Wesener, D. A.; Wangkanont, K.; McBride, R.; Song, X.; Kraft, M. B.; Hodges, H. L.; Zarling, L. C.; Splain, R. A.; Smith, D. F.; Cummings, R. D.; Paulson, J. C.; Forest, K. T.; Kiessling, L. L., Recognition of microbial glycans by human intelectin-1. Nature Structural & Molecular Biology 2015, 22 (8), 603-610.

26. Wangkanont, K.; Wesener, D. A.; Vidani, J. A.; Kiessling, L. L.; Forest, K. T., Structures of Xenopus Embryonic Epidermal Lectin Reveal a Conserved Mechanism of Microbial Glycan Recognition. Journal of Biological Chemistry 2016, 291 (11), 5596-5610.

27. Alexander, C.; Rietschel, E. T., Invited review: Bacterial lipopolysaccharides and innate immunity. Journal of Endotoxin Research 2001, 7 (3), 167-202.

28. Kirschner, K. N.; Woods, R. J., Solvent interactions determine carbohydrate conformation. Proceedings of the National Academy of Sciences of the United States of America 2001, 98 (19), 10541-10545.

29. Kiessling, L. L.; Grim, J. C., Glycopolymer probes of signal transduction. Chemical Society Reviews 2013, 42 (10), 4476-4491.

30. Mortell, K. H.; Weatherman, R. V.; Kiessling, L. L., Recognition Specificity of Neoglycopolymers Prepared by Ring-Opening Metathesis Polymerization. Journal of the American Chemical Society 1996, 118 (9), 2297-2298.

31. Herget, S.; Toukach, P. V.; Ranzinger, R.; Hull, W. E.; Knirel, Y. A.; von der Lieth, C.-W., Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): Characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans. BMC Structural Biology 2008, 8, 35.

32. Mann, D. A.; Kanai, M.; Maly, D. J.; Kiessling, L. L., Probing Low Affinity and Multivalent Interactions with Surface Plasmon Resonance: Ligands for Concanavalin A. Journal of the American Chemical Society 1998, 120 (41), 10575-10582.

33. Fu, Y.; Baumann, M.; Kosma, P.; Brade, L.; Brade, H., A synthetic glycoconjugate representing the genus-specific epitope of chlamydial lipopolysaccharide exhibits the same specificity as its natural counterpart. Infection and Immunity 1992, 60 (4), 1314-1321.

34. Cremer, D.; Pople, J. A., A General Definition of Ring Puckering Coordinates. Journal of the American Chemical Society 1975, 97 (6), 1354-1358.

35. Arunmanee, W.; Pathania, M.; Solovyova, A. S.; Le Brun, A. P.; Ridley, H.; Baslé, A.; van den Berg, B.; Lakey, J. H., Gram-negative trimeric porins have specific LPS binding sites that are essential for porin biogenesis. Proceedings of the National Academy of Sciences of the United States of America 2016, 113 (34), E5034-E5043.

36. Horler, R. S. P.; Müller, A.; Williamson, D. C.; Potts, J. R.; Wilson, K. S.; Thomas, G. H., Furanose-specific Sugar Transport: Characterization of a bacterial galactofuranose binding protein. Journal of Biological Chemistry 2009, 284 (45), 31156-31163.

37. Wang, H.; Head, J.; Kosma, P.; Brade, H.; Müller-Loennies, S.; Sheikh, S.; McDonald, B.; Smith, K.; Cafarella, T.; Seaton, B.; Crouch, E., Recognition of Heptoses and the Inner Core of Bacterial Lipopolysaccharides by Surfactant Protein D. Biochemistry 2008, 47 (2), 710-720.

38. Martins, F. A.; Freitas, M. P., The Fluorine gauche Effect and a Comparison with Other Halogens in 2-Halofluoroethanes and 2-Haloethanols. European Journal of Organic Chemistry 2019, 6401-6406.

39. Barnett, C. B.; Naidoo, K. J., Stereoelectronic and Solvation Effects Determine Hydroxymethyl Conformational Preferences in Monosaccharides. The Journal of Physical Chemistry B 2008, 112 (48), 15450-15459.

40. Krarup, A.; Mitchell, D. A.; Sim, R. B., Recognition of acetylated oligosaccharides by human L-ficolin. Immunology Letters 2008, 118 (2), 152-156.

41. Krarup, A.; Thiel, S.; Hansen, A.; Fujita, T.; Jensenius, J. C., L-ficolin Is a Pattern Recognition Molecule Specific for Acetyl Groups. Journal of Biological Chemistry 2004, 279 (46), 47513-47519.

42. Winter, G.; Waterman, D. G.; Parkhurst, J. M.; Brewster, A. S.; Gildea, R. J.; Gerstel, M.; Fuentes-Montero, L.; Vollmar, M.; Michels-Clark, T.; Young, I. D.; Sauter, N. K.; Evans, G., DIALS: implementation and evaluation of a new integration package. Acta Crystallographica Section D: Biological Crystallography 2018, 74 (Pt 2), 85-97.

43. Adams, P. D.; Afonine, P. V.; Bunkóczi, G.; Chen, V. B.; Davis, I. W.; Echols, N.; Headd, J. J.; Hung, L.-W.; Kapral, G. J.; Grosse-Kunstleve, R. W.; McCoy, A. J.; Moriarty, N. W.; Oeffner, R.; Read, R. J.; Richardson, D. C.; Richardson, J. S.; Terwilliger, T. C.; Zwart, P. H., PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallographica Section D: Biological Crystallography 2010, 66 (Pt 2), 213-221.

44. Emsley, P.; Lohkamp, B.; Scott, W. G.; Cowtan, K., Features and development of Coot. Acta Crystallographica Section D: Biological Crystallography 2010, 66 (Pt 4), 486-501.

45. Rose, A. S.; Bradley, A. R.; Valasatava, Y.; Duarte, J. M.; Prlić, A.; Rose, P. W., NGL viewer: web-based molecular graphics for large complexes. Bioinformatics 2018, 34 (21), 3755-3758.

46. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.; Marenich, A. V.; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.; Ortiz, J. V.; Izmaylov, A. F.; Sonnenberg, J. L.; Williams-Young, D.; Ding, F.; Lipparini, F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; Ranasinghe, D.; Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Throssell, K.; Montgomery, J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M. J.; Heyd, J. J.; Brothers, E. N.; Kudin, K. N.; Staroverov, V. N.; Keith, T. A.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A. P.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J., Gaussian 16, Revision A.03. Gaussian, Inc.: Wallingford, CT, 2016.

47. Glendening, E. D.; Badenhoop, J. K.; Reed, A. E.; Carpenter, J. E.; Bohmann, J. A.; Morales, C. M.; Landis, C. R.; Weinhold, F. NBO 6.0, Theoretical Chemistry Institute, University of Wisconsin: Madison, WI, 2012.

2.10 Supplemental Information

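Table 2-S1. Conformational analysis of saccharides containing exocyclic diols in the PDB.

Pyranose

| PDB ID | Ligand ID | Residue ID | Trans to ring C–O (ring, side chain) | Proximal rotamer (y) | Distal rotamer (z) | Glycan interactions | Protein interactions | Interacting residues |
|---|---|---|---|---|---|---|---|---|
| 1q9w | KDO | B301 | H,O | tg | gt | no | no | |
| | KDO | B302 | H,O | tg | gt | 1-Oz | no | |
| | KDO | B303 | H,O | tg | gt | no | 1-Oy | Tyr92 mainchain |
| | KDO | C311 | H,O | tg | gt | no | no | |
| | KDO | C312 | H,O | tg | gt | no | no | |
| | KDO | C313 | H,O | tg | gt | 1-Oz | 1-Oy | Tyr92 mainchain |
| 2r1y | KDO | B212 | H,O | tg | gg | 1-Oz | no | |
| | KDR | B213 | H,O | tg | tg | no | 1-Oz | Asn52 sidechain |
| 2r23 | KO1 | B214 | H,O | tg | gg | no | no | |
| 2r2b | KDO | B215 | H,O | tg | gg | no | no | |
| 2r2h | KO2 | B212 | H,O | tg | gg | no | 1-Oy | Asn30 sidechain |
| 2ria | 289 | A356 | O,H | gt | gt | no | 2-Oy | Arg343 sidechain |
| | 289 | B356 | O,H | gt | gt | no | no | |
| | 289 | C356 | O,H | gt | gt | no | no | |
| 2rib | GMH | A356 | O,H | gg | gt | no | Ca | coordinates calcium |
| | GMH | B356 | O,H | gg | gt | no | Ca | |
| | GMH | C356 | O,H | gg | gt | no | Ca | |
| 2ric | GMH | A356 | O,H | gg | gt | no | Ca | coordinates calcium |
| | GMH | B356 | O,H | gg | gt | no | Ca | |
| | GMH | C356 | O,H | gg | gt | no | Ca | |
| 2rie | 293 | A356 | O,H | gg | gt | no | Ca | coordinates calcium |
| | 293 | B356 | O,H | gg | gt | no | Ca | |
| | 293 | C356 | O,H | gg | gt | no | Ca | |
| 3dur | KDO | A303 | H,O | tg | gg | no | 2-Oy, 1-Oz | Tyr92 mainchain, Lys30 sidechain (both) |
| | KDO | D303 | H,O | tg | gg | no | 2-Oy | Tyr92 mainchain, Lys30 sidechain |
| 3dus | KDO | A107 | H,O | tg | gg | no | 2-Oy, 1-Oz | Tyr92 mainchain, Lys30 sidechain (both) |
| | KDO | C108 | H,O | tg | gg | no | 2-Oy, 2-Oz | Tyr92 mainchain, Lys30 sidechain (both), PEG |
| 3duu | KDO | B114 | H,O | tg | gg | no | 2-Oy, 1-Oz | Tyr92 mainchain, Lys30 sidechain (both) |
| | KDO | D114 | H,O | tg | gg | no | 2-Oy | Tyr92 mainchain, Lys30 sidechain |
| 3dv6 | KDO | B114 | H,O | tg | gg | no | 2-Oy, 1-Oz | Tyr33, Arg52 (both) |
| 3etn | CMK | B500 | H,O | tg | gg | no | 1-Oy, 1-Oz | Asn96 both |
| | CMK | C500 | H,O | tg | gg | no | 1-Oy, 1-Oz | Asn96 both |
| | CMK | D500 | H,O | tg | gg | no | 1-Oy, 1-Oz | Asn96 both |
| 3hzm | KDO | B303 | H,O | tg | gt | no | no | |
| 3hzv | KDA | A301 | H,O | tg | gt | 1-Oy | no | |
| | KDO | A303 | H,O | tg | gt | no | 1-Oz | Arg27 sidechain |
| 3k2v | CMK | A1 | H,O | tg | gt | no | 1-Oy, 1-Oz | Asn242 mainchain, His307 sidechain |
| | CMK | B2 | H,O | tg | gt | no | 1-Oy | Asn242 mainchain |
| 3k8d | KDO | A1244 | H,O | tg | gg | no | 2-Oy | His181 and Gln211 sidechains |
| | KDO | B1244 | H,O | tg | gg | no | 2-Oy | His181 and Gln211 sidechains |
| | KDO | C1244 | H,O | tg | gg | no | 2-Oy | His181 and Gln211 sidechains |
| | KDO | D1244 | H,O | tg | gg | no | 2-Oy | His181 and Gln211 sidechains |
| 3okd | KDO | B214 | H,O | tg | gg | no | no | |
| 3okk | KDO | B225 | H,O | tg | gg | 1-Oz | 2-Oy | Tyr92 mainchain, Asn27 sidechain |
| 3okl | KDO | B216 | H,O | tg | gg | no | no | |
| 3sy0 | KDA | B301 | H,O | tg | gg | no | no | |
| | KDO | B303 | H,O | tg | gg | no | no | |
| 3t4y | KDO | B303 | H,O | tg | gg | no | no | |
| 3t65 | KDO | B303 | H,O | tg | gg | no | 1-Oy | Asn30 sidechain |
| 3t77 | KDO | B303 | H,O | tg | gt | no | 1-Oy | Tyr92 mainchain |
| 3v0w | GMH | H307 | O,H | gg | gt | 1-Oy | no | |
| | GM0 | H308 | O,H | gg | gt | no | no | |
| | KDO | H309 | H,O | tg | gt | 1-Oy | no | |
| 4e52 | GMH | B404 | O,H | gg | gt | no | Ca | coordinates calcium |
| | GMH | C404 | O,H | gg | gt | no | Ca | coordinates calcium |
| 4hgw | KDO | B302 | H,O | tg | gg | no | 1-Oy | Asn93 sidechain |
| 4m7j | KDA | H303 | H,O | tg | gt | no | no | |
| | KDO | H305 | H,O | tg | tg | 2-Oz | 1-Oy, 2-Oz | Thr99 sidechain, Asn28 and Tyr32 sidechain |
| 4o9k | CMK | A401 | H,O | tg | gg | no | 2-Oy | Leu240 mainchain, Asn321 sidechain |
| | CMK | B401 | H,O | tg | gt | no | 2-Oy, 1-Oz | Leu240 mainchain, Asn321 sidechain and Arg304 sidechain |
| 4pf6 | KDO | A401 | H,O | tg | gg | no | 1-Oy, 1-Oz | Arg42 sidechain |
| 5fvn | KDO | A417 | H,O | tg | gt | no | 1-Oz | Arg199 mainchain |
| | KDO | B409 | H,O | tg | gt | no | no | |
| | GMH | B410 | O,H | gg | gt | no | no | |
| | KDO | B411 | H,O | tg | gt | no | 1-Oz | Arg199 mainchain |
| | KDO | C415 | H,O | tg | gt | 1-Oy | no | |
| | GMH | C416 | H,O | tg | gt | no | no | |
| | KDO | C417 | H,C | gg | tg | no | Ca | coordinates calcium |
| | KDO | C427 | H,O | tg | gt | no | no | |
| | KDO | C428 | H,O | tg | gt | no | 1-Oz | Arg199 mainchain |
| | KDO | D425 | H,O | tg | gt | no | no | |
| | KDO | D426 | H,O | tg | gt | no | 1-Oz | Arg199 mainchain |
| | KDO | E407 | H,O | tg | gt | no | no | |
| | KDO | E408 | H,O | tg | gt | no | 1-Oz | Arg199 mainchain |
| | KDO | F408 | H,O | tg | gg | 1-Oz | no | |
| | KDO | F410 | H,C | gg | tg | no | Ca | coordinates calcium |
| | KDO | F420 | H,O | tg | gt | no | no | |
| | KDO | F421 | H,O | tg | gt | no | 1-Oz | Arg199 mainchain |
| 5oxr | GMH | B405 | O,H | gg | gt | no | Ca | coordinates calcium |
| | GMH | B406 | O,H | gg | tg | 1-Oz | no | |
| | GMH | C405 | O,H | gg | gt | no | Ca | coordinates calcium |
| 5oxs | GMH | A405 | O,H | gg | tg | 2-Oz | no | |
| | GMH | B404 | O,H | gg | gt | no | Ca | coordinates calcium |
| | GMH | B405 | O,H | gg | gt | no | no | |
| | GMH | C404 | O,H | gg | gt | no | Ca | coordinates calcium |
| 6c5h | KDO | H409 | H,O | tg | gt | no | no | |
| | KDO | H411 | H,O | tg | gg | 1-Oz | 1-Oz | Asn28 sidechain |
| 6c5k | KDO | A406 | H,O | tg | gg | no | no | |
| | KDO | A408 | H,O | tg | gg | 1-Oz | 1-Oz | Asn28 sidechain |
| | KDO | H406 | H,O | tg | gg | no | no | |
| | KDO | H408 | H,O | tg | gg | 1-Oz | 1-Oz | Asn28 sidechain |

Furanose (a)

| PDB ID | Ligand ID | Residue ID | Trans to ring C–O (ring, side chain) | Proximal rotamer (y) | Distal rotamer (z) | Glycan interactions | Protein interactions | Interacting residues |
|---|---|---|---|---|---|---|---|---|
| 2vk2 | GZL | A1298 | O,C | gt | gt | | 3-Oy, 1-Oz | |
| 2ydg | A5C | A1131 | H,H | gg | tg | | 1-Oy | Asn65 sidechain |
| 4xad | 3ZW | A207 | O,H | gg | tg | | 3-Oy | Ser50, Asn59, Arg61 sidechain |
| 4wmy | 3S6 | A404 | O,H | gg | gt | | Ca | Both coordinated |
| 5lsh | KTS | A205_1 | O,H | gg | tg/gt | | no | |
| | KTS | A205_2 | O,H | gg | gg | | 2-Oy | Arg98 and Asp102 sidechain |

(a) GZL, 3ZW, 3S6, and KTS are all β-D-galactofuranose-containing.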

Table 2-S2. Cartesian coordinates of saccharides optimized at the M06-2X/6-311+G(d,p); IEFPCM:water level of theory.

KO DD_gtgt C 0.85349500 -1.87232600 -0.15394700 C -1.25090400 0.99202400 0.67333800 O -0.24235900 0.08517200 0.72447000 C -0.64381300 -0.37806100 0.39774500 C -0.14406900 -1.89765100 -1.31608400 C 1.03284000 0.53549900 -1.03003400 O -2.14431300 -1.94466900 0.07189900 C 0.54444400 1.96160500 -0.78948200 C -1.45575000 -1.20247600 -0.93245100 C -0.15930200 2.05495000 0.56688600 O -0.39478000 -3.23370300 -1.71959200 H 0.13424400 -0.57492900 1.14480100 C -1.15197500 0.17934800 -0.36659900 H 1.39276100 2.64646400 -0.81411200 O 1.47707000 0.00528600 2.71277600 H 0.56792500 1.89111700 1.36626200 C 1.02000300 -0.44623300 0.38166200 O -0.05407900 -0.36717400 -0.90727700 O 3.08177400 -0.90822200 1.43785400 O 2.04510000 0.27719900 -0.11265400 C 1.94182800 -0.44697900 1.65263600 C 2.66970200 -0.99738800 -0.29030300 O 1.60126300 0.29029000 -0.66791600 H 3.08602600 -1.06591400 -1.30151800 C -2.40071200 0.89023900 0.15695500 H 1.91674300 -1.78447000 -0.16760900 O -3.40310400 0.91705400 -0.84607100 C 5.00574100 -1.41607100 0.45746000 C -2.06039500 2.28333100 0.65221800 H 5.74776900 -1.54021900 1.23720700 O -3.22475400 2.84370500 1.23884900 H 5.33800400 -1.53569500 -0.56905800 C 1.93734400 1.63268800 -0.32144500 C 3.73970600 -1.13798900 0.74421500 O 0.42938100 -2.74924200 0.87834600 H 3.41857300 -1.01750200 1.77578400 C 2.26576800 2.36921900 -1.58118300 C -1.61342200 -1.55653100 0.43038600 C 3.40234200 3.02832900 -1.77380500 H -1.98569400 -1.66506100 1.45135300 H 4.17758300 3.04591000 -1.01398600 C -2.80076300 -1.38127000 -0.50297400 H 1.82786400 -2.23070700 -0.48523100 H -3.34138300 -0.46844100 -0.23306800 H 0.29048900 -1.36722600 -2.16620900 H -2.44930600 -1.28877600 -1.53760800 H -2.46975500 -2.75677800 -0.33110800 O -0.89483100 -2.74460600 0.14260900 H -2.08838000 -1.09694300 -1.81781300 H -0.46896600 -2.61920000 -0.71429400 H -0.28860100 -3.79061800 -0.93767400 O -3.64181600 -2.51358000 -0.35071600 H -0.70802400 0.78158800 -1.17197300 H -4.40188300 -2.40602600 -0.92882700 H 
-2.81687000 0.30764000 0.98125700 H 1.38349200 0.41213500 -2.05997800 H -3.14591300 1.55067200 -1.52536900 H -2.02128900 1.21196500 -0.07898100 H -1.72053700 2.89524800 -0.19589300 O -0.33446900 2.36506800 -1.82772900 H -2.99852300 3.70354200 1.60360000 H -0.91539700 1.62745400 -2.05197000 H 2.78757300 1.63830200 0.36949200 O -0.71058000 3.34244600 0.76933200 H -0.45980400 -2.47214400 1.14469600 H -1.14634300 3.59922100 -0.05410900 H 1.49865300 2.35355500 -2.35205600 O -1.82655000 0.97052100 1.96207600

KO, continued DD_gtgt, continued H 3.59225300 3.57485200 -2.69004100 H -2.06283600 1.87827600 2.18718300 H -1.24619100 2.21237600 1.37811700 H 1.07948500 2.10043100 0.17754300

DD_gggt LD_gggt C 1.12682100 0.95322000 -0.57972400 C 1.43296500 -0.90169400 0.68699200 C 0.66899800 -0.38602800 -0.00557100 C 0.71924500 0.39063800 0.30483600 C -1.29525900 0.61193900 0.94914600 C -0.87532300 -0.78204200 -1.03696800 C -0.91562500 2.00477900 0.45926500 C -0.24021100 -2.13365900 -0.72515800 C -0.06694500 1.88099000 -0.80306600 C 0.45907900 -2.07551600 0.63288800 H 0.07632700 -0.91821000 -0.75942400 H -0.04509200 0.60949500 1.06021000 H -1.81823200 2.57888000 0.24692900 H -1.01072600 -2.90537100 -0.71376900 H -0.67928000 1.46429000 -1.60662100 H -0.28937800 -1.94338600 1.41840700 O -0.12331800 -0.15107200 1.16578500 O 0.10965700 0.23472100 -0.98096500 O -2.12797500 0.03955600 -0.00671600 O -1.89693400 -0.57193400 -0.11662700 C -2.66524300 -1.22648600 0.38694100 C -2.68202400 0.59270700 -0.38952400 H -3.27462200 -1.10320700 1.28913300 H -3.22712800 0.45639800 -1.33003200 H -1.83911500 -1.90981200 0.61445800 H -2.01519700 1.45619700 -0.49464800 C -4.75334300 -2.12200900 -0.62758100 C -4.93809100 0.89453400 0.61696500 H -5.30543800 -2.53039800 -1.46569300 H -5.58422800 1.07342100 1.46815500 H -5.27960500 -2.02993300 0.31742600 H -5.41086100 0.80363800 -0.35599000 C -3.48255800 -1.75679000 -0.74685700 C -3.62119400 0.79886800 0.75500600 H -2.96597900 -1.85001800 -1.69887800 H -3.15855900 0.88741100 1.73489500 C 1.79248500 -1.29623000 0.48939900 C 1.66956500 1.57627700 0.18063900 H 1.31031400 -2.20904300 0.86318000 C 0.89954500 2.86679700 -0.04660600 C 2.77803100 -1.70187100 -0.58567600 H 0.41779700 2.83654000 -1.03069500 H 2.23687700 -1.95504200 -1.50275200 H 0.12085800 2.95851800 0.71970200 H 3.44916900 -0.86642100 -0.79872700 O 1.81636000 3.94565100 0.03684200 O 2.51490100 -0.67268900 1.53841900 H 1.35698500 4.75683200 -0.19633600 H 1.89320100 -0.49552300 2.25312000 H -1.24809400 -0.75460600 -2.06607600 O 3.49900100 -2.82546100 -0.09535600 H 2.25102900 -1.07303700 -0.02613200 H 4.20664200 -3.02482200 -0.71403100 O 0.68141400 -2.49463800 -1.74165100 H 
-1.78161700 0.66006200 1.92903300 H 1.19283100 -1.71524300 -1.99211700 H 1.82301900 1.41845800 0.13225300 O 1.13710700 -3.28666400 0.91341400 O -0.21084200 2.71221200 1.46666400 H 1.61459900 -3.53862900 0.11210700 H 0.41637000 2.11256900 1.88946900 O 1.95580900 -0.74143400 1.99075200

DD_gggt, continued LD_gggt, continued O 0.38397800 3.14766000 -1.24851600 H 2.28721300 -1.60140100 2.27452400 H 0.70680800 3.62503200 -0.47297300 H 2.23263500 1.66106800 1.11308100 O 1.78241400 0.71147000 -1.81035900 O 2.62556200 1.34811300 -0.84021400 H 1.95572100 1.57064100 -2.21293100 H 2.14217500 1.20016800 -1.66212000

LD_gtgt C -1.18963400 1.13901600 0.44739600 C -0.75794000 -0.24775600 -0.02717100 C 1.28557400 0.61883600 -0.95100800 C 0.94929200 2.05042400 -0.55057900 C 0.03720500 2.02710400 0.67227500 H -0.19884800 -0.75119500 0.77185000 H 1.86516800 2.59624600 -0.32137400 H 0.59215800 1.62817500 1.52510700 O 0.08221100 -0.09222800 -1.18012500 O 2.04829300 0.05850100 0.06662000 C 2.52423200 -1.25741200 -0.23168500 H 3.14916400 -1.22818600 -1.13117100 H 1.66652100 -1.91300400 -0.42269000 C 4.54951800 -2.19022700 0.87189600 H 5.06993800 -2.56731800 1.74422100 H 5.08899400 -2.19483400 -0.07009600 C 3.29992900 -1.74859700 0.94799100 H 2.76963000 -1.74561900 1.89698200 C -1.87898400 -1.16513100 -0.52237300 C -2.83048400 -1.60355700 0.57038300 H -2.27519200 -2.19307400 1.30979700 H -3.25209500 -0.72602400 1.06138400 O -3.84994500 -2.38694100 -0.03681400 H -4.48753300 -2.62665600 0.64082900 H 1.81147800 0.59195600 -1.91106300 H -1.82785200 1.59224900 -0.32610500 O 0.32430800 2.73289600 -1.62508000 H -0.30321800 2.13539800 -2.05074800 O -0.37985700 3.33175600 1.03078900 H -0.64208300 3.78238000 0.21718200

LD_gtgt, continued O -1.90839300 1.02150800 1.65955700 H -2.00469200 1.91252100 2.01747500 H -2.44608900 -0.62759800 -1.29526000 O -1.29932600 -2.33501600 -1.07681800 H -0.64602000 -2.04543500 -1.72409500


Chapter 3

Human Intelectin-1 microbial binding specificity in synthetic communities

Contributions: Protein expression, purification, and community binding analysis performed by Christine R. Isabella and Darryl A. Wesener. Strain library binding performed by Darryl A. Wesener. Bacterial strains were cultured and verified via 16S rRNA sequencing by Robert Kerby. Research designed by Darryl A. Wesener, Christine R. Isabella, and Laura L. Kiessling.

3.1 Abstract

Microbial surfaces are covered in a coat of glycans, whose identity is strain-specific. These glycans are therefore poised to serve as identification codes, which can be interpreted by lectins that function as readers. Lectins perform important roles in innate immunity, such as targeting bacteria for phagocytosis or agglutination and clearance. However, the selectivity of lectins for detecting subsets of microbes within a larger community is not known. To assess lectin selectivity within microbial communities, we focused on human intelectin-1 (hItln-1), a lectin expressed predominantly in lung and intestinal mucosal epithelial tissues. Human Itln-1 binds microbial but not human glycans through recognition of exocyclic vicinal diol epitopes present in several microbe-specific carbohydrates, including β-galactofuranose, D-glycerol 1-phosphate, D-glycero-D-talo-oct-2-ulosonic acid, and 3-deoxy-D-manno-oct-2-ulosonic acid. However, hItln-1 does not bind the most prevalent microbial glycan, L-glycero-D-manno-heptose, suggesting that this lectin has evolved a more selective glycan binding profile to target specific microbes. To understand hItln-1 microbe binding, we profiled hItln-1 recognition of commensal bacteria resident to the human gastrointestinal (GI) tract. We screened 45 bacterial strains from diverse taxa and found that hItln-1 bound 12 strains from the four phyla tested, including Gram-positive and Gram-negative species. In general, the lectin had a higher affinity for the Gram-positive strains, and this specificity was magnified in communities. Specifically, when exposed to a community, hItln-1 exhibited high selectivity for distinct strains. These data indicate that community composition and lectin expression may tune native lectin–carbohydrate interactions at mucosal surfaces. Our data also indicate that hItln-1 could regulate innate immunity in the gut.

3.2 Introduction

The human intestine is home to trillions of bacteria and other microorganisms, collectively termed the microbiome.1, 2 To protect the highly absorptive epithelial surface from microbial residents, the intestines harbor a specialized immune system that maintains spatial separation and influences the gut microbiome composition. The intestinal immune system includes multiple components: specialized immune cells and secretory epithelial cells that produce mucus, innate immune factors that include antimicrobial peptides (AMPs) and microbe-targeting enzymes, and soluble lectins that can regulate the microbiome.3-5 AMPs are typically cationic peptides or small proteins (e.g., the defensins) that destabilize the anionic microbial membrane, and lysozyme degrades peptidoglycan of the Gram-positive cell wall.6 While AMPs and lysozyme defend against bacteria in a less specific manner, lectins can achieve serotype-specific recognition of bacteria through their binding to cell-surface glycans.7, 8 Known functions of lectins that recognize microbial cells include direct killing of bacteria, complement activation, and agglutination, though for many lectins, their microbial binding specificity is not well defined.7, 9

Human intelectin-1 (hItln-1) is an X-type lectin expressed and secreted by goblet cells in the intestine.10-12 While its microbial binding targets and biological function are not known, we have identified and characterized carbohydrate ligands of hItln-1. Human Itln-1 recognizes microbial glycans by engaging an exocyclic vicinal diol via coordination by a calcium ion in the carbohydrate binding pocket.13 We reported that hItln-1 has the greatest affinity for D-glycero-D-talo-oct-2-ulosonic acid (KO), followed by 3-deoxy-D-manno-oct-2-ulosonic acid (KDO), β-D-galactofuranose (β-Galf), and finally D-glycerol 1-phosphate. Despite its propensity for binding the 1,2-terminal diol, hItln-1 binds poorly to heptose sugars.14 L,D-Heptose is the most abundant bacterial carbohydrate in the Bacterial Carbohydrate Structure Data Base.15 The lack of hItln-1 binding suggests that hItln-1 has evolved specificity for microbe binding by excluding recognition of this glycan. To date, however, the microbe binding repertoire of hItln-1 remains unexplored.

The monosaccharide ligands of hItln-1 are present in the diverse polysaccharide structures found on the surface of microbes.16 KO and KDO are largely confined to the core lipopolysaccharide (LPS) of Gram-negative bacteria.17, 18 However, the core LPS structure is buried in O-antigen and capsular polysaccharide (CPS) and may be inaccessible for lectin binding. It should be noted that, while less common, KDO can also be found in the CPS of Gram-negative species.19, 20 Glycerol 1-phosphate and β-Galf, on the other hand, are strongly enriched in and Bacilli, but still present in Gram-negative species.15, 21 These monosaccharides are components of CPS in Gram-positive and Gram-negative species and of teichoic acids (TAs) in Gram-positive species.22 β-Galf is also found in the LPS of Gram-negative species and in the galactan of the mycobacterial cell wall.23 Our group demonstrated previously that hItln-1 distinguishes Streptococcus pneumoniae serotypes whose CPS contains a hItln-1 carbohydrate ligand from those that lack one, indicating that the lectin monosaccharide specificity translates to microbial recognition.13 While the occurrence of specific components in the cell envelope can give clues about hItln-1 recognition, which organisms hItln-1 will bind cannot be determined from this information alone; the presence, accessibility, and density of ligands on a particular species are not yet apparent from the biosynthetic gene clusters alone.

We set out to catalog hItln-1 binding to bacterial isolates from the human gut microbiota. While hItln-1 bound species from the four major phyla found in the microbiome, we observed differences in fluorescence intensity for binding different species. We hypothesized that these differences may arise from a difference in affinity for binding the bacterial cell surfaces. We then developed a flow cytometry-based assay that allowed us to observe competitive binding to microbes in a community. Our findings indicate that hItln-1 binding to a bacterial species depends on which other species are present. These results suggest that in vitro studies of lectins binding to single cultured bacterial isolates may not recapitulate a lectin’s ability to target certain species in physiological settings; therefore, the selectivity we observe has important implications for the role of lectins in vivo.

3.3 Results

3.3.1 Recognition of microbial strains by hItln-1

Human intelectin-1 is expressed at the mucosal surface of the small intestine. We therefore explored hItln-1 binding to bacterial reference strains that are similar to bacteria found within the intestinal microbiome. To determine the range of species identified by hItln-1 and whether certain taxa would be preferentially recognized, we developed a panel of bacterial strains from diverse taxa with available genomic sequencing data. Binding of StrepII-tagged hItln-1 to freshly grown, fixed bacterial cells was assayed and quantified using flow cytometry.

The calcium ion-dependence of hItln-1 carbohydrate-binding was evaluated by monitoring binding in the presence of the calcium-chelator EDTA (Figure 3-1a). In total, 45 strains from four phyla were assayed. While hItln-1 displayed binding to 12 species including species from each of the four phyla (Table 3-1), the results suggest that binding is favored to Actinobacteria and Firmicutes (Figure 3-1b). This preference is supported by a recent examination of the Bacterial Carbohydrate Structure Data Base (BCSDB) that revealed increased utilization of hItln-1 ligands by Actinobacteria and Firmicute bacteria.15 For example, Actinobacteria cell surface glycoconjugates were shown to be enriched in the hItln-1 ligand β-Galf.


Figure 3-1. Binding of hItln-1 to fixed bacterial strains. (a) Representative flow cytometry histogram of hItln-1 binding to fixed bacterial cells. hItln-1 was visualized using a fluorophore-labeled anti-Strep-tag II antibody. For the EDTA-treated sample, cells were stained in the presence of 1 mM EDTA. (b) Summary of 45 assayed strains, sorted by taxa. All strains have been confirmed by 16S rRNA sequencing. More information on these bacteria can be found in Table 3-1.

Table 3-1. Summary of hItln-1 Binding to Microbes

hItln-1-Binding Cells                           Non-hItln-1-Binding Cells
Genus species                   Strain (a)      Genus species                     Strain (a)

Actinobacteria
Bifidobacterium angulatum       ATCC 27535      Bifidobacterium adolescentis      ATCC 15703
Bifidobacterium bifidum         ATCC 29521      Bifidobacterium pseudocatenulatum ATCC 27919
Bifidobacterium dentium         ATCC 27678      Collinsella aerofaciens           ATCC 25986
                                                Collinsella intestinalis          DSMZ 13280

Bacteroidetes
Bacteroides plebeius            DSMZ 17135      Alistipes indistinctus            DSMZ 22520
                                                Bacteroides caccae                ATCC 43185
                                                Bacteroides finegoldii            DSMZ 17565
                                                Bacteroides intestinalis          DSMZ 17393
                                                Bacteroides ovatus                ATCC 8483
                                                Bacteroides thetaiotaomicron      7330
                                                Bacteroides thetaiotaomicron      VPI-5462 (b)
                                                Bacteroides uniformis             ATCC 8492
                                                Bacteroides vulgatus              ATCC 8482
                                                Bacteroides xylanisolvens         DSMZ 18836
                                                Parabacteroides merdae            ATCC 43184

Firmicutes
Anaerococcus hydrogenalis       ATCC 49630      Blautia hansenii                  ATCC 27752
Dorea longicatena               DSMZ 13814      asparagiforme                     DSMZ 15981
Eubacterium biforme             ATCC 27806      Clostridium bolteae               DSMZ 15670 (c)
Lactobacillus reuteri           DSMZ 20016      Clostridium hylemonae             DSMZ 15053
Lactobacillus ruminis           ATCC 27780      Clostridium symbiosum             ATCC 14940
Roseburia intestinalis          DSMZ 14610      Coprococcus comes                 ATCC 27758
Ruminococcus torques            ATCC 27756      filiformis                        ATCC 51649
                                                Mitsuokella multacida             ATCC 27723
                                                Ruminococcus gnavus               ATCC 29149
                                                Streptococcus infantarius         ATCC BAA-102

Proteobacteria
Escherichia fergusonii          ATCC 35469      Citrobacter youngae               ATCC 29200
                                                Edwardsiella tarda                ATCC 23685
                                                Enterobacter cancerogenus         ATCC 35316
                                                Escherichia coli                  MS.200.1
                                                Escherichia coli K12              ATCC 47076
                                                Proteus penneri                   ATCC 35198
                                                Providencia rettgeri              DSMZ 1131
                                                Providencia stuartii              ATCC 25827

(a) Strain information is provided as the ATCC strain designation, or the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures identification number.
(b) ATCC 29148
(c) ATCC BAA-613

3.3.2 hItln-1 binding affinity for Gram-positive and Gram-negative bacteria

The individual strain binding analysis revealed a higher degree of hItln-1 binding to L. reuteri and B. angulatum (Gram-positive) than to E. fergusonii and B. plebeius (Gram-negative) (Figure 3-1a). To determine the minimal concentration of hItln-1 required for complete binding of the Gram-positive and Gram-negative species, we measured binding by flow cytometry at varying hItln-1 concentrations. The two Gram-positive or two Gram-negative binding species were mixed in equal parts, incubated with StrepII-hItln-1 at concentrations ranging from 0.01 μg/1E6 cells to 0.5 μg/1E6 cells, and analyzed by flow cytometry (Figure 3-2a, b). The Gram-positive mixture showed maximal binding at 0.05 μg hItln-1/1E6 cells, though even at 0.01 μg/1E6 cells the entire population showed an increase in fluorescence intensity compared to the unstained bacteria. For the Gram-negative mixture, maximal binding was observed at 0.25 μg hItln-1/1E6 cells. To assess whether binding is limited by the concentration of hItln-1 in the staining solution, we performed a western blot analysis of the staining solution post-incubation. At concentrations above 0.01 μg hItln-1/1E6 cells, there is detectable excess hItln-1 in the staining solution for both Gram-positive and Gram-negative bacteria (Figure 3-2c), indicating that a limiting amount of hItln-1 is not the cause of less than maximal binding of Gram-negative bacteria. Rather, this suggests a lower apparent affinity for the binding epitopes displayed by these bacteria. The slight decrease in binding upon washing the Gram-negative samples (versus diluting only, Figure 3-2b) further indicates a lower binding affinity. To determine the apparent binding affinity of hItln-1 for the microbial cell surface of L. reuteri and B. angulatum, B. plebeius, and E. fergusonii, the percent of maximal fluorescence intensity was plotted against hItln-1 concentration and fit to a single-site binding curve (Figure 3-2d). The calculated apparent Kd values for trimeric hItln-1 binding were 15 nM for L. reuteri and B. angulatum, 38 nM for B. plebeius, and 73 nM for E. fergusonii. While these values are only estimates of binding affinity because they were measured from mixtures of combined Gram-positive or Gram-negative species, the trend has important implications for understanding the interactions between lectin and microbes in a community.
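The single-site fit described above can be illustrated with a minimal sketch. This is not the fitting software used in this work; it assumes the normalized signal follows the isotherm fraction = [L] / (Kd + [L]) and recovers Kd by a simple log-spaced grid search. The function names and the titration points are hypothetical and are not data from this chapter.

```python
import math

def fraction_bound(conc_nm, kd_nm):
    """Single-site isotherm: fraction of maximal signal at a lectin concentration (nM)."""
    return conc_nm / (kd_nm + conc_nm)

def fit_apparent_kd(concs_nm, fractions, kd_min=0.1, kd_max=1000.0, steps=4000):
    """Return the Kd on a log-spaced grid that minimizes the sum of squared residuals."""
    best_kd, best_sse = None, math.inf
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        sse = sum((fraction_bound(c, kd) - f) ** 2
                  for c, f in zip(concs_nm, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

# Hypothetical, noiseless titration generated with Kd = 15 nM (illustration only).
concs = [1, 3, 10, 30, 100, 300]
signal = [fraction_bound(c, 15.0) for c in concs]
kd_fit = fit_apparent_kd(concs, signal)
print(f"fitted apparent Kd ~ {kd_fit:.1f} nM")
```

With real data, each point would be the percent of maximal mean fluorescence intensity divided by 100, and the lectin concentration would need to be expressed in molar units rather than μg per 1E6 cells.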


Figure 3-2. hItln-1 binding affinity for microbial cell surfaces. (a) Histogram representations of hItln-1 binding to Gram-positive (L. reuteri and B. angulatum) and Gram-negative (B. plebeius and E. fergusonii) species at increasing concentrations of hItln-1. (b) Percent of the Gram-positive or Gram-negative populations bound by hItln-1 (measured as percent of population with fluorescence intensity shifted above negative control) at increasing concentrations of hItln-1. (c) Western blot analysis of the excess hItln-1 in the staining solution after microbial cells were incubated with hItln-1. (d) Binding curve of hItln-1 binding to L. reuteri and B. angulatum, B. plebeius, and E. fergusonii, fit to a single-site binding curve to calculate apparent Kd. Percent of maximal mean fluorescence intensity was measured for each population shown in (a). Points shown as mean +/- S.D., n=3.

3.3.3 hItln-1 binding to synthetic microbial communities

The results above indicated that human Itln-1 displays a two- to five-fold greater affinity for the Gram-positive species than for the Gram-negative species. This suggests that in a mixed microbial population, such as that encountered in the human gut microbiome, hItln-1 will preferentially bind to species for which it has a greater affinity. We hypothesized that in a simple synthetic community, hItln-1 would preferentially bind the higher affinity species. To test this hypothesis, we built a synthetic four-member community of two binding species, L. reuteri and E. fergusonii, and two non-binding species, B. ovatus and P. penneri. Equal parts of the two binding strains and of the two non-binding strains were mixed and combined to give communities containing 0%, 10%, 50%, and 100% hItln-1 binding bacteria, as described in Table 3-2.

Table 3-2. Example synthetic microbial community mixtures assayed for hItln-1 binding

                                  % of mixture at predicted bound of
Strain           Type             0%      10%     50%     100%
L. reuteri       Binding          0       5       25      50
E. fergusonii    Binding          0       5       25      50
B. ovatus        Non-binding      50      45      25      0
P. penneri       Non-binding      50      45      25      0
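The arithmetic behind these mixtures can be written as a short sketch: the binding strains split the predicted bound percentage equally, and the non-binding strains split the remainder equally. The function name and signature are illustrative only.

```python
def community_mix(predicted_bound_pct, binders, non_binders):
    """Percent of each strain so that the binders together equal predicted_bound_pct."""
    if not 0 <= predicted_bound_pct <= 100:
        raise ValueError("predicted_bound_pct must be between 0 and 100")
    mix = {}
    for strain in binders:
        # Binding strains share the predicted bound fraction equally.
        mix[strain] = predicted_bound_pct / len(binders)
    for strain in non_binders:
        # Non-binding strains share whatever is left equally.
        mix[strain] = (100 - predicted_bound_pct) / len(non_binders)
    return mix

binders = ["L. reuteri", "E. fergusonii"]
non_binders = ["B. ovatus", "P. penneri"]
for pct in (0, 10, 50, 100):
    print(pct, community_mix(pct, binders, non_binders))
```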

Prior to assaying the communities, each strain was analyzed individually (Figure 3-3a). Each strain showed differences in forward scatter, side scatter, and the amount of hItln-1 bound to the surface (Figure 3-3a). The distinct scattering properties of each strain facilitate their specific identification within a mixed population. For example, L. reuteri shows the greatest forward scatter (FSC-A) and the strongest signal for hItln-1 binding in FL4-H, and B. ovatus has the smallest forward scatter (Figure 3-3a). In the strain analysis, we treated each strain as binary, either 0% or 100% bound.

We assembled the four communities of L. reuteri, E. fergusonii, P. penneri, and B. ovatus with increasing amounts of predicted hItln-1-bound bacteria (Table 3-2). Each community was composed of ten million bacteria and stained at 0.1 μg hItln-1 per 1E6 cells. The microbial mixtures were diluted four-fold and analyzed by flow cytometry, and the percentage of bound cells was measured in a gate determined by comparison to the 0 μg hItln-1 per 1E6 cells community. Analysis of these mixtures revealed that the number of bacteria bound by hItln-1 was consistently less than what was predicted (Figure 3-3b, c). Based on the properties of the individually analyzed strains, we were able to determine that L. reuteri is completely bound, while E. fergusonii is almost entirely excluded from the bound population in the community. These data indicate that hItln-1 exhibits species-specific binding within a microbial community. Specifically, hItln-1 binds Gram-positive L. reuteri over Gram-negative E. fergusonii, suggesting that hItln-1 exploits its ligand specificity and avidity to selectively bind species within a microbial community.
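The gap between predicted and observed binding can be made concrete with a small sketch. It assumes an idealized limit in which L. reuteri is fully bound and E. fergusonii is fully excluded (the text above reports "almost entirely" excluded), so the numbers are illustrative rather than measured.

```python
def observed_bound_pct(mix_pct, bound_fraction):
    """Percent of the community scored as bound, given the fraction of each strain bound."""
    return sum(pct * bound_fraction.get(strain, 0.0)
               for strain, pct in mix_pct.items())

# The "predicted bound 50%" mixture from Table 3-2.
mix_50 = {"L. reuteri": 25, "E. fergusonii": 25, "B. ovatus": 25, "P. penneri": 25}

# Prediction: both binders treated as 100% bound.
predicted = observed_bound_pct(mix_50, {"L. reuteri": 1.0, "E. fergusonii": 1.0})

# Idealized competition: the lower-affinity binder is excluded (assumption).
with_exclusion = observed_bound_pct(mix_50, {"L. reuteri": 1.0, "E. fergusonii": 0.0})

print(predicted, with_exclusion)
```

Under this assumption the observed percentage is half of the predicted one at every mixture ratio, which corresponds to a line with a slope of 0.5 rather than 1.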


Figure 3-3. Binding of hItln-1 to E. fergusonii is inhibited in a mixed community. (a) Four strains of bacteria were individually assayed for hItln-1 binding to establish the attributes useful in distinguishing them within mixtures. Strains were analyzed individually by flow cytometry and data are overlaid. (b) hItln-1 binding to synthetic mixed communities of increasing percentage of predicted hItln-1 binding. The gate for measuring the percent of hItln-1 bound cells was set on an unstained community (data not shown) and shows the percentage of hItln-1-positive cells within each graph. (c) The data for hItln-1 binding to each predicted percent bound mixture represented in a histogram. (d) Summary of the results shown in (b). The red dashed line indicates the predicted trendline with a slope of 1 and y-intercept of 0. Black squares represent observed percent hItln-1 binding. Cells were stained with 0.1 μg of hItln-1 per 1E6 cells.
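As a rough illustration of this kind of gating, the sketch below sets a fluorescence threshold from an unstained control and reports the percentage of events above it. The 1% false-positive allowance and the function names are assumptions made for illustration; the gate used in this work was set within the flow cytometry analysis itself.

```python
def gate_threshold(control_intensities, max_false_positive=0.01):
    """Intensity above which at most max_false_positive of control events fall."""
    ordered = sorted(control_intensities)
    allowed = int(len(ordered) * max_false_positive)  # control events allowed above the gate
    index = max(0, len(ordered) - allowed - 1)
    return ordered[index]

def percent_positive(sample_intensities, threshold):
    """Percent of events with intensity strictly above the gate threshold."""
    positive = sum(1 for value in sample_intensities if value > threshold)
    return 100.0 * positive / len(sample_intensities)

# Hypothetical intensities (arbitrary units), not data from this chapter.
unstained = list(range(100))
stained = [50, 99, 150, 200]
threshold = gate_threshold(unstained)
print(threshold, percent_positive(stained, threshold))
```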

3.3.4 hItln-1 competitive binding in mixed microbial communities

In a simple community containing a Gram-positive (L. reuteri) and Gram-negative (E. fergusonii) binding species, hItln-1 showed a strong propensity for binding the Gram-positive species, while excluding the Gram-negative. To determine the hItln-1 binding profile in mixed microbial communities, we tested microbe binding by hItln-1 within microbial communities containing all Gram-positive binders, Gram-positive and Gram-negative binders, or all Gram-negative binders. We hypothesized that hItln-1 would preferentially bind the higher affinity species in a community, and that increasing the amount of hItln-1 would result in increased binding to the lower affinity strains. To test this, additional four-member communities were prepared. We assembled six communities built of two binding and two non-binding strains using the strains listed in Table 3-3. Community 1 contained two Gram-positive binders, LR and BA. Community 2 contained two Gram-negative binders, EF and BP. Communities 3, 4, 5, and 6 contained a Gram-positive and Gram-negative binding strain (LR+EF, LR+BP, BA+EF, and BA+BP, respectively). All six communities contained BO and PP as the non-binding strains. For each community, four mixtures were assembled to give a predicted bound percentage of 0, 10, 50, and 100% as described in Table 3-2. Each mixture of each community was then incubated with 0, 0.1, 0.25, and 0.5 μg hItln-1 per 1E6 cells and assayed by flow cytometry.

Table 3-3. hItln-1 binding and non-binding strains used in synthetic communities

hItln-1 binding strains           Gram-pos/neg    hItln-1 non-binding strains    Gram-pos/neg
Lactobacillus reuteri (LR)        Positive        Bacteroides ovatus (BO)        Negative
Bifidobacterium angulatum (BA)    Positive        Proteus penneri (PP)           Negative
Bacteroides plebeius (BP)         Negative
Escherichia fergusonii (EF)       Negative
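The experimental design above amounts to every pair of the four binding strains, crossed with four mixture ratios and four lectin doses. A minimal sketch of that enumeration follows; the constant and function names are illustrative, and the order of the generated pairs does not follow the community numbering used in the text.

```python
from itertools import combinations, product

# Strain abbreviations and Gram status as listed in Table 3-3.
BINDERS = {"LR": "positive", "BA": "positive", "BP": "negative", "EF": "negative"}
NON_BINDERS = ("BO", "PP")  # present in every community
PREDICTED_BOUND_PCT = (0, 10, 50, 100)
LECTIN_UG_PER_1E6_CELLS = (0, 0.1, 0.25, 0.5)

def binder_pairs():
    """All two-strain combinations of the four hItln-1 binding strains."""
    return list(combinations(BINDERS, 2))

def is_mixed_gram(pair):
    """True when a pair contains one Gram-positive and one Gram-negative binder."""
    return BINDERS[pair[0]] != BINDERS[pair[1]]

def enumerate_conditions():
    """Every (binder pair, predicted bound %, lectin dose) combination."""
    return list(product(binder_pairs(), PREDICTED_BOUND_PCT, LECTIN_UG_PER_1E6_CELLS))

print(len(binder_pairs()), "communities,", len(enumerate_conditions()), "conditions")
print([pair for pair in binder_pairs() if is_mixed_gram(pair)])
```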

Analysis of the mixed microbial communities by flow cytometry revealed that when the binding strains in the community were both Gram-positive (LR and BA), the observed percent of the population bound by hItln-1 was equal to the predicted percent at all concentrations of hItln-1 (Figure 3-4a), indicating that both strains are bound by hItln-1. When the two binding strains were Gram-negative (EF and BP), we observed binding of both strains, as indicated by the shift in the histograms in Figure 3-4b. However, this shift in fluorescence is not large enough to distinguish the population from the negative control (depicted as the gray histogram in Figure 3-4b). The quantitation of binding in community 2 by our gating strategy showed 52% to 87% observed binding depending on the concentration of hItln-1 in the mixture (Figure 3-4c). In contrast, when the two binding strains in the community were one Gram-positive and one Gram-negative strain (communities 3-6), the observed percentage of hItln-1-positive cells was always less than predicted. Analysis of the scattering properties by flow cytometry (data not shown) revealed that the Gram-positive strain was fully bound by hItln-1, while the Gram-negative strain was excluded from hItln-1 binding in every case. These data are summarized in Figure 3-4c-f. Taken together, these data indicate that when there is no Gram-positive binding strain in the community, the Gram-negative strains are, indeed, bound by hItln-1, but when a Gram-positive binding strain is present, it is bound preferentially by hItln-1.

These data support the hypothesis that the observed differences in apparent affinity for Gram-positive and Gram-negative bacteria translate to competitive binding in a community context. We next asked whether additional washing in our staining procedure would disrupt the low-affinity interactions between hItln-1 and Gram-negative strains. We further hypothesized that increasing the concentration of hItln-1 would increase binding to the lower affinity strains in a mixture, as once binding to the higher affinity strain is saturated, excess hItln-1 should bind the lower affinity strain. However, we only observed these trends in community 2 and to a smaller extent in communities 4 and 6 (Figure 3-4b, d, f). This result differs somewhat from our observations in an earlier experiment on four communities: LR + EF, LR + BP, EF + BP, and LR + BA (Figure 3-S1). In the earlier experiment, for the community composed entirely of Gram-positive hItln-1 binding strains (LR + BA), additional washing did not decrease hItln-1 binding (Figure 3-S1d). In the remaining communities, the number of hItln-1 binding cells was greater when a higher concentration of lectin was used, and a smaller percentage of cells were bound in the washed samples (Figure 3-S1a-c). This trend was observed in all communities containing Gram-negative strains. Thus, hItln-1 binding to Gram-positive strains was unaffected by washing, while washing decreased hItln-1 binding to Gram-negative strains (Figure 3-S1), further confirming a lower affinity for the Gram-negative strains. These data indicate that in a community setting where hItln-1 is at low concentration, binding is selective for the higher affinity strain. The differences observed between the earlier experiment (Figure 3-S1) and the current experiment (Figure 3-4) could be due to differences in quantification of the individual bacterial strains, and these experiments should be repeated to confirm the results.
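The reasoning that a limited pool of lectin partitions between strains of different affinity can be sketched with a simple equilibrium model. This is an illustration under stated assumptions, not a model fit to the data in this chapter: each strain is treated as a pool of independent, single-site binding sites, the site concentrations are hypothetical, and avidity effects are ignored. The free lectin concentration is found by bisection on the mass balance L_total = L_free + Σ S_i · L_free / (Kd_i + L_free).

```python
def free_lectin(total_lectin, sites, kds, iterations=200):
    """Solve the mass balance for free lectin by bisection (all values in nM)."""
    lo, hi = 0.0, total_lectin
    for _ in range(iterations):
        mid = (lo + hi) / 2
        bound = sum(s * mid / (kd + mid) for s, kd in zip(sites, kds))
        if mid + bound > total_lectin:
            hi = mid  # too much lectin accounted for: free concentration is lower
        else:
            lo = mid
    return (lo + hi) / 2

def occupancies(total_lectin, sites, kds):
    """Fractional occupancy of each strain's sites at equilibrium."""
    free = free_lectin(total_lectin, sites, kds)
    return [free / (kd + free) for kd in kds]

# Apparent Kd values from Section 3.3.2; site concentrations are assumed.
kds = [15.0, 73.0]       # higher-affinity strain, lower-affinity strain
sites = [100.0, 100.0]   # hypothetical binding-site concentrations (nM)
limiting = occupancies(10.0, sites, kds)
excess = occupancies(1000.0, sites, kds)
print(limiting, excess)
```

In this model the higher-affinity strain always has the larger occupancy, and the ratio of the two occupancies approaches 1 as lectin is added, which is the qualitative trend hypothesized above.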


Figure 3-4. Competitive inhibition of hItln-1 binding in microbial communities. Six communities were assayed for hItln-1 binding by flow cytometry. The strains in each mixture that are known hItln-1 binders are denoted above each graph. (a) contained two Gram-positive binding strains, L. reuteri and B. angulatum. (b) contained two Gram-negative binding strains, E. fergusonii and B. plebeius. (c-f) contained a Gram-positive and Gram-negative binding strain as follows: (c) L. reuteri and E. fergusonii, (d) L. reuteri and B. plebeius, (e) B. angulatum and E. fergusonii, and (f) B. angulatum and B. plebeius. All six communities share B. ovatus and P. penneri as non-binding strains. The red line represents a perfect correlation between predicted and observed with a slope of 1 and y-intercept of 0. The data from 0.1 μg of hItln-1 per 1E6 cells (washed) in (c) is the same data shown in Figure 4-3c.

3.4 Discussion

Here we describe experiments that inform the microbe binding specificity of hItln-1. We identified 12 strains from diverse taxa that are robustly bound by the lectin in a calcium ion-dependent manner, including both Gram-positive and Gram-negative strains. Intriguingly, of the 21 Gram-negative strains tested, only two were recognized by hItln-1. Given that the LPS core contains the hItln-1 ligands KDO and KO,17, 24 these results suggest that the LPS core is likely inaccessible to lectins. We further observed that hItln-1 has a lower apparent affinity for the Gram-negative strains than the Gram-positive strains tested herein (Figure 3-2). This difference in affinity may be a result of (i) the presence of specific hItln-1 ligands as well as (ii) the accessibility of these ligands on the microbial cell surface. In the case of Gram-positive species, the lectin likely binds to the capsular polysaccharide (CPS), exo-polysaccharide (EPS), and teichoic acid (TA) that localize to the cell surface, whereas in Gram-negative bacteria hItln-1 likely binds to the LPS O-antigen.

The observed differences in affinity of hItln-1 for different species of bacteria have important implications for lectin recognition of microbes in vivo. However, it was unclear whether these findings were representative of hItln-1 binding to microbes in a mixed microbial community, such as those encountered at our body’s mucosal surfaces where lectins perform important innate immune functions. We therefore explored hItln-1 binding in the context of synthetic microbial communities. We observed that hItln-1 binding to microbes can be strikingly different when assayed individually versus assayed within a community. Specifically, Gram-negative bacteria appear to be excluded from hItln-1 binding in the presence of competing Gram-positive bacteria. Our results suggest that, more generally, hItln-1 recognition of bacteria within native communities such as the gut microbiome is dependent on affinity for individual strains and community composition.

We also demonstrate that hItln-1 binding is sensitive to the context of the assay and the amount of lectin available. As expected, the species with lower affinity hItln-1 binding composed a greater proportion of the bound population in the community as the concentration of hItln-1 was increased (Figure 3-4a-c, Figure 3-S1). This suggests that differential expression levels of hItln-1 may result in differential targeting of microbes in mixed communities. Our study provides valuable insight into lectin specificities, which have been studied almost exclusively in the context of single strain binding assays when examining microbial recognition.

Based on our results, we postulate that differences in affinity for microbial binding, likely driven by the ligands displayed on the cell surface, drive selective binding in a community context. It has not escaped us that the human body may use lectin expression levels to selectively target and shape microbial communities through alteration of lectin-microbe binding.

The results presented herein appear robust, particularly the difference in affinity for the Gram-positive organisms L. reuteri and B. angulatum compared with the Gram-negative organisms B. plebeius and E. fergusonii, and the translation of that difference to competitive binding in a mixture. A more fine-tuned assay for determining the apparent Kd of hItln-1 binding to each of these organisms individually (rather than to a mix of the Gram-positive or Gram-negative organisms, as shown in Figure 3-2) would provide more accurate binding affinities and allow for the development of an equation to predict community binding at varying concentrations of hItln-1. Further, the community binding assays we performed are very sensitive to the number of bacteria of each species. A more quantitative approach should be developed for future community binding analyses and for larger, more complex communities.
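As a rough illustration of the kind of predictive equation suggested above, the following sketch assumes a simple one-site binding model in which the fraction of cells of a strain that is bound equals L / (Kd + L), with lectin in excess. This model, the function name, and all numerical values are assumptions made for illustration; they are not measured values or the method used in this work. The model also ignores depletion of free lectin by high-affinity strains, which the competition data suggest is important.

```python
def bound_composition(cell_counts, apparent_kd, lectin_conc):
    """Share of the lectin-bound population contributed by each strain.

    Assumes a one-site binding model in which the fraction of cells of a
    strain that is bound equals L / (Kd + L), with lectin in excess.
    """
    if lectin_conc <= 0:
        raise ValueError("lectin concentration must be positive")
    bound = {
        strain: n * lectin_conc / (apparent_kd[strain] + lectin_conc)
        for strain, n in cell_counts.items()
    }
    total_bound = sum(bound.values())
    return {strain: b / total_bound for strain, b in bound.items()}


# Hypothetical inputs for illustration only (not measured values).
cell_counts = {"high_affinity_strain": 5e6, "low_affinity_strain": 5e6}
apparent_kd = {"high_affinity_strain": 1.0, "low_affinity_strain": 20.0}  # ug/mL

for conc in (0.5, 5.0, 50.0):  # ug/mL of lectin
    shares = bound_composition(cell_counts, apparent_kd, conc)
    print(conc, {s: round(v, 3) for s, v in shares.items()})
```

Under these assumptions, the lower-affinity strain makes up a larger share of the bound population as the lectin concentration rises, which is qualitatively consistent with the trend described for Figure 3-4.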

3.5 Conclusions

To the best of our knowledge, this is the first report of a lectin showing competitive binding to microbes in a community context. Our data suggest that analyses of lectins binding to single microbial strains are not sufficient to ascribe recognition by a soluble lectin in a biological context. Understanding hItln-1 microbe binding specificity provides a foundation for mining genomic data to identify features that correlate to lectin binding. Additionally, our results identify hItln-1 binding strains that will be useful in future experiments employing mouse models to examine how intelectin binding to bacteria influences mammalian health.

3.6 Materials and Methods

3.6.1 hItln-1 Expression and Purification

StrepII-tagged hItln-1 was expressed in HEK cells and purified as previously described.13

3.6.2 hItln-1 Binding to Bacterial Strains

Bacteria were grown under anaerobic conditions by Robert Kerby in Federico Rey’s group. 2 mL of overnight or saturated growth were pelleted at 5,000 RPM. Cells were washed with 5 mL of cold PBS, pelleted by centrifugation, and fixed in 5 mL of cold PBS + 1% formaldehyde for 30 min on ice. Fixation was quenched with the addition of 5 mL of PBS + 1 M lysine for 30 min on ice. Fixed cells were pelleted by centrifugation and resuspended in 5 mL of 20 mM HEPES (pH 7.4), 150 mM NaCl, 10 mM CaCl2, 0.1% BSA, and 0.05% Tween-20. 100 μL of the fixed bacteria solution, containing ~3E6 to 30E6 cells, was used for each staining condition. Cells were stained in a total volume of 250 μL. To assay hItln-1 binding, cells were stained in 20 mM HEPES (pH 7.4), 150 mM NaCl, 10 mM CaCl2, 0.1% BSA, 0.05% Tween-20, 15 μg/mL Strep-hItln-1, and a 1:250 dilution of an Oyster 645 nm:Anti-Strep-tag II antibody (IBA Bioscience). To assay for calcium ion dependence, the calcium was omitted from the staining procedure and replaced with 1 mM EDTA. An antibody-only control was performed with each strain by omitting hItln-1 from the lectin staining conditions. After staining for two hours at 4 °C, cells were centrifuged, the supernatant was removed, and the cells were resuspended in 1 mL of staining buffer. Stained cells were analyzed on a BD Accuri C6 flow cytometer. To help visualize cells, propidium iodide (Life Technologies) was added to some samples at a 1:250 dilution. A minimum of 50,000 events were collected under each condition. Data were processed using FlowJo. Histograms were generated, and strains that demonstrated a substantial increase in fluorescence in the presence of hItln-1 and were completely sensitive to EDTA addition were considered hItln-1 bound.
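The staining volumes implied by the conditions above (250 μL total, 100 μL of fixed cells, 15 μg/mL lectin, 1:250 antibody) can be worked out with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the lectin stock concentration of 500 μg/mL is an assumed value, since the stock concentration is not stated in this section.

```python
def staining_volumes(total_ul=250.0, cells_ul=100.0, lectin_final_ug_ml=15.0,
                     lectin_stock_ug_ml=500.0, antibody_dilution=250.0):
    """Volumes (in uL) of each component for one staining reaction.

    lectin_stock_ug_ml is a hypothetical stock concentration; the other
    defaults are the values given in the staining procedure.
    """
    # C1 * V1 = C2 * V2 for the lectin stock.
    lectin_ul = lectin_final_ug_ml * total_ul / lectin_stock_ug_ml
    # A 1:250 dilution means 1 part antibody in 250 parts total volume.
    antibody_ul = total_ul / antibody_dilution
    buffer_ul = total_ul - cells_ul - lectin_ul - antibody_ul
    if buffer_ul < 0:
        raise ValueError("components exceed the total staining volume")
    return {"cells_ul": cells_ul, "lectin_ul": lectin_ul,
            "antibody_ul": antibody_ul, "buffer_ul": buffer_ul}


volumes = staining_volumes()
print(volumes)
```

With the assumed stock, each reaction contains 3.75 μg of lectin in total (15 μg/mL × 0.25 mL), regardless of the stock concentration chosen.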

3.6.3 hItln-1 Binding to Synthetic Communities

Strains of bacteria were prepared as described above. Strains were first analyzed individually on a BD Accuri C6 flow cytometer to profile each strain and quantify the density of the cells. A minimum of 50,000 events were collected under each condition. The events/μL observed on the flow cytometer during the strain analysis were used to prepare quantitative mixtures of bacteria. 20E6, and later 10E6, cells were stained in a final volume of 250 μL. Cells were stained under conditions similar to those reported above, except that the amount of hItln-1 per sample was varied as indicated. After staining for two hours at 4 °C, the cells were split into two samples. For the “diluted” sample, 100 μL of the stained cell solution was diluted fivefold by adding 400 μL of HEPES-Ca/BSA/Tween buffer. For the “washed” sample, another 100 μL was centrifuged and resuspended in 500 μL of HEPES-Ca/BSA/Tween buffer. Samples were analyzed immediately on a BD Accuri C6 flow cytometer. Data were processed using FlowJo software. Bound cells were quantified using a gate derived from an unstained sample. Quantified data were graphed using GraphPad Prism 6.
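The use of events/μL to assemble a mixture can be sketched as follows. This is a minimal example that assumes each strain contributes an equal number of cells to the community, which is an assumption and not a detail stated above; the function name and the cell densities are hypothetical.

```python
def mixing_volumes(events_per_ul, total_cells=20e6):
    """Volume (uL) of each strain suspension needed for a defined mixture.

    Assumes every strain contributes the same number of cells and that
    events/uL from the cytometer equals cells/uL in the suspension.
    """
    if not events_per_ul:
        raise ValueError("at least one strain is required")
    cells_per_strain = total_cells / len(events_per_ul)
    return {strain: cells_per_strain / density
            for strain, density in events_per_ul.items()}


# Hypothetical cell densities (events/uL) for a four-member community.
densities = {
    "L. reuteri": 1.0e5,
    "E. fergusonii": 2.0e5,
    "B. ovatus": 5.0e4,
    "P. penneri": 2.5e5,
}
for strain, volume in mixing_volumes(densities).items():
    print(f"{strain}: {volume:.1f} uL")
```

Because the assay is sensitive to the number of cells of each species, any error in the measured density propagates directly into the composition of the mixture.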

3.7 Acknowledgements

I would like to thank Dr. Darryl Wesener for the foundational work he did on this project, for welcoming me into it, and for tackling all of these experiments together. I would also like to thank Robert Kerby and the entire group of Professor Federico Rey for our collaboration on the binding of hItln-1 to bacteria associated with the human microbiome, and for thoughtful discussions about the project. I would like to acknowledge Darrell R. McCaslin and Dan Stevens of the Department of Biochemistry BIF at the University of Wisconsin-Madison for their thoughtful discussion and assistance using the BD Accuri C6. Lastly, I would like to thank Lectin Land and the entire Kiessling Group, but specifically Dr. Robert Lyle McPherson and Dr. Mike Wuo, for their suggestions on data analysis.

3.8 References

1. Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome. Nature 2012, 486 (7402), 207-14.

2. Turnbaugh, P. J.; Ley, R. E.; Hamady, M.; Fraser-Liggett, C. M.; Knight, R.; Gordon, J. I., The human microbiome project. Nature 2007, 449 (7164), 804-810.

3. Allaire, J. M.; Crowley, S. M.; Law, H. T.; Chang, S. Y.; Ko, H. J.; Vallance, B. A., The Intestinal Epithelium: Central Coordinator of Mucosal Immunity. Trends in Immunology 2018, 39 (9), 677-696.

4. Turner, J. R., Intestinal mucosal barrier function in health and disease. Nature Reviews Immunology 2009, 9 (11), 799-809.

5. Belkaid, Y.; Hand, Timothy W., Role of the Microbiota in Immunity and Inflammation. Cell 2014, 157 (1), 121-141.

6. Gallo, R. L.; Hooper, L. V., Epithelial antimicrobial defence of the skin and intestine. Nature Reviews Immunology 2012, 12 (7), 503-16.

7. Wesener, D. A.; Dugan, A.; Kiessling, L. L., Recognition of microbial glycans by soluble human lectins. Current Opinion in Structural Biology 2017, 44, 168-178.

8. Lis, H.; Sharon, N., Lectins: Carbohydrate-Specific Proteins That Mediate Cellular Recognition. Chemical Reviews 1998, 98 (2), 637-674.

9. Taylor, M. E.; Drickamer, K.; Schnaar, R. L.; Etzler, M. E.; Varki, A., Discovery and Classification of Glycan-Binding Proteins. In Essentials of Glycobiology, 3 ed.; Varki, A.; Cummings, R. D.; Esko, J. D.; Stanley, P.; Hart, G. W.; Aebi, M.; Darvill, A. G.; Kinoshita, T.; Packer, N. H.; Prestegard, J. H.; Schnaar, R. L.; Seeberger, P. H., Eds. Cold Spring Harbor Laboratory Press: Cold Spring Harbor (NY), 2015; pp 361-372.

10. Tsuji, S.; Tsuura, Y.; Morohoshi, T.; Shinohara, T.; Oshita, F.; Yamada, K.; Kameda, Y.; Ohtsu, T.; Nakamura, Y.; Miyagi, Y., Secretion of intelectin-1 from malignant pleural mesothelioma into pleural effusion. British Journal of Cancer 2010, 103 (4), 517-23.

11. Tsuji, S.; Uehori, J.; Matsumoto, M.; Suzuki, Y.; Matsuhisa, A.; Toyoshima, K.; Seya, T., Human intelectin is a novel soluble lectin that recognizes galactofuranose in carbohydrate chains of bacterial cell wall. Journal of Biological Chemistry 2001, 276 (26), 23456-23463.

12. Lee, J. K.; Baum, L. G.; Moremen, K.; Pierce, M., The X-lectins: a new family with homology to the Xenopus laevis oocyte lectin XL-35. Glycoconjugate Journal 2004, 21 (8-9), 443-50.

13. Wesener, D. A.; Wangkanont, K.; McBride, R.; Song, X.; Kraft, M. B.; Hodges, H. L.; Zarling, L. C.; Splain, R. A.; Smith, D. F.; Cummings, R. D.; Paulson, J. C.; Forest, K. T.; Kiessling, L. L., Recognition of microbial glycans by human intelectin-1. Nature Structural & Molecular Biology 2015, 22 (8), 603-10.

14. McMahon, C. M.; Isabella, C. R.; Windsor, I. W.; Kosma, P.; Raines, R. T.; Kiessling, L. L., Stereoelectronic Effects Impact Glycan Recognition. Journal of the American Chemical Society 2020, 142 (5), 2386-2395.

15. Herget, S.; Toukach, P. V.; Ranzinger, R.; Hull, W. E.; Knirel, Y. A.; von der Lieth, C. W., Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans. BMC Structural Biology 2008, 8, 35.

16. Tytgat, H. L.; Lebeer, S., The sweet tooth of bacteria: common themes in bacterial glycoconjugates. Microbiology and Molecular Biology Reviews 2014, 78 (3), 372-417.

17. Whitfield, C.; Szymanski, C. M.; Aebi, M., Eubacteria. In Essentials of Glycobiology, 3 ed.; Varki, A.; Cummings, R. D.; Esko, J. D.; Stanley, P.; Hart, G. W.; Aebi, M.; Darvill, A. G.; Kinoshita, T.; Packer, N. H.; Prestegard, J. H.; Schnaar, R. L.; Seeberger, P. H., Eds. Cold Spring Harbor Laboratory Press: Cold Spring Harbor (NY), 2015; pp 265-282.

18. Raetz, C. R. H.; Whitfield, C., Lipopolysaccharide Endotoxins. Annual Review of Biochemistry 2002, 71 (1), 635-700.

19. Willis, L. M.; Whitfield, C., KpsC and KpsS are retaining 3-deoxy-d-manno-oct-2-ulosonic acid (Kdo) transferases involved in synthesis of bacterial capsules. Proceedings of the National Academy of Sciences of the United States of America 2013, 110 (51), 20753-20758.

20. Whitfield, C.; Roberts, I. S., Structure, assembly and regulation of expression of capsules in Escherichia coli. Molecular Microbiology 2002, 31 (5), 1307-1319.

21. Adibekian, A.; Stallforth, P.; Hecht, M.-L.; Werz, D. B.; Gagneux, P.; Seeberger, P. H., Comparative bioinformatics analysis of the mammalian and bacterial glycomes. Chemical Science 2011, 2 (2), 337-344.

22. Brown, S.; Santa Maria, J. P., Jr.; Walker, S., Wall teichoic acids of gram-positive bacteria. Annual Review of Microbiology 2013, 67, 313-36.

23. Richards, M. R.; Lowary, T. L., Chemistry and Biology of Galactofuranose‐Containing Polysaccharides. Chembiochem 2009, 10 (12), 1920-1938.

24. Chung, H. S.; Yang, E. G.; Hwang, D.; Lee, J. E.; Guan, Z.; Raetz, C. R., Kdo hydroxylase is an inner core assembly enzyme in the Ko-containing lipopolysaccharide biosynthesis. Biochemical and Biophysical Research Communications 2014, 452 (3), 789-794.

3.9 Supplementary Information

Figure 3-S1. Competitive hItln-1 binding in communities is dependent on lectin concentration and washing. Four communities were assayed for hItln-1 binding by flow cytometry. The strains in each mixture that are known hItln-1 ligands are denoted above each graph: (a) L. reuteri and E. fergusonii, (b) L. reuteri and B. plebeius, (c) E. fergusonii and B. plebeius, and (d) L. reuteri and B. angulatum. All four communities share B. ovatus and P. penneri as non-binding strains. The effects of increasing amounts of hItln-1 and the removal of a wash step were used to suggest competition between bound strains that is dependent on the binding affinity for the microbial cell surface. The red line represents a perfect correlation between predicted and observed values, with a slope of 1 and y-intercept of 0. These data represent a separate experiment with the same setup as that shown in Figure 3-4, but here we observed increased binding in (a), (b), and (c) with an increased concentration of hItln-1 and when the wash was eliminated.


Chapter 4

Lectin-sequencing for analyzing microbial communities

Contributions: Protein expression and purification performed by Christine R. Isabella, Smrithi Raman, Michael Wuo (SP-D), Melanie Halim (MBL) and Amanda Dugan (V109D). Flow cytometry performed by Christine R. Isabella, Robert L. McPherson and Smrithi Raman. 16S library prep and data analysis performed by Christine R. Isabella. 16S sequencing performed at the BioMicro center at MIT. Metagenomic library preparation, sequencing, and data analysis performed by Tony Gaca of the Microbial Omics Core at The Broad Institute of Harvard and MIT. Microscopy performed by Michael Wuo and Amanda Dugan. Research designed by Christine R. Isabella, Darryl A. Wesener, Hera Vlamakis, Eric J. Alm, Ramnik J. Xavier and Laura L. Kiessling.

4.1 Abstract

Soluble carbohydrate-binding proteins (lectins) contribute to innate immunity. Mammalian lectins such as surfactant protein D (SP-D) and mannose-binding lectin (MBL) can promote pathogen clearance, while others are reported to directly kill microbes. Although some lectins have been shown to bind select pathogenic microbes in vitro, lectin specificity for different species is largely uncharacterized. Here, we describe lectin-sequencing (lectin-SEQ) as an unbiased method to assess lectin recognition of microbes within native microbial communities by applying lectins to human stool samples. Lectin-SEQ using human SP-D revealed its propensity to engage bacteria that can become pathogenic. In contrast, human intelectin-1 (hItln-1), a soluble lectin with unknown microbial binding targets, preferentially recognized health-promoting bacteria. Comparative analysis of samples from healthy and inflammatory bowel disease (IBD) donors showed a decrease in hItln-1-binding bacteria, suggesting that microbes recognized by hItln-1 are altered in patients. These findings highlight the utility of lectins as powerful tools for profiling microbial communities.

4.2 Introduction

The human intestine is home to trillions of diverse microbes, collectively termed the gut microbiome.1, 2 An imbalance, or dysbiosis, of the organisms present in the gut microbiome contributes to numerous disease states.3 Thus, the recognition and regulation of microbes by the host is critical for maintaining health. Soluble lectins are a class of secreted carbohydrate-binding proteins that bind to microbes through specific recognition of cell surface glycans.4-6 Lectins play important roles in anti-microbial innate immunity, as lectin binding to microbes can result in their clearance from the host.7-9 Previous studies indicate that lectins recognize pathogenic organisms and that microbe binding by lectins can be strain-specific.10 However, these studies were conducted on limited panels of monocultured microbes that do not capture the breadth of microbial diversity in the host. To understand how lectins shape microbial populations, new methods are needed to assess community-wide recognition in native communities.

We sought to address this gap by monitoring the binding of human lectins to microbes in complex communities. To this end, we developed lectin-sequencing (lectin-SEQ), whereby fluorescently tagged recombinant lectins are applied to complex microbial communities and lectin-bound bacteria are sorted via fluorescence-activated cell sorting (FACS) followed by sequencing to identify the microbial targets of soluble lectins. The gut microbiome is the most complex microbial community in the body, and information about this community is easily accessible via stool samples, making it ideal for assessing microbial recognition. We chose four lectins with known or suggested roles in innate immunity at mucosal surfaces, but different glycan binding specificity: human intelectin-1 (hItln-1), human intelectin-1 V109D (V109D), surfactant protein D (SP-D), and mannose-binding lectin (MBL) (Figure 4-1). HItln-1, an X-type lectin, is expressed in the small intestine and lung and binds exclusively to microbial glycans.10, 11 Genome-wide association studies (GWAS) have linked the polymorphism V109D to asthma12 and Crohn’s disease (CD),13 but hItln-1’s microbial binding specificity is unknown. MBL and SP-D are members of the collectin family of lectins. Each has been shown to bind pathogenic bacteria and play important roles in pathogen clearance. Accordingly, mutations or insufficiency in these lectins result in increased bacterial or viral infection.14-18

Figure 4-1. Lectin trimeric structures and binding ligands. (A) Crystal structure of human intelectin-1 (hItln-1) bound to allyl-β-D-galactofuranose, with the V109D residue shown in blue sticks (PDB: 4WMY). HItln-1 and hItln-1 V109D share identical monosaccharide ligands (unpublished data). (B) Ligands and crystal structure of human mannose-binding lectin (MBL) neck and carbohydrate recognition domain (PDB: 1HUP) with α-methyl-D-mannopyranoside from the rat MBL structure (PDB: 1RDM, which has an identical binding site) docked in silico. (C) Ligands and crystal structure of human surfactant protein D (SP-D) neck and carbohydrate recognition domain bound to L,D-α-heptose (PDB: 2RIB). All ligands are shown as black sticks, calcium ions as green spheres, and protein in wheat, light blue, and light cyan for hItln-1, MBL, and SP-D, respectively.

HItln-1, V109D, MBL, and SP-D are all trimeric, calcium-dependent lectins (Figure 4-1). While the hItln-1 V109D polymorphism is associated with disease, its carbohydrate specificity is identical to that of hItln-1 by microbial glycan array analysis (unpublished data). HItln-1 and V109D bind the microbial glycans β-D-galactofuranose (β-Galf), D-glycerol-1-phosphate (GlyP), D-glycero-D-talo-oct-2-ulosonic acid (KO), and 3-deoxy-D-manno-oct-2-ulosonic acid (KDO) (Figure 4-1A), but their microbe binding specificity is not known.10, 19 MBL and SP-D recognize distinct glycans. MBL has been reported to bind α-D-mannopyranoside (mannose) and N-acetyl-D-glucosamine (GlcNAc) (Figure 4-1B),20 and SP-D is reported to bind L-glycero-α-D-manno-heptose (L,D-heptose), D-glycero-α-D-manno-heptose (D,D-heptose), and mannose (Figure 4-1C).16, 21, 22 While MBL and SP-D have been shown to bind select pathogenic isolates in vitro,14, 18, 23 their microbial binding specificity has not been comprehensively analyzed in complex communities.

Using these lectins, we performed lectin-SEQ on stool samples from healthy donors. First, we used hItln-1 and 16S sequencing to validate our strategy. Then, using hItln-1, V109D, and SP-D, we sorted bacteria from healthy human donors and used metagenomic sequencing to achieve species-level resolution. Using this method, we identified each lectin’s microbial targets, as well as their abundance, in the donor stool samples. Lectin-SEQ revealed unprecedented recognition of health-associated bacteria by hItln-1, whereas SP-D bound bacteria known to act as pathogens. We then sought to profile inflammatory bowel disease (IBD, including Crohn’s disease (CD) and ulcerative colitis (UC)) donor stool samples using lectins. In IBD donor samples, hItln-1-bound bacteria were depleted, while SP-D showed increased agglutination. Taken together, lectin-SEQ has revealed distinct microbial binding profiles of human lectins with different glycan binding specificity and provides insight into the biological functions of these lectins in the context of the microbiome. Further, these data demonstrate the potential for lectins to be developed as tools to rapidly profile microbial communities.

4.3 Results

4.3.1 Soluble lectins bind stool bacteria

Many lectins show specificity for binding to microbial glycans; however, their ability to bind specific microbes from complex native microbial communities has not been examined. We first sought to assess the binding of lectins to microbes in the human stool microbiome. To determine whether hItln-1, V109D, MBL, and SP-D recognize bacteria in the microbiome, we applied recombinant StrepII-tagged lectins to homogenized stool samples from healthy human donors (Figure 4-2). Each lectin showed binding to a subset of the bacterial population by flow cytometry in calcium-containing buffer (Figure 4-2A-E). In the presence of the calcium chelator EDTA, lectin binding to microbes was significantly reduced. These data indicate that the observed binding to cells is glycan-dependent. HItln-1 and V109D each bound approximately 20% of the total bacterial population, and the two bound populations show similar scattering properties (Figure 4-2B, C). MBL bound 14% (Figure 4-2D) and SP-D bound 13% (Figure 4-2E). Notably, the scattering properties of the bound populations are different for each lectin.

HItln-1 and V109D have different glycan ligands than MBL and SP-D (Figure 4-1). To understand whether differences in glycan ligands translate to differences in microbial recognition, we analyzed stool samples by co-staining with multiple lectins. Co-staining of healthy stool samples with Flag-hItln-1 and strep-SP-D showed that each lectin indeed bound distinct subsets of the microbial population (Figure 4-2F, Q1 and Q3). Additionally, 7% of the population was positive for both lectins (Figure 4-2F, Q2). However, because SP-D can agglutinate bacteria, flow cytometry cannot determine whether the double-positive population contains hItln-1-bound bacteria entrapped by agglutination or whether both lectins are indeed bound to the same cells. Fluorescence microscopy should be used to further understand this result. Engineering a variant of SP-D lacking the N-terminal oligomerization domain, and therefore lacking the ability to agglutinate bacteria, would also be useful for flow cytometry-based assays. We similarly sought to determine whether hItln-1 and MBL recognize distinct populations of the stool microbiome by co-staining with lectins directly labeled with fluorophores (hItln-647 and MBL-555). Analysis of stool samples co-stained with hItln-647 and MBL-555 by flow cytometry revealed that hItln-1 and MBL bind distinct populations of the healthy stool microbiome with minimal overlap (Figure 4-2G).


Figure 4-2. Soluble lectins bind the human microbiome. Representative flow cytometry dot plots of a healthy donor stool sample stained with anti-strep antibody only (A) or incubated with fluorescent anti-strep antibody combined with hItln-1 (B), hItln-1 V109D (C), MBL (D), or SP-D (E) in the presence of calcium or EDTA. The lectin-positive population falls within the Q2 gate. (F) Co-staining of a healthy stool sample with Flag-hItln-1 and strep-SP-D in the presence of calcium or EDTA. (G) Representative flow cytometry dot plot of stool co-stained with hItln-647 and MBL-555 in the presence of calcium or EDTA. In all panels, events were first gated based on being positive for SytoBC, and only SytoBC-positive events are shown.

4.3.2 16S Sequencing reveals patterns of hItln-1 binding to stool bacteria

To determine the identity of the bacteria bound by each lectin in human stool samples, we developed a method that combines fluorescence-activated cell sorting (FACS) of the lectin-bound populations with sequencing of each population (lectin-SEQ, Figure 4-3A). We first ran a pilot study with hItln-1, using 16S sequencing to identify bacteria in each sorted population. Homogenized stool samples from three healthy donors were stained with strep-tagged hItln-1 in triplicate, sorted by FACS, and each fraction (input, total, lectin+ and lectin–) was sequenced. Principal coordinate analysis of unweighted UniFrac distances revealed that samples from the different donors are not similar to each other (Figure 4-3B). This is to be expected from human donor stool samples. Additionally, within each donor, the sorted fractions show clustering of unstained, hItln-positive, and hItln-negative fractions, indicating that there are differences between the fractions and that the replicates are consistent (Figure 4-3B).

To understand the composition of each fraction, we assembled operational taxonomic units (OTUs) and calculated the average relative abundance of each OTU within each fraction (Figure 4-3C). OTU classification is based on sequence differences in the V3-V4 region of the 16S gene, but in most cases there is no matching reference sequence that allows annotation at the species, or even genus, level. Thus, 16S sequencing is low resolution and does not allow us to identify lectin-bound species. However, we can observe patterns in relative abundance at the level of bacterial order across sorted fractions from each donor (Figure 4-3C) that validate our sorting and sequencing strategy. Reads across triplicate samples for each fraction were averaged and the relative abundance calculated. Low abundance bacteria were defined as orders that were less than 1% relative abundance in all fractions. In donor FC and donor FE, there is enrichment of the order Clostridiales in the hItln-positive fraction, whereas in donor FD, there is an enrichment of the order Bacteroidales (Figure 4-3C). This observation prompted us to examine whether these differences were due to the composition of the sample or whether we were observing inconsistent bacterial recognition by hItln-1 in different samples. We identified Bacteroides plebeius as a bacterium defined at the species level by 16S sequencing analysis that displayed different abundance across donors. Bacteroides plebeius was present in the unstained sample from donor FD but not in the samples from donors FC or FE. Further, B. plebeius was highly enriched in the hItln-positive fraction of the FD stool sample (Figure 4-3D), which accounts for the differences observed in Figure 4-3C. Bacteroides plebeius contains an enzyme to break down carbohydrates found in red seaweed and is not a typical gut-resident commensal, but rather is present in the gut microbiota of individuals with a diet high in seaweed.24, 25 These data indicate that hItln-1 bacterial recognition is consistent and that we are able to detect differences in community composition with lectin-SEQ.
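The averaging and low-abundance grouping described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the analysis code used in this work: the function names, the data layout (one dictionary of read counts per replicate), and the read counts are all hypothetical.

```python
def mean_relative_abundance(replicates):
    """Average read counts across replicates, then normalize to fractions."""
    taxa = sorted({taxon for rep in replicates for taxon in rep})
    means = {t: sum(rep.get(t, 0) for rep in replicates) / len(replicates)
             for t in taxa}
    total = sum(means.values())
    return {t: m / total for t, m in means.items()}


def collapse_low_abundance(fractions, threshold=0.01, label="Low abundance"):
    """Merge orders that fall below `threshold` in every fraction."""
    all_taxa = {t for abund in fractions.values() for t in abund}
    low = {t for t in all_taxa
           if all(abund.get(t, 0.0) < threshold for abund in fractions.values())}
    collapsed = {}
    for name, abund in fractions.items():
        kept = {t: v for t, v in abund.items() if t not in low}
        if low:
            kept[label] = sum(v for t, v in abund.items() if t in low)
        collapsed[name] = kept
    return collapsed


# Hypothetical read counts per bacterial order, three replicates per fraction.
unstained = [{"Clostridiales": 600, "Bacteroidales": 395, "RareOrder": 5}] * 3
positive = [{"Clostridiales": 900, "Bacteroidales": 92, "RareOrder": 8}] * 3

fractions = {
    "unstained": mean_relative_abundance(unstained),
    "hItln-1+": mean_relative_abundance(positive),
}
print(collapse_low_abundance(fractions))
```

An order is grouped only if it is below 1% in every fraction, so an order that is rare in the unstained sample but enriched in the lectin-positive fraction remains visible.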


Figure 4-3. 16S lectin-sequencing of hItln-1 sorted stool bacteria. (A) Schematic overview of lectin-bound cell sorting of stool sample followed by sequencing (lectin-SEQ). (B) Principal coordinate analysis of unweighted UniFrac distances of unstained, hItln-1+, and hItln-1– fractions from healthy donors. (C) Bar plots showing average relative abundance of bacterial order in input, unstained, hItln-1+ and hItln-1– fractions for each stool donor. Low abundance bacteria represents all orders that were present at less than 1% in all fractions. (D) Average relative abundance of Bacteroides plebeius in the unstained, hItln-1+ and hItln-1– fractions across stool donors. Error bars represent standard deviation, n=3.

4.3.3 Metagenomic sequencing identifies lectin-bound bacteria

Lectin-SEQ using 16S sequencing validated our FACS method by revealing that we see enrichment of organisms in the lectin-positive fraction. To identify the species bound by lectins, we next paired lectin-SEQ with metagenomic sequencing for higher resolution sequencing data.

Four healthy donor samples were stained with hItln-1, hItln-1 V109D, and SP-D and sorted in triplicate by FACS as shown in Figure 4-3A. Following FACS, each fraction (input, total, lectin+ and lectin–) was sequenced by shotgun metagenomics. For all unstained and lectin-negative fractions, 1E6 cells were collected. For the lectin-positive fractions, only 5E4 - 6E6 cells were collected due to sample limitations. These cell numbers do not contain enough DNA for a standard metagenomics workflow; the workflow was therefore modified by reducing the amount of tagmentation enzyme 10-fold, reducing the tagmentation time from five minutes to one minute, and increasing the number of PCR cycles used to amplify the library from 12 to 20. These modifications resulted in library quality sufficient for sequencing and yielded 3.3E6 reads per sample.

Principal coordinate analysis (PCoA) showed that each individual donor stool sample was distinct, with the unstained and negative fractions clustering together for each donor (Figure 4-4A). Despite heterogeneity among donor samples, each lectin bound a specific population of bacteria across donors. Further, the hItln-1+ and V109D+ fractions cluster together, while SP-D+ populations cluster separately (Figure 4-4A). To identify which species were bound by each lectin, we examined the relative abundance of species in each fraction across donors and calculated a lectin enrichment index for each taxon present using the equation: absindex = [relative abundance (lectin+) – relative abundance (lectin–)] / [relative abundance (lectin+) + relative abundance (lectin–)] (Figures 4-4B-D and 4-S1). The same analysis at the genus level reveals that patterns of enrichment across donors in the hItln-1+ and V109D+ fractions are similar, with Roseburia, Lachnospiraceae, Coprococcus, and Blautia being strongly enriched across donor samples. SP-D showed enrichment of distinct genera, including Stenotrophomonas, Pseudomonas, and Faecalibacterium, across donors (Figure 4-4B). The relative abundance of individual species that were enriched in multiple donor samples by hItln-1 and V109D, as well as that of B. plebeius, is shown in Figure 4-4C. Notably, B. plebeius showed the same pattern by metagenomics as by 16S sequencing: it was present only in donor FD and was enriched in both the hItln-1- and V109D-positive fractions. Species enriched by SP-D across donor samples are shown in Figure 4-4D.
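A minimal sketch of the enrichment index follows, reading the equation as the difference between the lectin-positive and lectin-negative relative abundances divided by their sum. That grouping of terms is an assumption about the intended formula; the function name and the example values are hypothetical.

```python
def enrichment_index(ra_positive, ra_negative):
    """Enrichment index in the range -1 (depleted) to +1 (enriched).

    Computes (RA+ - RA-) / (RA+ + RA-), where RA is the relative abundance
    of a taxon in the lectin-positive or lectin-negative fraction.
    Returns None when the taxon is absent from both fractions.
    """
    total = ra_positive + ra_negative
    if total == 0:
        return None  # index is undefined when both abundances are zero
    return (ra_positive - ra_negative) / total


# Hypothetical relative abundances for three taxa.
examples = {
    "enriched_taxon": (0.30, 0.10),
    "depleted_taxon": (0.00, 0.20),
    "unchanged_taxon": (0.15, 0.15),
}
for taxon, (pos, neg) in examples.items():
    print(taxon, enrichment_index(pos, neg))
```

Normalizing by the sum keeps the index bounded, so taxa of very different overall abundance can be compared on the same scale.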


Figure 4-4. Lectin-SEQ metagenomic analysis of healthy donor stool. (A) Genus level principal coordinate analysis (PCoA) generated from the Bray-Curtis dissimilarity metric of unstained (red), lectin+ (blue) and lectin– (green) fractions for total (unstained, triangle), hItln-1 (circle), V109D (plus), and SP-D (square) across healthy donor samples. (B) Enrichment plot depicting enrichment (green circles), depletion (red circles), or no change (blue circles) of genera in the lectin+ fraction for hItln-1, V109D, and SP-D across donors. (C) Relative abundance of species enriched by hItln-1 and V109D. (D) Relative abundance of species enriched by SP-D. Error bars represent standard deviation, n=3.
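For reference, the Bray-Curtis dissimilarity underlying the PCoA in panel (A) can be computed as shown below. This is a generic sketch of the standard metric, not the pipeline used to generate the figure; the function name and the genus-level abundance profiles are hypothetical, and the ordination step itself is not shown.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile maps a taxon name to its abundance. The result is 0 for
    identical profiles and 1 for profiles that share no taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(t, 0.0) - sample_b.get(t, 0.0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0.0) + sample_b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical genus-level relative abundances for two sorted fractions.
lectin_positive = {"Roseburia": 0.5, "Blautia": 0.5}
lectin_negative = {"Roseburia": 1.0}
print(bray_curtis(lectin_positive, lectin_negative))
```

A matrix of such pairwise values across all fractions is the input to the ordination.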

4.3.4 Lectin binding levels are altered in IBD

The species-level identification of lectin-bound bacteria achieved by metagenomic sequencing allowed us to determine that species enriched by hItln-1 are reported to be decreased in the Crohn’s disease microbiota.26-28 To determine whether profiling of the IBD microbiota with lectins revealed differences in composition, we stained a healthy sample and an IBD sample with recombinant hItln-1 and SP-D for analysis by flow cytometry and fluorescence microscopy (Figure 4-5A-D). Healthy donor FA showed 15.3% of the population bound by hItln-1 and 12.9% of the population bound by SP-D in a calcium-dependent manner (Figure 4-5A). The IBD donor sample showed a marked reduction in hItln-1-bound bacteria, with only 4.35% of the population bound by hItln-1. The IBD donor sample showed SP-D binding to 9.5% of the population (Figure 4-5B). Notably, the population bound by SP-D in the IBD donor sample showed a strong shift in intensity for lectin binding as well as for the DNA marker, SytoBC (Figure 4-5B). We hypothesized that this shift was due to agglutination and therefore imaged the healthy (Figure 4-5C) and IBD (Figure 4-5D) donor stool samples stained with SP-D by fluorescence microscopy. Both samples showed agglutination by SP-D in the presence of calcium that was inhibited by the addition of EDTA; however, the agglutinated clumps of bacteria were much larger in the IBD donor sample. These data indicate that there is likely an increase in SP-D-bound bacteria in the IBD donor stool sample, but agglutination interferes with quantification by flow cytometry.


Figure 4-5. HItln-1 and SP-D binding to the IBD microbiome. (A) Representative flow cytometry dot plots of healthy donor FA stool sample stained with anti-strep antibody only, and with hItln-1 or SP-D in the presence of calcium or EDTA. (B) Representative flow cytometry dot plots of an IBD donor stool sample stained with anti-strep antibody only, and with hItln-1 or SP-D in the presence of calcium or EDTA. Representative fluorescence microscopy images of healthy (C) and IBD (D) donor stool samples stained with SP-D in the presence of calcium or EDTA. SytoBC is cyan, SP-D is magenta, arrows point to agglutinated bacteria, scale bar = 20 μm.

4.3.5 HItln-1 and MBL binding to the healthy and IBD microbiota.

The agglutination of bacteria after incubation with SP-D observed in Figure 4-5C-D is not compatible with FACS, as it would cause inaccurate cell counts and imprecise sorting. We therefore expressed and purified StrepII-tagged MBL, which has similar glycan ligands but does not agglutinate bacteria. We then applied hItln-1 and MBL to stool homogenates from five healthy, five UC, and five CD donors. The lectin-binding data are summarized in Figure 4-6. In healthy donor stool samples, MBL binding is significantly lower than hItln-1 binding (paired t-test, p = 0.0349). In UC and CD donor stool, most donors show a decrease in hItln-1-binding bacteria, though the decrease is not statistically significant. Interestingly, in the UC and CD donors where hItln-1 binding is 17% or higher, MBL binding is low. These data suggest a potential inverse relationship between hItln-1- and MBL-binding bacteria in stool populations; however, more donors will be required to determine the significance of this pattern.

Figure 4-6. hItln-1 and MBL binding levels in healthy, UC and CD donor stool microbiome. Percent of stool microbiome staining positive for lectin-binding in the presence of hItln-1 and MBL across 5 individual healthy, ulcerative colitis (UC), and Crohn’s disease (CD) donor stool samples. Datapoints are paired by donor. *p = 0.0349, paired t-test.
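The paired comparison reported for Figure 4-6 can be illustrated with a minimal sketch. The percentages below are hypothetical placeholders, not the donor data from this chapter, and the helper name `paired_t_statistic` is introduced only for illustration. The sketch computes the paired t statistic from the per-donor differences using the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # A paired test works on the within-donor differences, not the raw values.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Hypothetical percent-positive values for five donors (NOT the thesis data).
hitln1_percent = [15.0, 12.0, 18.0, 10.0, 14.0]
mbl_percent = [9.0, 10.0, 11.0, 8.0, 9.0]

t_stat, dof = paired_t_statistic(hitln1_percent, mbl_percent)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The two-sided p-value follows from the t distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` returns the statistic and the p-value together.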

4.4 Discussion

We examined the ability of soluble lectins, which have been studied primarily as innate immune factors with roles in pathogen clearance,29-33 to recognize bacteria in the microbiome. Using four lectins with different glycan-binding specificities (hItln-1, V109D, SP-D, and MBL), we determined that each recognizes a subset of the healthy human stool microbiome (Figure 4-2). To identify each lectin’s bacterial binding targets in a community context, we developed lectin-SEQ, combining FACS with metagenomic sequencing to generate species-level identification of lectin-bound bacteria. By lectin-SEQ, we identified distinct microbial binding partners for hItln-1 and SP-D in human fecal samples, which afford insight into each lectin’s biological function.

By lectin-SEQ, we determined that hItln-1 and V109D show the same microbial binding profile. Position 109 is distal to the carbohydrate binding site (Figure 4-1A), and hItln-1 and V109D have the same glycan specificity (unpublished data); we therefore did not expect a difference in microbe binding. Nevertheless, because position 109 is near the interface between trimers, we could not exclude an effect on trimeric structure or stability that could alter microbial recognition. Our data suggest that the substitution has no effect on the microbial binding specificity of V109D. Therefore, further studies of the biological consequences of hItln-1 binding to microbes, and of how those consequences may be disrupted by V109D, are required to determine the contribution of V109D to disease.

Lectin-SEQ revealed that hItln-1 shows unprecedented recognition of health-promoting bacteria. To the best of our knowledge, human lectins have never been reported to bind health-promoting microbes, but rather have only been assessed for binding to pathogens and for their roles in pathogen clearance. However, reports on mosquitoes and amoebae show a role for lectins in protecting beneficial microorganisms within the host.34, 35 Our findings provide valuable insight into the biological role of hItln-1, suggesting that hItln-1 may play a role in establishing the microbiome or in promoting colonization by specific microbes. In our dataset, hItln-1 recognized numerous members of Clostridium cluster XIVa, including Ruminococcus torques, Ruminococcus lactaris, Roseburia intestinalis, Eubacterium hallii, Dorea formicigenerans, Coprococcus comes, and Blautia sp. (Figure 4-4B, 4-S1).36 Members of this clade have been shown to colonize the mucus layer of the gut.37 Intriguingly, sheep intelectin has been shown to interact with gastric mucins.38 Because intelectins do not recognize mammalian glycans, it is possible that intelectin interacts with mucins through a protein-protein interaction mediated by its N-terminal fibrinogen-like domain. Such an interaction would allow hItln-1 to promote mucosal colonization by its bacterial binding partners by anchoring them to the mucus layer. The ability of hItln-1 to form a complex between bacteria and intestinal mucins should be investigated.

In addition to hItln-1-bound species being mucus associated, many species bound by hItln-1 in healthy donors have been reported to be decreased in the IBD gut microbiome, including Ruminococcus torques, Eubacterium rectale, Roseburia intestinalis, and Coprococcus comes.26-28 Indeed, our flow cytometry-based analysis of the microbiome from IBD donors showed that hItln-1-binding bacteria are decreased in the majority of donors (Figure 4-5, 4-6). The hItln-1-binding bacteria should be sorted and sequenced to determine whether the lectin-bound population reflects the decrease of these organisms in disease. In the same analysis, MBL binding remained relatively constant across healthy, CD, and UC donor stool samples (Figure 4-6). It is notable that, in our analysis paired by donor for hItln-1 and MBL binding, the donors with the highest levels of hItln-1 binding also showed the lowest levels of MBL binding. However, a larger sample size is needed to determine whether there is a negative correlation between hItln-1 and MBL binding levels. While we have not yet sequenced MBL binders, we expect results similar to those for SP-D, as the two lectins have similar glycan-binding profiles. Nevertheless, MBL-bound bacteria across healthy and IBD donor stool samples should be sorted and sequenced to identify species recognized by MBL.

In contrast to hItln-1, analysis of SP-D binding to bacteria from healthy donors revealed only three binders that were enriched in the lectin-positive fraction from multiple donors: Pseudomonas aeruginosa, Stenotrophomonas maltophilia, and Faecalibacterium prausnitzii (Figure 4-S1). Faecalibacterium prausnitzii was by far the most prevalent species in the SP-D-positive fraction, and is also one of the most prominent members of the healthy human microbiota.39 Pseudomonas aeruginosa and S. maltophilia are opportunistic pathogens that are most commonly associated with respiratory diseases but can be present in the intestine in immunocompromised individuals.40-43 Additionally, P. aeruginosa and S. maltophilia often co-occur and are able to form interspecies biofilms.44 The role of SP-D in clearance of respiratory pathogens is well studied; the results of SP-D lectin-SEQ are therefore consistent with previous work. Respiratory and oral bacteria are also found in the intestine during microbial dysbiosis and are associated with CD and UC,45, 46 suggesting that SP-D could detect microbial changes in the IBD microbiota. In our analysis of the stool microbiome of a donor with IBD, SP-D showed no increase in binding events by flow cytometry but displayed increased agglutination by microscopy (Figure 4-3). The microscopy data suggest that there is an increase in SP-D-binding bacteria in the IBD stool sample; however, they also indicate that the full-length SP-D construct, which can form higher-order oligomers and agglutinate bacteria, is not compatible with FACS. When agglutinated, individual lectin-bound cells cannot be sorted, and agglutinated masses could be too large to pass through the FACS nozzle or could otherwise interfere with the fluidics. This also raises the possibility that the lectin-SEQ data generated for SP-D may not exclusively represent organisms specifically bound by SP-D but could include organisms trapped in an agglutinated mass. While our results show promise for SP-D as a probe of opportunistic pathogens in the gut microbiota, flow-based applications will require an engineered SP-D variant that lacks the ability to agglutinate.

Our results from lectin-SEQ also reveal that different glycan specificity results in different microbial recognition. HItln-1 and SP-D are reported to recognize glycans of the core lipopolysaccharide (LPS): KO/KDO and heptose, respectively.19, 21, 22, 47 However, hItln-1 and SP-D did not show overlap in their microbe-binding profiles. This suggests that core LPS is not accessible for lectin binding. Interestingly, an analysis of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in bacteria that are increased or decreased in CD shows that organisms decreased in CD contain a UDP-galactopyranose mutase gene, which is required to convert galactopyranose to galactofuranose, an hItln-1 ligand. In contrast, organisms increased in CD contain enzymes involved in the biosynthesis and export of heptose, a reported SP-D ligand.26 A similar analysis of metagenomic data from lectin-SEQ will provide invaluable insight into the glycan biosynthesis genes that correlate with microbial recognition by lectins with different glycan specificities.

Lectin-SEQ has allowed us to comprehensively analyze the bacterial targets of lectins in native microbial communities. By doing so, we were able to develop novel hypotheses about the biological functions of hItln-1 and bring to light the idea that, although many lectins play important roles in pathogen clearance, some lectins may act as positive selectors of health-benefiting microbes in the human microbiota. Additionally, lectin-SEQ will provide insight into the pathways for glycan biosynthesis in many understudied environmental microbes whose cell surface carbohydrates are poorly characterized. Finally, by identifying microbial targets of lectins that are increased or decreased in disease, we have demonstrated the potential of lectins as promising tools for profiling microbial communities. Future studies to define the binding targets of an expanded panel of lectins will provide a toolkit for profiling complex microbial communities and a facile and powerful method of detecting specific microbial species within a community.

4.5 Materials and Methods

4.5.1 Protein expression and purification

Cloning of full-length N-terminal StrepII-tagged hItln-1 into the pcDNA4/myc-HisA vector backbone (Life Technologies) was described previously.10 StrepII-tagged hItln-1 V109D was generated by site-directed mutagenesis of pcDNA4-strepII-hItln-1. The human SP-D (SFTPD, NM_003019) expression plasmid (pCMV3-SFTPD, Sino Biological) was modified to include the StrepII-tag sequence (5’-TGGAGCCATCCGCAATTTGAGAAG-3’) C-terminal of amino acid 20. The StrepII tag was inserted into pCMV3-SFTPD using inverse PCR with fwd primer (5’-CAATTTGAGAAGAAGACCTACTCCCACAGAACA-3’) and rev primer (5’-CGGATGGCTCCATGCTTCCAGGTAGCCC-3’). A human MBL (MBL2, NM_000242) gBlock DNA fragment (Integrated DNA Technologies) was inserted by Gibson assembly into a linearized pcDNA4 plasmid (primers fwd (5’-CGTTTAAACTTAAGCTTCACC-3’) and rev (5’-TGAGGATCCACTAGTCCAGTG-3’)). The StrepII tag (5’-TGGAGCCATCCGCAGTTTGAAAAG-3’) was inserted C-terminal of amino acid 20 in pcDNA4-hMBL2 by inverse PCR mutagenesis using fwd primer (5’-TGGAGCCATCCGCAGTTTGAAAAGGAAACTGTGACCTGTGAGG-3’) and rev primer (5’-TTCTGAGTAAGACGCTGCC-3’). All plasmid sequences were verified by DNA sequencing (Quintara Biosciences).
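The tag and primer sequences above can be sanity-checked in a few lines. The following is a minimal sketch, not part of the cloning workflow itself: it translates both StrepII-tag coding sequences and checks how each tag is carried by the inverse PCR primers. The codon table is deliberately limited to the codons present in the two tags, and the helper names are illustrative.

```python
# Minimal codon table covering only the codons used in the two tag sequences.
CODONS = {
    "TGG": "W", "AGC": "S", "CAT": "H", "CCG": "P",
    "CAA": "Q", "CAG": "Q", "TTT": "F", "GAG": "E",
    "GAA": "E", "AAG": "K",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate an in-frame DNA string using the minimal codon table."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


# Sequences copied from the text above (written 5' to 3').
SPD_TAG = "TGGAGCCATCCGCAATTTGAGAAG"
MBL_TAG = "TGGAGCCATCCGCAGTTTGAAAAG"
SPD_FWD = "CAATTTGAGAAGAAGACCTACTCCCACAGAACA"
SPD_REV = "CGGATGGCTCCATGCTTCCAGGTAGCCC"
MBL_FWD = "TGGAGCCATCCGCAGTTTGAAAAGGAAACTGTGACCTGTGAGG"

spd_peptide = translate(SPD_TAG)
mbl_peptide = translate(MBL_TAG)

# SP-D: each primer carries half of the tag at its 5' end.
half = len(SPD_TAG) // 2
spd_rebuilt = reverse_complement(SPD_REV)[-half:] + SPD_FWD[:half]

# MBL: the forward primer carries the whole tag at its 5' end.
mbl_fwd_carries_tag = MBL_FWD.startswith(MBL_TAG)

print(spd_peptide, mbl_peptide, spd_rebuilt == SPD_TAG, mbl_fwd_carries_tag)
```

Both coding sequences translate to the same eight-residue StrepII peptide; they differ only in synonymous codon choice.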

HItln-1, MBL, and SP-D were each expressed by transient transfection of suspension-adapted HEK 293T cells. Cells were transfected at 1.8 × 10⁶ cells/mL in growth medium (DMEM (Thermo Fisher, cat. no. 11995) supplemented with 10% heat-inactivated FBS, 50 U/mL penicillin-streptomycin, 4 mM L-glutamine, and 1X non-essential amino acids) using Lipofectamine 2000 (Thermo Fisher) following the manufacturer’s protocol. Six hours after transfection, the culture medium was exchanged for FreeStyle F17 expression medium (Thermo Fisher) supplemented with 50 U/mL penicillin-streptomycin, 4 mM L-glutamine, 1X non-essential amino acids, 0.1% heat-inactivated FBS, and 0.1% Pluronic F-68 (Thermo Fisher). Transiently transfected cells were cultured for up to 4 days or until viability fell below 60%. The conditioned expression medium was then harvested by centrifugation and sterile filtration.

For purification, CaCl2 was added to the conditioned medium to a final concentration of 10 mM, and avidin (7 mg/mL) was added at 12 µL per mL of expression medium (IBA, cat. no. 2-0204-015, per the IBA protocol). Protein was captured onto 2 mL of Strep-Tactin Superflow High Capacity resin (IBA Lifesciences, cat. no. 2-1208-002) equilibrated with HEPES-Ca buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 10 mM CaCl2). The resin was then washed with HEPES-EDTA buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 1 mM EDTA), and protein was eluted with 5 mM d-desthiobiotin (Sigma) in HEPES-EDTA buffer. StrepII-tagged hItln-1 was concentrated with a 30,000-MWCO Amicon Ultra centrifugal filter. StrepII-tagged MBL and SP-D were concentrated with a 10,000-MWCO Vivaspin 6 centrifugal filter (GE). All proteins were buffer exchanged to HEPES-EDTA buffer for storage. Protein concentrations were determined by absorbance at 280 nm. Extinction coefficients and molecular weights were calculated for the mature trimeric product of each protein (without the signal peptide) using the ProtParam tool (web.expasy.org/protparam). StrepII-tagged hItln-1 had a calculated ε = 239,775 cm⁻¹ M⁻¹ and an estimated molecular mass of 102,024 Da. StrepII-tagged MBL had a calculated ε = 71,595 cm⁻¹ M⁻¹ and an estimated molecular mass of 75,183 Da. StrepII-tagged SP-D had a calculated ε = 68,505 cm⁻¹ M⁻¹ and an estimated molecular mass of 109,617 Da.
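As a worked illustration of how an A280 reading converts to a protein concentration, the sketch below applies the Beer-Lambert law with the hItln-1 trimer values stated above. The absorbance reading and the 1 cm path length are hypothetical assumptions, and the function name is illustrative.

```python
def concentration_from_a280(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: molar concentration c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


# Values for the StrepII-tagged hItln-1 trimer, taken from the text above.
HITLN1_EPSILON = 239_775   # cm^-1 M^-1
HITLN1_MASS_DA = 102_024   # g/mol

a280_reading = 0.48  # hypothetical absorbance reading, 1 cm path length assumed

molar = concentration_from_a280(a280_reading, HITLN1_EPSILON)
mg_per_ml = molar * HITLN1_MASS_DA  # (mol/L) * (g/mol) = g/L = mg/mL

print(f"{molar * 1e6:.2f} uM trimer, {mg_per_ml:.3f} mg/mL")
```

Because the extinction coefficient and mass both refer to the trimer, the molar concentration returned here is in trimer units.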

4.5.2 Direct labeling of lectins with fluorophores

Recombinant StrepII-tagged hItln-1 and MBL were directly conjugated with fluorophores using NHS-ester conjugation chemistry. HItln-1 was labeled with Alexa Fluor 647 NHS Ester (Succinimidyl Ester) (ThermoFisher Scientific) and MBL was labeled with Alexa Fluor 555 NHS Ester (Succinimidyl Ester) (ThermoFisher Scientific) following the manufacturer’s protocol, as follows: proteins were labeled in storage buffer (HEPES-EDTA, pH 7.4, as described above) by the addition of dye dissolved in DMSO at a ratio of 1:4 (w:w, dye:protein) and incubation for 1 hr at room temperature with mixing. Unreacted dye was removed by desalting using a Zeba 2 mL 7K MWCO Spin Desalting Column (ThermoFisher Scientific) following the manufacturer’s protocol. The molar ratio of fluorophore:protein was determined from the UV-Vis absorbance spectrum using the extinction coefficients ε(Alexa Fluor 555) = 155,000 cm⁻¹ M⁻¹ and ε(Alexa Fluor 647) = 270,000 cm⁻¹ M⁻¹.
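The fluorophore:protein molar ratio is commonly computed from two absorbance readings, as in the minimal sketch below. It shows one standard way to do this and is not necessarily the exact calculation used here. The absorbance values are hypothetical, and the correction factor for dye absorbance at 280 nm is an assumed parameter that should be taken from the dye manufacturer's documentation.

```python
def degree_of_labeling(a280, a_dye, eps_protein, eps_dye, cf280, path_cm=1.0):
    """Molar dye:protein ratio from UV-Vis absorbance readings.

    cf280 is the fraction of the dye's peak absorbance that it contributes
    at 280 nm; this contribution is subtracted from A280 before computing
    the protein concentration.
    """
    corrected_a280 = a280 - cf280 * a_dye
    if corrected_a280 <= 0:
        raise ValueError("corrected A280 must be positive")
    protein_molar = corrected_a280 / (eps_protein * path_cm)
    dye_molar = a_dye / (eps_dye * path_cm)
    return dye_molar / protein_molar


# Hypothetical readings for Alexa Fluor 647-labeled hItln-1 (not measured data).
dol = degree_of_labeling(
    a280=0.50,
    a_dye=0.80,            # absorbance at the dye maximum
    eps_protein=239_775,   # hItln-1 trimer, from the text
    eps_dye=270_000,       # Alexa Fluor 647, from the text
    cf280=0.03,            # ASSUMED correction factor; check the dye datasheet
)
print(f"about {dol:.2f} dye molecules per trimer")
```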

4.5.3 Preparation of human stool samples

Healthy human stool samples were provided as a homogenized suspension by the Alm Lab (MIT). The preparation was as follows. Stool samples from healthy subjects were brought into the anaerobic chamber within 2 hours of donation. Samples were homogenized in 1X PBS (pH 7.4), 0.1% L-cysteine at a ratio of 1 g stool:2.5 mL PBS. Glycerol (25% in 1X PBS, 0.1% L-cysteine) was added to the homogenate to a concentration of 12.5%, giving a final solution of 1 g stool in 5 mL of 1X PBS, 0.1% L-cysteine, and 12.5% glycerol. The homogenate was removed from the anaerobic chamber and stored at -80 °C.
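The glycerol arithmetic in the healthy-donor preparation can be checked with a short dilution calculation. This is a minimal sketch that assumes the volume contributed by the stool itself is neglected, as the stated 5 mL final volume implies; the function name is illustrative.

```python
def stock_volume_for_final(sample_ml, stock_percent, final_percent):
    """Volume of stock to add to a stock-free sample to reach final_percent.

    Solves stock_percent * v = final_percent * (sample_ml + v) for v.
    """
    if not 0 < final_percent < stock_percent:
        raise ValueError("final concentration must be between 0 and the stock")
    return sample_ml * final_percent / (stock_percent - final_percent)


pbs_ml_per_gram = 2.5  # 1 g stool : 2.5 mL PBS, from the text
glycerol_ml = stock_volume_for_final(pbs_ml_per_gram, stock_percent=25.0,
                                     final_percent=12.5)
total_ml = pbs_ml_per_gram + glycerol_ml

print(f"add {glycerol_ml:.1f} mL of 25% glycerol per gram; total {total_ml:.1f} mL")
```

Adding an equal volume of the 25% stock halves the glycerol concentration to 12.5% and doubles the liquid volume to 5 mL per gram of stool, matching the stated final composition.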

Clinical samples were provided as a homogenized suspension by the Xavier Lab (Broad Institute of MIT and Harvard). The preparation was as follows. Stool was frozen at -80 °C upon collection. For processing prior to use, samples were thawed on ice and brought into the anaerobic chamber. Stool (0.25 - 1 g) was transferred to a gentleMACS C-tube containing 10 mL of 1X PBS, 40% glycerol, and 0.001% cysteine. Samples were homogenized in a Miltenyi gentleMACS Dissociator on the intestine setting for 60 seconds at increasing speed. The homogenized stool suspension was aliquoted, removed from the anaerobic chamber, and stored at -80 °C.

4.5.4 Flow cytometry and FACS of donor stool samples

All buffers were sterile filtered before use and all centrifugation steps were performed at 5,000 × g at 4 °C. Stool homogenate was thawed on ice, and 200 µL of homogenate was washed 2X with 2 mL PBS-BSA-T (PBS pH 7.4 (Gibco), 0.1% (w/v) BSA (US Biological; A1311), 0.05% (v/v) Tween-20 (Sigma)), pelleted, and resuspended in PBS-BSA-T. The material was then passed through a 35 µm cell strainer cap (Falcon), pelleted, and resuspended in 2 mL HEPES-Ca-BSA-T (20 mM HEPES pH 7.4, 150 mM NaCl, 10 mM CaCl2, 0.1% BSA, 0.05% Tween-20). A 5 µL sample of the bacterial suspension was pelleted and frozen as the input sample for metagenomic sequencing. Staining was performed at an OD600 of 0.2 for all samples. For the unstained samples, the staining solution was HEPES-Ca-BSA-T with SYTO BC (1:1000, Thermo Fisher Scientific) and StrepMAB classic DY649-conjugate (1:150, IBA Lifesciences). Lectin-stained Ca2+ samples were stained in HEPES-Ca-BSA-T with SYTO BC (1:1000), StrepMAB classic DY649-conjugate (1:150), and 20 µg/mL recombinant StrepII-tagged lectin. Lectin-stained EDTA control samples were stained in HEPES-EDTA-BSA-T (20 mM HEPES pH 7.4, 150 mM NaCl, 1 mM EDTA, 0.1% BSA, 0.05% Tween-20) with SYTO BC (1:1000), StrepMAB classic DY649-conjugate (1:150), and 20 µg/mL recombinant StrepII-tagged lectin. Staining was performed at 4 °C for four hours, after which samples were diluted 10X for analysis on a BD LSRII HTS flow cytometer with BD FACSDiva software, or 6X for FACS on a FACS Aria contained in a biosafety cabinet, with BD FACSDiva software. Flow cytometry was performed at the Swanson Biotechnology Center Flow Cytometry Facility housed in the Koch Institute (Massachusetts Institute of Technology, Cambridge, MA).
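For bench planning, the staining conditions above translate into component volumes as in the following minimal sketch. The 200 µL reaction volume and the 1 mg/mL lectin stock are hypothetical, and a "1:N" dilution is assumed to mean one part reagent in N parts total volume.

```python
def staining_mix(total_ul, lectin_stock_ug_per_ml, lectin_final_ug_per_ml=20.0,
                 dye_dilution=1000, antibody_dilution=150):
    """Return component volumes (in uL) for one lectin staining reaction."""
    dye_ul = total_ul / dye_dilution
    antibody_ul = total_ul / antibody_dilution
    lectin_ul = total_ul * lectin_final_ug_per_ml / lectin_stock_ug_per_ml
    buffer_ul = total_ul - dye_ul - antibody_ul - lectin_ul
    if buffer_ul < 0:
        raise ValueError("lectin stock is too dilute for this reaction volume")
    return {
        "SYTO BC": dye_ul,
        "StrepMAB": antibody_ul,
        "lectin": lectin_ul,
        "buffer": buffer_ul,
    }


# Hypothetical reaction: 200 uL total, lectin stock at 1 mg/mL (1000 ug/mL).
mix = staining_mix(total_ul=200.0, lectin_stock_ug_per_ml=1000.0)
for name, volume in mix.items():
    print(f"{name}: {volume:.2f} uL")
```

The same function covers the Ca2+ and EDTA conditions, since only the buffer composition differs between them.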

For the 16S sequencing analysis, 1,000,000 cells were sorted for all fractions (unstained, hItln-1-positive, and hItln-1-negative). For metagenomics, 1,500,000 cells were collected for the negative and lectin-negative samples. For hItln-1- and V109D-positive fractions, 300,000 - 500,000 cells were collected. For SP-D-positive fractions, 40,000 - 60,000 cells were collected. Sorted cells were centrifuged at 7,000 × g at 4 °C for 5 minutes, the supernatant was removed, and the cell pellet was stored at -20 °C for nucleic acid extraction.

4.5.5 Fluorescence Microscopy

Human donor stool sample aliquots were washed and resuspended in 1 mL of sterile 1X PBS, and 100 µL aliquots of bacterial samples at OD600 = 0.2 were made. Following centrifugation at 3,000 × g for 5 min, the supernatant was removed and the cell pellet was resuspended in the respective staining solution. StrepII-tagged lectin was added to each sample at a final concentration of 15 µg/mL in HEPES-Ca or HEPES-EDTA containing 0.1% BSA and 0.1% Tween-20 for two hours with a 1:250 dilution of StrepMAB-Classic Oyster 645, and samples were counterstained with SYTO BC (1:1000). Each sample was transferred to a 35 mm glass bottom MatTek poly-lysine coated plate (P35GC-1.5-14-C) and allowed to incubate at 4 °C for one hour. Immunofluorescence images were captured at 25 °C using a Zeiss AxioVert 200M inverted confocal microscope equipped with a Yokogawa CSU-22 spinning disk confocal scan head and a Hamamatsu Orca-ER cooled CCD camera. Images were processed using the open-source Fiji distribution of ImageJ, and brightness and contrast were adjusted in the control sample. Images were then converted to an RGB format to preserve normalization and assembled into panels.

4.5.6 Nucleic acid extraction

Cell pellets were lysed by addition of 50 µL HotShot lysis buffer (25 mM NaOH, 0.2 mM EDTA, pH 12) and heating to 95 °C for 10 minutes, followed by addition of an equal volume of HotShot neutralization buffer (40 mM Tris-HCl, pH 5) and vortexing to combine. Lysed samples were centrifuged at 3,000 × g for 10 minutes to pellet debris and transferred to 96-well plates. The genomic DNA was then purified by AMPure bead cleanup as follows. Beads (90 µL) were added to 100 µL of DNA and incubated for 13 minutes at room temperature. Beads were separated on a magnet for 2 minutes and the supernatant was removed. Beads were washed 2X with 200 µL EtOH and air dried for 20 minutes while still on the magnet. The elution was performed by resuspending the beads thoroughly in 30 µL H2O and incubating for 7 minutes at room temperature. Beads were separated on a magnet and 27 µL of eluted DNA was transferred to a sterile plate.

4.5.7 16S Sequencing

Library Preparation and sequencing

16S libraries for Illumina paired-end sequencing were constructed using a two-step PCR approach as described previously (all primer sequences found in reference).48 Prior to first-step PCR, real-time qPCR was performed to determine the number of amplification cycles needed for uniform amplification. qPCR reactions were assembled at 25 μL reaction volumes as follows: DNA-free H2O, 12.5 μL; high fidelity (HF) buffer, 5 μL; deoxynucleotide triphosphates (dNTPs), 0.5 μL; PE16S_V4_U515_F (3 μM), 2.5 μL; PE16S_V4_E786_R (3 μM), 2.5 μL; BSA (20 mg/mL, NEB), 0.625 μL; EvaGreen (20X), 1.25 μL; Phusion HF (NEB), 0.2 μL; and template DNA, 2 μL. Reactions were run with the following conditions: 98°C for 2 min (initial denaturation); 40 cycles of: 98°C for 30 s (denaturation); 52°C for 30 s (annealing); and 72°C for 30 s (extension). Based on qPCR results, samples with Ct values below 20 were diluted to a Ct value of 20. Because of the low input number of cells and therefore DNA, many samples had Ct values above 20, but first-step PCR was run at 20 cycles to avoid overamplification in negative controls.
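The dilution needed to bring a sample to a target Ct can be estimated from the Ct difference. The sketch below assumes ideal amplification, in which template doubles every cycle (efficiency of 1.0); real efficiencies are usually somewhat lower, so the result is an approximation. The function name and default arguments are illustrative.

```python
def dilution_factor_to_target_ct(ct, target_ct=20.0, efficiency=1.0):
    """Fold dilution that shifts a sample from `ct` up to `target_ct`.

    Each cycle multiplies template by (1 + efficiency), so a sample that
    crosses threshold d cycles early carries (1 + efficiency) ** d times
    more template. Samples at or above the target are left undiluted.
    """
    if ct >= target_ct:
        return 1.0
    return (1.0 + efficiency) ** (target_ct - ct)


for sample_ct in (17.0, 19.0, 23.5):
    factor = dilution_factor_to_target_ct(sample_ct)
    print(f"Ct {sample_ct}: dilute {factor:g}-fold")
```

Under this assumption, a sample with a Ct of 17 would be diluted 8-fold, while samples with Ct values above 20 are used undiluted.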

First-step PCR was run in triplicate for each sample at 25 μL reaction volumes with 20 amplification cycles under the same thermocycler conditions as the qPCR above. The reactions were set up as follows: DNA-free H2O, 8.5 μL; high fidelity (HF) buffer, 5 μL; deoxynucleotide triphosphates (dNTPs), 0.5 μL; PE16S_V4_U515_F (3 μM), 3.3 μL; PE16S_V4_E786_R (3 μM), 3.3 μL; Phusion HF (NEB), 0.25 μL; and template DNA, 2 μL. Triplicate PCR samples were pooled and cleaned using Agencourt AMPure XP-PCR purification (Beckman Coulter, Brea, CA) following the manufacturer’s instructions. Second-step PCR was run as a single PCR per sample in 25 μL reaction volumes as follows: DNA-free H2O, 10.65 μL; HF buffer, 5 μL; dNTPs, 0.5 μL; PE-III-PCR-F (3 μM), 3.3 μL; PE-III-PCR-XXX (3 μM), 3.3 μL; Phusion, 0.25 μL; and first-step PCR DNA, 2 μL. Reactions were cycled under the conditions: 98°C for 30 s (initial denaturation); 10 cycles of: 98°C for 30 s (denaturation); 83°C for 30 s (annealing); and 72°C for 30 s (extension). The second-step PCR was then cleaned up using Agencourt AMPure XP-PCR purification. Following clean-up of second-step PCR, PCR product quality was assessed by gel electrophoresis on an Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA). DNA concentration was measured by Quant-iT PicoGreen (ThermoFisher Scientific), and samples were normalized, multiplexed together, and sequenced on an Illumina MiSeq with 2x250-bp paired-end reads at the MIT BioMicro Center (Massachusetts Institute of Technology, Cambridge, MA).

16S Sequence Data Analysis

16S data were analyzed using QIIME 2 and DADA2.49, 50 Taxonomic labels were added to 16S sequences using the SILVA database.51

4.5.8 Metagenomic sequencing

Metagenomics Library Construction and Sequencing

Purified DNA was used as input (1 µL) into a miniaturized version of the Nextera-XT Library Preparation Kit (Illumina Inc.). All reactions were scaled to one-fourth their original volumes. Libraries were constructed according to the manufacturer's instructions with several modifications to accommodate low DNA concentrations. The amplicon tagmentation mix (ATM) was diluted 1:10 in tagmentation DNA buffer (TD) to reduce the tagmentase:DNA ratio. Tagmentation time was also reduced from 5 minutes to 1 minute. Both modifications were implemented to increase the insert size, thereby reducing read overlap and allowing for sampling of a larger proportion of the nucleotide sequence space. Lastly, the number of cycles in the library amplification PCR was increased from 12 to 20 in order to generate sufficient product for sequencing. Individual sample libraries were pooled at equimolar concentration and sequenced on a HiSeq 2500 at 200 cycles for 2x100 paired-end reads.

Metagenomic Profiling and Downstream Analysis

Raw sequencing reads were first processed through CutAdapt (version 1.7.1)52 and Trimmomatic53 (version 0.3.3) to remove Nextera adapters, low-quality bases, and low-quality reads. Trimmed reads were then filtered for human contamination (hg19) with KneadData (version 0.5.1). These filtered reads were then run through MetaPhlAn2 (version 2.6.0) for taxonomic profiling.54, 55 Relative taxonomic abundances from MetaPhlAn2 were imported into the phyloseq package (version 1.30.0) for further refinement and visualization (metaphlanToPhyloseq.R source code available at: https://gist.github.com/lwaldron/512d1925a8102e921f05c5b25de7ec94).56 Relative abundance data were agglomerated to either the genus or species level and used for all downstream analysis. Alpha diversity was measured using Shannon’s Diversity.57 Multidimensional scaling (MDS) plots and heatmaps were derived from Bray-Curtis dissimilarity indices.
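The two community metrics named above can be written out explicitly. The following minimal Python sketch is independent of the phyloseq workflow used in this chapter and uses hypothetical abundance profiles. It assumes the natural logarithm for Shannon diversity and the abundance-based form of Bray-Curtis dissimilarity.

```python
from math import log


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((x / total) * log(x / total) for x in abundances if x > 0)


def bray_curtis(first, second):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(first) != len(second):
        raise ValueError("profiles must cover the same taxa in the same order")
    denominator = sum(first) + sum(second)
    if denominator == 0:
        raise ValueError("at least one profile must be non-empty")
    return sum(abs(a - b) for a, b in zip(first, second)) / denominator


# Hypothetical relative abundances (%) for four taxa in two fractions.
input_profile = [50.0, 30.0, 20.0, 0.0]
sorted_profile = [10.0, 20.0, 30.0, 40.0]

h_input = shannon_diversity(input_profile)
h_sorted = shannon_diversity(sorted_profile)
dissimilarity = bray_curtis(input_profile, sorted_profile)
print(f"H(input) = {h_input:.3f}, H(sorted) = {h_sorted:.3f}, "
      f"Bray-Curtis = {dissimilarity:.2f}")
```

A Bray-Curtis value of 0 indicates identical profiles and a value of 1 indicates profiles that share no taxa.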

4.6 Acknowledgements

This work would not have been possible without terrific effort from many individuals.

First, I would like to thank Professor Laura L. Kiessling for the push to really get this project off the ground, and continued support through the process of troubleshooting. I must acknowledge the tremendous effort of Smrithi Raman, an undergraduate who worked every day to make enough protein for us to get these samples sorted. Even though her time on the wet lab side of the project was cut short by COVID-related shutdowns, I am so grateful for her effort and continued dedication. I am also grateful to Dr. (Robert) Lyle McPherson for keeping me level-headed and spending hours running FACS with me. I would also like to thank Professor Eric Alm and Dr. Ramnik Xavier for access to donor stool samples and equipment for library preparation. I would like to thank Tu Nguyen for teaching me 16S library prep and Qiime2 analysis, as well as encouragement to have faith in my very small samples. I would also like to thank the Koch Institute Flow Core staff, especially Michele Griffin for training me to use the FACS Aria instruments. I also would like to thank Dr. Eric Brown and Dr. Hera Vlamakis for tips, thoughtful discussions, and data analysis. I also must thank Hera a second time for her work coordinating samples and sequencing with the MOC, and Dallis Sergio for prepping clinical samples. Finally, I would like to thank Tony Gaca for his effort to adapt the metagenomic sequencing library construction for our small sample sizes. Last but not least, I would like to share my gratitude for Dr. Darryl Wesener for planting the seed of this project when I first joined the Kiessling Lab, and for his continued interest in this project since he graduated from the lab.

And special thanks to VMM and JP for edits to this chapter.

4.7 References

1. Turnbaugh, P. J.; Ley, R. E.; Hamady, M.; Fraser-Liggett, C. M.; Knight, R.; Gordon, J. I., The Human Microbiome Project. Nature 2007, 449 (7164), 804-810.

2. Bäckhed, F.; Ley, R. E.; Sonnenburg, J. L.; Peterson, D. A.; Gordon, J. I., Host-Bacterial Mutualism in the Human Intestine. Science 2005, 307 (5717), 1915.

3. Gilbert, J. A.; Quinn, R. A.; Debelius, J.; Xu, Z. Z.; Morton, J.; Garg, N.; Jansson, J. K.; Dorrestein, P. C.; Knight, R., Microbiome-wide association studies link dynamic microbial consortia to disease. Nature 2016, 535 (7610), 94-103.

4. Lis, H.; Sharon, N., Lectins: Carbohydrate-Specific Proteins That Mediate Cellular Recognition. Chemical Reviews 1998, 98 (2), 637-674.

5. Weis, W. I.; Drickamer, K., Structural Basis Of Lectin-Carbohydrate Recognition. Annual Review of Biochemistry 1996, 65 (1), 441-473.

6. Wesener, D. A.; Dugan, A.; Kiessling, L. L., Recognition of microbial glycans by soluble human lectins. Current Opinion in Structural Biology 2017, 44, 168-178.

7. Fujita, T., Evolution of the lectin–complement pathway and its role in innate immunity. Nature Reviews Immunology 2002, 2 (5), 346-353.

8. Stowell, S. R.; Arthur, C. M.; Dias-Baruffi, M.; Rodrigues, L. C.; Gourdine, J.-P.; Heimburg-Molinaro, J.; Ju, T.; Molinaro, R. J.; Rivera-Marrero, C.; Xia, B.; Smith, D. F.; Cummings, R. D., Innate immune lectins kill bacteria expressing blood group antigen. Nature Medicine 2010, 16 (3), 295-301.

9. Vaishnava, S.; Yamamoto, M.; Severson, K. M.; Ruhn, K. A.; Yu, X.; Koren, O.; Ley, R.; Wakeland, E. K.; Hooper, L. V., The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine. Science 2011, 334 (6053), 255.

10. Wesener, D. A.; Wangkanont, K.; McBride, R.; Song, X.; Kraft, M. B.; Hodges, H. L.; Zarling, L. C.; Splain, R. A.; Smith, D. F.; Cummings, R. D.; Paulson, J. C.; Forest, K. T.; Kiessling, L. L., Recognition of microbial glycans by human intelectin-1. Nature Structural & Molecular Biology 2015, 22 (8), 603-610.

11. Tsuji, S.; Uehori, J.; Matsumoto, M.; Suzuki, Y.; Matsuhisa, A.; Toyoshima, K.; Seya, T., Human Intelectin Is a Novel Soluble Lectin That Recognizes Galactofuranose in Carbohydrate Chains of Bacterial Cell Wall. Journal of Biological Chemistry 2001, 276 (26), 23456-23463.

12. Pemberton, A. D.; Rose-Zerilli, M. J.; Holloway, J. W.; Gray, R. D.; Holgate, S. T., A single-nucleotide polymorphism in intelectin 1 is associated with increased asthma risk. Journal of Allergy and Clinical Immunology 2008, 122 (5), 1033-1034.

13. Barrett, J. C.; Hansoul, S.; Nicolae, D. L.; Cho, J. H.; Duerr, R. H.; Rioux, J. D.; Brant, S. R.; Silverberg, M. S.; Taylor, K. D.; Barmada, M. M.; Bitton, A.; Dassopoulos, T.; Datta, L. W.; Green, T.; Griffiths, A. M.; Kistner, E. O.; Murtha, M. T.; Regueiro, M. D.; Rotter, J. I.; Schumm, L. P.; Steinhart, A. H.; Targan, S. R.; Xavier, R. J.; the, NIDDK IBD Genetics Consortium; Libioulle, C.; Sandor, C.; Lathrop, M.; Belaiche, J.; Dewit, O.; Gut, I.; Heath, S.; Laukens, D.; Mni, M.; Rutgeerts, P.; Van Gossum, A.; Zelenika, D.; Franchimont, D.; Hugot, J.-P.; de Vos, M.; Vermeire, S.; Louis, E.; the Belgian-French IBD Consortium; the Wellcome Trust Case Control Consortium; Cardon, L. R.; Anderson, C. A.; Drummond, H.; Nimmo, E.; Ahmad, T.; Prescott, N. J.; Onnie, C. M.; Fisher, S. A.; Marchini, J.; Ghori, J.; Bumpstead, S.; Gwilliam, R.; Tremelling, M.; Deloukas, P.; Mansfield, J.; Jewell, D.; Satsangi, J.; Mathew, C. G.; Parkes, M.; Georges, M.; Daly, M. J., Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nature Genetics 2008, 40 (8), 955-962.

14. Neth, O.; Jack, D. L.; Dodds, A. W.; Holzel, H.; Klein, N. J.; Turner, M. W., Mannose-Binding Lectin Binds to a Range of Clinically Relevant Microorganisms and Promotes Complement Deposition. Infection and Immunity 2000, 68 (2), 688.

15. Jack, D. L.; Klein, N. J.; Turner, M. W., Mannose-binding lectin: targeting the microbial world for complement attack and opsonophagocytosis. Immunological Reviews 2001, 180 (1), 86-99.

16. Crouch, E. C., Surfactant protein-D and pulmonary host defense. Respiratory Research 2000, 1 (2), 93-108.

17. Sorensen, G. L., Surfactant Protein D in Respiratory and Non-Respiratory Diseases. Frontiers in Medicine 2018, 5 (18).

18. Wright, J. R., Immunoregulatory functions of surfactant proteins. Nature Reviews Immunology 2005, 5 (1), 58-68.

19. McMahon, C. M.; Isabella, C. R.; Windsor, I. W.; Kosma, P.; Raines, R. T.; Kiessling, L. L., Stereoelectronic Effects Impact Glycan Recognition. Journal of the American Chemical Society 2020, 142 (5), 2386-2395.

20. Eddie Ip, W. K.; Takahashi, K.; Alan Ezekowitz, R.; Stuart, L. M., Mannose-binding lectin and innate immunity. Immunological Reviews 2009, 230 (1), 9-21.

21. Clark, H. W.; Mackay, R.-M.; Deadman, M. E.; Hood, D. W.; Madsen, J.; Moxon, E. R.; Townsend, J. P.; Reid, K. B. M.; Ahmed, A.; Shaw, A. J.; Greenhough, T. J.; Shrive, A. K., Crystal Structure of a Complex of Surfactant Protein D (SP-D) and Haemophilus influenzae Lipopolysaccharide Reveals Shielding of Core Structures in SP-D-Resistant Strains. Infection and Immunity 2016, 84 (5), 1585-1592.

22. Wang, H.; Head, J.; Kosma, P.; Brade, H.; Müller-Loennies, S.; Sheikh, S.; McDonald, B.; Smith, K.; Cafarella, T.; Seaton, B.; Crouch, E., Recognition of Heptoses and the Inner Core of Bacterial Lipopolysaccharides by Surfactant Protein D. Biochemistry 2008, 47 (2), 710-720.

23. Dommett, R. M.; Klein, N.; Turner, M. W., Mannose-binding lectin in innate immunity: past, present and future. Tissue Antigens 2006, 68 (3), 193-209.

24. Hehemann, J.-H.; Correc, G.; Barbeyron, T.; Helbert, W.; Czjzek, M.; Michel, G., Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota. Nature 2010, 464 (7290), 908-912.

25. Hehemann, J.-H.; Kelly, A. G.; Pudlo, N. A.; Martens, E. C.; Boraston, A. B., Bacteria of the human gut microbiome catabolize red seaweed glycans with carbohydrate-active enzyme updates from extrinsic microbes. Proceedings of the National Academy of Sciences of the United States of America 2012, 109 (48), 19786.

26. Gevers, D.; Kugathasan, S.; Denson, L. A.; Vázquez-Baeza, Y.; Van Treuren, W.; Ren, B.; Schwager, E.; Knights, D.; Song, S. J.; Yassour, M.; Morgan, X. C.; Kostic, A. D.; Luo, C.; González, A.; McDonald, D.; Haberman, Y.; Walters, T.; Baker, S.; Rosh, J.; Stephens, M.; Heyman, M.; Markowitz, J.; Baldassano, R.; Griffiths, A.; Sylvester, F.; Mack, D.; Kim, S.; Crandall, W.; Hyams, J.; Huttenhower, C.; Knight, R.; Xavier, R. J., The Treatment-Naive Microbiome in New-Onset Crohn’s Disease. Cell Host & Microbe 2014, 15 (3), 382-392.

27. Kowalska-Duplaga, K.; Gosiewski, T.; Kapusta, P.; Sroka-Oleksiak, A.; Wędrychowicz, A.; Pieczarkowski, S.; Ludwig-Słomczyńska, A. H.; Wołkow, P. P.; Fyderek, K., Differences in the intestinal microbiome of healthy children and patients with newly diagnosed Crohn’s disease. Scientific Reports 2019, 9 (1), 18880.

28. Vich Vila, A.; Imhann, F.; Collij, V.; Jankipersadsing, S. A.; Gurry, T.; Mujagic, Z.; Kurilshikov, A.; Bonder, M. J.; Jiang, X.; Tigchelaar, E. F.; Dekens, J.; Peters, V.; Voskuil, M. D.; Visschedijk, M. C.; van Dullemen, H. M.; Keszthelyi, D.; Swertz, M. A.; Franke, L.; Alberts, R.; Festen, E. A. M.; Dijkstra, G.; Masclee, A. A. M.; Hofker, M. H.; Xavier, R. J.; Alm, E. J.; Fu, J.; Wijmenga, C.; Jonkers, D. M. A. E.; Zhernakova, A.; Weersma, R. K., Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Science Translational Medicine 2018, 10 (472), eaap8914.

29. Brown, G. D.; Willment, J. A.; Whitehead, L., C-type lectins in immunity and homeostasis. Nature Reviews Immunology 2018, 18 (6), 374-389.

30. Bajic, G.; Degn, S. E.; Thiel, S.; Andersen, G. R., Complement activation, regulation, and molecular basis for complement-related diseases. The EMBO Journal 2015, 34 (22), 2735-2757.

31. Bottazzi, B.; Doni, A.; Garlanda, C.; Mantovani, A., An Integrated View of Humoral Innate Immunity: Pentraxins as a Paradigm. Annual Review of Immunology 2010, 28 (1), 157-183.

32. Vasta, G. R., Roles of galectins in infection. Nature Reviews Microbiology 2009, 7 (6), 424-438.

33. Holmskov, U.; Thiel, S.; Jensenius, J. C., Collectins and Ficolins: Humoral Lectins of the Innate Immune Defense. Annual Review of Immunology 2003, 21 (1), 547-578.

34. Pang, X.; Xiao, X.; Liu, Y.; Zhang, R.; Liu, J.; Liu, Q.; Wang, P.; Cheng, G., Mosquito C-type lectins maintain gut microbiome homeostasis. Nature Microbiology 2016, 1 (5), 16023.

35. Dinh, C.; Farinholt, T.; Hirose, S.; Zhuchenko, O.; Kuspa, A., Lectins modulate the microbiota of social amoebae. Science 2018, 361 (6400), 402.

36. Lozupone, C.; Faust, K.; Raes, J.; Faith, J. J.; Frank, D. N.; Zaneveld, J.; Gordon, J. I.; Knight, R., Identifying genomic and metabolic features that can underlie early successional and opportunistic lifestyles of human gut symbionts. Genome Research 2012, 22 (10), 1974-1984.

37. Van den Abbeele, P.; Belzer, C.; Goossens, M.; Kleerebezem, M.; De Vos, W. M.; Thas, O.; De Weirdt, R.; Kerckhof, F.-M.; Van de Wiele, T., Butyrate-producing Clostridium cluster XIVa species specifically colonize mucins in an in vitro gut model. The ISME Journal 2013, 7 (5), 949-961.

38. Pemberton, A. D.; Verdon, B.; Inglis, N. F.; Pearson, J. P., Sheep intelectin-2 co-purifies with the mucin Muc5ac from gastric mucus. Research in Veterinary Science 2011, 91 (3), e53-e57.

39. Miquel, S.; Martín, R.; Rossi, O.; Bermúdez-Humarán, L. G.; Chatel, J. M.; Sokol, H.; Thomas, M.; Wells, J. M.; Langella, P., Faecalibacterium prausnitzii and human intestinal health. Current Opinion in Microbiology 2013, 16 (3), 255-261.

40. Apisarnthanarak, A.; Fraser, V. J.; Dunne, W. M.; Little, J. R.; Hoppe-Bauer, J.; Mayfield, J. L.; Polish, L. B., Stenotrophomonas maltophilia Intestinal Colonization in Hospitalized Oncology Patients with Diarrhea. Clinical Infectious Diseases 2003, 37 (8), 1131-1135.

41. Hellmig, S.; Ott, S.; Musfeldt, M.; Kosmahl, M.; Rosenstiel, P.; Stüber, E.; Hampe, J.; Fölsch, U. R.; Schreiber, S., Life-Threatening Chronic Enteritis Due to Colonization of the Small Bowel With Stenotrophomonas maltophilia. Gastroenterology 2005, 129 (2), 706-712.

42. Gellatly, S. L.; Hancock, R. E. W., Pseudomonas aeruginosa: new insights into pathogenesis and host defenses. Pathogens and Disease 2013, 67 (3), 159-173.

43. Kerckhoffs, A. P. M.; Ben-Amor, K.; Samsom, M.; van der Rest, M. E.; de Vogel, J.; Knol, J.; Akkermans, L. M. A., Molecular analysis of faecal and duodenal samples reveals significantly higher prevalence and numbers of Pseudomonas aeruginosa in irritable bowel syndrome. Journal of Medical Microbiology 2011, 60 (2), 236-245.

44. Ryan, R. P.; Fouhy, Y.; Garcia, B. F.; Watt, S. A.; Niehaus, K.; Yang, L.; Tolker-Nielsen, T.; Dow, J. M., Interspecies signalling via the Stenotrophomonas maltophilia diffusible signal factor influences biofilm formation and polymyxin tolerance in Pseudomonas aeruginosa. Molecular Microbiology 2008, 68 (1), 75-86.

45. Sartor, R. B., Enteric Microflora in IBD: Pathogens or Commensals? Inflammatory Bowel Diseases 1997, 3 (3), 230-235.

46. Atarashi, K.; Suda, W.; Luo, C.; Kawaguchi, T.; Motoo, I.; Narushima, S.; Kiguchi, Y.; Yasuma, K.; Watanabe, E.; Tanoue, T.; Thaiss, C. A.; Sato, M.; Toyooka, K.; Said, H. S.; Yamagami, H.; Rice, S. A.; Gevers, D.; Johnson, R. C.; Segre, J. A.; Chen, K.; Kolls, J. K.; Elinav, E.; Morita, H.; Xavier, R. J.; Hattori, M.; Honda, K., Ectopic colonization of oral bacteria in the intestine drives TH1-cell induction and inflammation. Science 2017, 358 (6361), 359-365.

47. Whitfield, C.; Szymanski, C. M.; Aebi, M., Eubacteria. In Essentials of Glycobiology, 3rd ed.; Varki, A.; Cummings, R. D.; Esko, J. D.; et al., Eds. Cold Spring Harbor Laboratory Press: Cold Spring Harbor (NY), 2017.

48. Kearney, S. M.; Gibbons, S. M.; Erdman, S. E.; Alm, E. J., Orthogonal Dietary Niche Enables Reversible Engraftment of a Gut Bacterial Commensal. Cell Reports 2018, 24 (7), 1842-1851.

49. Bolyen, E.; Rideout, J. R.; Dillon, M. R.; Bokulich, N. A.; Abnet, C. C.; Al-Ghalith, G. A.; Alexander, H.; Alm, E. J.; Arumugam, M.; Asnicar, F.; Bai, Y.; Bisanz, J. E.; Bittinger, K.; Brejnrod, A.; Brislawn, C. J.; Brown, C. T.; Callahan, B. J.; Caraballo-Rodríguez, A. M.; Chase, J.; Cope, E. K.; Da Silva, R.; Diener, C.; Dorrestein, P. C.; Douglas, G. M.; Durall, D. M.; Duvallet, C.; Edwardson, C. F.; Ernst, M.; Estaki, M.; Fouquier, J.; Gauglitz, J. M.; Gibbons, S. M.; Gibson, D. L.; Gonzalez, A.; Gorlick, K.; Guo, J.; Hillmann, B.; Holmes, S.; Holste, H.; Huttenhower, C.; Huttley, G. A.; Janssen, S.; Jarmusch, A. K.; Jiang, L.; Kaehler, B. D.; Kang, K. B.; Keefe, C. R.; Keim, P.; Kelley, S. T.; Knights, D.; Koester, I.; Kosciolek, T.; Kreps, J.; Langille, M. G. I.; Lee, J.; Ley, R.; Liu, Y.-X.; Loftfield, E.; Lozupone, C.; Maher, M.; Marotz, C.; Martin, B. D.; McDonald, D.; McIver, L. J.; Melnik, A. V.; Metcalf, J. L.; Morgan, S. C.; Morton, J. T.; Naimey, A. T.; Navas-Molina, J. A.; Nothias, L. F.; Orchanian, S. B.; Pearson, T.; Peoples, S. L.; Petras, D.; Preuss, M. L.; Pruesse, E.; Rasmussen, L. B.; Rivers, A.; Robeson, M. S.; Rosenthal, P.; Segata, N.; Shaffer, M.; Shiffer, A.; Sinha, R.; Song, S. J.; Spear, J. R.; Swafford, A. D.; Thompson, L. R.; Torres, P. J.; Trinh, P.; Tripathi, A.; Turnbaugh, P. J.; Ul-Hasan, S.; van der Hooft, J. J. J.; Vargas, F.; Vázquez-Baeza, Y.; Vogtmann, E.; von Hippel, M.; Walters, W.; Wan, Y.; Wang, M.; Warren, J.; Weber, K. C.; Williamson, C. H. D.; Willis, A. D.; Xu, Z. Z.; Zaneveld, J. R.; Zhang, Y.; Zhu, Q.; Knight, R.; Caporaso, J. G., Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 2019, 37 (8), 852-857.

50. Callahan, B. J.; McMurdie, P. J.; Rosen, M. J.; Han, A. W.; Johnson, A. J. A.; Holmes, S. P., DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 2016, 13 (7), 581-583.

51. Quast, C.; Pruesse, E.; Yilmaz, P.; Gerken, J.; Schweer, T.; Yarza, P.; Peplies, J.; Glöckner, F. O., The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 2012, 41 (D1), D590-D596.

52. Martin, M., Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 2011, 17 (1), 10-12.

53. Bolger, A. M.; Lohse, M.; Usadel, B., Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30 (15), 2114-2120.

54. McIver, L. J.; Abu-Ali, G.; Franzosa, E. A.; Schwager, R.; Morgan, X. C.; Waldron, L.; Segata, N.; Huttenhower, C., bioBakery: a meta’omic analysis environment. Bioinformatics 2017, 34 (7), 1235-1237.

55. Truong, D. T.; Franzosa, E. A.; Tickle, T. L.; Scholz, M.; Weingart, G.; Pasolli, E.; Tett, A.; Huttenhower, C.; Segata, N., MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature Methods 2015, 12 (10), 902-903.

56. McMurdie, P. J.; Holmes, S., phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLOS ONE 2013, 8 (4), e61217.

57. Shannon, C. E., A Mathematical Theory of Communication. Bell System Technical Journal 1948, 27 (3), 379-423.

4.8 Supplemental information

Figure 4-1S. Species-level enrichment plot of lectin-SEQ with metagenomics. Enrichment plot depicting enrichment (green circles), depletion (red circles), or no change (blue circles) of species in the lectin+ fraction for hItln-1, V109D, and SP-D across donors. absindex = [relative abundance (lectin+) – relative abundance (lectin–)] / [relative abundance (lectin+) + relative abundance (lectin–)].
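The index in the caption can be illustrated with a short calculation. The sketch below is a minimal Python example, not the analysis code used for the figure. It assumes the index is the difference of the two relative abundances divided by their sum. The function name `enrichment_index` and the handling of a species absent from both fractions (returning `None`) are illustrative assumptions, as are the input values.

```python
from typing import Optional


def enrichment_index(lectin_pos: float, lectin_neg: float) -> Optional[float]:
    """Return (lectin+ - lectin-) / (lectin+ + lectin-) for one species.

    lectin_pos and lectin_neg are the relative abundances of the species
    in the lectin+ and lectin- fractions. The result ranges from -1
    (found only in the lectin- fraction) to +1 (found only in lectin+).
    """
    total = lectin_pos + lectin_neg
    if total == 0:
        # Assumption: a species absent from both fractions has no defined index.
        return None
    return (lectin_pos - lectin_neg) / total


# Hypothetical relative abundances, chosen only for illustration.
print(enrichment_index(0.375, 0.125))  # 0.5, enriched in the lectin+ fraction
print(enrichment_index(0.0, 0.25))     # -1.0, found only in the lectin- fraction
```

Because the difference is normalized by the sum, the index is bounded between -1 and +1 regardless of how abundant a species is overall, which makes rare and common species comparable on one plot.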
