
CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND

CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING

by

Anita Karen Nivedha

(Under the direction of Robert J. Woods)

ABSTRACT

Carbohydrates play a pivotal role in various life processes including energy metabolism, storage, immune recognition, transportation, signaling and biosynthesis. In these roles, they often interact with other integral components of the living system such as proteins and lipids. An understanding of how these molecules interact can further our knowledge of crucial biological processes, and begins with knowledge of the three-dimensional structures of their complexes. However, owing to the challenges involved in crystallizing such structures, theoretical modeling methods such as molecular docking are often used to predict how carbohydrates interact with protein receptors. Docking programs, however, have generalized scoring functions which often produce unnatural oligosaccharide conformations during docking. In this thesis, we present two approaches to improve protein-carbohydrate docking by accounting for specific intra- and intermolecular interaction energies relating to carbohydrates, which are not currently dealt with by existing docking methodologies. In the first approach, we developed a set of Carbohydrate Intrinsic (CHI) energy functions in order to account for the intramolecular energies of carbohydrate ligands, which are primarily determined by the conformations of the glycosidic torsion angles connecting individual saccharides. This work resulted in the development of Vina-Carb (the incorporation of the CHI energy functions within the scoring function of AutoDock Vina), which significantly improved the conformations of oligosaccharide binding mode predictions. In the second approach, we developed a scoring function by fitting a mathematical model to data from the literature describing the energy contributed by CH/π interactions. This energy function was used to score the crucial interactions between the CH groups lining the carbohydrate ring and the π electron densities in the aromatic amino acids of interacting proteins. Employing the CH/π interaction energy function to rescore docked protein-carbohydrate complexes improved the rankings of accurate pose predictions made by both AutoDock Vina and Vina-Carb. The scoring functions developed and used in this work are transferable and can therefore be used with other docking programs and also in the refinement of experimental carbohydrate structures.

INDEX WORDS: AutoDock, AutoDock Vina, Molecular Docking, Protein-Carbohydrate Docking, Docking Scoring Functions, Internal Energies, Carbohydrate, Carbohydrate Intrinsic Energy Functions, CHI Energy Functions, Vina-Carb, Antibody, Antigen, Lectin, Carbohydrate Binding Module, CH/π Interactions

CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND

CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING

by

Anita Karen Nivedha

B. Tech., Vellore Institute of Technology University, India, 2008

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree

DOCTOR OF PHILOSOPHY

ATHENS, GEORGIA

2015

© 2015

Anita Karen Nivedha

All Rights Reserved

CONSIDERATION OF GLYCOSIDIC TORSION ANGLE PREFERENCES AND

CH/π INTERACTIONS IN PROTEIN-CARBOHYDRATE DOCKING

by Anita Karen Nivedha

Major Professor: Robert J. Woods

Committee: James H. Prestegard, Liming Cai, Donald Evans

Electronic Version Approved:

Suzanne Barbour
Dean of the Graduate School
The University of Georgia
December 2015

DEDICATION

I would like to dedicate this work to my beloved parents, Jenetta and Joshwa.


ACKNOWLEDGEMENTS

Firstly, I would like to acknowledge and extend my gratitude to my major professor, Dr. Robert J. Woods, for his support, encouragement and guidance, and for giving me the wonderful opportunity to be a part of the Woods’ Group Family. I would like to thank my PhD Advisory Committee, Dr. James H. Prestegard, Dr. Liming Cai and Dr. Donald L. Evans, for their valuable advice, insight and suggestions over the years as my dissertation took shape. I would like to thank the colleagues who were directly involved in my research: Dr. B. Lachele Foley, Dr. Matthew B. Tessier, Dr. Spandana Makeneni and David F. Thieker. It has been a great learning experience and a pleasure collaborating and working with each one of you.

I would like to acknowledge the support of my peers in the Woods’ group: Dr. Arunima Singh, Amika Sood, Dr. Jodi Hadden, Mark Baine, Dr. Xiaocong Wang, Dr. Keigo Ito, Dr. Oliver Grant, Huimin Hu, Dr. Valerie Murphy, Dr. Mari DeMarco, Mia Ji, Dr. Elisa Fadda, Dr. Joanne Martin and Dr. Hannah Smith. Matt, thank you for helping me when I was a newbie in the group and, amongst other things, for teaching me to do docking, which constitutes a major portion of my dissertation today. Arunima, Amika, Spandana and Jodi, thank you for being with me through the ups and downs in Graduate School. Keigo, thank you for helping me with all my QM questions and for your tips on scientific writing. Mark, thank you for being a huge support during my time in the group and for all of your efforts in keeping everything around the lab in order.

I am thankful to God for being my Provider and for all of His blessings at every stage of my life as a graduate student. I would like to acknowledge the unconditional love, support and encouragement given by Mama and Papa. Thank you for being my greatest cheerleaders. I would like to extend my heartfelt thanks to Amy, Ashley, Niranjana, Madison, Jagadish, Cookieday, Adwoa, Ken, Femi, Anna, Ebenezer, Adeline, Savior Karnik and Manikins, for being there for me, for believing in me, cheering me on and supporting me throughout Graduate School. I could not have done it without your solid support.

I would certainly not be where I am if not for all of the wonderful people who have sown into my life and my career. For them, I am forever grateful.


LIST OF TABLES

Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the crystallographic ligands.

Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and CHI-cutoff.

Table 5.2 PRMSDmin(5) produced by ADV and VC1|2 for the 12 test systems with ligands containing 1,6-linkages.

Table 5.3 Comparison between ADV and VC1|2 for the apo proteins Test Set.

Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2 before and after rescoring as a function of the CH/π interaction energy coefficients. The systems are divided into different groups based on the number of detected CH/π interactions.

LIST OF FIGURES

Figure 2.1 An illustration of the conversion between the chain and ring forms of glucose.

Figure 2.2 A representation of two chair conformations of glucose, namely, ⁴C₁ and ¹C₄.

Figure 2.3 A 1-3 glycosidic linkage formation between a glucopyranose (Glcp) unit and a galactopyranose (Galp) unit. The D in the name denotes the configuration at the stereocenter farthest from the anomeric carbon (the same as in D-glyceraldehyde), rather than the direction in which the molecule rotates plane-polarized light.

Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose and mannose are C2 epimers.

Figure 3.1 (a) Rigid Docking (b) Flexible Ligand Docking

Figure 3.2 The workflow within the AutoDock Vina algorithm.

Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green dot represents the center of the grid box (0,0,11). (b) Aligned orientation of an antibody antigen-binding fragment (Fab), with respect to the internal reference axes. The region in red + pink represents the VH domain (CDRs (red) and framework regions (pink) of the heavy chain) of the antibody, while the region in blue represents the VL domain (CDRs (dark blue) and framework regions (cyan) of the light chain). The X-axis for the alignment was defined by a vector passing through the CoM of the variable light chain (VL domain, which contains the light chain CDRs and framework sequences), and the CoM of the variable heavy chain (VH domain). The Z-axis was defined as a vector normal to the X-axis, and passing through the CoM of the entire variable region, or variable fragment (Fv). The antibody was then translated so that the CoM of the CDRs was placed at the origin. The Y-axis was defined as a vector perpendicular to the XZ-plane, and passing through the origin. The docking grid box was aligned to the internal co-ordinate axes with its center offset from the origin by 11 Å along the Z-axis, so as to optimally encompass the CDR loops, while also permitting adequate volume for the movement of the ligand during docking. Such a definition enabled the docking grid box to be consistently aligned with respect to the CDRs.

Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD and SRMSD, respectively, of a representative docked pose with respect to its ligand. (a) The PRMSD is the RMSD between the ring atoms of a representative docked structure (white) and the corresponding crystal structure (black). (b) The SRMSD is the RMSD value obtained after the docked structure (white) is superimposed on the crystal structure (black).

Figure 4.3 The φ and ψ angle distributions from 100 docked structures, for selected linkages, as indicated by the dashed rectangle. Data are presented, in order, for AD3 (black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing the experimentally-determined values is highlighted with a light blue outline. The bin containing the structure with the lowest docked energy is indicated as follows: AD3, yellow; AD4.2, orange; ADV, green.

Figure 4.4 Representation of the 8 models pertinent to the development of CHI energy functions. The models depicting 1,2-linkages can be used to model 1,4-linkages due to symmetry about the O5 atom.

Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for models (see Figure 4.4) whose linkages have similar local geometries.

Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion angle distributions of carbohydrates from experimental co-crystal structures (histograms).

Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between SRMSD and docked energies after rescoring, for each of the three docking programs. Points before rescoring are shown in dark grey and points after rescoring are shown in light grey. Shown in the insets are SRMSD vs. docked energy plots of only the overall lowest PRMSD structure for each of the six antibody systems before (dark grey) and after (light grey) rescoring. The black rectangles in all insets enclose plot areas with SRMSD ≤ 1 Å and energies ≤ 0 kcal/mol.

Figure 4.8 Graphs showing the distribution of conformations produced by AD3, AD4.2 and ADV plotted onto the corresponding CHI energy curves (ΔE [kcal/mol] vs. ψ [deg]) for each of the representative linkage combinations; the curves are offset from each other by 6 kcal/mol.

Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2 and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the lowest energy poses for all six systems from all three docking programs, before (dark grey) and after (light grey) rescoring.

Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after inclusion of the CHI energy (white) compared to the crystal ligand (black); PRMSD = 0.6 Å.

Figure 4.11 Docking to the Salmonella antibody (in 1MFD and 1MFA). (a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared to the crystal ligand (black); PRMSD = 5.5 Å. (b) Lowest energy pose from ADV for 1MFD after rescoring (white) compared to the crystal ligand (black); PRMSD = 1.0 Å. (c) and (d) show the 1MFD antibody in transparent surface representation along with the oxygen atom belonging to the water molecule from the crystallographic co-complex, WAT 601; in (c) the crystal ligand from 1MFD is shown in CPK representation, and in (d) the lowest energy pose from ADV for 1MFD before rescoring (in CPK representation) is shown, with the Gal residue replacing Abe within the binding pocket. (e) The Gal residue from the ligand in 1MFD (in van der Waals representation) after being superimposed onto the Abe residue from the ligand in 1MFA is shown within the 1MFA binding site. A cross-section of the 1MFA antibody is represented as a transparent surface with potential steric clashes visible between the Gal residue and the antibody. (f) Same as (e) but with the 1MFA antibody represented as an opaque surface, thus more clearly depicting potential steric clashes between the O-3 and O-6 groups of the Gal residue and the interior of the binding pocket.

Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9 Å. (b) Lowest energy pose after rescoring (white) compared to the crystal ligand (black); PRMSD = 10.7 Å.

Figure 5.1 (a) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed line) and 5 (dotted line) to the original VCΦ|β curve. (b) The effect of applying a CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2).

Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various CHI-energy coefficients of VC. (a) SRMSDavg amongst the 5 top-ranked poses. (b) PRMSDmin(5).

Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curve to the distribution of glycosidic linkages in carbohydrate crystal structures in the PDB. The bottom X-axis and left Y-axis correspond to the histogram which depicts the distribution of PDB structures, while the top X-axis and right Y-axis correspond to the CHI-energy curves.

Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for 12 test systems containing one or more 1,6-linkages overlaid against the reference crystal structure ω angles (red dots) and the corresponding CHI energy curve.

Figure 5.5 (a) The PRMSDmin(5) pose from ADV compared to the reference ligand (blue). (b) The PRMSDmin(5) pose from VC1|2 compared to the reference ligand (blue). (c) The Φ torsion angles of α- from the docked poses of the 3C6S ligand from both ADV (yellow triangles) and VC1|2 (green squares) plotted on to the CHI curve. The torsion angles corresponding to the reference are plotted as blue circles.

Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD) is depicted in complex with a ligand. All amino acids further than 5 Å away from the ligand are colored grey. Those residues within 5 Å are colored orange if they are cyclic and red if acyclic.

Figure 5.7 (a) Models representing the PRMSDmin(5) produced by docking 1MFC with ADV (yellow) and VC at CHI1|2 (green). The primary difference between docked models is a ring that is flipped approximately 180 degrees, highlighted by the orange arrows. (b) Ligands from two crystal structures, 1MFB (blue) and 1MFC (cyan), also differ by the orientation of the RAM 524 ring.

Figure 5.8 A depiction of the ranks of acceptable poses (Rankacc), i.e., the lowest-ranked pose with PRMSD ≤ 2 Å, produced by ADV and VC1|2 from docking oligosaccharide ligands onto apo protein structures.

Figure 5.9 (a) The ligands from five crystal structures (PDB ID: 2E0P, 2EO7, 2EEX, 2EJ1, and 2EQD) of the Cel44A enzyme are superimposed on the protein from PDB ID: 2EQD. Amino acids reported to be involved in substrate binding (N45, R47, W64, W71, W327, W331, E359, and W392) are colored orange or red, depending on whether the residue is aromatic or not. 146 The catalytic residue (Q186) is colored yellow. All other amino acids are grey. The active site has been separated into a (-) and (+) site. The circled values represent the position of each residue relative to the glycosidic linkage that is cleaved during catalysis. The ligands exclusive to the (-) side of the active site are depicted by varying shades of purple. The octasaccharide that extends across both the (-) and (+) site (2EQD) is colored blue. Each carbohydrate ring is colored according to whether the CHI energy penalty is applied to the surrounding Φ/Ψ values. Rings are either green or red depending on whether VC is or is not applied, respectively. (b) A representation of the PRMSDmin(5) and PRMSDmin(20) poses from ADV and VC1|2. (c) The glycosidic linkages of the octasaccharide that extends across the active site (2EQD) are labeled according to the penalty received by the CHI energy curve. Penalties greater than 2 kcal/mol are highlighted in red. VC is not applied to the (-1) residue since it is neither a ⁴C₁ nor a ¹C₄ chair, so the ring is colored red and the penalties are unlisted.

Figure 6.1 The replacement of the aromatic group in (A) by the aliphatic group in (B) in the study by Water et al. 154 in an interaction with a tetraacetylglucose molecule led to a decrease in the interaction energy of the system.

Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic amino acids, namely, a tryptophan and a tyrosine, in the binding pocket of an antibody Fab fragment (PDB ID: 1MFE). 33

Figure 6.3 Representation of CH/π interactions between β-D-Glucopyranose (βDGlcp) and Phenylalanine.

Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to describe the interaction between a CH-group and an aromatic moiety.

Figure 6.5 Detection of CH/π interactions. (a) An average position of the co-ordinates of the atoms C2, O5 and O1 is determined. In order to find the vector C1H1, the negative of the vector between points C1 and the average of atom positions C2, O5 and O1 (computed in (a)) is determined. (b) The distance between the centroid of the aromatic ring and the plane of the carbohydrate ring delineated by atoms O5, C2, C3 and C5 is determined, dcenters (≤ 7 Å). (c) The carbon atoms in the carbohydrate ring are projected onto the aromatic ring plane and the distances between each of these projections and the centroid of the aromatic ring are determined, dcp (≤ 2.5 Å). Shown in green are the CH bond vectors pointing towards the aromatic ring (scored), and shown in red are the CH bond vectors pointing away from the aromatic ring (not scored).

Figure 6.6 The effect of applying the CH/π interaction to the top-ranked pose produced by VC1|2 before and after rescoring. Shown in green is the crystal ligand, in white is the top-ranked pose before rescoring (PRMSD = 5.6 Å) and in blue is the top-ranked pose after rescoring (PRMSD = 0.9 Å).

Figure 6.7 Model Systems used by Ringer et al. to quantify CH/π interactions using quantum mechanical calculations.

Figure 6.8 (a) The individual interaction energy curves for the models (as described in Figure 6.7) used by Ringer et al. 155, alongside the average of the individual curves. (b) The average curve (a) shown alongside the mathematical model used in the current study.

CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

1. Introduction

2. Carbohydrates: Biological Significance and Structure

3. Computational Methods/Molecular Docking

4. The Importance of Ligand Conformational Energies in Carbohydrate Docking: Sorting the Wheat from the Chaff

Abstract

Introduction

Methods

Results and Discussion

Conclusions

Individual Author Contributions

5. Vina-Carb: Improving Glycosidic Angles During Carbohydrate Docking

Abstract

Introduction

Methods

Results & Discussion

Conclusions

Individual Author Contributions

6. The Consideration of CH/π Interactions in Carbohydrate-Protein Docking

Introduction

Methods

Results and Discussion

Conclusions

Future Directions

7. CONCLUSIONS

8. REFERENCES

9. Appendix

Supplementary Information Chapter 4

Supplementary Information Chapter 5

Supplementary Information Chapter 6

1. INTRODUCTION

This dissertation can be sub-divided into the following sections:

1. The comparison of docking programs for carbohydrate docking and the development of Carbohydrate Intrinsic (CHI) Energy Functions, which describe the rotational preferences of oligosaccharides about the glycosidic linkage.

2. The development and evaluation of Vina-Carb, formed by incorporating the CHI energy functions within the scoring function of AutoDock Vina, and comparison to the original program, AutoDock Vina.

3. The development of a CH/π interaction energy term to score CH/π interactions in protein-carbohydrate complexes and the application of the function to docked protein-carbohydrate complexes.

The above topics, along with a literature review of background information and the computational methods applied in each case are presented in the following manner:

CHAPTER 2: CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND STRUCTURE

Chapter 2 discusses the structure and biological significance of carbohydrates and of protein-carbohydrate interactions.

CHAPTER 3: MOLECULAR DOCKING

Chapter 3 discusses the theory behind molecular docking, a computational method used to predict intermolecular interactions. It further discusses the challenges associated with carbohydrate ligands, and specifically describes the AutoDock Vina docking algorithm. Additionally, this chapter introduces the research described in the following chapters.

CHAPTER 4: IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF

Chapter 4 is an original research study in which the performance of various versions of the popular docking program AutoDock is compared using a set of antibody-carbohydrate complexes. A set of Carbohydrate Intrinsic (CHI) energy functions is developed to describe the conformational preferences of the glycosidic linkages constituting oligosaccharides. The CHI energy functions are then employed to rescore the docked poses. The results from this study were published as a journal article.

A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem. 2014, 35, 526–539.

CHAPTER 5: VINA-CARB: IMPROVING GLYCOSIDIC ANGLES DURING CARBOHYDRATE DOCKING

Chapter 5 describes original research in which the CHI energy functions were incorporated within AutoDock Vina’s scoring function, leading to the development of Vina-Carb. The performances of Vina-Carb and AutoDock Vina were evaluated using a set of protein-carbohydrate complexes consisting of antibodies, lectins, carbohydrate binding modules and enzymes. This work has been accepted for publication.

A. K. Nivedha, D. F. Thieker, R. J. Woods, J. Chem. Theory Comput. 2015

CHAPTER 6: THE CONSIDERATION OF CH/π INTERACTIONS IN CARBOHYDRATE-PROTEIN DOCKING

Chapter 6 describes original research in which a mathematical model to score CH/π interactions in protein-carbohydrate complexes was developed from data available in the literature and employed in rescoring docking results from AutoDock Vina and Vina-Carb, for a test set consisting of protein-carbohydrate complexes.

CHAPTER 7: CONCLUSIONS AND FUTURE DIRECTIONS

Chapter 7 summarizes the main conclusions from the preceding chapters and discusses future directions.


2. CARBOHYDRATES: BIOLOGICAL SIGNIFICANCE AND STRUCTURE

Carbohydrates play a central role in energy metabolism and biological recognition, and serve as structural components in living organisms. 1-3 4-6 Carbohydrate-binding proteins are required for transportation, degradation, biosynthesis, storage, antigen-binding and signaling. 7,8 Carbohydrates may exist either as freestanding entities or covalently linked to macromolecules such as proteins (glycoproteins) and lipids (glycolipids), and are frequently found attached to the outer surfaces of cells, where they are conveniently positioned to modulate interactions between various components of the living system by mediating cell-cell and cell-molecule interactions. 9 When oligosaccharides are organized in the form of glycoconjugates, the mere size of the attached oligosaccharides influences the interactions of the glycoconjugates with other molecules. For example, N-glycosylation and O-glycosylation are common post-translational modifications of proteins 10 11-14 which protect the protein from degradation and assist in intracellular trafficking and secretion. 2 Aberrant glycosylation is often a hallmark of diseases such as rheumatoid arthritis 15-19 and cancer. 20-23

Many carbohydrate-based host-pathogen interactions are currently known. 24 Surface carbohydrates are the most common structures found on the outer surfaces of bacterial cells. 25,26 In gram-negative bacteria, carbohydrates are found constituting the lipopolysaccharides, lipooligosaccharides or capsular polysaccharides. 27 The conjugation of a polysaccharide to a carrier protein has resulted in the production of commercially available vaccines such as those against Haemophilus influenzae 28 and Streptococcus pneumoniae. 29 Many bacterial and viral pathogens bind to host tissue via interactions with carbohydrates on the surfaces of the host cell. Antibodies contain carbohydrates as part of their structure, and some antibodies are reactive against sugars found on the cell surfaces of bacteria such as Shigella and Salmonella. 30-35

Of the four major classes of macromolecules found in living organisms, namely, nucleic acids, proteins, carbohydrates and lipids, carbohydrates are the most structurally diverse. 36 They are primarily defined as polyhydroxyaldehydes or polyhydroxyketones, and in their simplest form exist as monosaccharides, which combine with each other via glycosidic linkages to form oligosaccharides. Monosaccharides can exist in both the open chain and ring forms. When the chain form of the monosaccharide has a carbonyl group (C=O) at one end, forming an aldehyde, it is called an aldose, whereas if the carbonyl group is in the middle of the chain, forming a ketone, it is referred to as a ketose. The ring form of a monosaccharide, which is the preferred form in aqueous solutions and in oligosaccharides, is formed when the oxygen on C5, i.e., O5, links with the carbon of the carbonyl group (C1), transferring its hydrogen to the carbonyl oxygen to form a hydroxyl group. This creates a chiral anomeric center at C1. The oxygen at C1 (O1) can be either axial or equatorial with respect to the carbohydrate ring. This electronegative O1 atom prefers to adopt the axial orientation due to stereoelectronic effects, instead of the less hindered equatorial orientation, which would be expected to be preferred based on steric effects alone. This is known as the anomeric, or more accurately, the endo-anomeric effect.

[Figure 2.1 labels: chain form of glucose (anomeric carbon marked); α-glucopyranose; β-glucopyranose]

Figure 2.1 An illustration of the conversion between the chain and ring forms of glucose.

Monosaccharides forming a five-membered ring are called furanoses and those which form a six-membered ring are called pyranoses. Similar to cyclohexanes, six-membered monosaccharides exist most often in one of two isomeric chair conformations, which are specified as ¹C₄ and ⁴C₁, where the letter C stands for ‘chair’ and the numbers indicate the carbon atoms above and below the reference plane of the chair conformation formed by the atoms C2, C3, C5 and O5 (Figure 2.2).
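The chair nomenclature can be illustrated with a short sketch. The Python below is a minimal illustration and not code from this work: it measures how far C1 and C4 sit from the reference plane through C2, C3, C5 and O5 and returns a label. Two things are assumptions here: the "above" side is taken to be the side from which the ring numbering appears clockwise (the usual IUPAC convention, which the text does not spell out), and the coordinates produced by the hypothetical `toy_chair` helper are an idealized hexagon rather than real atomic positions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def chair_label(ring, tol=0.1):
    """Label a pyranose ring as '4C1', '1C4' or 'other'.

    ring maps the atom names C1, C2, C3, C4, C5, O5 to (x, y, z) tuples.
    tol is a hypothetical displacement threshold in the same length units.
    """
    plane_atoms = [ring[name] for name in ("C2", "C3", "C5", "O5")]
    origin = tuple(sum(c) / 4.0 for c in zip(*plane_atoms))
    # Area vector of the four reference atoms taken in ring order. By the
    # right-hand rule the numbering appears counterclockwise when viewed
    # from the tip of this vector.
    normal = (0.0, 0.0, 0.0)
    for i in range(4):
        a = _sub(plane_atoms[i], origin)
        b = _sub(plane_atoms[(i + 1) % 4], origin)
        normal = tuple(n + c for n, c in zip(normal, _cross(a, b)))
    length = math.sqrt(_dot(normal, normal))
    # Assumed convention: "above" is the side from which the numbering
    # appears clockwise, i.e. opposite to the area vector.
    up = tuple(-n / length for n in normal)
    d1 = _dot(_sub(ring["C1"], origin), up)
    d4 = _dot(_sub(ring["C4"], origin), up)
    if d4 > tol and d1 < -tol:
        return "4C1"   # C4 above the plane, C1 below
    if d1 > tol and d4 < -tol:
        return "1C4"   # C1 above the plane, C4 below
    return "other"

def toy_chair(z_c1, z_c4):
    """Idealized hexagon with only C1 and C4 displaced out of the plane."""
    names = ["C1", "C2", "C3", "C4", "C5", "O5"]
    z = {"C1": z_c1, "C4": z_c4}
    return {name: (math.cos(math.radians(60 * i)),
                   math.sin(math.radians(60 * i)),
                   z.get(name, 0.0))
            for i, name in enumerate(names)}

label = chair_label(toy_chair(0.5, -0.5))
```

Rings that are neither chair (boats, skew-boats, half-chairs) fall into the "other" branch in this sketch; distinguishing them would need a fuller puckering analysis.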

Figure 2.2 A representation of two chair conformations of glucose, namely, ⁴C₁ and ¹C₄.

The individual units constituting proteins and nucleic acids are generally connected in a linear fashion by a single type of linkage, namely, the amide linkage between amino acids in proteins and the 3’ to 5’ phosphodiester bonds in nucleic acids. 37 Oligosaccharides, however, can be linear or branched, and each monosaccharide unit can be linked to another via a glycosidic linkage which can be of different types depending on the stereochemistry of the C1 atom on the non-reducing sugar and that of the linking atom on the reducing sugar. A disaccharide is formed when two monosaccharides combine via a condensation reaction, resulting in the release of a water molecule and the formation of a glycosidic bond. The formation of a glycosidic linkage results in a reducing sugar on one end and a non-reducing sugar on the other.

[Figure 2.3 labels: βDGlcp; βDGalp; βDGlc1-3βDGal; H2O]

Figure 2.3 A 1-3 glycosidic linkage formation between a glucopyranose (Glcp) unit and a galactopyranose (Galp) unit. The D in the name denotes the configuration at the stereocenter farthest from the anomeric carbon (the same as in D-glyceraldehyde), rather than the direction in which the molecule rotates plane-polarized light.
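The bookkeeping of the condensation reaction described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it adds the elemental compositions of two hexoses (C6H12O6 each, the formula of glucose and galactose) and removes one water molecule. The function name `condense` is hypothetical.

```python
from collections import Counter

def condense(sugar_a, sugar_b):
    """Return the elemental composition after joining two sugars
    with the loss of one water molecule (H2O)."""
    product = Counter(sugar_a) + Counter(sugar_b)
    product.subtract({"H": 2, "O": 1})   # one water is released per linkage
    return dict(product)

hexose = {"C": 6, "H": 12, "O": 6}       # e.g. glucose or galactose
disaccharide = condense(hexose, hexose)
formula = "".join(f"{el}{disaccharide[el]}" for el in ("C", "H", "O"))
print(formula)  # C12H22O11
```

Each additional glycosidic linkage in a longer oligosaccharide removes one more water in the same way, so the function can be applied repeatedly to build up larger compositions.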

Different kinds of sugars exist in nature, and the main difference between most saccharides is in the orientation of the hydroxyl groups with respect to the plane of the carbohydrate ring, resulting in significant differences in the physical and chemical properties of the sugars. Glucose and mannose are C2-epimers while glucose and galactose are C4-epimers (Figure 2.4). These three sugars have the molecular formula C6H12O6. The stereoisomers for these aldohexoses were identified by the German chemist Emil Fischer in the late 19th century. 38

[Figure 2.4 labels: βDGalp; βDGlcp; βDManp]

Figure 2.4 Carbohydrate epimers: galactose and glucose are C4 epimers, while glucose and mannose are C2 epimers.

The three-dimensional structures of carbohydrates are greatly influenced by the conformations of the glycosidic linkages connecting individual monosaccharide units. The lone pair of electrons on the O5 atom of the sugar ring has a significant effect on the conformational stability and orientation of the glycosidic linkage. 39,40 Owing to the anomeric effect observed in saccharides, the electronegative substituent at the C1 position tends to adopt the axial orientation rather than the equatorial orientation, in contrast with expectations based solely on sterics. 41-46

Previous work analyzing the conformational preferences of glycosidic bonds shows that carbohydrates strongly prefer a single rotamer for both the Φ and Ψ torsion angles. The preferred range of values is broader for the Ψ angle than for the Φ angle. It is also known that some proteins distort carbohydrate ring shapes, and consequently the glycosidic linkages, upon binding. However, a survey of the PDB for protein-carbohydrate crystal complexes in which the oligosaccharide is bound to enzymes, as well as to other proteins such as lectins and antibodies, revealed that the distortion of glycosidic linkages by the binding partners of carbohydrates is a rare occurrence. 47,48
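The Φ and Ψ angles discussed above are torsion (dihedral) angles, each defined by four consecutive atoms spanning the glycosidic linkage. As a minimal sketch of how such an angle can be measured from atomic coordinates, the function below computes the signed torsion for any four points. Which four atoms define Φ and Ψ depends on the convention in use and is not fixed here, and the coordinates in the usage lines are made-up points chosen to give easily checked angles, not atoms from a real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def torsion_deg(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in (-180, 180].

    Positive values correspond to a clockwise rotation of the far bond
    relative to the near bond when looking along p2 -> p3.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)            # normal of the plane p1, p2, p3
    n2 = _cross(b2, b3)            # normal of the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up points: a 90 degree twist about the z-axis, then trans and cis.
gauche = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters when the torsions are compared against rotational energy curves that span the full 360° range.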

Carbohydrate-Protein Complexes

Proteins that bind to carbohydrates have a great diversity of binding site topologies and functions, and include enzymes, lectins, antibodies and periplasmic receptors. 49 Complex formation is driven primarily by hydrogen bonding, van der Waals contacts, and hydrophobic interactions. 50 Whereas the first contributes to specificity, 51 by virtue of the directionality of the hydroxyl groups, the latter two contribute to affinity through non-specific interactions. 52 Being highly polar molecules, sugars are highly solvated in aqueous solution. The hydroxyl groups in a sugar molecule are involved in cooperative hydrogen bonds, bidentate hydrogen bonds and hydrogen bonding networks. 53 Each hydroxyl group in a saccharide can engage in two kinds of hydrogen bonds, as a donor of one hydrogen bond and as an acceptor of two through the sp3 lone pairs. When the sugar hydroxyl group is a donor, the hydrogen bonds formed are shorter or stronger than those formed when the sugar hydroxyl group is an acceptor. 54 In cooperative hydrogen bonds, the hydroxyl group in the sugar acts as both a donor and an acceptor of hydrogen bonds. A bidentate hydrogen bond is formed when two adjacent hydroxyl groups in a ⁴C₁ sugar each interact with a different atom of the same planar polar side-chain residue. The presence of both cooperative and bidentate hydrogen bonds leads to the creation of networks of hydrogen bonds between the sugars and the interacting amino acids. When these planar polar residues in turn hydrogen bond with nearby polar residues, a more elaborate hydrogen bond network is formed. The resulting hydrogen bonds are strong enough to stabilize the complex but weak enough to accommodate ligand dynamics. Amino acids with polar planar side-chain groups capable of forming all three kinds of hydrogen bonds, such as Glu, Gln, Asp, Asn, Arg and His, are abundant in the binding sites of sugars. 51

Van der Waals interactions make a significant contribution to protein-carbohydrate complex formation, in addition to contributions from other interactions such as the stacking of the hydrophobic patches of carbohydrate rings against aromatic amino acids lining the binding site. An analysis of protein-carbohydrate complexes in the PDB has revealed that carbohydrate binding sites have a higher propensity for the aromatic amino acids tryptophan, tyrosine, phenylalanine and histidine than the rest of the protein. 55-57 The presence of aromatic amino acids in the sugar binding site also contributes to specificity by allowing or disallowing particular sugar epimers through a combination of steric hindrance and a favorable or unfavorable polar environment. 58

A wealth of information can be gained from an understanding of the structure and dynamics of protein-carbohydrate interactions. However, carbohydrates are extremely flexible molecules, 59 making protein-carbohydrate complexes particularly challenging to crystallize. As a result, computational methods such as molecular docking and simulations can be employed to gain insight into the physical and biochemical properties of carbohydrate molecules, both free in solution and in complex with proteins.

The knowledge thus gained has various applications including gene therapy and the design of carbohydrate-based biotherapeutic agents.


3. COMPUTATIONAL METHODS/MOLECULAR DOCKING

A detailed understanding of the three-dimensional structure and subsequently the function of carbohydrates is vital in increasing our understanding of crucial biological processes. However, obtaining experimental 3D structures of carbohydrates is a challenge, 60 and as a result, theoretical modeling methods can be employed to aid in understanding the relationship between the structure and function of oligosaccharides.

Molecular docking and molecular dynamics simulations are key computational approaches used in the study of carbohydrate molecules. In this chapter we will focus on molecular docking methodologies, specifically in relation to oligosaccharide ligands.

Molecular docking predicts the binding orientation and affinity of a small molecule (ligand), with respect to a larger molecule (macromolecule). The area around the predicted ligand binding site on the macromolecule is specified using a gridbox. The two main steps in docking are searching and scoring. The search algorithm searches the available conformational space for favorable binding modes of the ligand with respect to the macromolecule, while the docking scoring function evaluates each pose generated by the algorithm. During docking, a compromise between speed and effectiveness in sampling the conformational space available has to be made. The program typically produces several models at the end of a docking run, which are then ranked based on calculated binding affinities.

There are different approaches to docking, such as rigid docking and flexible docking (Figure 3.1). When all torsion angles are frozen during docking, it is termed rigid docking. During flexible docking, some if not all of these parameters are allowed to vary. If significant conformational change occurs upon complex formation in the protein, the ligand, or both, rigid docking is inadequate to model the binding event. In such cases, flexible docking, which allows for induced fit during complex formation, should be the method of choice. The level of computational complexity of a docking run can be set by the user by adjusting the level of flexibility of the ligand and macromolecule. Proteins can be docked rigidly because a comparison of experimental protein-ligand complexes with their unbound counterparts has revealed that, in most cases, only a few side chains in the active site of the protein change conformation.

Figure 3.1 (a) Rigid docking. (b) Flexible ligand docking. In each panel, the ligand is docked to the macromolecule within the gridbox, and the docked complexes (1 to n) are ranked according to binding affinity.

A scoring function assesses protein-ligand complementarity rather than true binding affinity, since even non-binding ligands can be docked and assigned a binding affinity score using molecular docking. Nevertheless, docking has proved to be an indispensable computational tool that helps in obtaining a 3D starting structure for a bound protein-ligand complex when one cannot be obtained experimentally. It also helps to assess the binding of multiple small molecules against a single protein target and to compare their binding affinities. Protein-ligand complementarity is a prerequisite for binding to occur, but cannot be used as the sole criterion for evaluation.

Docking scoring functions evaluate how well the predicted binding pose of a ligand complements the protein binding site, and can be empirical or knowledge-based. Empirical scoring functions operate on the assumption that binding affinities can be evaluated by the summation of independent interaction energy terms, which in most cases is a weighted sum of electrostatics, hydrogen bonding, hydrophobic interaction and repulsion terms. The coefficients for the individual terms of the scoring function are derived by fitting to experimentally determined Ki values of protein-ligand complexes with solved crystal structures. In general, these scoring functions suffer from a significant dependence on ligand size, i.e., the larger the docked ligand, the better the calculated binding affinity. Knowledge-based scoring functions are derived by performing a statistical analysis of experimentally determined protein-ligand complexes, based on the assumption that contacts which occur at a statistically significant rate must be favorable, and vice versa.
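The weighted-sum form of an empirical scoring function, and its dependence on ligand size, can be illustrated with a minimal sketch. The term names follow the text; the weights and the per-term energies are hypothetical values chosen only for illustration, not the coefficients of any real docking program.

```python
from dataclasses import dataclass


@dataclass
class InteractionTerms:
    """Independent interaction energy terms for one docked pose (kcal/mol)."""
    electrostatics: float
    hydrogen_bonding: float
    hydrophobic: float
    repulsion: float


# Hypothetical weights; real programs fit these to experimental Ki values.
WEIGHTS = InteractionTerms(
    electrostatics=0.15, hydrogen_bonding=0.60, hydrophobic=0.05, repulsion=0.85
)


def empirical_score(terms: InteractionTerms, weights: InteractionTerms = WEIGHTS) -> float:
    """Weighted sum of independent terms; more negative means a better score."""
    return (
        weights.electrostatics * terms.electrostatics
        + weights.hydrogen_bonding * terms.hydrogen_bonding
        + weights.hydrophobic * terms.hydrophobic
        + weights.repulsion * terms.repulsion
    )


# A small ligand, and a ligand twice as large making twice as many contacts.
small = InteractionTerms(-2.0, -3.0, -10.0, 1.0)
large = InteractionTerms(-4.0, -6.0, -20.0, 2.0)
small_score = empirical_score(small)
large_score = empirical_score(large)
```

Because every term simply accumulates over contacts, the larger ligand receives the better (more negative) score regardless of whether it is the better binder, which is the size dependence described above.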

Several factors affect the performance of a docking scoring function, including the physical and chemical properties of the input molecules, the preparation of the input, and the individual terms of the scoring function. Docking scoring functions are usually developed for the purpose of high-throughput virtual screening of relatively small, rigid, drug-like molecules. In this thesis, we study the performance of such docking methodologies with respect to carbohydrate ligands, which are larger, more flexible molecules ranging from a disaccharide to a dodecasaccharide connected by 1,x-linkages (x = 2, 3, 4 or 6). Applying these generalized docking scoring functions to carbohydrate docking usually leads to an unfavorable deviation of the carbohydrate ligands from their natural conformations. It may therefore be useful to customize docking scoring functions specifically for carbohydrate ligands.

The glycosidic torsion angles connecting individual monosaccharide units have a major influence on the overall conformation of an oligosaccharide ligand. Although these linkages are generally flexible, this flexibility spans a limited range of preferred torsion angles, which has been identified from a survey of carbohydrate crystal structures in the PDB. 48 All protein-carbohydrate complexes found in the PDB were included in this survey, which consisted of carbohydrates both covalently and non-covalently interacting with proteins such as lectins, antibodies, enzymes, carbohydrate binding modules, etc. In the past, efforts have been made to model the conformational preferences of carbohydrates in molecular docking; the approaches used include a re-calibration of an existing docking scoring function to model carbohydrate properties, the inclusion in the scoring function of additional interaction energy terms that are crucial to protein-carbohydrate binding, and the addition of a carbohydrate conformational energy score to an existing docking scoring function.

In this thesis, the performances of a few docking programs are evaluated and compared using a set of antibody-carbohydrate complexes with solved X-ray crystal structures from the PDB. A standardized docking protocol for docking oligosaccharide ligands onto antibodies is also described. A set of energy functions which calculate the conformational energies of carbohydrates has been derived using quantum mechanical methods. These carbohydrate internal energy functions, known as Carbohydrate Intrinsic (CHI) energy functions, score a disaccharide molecule based on the orientations of the glycosidic torsion angles. The CHI energies were then added to docked energies, showing a significant improvement in the ranking of accurate binding poses. Finally, the CHI energy functions were incorporated into the scoring function of the docking program AutoDock Vina, leading to the development of Vina-Carb. The performance of Vina-Carb was evaluated against a set of 72 protein-carbohydrate complexes with solved crystallographic structures from the PDB, and compared to the performance of the original docking program without the CHI energy functions, AutoDock Vina.
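The idea of adding CHI energies to docked energies in order to re-rank poses can be sketched as follows. The pose names and energy values are hypothetical, and the simple unweighted sum is an assumption made for illustration.

```python
def rerank_with_chi(poses):
    """Sort poses by docking energy plus CHI penalty (both in kcal/mol).

    Each pose is a dict with keys 'name', 'docking_energy' and 'chi_energy'.
    The unweighted sum is an illustrative assumption.
    """
    return sorted(poses, key=lambda p: p["docking_energy"] + p["chi_energy"])


# Hypothetical docked poses: pose A has the best docking energy but a
# strained glycosidic linkage, reflected in a large CHI penalty.
poses = [
    {"name": "A", "docking_energy": -8.0, "chi_energy": 4.5},
    {"name": "B", "docking_energy": -7.2, "chi_energy": 0.3},
    {"name": "C", "docking_energy": -6.0, "chi_energy": 0.0},
]

docking_order = [p["name"] for p in sorted(poses, key=lambda p: p["docking_energy"])]
chi_order = [p["name"] for p in rerank_with_chi(poses)]
```

In this made-up example the pose ranked first by docking energy alone drops to last place once its conformational penalty is included.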

For each AutoDock Vina docking job, multiple runs are started from random conformations. The number of individual runs is determined by the exhaustiveness parameter, which can be set by the user. Each run consists of a set of sequential steps, the number of which is determined heuristically based on the number of flexible bonds in the system under study. Each step consists of three stages, namely a random perturbation of the system, a local optimization using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, and a selection stage in which the step is either accepted or rejected. Each local optimization involves numerous evaluations of the docking scoring function, the number of which is decided based on convergence and other criteria. Each run can produce multiple promising results, which are stored and finally merged, clustered and sorted to produce the final set of docked poses (Figure 3.2).
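The run/step structure described above can be sketched in a few lines. This is a toy illustration and not the AutoDock Vina implementation: the scoring function is a hypothetical stand-in over three torsion angles, plain gradient descent replaces BFGS to keep the sketch dependency-free, and the Metropolis-style acceptance rule, the step-count heuristic and the temperature are all assumptions.

```python
import math
import random

TARGETS = (1.0, -2.0, 0.5)  # hypothetical optimum of the toy score (radians)


def toy_score(torsions):
    """Hypothetical stand-in for a docking score; lower is better, minimum 0."""
    return sum(
        (1.0 - math.cos(t - t0)) + 0.3 * (1.0 - math.cos(3.0 * (t - t0)))
        for t, t0 in zip(torsions, TARGETS)
    )


def local_optimize(score, x, step=0.1, iters=200, h=1e-5):
    """Gradient descent with finite differences (used here in place of BFGS)."""
    x = list(x)
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            plus, minus = x[:], x[:]
            plus[i] += h
            minus[i] -= h
            grad.append((score(plus) - score(minus)) / (2.0 * h))
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x


def single_run(score, n_dof, n_steps, rng, temperature=1.2):
    """One run: repeat perturbation, local optimization and selection."""
    start = [rng.uniform(-math.pi, math.pi) for _ in range(n_dof)]
    current = local_optimize(score, start)
    found = [(score(current), current)]
    for _ in range(n_steps):
        trial = [t + rng.gauss(0.0, 1.0) for t in current]  # random perturbation
        trial = local_optimize(score, trial)                # local optimization
        delta = score(trial) - score(current)
        if delta < 0 or rng.random() < math.exp(-delta / temperature):  # selection
            current = trial
            found.append((score(current), current))
    return found


def dock(score, n_dof=3, exhaustiveness=8, seed=0):
    """Merge the results of all runs and sort them by score."""
    rng = random.Random(seed)
    n_steps = 5 * n_dof  # assumed heuristic: more flexible bonds, more steps
    results = []
    for _ in range(exhaustiveness):
        results.extend(single_run(score, n_dof, n_steps, rng))
    results.sort(key=lambda r: r[0])
    return results


results = dock(toy_score)
best_score, best_torsions = results[0]
```

The sketch omits the clustering and refinement of the merged results; it only shows how independent runs, each made of perturb/optimize/select steps, feed a single sorted list of candidate poses.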


Figure 3.2 The workflow within the AutoDock Vina algorithm. A docking job consists of runs R1 to RN; each run Ri consists of steps S1 to SN; each step Si comprises a random perturbation, a local optimization (BFGS) involving evaluations of the scoring function, and a selection. The results of all runs are merged, refined, clustered and sorted to give the final result.


4. THE IMPORTANCE OF LIGAND CONFORMATIONAL ENERGIES IN CARBOHYDRATE DOCKING: SORTING THE WHEAT FROM THE CHAFF

A. K. Nivedha, S. Makeneni, B. L. Foley, M. B. Tessier, R. J. Woods, J. Comput. Chem. 2014, 35, 526–539. Reprinted here with the permission of the publisher.


Abstract

Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced accuracy because they are unable to incorporate ligand-specific conformational energies. Here, we develop internal energy functions, Carbohydrate Intrinsic (CHI), to account for the rotational preferences of the glycosidic torsion angles in carbohydrates. The relative energies predicted by the CHI energy functions mirror the conformational distributions of glycosidic linkages determined from a survey of oligosaccharide-protein complexes in the Protein Data Bank. Addition of CHI energies to the standard docking scores in AutoDock 3, 4.2, and Vina consistently improves pose ranking of oligosaccharides docked to a set of anti-carbohydrate antibodies. The CHI energy functions are also independent of docking algorithm, and with minor modifications, may be incorporated into both theoretical modeling methods, and experimental NMR or X-ray structure refinement programs.

Introduction

Protein-carbohydrate interactions are crucial in numerous aspects of biology, including metabolism, gene expression, cell-cell communication, growth, development, and immune response. 9 In vivo, complex carbohydrates (glycans) are found on cell surfaces as glycoconjugates (glycoproteins/glycolipids) or polysaccharides, mediating biological function by their direct interaction with proteins, such as receptors (lectins), enzymes, and antibodies. Cancer is marked by aberrant glycosylation, which can serve as a disease-related marker or as a target for therapeutic intervention. 22,61-63 Conversely, endogenous cell-surface glycans are frequently exploited by infectious agents, as in the hemagglutinin-mediated adhesion of influenza A virus. 64-66 A physical understanding of carbohydrate-protein interactions aids in the development of therapeutic agents designed to block such interactions, 67-70 such as antibodies which target specific glycans. 71,72 A better understanding of the immune system’s response to carbohydrate-based vaccines 73-76 facilitates the prediction and rationalization 71 of hazardous or misleading cross-reactivities between antibodies against disease-related carbohydrates and endogenous glycans. 77,78

The challenges involved in obtaining co-complexed carbohydrate-protein structures using experimental methods such as X-ray crystallography and NMR spectroscopy include production and purification of the protein, isolation or synthesis of the oligosaccharide, and co-crystallization of the complex. 60 Therefore, there is a long-standing interest in applying theoretical modeling methods (automated docking) to aid in the characterization of the 3D structure of carbohydrate-protein complexes. 71,79-84 However, these methods also have limitations. Automated docking faces the triple challenge of accurately predicting 1) the ligand orientation in the binding site (pose); 2) the ligand conformation in the binding site (shape); and 3) the relative affinity of the optimal pose (interaction energy). Ligand internal energies are only approximately modeled within docking algorithms, mainly by considering energies associated with internal steric repulsion. Such an approximation inherently degrades the accuracy of docking predictions, as various ligand classes have specific conformational properties. The glycosidic torsion angles between the individual monosaccharides forming glycans are crucial in defining their 3D structure and dynamics. The accurate prediction of oligosaccharide conformations requires the additional consideration of the stereo-electronic properties responsible for the anomeric, exo-anomeric, and gauche effects. 85 Their omission frequently leads to the incorrect prediction of docked oligosaccharide conformations. 86-88

Docking programs treat interaction energy terms as empirically-adjustable components, which may be tuned for a particular ligand class, such as carbohydrates. 89 Inclusion of carbohydrate conformational energies in the docking energy function would likely require reoptimization of the empirical weighting, resulting in a non-transferable carbohydrate-specific implementation of the algorithm. As an alternative, we wished to develop a carbohydrate-specific conformational energy function which predicts oligosaccharide energies independent of docking algorithm, and could potentially also be employed to evaluate the conformational energies of experimentally-determined oligosaccharide structures. We focused on modeling conformational properties intrinsic to glycosidic linkages between pyranoses, with the criterion that the method should also be generalizable to other carbohydrate ring forms, such as furanoses, as well as to other linkages, such as 1-6, 2-3, 2-6, etc. Tetrahydropyran, and related analogs, have long been employed as representative carbohydrates in quantum mechanical calculations for this purpose. 90-97 The assumption is that any additional effects on the conformational properties, for example from hydrogen bonding, overlay the intrinsic properties of the linkages between pyran rings. Quantum mechanical calculations were employed on a set of glycosidically-linked tetrahydropyrans representing all two-bond linkages between pyranoses. The rotational energy profiles for these linkages were used to derive the desired carbohydrate intrinsic (CHI) energy functions. Given a 3D oligosaccharide structure, the CHI energy functions may be employed to estimate the energy arising from any distortion of the glycosidic linkages, relative to their lowest energy conformations.

Because of the important roles of anti-carbohydrate antibodies in therapeutic and diagnostic applications, and the challenges associated with experimentally defining their 3D structures, they have been the subject of numerous automated docking studies. 98-104 We chose six crystallographically-determined antibody-carbohydrate complexes to evaluate the ability of CHI energy functions to improve predicted rankings of the docked poses. These systems were selected based on the diversity of the antibody binding site topologies (canyon, valley, crater), 105 and size variations of the carbohydrate ligands (tri- to pentasaccharides, including linear and branched sequences).

Methods

System selection and docking protocol

Docking was performed using AutoDock 3.0.5 (AD3), 106 4.2 (AD4.2) 107 and Vina 1.1.2 (ADV). 108 Details of the reference systems, including PDB IDs, ligand sequences and biological origin, are presented in Table 4.1. In each case, the protein chain containing the ligand with the lowest average B-factor was selected for docking. The carbohydrate ligands in systems 1UZ8, 1S3K and 1M7I were built using the Carbohydrate Builder on GLYCAM-Web (www.glycam.org). 109 The remaining ligands contain the non-standard sugar residues abequose and 2-deoxy-rhamnose. Oligosaccharides containing these deoxy residues were assembled using the tLEaP 110 module from the AMBER package, employing GLYCAM06i force field parameters and PREP residue structure files, available for download at www.glycam.org (S4.11). The antibody structures were obtained from the PDB (www.rcsb.org). 111 All protein and ligand files were prepared for docking using AutoDock Tools 1.5.4 (ADT). 107 The choice of partial charge was based on the method used to calibrate the scoring functions of the individual docking programs; Kollman charges 112 were added to the protein for docking with AD3, while Gasteiger charges 113 were used to prepare proteins for docking with AD4.2 and ADV, and in each case Gasteiger charges were assigned to the ligands. AutoDock distributes any non-zero residual net charge across the macromolecule. Hydrogen atoms were added to the protein using ADT, whereas GLYCAM hydrogens were retained in the ligands. A standard grid box (dimensions: 26.25 × 26.25 × 37.50 Å) was employed for all runs, centered relative to the complementarity determining regions (CDRs) of the antibody (Figure 4.1a). Before docking, the ligand was translated to the center of mass (CoM) of the CDRs but maintained in the default GLYCAM orientation and conformation. VMD 109 was used for molecular and image-rendering.
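The translation of the ligand to the CoM of the CDRs, without changing its orientation or conformation, amounts to a rigid shift of all ligand atoms. A minimal sketch, assuming atoms are given as (mass, (x, y, z)) tuples; the two-atom "ligand" and the target position are hypothetical.

```python
def center_of_mass(atoms):
    """Mass-weighted mean position of a list of (mass, (x, y, z)) atoms."""
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[k] for mass, xyz in atoms) / total_mass for k in range(3)
    )


def translate_to(atoms, target):
    """Rigidly shift atoms so that their CoM coincides with target."""
    com = center_of_mass(atoms)
    shift = tuple(t - c for t, c in zip(target, com))
    return [
        (mass, tuple(x + s for x, s in zip(xyz, shift))) for mass, xyz in atoms
    ]


# Hypothetical two-atom ligand (a carbon and an oxygen) and CDR CoM.
ligand = [(12.011, (0.0, 0.0, 0.0)), (15.999, (1.4, 0.0, 0.0))]
cdr_com = (5.0, 5.0, 5.0)
moved = translate_to(ligand, cdr_com)
new_com = center_of_mass(moved)
```

Because every atom receives the same shift, all interatomic distances, and hence the GLYCAM conformation, are preserved.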


Table 4.1 PDB IDs and ligand sequences employed in the study, including the shape RMSD (SRMSD) values for the ligands generated by GLYCAM, relative to the crystallographic ligands.

| PDB ID | Chain ID | Resolution (Å) | Ligand | Average B-factor (Å2) | SRMSD (Å) c | Biological Origin |
|---|---|---|---|---|---|---|
| 1MFA 69,d | L/H | 1.7 | DAbepα1-3[DGalpα1-2]DManpα-OMe | 25.1 | 0.6 | Mus musculus |
| 1MFD 70,d | L/H | 2.1 | DAbepα1-3[DGalpα1-2]DManpα-OMe | 30.1 | 0.5 | Mus musculus |
| 1UZ8 71 | A/B | 1.8 | DGalpβ1-4[LFucpα1-3]DGlcpNAcβ-OMe | 41.8 | 0.3 | Mus musculus |
| 1M7D 72 | A/B | 2.3 | LRhapα1-3(2-deoxy)LRhapα1-3DGlcpNAcβ-OMe | 39.8 | 0.3 | Mus musculus |
| 1S3K 73 | L/H | 1.9 | LFucpα1-2DGalpβ1-4[LFucpα1-3]DGlcpNAcα-OH | 26.6 | 0.4 | Homo sapiens, Mus musculus |
| 1M7I 72 | A/B | 2.5 | LRhapα1-2LRhapα1-3LRhapα1-3DGlcpNAcβ1-2LRhapα-OMe | 35.4 | 1.1 | Mus musculus |

[Graphic representations of the ligands; symbol key: Mannose (Man), Galactose (Gal), Fucose (Fuc), 2-Deoxy Rhamnose, Abequose (Abe), N-Acetyl Glucosamine (GlcNAc), Rhamnose (Rha), Aglycon (OMe/OH).]

c SRMSD defined in Section Shape, and pose, RMSD values. d 1MFA and 1MFD contain the trisaccharide antigen from Salmonella serotype B. In 1MFD, the trisaccharide is bound to a Fab antibody fragment, while in 1MFA the trisaccharide is bound to a single-chain Fv fragment of the antibody. Although the antigen-binding sites in the Fab and scFv fragments are essentially the same, and bound to the same trisaccharide antigen, in the Fv-complex a water molecule has become inserted into an internal hydrogen bond within the trisaccharide, leading to a perturbation of the trisaccharide conformation.

In all ligands, the hydroxyl groups and glycosidic torsion angles were defined as being flexible, while the C5-C6 bonds were restrained at the orientation present in the reference crystal structures. The protein was maintained rigid. In AD3 and AD4.2, 100 runs of the Lamarckian Genetic Algorithm were employed, with 800,000 energy evaluations per run, and a population size of 200. The translation step size was 2 Å, while the quaternion and dihedral step sizes were each 50°. The ADV source code was modified to increase the total number of output structures from 20 to 100 (Supplementary Material, S4.1). The maximum energy difference between the best and worst binding modes was set at 10 kcal/mol, while the exhaustiveness value was 8. The complete set of docking parameters used is given in S4.2, S4.3 and S4.4.
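For orientation, the ADV settings named above map onto standard AutoDock Vina configuration keys. The sketch below assembles such a configuration file as text. The file names are hypothetical, the grid center assumes the aligned antibody frame of Figure 4.1, and the actual parameter files used in this work are those given in S4.4; note also that an unmodified Vina 1.1.2 would not return more than 20 modes.

```python
def vina_config(
    receptor,
    ligand,
    out,
    center=(0.0, 0.0, 11.0),      # grid box center (Å), aligned frame assumed
    size=(26.25, 26.25, 37.50),   # grid box dimensions (Å)
    exhaustiveness=8,
    num_modes=100,                # requires the modified source described above
    energy_range=10,              # kcal/mol between best and worst mode
):
    """Return the text of an AutoDock Vina configuration file."""
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {center[0]}",
        f"center_y = {center[1]}",
        f"center_z = {center[2]}",
        f"size_x = {size[0]}",
        f"size_y = {size[1]}",
        f"size_z = {size[2]}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
        f"energy_range = {energy_range}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names for illustration.
config_text = vina_config("antibody.pdbqt", "ligand.pdbqt", "docked.pdbqt")
```

The resulting text would be saved to a file and passed to Vina through its `--config` option.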

Antibody and docking grid box alignment

Consistent grid box placement on the CDRs was achieved by positioning the box relative to three points defined by specific CoMs within the CDRs. The CDRs were identified using the AbM definition, 114,115 based on both the Kabat 116 and Chothia 117 numbering schemes. To ensure consistent orientation of the antibody surface relative to the box grid points, the protein coordinates were transformed with respect to a set of internal coordinate axes, as shown in Figure 4.1. This protocol removes any issues arising from the fact that the grid is a rectangular box and not a sphere, which can otherwise result in varied regions of each antibody being included within the grid.

Figure 4.1 (a) Illustration of an antibody with its variable fragment (Fv) aligned to the grid box. The yellow dot represents the CoM of the CDRs (0,0,0), and the green dot represents the center of the grid box (0,0,11). (b) Aligned orientation of an antibody antigen-binding fragment (Fab), with respect to the internal reference axes. The region in red + pink represents the VH domain (CDRs (red) and framework regions (pink) of the heavy chain) of the antibody, while the region in blue represents the VL domain (CDRs (dark blue) and framework regions (cyan) of the light chain). The X-axis for the alignment was defined by a vector passing through the CoM of the variable light chain (VL domain, which contains the light chain CDRs and framework sequences), and the CoM of the variable heavy chain (VH domain). The Z-axis was defined as a vector normal to the X-axis, and passing through the CoM of the entire variable region, or variable fragment (Fv). The antibody was then translated so that the CoM of the CDRs was placed at the origin. The Y-axis was defined as a vector perpendicular to the XZ-plane, and passing through the origin. The docking grid box was aligned to the internal coordinate axes with its center offset from the origin by 11 Å along the Z-axis, so as to optimally encompass the CDR loops, while also permitting adequate volume for the movement of the ligand during docking. Such a definition enabled the docking grid box to be consistently aligned with respect to the CDRs.
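One way to construct such an internal frame from the four centers of mass is sketched below. The text does not fully specify the direction of the Z-axis, so the sketch assumes that Z points from the Fv CoM toward the CDR CoM after removal of its component along X; this choice, the function names and the example coordinates are all assumptions made for illustration.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)


def internal_frame(com_vl, com_vh, com_fv, com_cdr):
    """Return orthonormal (X, Y, Z) axes of the antibody internal frame."""
    x_axis = unit(sub(com_vh, com_vl))          # VL CoM -> VH CoM
    v = sub(com_cdr, com_fv)                    # assumed Z direction: Fv -> CDRs
    along_x = dot(v, x_axis)
    z_axis = unit(tuple(vi - along_x * xi for vi, xi in zip(v, x_axis)))
    y_axis = cross(z_axis, x_axis)              # perpendicular to the XZ-plane
    return x_axis, y_axis, z_axis


def to_internal(point, frame, origin):
    """Coordinates of point in the internal frame with origin at the CDR CoM."""
    d = sub(point, origin)
    return tuple(dot(d, axis) for axis in frame)


# Hypothetical centers of mass (Å) in the original PDB frame.
com_vl, com_vh = (1.0, 2.0, 3.0), (1.0, 22.0, 3.0)
com_fv, com_cdr = (1.0, 12.0, 3.0), (1.0, 13.0, 18.0)
frame = internal_frame(com_vl, com_vh, com_fv, com_cdr)

# A point 11 Å above the CDR CoM along Z maps to the grid box center (0, 0, 11).
box_center_pdb = tuple(c + 11.0 * z for c, z in zip(com_cdr, frame[2]))
box_center_internal = to_internal(box_center_pdb, frame, com_cdr)
```

Transforming every protein atom with `to_internal` places the CDR CoM at the origin, so the same grid box definition can be reused for each antibody.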

Quantum mechanical calculations

Quantum mechanical calculations were performed using Gaussian09. 118 Structures were optimized at the HF/6-31++G(2d,2p) level of theory, and single-point energies calculated at the B3LYP/6-31++G(2d,2p) level, consistent with the approach used in the GLYCAM force field development. 94 Rotational energy profiles were computed at 15° increments, allowing complete relaxation of other coordinates.

Shape, and pose, RMSD values

Pose RMSD (PRMSD) values were obtained by calculating the RMSD between the ring atoms of the crystal ligand maintained in its native co-crystallized position, and the corresponding ring atoms in the docked ligand maintained in its docked position (Figure 4.2a). A pose with a PRMSD ≤ 2 Å was considered to have been successfully docked. Shape RMSD (SRMSD) values were obtained by first superimposing the crystal and docked ligands, followed by calculating the RMSD between their respective ring atoms (Figure 4.2b). The SRMSD is a quantification of the dissimilarity in the 3D conformations of the docked and crystal ligands, irrespective of their relative positions on the protein surface.
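The distinction between the two measures can be made concrete with a short sketch. The PRMSD is a direct RMSD over matched atoms; for the SRMSD, the text does not state which superposition algorithm was used, so the sketch assumes an optimal least-squares fit computed with Horn's quaternion method, with the largest eigenvalue found by power iteration. The coordinates are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def pose_rmsd(reference, docked):
    """RMSD between matched atoms, both kept in their own positions."""
    sq = sum((r[k] - d[k]) ** 2 for r, d in zip(reference, docked) for k in range(3))
    return math.sqrt(sq / len(reference))


def shape_rmsd(reference, docked, iterations=500):
    """RMSD after optimal superposition (Horn's quaternion method, assumed)."""
    ca, cb = centroid(reference), centroid(docked)
    a = [tuple(p[k] - ca[k] for k in range(3)) for p in reference]
    b = [tuple(p[k] - cb[k] for k in range(3)) for p in docked]
    total = sum(v * v for p in a for v in p) + sum(v * v for p in b for v in p)
    # Correlation matrix S[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift the spectrum so that power iteration finds the largest eigenvalue.
    shift = max(sum(abs(v) for v in row) for row in n_mat)
    if shift == 0.0:
        return math.sqrt(total / len(a))
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(n_mat[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in w))
        q = [v / norm for v in w]
    lam = sum(q[i] * n_mat[i][j] * q[j] for i in range(4) for j in range(4))
    return math.sqrt(max(0.0, (total - 2.0 * lam) / len(a)))


# Hypothetical ring atoms; the "docked" copy is rotated by 90° about z and
# shifted by 5 Å, so its shape is unchanged but its pose is wrong.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0),
           (0.2, 1.6, 0.9), (-0.7, 0.8, 1.7)]
docked = [(-y + 5.0, x, z) for x, y, z in crystal]

prmsd = pose_rmsd(crystal, docked)
srmsd = shape_rmsd(crystal, docked)
successfully_docked = prmsd <= 2.0
```

In this constructed example the SRMSD is essentially zero while the PRMSD exceeds the 2 Å success threshold, which is the situation the two measures are designed to tell apart.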

Figure 4.2 PRMSD and SRMSD calculation. Shown in (a) and (b) are the PRMSD (5.5 Å) and SRMSD (1.1 Å), respectively, of a representative docked pose with respect to its crystal ligand. (a) The PRMSD is the RMSD between the ring atoms of a representative docked structure (white) and the corresponding crystal structure (black). (b) The SRMSD is the RMSD value obtained after the docked structure (white) is superimposed on the crystal structure (black).

Results and Discussion

Assessment of current docking methodologies

The six ligands extracted from their co-crystal structures could successfully be docked back rigidly into the same structure of the protein (results not shown); this is an outcome observed previously in studies of carbohydrate-protein docking. 103,119 Although passing this test is necessary, it is not sufficient to validate a docking method, since both molecules in a co-crystallized complex are already in the correct conformation for binding, and do not require induced fit to occur during docking.


Independently-generated oligosaccharide 3D structures were employed as ligands to test the performance of the docking methodologies in predicting bound conformations of unknown carbohydrate-protein complexes. These starting structures were generated using GLYCAM, known to produce low-energy conformations of carbohydrates; the structures generated were found to be essentially equivalent to the same ligands found in the co-crystal structures, as indicated by their SRMSDs (Table 4.1), and by a comparison of their glycosidic torsion angles (S4.5). The average SRMSD between the crystallographic ligands and theoretical structures was 0.53 Å. The preliminary SRMSD analysis also showed that the ligand in each antibody complex adopted a low energy conformation, similar to that expected for the free ligand.

A second requirement for a general docking protocol is to permit the ligands a reasonable level of freedom by allowing their glycosidic torsion angles and hydroxyl groups complete flexibility. This approach enables comparisons to be made between structures of the experimental and theoretical ligands, facilitating an assessment of the impact of induced fit in the ligand on the outcome from docking analysis.

After docking, the φ (O5’-C1’-Ox-Cx) and ψ (C1’-Ox-Cx-Cx-1) glycosidic torsion angles of the docked poses (Figure 4.4-I) were measured, and compared to the torsion angles of corresponding linkages in the experimental co-crystal structure, and in the initial GLYCAM theoretical structure. The analysis indicated that the distribution of the torsion angle values amongst the docked poses frequently deviated considerably from both the crystal and GLYCAM reference values (S4.5). Five examples of this analysis are highlighted in Figure 4.3. Presented in Figure 4.3a is an instance in which all three docking programs identified the lowest energy pose correctly (that is, with the glycosidic angles falling within 30° of the corresponding torsion angles in the crystal structure). Presented in Figure 4.3b, c, and d are cases in which only one of the docking programs identified the correct pose, and finally an example is shown in which all three programs failed to produce the correct torsion angles (Figure 4.3e). All of the methods were able to generate some number of conformations that were within 30° of the crystallographic φ and ψ values; however, these were often not the poses that had the best docking energy. Thus, in a routine application of docking, they would not be identified as the most likely (highest-ranked) pose. Overall, a very broad range of torsion angles (and therefore 3D shapes) was generated by each algorithm, indicating a potential opportunity to employ a conformational energy function as an additional filter to identify unlikely conformations in the docking data.
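The torsion measurement and the 30° criterion can be sketched as follows. The dihedral routine is the standard four-point formula, reported on a 0 to 360° scale to match the values quoted in this chapter; the atom coordinates and the "docked" torsion values are hypothetical, while the experimental pair (277.3°, 260.6°) is the 1UZ8 linkage from Figure 4.3a. The comparison must account for the periodicity of angles.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, on a 0 to 360 scale."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    angle = math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
    return angle % 360.0


def angular_difference(a, b):
    """Smallest difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def torsions_match(docked, crystal, tolerance=30.0):
    """True if every docked torsion is within tolerance of its crystal value."""
    return all(angular_difference(d, c) <= tolerance for d, c in zip(docked, crystal))


# Hypothetical four-atom geometry with a 90° torsion.
example_torsion = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))

crystal_phi_psi = (277.3, 260.6)                              # 1UZ8, Figure 4.3a
good_pose = torsions_match((300.0, 250.0), crystal_phi_psi)   # hypothetical pose
bad_pose = torsions_match((60.0, 250.0), crystal_phi_psi)     # hypothetical pose
```

Taking the difference modulo 360° ensures that, for example, 350° and 10° are treated as 20° apart rather than 340°.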

Figure 4.3 The φ and ψ angle distributions (percentage of structures, in 30° bins) from 100 docked structures, for selected linkages, as indicated by the dashed rectangle. Experimental (Expt.) φ/ψ values for the selected linkages: (a) 1UZ8, 277.3°/260.6°; (b) 1MFD, 76.1°/220.6°; (c) 1MFA, 71.5°/224.9°; (d) 1S3K, 282.2°/256.6°; (e) 1M7I, 269.8°/53.4°. Data are presented, in order, for AD3 (black bars), AD4.2 (white bars) and ADV (grey bars). The bin containing the experimentally-determined values is highlighted with a light blue outline. The bin containing the structure with the lowest docked energy is indicated as follows: AD3, yellow; AD4.2, orange; ADV, green.
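The binning behind such distributions is straightforward to reproduce. The sketch below assigns torsion angles to 30° bins on a 0 to 360° scale and reports the percentage of structures in each bin; the list of docked angles is hypothetical.

```python
def bin_index(angle_deg, bin_width=30):
    """Index of the bin containing angle_deg on a 0 to 360 degree scale."""
    n_bins = 360 // bin_width
    return int((angle_deg % 360.0) // bin_width) % n_bins


def torsion_histogram(angles_deg, bin_width=30):
    """Percentage of structures falling in each bin."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for angle in angles_deg:
        counts[bin_index(angle, bin_width)] += 1
    return [100.0 * c / len(angles_deg) for c in counts]


# Hypothetical φ values from four docked poses (negative angles are wrapped).
docked_phi = [277.3, 280.0, 95.0, -10.0]
percentages = torsion_histogram(docked_phi)
experimental_bin = bin_index(277.3)  # bin holding the 1UZ8 experimental φ
```

Comparing the bin of the lowest-energy pose with `experimental_bin` corresponds to the colored markers described in the caption.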

Development and validation of the CHI energy functions

Quantum mechanical conformational energies for a variety of model disaccharides were obtained by employing tetrahydropyran (THP) as the minimal model of a carbohydrate ring. Two THP molecules were used to model each glycosidic linkage (1-2, 1-3 and 1-4) between pyranoses in the ⁴C₁ and ¹C₄ configurations. Given that there are two anomeric configurations (α and β), and two hydroxyl configurations (axial (ax) and equatorial (eq)), associated with each linkage, the development of each CHI energy function required the analysis of the glycosidic rotational energies of at least four structures per linkage. For example, the different models used in modeling the 1-3 linkage are presented in Figure 4.4.

Figure 4.4 Representation of the 8 model disaccharides (I to VIII) pertinent to the development of CHI energy functions, with the axial (ax) or equatorial (eq) orientation of each linkage position indicated; the φ and ψ torsion angles are marked on model I. The models depicting 1,2-linkages can be used to model 1,4-linkages due to symmetry about the O5 atom.

Individual rotational energy profiles were determined for both the φ (O5’-C1’-Ox-Cx) and ψ (C1’-Ox-Cx-Cx-1) glycosidic torsion angles of the various disaccharide models (Figure 4.5). A similar approach has been employed by A. D. French to examine the properties of various disaccharides and disaccharide analogs. 96,98,112,120 Models with similar local symmetries gave rise to similar torsional energy profiles and were grouped together. Average energy curves were then obtained for each group. Based on similar energy profiles, two average energy curves for the φ angle were computed: one for all models with an α-linkage (Figure 4.5a), and the other for all models with a β-linkage (Figure 4.5b). Similarly, two average curves for the ψ angle were computed, based on division of the linkages into the following two groups: 1-2ax, 1-4ax, 1-3eq (Figure 4.5c); and 1-2eq, 1-4eq, 1-3ax (Figure 4.5d).


Figure 4.5 Individual (dashed lines) and average (solid line) rotational energy curves for models (see Figure 4.4) whose linkages have similar local geometries.

The CHI energy functions (S4.6) were generated by fitting Gaussian expansions (Eqn 4.1) to the average energy values for each of the curves in Figure 4.5 using the default fitting routine in Gnuplot ver. 4.0 113:

$$f(x) = \sum_{i=1}^{N} a_i \, e^{-\frac{(x - b_i)^2}{c_i}}$$ (Eqn 4.1)

where N is the number of individual Gaussian functions used for each CHI energy equation, x refers to the glycosidic torsion angle (φ or ψ), and a_i, b_i, and c_i refer to the magnitude, mid-point, and width of each Gaussian, respectively. All curves (S4.7) were adjusted to a minimum value of 0 kcal/mol, and may therefore be considered conformational energy penalty functions. In order to apply the energy curves shown in Figure 4.5 to linkages containing L-sugars, it is simply necessary to employ the mirror images of the relevant energy curve.
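As an illustration of Eqn 4.1, the following minimal sketch evaluates a sum of Gaussians at a given torsion angle. The parameter values in TERMS are hypothetical placeholders, not the fitted CHI parameters (those are in S4.6), and the sketch assumes that b is the mid-point and c the width term of each Gaussian. The mirror image for L-sugars is assumed here to mean reflecting the angle, x to 360° − x.

```python
import math

def chi_energy(angle_deg, terms):
    """Evaluate f(x) = sum_i a_i * exp(-(x - b_i)^2 / c_i) at angle_deg.

    terms is a list of (a, b, c) tuples; the names follow Eqn 4.1.
    """
    return sum(a * math.exp(-((angle_deg - b) ** 2) / c) for a, b, c in terms)

def chi_energy_l_sugar(angle_deg, terms):
    """Assumed mirror-image rule for L-sugars: evaluate the D-sugar curve at 360 - x."""
    return chi_energy(360.0 - angle_deg, terms)

# Hypothetical parameters for illustration only (not the published fit).
TERMS = [(5.0, 120.0, 1500.0), (3.0, 300.0, 800.0)]

if __name__ == "__main__":
    for angle in (60.0, 120.0, 240.0, 300.0):
        print(angle, round(chi_energy(angle, TERMS), 3),
              round(chi_energy_l_sugar(angle, TERMS), 3))
```

Note that plain Gaussians are not periodic, so a practical implementation would also need to handle angles near the 0°/360° boundary; that detail is omitted from this sketch.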

The experimental distribution of glycosidic angles in carbohydrate-protein crystal structures in the PDB provides an independent metric for comparison with the predicted CHI energies. Glycosidic torsion angle data for over 13,000 glycosidic linkages were extracted using the GlyTorsion web-tool 121 (S4.8), binned, and plotted against the corresponding CHI energy curves (Figure 4.6). The comparison leads to the important conclusion that the majority of proteins that recognize oligosaccharides select low energy (solution-like) conformations of the glycosidic linkage. This has considerable importance for carbohydrate docking, as it supports the view that biasing selection toward low energy linkage conformations should enhance the likelihood of correct pose prediction.

[Figure 4.6, panels a to d: CHI energy ΔE [kcal/mol] and percentage of structures plotted against φ [deg] (panels a, b) and ψ [deg] (panels c, d), in 5 deg bins. Model groups as labeled on the panels: a) VII, VIII, VI, V; b) III, IV, I, II; c) VII, III, V, II; d) VIII, IV, VI, I.]

Figure 4.6 Comparison of the CHI energy functions (solid line) to the glycosidic torsion angle distributions of carbohydrates from experimental co-crystal structures (histograms).

Refinement of the docking results using the CHI energy functions

An assessment of the performance of each of the docking algorithms can be made by plotting the difference between the conformation of each docked ligand and that in the co-complex (SRMSD) against the predicted interaction energy. Ideally, poses with correct ligand shapes should have lower interaction energies than those with incorrect shapes. Plots of interaction energy versus SRMSD were generated for AD3, AD4.2 and ADV (Figure 4.7), and the coefficient of determination (R²) was computed by linear regression. In each case, only weak linear relationships between ligand shape and interaction energy were observed (R² ≤ 0.19), and in the case of ADV a slight negative slope was observed. Following rescoring of the docked poses by addition of the CHI energy from each glycosidic angle to the docked energy of the structure, a clear enhancement of the R² values was observed across all three programs (0.60 ≤ R² ≤ 0.68).

It should be reiterated here that none of the three docking algorithms include internal rotational energies (torsion terms), and at best account for ligand internal energies in a general steric sense. In the case of glycosidic linkages, this internal energy was found to be less than approximately 0.2 kcal/mol. Thus, while some double counting of internal energy is introduced by adding the CHI energy directly to the total docking energy, it does not result in a significant error.
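The rescoring and regression described above can be sketched as follows. This is a minimal illustration, assuming each pose is described by its docked energy, the CHI energies of its glycosidic angles, and its SRMSD; the function names and the example numbers are hypothetical and are not taken from the study's analysis scripts.

```python
def rescore(docked_energy, chi_energies):
    """Rescored energy = docked energy + sum of CHI energies over all glycosidic angles."""
    return docked_energy + sum(chi_energies)

def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs.

    For a one-variable least-squares fit, R^2 equals the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

if __name__ == "__main__":
    # Hypothetical poses: (SRMSD in Angstrom, docked energy, CHI energies per angle).
    poses = [
        (0.3, -6.1, [0.2, 0.4]),
        (0.9, -6.4, [1.1, 0.8]),
        (1.8, -6.0, [3.5, 2.9]),
        (2.6, -6.3, [6.0, 5.2]),
    ]
    srmsd = [p[0] for p in poses]
    before = [p[1] for p in poses]
    after = [rescore(p[1], p[2]) for p in poses]
    print("R^2 before rescoring:", round(r_squared(srmsd, before), 2))
    print("R^2 after rescoring:", round(r_squared(srmsd, after), 2))
```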

[Figure 4.7: scatter plots of energy [kcal/mol] versus SRMSD [Å] for AD3, AD4.2 and ADV, each with an inset. R² values printed on the panels: 0.68, 0.66 and 0.60, and 0.19, 0.09 and 0.12.]

Figure 4.7 Scatter plots demonstrating improvement in the linear correlation between SRMSD and docked energies after rescoring, for each of the three docking programs. Points before rescoring are shown in dark grey and points after rescoring are shown in light grey. Shown in the insets are SRMSD vs. docked energy plots of only the overall lowest PRMSD structure for each of the six antibody systems before (dark grey) and after (light grey) rescoring. The black rectangles in all insets enclose plot areas with SRMSD ≤ 1 Å and energies ≤ 0 kcal/mol.

Prior to inclusion of the CHI energies, all poses from AD3 and ADV and a majority of those from AD4.2 were predicted to have favorable (negative) interaction energies; a result of the nearly horizontal slope of the SRMSD-versus-interaction-energy curves. Addition of the CHI energies led to positive slopes and frequently unfavorable interaction energies (positive) for high-energy ligand conformations. Therefore, an intuitive interaction energy cut-off of 0 kcal/mol could be defined as a convenient filter for eliminating the most unlikely structures.

For all six antibody complexes, the poses that are most similar to the co-crystal (lowest PRMSD poses) also have CHI-adjusted interaction energies ≤ 0 kcal/mol, with the single exception being the AD4.2 results for 1M7I (Figure 4.7b). All 100 docked poses of that pentasaccharide received positive rescored interaction energies, reflecting the sub-optimal quality of the conformations produced by AD4.2 for this system. In this case, the pose closest to the co-complex displayed a PRMSD of 3.4 Å and a CHI-corrected interaction energy of 14.7 kcal/mol; rescoring cannot correct for the absence of a correct pose. Thus, the addition of the CHI energy to the docked energy scores provides a cutoff (0 kcal/mol), below which all poses may be considered possible binders.
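The filtering and reranking step described above might be sketched as follows, assuming each pose carries a label, a docked energy, and a list of CHI energies. The pose labels and values are hypothetical, and the function name is an illustration rather than part of any docking program.

```python
def rerank(poses, cutoff=0.0):
    """Add CHI energies to docked energies, drop poses above the cutoff, sort ascending.

    poses: list of (label, docked_energy, chi_energies) tuples.
    Returns a list of (label, rescored_energy), lowest energy first.
    """
    rescored = [(label, energy + sum(chi)) for label, energy, chi in poses]
    kept = [pose for pose in rescored if pose[1] <= cutoff]
    return sorted(kept, key=lambda pose: pose[1])

if __name__ == "__main__":
    # Hypothetical docking output for illustration.
    poses = [
        ("pose_a", -7.0, [6.0, 5.6]),  # top-ranked before rescoring, strained linkages
        ("pose_b", -6.2, [0.4, 0.6]),
        ("pose_c", -5.0, [0.1, 0.2]),
    ]
    for label, energy in rerank(poses):
        print(label, round(energy, 2))
```

If no pose survives the filter, as for the AD4.2 results for 1M7I, the function returns an empty list, which mirrors the observation that rescoring cannot supply a correct pose that was never generated.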

Presented in Figure 4.8 are the φ and ψ torsion angles for the docked poses from all 6 antibody-carbohydrate systems, overlaid onto the corresponding CHI energy curves. They provide a clear indication that the docking algorithms sample a disproportionately large number of high-energy ligand conformations, particularly evident for AD4.2 and ADV. Several low energy regions, particularly for the ψ angles, are also not well-represented. In quantitative terms, for AD3 >45% of the poses contain ligands with at least one bond in a high energy conformation (CHI energies > 2 kcal/mol); the numbers for AD4.2 and ADV being 73% and 77%, respectively.
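The percentage quoted above counts poses with at least one glycosidic angle above a CHI energy threshold. A minimal sketch of that count is given below; the input layout (one list of per-angle CHI energies per pose) and the example values are assumptions made for illustration.

```python
def fraction_high_energy(poses_chi, threshold=2.0):
    """Fraction of poses with at least one glycosidic angle whose CHI energy exceeds threshold.

    poses_chi: list of lists, one inner list of CHI energies (kcal/mol) per pose.
    """
    flagged = sum(1 for chi in poses_chi if any(e > threshold for e in chi))
    return flagged / len(poses_chi)

if __name__ == "__main__":
    # Hypothetical CHI energies for four poses of a trisaccharide (two linkages, phi and psi each).
    poses_chi = [
        [0.1, 0.5, 0.3, 0.0],
        [2.5, 0.2, 0.4, 0.1],
        [0.0, 1.9, 0.2, 0.6],
        [3.0, 4.0, 0.1, 0.2],
    ]
    print(f"{100 * fraction_high_energy(poses_chi):.0f}% of poses have a high-energy linkage")
```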

[Figure 4.8, four panels: ΔE [kcal/mol] versus φ [deg] for model groups III, IV, I, II and VII, VIII, VI, V, and versus ψ [deg] for model groups VII, III, V, II and VIII, IV, VI, I.]

Figure 4.8 Graphs showing the distribution of conformations produced by AD3, AD4.2 and ADV plotted onto the corresponding CHI energy curves for each of the representative linkage combinations; the curves are offset from each other by 6 kcal/mol.

Pose ranking after including the CHI energy:

In 9 of the 18 cases (6 antibodies x 3 docking algorithms), the top-ranked pose remained the same before and after inclusion of the CHI energies (Figure 4.9), with an average SRMSD of 0.3 Å. That the ranking of these poses did not change is unsurprising, given that inclusion of the CHI energy function does not greatly alter the interaction energy if the ligand is already in a low-energy conformation. However, in 7 of the 9 remaining cases, the SRMSD of the top-ranked pose improved by an average of 0.8 Å after rescoring and reranking.

Prior to rescoring, poses with PRMSDs ≤ 1 Å were obtained from the 100 docking runs in 17 out of the 18 cases; however, they were not necessarily the lowest energy poses, highlighting the challenge in recognizing a correctly docked pose amongst all poses produced by a docking run. The impact of the CHI energy on the ability of docking to both produce a correctly docked pose and rank it as the lowest energy structure is indicated in terms of PRMSDs in Figure 4.9b. In several instances in which the lowest energy pose produced by the docking program was incorrect (PRMSD > 2 Å), reranking after including the CHI energy led to lowest energy structures having both PRMSD and SRMSD < 1 Å.

[Figure 4.9, panels a and b: bar charts for 1MFA, 1MFD, 1UZ8, 1M7D, 1S3K and 1M7I, with one row each for AD3, AD4.2 and ADV. Panel a shows the SRMSD [Å] of the lowest energy pose; panel b shows the PRMSD [Å] of the lowest energy pose.]

Figure 4.9 (a) SRMSDs of the lowest energy poses for all six systems from AD3, AD4.2 and ADV, before (dark grey) and after (light grey) rescoring. (b) PRMSDs of the lowest energy poses for all six systems from all three docking programs, before (dark grey) and after (light grey) rescoring.

The impact of rescoring on the conformations (SRMSDs) and orientations (PRMSDs) of the top-ranked poses is presented for several examples in the following section. Docking of the tetrasaccharide ligand onto the 1S3K antibody using AD3 (Figure 4.10), and docking of the trisaccharide ligand onto the 1M7D antibody using AD4.2, yielded top-ranked poses with PRMSDs > 5 Å. These structures obtained high CHI energy scores of 7.0 kcal/mol and 11.6 kcal/mol, respectively. The lowest energy poses after reranking had PRMSDs of 0.6 Å (1S3K/AD3) and 0.5 Å (1M7D/AD4.2), with lower CHI energies of 1.0 kcal/mol and 0.9 kcal/mol, respectively.

[Figure 4.10, panels a and b: docked poses overlaid on the crystal ligand.]

Figure 4.10 (a) AD3 lowest energy pose for 1S3K before rescoring (white) compared to the crystal ligand (black); PRMSD = 5.7 Å. (b) Lowest energy pose after inclusion of the CHI energy (white) compared to the crystal ligand (black); PRMSD = 0.6 Å.

Prior to rescoring, the lowest energy structures obtained for 1MFD from all three programs had PRMSDs > 5 Å, with CHI energies > 4 kcal/mol for the poses from AD4.2 and ADV, and 1.3 kcal/mol for the pose from AD3 (Figure 4.11a, S4.9). After rescoring, the lowest energy pose from AD3 remained unchanged, whereas the corresponding pose from AD4.2 was replaced by a pose with a lower CHI energy score; however, the newly top-ranked pose still had a high PRMSD. Even though rescoring did not result in correctly docked lowest energy poses in either of these cases, it improved the overall ranking of the lowest PRMSD structures (PRMSDs < 1 Å) from 18 to 9 in AD3, and from 13 to 2 in AD4.2 (S4.10). It should also be noted that the second lowest energy pose in AD3 (PRMSD = 1 Å) remained unchanged in ranking after rescoring. In contrast, the relatively high CHI energy score of the lowest energy pose from ADV contributed to this pose being replaced by a correctly docked structure, with a lower CHI energy score, after rescoring (Figure 4.11b, S4.9, S4.10).

The ligand in 1MFD is a branched trisaccharide comprised of mannose (Man), galactose (Gal), and abequose (Abe). Abe is an analog of Gal (3,6-dideoxyGal), and the anchoring residue for the trisaccharide in the crystal structure 32 (Figure 4.11c). An examination of the docking results indicated that all three docking programs consistently generated better scores for poses in which the Gal residue replaces Abe in the binding site (Figure 4.11d), with little increase in the SRMSD for the incorrect pose. That is, the trisaccharide can fit equally well into the binding site in the two possible orientations effectively flipped by 180°. The theoretical preference for Gal in the binding site appears to be a consequence of its ability to make additional hydrogen bonds with the protein relative to the more hydrophobic Abe. This observation suggests that the balance between contributions from hydrogen bonding versus hydrophobic interactions is imperfect in these docking algorithms. In addition, the 1MFD crystal structure reveals the presence of a water molecule within the binding pocket, mediating hydrogen bond interactions between the antibody and the ligand’s Abe residue. Given that explicit waters are not generally included in docking studies, the algorithms may be compensating for their absence by placing the more polar Gal inside the binding pocket. This conclusion is supported by the observation that one of the hydroxyl groups of the Gal residue (O-4) occupies a position in close proximity to this water molecule (PDB residue name: WAT 601) originally found in the crystal complex (Figure 4.11d).

The flipping of the carbohydrate ligand that was observed in 1MFD was not observed in the case of its scFv counterpart (1MFA); instead, all three lowest energy poses (AD3; AD4.2; ADV) for 1MFA had orientations similar to that of the crystal ligand (PRMSDs < 2 Å). Since the ligands being docked to both antibodies are identical, we can infer that the two binding sites are not identical (Table 1). To facilitate a better understanding of the difference between the two binding pockets, their volumes were calculated using Fpocket 122; the volume of the 1MFA binding pocket was calculated to be 423.01 Å³, while that of 1MFD was 582.51 Å³. The 1MFD binding pocket, being approximately 160 Å³ larger, is able to accommodate the flipped orientation of the Gal residue, whereas the smaller 1MFA binding pocket is not as accommodating of this ligand orientation, due to possible steric clashes. This potential steric clash was confirmed by superimposing the coordinates of the Gal residue onto those of Abe in 1MFA (Figure 4.11e, f).

[Figure 4.11, panels a to f. Labels printed on the panels: (a) PRMSD = 5.5 Å; (b) PRMSD = 1.0 Å; (c, d) ABE, GAL, O4, WAT 601:O; (e, f) O3, O6.]

Figure 4.11 Docking the trisaccharide to the Salmonella antibody (in 1MFD and 1MFA). (a) Lowest energy pose from ADV for 1MFD before rescoring (white) compared to the crystal ligand (black); PRMSD = 5.5 Å. (b) Lowest energy pose from ADV for 1MFD after rescoring (white) compared to the crystal ligand (black); PRMSD = 1.0 Å. (c) and (d) show the 1MFD antibody in transparent surface representation along with the oxygen atom belonging to the water molecule from the crystallographic co-complex, WAT 601; in (c) the crystal ligand from 1MFD is shown in CPK representation, and in (d) the lowest energy pose from ADV for 1MFD before rescoring (in CPK representation) showing the Gal residue replacing Abe within the binding pocket is shown. (e) The Gal residue from the ligand in 1MFD (in van der Waals representation) after being superimposed onto the Abe residue from the ligand in 1MFA is shown within the 1MFA binding site. A cross-section of the 1MFA antibody is represented as a transparent surface with potential steric clashes visible between the Gal residue and the antibody. (f) Same as (e) but with the 1MFA antibody represented as an opaque surface, thus more clearly depicting potential steric clashes between the O-3 and O-6 groups of the Gal residue and the interior of the binding pocket.

The known challenge associated with docking large, flexible molecules using AD4.2 108,123 was encountered with the linear pentasaccharide ligand in 1M7I. None of the 100 poses were correctly docked (all PRMSDs > 2 Å); the lowest energy pose had a PRMSD of 3.9 Å and a CHI energy of 18.5 kcal/mol (Figure 4.12a). After rescoring, the lowest energy pose had a considerably improved CHI energy score of 4.3 kcal/mol; however, it still had a high PRMSD (Figure 4.12b). It has been suggested that the maximum number of rotatable bonds be limited to 10 when employing AD4.2. 123 The ligand in 1M7I has nearly double that number at 19, making this quite a challenging system to dock using AD4.2. In AD3, although only 4 of the 100 output poses were correctly docked, they occupied the top 4 ranks, before and after rescoring. In ADV, 7 of the 100 output poses were correctly docked, of which 5 were amongst the 8 top-ranked poses, before and after rescoring. Although both AD3 and ADV seem to have had difficulty in finding the correct pose for the pentasaccharide, whenever such a pose was found, both programs scored it favorably. As these poses also had low SRMSD values, they were identified as lowest energy poses after rescoring.

[Figure 4.12, panels a and b: docked poses overlaid on the crystal ligand.]

Figure 4.12 Docking to the antibody in 1M7I using AD4.2. (a) Lowest energy pose before rescoring (white) compared to the crystal ligand (black); PRMSD = 3.9 Å. (b) Lowest energy pose after rescoring (white) compared to the crystal ligand (black); PRMSD = 10.7 Å.

Conclusions

A solution to a major challenge encountered in flexible carbohydrate docking has been presented in this study by the development of intrinsic energy terms for carbohydrates, which quantify the relative energy of their glycosidic torsion angles. In 7 of the 18 cases (6 systems x AD3/AD4.2/ADV), the lowest energy poses generated by the docking programs had PRMSDs > 2 Å; however, after rescoring using the CHI energy functions, the PRMSDs in 4 of the 7 cases improved, with correctly docked poses (PRMSDs ≤ 2 Å) replacing incorrect poses, increasing the total count of correctly docked lowest energy poses from 11 to 15 out of 18. Rescoring also led to lowest energy poses that had SRMSDs ≤ 1 Å in 16 out of 18 cases, and SRMSDs ≤ 1.5 Å in the two remaining cases. Among the three docking programs employed in this study, ADV was most successful in producing and appropriately ranking the correct ligand pose, with a success rate of 83% before rescoring, and 100% after rescoring. Inclusion of the CHI energy term in rescoring docked poses enabled the filtering of poses based on their conformations, increasing the chances of finding the correct pose amongst all output poses generated.

In most docking applications, locating the correctly docked pose amongst the numerous output poses largely depends on the ranking of these poses based on their energy scores. The CHI energy functions may in principle be used in the assessment of carbohydrate structures obtained from any theoretical or experimental method. By favoring energetically reasonable ligand conformations, the CHI energies significantly improve the pose ranking for structures obtained from docking algorithms, making the rescored energy a better predictor of the quality of the docked pose. This improvement was observed across all three programs, indicating that the CHI energy functions may be employed independently of the scoring functions. The CHI energy functions could also be incorporated directly within docking programs as a component of the scoring function, although that might require a reoptimization of the scoring functions. Application to crystallographic data leads to the conclusion that proteins primarily recognize low-energy conformations of carbohydrates. This final observation has considerable relevance to the design of carbohydrate-based inhibitors and vaccines.

Individual Author Contributions

Anita K. Nivedha: Authored portions of the paper and prepared figures for the paper; designed docking protocols and the antibody alignment algorithm; performed the dockings; developed the CHI energy functions and applied the functions to docking results; provided tools for analysis, analyzed and interpreted the data.

Spandana Makeneni: Authored portions of the paper; co-designed docking protocols and the antibody alignment algorithm; performed binding site volume calculations and provided tools for the analysis of data.

B. Lachele Foley: Contributed to the design of the antibody alignment algorithm and the development of the CHI energy functions.

Matthew B. Tessier: Contributed to the design of preliminary docking protocols, provided PREP files for the non-standard sugar residues, and scripts for the collection of quantum mechanical data.

Robert J. Woods: Authored the paper; conceived and designed the experiment, and contributed to the analysis and interpretation of data.

5. VINA-CARB: IMPROVING GLYCOSIDIC ANGLES DURING CARBOHYDRATE DOCKING

A. K. Nivedha, D. F. Thieker, R. J. Woods. Accepted by J. Chem. Theory Comput. Reprinted here with permission of publisher.

Abstract

Docking programs are primarily designed to dock rigid, drug-like fragments onto macromolecules, and frequently encounter issues predicting more flexible carbohydrate molecules. The primary source of flexibility within a carbohydrate is the glycosidic linkage. Previous efforts have developed Carbohydrate Intrinsic (CHI) energy functions that reflect glycosidic torsion angle preferences. The following work represents the incorporation of the CHI-energy functions into the AutoDock Vina (ADV) scoring function, subsequently termed Vina-Carb (VC). Carbohydrate models generated by VC are penalized according to the CHI-energy profiles. Two new, user-adjustable parameters have been introduced; namely, a CHI-energy weight term (chi_coeff) that affects the magnitude of the CHI-energy penalty, and a CHI-cutoff term (chi_cutoff) that negates CHI-energy penalties lower than the specified value. A dataset consisting of 76 protein-carbohydrate complexes and 29 apoprotein structures was used in the development of VC, including antibodies, lectins and carbohydrate binding modules. Accounting for the intramolecular energies of carbohydrate ligands produced docked models that better reflected the natural configuration on the protein surface. VC produced accurate structures ranked within the top five models for 68% of the systems tested, compared to a success rate of 49% for ADV. Finally, a single enzyme system was employed in order to demonstrate the potential application of VC to proteins which distort glycosidic linkages of carbohydrate ligands upon binding. VC represents a significant step towards accurately predicting protein-carbohydrate interactions. In addition, the approach we present is generalizable to any other class of ligands that populate multiple well-defined conformational states.

Introduction

Carbohydrates represent one of the four major classes of organic macromolecules, and are involved in a range of processes that are critical for proper cellular function. 9 Structural characterization of glycans and their binding partners (i.e. antibodies, lectins, carbohydrate binding modules, enzymes, etc.) has advanced our understanding of the molecular recognition process; however, obtaining three dimensional structures of these interactions is particularly challenging due to the inherent flexibility of glycans. 124,125 This flexibility stems from either two or three freely rotatable bonds constituting the glycosidic linkages. 126 In contrast, rotation about the peptide backbone is restricted by the partial double-bond character of amide linkages. 127 As a result of the increased molecular motion present within carbohydrates, the majority of glycan-binding partners are not resolved in complex with their substrate. 59 Theoretical methods offer an alternative means for studying intermolecular glycan interactions that can complement experimental results. 94,95,128

Molecular docking is one such method that aims to predict various modes of non-covalent interaction between a macromolecule and a ligand, ranking the results based on binding energies. 129 In general, docking energy functions are a summation of the energy contributions from various non-bonded interactions in protein-ligand complexes such as electrostatics, van der Waals, hydrogen bonding, and hydrophobic interactions. 129,130 These semi-empirical scoring functions are generalized for small molecule ligands with limited flexibility and often produce unnatural glycosidic angles when docking carbohydrates. 48 This distortion is especially pronounced for large oligosaccharides which contain a higher number of degrees of freedom. 108

Previous studies have customized docking scoring functions for carbohydrates by either re-calibration of existing terms 89 or the inclusion of additional energy terms which model specific protein-carbohydrate interactions 131. For example, the SLICK scoring function 131 within BALLDock 132 includes an energy term for CH/π stacking interactions, and was calibrated using a set of carbohydrate-lectin complexes. In contrast, the previously reported CHI-energy functions 48 assign relative energies to the torsion angles of the glycosidic linkages. The CHI-energy functions were derived quantum mechanically based on the torsional energy profiles of several tetrahydropyran-based disaccharide models. Although the functions were developed using unbound carbohydrate models, the distribution of glycosidic torsion angles in protein-carbohydrate complexes obtained from the Protein Data Bank (PDB) corresponds with the CHI-energy profiles. 48 The conformational similarity between bound and unbound carbohydrates suggested that the CHI-energy functions would perform well within a docking program. The CHI-energy functions are transferable between scoring functions, and could also be employed in the evaluation and refinement of carbohydrate conformations obtained using experimental methods.

Vina-Carb (VC) represents the incorporation of the CHI-energy functions 48 into the AutoDock Vina 1.1.2 (ADV) scoring function. 108 The CHI-energy is calculated for each carbohydrate pose generated by VC, and added to the respective intermolecular interaction energy. Energetically unfavorable carbohydrate conformations generated by the program are penalized, and often rejected, within the Metropolis subroutine. The user can control how the CHI-energy penalty is applied in VC by adjusting the values of two input variables: a CHI-energy coefficient term (chi_coeff) and an energy cutoff value (chi_cutoff). Changing the CHI-energy coefficient term affects the relative magnitude of the CHI-energy penalty compared to other energy terms within the ADV scoring function. The CHI-energy cutoff variable prevents penalization of poses with conformations which deviate from the ideal due to induced fit: for glycosidic torsion angles that would receive energetic penalties less than the CHI-energy cutoff value, the penalty is reduced to zero. Here we expand the previous set of CHI-energy functions to include the ω-angle associated with glycosidic linkages to the O6 atom.
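To illustrate why a CHI-energy penalty leads to rejection of strained conformations, the following sketch applies the textbook Metropolis acceptance criterion to a total energy that includes the penalty. It is not taken from the ADV or VC source code; the function names, the kT value, and the example energies are all hypothetical.

```python
import math
import random

def total_energy(intermolecular, chi_penalty):
    """Energy seen by the acceptance test: interaction energy plus CHI-energy penalty."""
    return intermolecular + chi_penalty

def metropolis_accept(e_old, e_new, kt, rng=random.random):
    """Textbook Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-(e_new - e_old) / kt)."""
    delta = e_new - e_old
    if delta <= 0.0:
        return True
    return rng() < math.exp(-delta / kt)

if __name__ == "__main__":
    kt = 0.6  # hypothetical temperature factor in kcal/mol
    current = total_energy(-6.0, 0.5)
    # Candidate with a slightly better interaction energy but a strained linkage.
    candidate = total_energy(-6.5, 8.0)
    print("accepted:", metropolis_accept(current, candidate, kt))
```

In this example the candidate's better interaction energy is outweighed by its CHI penalty, so the move is uphill by 7 kcal/mol and is accepted only with very small probability.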

Unlike BALLDock/SLICK, which was calibrated on a set of lectin-sugar complexes, the optimum settings for Vina-Carb were determined using a set of 72 carbohydrate ligands crystallized with antibodies, lectins or carbohydrate binding modules from the PDB. Ligands within the development set range from a disaccharide to an undecasaccharide in length. A test set consisting of apo-proteins of receptors from the development set was used to examine and compare the optimized settings of VC with the original ADV. Finally, an application of VC to an enzyme system is demonstrated.

Methods

File Preparation

Antibody, lectin and CBM complexes containing carbohydrate ligands were collected from the Protein Data Bank (PDB) and employed as the Development Set for VC. Details about the test systems used are provided in S5.1. When duplicate protein chains were present in the PDB file, the chain corresponding to the lowest average B value of the corresponding ligand's atoms was used for docking. The apo-protein structures were employed as a Test Set, and the average B value of the individual protein chains was used to select between duplicate chains. The antibodies were aligned to the Z-axis based on their CDR regions, as described previously. 48 The protein and ligand coordinates were formatted for docking with AutoDock Tools (ADT) 107 using the protocol described previously. 48 Each docking event consists of a rigid macromolecule and a flexible ligand. Unless otherwise noted, all of the rotatable bonds within the ligand were flexible except for carbon-carbon and carbon-nitrogen bonds.

Docking Parameters

The dimensions and centers of the grid boxes are described in the SI. The maximum number of binding modes was limited to 20, and the energy range set at 10 kcal/mol. Two parameters have been added to the scoring function in VC that can be adjusted by the user: 1) chi_coeff, a weighting term for the CHI energies that augments the strength of the energetic penalty applied to the glycosidic torsion angles within the ligand (Figure 5.1a), 2) chi_cutoff, a parameter that introduces a flat-bottom potential by neutralizing the penalty assigned by the CHI energy curves to those glycosidic torsion angles which would receive a penalty less than the cutoff value (Figure 5.1b). For example, employing a chi_coeff of 2 is represented in the paper as VC2, and employing both a chi_coeff of 2 and chi_cutoff of 4 is depicted as VC2|4.
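The effect of the two parameters on a single torsion-angle penalty can be sketched as follows. This is an illustration under stated assumptions rather than the VC implementation: the cutoff is assumed to be compared against the unscaled CHI energy, and penalties at or above the cutoff are assumed to be applied in full (scaled by chi_coeff) rather than shifted. The function name is hypothetical.

```python
def vc_penalty(chi_energy, chi_coeff=1.0, chi_cutoff=0.0):
    """Penalty for one glycosidic torsion angle.

    chi_energy: value of the CHI energy curve at the angle (kcal/mol).
    chi_coeff:  weight applied to the CHI energy.
    chi_cutoff: flat-bottom threshold; CHI energies below it are not penalized.
    Assumption: the cutoff test uses the unscaled CHI energy.
    """
    if chi_energy < chi_cutoff:
        return 0.0
    return chi_coeff * chi_energy

if __name__ == "__main__":
    # VC2|4 in the notation above: chi_coeff = 2, chi_cutoff = 4.
    for chi in (1.5, 3.9, 4.0, 6.0):
        print(chi, vc_penalty(chi, chi_coeff=2.0, chi_cutoff=4.0))
```

With the defaults (chi_coeff of 1 and chi_cutoff of 0), the function returns the CHI energy unchanged.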

[Figure 5.1, panels a and b: ΔE [kcal/mol] versus ϕ [deg].]

Figure 5.1 a.) The effect of applying CHI-coefficient values of 1 (solid line), 2 (dashed line) and 5 (dotted line) to the original VCΦ|β curve. b.) The effect of applying a CHI-cutoff value of 2 to the original CHIΦ|β curve (VC1|2).

Analysis

The results of each ADV docking experiment are variable due to the random seed used by the stochastic search algorithm. In order to account for this variation, the results from multiple independent docking experiments were averaged for each system tested. Unless otherwise stated, each Root Mean Square Deviation (RMSD) provided in this article represents the average result of 10 docking events. This method of analysis aims to eliminate spurious results and allows for a more accurate comparison between ADV and VC. To increase comparability, the 10 random seeds generated for each of the 10 ADV docking experiments were explicitly defined for the 10 corresponding VC docking events.

Docking accuracy is determined through two types of RMSDs; namely, pose and shape RMSD. Both RMSDs compare the location of the docked ligand's ring atoms (C1, C2, C3, C4, C5, and O5) to that of the crystal structure's equivalent atoms. A pose RMSD (PRMSD) represents the deviation of the docked model from the location of the reference structure in space. In this manner, the PRMSD represents the accuracy of docking the ligand to the receptor. In contrast, the shape RMSD (SRMSD) uses least squares fitting to compare the docked model to the reference structure irrespective of their locations in space. The SRMSD represents the deviation of the docked model’s shape from that of the reference structure. The rmsd and match functions within Chimera 133 were used to calculate the PRMSD and SRMSD values. The PRMSDmin(5) and PRMSDmin(20) represent the minimum PRMSD from the top 5 ranked and top 20 ranked models, respectively, averaged across the 10 docking events. The SRMSDavg was calculated by averaging the SRMSD values for each of the 20 models from the 10 docking experiments. The standard deviation values were calculated as the standard deviation of a sample.
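The pose RMSD and its aggregation over docking events can be illustrated with a short sketch. The function names and the coordinate representation (lists of x, y, z tuples in matched atom order) are assumptions made for illustration; the values reported here were computed with Chimera, not with this code. The SRMSD would additionally require a least-squares superposition before the deviation is measured, which is not shown.

```python
import math
import statistics


def pose_rmsd(docked, reference):
    """RMSD between matched ring-atom coordinates, with no superposition.

    Both arguments are equal-length lists of (x, y, z) tuples listed in the
    same atom order (e.g. C1, C2, C3, C4, C5, O5 for each ring).
    """
    if not docked or len(docked) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_d, atom_r in zip(docked, reference)
        for a, b in zip(atom_d, atom_r)
    )
    return math.sqrt(squared / len(docked))


def prmsd_min(ranked_prmsds, top_n):
    """Minimum PRMSD among the top_n ranked models of one docking event."""
    return min(ranked_prmsds[:top_n])


def summarize_events(events, top_n):
    """Mean and sample standard deviation of PRMSDmin(top_n) over docking events."""
    minima = [prmsd_min(ranked, top_n) for ranked in events]
    return statistics.mean(minima), statistics.stdev(minima)
```

For the protocol described above, `events` would hold 10 lists of 20 PRMSD values each, and `top_n` would be 5 or 20.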

Images of the molecules were prepared using the Visual Molecular Dynamics (VMD) program. 134 The ligands are colored according to the source of the file. Crystal structures are colored blue, and output from ADV and VC are colored yellow and green, respectively. Additionally, each carbohydrate ring is colored according to whether the CHI energy penalty is applied to the surrounding Φ/Ψ values. The ¹C₄ and ⁴C₁ chair conformations are colored green, and other conformations that would be skipped by VC are colored red. Ring conformations have been determined according to the Cremer-Pople definition. 135


CHI Energy Integration

Parsing the Ligand: The atom names for carbohydrate residues within the ligand file must follow established atom naming to be identified by the CHI energy scoring function of VC. While the carbohydrate ligand file is parsed within parse_pdbqt.cpp, information about the atoms and residues of the ligand is stored within the data structure ligand_info. Relevant glycosidic linkages, namely (1,2), (1,3), (1,4) and (1,6) linkages, are detected. Since the CHI energy functions were originally developed for chair conformations of oligosaccharide rings, it is necessary to determine the conformations of the residues comprising the input oligosaccharide ligand before the application of the energy functions.

Determination of Ligand Carbohydrate Ring Conformation: The ring conformations are identified based on a modified version of the Best-Fit-Four-Membered-Plane (BFMP) method. 136 Selections made about the appropriate CHI energy functions to be used for each linkage are stored in the data structures glyco_info and ligand_glyco_info. According to the BFMP method, a carbohydrate ring must fit three criteria in order to be classified as a ¹C₄ or ⁴C₁ sugar; namely, the internally defined ²d₅, ⁴d₁, and ⁶d₃ or ⁵d₂, ¹d₄, and ³d₆ conformations, respectively. When the program encounters carbohydrate conformations for which the CHI energy functions are not applicable, it simply ignores the associated linkages. In certain protein-carbohydrate systems the sugar rings are only slightly distorted from the standard ⁴C₁ and ¹C₄ conformations and still merit application of CHI energy penalties. To accommodate such minor conformational distortions of the carbohydrate ring, in the current implementation of the BFMP method, a saccharide is classified as a ¹C₄ or a ⁴C₁ sugar if any 2 of the 3 criteria can be identified for the ring.
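The relaxed two-of-three rule can be sketched as below. The descriptor labels are plain-text spellings of the six BFMP conformations named above, and their assignment to the two chairs follows the order in which they are listed; both the labels and the function name are assumptions made for illustration and are not taken from the VC source. Returning no classification when both chairs reach the threshold is likewise an illustrative choice.

```python
# Plain-text labels for the BFMP descriptors (hypothetical spellings).
CHAIR_CRITERIA = {
    "1C4": {"2d5", "4d1", "6d3"},
    "4C1": {"5d2", "1d4", "3d6"},
}


def classify_chair(detected, min_matches=2):
    """Return "1C4", "4C1", or None for a ring.

    detected is an iterable of BFMP descriptor labels found for the ring.
    A chair is assigned when at least min_matches of its three criteria are
    present; None means the CHI energy functions would not be applied.
    """
    found = set(detected)
    hits = [
        chair
        for chair, criteria in CHAIR_CRITERIA.items()
        if len(criteria & found) >= min_matches
    ]
    return hits[0] if len(hits) == 1 else None
```

With `min_matches=3` the same function reproduces the strict form of the rule, in which all three criteria must be met.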


Scoring Individual Ligand Poses: Each docking run consists of a certain number of steps, determined heuristically. Each step is characterized by a random perturbation and a local optimization, which is followed by an evaluation of the generated pose. The random perturbation is performed by either translating or rotating the ligand, or by adjusting any of the flexible torsion angles. A new function, eval_chi, has been introduced within model.cpp in order to calculate the CHI energy penalty for each ligand pose. This function uses data from ligand_glyco_info to calculate the CHI energy penalty for every oligosaccharide pose generated. The CHI energy penalty calculated for each glycosidic torsion angle within eval_chi is modified according to two user-adjustable parameters (chi_coeff and chi_cutoff). The total CHI energy of a given oligosaccharide is the summation of the CHI energies for each glycosidic torsion angle comprising the model, which is combined with the interaction energy natively calculated by ADV within the function eval_deriv. This composite energy is used within the metropolis_accept function in monte_carlo.cpp to calculate the acceptance probability of each ligand pose. A ligand pose with unfavorable glycosidic torsion angles would be penalized by the application of CHI energies, thereby increasing its probability of rejection within the metropolis_accept function.
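The way a CHI penalty raises the probability of rejection can be illustrated with the standard Metropolis criterion. The sketch below is not the code in monte_carlo.cpp: the Python signatures, the `temperature` parameter and the helper `composite_energy` are assumptions introduced for illustration only.

```python
import math
import random


def composite_energy(interaction_energy, chi_penalties):
    """Interaction energy plus the summed CHI penalties of a pose (kcal/mol)."""
    return interaction_energy + sum(chi_penalties)


def metropolis_accept(old_energy, new_energy, temperature, rng=random):
    """Standard Metropolis test on composite energies.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp((old_energy - new_energy) / temperature).
    """
    if new_energy < old_energy:
        return True
    probability = math.exp((old_energy - new_energy) / temperature)
    return rng.random() < probability
```

Because the CHI penalties enter `new_energy` additively, a pose with strained glycosidic torsions has a smaller acceptance probability than an otherwise identical pose with relaxed torsions.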

Log file: A VC log file (VC_log.txt) is written out for each execution of the program. It contains information about the glycosidic linkages identified by the program and details about whether CHI energy penalties were applied to each linkage.


Results & Discussion

Implementation of the CHI energy function aims to improve docking accuracy by correcting the shape of the carbohydrate ligand. In order to determine whether correcting the ligand shape would be sufficient to produce an accurate model for a complex, each of the crystal structures was initially subjected to a control docking procedure in which the glycosidic linkages of the ligand were restrained to the angles that were present in the crystal structure. Of the 87 crystal structures selected for evaluation, 11 failed this initial positive control. Failure during this step suggests that alternative modifications to the ADV scoring function would be necessary to produce accurate models for these 11 complexes; therefore, optimization of VC continued with the remaining 76 structures.

Optimization of the CHI-Energy Coefficient

Incorporation of the CHI-energy term into the ADV scoring function immediately produced output carbohydrate conformations comparable to X-ray crystal structures (ADV vs. VC1 in Figure 5.2a). However, since the CHI-energy term was developed independently of the ADV scoring function, it may be disproportionate in magnitude. Therefore, a range of CHI-energy coefficients (1, 2, 3, 4, 5, 10, and 50) was examined. The effect of varying the CHI-coefficient for a set of 14 antibody-carbohydrate systems is reported in Figure 5.2. Each CHI-coefficient value led to poses with better ligand conformations (lower SRMSDavg(20) values) than those produced with ADV. The CHI-coefficient imposes a higher penalty for torsions outside of the local minima of the CHI energy curves, thereby attenuating the production of incorrect oligosaccharide conformations during docking. Increasing the magnitude of the CHI-energy contribution generally led to a corresponding decrease in the SRMSDavg(20). This trend was particularly noticeable for systems containing more than 5 carbohydrate residues, due to the increasing number of glycosidic linkages that were affected (Figure 5.2a).

Interestingly, the largest CHI-coefficient (CHI50) increased the SRMSDavg(20) for ligands containing fewer than 4 carbohydrate residues. This result is most likely due to an induced fit that occurred upon ligand binding, which caused the glycosidic linkages of the crystallized ligand to deviate from the theoretical minima that are heavily biased by CHI50.

Notably, the accuracy of the pose (Figure 5.2b) diminished as the CHI contribution became increasingly large (i.e. VC10 and VC50), despite producing ligand conformations similar to the reference structure (Figure 5.2a). This suggests a problem associated with pose identification. To demonstrate this, the lowest energy model generated from flexibly docking the 3C6S 35 ligand using VC50 (SRMSD = 1.13 Å; PRMSD = 23.8 Å) was rigidly re-docked. Results from ten docking experiments consistently produced an accurate model with a PRMSDmin(5) of 1.98 Å. Rigidly re-docking the ligand allowed the docking scoring function to segregate poses solely based on intermolecular interactions between the protein and ligand. However, during flexible docking, the harsh penalty applied by VC50 eliminated any model that deviated from the minima of the energy curve. Since very few of the generated models met this criterion, only those models that were unaffected by the CHI-energy penalty remained, including those positioned incorrectly. The intramolecular forces imparted by a high CHI-energy penalty appear to outweigh contributions from intermolecular interactions between the protein and ligand.


The effect of over-weighting the CHI contribution suggests that a fine balance between inter- and intramolecular interactions is required to successfully dock carbohydrate ligands. As a result, lower coefficients of the CHI-energy function (less than 4) produced more accurate models by enabling the generation of favorable glycosidic torsion angles without overshadowing the intermolecular forces involved in ligand binding. The performance of ADV and VC is comparable for systems containing di-, tri-, tetra- and pentasaccharide ligands; however, VC outperforms ADV with regard to larger oligosaccharide ligands. For example, the improvements in PRMSDmin amongst the 5 top-ranked poses produced by ADV and VC1 for 1MFB 34, 3BZ4 35, and 3C6S were 1.1, 2.0, and 2.27 Å, respectively. Using VC1 and VC2 produced acceptable PRMSDmin(5) poses for 13 out of 14 systems. As a result, only CHI coefficients of 1 or 2 were considered for subsequent experiments.

Figure 5.2 Assessment of docking to 14 antibody systems with ADV and various CHI-energy coefficients of VC. a.) SRMSDavg amongst the 5 top-ranked poses. b.) PRMSDmin(5). [Bar charts not reproduced. Series: ADV, VC1, VC2, VC3, VC4, VC5, VC10 and VC50. Y-axes: SRMSDavg(20) [Å] in panel a and PRMSDmin(5) [Å] in panel b. Systems on the x-axis, grouped by ligand size from di- to undecasaccharide: 1OP3, 1UZ8, 1M7D, 1MFA, 1MFD, 1MFE, 1S3K, 1CLY, 1CLZ, 1MFC, 1M7I, 1MFB, 3BZ4 and 3C6S.]

Optimization of the CHI-Energy Cutoff

The CHI-energy functions were originally developed by modeling the rotational properties of disaccharide analogs in vacuo. The minima of the CHI-energy curves generally corresponded to oligosaccharide structures determined crystallographically; 48 however, oligosaccharides often undergo conformational changes resulting from induced fit, which may cause glycosidic linkages to deviate from idealized low energy values. Rather than defining the well bottom in terms of a range of allowable torsion angles, the limits are defined by a CHI-energy range. The chi_cutoff term negates the penalty associated with glycosidic linkage conformations surrounding the absolute energy minima in the CHI-energy curves. Use of a flat-bottomed CHI-energy potential allows induced fit to occur with no internal energy penalty. Within this region, the pose is scored solely on the basis of the intermolecular interactions dictated by the native ADV scoring function.

To identify the optimal setting that permits an acceptable range of glycosidic angles, a CHI-energy cutoff was evaluated at integer values from 1 to 5 kcal/mol (Table S5.2). Optimal results were obtained for each CHI-coefficient (VC1 and VC2) using CHI-cutoff values of either 1 or 2 kcal/mol (VC1|1, VC1|2, VC2|1 and VC2|2). These four settings of VC identified acceptable binding modes ranked within the top 20 poses for each of the 14 antibody systems, and ranked within the top 5 poses for 13 of the 14 antibodies. In order to examine the applicability of VC to protein-carbohydrate complexes other than antibody systems, as well as to further optimize the VC parameters, the study was extended to 62 additional carbohydrate-protein complexes, including carbohydrate binding modules (CBMs), lectins, and enzymes. The best performance was attained using a CHI-coefficient of 1 and a CHI-cutoff of 2 (VC1|2), which generated an acceptable pose amongst the top 5 models for 75% of the systems, compared to a 56% success rate for ADV (Table 5.1). Although each of the 76 systems passed a positive control in which the reference structure was successfully docked with rigid glycosidic linkages, VC1|2 was unable to identify an acceptable pose for 25% of these systems. Challenges which may have prevented VC from identifying correct models will be discussed in the following section.

Figure 5.3 Comparison of the VC1|2 (dotted line) and VC2|1 (solid line) CHIΦ|β curves to the distribution of glycosidic linkages in carbohydrate crystal structures in the PDB. The bottom X-axis and left Y-axis correspond to the histogram which depicts the distribution of PDB structures, while the top X-axis and right Y-axis correspond to the CHI-energy curves. [Plot not reproduced; histogram axes: ϕ in 5 deg bins from 0 to 360 and percentage of structures; curve axes: ϕ [deg] and ΔE [kcal/mol].]

Similar to the analysis performed by Nivedha et al.48, the carbohydrate crystal structures in the PDB were surveyed using the GlyTorsion tool from www.glycosciences.de 137 in order to calculate the percentage of glycosidic linkages exempted from penalization as a consequence of applying VC1|1, VC1|2, VC2|1, and VC2|2. At VC1|2, the CHI energy penalty for 87% of glycosidic linkages in the PDB was nullified (Figure 5.3), compared to values of 77%, 62% and 76% for VC1|1, VC2|1 and VC2|2, respectively. Therefore, using VC1|2 allowed for the maximum flexibility of glycosidic linkages without penalization by the CHI-energy functions. Although VC1|2 was selected as default, the alternatives (VC1|1, VC2|1 and VC2|2) were nearly as efficient in binding mode prediction (Table 5.1); therefore, the CHI-cutoff and CHI-coefficient parameters remain user-adjustable.

Table 5.1 Comparison between ADV and VC at the four settings of CHI-coefficient and CHI-cutoff.

System types   No. of Systems   Success Rate* [%]
                                ADV    VC1|1   VC1|2   VC2|1   VC2|2
Antibodies     14               79     100     93      100     100
Lectins        42               55     64      71      67      67
CBMs           20               35     50      60      55      45
Totals         76               56     71      75      74      71

*Success Rate is defined as producing an accurate binding mode (PRMSDmin(5) < 2 Å)

Performance with ligands containing 1-6 linkages

In total there are 12 systems, consisting of lectins and CBMs, with ligands containing one or more 1,6 glycosidic linkages. The success rates for these systems (producing an accurate pose prediction amongst the top-5 poses) for ADV and VC1|2 were 25% and 42%, respectively (Table 5.2).

Table 5.2 PRMSDmin(5) produced by ADV and VC1|2 for the 12 test systems with ligands containing 1,6-linkages.

PDB ID   1jpc  1k9i  1tei  1zhs  2vco  4gk9  2vuz  2yfz  1oh4  2ypj  2j73  2i74
ADV      5.0   4.7   1.1   1.4   3.5   4.3   3.4   3.3   3.6   5.2   0.7   3.6
VC1|2    5.7   5.0   0.6   1.6   1.7   1.2   7.6   1.8   3.0   9.9   2.7   4.0
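The success rates quoted for these 12 systems follow directly from the PRMSDmin(5) values in Table 5.2 and the 2 Å criterion. The short check below uses only the tabulated numbers; the function name is a hypothetical helper.

```python
# PRMSDmin(5) values [Å] from Table 5.2, in the tabulated PDB ID order.
ADV = [5.0, 4.7, 1.1, 1.4, 3.5, 4.3, 3.4, 3.3, 3.6, 5.2, 0.7, 3.6]
VC_1_2 = [5.7, 5.0, 0.6, 1.6, 1.7, 1.2, 7.6, 1.8, 3.0, 9.9, 2.7, 4.0]


def success_rate(prmsd_values, threshold=2.0):
    """Percentage of systems with PRMSDmin(5) below the threshold."""
    hits = sum(value < threshold for value in prmsd_values)
    return 100.0 * hits / len(prmsd_values)


adv_rate = success_rate(ADV)    # 3 of 12 systems
vc_rate = success_rate(VC_1_2)  # 5 of 12 systems
```

Three of the 12 ADV values and five of the 12 VC1|2 values fall below 2 Å, which rounds to the 25% and 42% reported.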

The performances of VC1|2 and ADV were further compared for the above systems by binning the values of the ω angle of all the docked poses from both programs into 10° bins. The histogram thus obtained was compared to the corresponding CHI-energy curves; for example, the data pertaining to sugars with 1,6 linkages in which the O4 atom is equatorially attached to the reducing sugar are shown in Figure 5.4. The distribution of ω angles produced by VC1|2 can be divided into three regions centered around 60°, 180° and 300°, which is in agreement with the low-energy regions of the corresponding CHI-energy curve. Additionally, the ω angles corresponding to the reference crystal structures also fall within the range of the two lowest energy wells of the CHI energy curve. In contrast, the ω angles produced by ADV are more evenly distributed across the 0° to 360° range. The challenges faced by the docking programs with the test set used in this study are outlined below.
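The binning step described above can be sketched in a few lines. The function name is hypothetical, and wrapping angles into the 0° to 360° range before binning is an assumption about how negative torsion values were handled.

```python
def omega_histogram(angles_deg, bin_width=10):
    """Percentage of ω angles falling in each bin across 0 to 360 degrees.

    Angles are wrapped into [0, 360) first, so that -60 and 300 share a bin.
    Returns a list with 360 // bin_width entries.
    """
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for angle in angles_deg:
        # The final modulo guards against rounding at the 360 boundary.
        index = int((angle % 360.0) // bin_width) % n_bins
        counts[index] += 1
    total = len(angles_deg)
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * count / total for count in counts]
```

Plotting the returned percentages against the bin centers gives a histogram that can be overlaid on the corresponding CHI-energy curve.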

Figure 5.4 Distribution of ω angles produced by ADV (blue) and VC1|2 (green) for 12 test systems containing one or more 1,6-linkages overlaid against the reference crystal structure ω angles (red dots) and the corresponding CHI energy curve. [Plot not reproduced; axes: ω in 10° bins from 0 to 360, percentage of structures, and energy [kcal/mol].]

Docking Challenges

Both ADV and VC encountered recurring difficulties while docking ligands in the development set. These challenges resulted from issues inherent to the docking program, as well as ambiguities in atomic placement within the crystal structures that were used as a reference.

Excessive Carbohydrate-Protein Interactions

Obtaining a docked oligosaccharide in which part of the ligand extends away from the protein is particularly difficult for automated docking algorithms.138,139 Docking predicts complexes using a scoring function that maximizes favorable intermolecular interactions. This approach promotes models that contain many residues interacting with the protein. For example, both ADV and VC1|2 fail to identify an acceptable pose amongst the 5 top-ranked poses when docking the tetrasaccharide ligand to the lectin binding domain of lectinolysin (PDB ID: 4GWI140). Only one residue of the ligand completely interacts with the protein surface in the crystal structure; however, the models produced during docking are unable to reproduce this orientation. Although VC1|2 produced poses similar to the crystal ligand (PRMSD = 2.2 Å), they were ranked lower than the other models, which interact with the protein surface in their entirety. One approach to surmount this problem would be to dock only the component of the oligosaccharide that is in direct contact with the protein. Such a minimal binding determinant may be inferred from experimental binding data, such as glycan array screening141,142. VC improves the likelihood that the non-interacting segment will remain distal from the protein surface by penalizing unlikely glycosidic torsion angles. As an example, docking results produced by ADV and VC1|2 for the largest oligosaccharide in this test set (PDB ID: 3C6S) are displayed in Figure 5.5. While those residues that interact with the protein are correctly predicted in both instances, the model produced by VC1|2 better represents the solvent-exposed residues. Glycosidic torsion angles obtained from the reference structure have been plotted on the CHIΦ|α energy curve alongside those of the 20 models produced by either ADV or VC1|2 (Figure 5.5c). Approximately half of the ADV torsions exceeded the 2 kcal/mol cutoff, some of which would receive CHI energy penalties greater than 8 kcal/mol. In contrast, none of the torsion angles produced by VC1|2 were penalized by the CHI energy function for exceeding the 2 kcal/mol cutoff.

Figure 5.5 a.) The PRMSDmin(5) pose from ADV compared to the reference ligand (blue). b.) The PRMSDmin(5) pose from VC1|2 compared to the reference ligand (blue). c.) The Φ torsion angles of α-sugars from the docked poses of the 3C6S ligand from both ADV (yellow triangles) and VC1|2 (green squares) plotted on to the CHI curve. The torsion angles corresponding to the reference are plotted as blue circles. [Images not reproduced; panel c axes: CHIΦ|α torsion angle [°] from 0 to 360 and CHI energy [kcal/mol].]


Aromatic Stacking

The importance of aromatic residues within the binding site has been demonstrated by the corresponding decrease in affinity upon their substitution with other amino acids 143; however, aromatic stacking interactions are currently omitted from consideration in most docking scoring functions. As a result, docking algorithms can encounter difficulties when predicting binding modes of ligands that stack against aromatic amino acids. As an example, the carbohydrate ligand in 4AFD 144 stacks against four tryptophan residues (Trp 55, 60, 99 and 108) in the binding groove of the corresponding CBM. Neither ADV nor VC accurately predicts the binding mode, with high PRMSDmin(5) values of 8.9 Å and 5.4 Å, respectively (Figure 5.6). In these situations, consideration of aromatic stacking interactions within the docking scoring function would be expected to improve the results. Previously, efforts have been made to incorporate CH/π stacking effects during carbohydrate docking 132,145.


Figure 5.6 The crystal structure of a CBM from endoglucanase Cel5A (PDB ID: 4AFD) is depicted in complex with a tetrasaccharide ligand. All amino acids further than 5 Å away from the ligand are colored grey. Those residues within 5 Å are colored orange if they are cyclic and red if acyclic.

Low-Resolution Experimental Data

Docking the tetrasaccharide ligand to the Se155-4 antibody (PDB ID: 1MFC34) appeared more challenging for VC than ADV (Figure 5.7a); however, the results were comparable for the other three ligands that have been crystallized with this antibody (PDB ID: 1MFA31, 1MFD32, and 1MFE33). These three systems contain the same trisaccharide ligand, but differ from the tetrasaccharide by a rhamnose (Rha) residue. This extra residue is responsible for the difference in PRMSDmin(5) values between VC1|2 and ADV (Figure 5.7a). While the positions of three of the four residues in the individual structures closely align with one another, the ring of Rha-524 in the model produced by VC is flipped approximately 180° around the glycosidic ψ-angle, compared to the model produced by ADV. In the reported crystal structure for this complex 34, residue Rha-524 was described as “disordered,” and was placed in both the expected 94 and the flipped orientation in structures 1MFB and 1MFC, respectively. The ADV orientation more closely aligns with the “flipped” ligand from 1MFC, giving rise to a low PRMSDmin(5) relative to VC, which predicts the normal conformation 94 to be preferred.

While it is expected that complexation with the protein will distort the conformation of a bound oligosaccharide, the preponderance of crystallographic data (Figure 5.7b) indicates that large distortions, such as the flip of the glycosidic ψ-angle in 1MFC, are rare. Thus, there is a clear role for the CHI-energy functions to aid in crystal structure refinement and/or curation by identifying such distorted glycosidic linkages as high energy.

Figure 5.7 a.) Models representing the PRMSDmin(5) produced by docking 1MFC with ADV (yellow) and VC1|2 (green). The primary difference between docked models is a rhamnose ring that is flipped approximately 180 degrees, highlighted by the orange arrows. b.) Ligands from two crystal structures, 1MFB (blue) and 1MFC (cyan), also differ by the orientation of the RAM 524 ring.

An Assessment of ADV and VC using a Test Set of Apo Proteins

Cognate docking is useful for determining the ability of the docking algorithm to correctly place the ligand when the binding site is already preordered to receive the ligand; however, if the ultimate goal of docking is to successfully predict protein-ligand interactions in the absence of a pre-configured binding site, it is necessary to assess the performance on apo proteins. Apo protein crystal structures were available for a subset of systems from the cognate development set, and were employed as test cases to compare the performance of ADV and VC1|2. The average difference in amino acid positions between the apo and corresponding cognate proteins for residues within 5 Å of the ligand was 0.77 Å. ADV correctly predicted the binding modes in 35% of the systems, whereas VC1|2 succeeded in 55% of the systems. If the top-20 poses were considered, instead of only the top-5, the success rates for ADV and VC1|2 increased to 55% and 83%, respectively (Table 5.3). VC1|2 also improved the rankings of these acceptable pose predictions (Figure 5.8). In a given docking run, if there are multiple poses with PRMSD ≤ 2 Å, the best-ranked of these is taken as the acceptable pose.
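The rank of the acceptable pose defined above can be obtained with a simple scan over the ranked poses of a docking run. The function names are hypothetical, and ranks are assumed to start at 1 for the best-scoring pose.

```python
def acceptable_rank(ranked_prmsds, threshold=2.0):
    """Rank (1-based) of the first pose with PRMSD <= threshold, else None.

    ranked_prmsds lists the PRMSD of each pose in the order ranked by the
    docking program, best-scoring pose first.
    """
    for rank, prmsd in enumerate(ranked_prmsds, start=1):
        if prmsd <= threshold:
            return rank
    return None


def is_success(ranked_prmsds, top_n, threshold=2.0):
    """True if an acceptable pose occurs within the top_n ranked poses."""
    rank = acceptable_rank(ranked_prmsds, threshold)
    return rank is not None and rank <= top_n
```

A run in which the first acceptable pose is ranked seventh would count as a success for the top-20 criterion but not for the top-5 criterion.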

Table 5.3 Comparison between ADV and VC1|2 for the apo proteins Test Set.

System types   No. of Systems   Success Rate* [%]
                                PRMSDmin(5)        PRMSDmin(20)
                                ADV     VC1|2      ADV     VC1|2
Antibodies     7                71      86         71      100
Lectins        10               50      50         70      90
CBMs           12               0       42         33      67
Totals         29               35      55         55      83

*Success Rate is defined as finding an accurate binding mode (PRMSDmin < 2 Å).

Figure 5.8 A depiction of the ranks of acceptable poses (Rankacc), i.e., the best-ranked pose with PRMSD ≤ 2 Å, produced by ADV and VC1|2 from docking oligosaccharide ligands onto apo protein structures. [Pie charts for ADV and VC1|2 not reproduced; categories: Rankacc < 5, 6 < Rankacc < 10, and Unacceptable.]


Evaluation of Docking to an Enzyme System using ADV and Optimized VC

Enzyme active sites often distort monosaccharide ring shapes during catalysis, which makes docking to this class of proteins particularly challenging. As the CHI-energy functions were developed for use with low energy ring conformations, they would not necessarily be applicable to the distorted glycans found in enzyme complexes, and hence VC is unlikely to offer considerable improvement over ADV when applied to carbohydrate-processing enzymes. An exception to this general statement is for segments of the oligosaccharides extending beyond the active site, in which case the CHI-functions in VC should provide some enhanced accuracy. A single example of docking to a retaining hydrolase is presented here in order to demonstrate the potential application of VC to enzymes. Kitago et al. produced a series of crystal structures of the WT cellulase 44A (Cel44A), and a catalytic knockout, in combination with cellulosic fragments. 146 Of the five structures produced, four of the ligands were bound to the (-) site (relative to the catalytic nucleophile), while only one contained a ligand that spanned the entire active site (PDB ID: 2EQD146) (Figure 5.9a). In that work, a reaction mechanism was proposed in which initial substrate binding enhanced activity through an assortment of interactions with the carbohydrate in the (-) site, while a dearth of interactions in the (+) site promoted product release. 146

VC successfully produced a model of the complex for the four ligands in the (-) site, but failed to correctly position the largest ligand that crosses the (+) site (Figure 5.9b). Although ADV failed to generate a correct model for the ligands bound to the (-) site, it outperformed VC when docking the ligand that extends across the active site. This result is unsurprising considering the high torsional penalties that would be applied by VC to some of the glycosidic linkages within the crystal structure (Figure 5.9c). Although VC would not penalize the glycosidic linkage of the (-1) residue due to the non-chair ring conformations, there are other uncommon torsion values in the distal regions of the ligand. For example, the Φ linkage between residues (+1) and (+2) of the reference structure would receive a penalty of 8 kcal/mol by the CHI energy function, effectively precluding selection of such a model by VC.

Figure 5.9 a) The ligands from five crystal structures (PDB ID: 2E0P, 2EO7, 2EEX, 2EJ1, and 2EQD) of the Cel44A enzyme are superimposed on the protein from PDB ID: 2EQD. Amino acids reported to be involved in substrate binding (N45, R47, W64, W71, W327, W331, E359, and W392) are colored orange or red, depending on whether the residue is aromatic or not. 146 The catalytic residue (Q186) is colored yellow. All other amino acids are grey. The active site has been separated into a (-) and (+) site. The circled values represent the position of each residue relative to the glycosidic linkage that is cleaved during catalysis. The ligands exclusive to the (-) side of the active site are depicted by varying shades of purple. The octasaccharide that extends across both the (-) and (+) site (2EQD) is colored blue. Each carbohydrate ring is colored according to whether the CHI energy penalty is applied to the surrounding Φ/Ψ values. Rings are either green or red depending on whether the CHI energy penalty is or is not applied, respectively. b) A representation of the PRMSDmin(5) and PRMSDmin(20) poses from ADV and VC1|2. c) The glycosidic linkages of the octasaccharide that extends across the active site (2EQD) are labeled according to the penalty received from the CHI energy curve. Penalties greater than 2 kcal/mol are highlighted in red. VC is not applied to the (-1) residue since it is neither a ⁴C₁ nor ¹C₄ chair, so the ring is colored red and the penalties are unlisted. [Images not reproduced; panel b plots PRMSD [Å] for 2E0P, 2EO7, 2EEX, 2EJ1 and 2EQD; panel c labels residues -3 to +5 with paired penalty values.]

Conclusions

The CHI energy functions were incorporated into ADV in order to improve carbohydrate docking results. Docking performance was evaluated with 76 antibody, lectin, or CBM systems. Although various CHI-energy coefficients were evaluated, the original energy profiles (chi_coeff = 1) produced accurate models with the highest frequency. Although exocyclic groups have been omitted from consideration during the modeling of the CHI energy curves by the use of tetrahydropyran molecules, the remaining interaction energy terms within the ADV scoring function account for the interactions of the molecule arising from the presence of these exocyclic groups. An additional term that allows a range of glycosidic torsion angles to remain unpenalized has been implemented to enhance docking performance (chi_cutoff = 2). Although these settings have been selected as default values, the variables remain user-adjustable. VC1|2 produced accurate docked models for more systems than ADV when docking to either holo- or apo-protein receptors; however, ADV outperformed VC in a few cases where the reference ligands contained high-energy glycosidic linkages according to the CHI energy curves. This result suggests that accurately predicting warped glycosidic linkages, such as those found within the active site of an enzyme, would be difficult for VC. Although VC was not designed for enzymes, results from docking to a cellulase demonstrate the potential application of VC towards accurately predicting enzyme-glycan interactions.

There were a few commonalities within the systems that neither ADV nor VC could accurately reproduce. Ligands that partially extend into solution were difficult to reproduce due to the lack of intermolecular interactions. For these ligands, better results may be produced by docking only those parts of the ligand which are expected to interact with the protein. A few other systems were identified which may benefit from a term that accounts for aromatic stacking. Finally, a few low-resolution crystal structures were identified which contained ambiguous coordinates for the reference ligands, indicating a potential role for the CHI energy functions as a validation technique for crystallographic models.

VC is currently applicable to the most common saccharide moieties and linkages, such as chair conformations and 1,x-linkages. Additional residues, such as sialic acid, may be incorporated into VC once the CHI-energy functions become available.


The source code for VC is freely available at http://glycam.org/publication-materials/vina-carb

Individual Author Contributions

Anita K. Nivedha: Authored portions of the paper; coded CHI energy functions within AutoDock Vina; co-designed docking protocols and analysis methodologies; provided tools for the analysis of data and made images for the paper.

David F. Thieker: Authored portions of the paper; co-designed docking protocols and analysis methodologies; provided tools for the analysis of data and made images for the paper.

Robert J. Woods: Authored the paper; conceived and designed the experiment, and contributed to the analysis and interpretation of data.


6. THE CONSIDERATION OF CH/π INTERACTIONS IN CARBOHYDRATE-PROTEIN DOCKING

Introduction

CH/π interactions occur between -CH groups and the π-electron density in aromatic molecules. These interactions were first postulated by Tamres in 1952, 147 who noted that dissolving benzene in chloroform was exothermic. This result was followed up by extensive NMR and IR studies 148,149 which showed that this type of non-covalent interaction is qualitatively similar to a hydrogen bond. CH/π interactions have been described as interactions between a weak acid (C-H donor) and a weak base (π-acceptor), and are stable in both polar and non-polar solvents. 150 Individually, these interactions are relatively weak, each contributing 0.5-1.0 kcal/mol to the overall stabilization energy of the complex, 150,151 but the cumulative effect of multiple CH/π interactions has a pronounced influence on stability.

It has also been proposed that the strength of the CH/π interaction primarily originates from charge transfer, 152 and that dispersive forces play a major role in these interactions. 153 The hydrophobic effect also contributes favorably to this interaction when water is the solvent. However, it is not the major contributing factor, as shown in the study by Waters et al., 154 in which the replacement of an aromatic moiety by a more hydrophobic aliphatic group led to a decrease in the interaction energy of the system under study. Specifically, they showed that mutating a phenylalanine to a synthetic analog, in which the phenyl ring was replaced by a cyclohexane ring, weakened the interaction energy of the system with an acetylated monosaccharide from -0.5 kcal/mol to -0.1 kcal/mol. This result showed that the hydrophobic effect was not the major contributing factor to the interaction energy when there was a potential to form CH/π interactions. Additionally, CH/π interactions may occur even in vacuum, 155,156 whereas hydrophobicity stems from a molecule’s interaction with water.

Figure 6.1 The replacement of the aromatic group in (A) by the aliphatic group in (B), in the study by Waters et al. 154 of an interaction with a tetraacetylglucose molecule, led to a decrease in the interaction energy of the system.

Multiple surveys of the Protein Data Bank (PDB) have been performed to investigate the presence of CH/π interactions in protein crystal structures, and the sheer number of these interactions reveals their importance in protein structure stability and function. 50,157 For example, in a 2001 study, a set of 1154 non-redundant protein structures from the PDB was surveyed to detect CH/π interactions, and the authors detected 31,087 individual interactions which satisfied their selection criterion. 151 They discovered that nearly three-fourths of the Tryptophan residues, half of all Tyrosine and Phenylalanine residues and one-fourth of all Histidine residues were involved as acceptors in CH/π interactions. In addition to their contribution to the stabilization of protein structures, 151 CH/π interactions also occur in complexes of proteins with ligands or cofactors, nucleotides, carbohydrates or peptides. 151,158,159 They are particularly common in carbohydrate-binding proteins, and affect binding affinity and conformation. For example, lysozyme is an endoglycosidase which binds to the β-1,4-linked homopolymer of N-acetylglucosamine (GlcNAc), the main cell wall component in fungi. The enzyme has several aromatic amino acids in its binding pocket that are crucial for ligand recognition. An alteration of these aromatic residues using site-specific mutagenesis affected the affinity and the catalytic efficiency of the enzyme. 158

Protein-carbohydrate interactions are at the heart of several life processes including fertilization, embryogenesis, tissue maturation and tumor metastasis. 160 The affinities associated with this class of molecular recognition phenomena are often strengthened by multivalency, 161,162 as well as by interactions between polar or charged groups (hydrogen bonds, salt bridges), van der Waals contacts, and interactions between aromatic amino acids and -CH groups in carbohydrate residues (CH/π interactions) 55 (Figure 6.3). These CH/π interactions cause carbohydrate rings to stack roughly parallel or perpendicular to aromatic amino acids. 51,163 They have been observed in most protein-carbohydrate complexes, including enzymes and receptors, and more specifically in lectins, plant toxins, antibodies and transport proteins. 56,164 Antibodies can also be raised against carbohydrate antigens, and can therefore interact with sugars intermolecularly 25 via stacking interactions.

Figure 6.2 The carbohydrate antigen from Salmonella stacking against two aromatic amino acids, namely, a Tryptophan and a Tyrosine, in the binding pocket of an antibody Fab fragment. (PDB ID: 1MFE) 33

Any pyranoside has two distinctive faces that can interact with an aromatic residue. Experimental and theoretical studies show that the presence of several axially oriented CH bonds facing the aromatic ring is favored, while interactions with axially oriented OH bonds are disfavored. 160 In a typical carbohydrate CH/π interaction, the hydrogen atoms in two or three CH groups on the hydrophobic face of a monosaccharide overlap with the π-electron density in an aromatic amino acid (Fig. 3.1b). It has also been shown experimentally that the elimination of aromatic residues within these binding sites leads to a decrease in the affinity of the protein-carbohydrate interaction, 165 and that replacing one aromatic residue with another can modulate the properties of the interaction. As the size of the interacting aromatic ring increased, there was a corresponding increase in affinity. Conversely, the addition of electron-withdrawing groups, such as fluorine, to the ring led to a decrease in affinity. 147,158,166,167

Figure 6.3 Representation of CH/π interactions between β-D-Glucopyranose (βDGlcp) and Phenylalanine.

In the present work, we obtained a CH/π interaction energy function using knowledge from previous experiments about the nature of the interaction between the two groups and their contribution to the overall interaction energy of the complex. 154 We examined the use of the resulting CH/π function in improving the ranking of theoretical interaction energies for a test set of 60 lectin-carbohydrate systems that consisted of complexes in which CH/π interactions visibly contributed to binding, those which had fewer than four CH/π interactions in the binding site, and systems in which these interactions were absent. The theoretical structures were generated by automated docking employing AutoDock Vina 108 and Vina-Carb. 48 The CH/π function was applied after docking to assess its ability to improve the ranking of the theoretical poses, relative to the known crystal structures for the 60 systems.

Methods

CH/π interaction energy function

The CH/π interaction energy curve between a CH model and an aromatic ring moiety can be described using a Lennard-Jones potential with the minimum of the curve at approximately -0.5 kcal/mol, which corresponds to the known contribution from an individual CH/π interaction. 154

The equation used to model these interactions is shown in Figure 6.4.

[Figure 6.4 plot: model energy (kcal/mol) as a function of distance, x (Å), where 4ε = 1.84 and σ = 3.26.]

Figure 6.4 The mathematical model (Lennard-Jones potential) used in this study to describe the interaction between a CH-group and an aromatic moiety.
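As an illustration, the energy curve can be evaluated with a few lines of Python. This is a minimal sketch that assumes the standard 12-6 Lennard-Jones form, V(x) = 4ε[(σ/x)^12 − (σ/x)^6], together with the parameters quoted with Figure 6.4; the function names are hypothetical, and the units (kcal/mol and Å) are inferred from the plot axes.

```python
# Parameters quoted with Figure 6.4; units are assumed from the plot axes.
FOUR_EPSILON = 1.84  # kcal/mol (assumed)
SIGMA = 3.26         # Å (assumed)


def ch_pi_energy(x, four_epsilon=FOUR_EPSILON, sigma=SIGMA):
    """12-6 Lennard-Jones energy at a carbon-to-centroid distance x."""
    if x <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (sigma / x) ** 6
    return four_epsilon * (ratio6 * ratio6 - ratio6)


def well_position(sigma=SIGMA):
    """Distance at which the 12-6 form reaches its minimum: 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma


if __name__ == "__main__":
    x_min = well_position()
    print(f"minimum near x = {x_min:.2f}, energy = {ch_pi_energy(x_min):.2f}")
```

Under these assumptions the well depth equals ε = 1.84/4 = 0.46 kcal/mol, in line with the contribution of roughly 0.5 kcal/mol attributed to a single CH/π interaction.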

Evaluation of Results

The RMSD of the docked ligand pose relative to that in the crystal structure was computed for the carbohydrate ring atoms (PRMSD). Previously, we have reported that PRMSD values are a convenient quantitative measure of the quality of a theoretical carbohydrate pose. 48,168
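A ring-atom RMSD of this kind can be sketched as follows. The sketch assumes that the docked and crystal ring atoms are supplied in the same order and that no superposition is applied, since docked poses already share the receptor's coordinate frame; the function name is hypothetical.

```python
import math


def pose_rmsd(docked, reference):
    """RMSD over matched ring atoms, computed without superposition.

    Both arguments are sequences of (x, y, z) tuples in the same atom order.
    """
    if not docked or len(docked) != len(reference):
        raise ValueError("atom lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(docked, reference):
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(docked))
```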

For re-scoring of the docking energies, the cumulative CH/π interaction energy score for each pose was combined with the docked energy obtained from ADV or VC, and the new energies were used to re-rank the docked models. The rank of the model with the lowest PRMSD (PRMSDmin) was calculated before and after rescoring. 48
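The re-ranking step can be illustrated with a short Python sketch. It assumes that the two energies are combined by simple addition, with the CH/π score scaled by a coefficient; the function names and the numerical values in the usage example are hypothetical and serve only to show the mechanics.

```python
def rescore(docked_energies, ch_pi_scores, coeff=0.7):
    """Add the scaled CH/pi score to each docked energy (assumed additive)."""
    return [e + coeff * c for e, c in zip(docked_energies, ch_pi_scores)]


def rank_of_best_pose(energies, prmsds):
    """1-based rank, by ascending energy, of the pose with the lowest PRMSD."""
    best = min(range(len(prmsds)), key=prmsds.__getitem__)
    order = sorted(range(len(energies)), key=energies.__getitem__)
    return order.index(best) + 1


if __name__ == "__main__":
    # Illustrative values only; these are not results from this work.
    docked = [-7.0, -6.5, -6.0]
    ch_pi = [0.0, -0.4, -2.0]
    prmsds = [4.0, 3.0, 1.0]
    print(rank_of_best_pose(docked, prmsds))
    print(rank_of_best_pose(rescore(docked, ch_pi), prmsds))
```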

Test systems

The test set consisted of 60 lectin-carbohydrate crystal structures extracted from the PDB. Details about the systems are provided in Supplementary Information (SI). In the case of PDB files with multimers of the complex, the monomer with the lowest average B-factor for the carbohydrate ligand was selected. The systems were prepared for docking using AutoDockTools. 107 The docking grid box was centered on the binding site of the protein; the (x, y, z) co-ordinates of the grid box center are provided in S6.3. Docking was performed ten times, and the model with the lowest PRMSD was determined each time. The average of these ten lowest PRMSD values was calculated as the PRMSDmin in each case, as described by Nivedha et al. 168 The requested number of output models was set to 20 for each of the 10 independent docking runs. Following docking, the CH/π interaction energy scoring function was applied to each docked model. The algorithm to perform this post-docking application of the CH/π function is described in detail in the following section.
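The averaging used to obtain PRMSDmin can be written compactly. The sketch below assumes that the PRMSD values of all output models are available per run; the function name is hypothetical.

```python
def prmsd_min(runs):
    """Average, over docking runs, of the lowest PRMSD found in each run.

    `runs` is a list with one entry per docking run, each entry being the
    list of PRMSD values of the models returned by that run.
    """
    if not runs or any(not run for run in runs):
        raise ValueError("every run must contain at least one PRMSD value")
    return sum(min(run) for run in runs) / len(runs)
```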

Automatic Detection of CH/π interactions

The program reads in protein and carbohydrate structure files (PDB format) and calculates the equation of the plane, ax + by + cz + d = 0, for each pyranose ring using the co-ordinates of the ring atoms. For the 5 carbon atoms in the ring, the positions of the attached hydrogen atoms are calculated as shown in 48. Using the H and C atomic co-ordinates, CH vectors are generated, and the center of the plane of the carbohydrate ring demarcated by atoms O5, C2, C3 and C5 is determined.

The program detects all Tyrosine, Tryptophan and Phenylalanine residues according to their residue name in the PDB file, and stores the co-ordinates of all the atoms comprising the aromatic rings. For each aromatic ring, the centroid is calculated (one for each ring, therefore a total of two in the case of Tryptophan) and the distances between each aromatic center and all centers of the carbohydrate ring planes are determined (dcenters) (Figure 6.5b). If any of the dcenters distances is less than 7Å, the distances between the projections of the pyranose ring carbon atoms onto the aromatic ring plane and the centroid of the aromatic ring are calculated (dcp) (Figure 6.5c). For each dcp distance, if the value is less than 2.5Å and the CxHx bond points towards the aromatic ring, a CH/π interaction energy score is calculated for that interaction using the distance between the carbon atom and the centroid of the aromatic ring (dcπ) as input. Summation of all CH/π interaction energy scores for the entire carbohydrate molecule gives the total CH/π interaction energy score for that pose.
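The detection steps above can be sketched in Python as follows. This is a minimal illustration rather than the program used in this work: the function names are hypothetical, the aromatic ring normal is taken from the first three ring atoms, the test for a CH bond pointing towards the ring is assumed to be a positive dot product between the C-H vector and the carbon-to-centroid vector, and the energy term assumes the standard 12-6 Lennard-Jones form with the parameters quoted in Figure 6.4.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def unit_normal(ring):
    """Unit normal of a near-planar ring from its first three atoms (assumption)."""
    n = cross(sub(ring[1], ring[0]), sub(ring[2], ring[0]))
    length = norm(n)
    return tuple(c / length for c in n)


def lj(x, four_eps=1.84, sigma=3.26):
    """Assumed 12-6 form with the Figure 6.4 parameters."""
    r6 = (sigma / x) ** 6
    return four_eps * (r6 * r6 - r6)


def ch_pi_score(ch_bonds, sugar_center, aromatic_ring,
                d_centers_max=7.0, d_cp_max=2.5):
    """Sum energies of CH bonds passing the distance and direction tests.

    ch_bonds is a list of (carbon_xyz, hydrogen_xyz) pairs for one sugar ring.
    """
    center = centroid(aromatic_ring)
    if norm(sub(sugar_center, center)) >= d_centers_max:
        return 0.0
    normal = unit_normal(aromatic_ring)
    total = 0.0
    for carbon, hydrogen in ch_bonds:
        to_center = sub(center, carbon)
        height = dot(to_center, normal)
        # Offset, within the aromatic plane, between projected carbon and centroid.
        in_plane = sub(to_center, tuple(height * c for c in normal))
        d_cp = norm(in_plane)
        points_at_ring = dot(sub(hydrogen, carbon), to_center) > 0.0
        if d_cp < d_cp_max and points_at_ring:
            total += lj(norm(to_center))  # d_C-pi as the energy input
    return total
```

For a complete ligand, the same function would be called for every pairing of a sugar ring with an aromatic ring, and the results summed to give the score of the pose.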

The performance of various CH/π interaction energy score coefficients was examined, namely, 0.3, 0.5, 0.7, 1.0, 1.5 and 2.0.


Figure 6.5 Detection of CH/π interactions. a.) An average position of the co-ordinates of the atoms C2, O5 and O1 is determined. The vector C1H1 is then taken as the negative of the vector between C1 and this average position. b.) The distance between the centroid of the aromatic ring and the plane of the carbohydrate ring delineated by atoms O5, C2, C3 and C5 is determined, dcenters (≤ 7Å). c.) The carbon atoms in the carbohydrate ring are projected onto the aromatic ring plane and the distance between each of these projections and the centroid of the aromatic ring is determined, dcp (≤ 2.5Å). Shown in green are the CH bond vectors pointing towards the aromatic ring (scored), and shown in red are the CH bond vectors pointing away from the aromatic ring (not scored).
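The hydrogen placement described in panel (a) can be sketched as below. The C-H bond length of 1.09 Å is an assumption (a typical value for an aliphatic C-H bond), as it is not stated in the text, and the function name is hypothetical.

```python
import math


def hydrogen_position(carbon, neighbours, bond_length=1.09):
    """Place H on `carbon`, opposite the average position of its neighbours.

    For C1 the neighbours would be C2, O5 and O1. The bond length is an
    assumed value and is not taken from this work.
    """
    count = len(neighbours)
    avg = tuple(sum(p[i] for p in neighbours) / count for i in range(3))
    # Negative of the vector from the carbon to the average neighbour position.
    direction = tuple(c - a for c, a in zip(carbon, avg))
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        raise ValueError("neighbours average onto the carbon; direction undefined")
    return tuple(c + bond_length * d / length for c, d in zip(carbon, direction))
```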


Docking protocol

The protein and ligand files were prepared using AutoDockTools (version 1.5.4). 107 All C-O bonds in the carbohydrate ligands were allowed to rotate. Following the protocol used in our earlier work, 168 docking was performed 10 times using AutoDock Vina and Vina-Carb. The random seed for each of the 10 docking runs was explicitly defined in order to increase comparability between results.
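A seeded set of docking runs of this kind could be prepared along the following lines. The sketch only builds AutoDock Vina command lines and does not execute them; the file names, grid box values, seeds and executable name are hypothetical placeholders, and the helper function is not part of either program.

```python
def vina_commands(receptor, ligand, center, size, seeds,
                  num_modes=20, program="vina"):
    """Build one AutoDock Vina command line per seed; nothing is executed."""
    commands = []
    for run, seed in enumerate(seeds, start=1):
        commands.append([
            program,
            "--receptor", receptor,
            "--ligand", ligand,
            "--center_x", str(center[0]),
            "--center_y", str(center[1]),
            "--center_z", str(center[2]),
            "--size_x", str(size[0]),
            "--size_y", str(size[1]),
            "--size_z", str(size[2]),
            "--num_modes", str(num_modes),
            "--seed", str(seed),  # fixed seed, so the run can be repeated
            "--out", f"run{run:02d}_out.pdbqt",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    for cmd in vina_commands("receptor.pdbqt", "ligand.pdbqt",
                             (0.0, 0.0, 0.0), (20, 20, 20), range(1, 11)):
        print(" ".join(cmd))
```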

Results and Discussion

The systems in the test set were divided based on the number of detected CH/π interactions (n) (Table 6.1). Following Boisbouvier’s work, 169 the distances between all the ring carbon atoms of each carbohydrate ligand in the test set and the centroids of all aromatic rings in the interacting protein were first calculated (dCπ). A CH/π interaction was considered present if the dCπ distance was ≤ 4.3Å.
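This counting criterion can be expressed in a few lines of Python. The sketch assumes that every carbon-centroid pair within the cutoff counts as one interaction, which is one possible reading of the criterion; the function name is hypothetical.

```python
import math


def count_ch_pi(ring_carbons, aromatic_centroids, cutoff=4.3):
    """Count carbon-centroid pairs whose distance is within the cutoff (in Å)."""
    return sum(
        1
        for carbon in ring_carbons
        for center in aromatic_centroids
        if math.dist(carbon, center) <= cutoff
    )
```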

Both programs had a greater success rate at making accurate binding mode predictions of complexes with a greater number of intermolecular CH/π interactions in their binding pockets. This result could be indicative of the crucial role that these types of interactions play in determining the binding specificity of the carbohydrate ligands to their respective receptors.

Amongst the systems for which the programs succeeded in accurately predicting the ligand binding modes, the ranking of the accurate PRMSDmin poses improved after the addition of the CH/π interaction energy score, especially in cases with a greater number of CH/π interactions (n≥2). In the case of systems with n≤1, the addition of the CH/π interaction energy term worsened the ranking of the PRMSDmin pose (Table 6.1).


Table 6.1 Average rank of accurate PRMSDmin pose predictions by ADV and VC1|2 before and after rescoring as a function of the CH/π interaction energy coefficients. The systems are divided into different groups based on the number of detected CH/π interactions.

VC1|2

Number of CHs (n)   PRMSDmin [Å]   Rank Before   Rank After addition of CH/π energies (CH/π Coefficient)
                                                 0.3     0.5     0.7     1       1.5     2
0 (n=8)             0.96           3.88          4.24    4.45    4.90    5.30    5.75    6.09
1 (n=3)             0.50           1.33          1.37    1.60    1.77    2.83    3.80    4.63
2 (n=2)             0.98           2.95          2.45    2.35    2.35    2.10    2.05    2.15
3 (n=4)             1.13           2.98          2.80    2.68    2.60    2.53    2.58    2.60
4 (n=8)             0.95           2.32          2.02    2.00    1.91    1.91    1.92    2.01
5 (n=7)             1.30           2.83          2.14    1.84    1.69    1.60    1.59    1.51
6 (n=1)             0.87           2.00          2.00    2.00    1.00    1.00    1.00    1.00
7 (n=1)             1.13           1.00          1.00    1.00    1.00    1.00    1.00    1.00
9 (n=2)             0.54           5.55          1.00    1.00    1.05    1.15    1.25    1.40
10 (n=2)            0.72           7.85          1.40    1.20    1.00    1.20    1.00    1.00
12 (n=1)            1.00           7.10          3.20    2.20    1.90    2.00    2.10    2.40
16 (n=1)            0.42           1.00          1.00    1.00    1.00    1.00    1.00    1.00
total, n=40         0.87           3.40          2.05    1.94    1.85    1.97    2.09    2.23

ADV

Number of CHs (n)   PRMSDmin [Å]   Rank Before   Rank After (CH/π Coefficient = 0.7)
0 (n=5)             1.09           4.04          4.58
1 (n=3)             0.78           1.00          2.00
2 (n=1)             0.62           1.00          1.00
3 (n=4)             1.12           2.25          2.10
4 (n=8)             0.86           2.54          1.88
5 (n=7)             1.31           4.23          2.90
6 (n=1)             0.79           2.00          1.00
7 (n=1)             1.26           3.30          6.30
9 (n=2)             0.63           4.75          3.80
10 (n=2)            0.95           5.00          1.00
12 (n=0)            -              -             -
16 (n=1)            0.42           1.00          1.00
total, n=35         0.89           2.83          2.51

Amongst the various CH/π interaction energy coefficients tested, coefficient values ≥ 0.7 resulted in the most improvement of pose ranking. Using higher values of the coefficient with systems with a lower number of CH/π interactions caused the ranking of the PRMSDmin pose to decline. Therefore, based on the data obtained, a coefficient value of 0.7 was chosen as the optimal value to rescore docked carbohydrate poses using the CH/π interaction energy function.

In the case of systems 1VEO and 1ITC, the application of the CH/π interaction energy scores improved the ranking of the accurate PRMSDmin poses by 7.6 and 10 places, respectively. For example, in the case of VC1|2, the PRMSD of the top-ranked pose before rescoring was 5.6Å, whereas after rescoring, the PRMSD of the top-ranked pose became 0.9Å (Figure 6.6).


Figure 6.6 The effect of applying the CH/π interaction energy function on the top-ranked pose produced by VC1|2. Shown in green is the crystal ligand, in white is the top-ranked pose before rescoring (PRMSD = 5.6Å) and in blue is the top-ranked pose after rescoring (PRMSD = 0.9Å).

Conclusions

The incorporation of the CH/π interaction energy term improved rankings of accurate PRMSDmin pose predictions produced by both ADV and VC1|2. A CH/π interaction energy coefficient of 0.7 produced optimal results for the test set considered.

In at least 40% of the total test systems, both docking programs were unable to produce accurate binding mode predictions. The inclusion of the CH/π interaction energy function within the VC scoring function can be expected to improve binding mode prediction by favorably scoring any such interaction between each docked pose generated by the algorithm and the protein receptor. This would in turn decrease the probability of such poses being rejected during the selection stage of the algorithm, before the final results are assembled. An appropriate CH/π interaction coefficient would also need to be included and its optimum value determined.

The algorithm for the detection of CH/π interactions can be further improved, for instance, by considering the angle of the CH vectors with respect to the normal to the aromatic ring plane. The test set can also be expanded to increase diversity, both with respect to receptor and ligand types, and also with respect to systems with or without intermolecular CH/π interactions. The consideration of pivotal CH/π interactions in protein-carbohydrate complexes, and accounting for the energies that these non-covalent interactions contribute to protein-carbohydrate binding can improve our binding mode predictions, and help us better understand the factors influencing biological recognition.

Future Directions

The CH/π interaction energy function presented in this study is a first-order approximation of an energy curve to model the interaction between an aliphatic CH group and an aromatic ring. The model can be further improved by using data from available literature studying these interactions. In the 2006 study by Ringer et al., 155 the authors performed QM calculations to estimate the contribution of CH/π interactions to the total interaction energy in model systems using Symmetry Adapted Perturbation Theory (SAPT) 170 analysis. The authors performed computations on model systems consisting of methane, as a model for aliphatic side-chains, and benzene, phenol or indole as the aromatic components of phenylalanine, tyrosine and tryptophan. The model systems used are shown in Figure 6.7.


Figure 6.7 Model Systems used by Ringer et al. to quantify CH/π interactions using quantum mechanical calculations

They obtained potential energy curves by varying the distance between the methane molecule and the aromatic moiety in each model complex. We observed that the reported energies were remarkably similar in terms of maximum interaction energy and shape of the interaction potential, and have developed a generic CH/π function by averaging the QM data and fitting a Lennard-Jones potential to the average values (Figure 6.8). This new energy function can be used to score CH/π interactions.

$$V_x = \varepsilon\left[\left(\frac{\sigma}{x}\right)^{12} - \left(\frac{\sigma}{x}\right)^{6}\right] \qquad [6.1]$$

where x is the distance between the carbon atom and the aromatic ring centroid.
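A fit of this type can be sketched without external libraries, because the 12-6 form is linear in the two quantities A = kσ^12 and B = kσ^6, where k is the overall prefactor. The sketch below solves the resulting two-parameter least-squares problem directly. The function name is hypothetical, and the points in the usage example are synthetic values generated from the Figure 6.4 parameters; they stand in for the averaged QM data, which are not reproduced here.

```python
def fit_lj(distances, energies):
    """Least-squares fit of V(x) = A*x**-12 - B*x**-6.

    Returns (prefactor, sigma) such that
    V(x) = prefactor * ((sigma/x)**12 - (sigma/x)**6).
    """
    s_pp = s_pq = s_qq = s_pv = s_qv = 0.0
    for x, v in zip(distances, energies):
        p = x ** -12
        q = -(x ** -6)
        s_pp += p * p
        s_pq += p * q
        s_qq += q * q
        s_pv += p * v
        s_qv += q * v
    # Solve the 2x2 normal equations for A and B by Cramer's rule.
    det = s_pp * s_qq - s_pq * s_pq
    a = (s_pv * s_qq - s_qv * s_pq) / det
    b = (s_qv * s_pp - s_pv * s_pq) / det
    sigma = (a / b) ** (1.0 / 6.0)
    prefactor = b * b / a
    return prefactor, sigma


if __name__ == "__main__":
    # Synthetic stand-in data, not the QM values from the literature.
    xs = [3.2 + 0.2 * i for i in range(15)]
    vs = [1.84 * ((3.26 / x) ** 12 - (3.26 / x) ** 6) for x in xs]
    print(fit_lj(xs, vs))
```

Fitting in this linearized form avoids an iterative optimizer; a nonlinear fit that weights the well region more heavily would be a reasonable alternative.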

[Figure 6.8 plots: energy (kcal/mol) as a function of distance (Å). a.) Curves labeled Model 6.4a, Model 6.4b, Model 6.4c and Model 6.4d, with the Average Curve. b.) The Average Curve and the Model.]

Figure 6.8 a.) The individual interaction energy curves for the models (as described in Figure 6.7) used by Ringer et al. 155, alongside the average of the individual curves. b.) The average curve from (a) shown alongside the mathematical model used in the current study.


7. CONCLUSIONS

In Chapter 4, the performances of three docking programs, namely AutoDock 3.0.5, AutoDock 4.2 and AutoDock Vina, were compared, and AutoDock Vina had the most success in accurately predicting binding modes of the carbohydrate ligands. A set of six antibody-carbohydrate systems was used in this study. An algorithm for aligning the antibody structures to the co-ordinate axes prior to docking, based on the complementarity determining regions, was developed in order to increase comparability and reproducibility of the results, in addition to being useful in an automated docking pipeline to be implemented in GlycamWeb (www.glycam.com). A set of disaccharide models was used to develop the Carbohydrate Intrinsic (CHI) energy functions, which score oligosaccharide structures based on the conformations of their glycosidic linkages.

Application of the CHI energy functions resulted in an improvement of the rankings of the accurate pose predictions. A survey of the PDB for carbohydrate crystal structures, consisting of carbohydrates linked either covalently or non-covalently to various receptors including lectins, antibodies, enzymes and carbohydrate binding modules, revealed that the glycosidic torsion preferences of these structures were similar despite being bound to different kinds of receptors. A majority of the glycosidic torsion angles fall into the same energy well for each CHI energy curve. These energy functions can therefore also aid in the refinement of experimental oligosaccharide structures.

The research presented in Chapter 5 described the incorporation of the CHI energy functions within AutoDock Vina’s scoring function, leading to the development of Vina-Carb. The performance of Vina-Carb and the original AutoDock Vina was evaluated and compared using a set of protein-carbohydrate systems consisting of lectins, antibodies, carbohydrate binding modules and enzymes. Vina-Carb significantly improved the conformations of the docked oligosaccharide poses. The integration of the CHI energy functions within the program led to the penalization of unfavorable glycosidic torsion angles, increasing the appearance of poses with energetically favorable glycosidic linkages in the output. The improvements in the conformation of the carbohydrate ligand in turn improved the chances of VC making accurate binding mode predictions. The source code of Vina-Carb ver. 1.0 is available for download at: http://glycam.org/publication-materials/vina-carb. The suite of CHI energy functions could be further expanded to include 2,x linkages, and other standard sugar conformations as needed.

In Chapter 6, the role of CH/π interactions in binding specificity and affinity in protein-carbohydrate complexes was outlined. Previously available quantum mechanical data describing the interaction between models of CH groups and aromatic amino acids were used to obtain a mathematical model describing the CH/π interaction energy in such complexes. This CH/π interaction energy function, when applied to lectin-carbohydrate docked complexes with significant CH/π contacts in the binding pocket, improved the rankings of accurate binding mode predictions. This function can be incorporated within Vina-Carb’s scoring function so that the presence of CH/π interactions is favored during docking, which could consequently further improve oligosaccharide binding mode predictions.


8. REFERENCES

(1) Drickamer, K.; Taylor, M. E. Biology of Animal Lectins. Annu. Rev. Cell Biol. 1993, 9, 237-264.

(2) Varki, A. Biological Roles of Oligosaccharides: All of the Theories are Correct. 1993, 3, 97-130.

(3) Haltiwanger, R. S.; Lowe, J. B. Role of glycosylation in development. Annu Rev Biochem 2004, 73, 491-537.

(4) Cobb, B. A.; Kasper, D. L. Coming of age: carbohydrates and immunity. European Journal of Immunology 2005, 35, 352-356.

(5) Beuvery, E. C.; Vanrossum, F.; Nagel, J. Comparison of the Induction of Immunoglobulin-M and Immunoglobulin-G Antibodies in Mice with Purified Pneumococcal Type-3 and Meningococcal Group-C Polysaccharides and Their Protein Conjugates. Infection and Immunity 1982, 37, 15-22.

(6) Brown, G. D.; Gordon, S. Immune recognition: A new receptor for [beta]- . Nature 2001, 413, 36-37.

(7) Rademacher, T. W.; Parekh, R. B.; Dwek, R. A. Glycobiology. Ann. Rev. Biochem. 1988, 57, 785-838.

(8) Feizi, T. Carbohydrate differentiation antigens: probable ligands for cell adhesion molecules. Trends in Biochemical Sciences 1991, 16, 84-86.

(9) Varki, A.; Cummings, R.; Esko, J.; Freeze, H.; Hart, G.; Marth, J.: Essentials of Glycobiology; Cold Spring Harbor Laboratory Press: New York, 1999.

(10) Roth, Z.; Yehezkel, G.; Khalaila, I. Identification and quantification of protein glycosylation. International Journal of Carbohydrate Chemistry 2012, 2012.

(11) Chou, C.-F.; Smith, A. J.; Omary, M. Characterization and dynamics of O-linked glycosylation of human cytokeratin 8 and 18. Journal of Biological Chemistry 1992, 267, 3901-3906.

(12) Jackson, S. P.; Tjian, R. O-Glycosylation of Eukaryotic Transcription Factors: Implications for Mechanisms of Transcriptional Regulation. Cell 1988, 55, 125-133.

(13) Gerken, T. A.; Butenhof, K. J.; Shogren, R. Effects of Glycosylation on the Conformation and Dynamics of O-Linked Glycoproteins: Carbon-13 NMR Studies of Ovine Submaxillary Mucin. Biochem. 1989, 28, 5536-5543.

(14) Wittwer, A. J.; Howard, S. C.; Carr, L. S.; Harakas, N. K.; Feder, J.; Parekh, R. B.; Rudd, P. M.; Dwek, R. A.; Rademacher, T. W. Effects of N-Glycosylation on in Vitro Activity of Bowes Melanoma and Human Colon Fibroblast Derived Tissue Plasminogen Activator. Biochem. 1989, 28, 7662-7669.

(15) Saso, L.; Silvestrini, B.; Guglielmotti, A.; Lahita, R.; Cheng, C. Y. Abnormal Glycosylation of Alpha(2)-Macroglobulin, a Non-Acute-Phase Protein, in Patients with Autoimmune-Diseases. Inflammation 1993, 17, 465-479.

(16) Rook, G. A. W.; Steele, J.; Brealey, R.; Whyte, A.; Isenberg, D.; Sumar, N.; Nelson, L.; Bodman, K. B.; Young, A.; Roitt, I. M.; Hutchison, F.; Williams, P.; Scragg, I.; Edge, C. J.; Arkwright, P.; Ashford, D.; Wormald, M.; Rudd, P.; Redman, C.; Dwek, R. A.; Rademacher, T. W. Changes in IgG Glycoform Levels may be Relevant to Remission of Arthritis During Pregnancy.

(17) Rademacher, T. W.; Parekh, R. B.; Dwek, R. A.; Isenberg, D.; Rook, G.; Axford, J. S.; Roitt, I. The Role of IgG Glycoforms in the Pathogenesis of Rheumatoid Arthritis. Springer Semin. Immunopathol. 1988, 10, 231-249.

(18) Renaudineau, Y.; Saraux, A.; Dueymes, M.; Le Goff, P.; Youinou, P. Importance of IgG Glycosylation in Rheumatoid Arthritis. Rev. Rhum. 1998, 65, 429-433.

(19) Watson, M.; Rudd, P.; Bland, M.; Dwek, R.; Axford, J. S. Sugar Printing Rheumatic Diseases: A Potential Method for Disease Differentiation Using Immunoglobulin G Oligosaccharides. Arth Rheum 1999, 42, 1682-1690.

(20) Brockhausen, I.: Glycodynamics of mucin biosynthesis in gastrointestinal tumor cells. In Glycobiology and Medicine; Axford, J. S., Ed.; Advances in Experimental Medicine and Biology, 2003; Vol. 535; pp 163-188.

(21) Porowska, H.; Paszkiewicz-Gadek, A.; Anchim, T.; Wolczynski, S.; Gindzienski, A. Inhibition of the O-glycan elongation limits MUC1 incorporation to cell membrane of human endometrial carcinoma cells. International Journal of Molecular Medicine 2004, 13, 459-464.

(22) Hakomori, S. I. Aberrant Glycosylation in Tumors and Tumor-Associated Carbohydrate Antigens. Advances in Cancer Research 1989, 52, 257-331.

(23) Dennis, J. W.; Granovsky, M.; Warren, C. E. glycosylation and cancer progression. Biochimica et Biophysica Acta 1999, 1473, 21-34.

(24) Paulson, J. C.; Blixt, O.; Collins, B. E. Sweet Spots in Functional Glycomics. Nat Chem Biol 2006, 2, 238-248.

(25) Murase, T.; Zheng, R. B.; Joe, M.; Bai, Y.; Marcus, S. L.; Lowary, T. L.; Ng, K. K. S. Structural Insights into Antibody Recognition of Mycobacterial Polysaccharides. Journal of Molecular Biology 2009, 392, 381-392.

(26) Kotra, L. P.; Golemi, D.; Amro, N. A. Dynamics of the Lipopolysaccharide Assembly on the Surface of Escherichia coli. J. Am. Chem. Soc. 1999, 121, 8707-8711.

(27) Park, B. S.; Song, D. H.; Kim, H. M.; Choi, B.-S.; Lee, H.; Lee, J.-O. The Structural Basis of Lipopolysaccharide Recognition by the TLR4–MD-2 Complex. Nature 2009, 458, 1191-1195.

(28) Kelly, D. F.; Moxon, E. R.; Pollard, A. J. Haemophilus influenzae type b conjugate vaccines. Immunology 2004, 113, 163-174.

(29) Darkes, M. J. M.; Plosker, G. L. Pneumococcal conjugate vaccine (Prevnar; PNCRM7): a review of its use in the prevention of Streptococcus pneumoniae infection. Paediatric drugs 2002, 4, 609-630.

(30) Vyas, N. K.; Vyas, M. N.; Chervenak, M. C.; Johnson, M. A.; Pinto, B. M.; Bundle, D. R.; Quiocho, F. A. Molecular Recognition of Oligosaccharide Epitopes by a Monoclonal Fab Specific for Shigella Flexneri Y Lipopolysaccharide: X-ray Structures and Thermodynamics. Biochemistry 2002, 41, 13575-13586.

(31) Zdanov, A.; Li, Y.; Bundle, D. R.; Deng, S.-J.; MacKenzie, C. R.; Narang, S. A.; Young, N. M.; Cygler, M. Structure of a Single-Chain Antibody Variable Domain (Fv) Fragment Complexed with a Carbohydrate Antigen at 1.7-Å Resolution. Proc. Natl. Acad. Sci. USA 1994, 91, 6423-6427.

(32) Bundle, D. R.; Baumann, H.; Brisson, J.-R.; Gagné, S. M.; Zdanov, A.; Cygler, M. Solution Structure of a Trisaccharide-Antibody Complex: Comparison of NMR Measurements with a Crystal Structure. Biochemistry 1994, 33, 5183-5192.

(33) Cygler, M.; Rose, D. R.; Bundle, D. R. Recognition of a Cell-Surface Oligosaccharide of Pathogenic Salmonella by an Antibody Fab Fragment. Science 1991, 253, 442-445.

(34) Cygler, M.; Wu, S.; Zdanov, A.; Bundle, D. R.; Rose, D. R. Recognition of a carbohydrate antigenic determinant of Salmonella by an antibody. Biochem Soc Trans 1993, 21, 437-441.

(35) Vulliez-Le Normand, B.; Saul, F. A.; Phalipon, A.; Bélot, F.; Guerreiro, C.; Mulard, L. A.; Bentley, G. A. Structures of synthetic O-antigen fragments from serotype 2a Shigella flexneri in complex with a protective monoclonal antibody. Proceedings of the National Academy of Sciences of the United States of America 2008, 105, 9976-9981.

(36) Roseman, S. Reflections on glycobiology. Journal of Biological

Chemistry 2001, 276, 41527-41542.

(37) Dwek, R. A. Glycobiology: Toward Understanding the Function of

Sugars. Chem Rev 1996, 96, 683-720.

(38) Fischer, E. Ueber die Configuration des Traubenzuckers und seiner

Isomeren. II. Berichte der deutschen chemischen Gesellschaft 1891, 24, 2683-2687.

(39) Juaristi, E.; Cuevas, G.: The anomeric effect; CRC press, 1994.

(40) Juaristi, E.; Cuevas, G. Recent Studies of the Anomeric Effect.

Tetrahedron 1992, 48, 5019-5087.

(41) Tvaroska, I.; Carver, J. P. The Anomeric, Reverse Anomeric and Exo-

Anomeric Effects in C-, N-, and S- Glycosyl Compounds. Manuscript.

(42) Anomeric Effect. Origin and Consequences; Szarek, W. A.; Horton, D.,

Eds.; American Chemical Society: Washington, D.C., 1979; Vol. 87, pp 132.

(43) Tvaroska, I.; Kozar, T. The Conformational Properties of the Glycosidic

Linkage. Carbohydr. Res. 1981, 90, 173-185.

(44) Kirby, A. J.: The Anomeric Effect and Related Stereoelectronic Effects at

Oxygen; Springer-Verlag: New York, 1983.

(45) Fuchs, B.; Schleifer, L.; Tartakovsky, E. Probing the Anomeric Effect1:

The Structural Criterion. Nouveau Journal de Chimie 1984, 8, 275-278.

(46) Tvaroska, I.; Bleha, T.: Anomeric and Exo-Anomeric Effects in

Carbohydrate Chemistry. In Adv. Carbohydr. Chem. Biochem.; Tipson, R. S., Derek, H.,

Eds.; Academic Press: New York, 1989; Vol. 47; pp 45-123.

(47) Agirre, J.; Davies, G.; Wilson, K.; Cowtan, K. Carbohydrate anomalies in the PDB. Nature chemical biology 2015, 11, 303-303.

(48) Nivedha, A. K.; Makeneni, S.; Foley, B. L.; Tessier, M. B.; Woods, R. J.

Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat from the chaff. J Comput Chem 2013.

(49) Bourne, Y.; van Tilbeurgh, H.; Cambillau, C. Protein-Carbohydrate

Interactions. Curr. Opin. Struct. Biol. 1993, 3, 681-686.

(50) Vyas, N. K. Atomic Features of Protein-Carbohydrate Interactions. Curr.

Opin. Struct. Biol. 1991, 1, 732-740.

(51) Quiocho, F. A. Carbohydrate-Binding Proteins: Tertiary Structures and

Protein-Sugar Interactions. Ann. Rev. Biochem. 1986, 55, 287-315.

(52) Munske, G. R.; Krakauer, H.; Magnuson, J. A. Calorimetric study of carbohydrate binding to concanavalin A. Archives of biochemistry and biophysics 1984,

233, 582-587.

(53) Bundle, D. R.; Young, N. M. Carbohydrate-protein Interactions in

Antibodies and Lectins. Curr. Opin. Struct. Biol. 1992, 2, 666-673.

(54) Quiocho, F. A.; Vyas, N. K. Novel Stereospecificity of the L--

Binding Protein. Nature 1984, 310, 381-386.

(55) Kozmon, S.; Matuska, R.; Spiwok, V.; Koca, J. Dispersion Interactions of Carbohydrates with Condensate Aromatic Moieties: Theoretical Study on the CH–π

Interaction Additive Properties. Phys. Chem. Chem. Phys. 2011, 13, 14215–14222.

(56) Elgavish, S.; Shaanan, B. Lectin-Carbohydrate Interactions: Different

Folds, Common Recognition Principles. Trends Biochem. Sci. 1997, 22, 462-467.

(57) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Sugar and signal-transducer binding sites of the Escherichia coli galactose chemoreceptor protein. Science 1988, 242,

1290-1295.

(58) Quiocho, F. A. Protein-carbohydrate interactions: basic molecular features. Pure and Applied Chemistry 1989, 61, 1293-1306.

(59) DeMarco, M. L.; Woods, R. J. Structural Glycobiology: A Game of

Snakes and Ladders. Glycobiology 2008, 18, 426-440.

(60) Woods, R. J.; Tessier, M. B. Computational Glycoscience: Characterizing the Spatial and Temporal Properties of Glycans and Glycan—Protein Complexes. Curr.

Opin. Struct. Biol. 2010, 20, 575-583.

(61) Ghazarian, H.; Idoni, B.; Oppenheimer, S. B. A Glycobiology Review:

Carbohydrates, Lectins and Implications in Cancer Therapeutics. Acta Histochem. 2011,

113, 236-247.

(62) Hakomori, S. Tumor-associated carbohydrate antigens. Annu Rev Immunol

1984, 2, 103-126.

(63) Fukuda, M. Possible roles of tumor-associated carbohydrate antigens.

Cancer Research 1996, 56, 2237-2244.

(64) Eisen, M. B.; Sabesan, S.; Skehel, J. J.; Wiley, D. C. Binding of the

Influenza A Virus to Cell-Surface Receptors: Structures of Five Hemagglutinin–

Sialyloligosaccharide Complexes Determined by X-Ray Crystallography. Virology 1997,

232, 19-31.

(65) Suzuki, Y.; Nagao, Y.; Kato, H.; Matsumoto, M.; Nerome, K.; Nakajima,

K.; Nobusawa, E. Human influenza A virus hemagglutinin distinguishes sialyloligosaccharides in membrane-associated gangliosides as its receptor which mediates the adsorption and fusion processes of virus infection. Specificity for oligosaccharides and sialic acids and the sequence to which sialic acid is attached.

Journal of Biological Chemistry 1986, 261, 17057-17061.

(66) Wiley, D. C.; Skehel, J. J. The structure and function of the hemagglutinin membrane glycoprotein of influenza virus. Annual review of biochemistry 1987, 56, 365-

394.

(67) Magnani, J. L.; Ernst, B. From Carbohydrate Leads to Glycomimetic

Drugs. Nature Reviews Drug Discovery 2009, 8, 661-677.

(68) Dreitlein, W. B.; Maratos, J.; Brocavich, J. Zanamivir and oseltamivir:

Two new options for the treatment and prevention of influenza. Clinical Therapeutics

2001, 23, 327-355.

(69) Moscona, A. Inhibitors for Influenza. N Engl J Med 2005,

353, 1363-1373.

(70) Kevin, H. M.: Galectins and Disease Implication for Targeted

Therapeutics. In American Chemical Society, 2012; pp 61-77.

(71) Tessier, M. B.; Grant, O. C.; Heimburg-Molinaro, J.; Smith, D.; Jadey, S.;

Gulick, A. M.; Glushka, J.; Deutscher, S. L.; Rittenhouse-Olson, K.; Woods, R. J.

Computational Screening of the Human TF-Glycome Provides a Structural Definition for the Specificity of Anti-Tumor Antibody JAA-F11. PLoS One 2013, 8, e54874.

(72) Woods, R.; Yongye, A.: Computational Techniques Applied to Defining

Carbohydrate Antigenicity. In Anticarbohydrate Antibodies; Kosma, P., Müller-Loennies,

S., Eds.; Springer Vienna, 2012; pp 361-383.

(73) Kadirvelraj, R.; Gonzalez-Outeriño, J.; Foley, B. L.; Beckham, M. L.;

Jennings, H. J.; Foote, S.; Ford, M. G.; Woods, R. J. Understanding the Bacterial

Polysaccharide Antigenicity of Streptococcus agalactiae versus Streptococcus pneumoniae. PNAS 2006, 103, 8149-8154.

(74) Yongye, A. B.; Gonzales Outeriño, J.; Glushka, J.; Schultheis, V.; Woods,

R. J. The Conformational Properties of Methyl α-(2,8)-di/trisialosides and Their N-acyl

Analogs: Implications for Anti-Neisseria meningitidis B Vaccine Design. Biochemistry

2008, 47, 12493–12514.

(75) Calarese, D. A.; Scanlan, C. N.; Zwick, M. B.; Deechongkit, S.; Mimura,

Y.; Kunert, R.; Zhu, P.; Wormald, M. R.; Stanfield, R. L.; Roux, K. H.; Kelly, J. W.;

Rudd, P. M.; Dwek, R. A.; Katinger, H.; Burton, D. R.; Wilson, I. A. Antibody Domain

Exchange Is an Immunological Solution to Carbohydrate Cluster Recognition. Science

2003, 300, 2065-2071.

(76) Dyekjær, J. D.; Woods, R. J.: Predicting the Three-Dimensional Structures of Anti-Carbohydrate Antibodies: Combining Comparative Modeling and MD

Simulations. In NMR Spectroscopy and Computer Modeling of Carbohydrates. Recent

Advances. ; Vliegenthart, J. F. G., Woods, R. J., Eds.; ACS Symposium Series 930;

American Chemical Society: Washington, 2006; Vol. 930; pp 203-219.

(77) Gildersleeve, J.; Roach, T. A.; Li, Z.; Gildersleeve, J. C. Supplier-

Dependent Antiglycan Monoclonal Antibody Specificities: Comment On "High-

Throughput Carbohydrate Microarray Profiling of 27 Antibodies Demonstrates

Widespread Specificity Problems." Glycobiology 2008, 18, 746-756.

(78) Pincus, S. H.; Moran, E.; Maresh, G.; Jennings, H. J.; Pritchard, D. G.;

Egan, M. L.; Blixt, O. Fine specificity and cross-reactions of monoclonal antibodies to group B streptococcal capsular polysaccharide type III. Vaccine 2012, 30, 4849-4858.

(79) Cooke, R. M.; Hale, R. S.; Lister, S. G.; Shah, G.; Weir, M. P. The

Conformation of the Sialyl Lewis X Ligand Changes upon Binding to E-Selectin.

Biochemistry 1994, 33, 10591-10596.

(80) Mahmoudian, M. The cannabinoid receptor: computer-aided molecular modeling and docking of ligand. Journal of Molecular Graphics and Modelling 1997, 15,

149-153.

(81) Laederach, A.; Dowd, M. K.; Coutinho, P. M.; Reilly, P. J. Automated

Docking of , 2-Deoxymaltose, and Maltotetraose into the Soybean -Amylase

Active Site. Proteins: Structure, Function and Genetics 1999, 37, 166-175.

(82) Goodsell, D. S.; Morris, G. M.; Olson, A. J. Automated docking of flexible ligands: Applications of . Journal of Molecular Recognition 1996, 9, 1-

5.

(83) Sotriffer, C. A.; Flader, W.; Winger, R. H.; Rode, B. M.; Liedl, K. R.;

Varga, J. M. Automated Docking of Ligands to Antibodies: Methods and Applications.

Methods, Companion to Methods in Enzymol. 2000, 20, 280-291.

(84) Jorgensen, W. L. The many roles of computation in drug discovery.

Science 2004, 303, 1813-1818.

(85) Foley, B. L.; Tessier, M. B.; Woods, R. J. Carbohydrate Force Fields.

WIREs Computational Molecular Science 2011, 1-69.

(86) Laederach, A.; Reilly, P. J. Modeling Protein Recognition of

Carbohydrates. Proteins: Struct. Funct. Genet. 2005, 60, 591-597.

(87) Sapay, N.; Nurisso, A.; Imberty, A.: Simulation of Carbohydrates, from

Molecular Docking to Dynamics in Water. In Biomolecular Simulations; Monticelli, L.,

Salonen, E., Eds.; Methods in Molecular Biology; Humana Press, 2013; Vol. 924; pp

469-483.

(88) Bras, N. F.; Fernandes, P. A.; Ramos, M. J. Docking and molecular dynamics studies on the stereoselectivity in the enzymatic synthesis of carbohydrates.

Theor. Chem. Acc. 2009, 122, 283-296.

(89) Laederach, A.; Reilly, P. J. Specific Empirical Free Energy Function for

Automated Docking of Carbohydrates to Proteins. J. Comput. Chem. 2003, 24, 1748-

1757.

(90) Hwang, M.-J.; Ni, X.; Waldman, M.; Ewig, C. S.; Hagler, A. T.

Derivation of Class II Force Fields. VI. Carbohydrate Compounds and Anomeric

Effects. Biopolymers 1998, 45, 435-468.

(91) Woods, R. J.; Edge, C. J.; Wormald, M. R.; Dwek, R. A.: GLYCAM_93:

A Generalized Parameter Set for Molecular Dynamics Simulations of Glycoproteins and

Oligosaccharides. Application to the Structure and Dynamics of a Disaccharide Related to Oligomannose. In Complex Carbohydrates in Drug Research; Bock, K., Clausen, H.,

Krogsgaard-Larsen, P., Kofod, H., Eds.; Munksgaard: Copenhagen, Denmark, 1993; Vol.

36; pp 15-36.

(92) Weldon, A. J.; Tschumper, G. S. Intrinsic Conformational Preferences of and an Anomeric-Like Effect in 1-Substituted Silacyclohexanes. Int. J. Quantum Chem.

2007, 107, 2261-2265.

(93) Woodcock, H. L.; Moran, D.; Pastor, R. W.; MacKerell, A. D.; Brooks, B.

R. Ab initio modeling of glycosyl torsions and anomeric effects in a model carbohydrate:

2-Ethoxy tetrahydropyran. Biophysical Journal 2007, 93, 1-10.

(94) Kirschner, K. N.; Yongye, A. B.; Tschampel, S. M.; González-Outeiriño,

J.; Daniels, C. R.; Foley, B. L.; Woods, R. J. GLYCAM06: A Generalizable

Biomolecular Force Field. Carbohydrates. J. Comput. Chem. 2008, 29, 622–655.

(95) Guvench, O.; Mallajosyula, S. S.; Raman, E. P.; Hatcher, E.;

Vanommeslaeghe, K.; Foster, T. J.; Jamison, F. W.; MacKerell, A. D. CHARMM

Additive All-Atom Force Field for Carbohydrate Derivatives and Its Utility in

Polysaccharide and Carbohydrate–Protein Modeling. J. Chem. Theory Comput. 2011, 7,

3162-3180.

(96) French, A. D.; Kelterer, A.-M.; Johnson, G. P.; Dowd, M. K.; Cramer, C.

J. HF/6-31G* Energy Surfaces for Disaccharide Analogs. J. Comput. Chem. 2001, 22, 65.

(97) French, A. D.; Dowd, M. K. Exploration of Disaccharide Conformations by Molecular Mechanics. J. Mol. Struct. (Theochem) 1993, 286, 183-201.

(98) Talavera, A.; Eriksson, A.; Ökvist, M.; López-Requena, A.; Fernández,

Y.; Pérez, R.; Moreno, E.; Krengel, U. Crystal Structure of an Anti-ganglioside Antibody, and Modeling of the Functional Mimicry of its NeuGc-GM3 Antigen by an Anti- idiotypic Antibody. Molec. Immun. 2009, 46, 3466-3475.

(99) Paula, S.; Monson, N.; Ball, W. J., Jr. Molecular Modeling of Cardiac

Glycoside Binding by the Human Sequence Monoclonal Antibody 1B3. Proteins 2005,

60, 382-391.

(100) Blaszczyk-Thurin, M.; Murali, R.; Westerink, M. A. J.; Steplewski, Z.;

Sung Co, M.; Kieber-Emmons, T. Molecular recognition of the Lewis Y antigen by monoclonal antibodies. Protein Engineering 1996, 9, 447-459.

(101) Vyas, N. K.; Vyas, M. N.; Chervenak, M. C.; Bundle, D. R.; Pinto, B. M.;

Quiocho, F. A. Structural Basis of Peptide-Carbohydrate Mimicry in an Antibody-

Combining Site. Proc. Natl. Acad. Sci. USA 2003, 100, 15023-15028.

(102) Agostino, M.; Sandrin, M. S.; Thompson, P. E.; Ramsland, P. A.; Yuriev,

E. Peptide Inhibitors of Xenoreactive Antibodies Mimic the Interaction Profile of the

Native Carbohydrate Antigens. Pep. Sci. 2011, 96, 193-206.

(103) Agostino, M.; Jene, C.; Boyle, T.; Ramsland, P. A.; Yuriev, E. Molecular

Docking of Carbohydrate Ligands to Antibodies: Structural Validation against Crystal

Structures. J. Chem. Inf. Model. 2009, 49, 2749-2760.

(104) Agostino, M.; Sandrin, M. S.; Thompson, P. E.; Yuriev, E.; Ramsland, P.

A. In Silico Analysis of Antibody-carbohydrate Interactions and its Application to

Xenoreactive Antibodies. Glycobiol. 2009, 47, 105-115.

(105) Lee, M.; Lloyd, P.; Zhang, X.; Schallhorn, J. M.; Sugimoto, K.; Leach, A.

G.; Sapiro, G.; Houk, K. N. Shapes of Antibody Binding Sites: Qualitative and

Quantitative Analyses Based on a Geomorphic Classification Scheme. J. Org. Chem.

2006, 71, 5082-5092.

(106) Huey, R.; Morris, G. M.; Olson, A. J.; Goodsell, D. S. A Semiempirical

Free Energy Force Field with Charge-Based Desolvation. J. Comput. Chem. 2007, 28,

1145-1152.

(107) Morris, G. M.; Huey, R.; Lindstrom, W.; Sanner, M. F.; Belew, R. K.;

Goodsell, D. S.; Olson, A. J. Autodock4 and AutoDockTools4: Automated Docking with

Selective Receptor Flexibility. J. Comput. Chem. 2009, 30, 2785-2791.

(108) Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and

Accuracy of Docking with a New Scoring Function, Efficient Optimization and

Multithreading. J. Comput. Chem. 2010, 31, 455-461.

(109) GLYCAM Web. http://www.glycam.org.

(110) Zhang, W. H., T; Schafmeister, C; Ross, W. S., Case, D. A. AmberTools

Version 1.0. 2008.

(111) Bernstein, F. C.; Koetzle, T. F.; Williams, G. J. B.; Meyer, E. F.; Brice, M.

D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. Protein Data Bank -

Computer-Based Archival File for Macromolecular Structures. J. Mol. Biol. 1977, 112,

535-542.

(112) French, A. D.; Johnson, G. P.; Cramer, C. J.; Csonka, G. I.

Conformational analysis of by electronic structure theories. Carbohydrate research 2012, 350, 68-76.

(113) Williams, T.; Kelley, C. gnuplot 4.6 (2013). URL http://www.gnuplot.info/documentation.html.

(114) Martin, A.; Cheetham, J. C.; Rees, A. R. Modeling antibody hypervariable loops: a combined algorithm. Proceedings of the National Academy of Sciences 1989, 86,

9268-9272.

(115) Martin, A. C. R.; Cheetham, J. C.; Rees, A. R. Molecular Modeling of

Antibody Combining Sites. Methods Enzymol. 1991, 203, 121-153.

(116) Wu, T. T.; Kabat, E. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. The Journal of experimental medicine 1970, 132, 211-250.

(117) Chothia, C.; Lesk, A. M. Canonical Structures for the Hypervariable

Regions of Immunoglobulins. J. Mol. Biol. 1987, 196, 901-917.

(118) Frisch, M.; Trucks, G.; Schlegel, H. B.; Scuseria, G.; Robb, M.;

Cheeseman, J.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. Gaussian 09,

Revision A. 02, Gaussian. Inc., Wallingford, CT 2009, 200.

(119) Hill, A. D.; Reilly, P. J. A Gibbs free energy correlation for automated docking of carbohydrates. Journal of computational chemistry 2008, 29, 1131-1141.

(120) French, A. D.; Tran, V. H.; Pérez, S.: Conformational Analysis of a

Disaccharide (Cellobiose) with the Molecular Mechanics Program (MM2). In Computer

Modeling of Carbohydrate Molecules; French, A. D., Brady, J. W., Eds.; American

Chemical Society: Washington, DC, 1990; Vol. Symposium Series 430; pp 191-212.

(121) Lütteke, T.; von der Lieth, C.-W. Data Mining the PDB for Glyco-

Related Data. Methods in Molecular Biology, : Methods and Protocols 2009,

534, 293-310.

(122) Le Guilloux, V.; Schmidtke, P.; Tuffery, P. Fpocket: An Open Source

Platform For Ligand Pocket Detection. BMC Bioinform. 2009, 10, 168.

(123) Chang, M. W.; Ayeni, C.; Breuer, S.; Torbett, B. E. Virtual Screening for

HIV Protease Inhibitors: A Comparison of AutoDock 4 and Vina. PLoS ONE 2010, 5, 9.

(124) Bohne, A.; Lang, E.; von der Lieth, C.-W. W3-SWEET: Carbohydrate

Modeling By Internet. Journal of Molecular Modeling 1998, 4, 33-43.

(125) Fadda, E.; Woods, R. J. Molecular Simulations of Carbohydrates and

Protein–carbohydrate Interactions: Motivation, Issues and Prospects. Drug Discov. Today

2010, 15, 596-609.

(126) Imberty, A. Oligosaccharide Structures: Theory Versus Experiment. Curr.

Opin. Struct. Biol. 1997, 7, 617-623.

(127) Pauling, L.: The Nature of the Chemical Bond; Cornell university press

Ithaca, NY, 1960; Vol. 3.

(128) Damm, W.; Frontera, A.; Tirado-Rives, J.; Jorgensen, W. L. OPLS All-

Atom Force Field for Carbohydrates. J. Comput. Chem. 1997, 18, 1955-1970.

(129) Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Principles of Docking:

An Overview of Search Algorithms and a Guide to Scoring Functions. Proteins:

Structure, Function and Genetics 2002, 47, 409-443.

(130) Schulz-Gasch, T.; Stahl, M. Scoring functions for protein–ligand interactions: a critical perspective. Drug Discovery Today: Technologies 2004, 1, 231-

239.

(131) Kerzmann, A.; Neumann, D.; Kohlbacher, O. SLICK − Scoring and

Energy Functions for Protein−Carbohydrate Interactions. J. Chem. Inf. Model. 2006, 46,

1635–1642.

(132) Kerzmann, A.; Fuhrmann, J.; Kohlbacher, O.; Neumann, D.

BALLDock/SLICK: A new method for protein-carbohydrate docking. Journal of

Chemical Information and Modeling 2008, 48, 1616-1625.

(133) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt,

D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - A Visualization System for

Exploratory Research and Analysis. J. Comp. Chem. 2004, 25, 1605-1612.

(134) Humphrey, W.; Dalke, A.; Schulten, K. VMD - Visual Molecular

Dynamics. J. Molec. Graphics 1996, 14, 33-38.

(135) Cremer, D.; Pople, J. A. A General Definition of Ring Puckering

Coordinates. J. Am. Chem. Soc. 1975, 97, 1354-1358.

(136) Makeneni, S.; Foley, B. L.; Woods, R. J. BFMP: A Method for

Discretizing and Visualizing Pyranose Conformations. Journal of chemical information and modeling 2014, 54, 2744-2750.

(137) Lütteke, T.; Bohne-Lang, A.; Loss, A.; Goetz, T.; Frank, M.; von der

Lieth, C.-W. GLYCOSCIENCES. de: an Internet portal to support glycomics and glycobiology research. Glycobiology 2006, 16, 71R-81R.

(138) Sternberg, M. J. E.: Protein Structure Prediction: A Practical Approach,

1996.

(139) Schwede, T.: Computational Structural Biology: Methods and

Applications, 2008.

(140) Lawrence, S.; Feil, S.; Holien, J.; Kuiper, M.; Doughty, L.; Dolezal, O.;

Mulhern, T.; Tweten, R.; Parker, M. Manipulating the Lewis antigen specificity of the cholesterol-dependent cytolysin lectinolysin. Frontiers in Immunology 2012, 3.

(141) Oyelaran, O.; Gildersleeve, J. C. Glycan Arrays: Recent Advances and

Future Challenges. Current Opinion in Chemical Biology 2009, 13, 406-413.

(142) Taylor, M. E.; Drickamer, K. Structural Insights into what Glycan Arrays tell us About how Glycan-binding Proteins Interact with their Ligands. Glycobiology

2009, 19, 1155–1162.

(143) Muraki, M.; Morikawa, M.; Jigami, Y.; Tanaka, H. The roles of conserved aromatic amino-acid residues in the active site of human lysozyme: a site-specific

mutagenesis study. Biochimica et Biophysica Acta (BBA) - Protein Structure and

Molecular Enzymology 1987, 916, 66-75.

(144) Luís, A. S.; Venditto, I.; Temple, M. J.; Rogowski, A.; Baslé, A.; Xue, J.;

Knox, J. P.; Prates, J. A.; Ferreira, L. M.; Fontes, C. M. Understanding how noncatalytic carbohydrate binding modules can display specificity for xyloglucan. Journal of

Biological Chemistry 2013, 288, 4799-4809.

(145) Kerzmann, A.; Neumann, D.; Kohlbacher, O. SLICK - Scoring and

Energy Functions for Protein-Carbohydrate Interactions. J. Chem. Inf. Model. 2006, 46,

1635-1642.

(146) Kitago, Y.; Karita, S.; Watanabe, N.; Kamiya, M.; Aizawa, T.; Sakka, K.;

Tanaka, I. Crystal structure of Cel44A, a glycoside hydrolase family 44 endoglucanase from Clostridium thermocellum. The Journal of biological chemistry 2007, 282, 35703-

35711.

(147) Tamres, M. Aromatic Compounds as Donor Molecules in Hydrogen

Bonding. Journal of the American Chemical Society 1952, 74, 3375-3378.

(148) Reeves, L.; Schneider, W. Nuclear magnetic resonance measurements of complexes of chloroform with aromatic molecules and olefins. Canadian Journal of

Chemistry 1957, 35, 251-261.

(149) Pimentel, G. C.; McClellan, A. L.: The Hydrogen Bond; W. H. Freeman and Company: New York, 1960.

(150) Nishio, M. The CH/π hydrogen bond: Implication in chemistry. Journal of

Molecular Structure 2012, 1018, 2-7.

(151) Brandl, M.; Weiss, M. S.; Jabs, A.; Sühnel, J.; Hilgenfeld, R. C-H⋯π-interactions in proteins. Journal of Molecular Biology 2001, 307, 357-377.

(152) Kitaura, K.; Morokuma, K. A new energy decomposition scheme for molecular interactions within the Hartree‐Fock approximation. International Journal of

Quantum Chemistry 1976, 10, 325-340.

(153) Tsuzuki, S.; Honda, K.; Uchimaru, T.; Mikami, M.; Tanabe, K. The

Magnitude of the CH/π Interaction between Benzene and Some Model Hydrocarbons.

Journal of the American Chemical Society 2000, 122, 3746-3753.

(154) Laughrey, Z. R.; Kiehna, S. E.; Riemen, A. J.; Waters, M. L.

Carbohydrate−π Interactions: What Are They Worth? J. Am. Chem. Soc. 2008, 130,

14625–14633.

(155) Ringer, A. L.; Figgs, M. S.; Sinnokrot, M. O.; Sherrill, C. D. Aliphatic

C−H/π Interactions: Methane−Benzene, Methane−Phenol, and Methane−Indole

Complexes. The Journal of Physical Chemistry A 2006, 110, 10822-10828.

(156) Mohamed, M. N. A.; Watts, H. D.; Guo, J.; Catchmark, J. M.; Kubicki, J.

D. MP2, density functional theory, and molecular mechanical calculations of C–H···π and hydrogen bond interactions in a -binding module–cellulose model system.

Carbohydrate Research 2010, 345, 1741-1751.

(157) Asensio, J. L.; Ardá, A.; Cañada, F. J.; Jiménez-Barbero, J. Carbohydrate–

Aromatic Interactions. Acc. Chem. Res. 2012, 46, 946-954.

(158) Muraki, M. The Importance of CH/π Interactions to the Function of

Carbohydrate Binding Proteins. Protein and Peptide Letters 2002, 9, 195-209.

(159) Krapp, S.; Mimura, Y.; Jefferis, R.; Huber, R.; Sondermann, P. Structural

Analysis of Human IgG-Fc Glycoforms Reveals a Correlation Between Glycosylation and Structural Integrity. J. Mol. Biol. 2003, 325, 979-989.

(160) Fernández-Alonso, M. d. C.; Cañada, F. J.; Jiménez-Barbero, J.; Cuevas,

G. Molecular Recognition of Saccharides by Proteins. Insights on the Origin of the

Carbohydrate−Aromatic Interactions. Journal of the American Chemical Society 2005,

127, 7379-7386.

(161) Thobhani, S.; Ember, B.; Siriwardena, A.; Boons, G.-J. Multivalency and the Mode of Action of Bacterial Sialidases. J. Am. Chem. Soc. 2002, A-B.

(162) Kitov, P. I.; Bundle, D. R. On the nature of the multivalency effect: a thermodynamic model. Journal of the American Chemical Society 2003, 125, 16271-

16284.

(163) Neumann, D.; Kohlbacher, O. In Tilte2009.

(164) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Comparison of the Periplasmic

Receptors for L-Arabinose, D-Glucose/D-Galactose, and D-. J. Biol. Chem. 1991,

266, 5226-5237.

(165) Vardakou, M.; Flint, J.; Christakopoulos, P.; Lewis, R. J.; Gilbert, H. J.;

Murray, J. W. A family 10 Thermoascus aurantiacus xylanase utilizes arabinose decorations of xylan as significant substrate specificity determinants. J Mol Biol 2005,

352, 1060-1067.

(166) Chávez, M. I.; Andreu, C.; Paloma, V.; Aboitiz, N.; Freire, F.; Groves, P.;

Asensio, J. L.; Asensio, G.; Muraki, M.; Cañada, F. J.; Jiménez-Barbero, J. On the

Importance of Carbohydrate-Aromatic Interactions for the Molecular Recognition of

Oligosaccharides by Proteins: NMR Studies of the Structure and Binding Affinity of

AcAMP2-like Peptides with Non-Natural Naphthyl and Fluoroaromatic Residues. Chem.

Eur. J. 2005, 11, 7060-7074.

(167) Wojciechowski, M.; Lesyng, B. Generalized Born Model: Analysis,

Refinement, and Applications to Proteins.

(168) Nivedha, A. K.; Thieker, D. F.; Woods, R. J. Vina-Carb: Improving

Glycosidic Angles During Carbohydrate Docking. J. Chem. Theory Comput. 2015,

(Accepted).

(169) Plevin, M. J.; Bryce, D. L.; Boisbouvier, J. Direct detection of CH/π interactions in proteins. Nat Chem 2010, 2, 466-471.

(170) Jeziorski, B.; Moszynski, R.; Szalewicz, K. Perturbation theory approach to intermolecular potential energy surfaces of van der Waals complexes. Chem Rev 1994,

94, 1887-1930.

9. APPENDIX

Supplementary Information Chapter 4

S4.1. Editing AutoDock Vina’s source code to give 100 docked poses

The source code of AutoDock Vina (ADV) was downloaded, the variable par.mc.num_saved_mins in main_procedure (in the file main.cpp) was set to 100, and the program was recompiled.

S4.2. AD3 parameters

a) Docking parameters

outlev 1 # diagnostic output level
seed pid time # seeds for random generator
types COH # atom type names
move 1MFA_ligand.pdbq # small molecule
about 0.003 0.036 -0.094 # small molecule center
tran0 random # initial coordinates/A or random
quat0 random # initial quaternion
ndihe 15 # number of active torsions
dihe0 random # initial dihedrals (relative) or random
tstep 2.0 # translation step/A
qstep 50.0 # quaternion step/deg
dstep 50.0 # torsion step/deg
torsdof 7 0.3113 # torsional degrees of freedom and coeffiecent
intnbp_r_eps 4.00 0.0222750 12 6 # C-C lj
intnbp_r_eps 3.60 0.0257202 12 6 # C-O lj
intnbp_r_eps 3.00 0.0081378 12 6 # C-H lj
intnbp_r_eps 3.20 0.0297000 12 6 # O-O lj
intnbp_r_eps 2.60 0.0093852 12 6 # O-H lj
intnbp_r_eps 2.00 0.0029700 12 6 # H-H lj
rmstol 2.0 # cluster_tolerance/A
extnrg 1000.0 # external grid energy
e0max 0.0 10000 # max initial energy; max number of retries
ga_pop_size 200 # number of individuals in population
ga_num_evals 800000 # maximum number of energy evaluations
ga_num_generations 27000 # maximum number of generations
ga_elitism 1 # number of top individuals to survive to next generation
ga_mutation_rate 0.02 # rate of gene mutation
ga_crossover_rate 0.8 # rate of crossover
ga_window_size 10 #
ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution
ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution
set_ga # set the above parameters for GA or LGA
sw_max_its 300 # iterations of Solis & Wets local search
sw_max_succ 4 # consecutive successes before changing rho
sw_max_fail 4 # consecutive failures before changing rho
sw_rho 1.0 # size of local search space to sample
sw_lb_rho 0.01 # lower bound on rho
ls_search_freq 0.06 # probability of performing local search on individual
set_sw1 # set the above Solis & Wets parameters
ga_run 100 # do this many hybrid GA-LS runs
analysis # perform a ranked cluster analysis

b) Grid parameters

npts 70 70 100 # num.grid points in xyz
spacing 0.375 # spacing(A)
gridcenter 0.0 0.0 11.0 # xyz-coordinates or auto
smooth 0.5 # store minimum energy w/in rad(A)
dielectric -0.1465 # <0, AD4 distance-dep.diel;>0, constant

S4.3. AD4.2 parameters

a) Docking parameters

autodock_parameter_version 4.2 # used by autodock to validate parameter set
outlev 1 # diagnostic output level
intelec # calculate internal electrostatics
seed pid time # seeds for random generator
ligand_types C HD OA # atoms types in ligand
move 1MFD_ligand.pdbqt # small molecule
about 0.045 -0.063 -0.041 # small molecule center
tran0 random # initial coordinates/A or random
axisangle0 random # initial orientation
dihe0 random # initial dihedrals (relative) or random
tstep 2.0 # translation step/A
qstep 50.0 # quaternion step/deg
dstep 50.0 # torsion step/deg
torsdof 15 # torsional degrees of freedom
rmstol 2.0 # cluster_tolerance/A
extnrg 1000.0 # external grid energy
e0max 0.0 10000 # max initial energy; max number of retries
ga_pop_size 200 # number of individuals in population
ga_num_evals 800000 # maximum number of energy evaluations
ga_num_generations 27000 # maximum number of generations
ga_elitism 1 # number of top individuals to survive to next generation
ga_mutation_rate 0.02 # rate of gene mutation
ga_crossover_rate 0.8 # rate of crossover
ga_window_size 10 #
ga_cauchy_alpha 0.0 # Alpha parameter of Cauchy distribution
ga_cauchy_beta 1.0 # Beta parameter Cauchy distribution
set_ga # set the above parameters for GA or LGA
sw_max_its 300 # iterations of Solis & Wets local search
sw_max_succ 4 # consecutive successes before changing rho
sw_max_fail 4 # consecutive failures before changing rho
sw_rho 1.0 # size of local search space to sample
sw_lb_rho 0.01 # lower bound on rho
ls_search_freq 0.06 # probability of performing local search on individual
set_psw1 # set the above pseudo-Solis & Wets parameters
unbound_model bound # state of unbound ligand
ga_run 100 # do this many hybrid GA-LS runs
write_all # write all conformations in a cluster
analysis # perform a ranked cluster analysis

b) Grid parameters

npts 70 70 100 # num.grid points in xyz
spacing 0.375 # spacing(A)
gridcenter 0.0 0.0 11.0 # xyz-coordinates or auto
smooth 0.5 # store minimum energy w/in rad(A)
dielectric -0.1146 # <0, distance-dep.diel;>0, constant

S4.4. ADV Docking parameters

center_x = 0
center_y = 0
center_z = 11
size_x = 26.25
size_y = 26.25
size_z = 37.5
energy_range = 10
num_modes = 100
cpu = 8
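The ADV search box listed in S4.4 has the same center and edge lengths as the AutoGrid box of S4.2 and S4.3. The short sketch below checks that arithmetic, assuming that npts counts grid intervals along each axis so that each edge length is npts multiplied by the spacing. The function name box_size is illustrative only and does not come from any of the docking programs.

```python
def box_size(npts, spacing):
    """Return the edge lengths (in angstroms) of a grid box.

    npts    -- number of grid intervals along x, y and z (assumed meaning)
    spacing -- grid spacing in angstroms
    """
    return tuple(n * spacing for n in npts)


# Grid parameters from S4.2 b) and S4.3 b)
npts = (70, 70, 100)
spacing = 0.375

size_x, size_y, size_z = box_size(npts, spacing)
print(size_x, size_y, size_z)  # 26.25 26.25 37.5
```

These values agree with size_x, size_y and size_z in S4.4, and the grid center (0.0, 0.0, 11.0) matches center_x, center_y and center_z.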

S4.5. Comparison of the glycosidic torsion angles in the crystal carbohydrate

ligands to those in the Glycam ligands and their corresponding CHI energy

scores.

| System | Torsion angle | Experimental torsion angle | Experimental CHI energy | Glycam torsion angle | Glycam CHI energy | Disaccharide unit |
|---|---|---|---|---|---|---|
| 1MFA | φ | 71.5 | 0.0 | 60.0 | 0.4 | DAbepα1-3DManpα-OMe |
| 1MFA | φ | 77.4 | 0.1 | 60.0 | 0.4 | DGalpα1-2DManpα-OMe |
| 1MFA | ψ | -135.1 | 0.1 | -120.2 | 0.3 | DAbepα1-3DManpα-OMe |
| 1MFA | ψ | -94.7 | 0.0 | -118.1 | 0.3 | DGalpα1-2DManpα-OMe |
| 1MFD | φ | 76.1 | 0.0 | 60.0 | 0.4 | DAbepα1-3DManpα-OMe |
| 1MFD | φ | 103.7 | 1.4 | 60.0 | 0.4 | DGalpα1-2DManpα-OMe |
| 1MFD | ψ | -139.4 | 0.1 | -120.2 | 0.3 | DAbepα1-3DManpα-OMe |
| 1MFD | ψ | -151.6 | 0.2 | -118.1 | 0.3 | DGalpα1-2DManpα-OMe |
| 1UZ8 | φ | -66.7 | 0.0 | -65.4 | 0.0 | DGalpβ1-4DGlcpNAcβ-OMe |
| 1UZ8 | φ | -82.7 | 0.2 | -65.5 | 0.2 | LFucpα1-3DGlcpNAcβ-OMe |
| 1UZ8 | ψ | 128.0 | 0.3 | 125.1 | 0.3 | DGalpβ1-4DGlcpNAcβ-OMe |
| 1UZ8 | ψ | -99.4 | 0.1 | -101.1 | 0.1 | LFucpα1-3DGlcpNAcβ-OMe |
| 1M7D | φ | -78.3 | 0.1 | -60.0 | 0.4 | (2-deoxy)LRhapα1-3DGlcpNAcβ-OMe |
| 1M7D | φ | -63.2 | 0.3 | -60.0 | 0.4 | LRhapα1-3DGlcpNAcβ-OMe |
| 1M7D | ψ | -114.1 | 0.3 | -120.2 | 0.3 | (2-deoxy)LRhapα1-3DGlcpNAcβ-OMe |
| 1M7D | ψ | 111.2 | 0.2 | 120.1 | 0.3 | LRhapα1-3DGlcpNAcβ-OMe |
| 1S3K | φ | -77.8 | 0.1 | -66.9 | 0.1 | LFucpα1-3DGlcpNAcα-OH |
| 1S3K | φ | -77.7 | 0.3 | -66.4 | 0.0 | DGalpβ1-4DGlcpNAcα-OH |
| 1S3K | φ | -78.1 | 0.1 | -69.4 | 0.1 | LFucpα1-2DGalpβ |
| 1S3K | ψ | -103.4 | 0.1 | -99.4 | 0.1 | LFucpα1-3DGlcpNAcα-OH |
| 1S3K | ψ | 139.3 | 0.3 | 127.8 | 0.3 | DGalpβ1-4DGlcpNAcα-OH |
| 1S3K | ψ | 140.4 | 0.3 | 125.5 | 0.3 | LFucpα1-2DGalpβ |
| 1M7I | φ | -90.2 | 1.0 | -62.4 | 0.1 | LRhapα1-3DGlcpNAcβ |
| 1M7I | φ | -115.3 | 2.3 | -69.5 | 0.1 | LRhapα1-2LRhapα |
| 1M7I | φ | -81.7 | 0.1 | -69.0 | 0.1 | LRhapα1-3LRhapα |
| 1M7I | φ | -63.5 | 0.3 | -65.9 | 0.2 | LRhapα1-2LRhapα |
| 1M7I | ψ | 53.4 | 0.9 | 112.4 | 0.3 | LRhapα1-2LRhapα |
| 1M7I | ψ | -90.9 | 0.0 | -114.8 | 0.3 | LRhapα1-2LRhapα |
| 1M7I | ψ | 121.0 | 0.3 | 114.7 | 0.3 | LRhapα1-3LRhapα |
| 1M7I | ψ | 149.3 | 0.1 | 112.3 | 0.3 | LRhapα1-2LRhapα |

Torsion angles are given in degrees; CHI energy scores are given in kcal/mol.

S4.6. CHI Energy Functions

Equation for the φ angle in α-linkages:

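$$
\begin{aligned}
E(\phi) ={}& 2.977\, e^{-\frac{(\phi + 1.9949 \times 10^{2})^{2}}{6.7781 \times 10^{2}}}
 + 1.0225 \times 10^{2}\, e^{-\frac{(\phi - 1.706 \times 10^{2})^{2}}{1.6968 \times 10^{3}}} \\
&+ 1.0745 \times 10^{1}\, e^{-\frac{(\phi + 1.0531 \times 10^{2})^{2}}{4.7246 \times 10^{3}}}
 + 3.6735\, e^{-\frac{(\phi - 6.2012)^{2}}{1.3477 \times 10^{3}}} \\
&+ 2.061\, e^{-\frac{(\phi - 9.1655 \times 10^{1})^{2}}{1.5 \times 10^{3}}}
 + 6.1939\, e^{-\frac{(\phi + 2.2979 \times 10^{1})^{2}}{2.1223 \times 10^{3}}} \\
&- 2.1115\, e^{-\frac{(\phi - 8.3602 \times 10^{1})^{2}}{1.2541 \times 10^{3}}}
 - 9.8001 \times 10^{1}\, e^{-\frac{(\phi - 1.7001 \times 10^{2})^{2}}{1.5987 \times 10^{3}}}
\end{aligned}
$$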

Equation for the φ angle in β-linkages:

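$$
\begin{aligned}
E(\phi) ={}& 4.5054 \times 10^{2}\, e^{-\frac{(\phi + 3.3077 \times 10^{2})^{2}}{4.4498 \times 10^{3}}}
 + 2.3712 \times 10^{1}\, e^{-\frac{(\phi - 3.0463 \times 10^{2})^{2}}{8.3752 \times 10^{3}}} \\
&+ 5.9353\, e^{-\frac{(\phi + 1.5208 \times 10^{2})^{2}}{6.0498 \times 10^{3}}}
 + 2.2467 \times 10^{1}\, e^{-\frac{(\phi + 2.3516 \times 10^{1})^{2}}{6.0690 \times 10^{2}}} \\
&+ 1.0036 \times 10^{1}\, e^{-\frac{(\phi - 1.2096 \times 10^{2})^{2}}{4.038 \times 10^{3}}}
 - 1.8141 \times 10^{1}\, e^{-\frac{(\phi + 2.4268 \times 10^{1})^{2}}{5.4305 \times 10^{2}}} \\
&+ 5.8823\, e^{-\frac{(\phi - 1.9632 \times 10^{1})^{2}}{8.9793 \times 10^{2}}}
 - 2.1283
\end{aligned}
$$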

Equation for the ψ angle in 1-2ax, 1-4ax and 1-3eq linkages:

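$$
\begin{aligned}
E(\psi) ={}& 4.6237\, e^{-\frac{(\psi - 5.0456)^{2}}{5.0058 \times 10^{3}}}
 + 4.6139\, e^{-\frac{(\psi - 3.6249 \times 10^{2})^{2}}{2.0906 \times 10^{3}}}
 + 4.9419\, e^{-\frac{(\psi - 1.212 \times 10^{2})^{2}}{2.0938 \times 10^{3}}} \\
&+ 4.029 \times 10^{-1}\, e^{-\frac{(\psi - 2.4143 \times 10^{2})^{2}}{4.5683 \times 10^{2}}}
 + 7.9888 \times 10^{-1}\, e^{-\frac{(\psi - 6.8425 \times 10^{1})^{2}}{6.7881 \times 10^{2}}} \\
&+ 2.2299 \times 10^{-1}\, e^{-\frac{(\psi - 1.9293 \times 10^{2})^{2}}{3.4725 \times 10^{2}}}
 - 1.2565 \times 10^{-1}
\end{aligned}
$$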

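Equation for the ψ angle in 1-2eq, 1-4eq and 1-3ax linkages:

\[
\begin{aligned}
E(\psi) ={}& 4.4681\,\exp\!\left(-\frac{(\psi - 10^{-30})^{2}}{1.2796\times10^{3}}\right)
 + 4.382\,\exp\!\left(-\frac{(\psi - 3.5777\times10^{2})^{2}}{6.0501\times10^{3}}\right) \\
&+ 2.8495\times10^{2}\,\exp\!\left(-\frac{(\psi - 1.4664\times10^{2})^{2}}{1.5518\times10^{3}}\right)
 + 4.7613\,\exp\!\left(-\frac{(\psi - 2.2068\times10^{2})^{2}}{5.8929\times10^{3}}\right) \\
&- 1.692\times10^{2}\,\exp\!\left(-\frac{(\psi - 1.4737\times10^{2})^{2}}{1.7425\times10^{3}}\right)
 - 1.1844\times10^{2}\,\exp\!\left(-\frac{(\psi - 1.4606\times10^{2})^{2}}{1.3598\times10^{3}}\right) \\
&+ 1.0220
\end{aligned}
\]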
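Each CHI energy function in this section is a sum of Gaussian terms plus, in some cases, a constant offset. The following Python sketch is illustrative only: it evaluates the ψ function for 1-2ax, 1-4ax and 1-3eq linkages using coefficients transcribed from the equation given in this section. The names `gaussian_sum`, `chi_psi_ax`, `PSI_AX_TERMS` and `PSI_AX_OFFSET` are hypothetical and do not come from the Vina-Carb source code, and the angle is assumed to be supplied in degrees. The other three functions could be evaluated the same way by substituting their own (amplitude, center, width) triples.

```python
import math

# (amplitude, center in degrees, width in degrees^2) for each Gaussian term,
# transcribed from the psi equation for 1-2ax, 1-4ax and 1-3eq linkages.
PSI_AX_TERMS = [
    (4.6237, 5.0456, 5.0058e3),
    (4.6139, 3.6249e2, 2.0906e3),
    (4.9419, 1.212e2, 2.0938e3),
    (4.029e-1, 2.4143e2, 4.5683e2),
    (7.9888e-1, 6.8425e1, 6.7881e2),
    (2.2299e-1, 1.9293e2, 3.4725e2),
]
PSI_AX_OFFSET = -1.2565e-1  # constant term at the end of the equation


def gaussian_sum(angle_deg, terms, offset=0.0):
    """Return offset + sum of a * exp(-(angle - b)^2 / c) over (a, b, c) terms."""
    return offset + sum(
        a * math.exp(-((angle_deg - b) ** 2) / c) for a, b, c in terms
    )


def chi_psi_ax(psi_deg):
    """CHI energy (kcal/mol) for psi in 1-2ax, 1-4ax and 1-3eq linkages."""
    return gaussian_sum(psi_deg, PSI_AX_TERMS, PSI_AX_OFFSET)


if __name__ == "__main__":
    # Scan the torsion angle in 60-degree steps and print the energy profile.
    for psi in range(0, 361, 60):
        print(f"psi = {psi:3d} deg   E = {chi_psi_ax(psi):6.2f} kcal/mol")
```

Keeping the coefficients in a plain list of triples makes it straightforward to compare them term by term against the printed equation.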

S4.7. Plots showing agreement between the quantum mechanical data points (black dots) and CHI energy curves (grey lines)

Root mean squared deviations (RMSDs) were calculated between the quantum mechanical data points and corresponding data points on the CHI energy curve.

[Figure: four panels plotting ΔE (kcal/mol) against the torsion angle from 0° to 360°. The two φ panels report RMSD values of 0.02 and 0.16; the two ψ panels report RMSD values of 0.02 and 0.04.]
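The RMSD quoted for each panel compares the quantum mechanical energies with the fitted curve evaluated at the same torsion angles. A minimal Python sketch of that calculation follows; the function name `rmsd` and the sample energies are hypothetical placeholders, not values from this work.

```python
import math


def rmsd(reference, fitted):
    """Root mean squared deviation between two equal-length energy series."""
    if len(reference) != len(fitted) or not reference:
        raise ValueError("both series must be non-empty and of equal length")
    squared = [(r - f) ** 2 for r, f in zip(reference, fitted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) at matching torsion angles.
    qm_energies = [0.0, 1.2, 3.4, 2.1]
    curve_energies = [0.1, 1.1, 3.5, 2.0]
    print(f"RMSD = {rmsd(qm_energies, curve_energies):.2f} kcal/mol")
```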

S4.8. GlyTorsion analysis

The various searches performed using the web-tool are tabulated below. S1 refers to the non-reducing sugar residue, while S2 refers to the reducing sugar residue. GlyTorsion searches performed for Figure 4.6 (main text):

         Figure 8a                       Figure 8b
S. No.   S1      linkage   S2            S1      linkage   S2
1        a-D-*   1-2       *-Manp*       b-D-*   1-2       *-Manp*
2        a-D-*   1-2       *-Galp*       b-D-*   1-2       *-Galp*
3        a-D-*   1-2       *-Glcp*       b-D-*   1-2       *-Glcp*
4        a-L-*   1-2       *-Manp*       b-L-*   1-2       *-Manp*
5        a-L-*   1-2       *-Galp*       b-L-*   1-2       *-Galp*
6        a-L-*   1-2       *-Glcp*       b-L-*   1-2       *-Glcp*
7        a-D-*   1-4       *-Manp*       b-D-*   1-4       *-Manp*
8        a-D-*   1-4       *-Galp*       b-D-*   1-4       *-Galp*
9        a-D-*   1-4       *-Glcp*       b-D-*   1-4       *-Glcp*
10       a-L-*   1-4       *-Manp*       b-L-*   1-4       *-Manp*
11       a-L-*   1-4       *-Galp*       b-L-*   1-4       *-Galp*
12       a-L-*   1-4       *-Glcp*       b-L-*   1-4       *-Glcp*
13       a-D-*   1-3       *-Manp*       b-D-*   1-3       *-Manp*
14       a-D-*   1-3       *-Galp*       b-D-*   1-3       *-Galp*
15       a-D-*   1-3       *-Glcp*       b-D-*   1-3       *-Glcp*
16       a-L-*   1-3       *-Manp*       b-L-*   1-3       *-Manp*
17       a-L-*   1-3       *-Galp*       b-L-*   1-3       *-Galp*
18       a-L-*   1-3       *-Glcp*       b-L-*   1-3       *-Glcp*

         Figure 8c                           Figure 8d
S. No.   S1        linkage   S2              S1        linkage   S2
1        *-Manp*   1-2       *-D-Manp*       *-Manp*   1-2       *-D-Glcp*
2        *-Galp*   1-2       *-D-Manp*       *-Galp*   1-2       *-D-Glcp*
3        *-Glcp*   1-2       *-D-Manp*       *-Glcp*   1-2       *-D-Glcp*
4        *-Manp*   1-2       *-L-Manp*       *-Manp*   1-2       *-L-Glcp*
5        *-Galp*   1-2       *-L-Manp*       *-Galp*   1-2       *-L-Glcp*
6        *-Glcp*   1-2       *-L-Manp*       *-Glcp*   1-2       *-L-Glcp*
7        *-Manp*   1-3       *-D-Glcp*       *-Manp*   1-2       *-D-Galp*
8        *-Galp*   1-3       *-D-Glcp*       *-Galp*   1-2       *-D-Galp*
9        *-Glcp*   1-3       *-D-Glcp*       *-Glcp*   1-2       *-D-Galp*
10       *-Manp*   1-3       *-L-Glcp*       *-Manp*   1-2       *-L-Galp*
11       *-Galp*   1-3       *-L-Glcp*       *-Galp*   1-2       *-L-Galp*
12       *-Glcp*   1-3       *-L-Glcp*       *-Glcp*   1-2       *-L-Galp*
13       *-Manp*   1-3       *-D-Manp*       *-Manp*   1-4       *-D-Glcp*
14       *-Galp*   1-3       *-D-Manp*       *-Galp*   1-4       *-D-Glcp*
15       *-Glcp*   1-3       *-D-Manp*       *-Glcp*   1-4       *-D-Glcp*
16       *-Manp*   1-3       *-L-Manp*       *-Manp*   1-4       *-L-Glcp*
17       *-Galp*   1-3       *-L-Manp*       *-Galp*   1-4       *-L-Glcp*
18       *-Glcp*   1-3       *-L-Manp*       *-Glcp*   1-4       *-L-Glcp*
19       *-Manp*   1-3       *-D-Galp*       *-Manp*   1-4       *-D-Galp*
20       *-Galp*   1-3       *-D-Galp*       *-Galp*   1-4       *-D-Galp*
21       *-Glcp*   1-3       *-D-Galp*       *-Glcp*   1-4       *-D-Galp*
22       *-Manp*   1-3       *-L-Galp*       *-Manp*   1-4       *-L-Galp*
23       *-Galp*   1-3       *-L-Galp*       *-Galp*   1-4       *-L-Galp*
24       *-Glcp*   1-3       *-L-Galp*       *-Glcp*   1-4       *-L-Galp*
25       *-Manp*   1-4       *-D-Manp*
26       *-Galp*   1-4       *-D-Manp*
27       *-Glcp*   1-4       *-D-Manp*
28       *-Manp*   1-4       *-L-Manp*
29       *-Galp*   1-4       *-L-Manp*
30       *-Glcp*   1-4       *-L-Manp*

In GlyTorsion, the ψ torsion angle is defined with respect to the Cx+1 atom of the reducing sugar, whereas the CHI energy functions apply only to ψ torsion angles defined with respect to the Cx-1 atom. The ψ values from the web-tool were therefore converted to the Cx-1 definition by adding or subtracting 120°, depending on the D/L configuration of the reducing sugar.
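The conversion between the two ψ definitions amounts to a fixed 120° shift. The sketch below is a hedged illustration: the text does not state which configuration (D or L) takes the positive shift, so the sign is left as a caller-supplied argument, and wrapping the result into the 0° to 360° range is an assumption. The function name `shift_psi` is hypothetical.

```python
def shift_psi(psi_deg, sign):
    """Shift a psi angle by sign * 120 degrees and wrap it into [0, 360).

    sign must be +1 or -1; which sign applies to a D or an L reducing
    sugar is not specified here and has to be supplied by the caller.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return (psi_deg + sign * 120.0) % 360.0


if __name__ == "__main__":
    print(shift_psi(100.0, 1))
    print(shift_psi(60.0, -1))
```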

S4.9. CHI energy score in kcal/mol of top-ranked poses, before and after rescoring

           AD3               AD4.2             ADV
System     Before   After    Before   After    Before   After
1MFA       1.31     1.31     1.48     1.48     2.84     1.73
1MFD       1.29     1.29     4.42     1.84     4.21     2.51
1UZ8       1.29     0.29     0.65     0.65     0.76     0.76
1M7D       3.14     1.19     11.60    0.93     1.11     1.11
1S3K       7.02     0.99     1.26     1.26     1.67     1.67
1M7I       7.72     2.33     18.45    4.31     10.20    2.31

S4.10. Rank of lowest PRMSD structure, before and after rescoring

           AD3               AD4.2             ADV
System     Before   After    Before   After    Before   After
1MFA       55       67       10       14       1        1
1MFD       18       9        13       2        2        1
1UZ8       4        3        4        2        1        1
1M7D       10       4        7        2        1        1
1S3K       2        1        5        3        1        1
1M7I       1        3        31       50       1        1

S4.11. tLEaP input files to assemble ligands with deoxy sugars

a) Assembling the 1MFA/1MFD ligand

# ----- leaprc for loading the Glycam_04 force field
addPdbResMap {
  { 0 "OLS" "NOLS" } { 1 "OLS" "COLS" }
  { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }
  { 0 "OLP" "NOLP" } { 1 "OLP" "COLP" }
  { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }
}

# load atom type hybridizations
addAtomTypes {
  { "C" "C" "sp2" } { "CG" "C" "sp3" } { "CY" "C" "sp3" }
  { "H" "H" "sp3" } { "H1" "H" "sp3" } { "H2" "H" "sp3" }
  { "HC" "H" "sp3" } { "HO" "H" "sp3" } { "HW" "H" "sp3" }
  { "N" "N" "sp2" } { "OH" "O" "sp3" } { "OS" "O" "sp3" }
  { "O" "O" "sp2" } { "O2" "O" "sp2" } { "OW" "O" "sp3" }
  { "OY" "O" "sp3" } { "S" "S" "sp3" }
}

# load the main parameter set
parm94 = loadamberparams /usr/local/programs/amber10/dat/leap/parm/parm94.dat
glycam_06 = loadamberparams /usr/local/programs/glycam06/parameters/Glycam_06_current.dat

# load all prep files for polysaccharides
loadamberprep /usr/local/programs/glycam06/prep_files/Glycam_06_current.prep

# load lib files
loadOff solvents.lib
loadOff ions94.lib

amber_seq = sequence { OME ZMA }
set amber_seq tail amber_seq.2.O3
amber_seq = sequence { amber_seq 0AE }
set amber_seq tail amber_seq.2.O2
amber_seq = sequence { amber_seq 0LA }
impose amber_seq {3 2} { {H1 C1 O3 C3 -60.0} }
impose amber_seq {3 2} { {C1 O3 C3 H3 0.0} }
impose amber_seq {4 2} { {H1 C1 O2 C2 -60.0} }
impose amber_seq {4 2} { {C1 O2 C2 H2 0.0} }
charge amber_seq
savepdb amber_seq salmonella_glycam.pdb

b) Assembling the 1M7D ligand

# ----- leaprc for loading the Glycam_04 force field
addPdbResMap {
  { 0 "OLS" "NOLS" } { 1 "OLS" "COLS" }
  { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }
  { 0 "OLP" "NOLP" } { 1 "OLP" "COLP" }
  { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }
}

# load atom type hybridizations
addAtomTypes {
  { "C" "C" "sp2" } { "CG" "C" "sp3" } { "CY" "C" "sp3" }
  { "H" "H" "sp3" } { "H1" "H" "sp3" } { "H2" "H" "sp3" }
  { "HC" "H" "sp3" } { "HO" "H" "sp3" } { "HW" "H" "sp3" }
  { "N" "N" "sp2" } { "OH" "O" "sp3" } { "OS" "O" "sp3" }
  { "O" "O" "sp2" } { "O2" "O" "sp2" } { "OW" "O" "sp3" }
  { "OY" "O" "sp3" } { "S" "S" "sp3" }
}

# load the main parameter set
parm94 = loadamberparams /usr/local/programs/amber10/dat/leap/parm/parm94.dat
glycam_06 = loadamberparams /usr/local/programs/glycam06/parameters/Glycam_06_current.dat

# load all prep files for polysaccharides
loadamberprep /usr/local/programs/glycam06/prep_files/Glycam_06_current.prep

amber_seq = sequence { OME 3YB 2DR 0hA }
impose amber_seq {3 2} { {H1 C1 O3 C3 60.0} }
impose amber_seq {3 2} { {C1 O3 C3 H3 0.0} }
impose amber_seq {4 3} { {H1 C1 O3 C3 60.0} }
impose amber_seq {4 3} { {C1 O3 C3 H3 0.0} }
charge amber_seq
savepdb amber_seq 1M7D_glycam.pdb

Supplementary Information Chapter 5

S5.1. A comparison of PRMSDmin(5) poses obtained using ADV, VC1 and VC2 at all 5 CHI-cutoff values (1 to 5) is shown in 1a. The corresponding standard deviation values are depicted in 1b.

PDB ID (antibody, Ab) for each column: 1OP3 (2G12), 1UZ8 (291-2G3-A), 1M7D (SYA/J6), 1MFA (scFv SE155-4), 1MFD (Fab SE155-4), 1MFE (SE155-4), 1S3K (HU3S193), 1CLY (BR96), 1CLZ (BR96), 1MFC (SE155-4), 1M7I (SYA/J6), 1MFB (SE155-4), 3BZ4 (F22-4), 3C6S (F22-4).

a.)

PDB     1OP3   1UZ8   1M7D   1MFA   1MFD   1MFE   1S3K   1CLY   1CLZ   1MFC   1M7I   1MFB   3BZ4   3C6S
ADV     0.27   0.19   0.26   1.11   0.52   0.59   0.30   0.45   0.68   1.01   1.31   3.1    3.5    6.9
VC1|0   0.254  0.18   0.281  0.947  0.39   0.986  0.293  0.54   0.43   1.76   0.748  1.1    1.1    2.8
VC1|1   0.33   0.19   0.26   1.01   0.52   0.59   0.30   0.45   0.66   1.78   0.71   1.5    1.3    1.5
VC1|2   0.21   0.18   0.26   1.00   0.52   0.59   0.30   0.45   0.68   1.72   0.54   1.4    1.7    2.4
VC1|3   0.33   0.18   0.26   1.12   0.52   0.59   0.30   0.45   0.68   1.70   0.54   1.3    2.7    5.8
VC1|4   0.21   0.19   0.26   1.23   0.52   0.59   0.30   0.45   0.68   1.04   0.69   1.8    3.2    5.7
VC1|5   0.39   0.19   0.26   1.10   0.52   0.59   0.29   0.44   0.68   1.01   0.72   2.3    3.7    5.7
VC2|0   0.32   0.18   0.51   0.96   0.37   0.89   0.19   0.65   0.59   1.88   0.80   1.3    1.2    3.6
VC2|1   0.21   0.19   0.26   0.96   0.52   0.59   0.30   0.45   0.65   1.80   0.64   1.5    1.3    1.3
VC2|2   0.21   0.19   0.26   1.12   0.52   0.59   0.30   0.45   0.68   1.72   0.54   1.4    1.6    1.9
VC2|3   0.21   0.19   0.26   1.15   0.52   0.59   0.30   0.44   0.68   1.76   0.54   1.5    2.7    7.0
VC2|4   0.33   0.19   0.26   1.16   0.52   0.59   0.30   0.46   0.69   0.97   0.76   1.9    3.3    5.8
VC2|5   0.21   0.19   0.26   1.18   0.52   0.59   0.30   0.45   0.68   1.00   0.73   2.4    3.8    5.6

Those highlighted in green are less than 2.0 Å

b.)

PDB     1OP3   1UZ8   1M7D   1MFA   1MFD   1MFE   1S3K   1CLY   1CLZ   1MFC   1M7I   1MFB   3BZ4   3C6S
ADV     0.2    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.0    0.1    0.1    0.5    1.2
VC1|0   0.0    0.0    0.0    0.1    0.0    0.3    0.0    0.0    0.0    0.2    0.1    0.1    0.2    0.9
VC1|1   0.2    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.2    0.1    0.1    0.1    0.7
VC1|2   0.0    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.2    0.0    0.5    0.6    1.5
VC1|3   0.2    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.3    0.0    0.4    0.9    0.3
VC1|4   0.0    0.0    0.0    0.1    0.0    0.0    0.0    0.0    0.0    0.4    0.3    0.4    0.2    0.6
VC1|5   0.3    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.0    0.3    0.5    0.2    0.1
VC2|0   0.0    0.0    0.4    0.0    0.0    0.3    0.0    0.1    0.1    0.1    0.0    0.3    0.1    1.0
VC2|1   0.0    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.1    0.1    0.1    0.1    0.1    0.0
VC2|2   0.0    0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0    0.2    0.0    0.4    0.5    1.3
VC2|3   0.0    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.2    0.0    0.3    0.9    2.3
VC2|4   0.2    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.4    0.4    0.4    0.4    0.7
VC2|5   0.0    0.0    0.0    0.2    0.0    0.0    0.0    0.0    0.0    0.0    0.3    0.5    0.1    0.3

S5.2. A list of all protein-carbohydrate crystal structures employed in the study. The carbohydrate sequences have been obtained using the pdbcare tool in glycosciences.de.

S.No. PDB ID LINUCS

1 1hql a-D-Galp-(1-3)-b-D-Galp-(1-1)-methyl

2 1lte b-D-Galp-(1-4)-b-D-Glcp

3 1niv a-D-Manp-(1-3)-a-D-Manp-(1-1)-methyl

4 1qos a-D-GlcpNAc-(1-4)-b-D-GlcpNAc

5 1slt b-D-Galp-(1-4)-a-D-GlcpNAc

6 2aai b-D-Galp-(1-4)-b-D-Glcp

7 2ovu a-D-Manp-(1-2)-a-D-Manp-(1-1)-methyl

8 2pel b-D-Galp-(1-4)-a-D-Glcp

b-D-Galp-(1-4)-b-D-Glcp

9 3o0x a-D-Glcp-(1-3)-a-D-Manp-(1-2)-a-D-Manp-(1-2)-a-D-Manp

10 4g1r a-D-Manp-(1-2)-a-D-Manp

11 4g1s a-D-Manp-(1-2)-a-D-Manp

12 1jpc a-D-3,6-deoxy-Manp

a-D-Manp-(1-6)+

|

a-D-Manp

|

a-D-Manp-(1-3)+

13 1qot a-L-Fucp-(1-2)-b-D-Galp

a-L-Fucp-(1-2)-b-D-Galp-(1-4)-b-D-Glcp

14 1sl6 b-D-Galp-(1-4)+

|

a-D-GlcpNAc

|

a-L-Fucp-(1-3)+

15 2auy b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)-a-D-Manp-(1-1)-methyl

16 2bos a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp-(1-1)-butyl

a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp

a-D-Galp-(1-4)-b-D-Galp

17 2e6v a-D-Manp-(1-2)-a-D-Manp-(1-3)-b-D-Manp

18 2eal a-D-GalpNAc-(1-3)-b-D-GalpNAc-(1-3)-b-D-Galp

19 2g7c a-D-Galp-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

20 2vxj a-D-Galp-(1-3)-b-D-Galp-(1-4)-b-D-Glcp

a-D-Galp-(1-3)-b-D-Galp-(1-4)-a-D-Glcp

a-D-Galp-(1-4)-D-1-deoxy-Galp

21 3ef2 a-D-Galp-(1-3)+

|

b-D-Galp

|

a-L-Fucp-(1-2)+

a-D-Galp-(1-3)+

|

a-D-Galp

|

a-L-Fucp-(1-2)+

22 1gsl a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-methyl

|

a-L-Fucp-(1-3)+

23 1j8r b-D-GalpNAc-(1-3)-a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp

24 1led a-L-Fucp-(1-4)+

|

b-D-GlcpNAc-(1-1)-methyl

|

a-L-Fucp-(1-2)-b-D-Galp-(1-3)+

25 1ulf a-D-GalpNAc-(1-3)+

|

b-D-Galp-(1-4)-b-D-Glcp

|

a-L-Fucp-(1-2)+

26 1w8f b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)+

|

b-D-Glcp

|

a-L-Fucp-(1-3)+

b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)+

|

b-D-Glcp

|

a-L-Fucp-(1-3)+

b-D-Galp-(1-4)+

|

b-D-Glcp

|

a-L-Fucp-(1-3)+

27 2zhk b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

28 3lek a-L-Fucp-(1-4)+

|

a-D-GlcpNAc

|

a-L-Fucp-(1-2)-b-D-Galp-(1-3)+

29 3o0w a-D-Glcp-(1-3)-a-D-Manp-(1-2)-a-D-Manp-(1-2)-a-D-Manp

30 3wg3 a-D-GalpNAc-(1-3)+

|

b-D-Galp-(1-4)-b-D-GlcpNAc

|

a-L-Fucp-(1-2)+

31 3zwe a-D-Galp-(1-3)+

|

b-D-Galp-(1-4)-b-D-Glcp

|

a-L-Fucp-(1-2)+

32 4gwi a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc

|

a-L-Fucp-(1-3)+

33 4mrd b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-D-1-deoxy-GlcpNAc

34 1k9i b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+

|

a-D-Manp

|

b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)+

35 1tei b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+

|

a-D-Manp

|

b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)+

36 1zhs a-D-Manp-(1-6)+

|

b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-b-D-GlcpNAc

|

a-D-Manp-(1-3)+

37 2o2l a-D-GalpNAc-(1-3)+

|

b-D-Galp-(1-4)+

| |

a-L-Fucp-(1-2)+ b-D-Glcp

|

a-L-Fucp-(1-3)+

38 2vco a-D-Manp-(1-6)+

|

b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-b-D-GlcpNAc

|

a-D-Manp-(1-3)+

39 4gk9 a-D-Manp-(1-6)+

|

a-D-Manp-(1-6)+

| |

a-D-Manp-(1-3)+ b-D-Manp

|

a-D-Manp-(1-3)+

40 2wt2 b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

41 2zhm b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

b-D-Galp-(1-4)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-GlcpNAc

42 2vuz b-D-GlcpNAc-(1-2)-a-D-Manp-(1-6)+

|

b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-b-D-GlcpNAc

|

b-D-GlcpNAc-(1-2)-a-D-Manp-(1-3)+

43 2ygm a-D-Galp-(1-3)+

|

b-D-Galp-(1-4)-b-D-GlcpNAc

|

a-L-Fucp-(1-2)+

44 1j84 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

45 2i74 a-D-Manp-(1-6)+

|

a-D-Manp-(1-6)-a-D-Manp

|

a-D-Manp-(1-3)+

46 2j1t a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc

|

a-L-Fucp-(1-3)+

47 2j1u a-D-GalpNAc-(1-3)+

|

b-D-Galp-(1-4)-b-D-Glcp

|

a-L-Fucp-(1-2)+

48 2j72 a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp

49 2j73 a-D-Glcp-(1-6)-a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp

a-D-Glcp-(1-4)-a-D-Glcp-(1-4)-a-D-Glcp

50 3ach b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

51 1gu3 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

52 1of4 b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp

53 1uxx b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp

54 3aci b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

55 2y6l b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp-(1-4)-b-D-Xylp

56 2yfz b-D-Galp-(1-2)-a-D-Xylp-(1-6)+

|

b-D-Glcp-(1-4)-b-D-Glcp

|

b-D-Glcp-(1-4)+

57 2zex b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

58 1gwl b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp

59 1gwm b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-a-D-Glcp

60 1oh3 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-a-D-Glcp

61 1oh4 a-D-Galp-(1-6)+

|

a-D-Galp-(1-6)+ b-D-Manp-(1-4)-b-D-Manp-(1-4)-b-D-Manp

| |

b-D-Manp-(1-4)+

|

b-D-Manp-(1-4)+

62 2ypj a-D-Xylp-(1-6)+

|

a-D-Xylp-(1-6)+ b-D-Glcp-(1-4)-b-D-Glcp

| |

b-D-Glcp-(1-4)+

|

a-D-Xylp-(1-6)-b-D-Glcp-(1-4)+

63 2eo7 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Manp

64 2eex b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

65 2ej1 b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

66 1mfa a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-1)-methyl

|

a-D-Galp-(1-2)+

67 1mfd a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-1)-methyl

|

a-D-Galp-(1-2)+

68 1mfc a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-4)-a-L-Rhap

|

a-D-Galp-(1-2)+

69 1mfb a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp-(1-4)-a-L-Rhap

|

a-D-Galp-(1-2)-a-D-Manp-(1-4)-a-L-Rhap-(1-3)-a-D-Galp-(1-2)+

70 1mfe a-D-3-deoxy-Fucp-(1-3)+

|

a-D-Manp

|

a-D-Galp-(1-2)+

71 1s3k a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

a-D-GlcpNAc

|

a-L-Fucp-(1-3)+

72 1uz8 b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-methyl

|

a-L-Fucp-(1-3)+

73 1m7d a-L-Rhap-(1-3)-a-L-2,6-deoxy-Glcp-(1-3)-b-D-GlcpNAc-(1-1)-methyl

74 1m7i a-L-Rhap-(1-2)-a-L-Rhap-(1-3)-a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-1)-methyl

75 3bz4 a-D-Glcp-(1-4)+

|

a-D-Glcp-(1-4)+ a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-1)-methyl

| |

a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

|

a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

76 3c6s a-D-Glcp-(1-4)+

|

a-D-Glcp-(1-4)+ a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-L-1-deoxy-Rhap

| |

a-L-Rhap-(1-3)-b-D-GlcpNAc-(1-2)-a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

|

a-L-Rhap-(1-2)-a-L-Rhap-(1-3)+

77 1cly a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-

|

a-L-Fucp-(1-3)+

78 1clz a-L-Fucp-(1-2)-b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-1)-

|

a-L-Fucp-(1-3)+

79 1op3 a-D-Manp-(1-2)-a-D-Manp

80 2eqd b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

Systems which failed the positive control test.

1 1hlc b-D-Galp-(1-4)-b-D-Glcp

2 2dur a-D-Manp-(1-2)-a-D-Manp

3 3a0e a-D-Manp-(1-3)-a-D-Manp

4 2zx4 a-D-Galp-(1-4)-b-D-Galp-(1-4)-b-D-Glcp

5 1sl4 a-D-Manp-(1-6)+

|

a-D-Manp-(1-6)-a-D-Manp

|

a-D-Manp-(1-3)+

6 2zl6 a-L-Fucp-(1-2)-b-D-Galp-(1-3)-b-D-GlcpNAc-(1-3)-b-D-Galp-(1-4)-b-D-Glcp

7 1sl5 b-D-Galp-(1-4)+

|

b-D-GlcpNAc-(1-3)-b-D-Galp

|

a-L-Fucp-(1-3)+

8 4afd b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp-(1-4)-b-D-Glcp

9 2j1v a-L-Fucp-(1-2)-b-D-Galp-(1-4)-b-D-GlcpNAc

10 2jcq b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc-(1-4)-b-D-GlcpA-(1-3)-b-D-GlcpNAc

11 2yg0 a-D-Galp-(1-6)-b-D-Manp

Supplementary Information Chapter 6

S6.1. PDB IDs of Lectin-Carbohydrate Systems used to test the CH/π interaction energy function.

Accurate Predictions Made? Yes No 1led 2auy 2gvy 1cxf 1veo 2bos 1eo5 1qos 3azs 2ovu 1hlc 2gou 2aai 1lte 2vuz 4gwi 1itc 4g1r 2zl6 2zx4 1gsl 1tei 1sl6 2pel 4g1s 1k9i 1gwl 2g7c 2vxj 1pj9 3o0w 1jpc 1hql 3o0x 1ulf 3pfz 1niv 1w8f VC 1|2 1mxd 2o2l 1sl4 3wg3 2dur 1vbp 1j8r 3lek 1sl5 2wdb 1zhs 3a0e

4gk9 2e6v 3ef2 1uh3 2zhk 1qot

2vco 2wt2 2zhm 3zwe

1veo 1cxf 1hlc 2gou 1led 1lte 2gvy 1niv 3azs 1qos 2vuz 1ulf 2aai 1tei 2pel 3lek 1gsl 1zhs 2vco 4gwi ADV 1pj9 2auy 1eo5 2vxj 3zwe 2bos 4gk9 2zx4 1mxd 2dur 1jpc 3wg3 2g7c 2e6v

1hql 2o2l 1k9i

3pfz 2ovu 1sl6 1gwl 3o0w 1w8f 1itc 3o0x 1sl5 1qot 4g1r 1sl4 1uh3 4g1s 1vbp 2wdb 3a0e

3ef2 2zl6

2wt2

2zhk

1j8r

2zhm

S6.2. CHI energy functions to score the ω glycosidic torsion angle in 1,6-linkages.

The distribution of ω glycosidic torsion angles for 1,6-linkages was collected using the GlyTorsion tool available at www.glycosciences.de.

Figure S6.2: The distribution of ω glycosidic angles from carbohydrate crystal structures with 1,6-linkages divided into two sets based on the position of attachment (equatorial/axial) of the O4 atom to the reducing sugar. [Histogram; x-axis: ω (°), from -180 to 160; y-axis: distribution of structures (%), from 0 to 35; series: Equatorial O4 and Axial O4.]

The dataset was divided into two sets according to whether the O4 atom of the reducing sugar is attached axially or equatorially to the plane of the ring. Three parabolas, one for each of the three possible rotamers, were joined to form the CHI energy equation for these linkages. The relative energies of the three minima were determined from the crystal structure data, using the Boltzmann distribution for two states. The equations thus obtained are as follows:

\[ E = k\,(x - \theta)^{2} + b, \qquad k = 0.0025, \]

where x is the ω torsion angle.

When O4 is equatorially attached to the plane of the carbohydrate ring:
if x ∈ [0, 120], θ = 60, b = 0.21;
if x ∈ (120, 240], θ = 180, b = 1.39;
if x ∈ (240, 360], θ = 300, b = 0.

When O4 is axially attached to the plane of the carbohydrate ring:
if x ∈ [0, 120], θ = 60, b = 0;
if x ∈ (120, 240], θ = 180, b = 0.3;
if x ∈ (240, 360], θ = 300, b = 1.0.
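The two steps described in this section, estimating the relative energy of a rotamer from crystal-structure populations and evaluating the joined parabolas, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this work: the two-state Boltzmann relation is taken in its standard form ΔE = -RT ln(N_i/N_ref), the temperature of 298.15 K is an assumption (none is given in the text), and the names `relative_energy`, `omega_energy` and `OMEGA_WELLS` are hypothetical. The θ and b values are those listed above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
K = 0.0025            # parabola force constant from the equation above

# (theta, b) for the wells covering [0, 120], (120, 240] and (240, 360] degrees.
OMEGA_WELLS = {
    "equatorial": [(60.0, 0.21), (180.0, 1.39), (300.0, 0.0)],
    "axial": [(60.0, 0.0), (180.0, 0.3), (300.0, 1.0)],
}


def relative_energy(pop_state, pop_reference, temperature_k=298.15):
    """Two-state Boltzmann estimate: dE = -R*T*ln(pop_state / pop_reference).

    The default temperature is an assumption, not a value from the text.
    """
    if pop_state <= 0 or pop_reference <= 0:
        raise ValueError("populations must be positive")
    return -R_KCAL * temperature_k * math.log(pop_state / pop_reference)


def omega_energy(omega_deg, o4_position):
    """Evaluate E = K * (x - theta)^2 + b for omega in [0, 360] degrees."""
    if not 0.0 <= omega_deg <= 360.0:
        raise ValueError("omega must lie in [0, 360] degrees")
    wells = OMEGA_WELLS[o4_position]
    if omega_deg <= 120.0:
        theta, b = wells[0]
    elif omega_deg <= 240.0:
        theta, b = wells[1]
    else:
        theta, b = wells[2]
    return K * (omega_deg - theta) ** 2 + b


if __name__ == "__main__":
    # Hypothetical populations (percent of structures) for two rotamers.
    print(f"dE = {relative_energy(10.0, 30.0):.2f} kcal/mol")
    for omega in (60.0, 180.0, 300.0):
        print(omega, omega_energy(omega, "equatorial"), omega_energy(omega, "axial"))
```

Angles reported on a -180° to 180° scale would need to be mapped into the 0° to 360° range before calling `omega_energy`.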

S6.3. Gridbox Centers of Test Systems

           1gsl      1veo      2g7c      2vuz      1tei      3azs      2zx4      4gk9
center_x   30.779    31.309    26.162    -10.236   37.595    25.555    39.534    5.942
center_y   14.986    13.767    14.95     -34.789   4.058     -0.181    20.789    2.899
center_z   32.023    40.95     4.176     3.215     -43.079   7.932     46.443    53.234

           1jpc      2gou      2o2l      2wt2      1ulf      1eo5      3a0e      3pfz
center_x   56.063    8.405     15.19     22.86     60.265    85.067    20.259    -22.802
center_y   45.229    -8.714    -55.302   6.174     3.52      60.767    -37.806   30.906
center_z   25.17     22.111    -15.243   12.353    5.723     44.647    3.05      -19.125

           1k9i      2gvy      2ovu      3ef2      1vbp      1j8r      3lek      1cxf
center_x   8.948     28.687    25.16     2.304     109.791   15.851    4.052     44.796
center_y   46.79     23.377    20.968    64.669    108.52    12.655    4.27      89.74
center_z   44.049    -19.395   21.031    3.633     127.115   62.124    19.329    47.379

           1niv      1gwl      2vxj      3wg3      1w8f      1qos      3o0w      1hlc
center_x   113.529   8.285     16.514    54.63     -2.792    -15.015   -18.2     11.686
center_y   52.413    -10.664   -30.606   -40.764   4.137     43.955    -18.233   32.302
center_z   138.995   27.185    115.835   -3.111    34.364    31.99     14.748    84.653

           1sl4      1pj9      2zhk      3zwe      2aai      1qot      3o0x      1hql
center_x   41.291    41.326    7.879     -5.218    31.715    77.957    31.2      23.809
center_y   28.944    85.533    42.544    11.166    41.265    8.421     11.371    8.536
center_z   16.404    46.799    7.094     -1.876    11.239    29.285    -16.511   12.031

           1sl5      1uh3      2zhm      4g1r      2bos      1zhs      4gwi      1led
center_x   29.388    37.554    -5.21     -1.592    14.64     13.899    -3.557    30.923
center_y   -7.683    25.783    -5.945    -7.611    3.347     18.066    -3.798    14.723
center_z   -4.434    49.609    14.265    11.645    56.284    -12.694   -20.05    31.526

           1sl6      2wdb      2zl6      4g1s      2dur      2pel      1itc      1lte
center_x   127.995   40.041    25.33     10.35     114.805   56.016    16.854    18.882
center_y   48.794    21.235    40.009    -12.03    16.861    27.792    -18.317   -2.108
center_z   38.953    14.007    -18.489   7.428     48.688    64.361    -12.045   36.227

           2e6v      2vco      1mxd      2auy
center_x   36.602    51.993    24.973    3.166
center_y   9.703     2.279     31.414    43.303
center_z   39.651    -20.137   1.73      25.044
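The `center_x`, `center_y` and `center_z` labels match the search-box keys of an AutoDock Vina configuration file. The Python sketch below shows how the tabulated centers might be turned into the box portion of such a file. It is illustrative only: the helper `vina_box_config` is hypothetical, only two systems are copied in, and the box size of 20 Å per side is a placeholder assumption because this section lists the centers only.

```python
# Grid-box centers copied from the table in this section (two systems shown).
GRID_CENTERS = {
    "1gsl": (30.779, 14.986, 32.023),
    "1veo": (31.309, 13.767, 40.95),
}


def vina_box_config(pdb_id, size=(20.0, 20.0, 20.0)):
    """Return the search-box lines of an AutoDock Vina config file.

    The default box size is a hypothetical placeholder, not a value
    taken from this work.
    """
    center = GRID_CENTERS[pdb_id.lower()]
    lines = [f"center_{axis} = {value}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value}" for axis, value in zip("xyz", size)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(vina_box_config("1gsl"))
```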
