
Structural insights into host cell adhesion by MIC4

by

Ben Cowper

Imperial College London Department of Life Sciences

Submitted (2012) in partial fulfilment of the requirements for the degree of Doctorate of Philosophy

Declaration

I hereby declare that the research described in this thesis is the result of my own work, except where indicated in the text. Information derived from the published and unpublished work of others has been acknowledged and a list of references is provided.

Ben Cowper


Abstract

Toxoplasma gondii is a highly pervasive protozoan parasite, capable of infecting almost all mammals, including humans. Infection can have fatal consequences for immuno-suppressed individuals, whilst transmission to a developing fetus can induce a spontaneous abortion in pregnant females. T. gondii is perceived to be a model organism in the study of its phylum, Apicomplexa, which includes the Plasmodium species which cause malaria. Apicomplexans are obligate intracellular parasites, in which a strong host cell attachment is established through surface microneme proteins (MICs). Numerous MICs have been identified in T. gondii, many forming multi-adhesive complexes, such as TgMIC1/4/6, within which TgMIC4 is known to possess host cell binding activity within its C-terminal apple-5 & 6 (A56) domains.

Prior to these studies, recombinant TgMIC4-A56 was produced, yielding a partially-folded protein capable of binding to galactose. Herein the folded component has been identified as A5, and its solution structure has been solved via NMR spectroscopy. Carbohydrate microarray experiments have confirmed an ability to bind galactosyl-terminated oligosaccharides, whilst NMR and ITC experiments have enabled extensive characterisation of TgMIC4-A5 binding to a range of ligands. The domain binds particularly strongly to a pentasaccharide fragment from GM1 ganglioside, a potential in vivo receptor. Additionally, chemical shift perturbation data and intermolecular NOEs have been used to drive molecular docking of TgMIC4-A5 and lacto-N-biose (Galβ1→3GlcNAc). The resulting structure suggests that the mechanism of galactose discrimination by TgMIC4-A5 is similar to that of other galactose-specific lectins.

Combined with collaborating studies, this work aids our overall understanding of TgMIC4 function and encourages speculation as to the precise roles of the protein within T. gondii. In addition to a probable contributory role in the initial stages of host cell invasion, the protein may modulate events downstream of this process, through interactions with galactosylated receptors.


Table of Contents

List of figures
List of tables
Acknowledgements

Chapter 1: Biological Introduction
1.1. An introduction to Toxoplasma gondii
1.1.1. Identification and classification
1.1.2. Life cycle
1.2. Toxoplasmosis
1.2.1. Disease transmission and consequences
1.2.2. Disease prevalence and prevention
1.3. Apicomplexan host cell invasion
1.3.1. Cell ultrastructure and morphology
1.3.2. The glideosome; a unique system for motility
1.3.3. The moving junction
1.3.4. The model of Apicomplexan host cell invasion
1.3.5. Targeting host cell invasion
1.4. Apicomplexan microneme proteins
1.4.1. Oligomeric microneme protein complexes
1.4.2. Apple/PAN domains
1.4.3. The TgMIC1/4/6 complex
1.5. Project aims & objectives

Chapter 2: An Introduction to Nuclear Magnetic Resonance (NMR) Spectroscopy
2.1. Introduction
2.2. The origins of NMR signals
2.2.1. Nuclear spin angular momentum and magnetism
2.2.2. Larmor precession and resonance
2.3. The excitation of NMR signals
2.3.1. The vector model
2.3.2. The rotating frame
2.3.3. The effect of a pulse
2.3.4. Chemical shifts
2.3.5. Relaxation
2.3.6. NMR signal detection
2.3.7. Putting it together; pulse-acquire NMR
2.4. Nuclear coupling
2.4.1. Scalar coupling (and decoupling)
2.4.2. Dipolar coupling
2.5. The nuclear Overhauser effect (NOE)
2.6. Multi-dimensional NMR
2.6.1. Two-dimensional techniques
2.6.2. Three-dimensional NMR
2.7. Summary

Chapter 3: Identification and production of a stable, adhesive fragment from TgMIC4-A56
3.1. Introduction
3.2. Considerations for MIC protein production
3.2.1. pET-32 Xa/LIC vector
3.2.2. Origami cells
3.3. Materials & methods
3.3.1. Sequence analysis
3.3.2. Gene cloning of TgMIC4-A5 and TgMIC4-A6
3.3.3. Protein expression
3.3.4. Protein purification
3.3.5. NMR analysis
3.3.6. Identification of the folded fragment of TgMIC4-A56
3.4. Results
3.4.1. Sequence analysis
3.4.2. Identification of the stable adhesive fragment within TgMIC4-A56
3.4.3. Initial characterisation of recombinant TgMIC4-A5 and TgMIC4-A6
3.5. Discussion & concluding remarks

Chapter 4: Solution structure determination of TgMIC4-A5
4.1. Introduction
4.2. Experimental requirements for solution structure determination
4.3. Materials & methods
4.3.1. Experimental set-up and data acquisition
4.3.2. Chemical shift assignment
4.3.4. Structure calculation
4.4. Results
4.4.1. Chemical shift assignment
4.4.2. Solution structure calculation of TgMIC4-A5
4.4.3. Solution structure validation
4.5. Discussion
4.5.1. The α1-β3-β4 hairpin loop
4.5.2. Ring current shifts
4.5.3. Homology with other apple/PAN domains
4.5.4. The N-/C-terminal disulphide linkage
4.5.5. Concluding remarks

Chapter 5: Preliminary studies of carbohydrate-binding by TgMIC4-A5
5.1. Introduction
5.2. Prior collaborative studies; carbohydrate microarray analysis
5.2.1. Carbohydrate microarray data for TgMIC4-A56
5.2.2. Carbohydrate microarray data for TgMIC4-A5
5.3. Materials & methods
5.3.1. NMR titration experiments
5.3.2. Determination of dissociation constants
5.4. Results
5.4.1. NMR titration data
5.4.2. Determination of the ligand-binding interface of TgMIC4-A5
5.4.3. Determination of dissociation constants
5.4.4. Co-operative GM1-penta binding by TgMIC4-A5 and TgMIC1-NT
5.5. Discussion
5.5.1. Comparison of NMR and microarray data
5.5.2. Sequence & structure comparison with other apple domains
5.5.3. Concluding remarks

Chapter 6: Calculation of a TgMIC4-A5/lacto-N-biose complex structure
6.1. Introduction
6.2. Considerations for structure determination of protein/carbohydrate complexes
6.2.1. X-ray crystallography
6.2.2. NMR spectroscopy
6.2.3. Application to the study of TgMIC4-A5/carbohydrate complexes
6.3. Materials & Methods
6.3.1. Intermolecular NOE detection and assignment
6.3.2. Structure calculation of a TgMIC4-A5/lacto-N-biose model
6.3.3. Mutagenesis studies
6.4. Results
6.4.1. The 13C-edited NOESY spectrum of TgMIC4-A5/lacto-N-biose
6.4.2. TgMIC4-A5 aromatic chemical shift re-assignment
6.4.3. Lacto-N-biose 1H chemical shift assignment
6.4.4. Completed iNOE assignments
6.4.5. TgMIC4-A5/lacto-N-biose structure calculation using HADDOCK
6.4.6. Mutagenesis studies
6.5. Discussion
6.5.1. A possible mechanism of galactose binding and discrimination by TgMIC4-A5
6.5.2. The contribution of saccharide units preceding galactose
6.5.3. Discrimination of β1,3 and β1,4-linked Gal/GlcNAc
6.5.4. Comparison with the crystal structure of a SML-2/galactose complex
6.5.5. Concluding remarks

Chapter 7: Conclusions & future perspectives
7.1. Summarising the biological problem and project aims
7.2. Summarising the findings of these studies
7.3. The in vivo role of TgMIC4
7.3.1. Binding to ganglioside receptors
7.3.2. Contribution to host-cell invasion
7.3.3. A possible modulator of downstream events
7.4. Additional studies of TgMIC4 and the TgMIC1/4/6 complex
7.4.1. Structural and functional analysis of TgMIC4-A12
7.4.2. Domain organisation of TgMIC4
7.4.3. The architecture of the TgMIC1/4/6 complex
7.4.4. The role of TgMIC4-A3
7.4.5. Implications for the model of the TgMIC1/4/6 complex
7.5. Future perspectives

Appendices
Table of References

List of figures

Figure 1.1. the life cycle of Toxoplasma gondii
Figure 1.2: the ultrastructure of a typical Toxoplasma gondii tachyzoite
Figure 1.3: the T. gondii pellicle and glideosome
Figure 1.4: the mode of host cell invasion by T. gondii
Figure 1.5: domain structures of selected apicomplexan microneme proteins
Figure 1.6: MIC protein complexes in T. gondii
Figure 1.7: a sequence alignment of thirty-five known apple/PAN domain sequences
Figure 1.8: structural analysis of apple/PAN domains
Figure 1.9: ligand-binding by apple/PAN domains
Figure 1.10: atomic structures of TgMIC1
Figure 1.11: atomic structures of TgMIC6-EGF2
Figure 1.12: TgMIC4 domain organisation and structure
Figure 1.13: the current model of the TgMIC1/4/6 complex

Figure 2.1: the effect of an external magnetic field upon a spin-½ nucleus
Figure 2.2: the vector model of NMR
Figure 2.3: the effective field, Beff
Figure 2.4: chemical shifts in the rotating frame
Figure 2.5: spin relaxation in the transverse plane
Figure 2.6: time and frequency domains
Figure 2.7: a schematic depiction of a 1D NMR experiment
Figure 2.8: the origin of the NOE
Figure 2.9: generating a second dimension in a 2D NMR experiment
Figure 2.10: the pulse sequence of a basic 2D homonuclear 1H-1H TOCSY experiment
Figure 2.11: the pulse sequence of a basic 2D NOESY experiment
Figure 2.12: the pulse sequence of a HSQC experiment
Figure 2.13: Three-dimensional NMR spectroscopy
Figure 2.14: magnetisation transfer during triple-resonance NMR experiments

Figure 3.1: the pET-32 Xa/LIC vector map
Figure 3.2: the features of a pET-32 Xa/LIC vector expression product
Figure 3.3: the ligation-independent cloning (LIC) strategy
Figure 3.4: secondary structure prediction of TgMIC4 residues 410-580
Figure 3.5: SDS-PAGE analysis of TgMIC4-A56/trx expression
Figure 3.6: SDS-PAGE analysis of TgMIC4-A56 purification
Figure 3.7: NMR analysis of TgMIC4-A56
Figure 3.8: SDS-PAGE analysis of the trypsin digestion reaction
Figure 3.9: the gel filtration profile of trypsin-digested TgMIC4-A56
Figure 3.10: peptide mass fingerprinting of trypsin-digested TgMIC4-A56
Figure 3.11: The 1H-15N HSQC spectrum of trypsin-digested TgMIC4-A56
Figure 3.12: demonstrating the galactose-binding ability of digested TgMIC4-A56
Figure 3.13: sub-cloning of TgMIC4-A5 and TgMIC4-A6
Figure 3.14: SDS-PAGE analysis of TgMIC4-A5/trx & TgMIC4-A6/trx expression
Figure 3.15: purification of TgMIC4-A5
Figure 3.16: digestion of TgMIC4-A5/trx fusion protein with TEV protease
Figure 3.17: purification of TgMIC4-A6
Figure 3.18: NMR analysis of TgMIC4-A5
Figure 3.19: NMR analysis of TgMIC4-A6
Figure 3.20: Comparison of TgMIC4-A5 and TgMIC4-A56

Figure 4.1: backbone amide chemical shift assignments for TgMIC4-A5
Figure 4.2: strips from the CBCA(CO)NH and HNCACB spectra of TgMIC4-A5
Figure 4.3: the chemical shift index (CSI) of TgMIC4-A5
Figure 4.4: aliphatic 1H side-chain assignment using H(C)CH-TOCSY data
Figure 4.5: assignment of aromatic side-chain nuclei
Figure 4.6: side-chain amide assignment using 15N-NOESY-HSQC data
Figure 4.7: the solution structure of TgMIC4-A5
Figure 4.8: the effects of invoking the C10-C79 disulphide linkage
Figure 4.9: RMSD and NOE-derived restraint per residue analysis
Figure 4.10: Ramachandran plots of backbone torsion angles (φ/ψ) from the TgMIC4-A5 structure ensemble
Figure 4.11: Ramachandran plots of backbone torsion angles (φ/ψ) from the average energy-minimised TgMIC4-A5 structure
Figure 4.12: the configuration of the L32 side-chain
Figure 4.13: χ1 vs χ2 plots for a selection of TgMIC4-A5 residues
Figure 4.14: the α1-β3-β4 hairpin loop of TgMIC4-A5
Figure 4.15: ring current shifted nuclei in TgMIC4-A5

Figure 5.1: carbohydrate microarray analysis of TgMIC4-A5
Figure 5.2: the effect of exchange on NMR signals
Figure 5.3: NMR titration of TgMIC4-A5 with galactose
Figure 5.4: NMR titration of TgMIC4-A5 with lacto-N-biose
Figure 5.5: NMR titration of TgMIC4-A5 with GM1-penta
Figure 5.6: Comparison of TgMIC4-A5 bound to six different ligands
Figure 5.7: The assigned 1H-15N HSQC spectrum of galactose-bound TgMIC4-A5
Figure 5.8: ΔδH vs ΔδN for galactose-induced chemical shift perturbations of TgMIC4-A5
Figure 5.9: ligand-induced chemical shift perturbations in TgMIC4-A5
Figure 5.10: the ligand-binding surface of TgMIC4-A5
Figure 5.11: binding isotherms for the TgMIC4-A5/galactose interaction
Figure 5.12: a binding isotherm for the TgMIC4-A5/lactose interaction
Figure 5.13: a binding isotherm for the TgMIC4-A5/N-acetyl-D-lactosamine interaction
Figure 5.14: isothermal titration calorimetry (ITC) data for the TgMIC4-A5/lacto-N-biose interaction
Figure 5.15: TgMIC1-NT purification and NMR analysis
Figure 5.16: Non-synergistic GM1-penta binding by TgMIC1-NT and TgMIC4-A5
Figure 5.17: structural comparison of A1 and A5

Figure 6.1: the 13C-edited NOESY-HSQC of TgMIC4-A5/lacto-N-biose
Figure 6.2: Assignment of ligand-bound TgMIC4-A5 aromatic nuclei
Figure 6.3: lacto-N-biose bound state vs unbound state
Figure 6.4: the molecular structure of lacto-N-biose
Figure 6.5: the 1H NMR spectrum of lacto-N-biose
Figure 6.6: the 1H-1H TOCSY spectrum of lacto-N-biose
Figure 6.7: the effect of varying TOCSY spin-lock duration
Figure 6.8: the 1H-1H COSY spectrum of lacto-N-biose
Figure 6.9: Gal-H5 and H6 assignment
Figure 6.10: the TgMIC4-A5/lacto-N-biose structure ensemble
Figure 6.11: TgMIC4-A5/lacto-N-biose structure agreement with experimental restraints
Figure 6.12: analysis of lacto-N-biose glycosidic torsion angles
Figure 6.13: expression, purification and NMR analysis of 15N-TgMIC4-A5K19A
Figure 6.14: NMR titration of TgMIC4-A5K19A and lacto-N-biose
Figure 6.15: expression, purification and NMR analysis of 15N-TgMIC4-A5K60M
Figure 6.16: NMR titration of TgMIC4-A5K60M and lacto-N-biose
Figure 6.17: expression, purification and NMR analysis of TgMIC4-A5Y69L
Figure 6.18: NMR titration of TgMIC4-A5Y69L and lacto-N-biose
Figure 6.19: a potential mechanism of galactose recognition by TgMIC4-A5
Figure 6.20: the electrostatic surface of ligand-docked TgMIC4-A5
Figure 6.21: Alignment of N-acetyl-D-lactosamine and docked lacto-N-biose
Figure 6.22: structural comparison of galactose-binding by SML-2 and TgMIC4-A5

Figure 7.1: the crystal structures of pathogenic proteins in complex with GM1-penta
Figure 7.2: the solution structure of TgMIC4-A12
Figure 7.3: the hydrophobic domain interface of TgMIC4-A12
Figure 7.4: the crystal structure of factor XI
Figure 7.5: gel electrophoresis analysis of the native purified TgMIC1/4/6 complex
Figure 7.6: NMR and MALDI-TOF spectra demonstrating the polydispersity of recombinantly-produced TgMIC4-A3
Figure 7.7: the updated model of the TgMIC1/4/6

Figure A5: MALDI-TOF mass spectrometry of digested TgMIC4-A56
Figure A9: molecular structures of titrated apple-5 ligands
Figure A11.1: NMR titration of TgMIC4-A5 with lactose
Figure A11.2: NMR titration of TgMIC4-A5 with N-acetyl-D-lactosamine
Figure A11.3: NMR titration of TgMIC4-A5 with galacto-N-biose
Figure A11.4: NMR titration of TgMIC4-A5 with 3’sialyl-N-acetyl-D-lactosamine

List of tables

Table 3.1: acquisition parameters for the NMR spectra recorded for assessment of folded states and lectin-activities
Table 3.2: the estimated physical and chemical properties of TgMIC4-A56

Table 4.1: acquisition parameters for the NMR spectra recorded for the solution structure determination of TgMIC4-A5
Table 4.2: atypical chemical shifts in TgMIC4-A5
Table 4.3: a summary of the experimental restraints used for the structure calculation
Table 4.4: a summary of TgMIC4-A5 structure statistics
Table 4.5: comparison of Ramachandran plot statistics derived using MolProbity and PROCHECK
Table 4.6: structural homology of TgMIC4-A5 with other known apple-PAN domain structures

Table 5.1: summarising the arrayed probes to which TgMIC4-A5 exhibits highest affinity
Table 5.2: detailing the NMR experiments recorded during the dual titration of MIC1-NT and MIC4-apple5 with GM1-penta
Table 5.3: summarising the titrations of TgMIC4-A5
Table 5.4: TgMIC4-A5/ligand dissociation constants

Table 6.1: acquisition parameters for the NMR spectra recorded for chemical shift assignment of lacto-N-biose
Table 6.2: TgMIC4-A5/lacto-N-biose intramolecular NOEs
Table 6.3: chemical shifts of lacto-N-biose-bound TgMIC4-A5 tyrosine aromatic resonances
Table 6.4: 1H chemical shift assignments of lacto-N-biose
Table 6.5: intramolecular NOE assignments
Table 6.6: a summary of statistics for the TgMIC4-A5/lacto-N-biose structure ensemble

Table A1.1. PCR primers
Table A1.2. PCR reaction mixtures
Table A1.3. PCR protocols
Table A2.3. Ni2+-affinity chromatography buffers
Table A4.1. PCR primers for site-directed mutagenesis
Table A4.2. PCR reaction mixture for site-directed mutagenesis
Table A4.3. PCR reaction protocol for site-directed mutagenesis
Table A7: carbohydrate microarray data for TgMIC4-A56
Table A8: carbohydrate microarray data for TgMIC4-A5

Acknowledgements

First and foremost I would like to thank my supervisor, Steve Matthews, for giving me the opportunity to carry out this work, and for always being available to provide advice, support and guidance; it is much appreciated. I would also like to thank Ernesto Cota for his valuable advice over the years, and also for his role in enabling me to join the group in the first place.

I owe a great deal to Pete Simpson, whose help, support and unwavering patience in setting up, processing and analysing NMR experiments and structure calculations has proved to be truly invaluable. He has also provided excellent advice in various other areas during my PhD, and his contribution to the proof-reading of this thesis is much appreciated.

I would like to thank all members of the Matthews/Cota/Isaacson (and recently Hare/De Simone!) groups, past and present, who have contributed to making the last four years so enjoyable. I’d like to further thank the various members of the “Toxo” group for useful and insightful discussions, in particular Jan Marchant, who deserves further acknowledgement for his creation of the various NMRView scripts which make chemical shift assignment and structure analysis more simple and accessible. I would also like to thank James Garnett, for valuable advice in the lab over the years, Jon Taylor, for lab advice and proof-reading, and Stephen Hare, for proof-reading.

Finally I would like to thank my family; Mum, Dad and Beth, for their constant support both during this period and beyond. And, of course, I would like to thank my girlfriend Sharon, whose love, support, advice, and not to mention curiosity about the inner workings of Structural Biology (!) has meant very much to me.


List of abbreviations

spp. – Species
CSP – Circumsporozoite protein
IFNγ – Interferon gamma
AIDS – Acquired Immunodeficiency Syndrome
MIC – Microneme protein
Tg – Toxoplasma gondii
Et – Eimeria tenella
Nc – Neospora caninum
Pf – Plasmodium falciparum
Pv – Plasmodium vivax
Ho – Haementeria officinalis
Hm – Hirudo medicinalis
DNA – Deoxyribonucleic acid
RNA – Ribonucleic acid
RON – Rhoptry neck protein
ROP – Rhoptry bulb protein
TEM – Transmission electron microscopy
GRA – Dense granule protein
IMC – Inner membrane complex
GAP – Glideosome-associated protein
ER – Endoplasmic reticulum
IMP – Inner membrane protein
ATP – Adenosine triphosphate
ROM – Rhomboid protease
MJ – Moving junction
PV – Parasitophorous vacuole
TRAP – Thrombospondin-related anonymous protein
SML-2 – Sarcocystis muris lectin
TSP – Thrombospondin
EGF – Epidermal growth factor
vWA – von Willebrand A
PAN – Plasminogen/apple/nematode
MAR – Microneme adhesive repeat
HFF – Human foreskin fibroblasts
FXI – Factor XI
PK – Prekallikrein
HGF – Hepatocyte growth factor
LAPP – Leech anti-platelet protein
A1, 2, 3, etc – Apple domain 1, 2, 3, etc
PDB – Protein data bank
NMR – Nuclear magnetic resonance
NT – N-terminus
CT – C-terminus
EST – Expressed sequence tag
FID – Free induction decay
FT – Fourier transform
RDC – Residual dipolar coupling
NOE – Nuclear Overhauser effect
HSQC – Heteronuclear single quantum coherence
COSY – Correlation spectroscopy
TOCSY – Total correlation spectroscopy
NOESY – Nuclear Overhauser effect spectroscopy
INEPT – Insensitive nuclei enhanced via polarisation transfer
DEPT – Distortionless enhancement via polarisation transfer
LIC – Ligation-independent cloning
Trx – Thioredoxin
PCR – Polymerase chain reaction
DTT – Dithiothreitol
EDTA – Ethylenediaminetetraacetic acid
dGTP – Deoxyguanosine triphosphate
SOC – Super optimal broth with catabolite repression
LB – Lysogeny broth
RPM – Revolutions per minute
SDS-PAGE – Sodium dodecyl sulphate polyacrylamide gel electrophoresis
IPTG – Isopropyl-β-D-1-thiogalactopyranoside
OD – Optical density
CV – Column volume
FXa – Factor Xa
TEV – Tobacco etch virus
Tris – Tris(hydroxymethyl)aminomethane
FPLC – Fast protein liquid chromatography
MWCO – Molecular weight cut-off
MALDI-TOF – Matrix-assisted laser desorption/ionisation time-of-flight
ADR – Ambiguous distance restraint
ARIA – Ambiguous restraints for iterative assignment
CNS – Crystallography and NMR system
TALOS – Torsion angle likelihood obtained from shifts and sequence similarity
TAD – Torsion angle dynamics
SA – Simulated annealing
CSI – Chemical shift index
BMRB – BioMagResBank
PSVS – Protein structure validation suite
RMSD – Root mean square deviation
NGL – Neoglycolipid
Lac – Lactose
LacNAc – N-acetyllactosamine
Gal – Galactose
GalNAc – N-acetylgalactosamine
Glc – Glucose
GlcNAc – N-acetylglucosamine
NeuAc – N-acetylneuraminic acid
NeuGc – N-glycolylneuraminic acid
Fuc – Fucose
SU – Sulphate
Cer – Ceramide
AO – Aminooxy
Man – Mannose
ITC – Isothermal titration calorimetry
iNOE – Intermolecular NOE
TrNOE – Transferred NOE
STD – Saturation transfer difference
PRE – Paramagnetic relaxation enhancement
PCS – Pseudo-contact shift
HADDOCK – High ambiguity driven biomolecular docking
AIR – Ambiguous interaction restraint
SAXS – Small angle X-ray scattering

Chapter 1: Biological Introduction


1.1. An introduction to Toxoplasma gondii

Toxoplasma gondii is a widespread infectious protozoan parasite. It is unusual in its ability to invade any nucleated cell type and in its broad host range, which includes virtually all warm-blooded animals (Carruthers 2002). Unlike many pathogens, T. gondii is globally prevalent, and though infection rates vary between regions it is thought that up to a third of the world’s human population may carry the infection (Montoya & Liesenfeld 2004). The remarkable promiscuity and profligacy of the parasite make it an interesting and significant subject for scientific research. This chapter aims to provide a comprehensive literature review of T. gondii biology and thereby a suitable backdrop to the research presented thereafter.

1.1.1. Identification and classification

T. gondii was discovered in 1908 by French bacteriologists Nicolle & Manceaux in the tissues of the North African rodent Ctenodactylus gundi (Nicolle & Manceaux 1908). The newly identified genus was named according to the observed morphology of the parasite (toxo = arc, plasma = life) and the species name borrowed from the host organism (Nicolle & Manceaux 1909).

The classification of the parasite was uncertain for the next 60 years, before electron microscopy revealed a cell ultrastructure akin to that of the apicomplexan (coccidian) Eimeria spp. Shortly afterwards, the coccidian nature of the parasite was confirmed via the full characterisation of a heteroxenous life cycle (reviewed in Tenter et al. 2000). The full taxonomic classification of T. gondii is as follows:

Domain: Eukaryota
Kingdom: Chromalveolata
Superphylum: Alveolata
Phylum: Apicomplexa
Class: Conoidasida
Sub-class: Coccidiasina
Order: Eucoccidiorida
Sub-order: Eimeriorina
Family: Sarcocystidae
Sub-family: Toxoplasmatinae
Genus: Toxoplasma
Species: gondii


Since the 1970s several species originally assigned to the Toxoplasma genus have been re-classified. As such, T. gondii is the only widely-accepted member of the genus (Tenter et al. 2000).

1.1.1.1. The phylum Apicomplexa

The Apicomplexa are a large group (approximately 4,500 members) of related protozoan organisms. The phylum name derives from the conserved presence of a so-called ‘apical complex’, a network of cytoskeletal structures at the cell anterior. Apicomplexans are divisible into two classes, Aconoidasida and Conoidasida, distinguishable by the absence or presence of a conoid, an anterior cone composed of protein filaments. There are several notable apicomplexan parasites aside from T. gondii, which cause infections of significant medical and economic relevance. These include fellow coccidians Eimeria (causing poultry coccidiosis), (causing cattle ) and Neospora (causing cattle/canine coccidiosis) spp. Perhaps most notably, the aconoidasidan Plasmodium spp. cause malaria, which is responsible for approximately 1 million deaths worldwide per year (World Health Organisation).

1.1.1.2. T. gondii: the model Apicomplexan

T. gondii has recently emerged as a model system for the study of the Apicomplexa, due to its good experimental tractability. Unlike its relatives, T. gondii is readily amenable to genetic manipulation, with established protocols for forward and reverse genetics and a high efficiency of transfection (Kim & Weiss 2004). This enables the routine production of tagged proteins (e.g. with GFP or c-myc) for the study of their cell biology, and heterologous expression of other apicomplexan proteins, such as Plasmodium knowlesi circumsporozoite protein (CSP) (Di Cristina et al. 1999). Additionally T. gondii has an unusually distinct intracellular morphology, facilitating biological analysis via microscopy (Roos et al. 1999; Kim & Weiss 2007).

1.1.2. Life cycle

T. gondii is a polyxenous parasite, associating with a multitude of hosts during its life cycle. The life cycle (depicted in figure 1.1) consists of distinct sexual and asexual cycles, correlating with infection of definitive hosts (the members of the Felidae family) and intermediate hosts (other mammals/birds) respectively. The two cycles are linked but can occur independently, which is thought to be a major contributing factor to the overall success of the parasite (Black & Boothroyd 2000). A key aspect of T. gondii biology which underlies both life cycle stages is its ability to form “cysts” within infected host tissues: intracellular vacuoles containing variable numbers of bradyzoic (slow-replicating) parasites (Dubey et al. 1998). Encysted parasites can survive in a dormant state for long periods of time and are transmissible between hosts.

1.1.2.1. The sexual life cycle

The sexual life cycle occurs within the gastrointestinal tract of a felid. It is initiated upon ingestion of a Toxoplasma cyst, which passes through the stomach before proteolytic digestion of the cyst wall in the small intestine. Liberated bradyzoites invade the intestinal epithelium where they undergo several stages of asexual development, before gametogenesis and oocyst formation. Rupture of infected epithelial cells releases oocysts into the intestinal lumen, which are shed in the faeces (Dubey et al. 1998). This usually occurs 3-10 days following ingestion of the tissue cyst. The oocyst is a durable spore, capable of long-term survival outside of a host organism. Approximately 1-5 days following excretion the oocyst undergoes sporulation, producing four infectious sporozoites. Ingestion of a sporulated oocyst by a felid leads to re-initiation of the sexual cycle (Dubey et al. 1998).

1.1.2.2. The asexual life cycle

The asexual T. gondii life cycle is initiated upon the ingestion by an intermediate host of a sporulated oocyst or a tissue cyst. These are digested in the gastrointestinal tract of the host, releasing infectious sporozoites or bradyzoites, which penetrate the intestinal epithelium and differentiate into tachyzoites (the fast-replicating form of the parasite). This marks the onset of an acute infection. Tachyzoites have an in vitro replication cycle of 6-8 hours and typically form intracellular populations of 64-128 parasites before exit and infection of neighbouring cells (Black & Boothroyd 2000). The parasite can therefore rapidly establish a disseminated infection, aided by its broad host cell range and the ability to cross the blood-brain barrier and penetrate the central nervous system (Carruthers 2002).
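
The replication figures quoted above imply a rough timescale for one round of intracellular growth. The following is a minimal sketch of that arithmetic, assuming (these assumptions are illustrative and are not stated in the cited source) that each vacuole is founded by a single parasite, that every division is a synchronous binary doubling, and that the in vitro cycle time of 6-8 hours applies uniformly. The function names are hypothetical.

```python
def doublings_needed(final_count: int) -> int:
    """Count the binary doublings needed to reach final_count from one parasite.

    Assumption (illustrative): a single founding parasite and synchronous
    binary division, so the population after n cycles is 2**n.
    """
    count, n = 1, 0
    while count < final_count:
        count *= 2
        n += 1
    return n


def time_window_hours(final_count: int, cycle_hours=(6, 8)):
    """Return the (shortest, longest) time in hours to reach final_count.

    cycle_hours is the quoted in vitro replication cycle range of 6-8 hours.
    """
    n = doublings_needed(final_count)
    return n * cycle_hours[0], n * cycle_hours[1]


if __name__ == "__main__":
    for population in (64, 128):
        low, high = time_window_hours(population)
        print(f"{population} parasites: {doublings_needed(population)} doublings, "
              f"{low}-{high} hours")
```

Under these assumptions a population of 64-128 parasites corresponds to six or seven doublings, i.e. roughly one and a half to two and a half days between invasion and exit.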

After 7-10 days tachyzoites begin to differentiate into bradyzoites, which subsequently form tissue cysts. These can develop throughout the body, but are particularly prevalent in neural and muscular tissues (e.g. brain/eyes) (Dubey et al. 1998). The appearance of cysts marks the onset of a chronic infection, during which the parasite can reside in its semi-dormant state for the lifetime of the host. Encysted parasites can reawaken upon host immuno-suppression and quickly re-establish an acute infection. Tissue cysts are also transmissible, and the ingestion of infected tissue by another host can re-initiate sexual or asexual proliferation.

Figure 1.1: the life cycle of Toxoplasma gondii. The sexual stage is initiated within the intestine of a definitive host (e.g. a domestic cat). Ingested parasites differentiate into gametes, which fuse to form oocysts which are then excreted. The cycle, from ingestion to excretion, takes 3-10 days. Ingestion of a maturing oocyst (e.g. via contaminated feed or a cat litter tray) by a definitive host re-initiates sexual reproduction. Conversely ingestion by an intermediate host initiates asexual proliferation, during which tachyzoites rapidly establish a disseminated acute infection. After 7-10 days tachyzoites begin to differentiate into bradyzoites, which form long-living tissue cysts. Encysted parasites reawaken following immuno-suppression of the host, re-initiating the asexual cycle. Cysts are also transmissible to definitive and intermediate hosts, thus re-initiating sexual and asexual proliferation. Diagram adapted from Black & Boothroyd 2000.

1.2. Toxoplasmosis

Infection with T. gondii gives rise to the disease toxoplasmosis. This section describes the various aspects of the disease, including transmission, clinical manifestations and prognosis.

1.2.1. Disease transmission and consequences

T. gondii infections are readily transmitted between host organisms, most commonly via the ingestion of encysted bradyzoites in undercooked meat or oocysts in contaminated water (Dubey 2004). Unlike most Apicomplexans, T. gondii is capable of transmission between non-human and human hosts (i.e. zoonosis) and thus human infection is widespread. The severity of ensuing disease varies depending largely on the immune capability of the host.

1.2.1.1. Immuno-competent individuals

T. gondii infection in humans with a fully functioning immune system is usually asymptomatic or induces mild flu-like symptoms (e.g. swollen lymph nodes). Symptoms are restricted by the host immune response to acute infection, during which antigen-specific T-cells secrete interferon-gamma (IFNγ) (Suzuki et al. 1989; Carruthers 2002). Studies in mice have revealed that IFNγ inhibits tachyzoite proliferation and induces the expression of bradyzoite-specific antigens (Suzuki et al. 1989; Bohne et al. 1993). Therefore the immune response appears to reduce the spread of infection by encouraging parasite differentiation into the slow-growing form. However the subsequent encystation of bradyzoites signals the beginning of a chronic (life-long) infection (Carruthers 2002). Therefore, rather than eradicating the parasite, the immune response indirectly serves to prolong parasite survival.

For many years it was believed that a latent chronic infection is inconsequential (provided the host maintains immuno-competence). However several recent studies have implicated chronic infection in behavioural alterations in later life. This began with the observation that toxoplasmic rodents display decreased neophobia (fear of novelty) and decreased aversion to predation (Holliman 1997). Interestingly these alterations increase the likelihood of transmission to the definitive host, thus completing the parasite’s life cycle. Subsequently there have been several reports linking toxoplasmosis with schizophrenia and other psychological disorders in humans (Torrey & Yolken 2003; Webster et al. 2006).

The mechanisms by which T. gondii could modify host behaviour are not well understood; however, studies using animal models have demonstrated that infection leads to altered levels of several neurotransmitters which are modified in schizophrenic persons, including dopamine (Torrey & Yolken 2003). A recent screen of the T. gondii genome identified two genes encoding tyrosine hydroxylase, the enzyme responsible for the production of the dopamine precursor L-DOPA (Gaskell et al. 2009).

1.2.1.2. Immuno-suppressed individuals

Acquisition of a T. gondii infection by an individual with immune dysfunction can lead to severe acute toxoplasmosis, with retinochoroiditis (inflammation of the retina/choroid) a common manifestation (Kim & Weiss 2007). Additionally immuno-suppression of an individual with an existing chronic infection can encourage the rupture of tissue cysts, releasing bradyzoites which, at low IFNγ levels, differentiate into tachyzoites and re-establish an acute infection (Carruthers 2002). The reawakening of a latent infection in the central nervous system, a common site of encystation, often causes encephalitis (acute inflammation of the brain). This can give rise to headaches, seizures and psychological problems, and can be fatal if untreated (Kim & Weiss 2007). This is of particular concern for patients with AIDS (a disease of the immune system) and for organ and bone marrow transplant recipients (who take immunosuppressant drugs in order to prevent rejection). T. gondii is one of the leading opportunistic pathogens associated with AIDS, responsible for an estimated 30% of AIDS-related deaths in Europe (Hill & Dubey 2002). Additionally, reawakened toxoplasmosis following bone marrow transplantation is rare but has a high mortality rate, at >90% (de Medeiros et al. 2001).

1.2.1.3. Pregnant women

Congenital toxoplasmosis occurs when a female acquires a T. gondii infection during the course of pregnancy. The parasite is capable of crossing the placental wall and thus can infect the developing foetus - a process called ‘vertical’ transmission (as opposed to the ‘horizontal’ transmission between neonates). Women who are infected prior to conception do not usually transmit the infection during gestation unless they undergo immuno-suppression (Kim & Weiss 2007). The rate of vertical transmission increases during the course of pregnancy; infection during the third trimester carries a 60-70% chance of foetal infection. Conversely the consequences of congenital toxoplasmosis are most severe if acquired prior to the onset of the third trimester. In such cases the newborn may suffer a variety of neurological problems, including retinochoroiditis, encephalitis, microcephaly (decreased head size) and epilepsy. Newborns infected during the third trimester are often asymptomatic at birth, although the aforementioned symptoms may develop in later life (Kim & Weiss 2007).

1.2.1.4. Wild and domestic animals

T. gondii is widespread amongst wild and domestic animals (Kim & Weiss 2007). This not only poses the risk of transmission to humans from pets (e.g. cats, dogs) but also has a significant impact upon the farming industry. A recent study in the Campania region of Italy revealed that >25% of sheep carry T. gondii (Fusco et al. 2007). Infection is a common cause of abortion in pregnant ewes (Kim & Weiss 2007).

1.2.2. Disease prevalence and prevention

T. gondii is a global parasite, with human infections having been detected on all six permanently-inhabited continents (Tenter et al. 2000). Toxoplasmosis is typically diagnosed via the Sabin-Feldman dye test, a serologic test which detects T. gondii-specific immunoglobulin G (IgG) antibodies. Individuals who have been exposed to the parasite produce such antibodies and thus give a positive test result (i.e. are ‘seropositive’) (Sabin & Feldman 1948). Seroprevalence and incidence rates vary widely between populations and hence prevention and control measures are applied on a region-by-region basis.

1.2.2.1. Epidemiology

The seroprevalence of toxoplasmosis between populations is highly variable - from 0 to 90% - and appears to depend upon age, climate and culture (Tenter et al. 2000). In the 1990s many central European countries (e.g. France, Germany) had estimated seroprevalences of 40-60% in women of child-bearing age. These figures are decreased in colder climates such as Scandinavia (11-28%) and increased in the warmer climes of South America and West Africa (55-75%) (Tenter et al. 2000). The seroprevalence in the UK stood at 22% as of 1992, whilst a more recent study in the USA reports a 9% seroprevalence in persons aged 12-49 from 1999-2004; a 5% decline on the previous decade (Jones et al. 2007). Declining seroprevalences have also been reported in regions of Greece (Diza et al. 2005), Switzerland (Bornand & Piguet 1991) and Sweden (Evengård et al. 2001), with the latter study also reporting a low incidence rate, at 0.05% (5 new infections per 10,000 participants). However a recent study conducted in the Legnano region of Italy reports a declining seroprevalence alongside a high incidence rate of 0.9% - one of the highest in Europe (De Paschale et al. 2008). Therefore trends in seroprevalence and incidence are applicable only to the test population, and are not necessarily representative of the wider population.

1.2.2.2. Drug treatment

Immuno-competent individuals do not usually require drug treatment for either acute or chronic stage toxoplasmosis. The asymptomatic nature of the infection coupled with the ineffectiveness of drugs against encysted bradyzoites precludes drug therapy. In rare cases where severe symptoms are observed during acute infection, the prescribed treatment usually involves a synergistic combination of pyrimethamine (Daraprim®) and sulfadiazine. Pyrimethamine is a common anti-malarial drug which inhibits the enzyme dihydrofolate reductase and thereby blocks the production of folic acid, a vitamin involved in nucleotide biosynthesis. Sulfadiazine is a type of sulfonamide - a class of antibiotic which also blocks folic acid biosynthesis, via inhibition of dihydropteroate synthetase (Kim & Weiss 2007).

Folic acid deficiency (leading to bone marrow suppression, causing anaemia) is a common side effect of this course of treatment, and hence the drugs are currently supplemented with folinic acid - a precursor to various folic acid derivatives which is not taken up by T. gondii (Kim & Weiss 2007). Unfortunately folinic acid has been shown to diminish the therapeutic effects of pyrimethamine against Plasmodium falciparum; however, the recent emergence of non-disruptive alternatives - such as 5-methyl-tetrahydrofolate - is encouraging for future regimens (Nduati et al. 2008).

1.2.2.2.1. Immuno-compromised individuals

Pyrimethamine has proved to be an effective treatment for cardiac transplant recipients receiving a heart from a seropositive donor (Kim & Weiss 2007). It is also effective in the treatment of toxoplasmic AIDS patients, in whom it is often complemented with an antibiotic inhibitor of protein synthesis. Following primary treatment AIDS patients are typically administered pyrimethamine and sulfadiazine for their remaining lifespan, in order to prevent relapse (Kim & Weiss 2007).

1.2.2.2.2. Pregnant women

Due to the risk of harm to the developing foetus, women who acquire a T. gondii infection during pregnancy are not usually treated with the standard pyrimethamine/sulfadiazine regimen. Instead they are administered spiramycin, an antibiotic inhibitor of protein synthesis which appears to reduce the risk of vertical transmission (Desmonts & Couvreur 1974). If foetal toxoplasmosis is diagnosed then spiramycin is often alternated with the standard regimen (Kim & Weiss 2007). Newborns with congenital toxoplasmosis are often asymptomatic until later life, and the optimum drug therapy in such cases is not yet clear. It is proposed that treatment from birth with pyrimethamine/sulfadiazine at low doses is beneficial (Kim & Weiss 2007).

1.2.2.3. Prevention of toxoplasmosis

Increased protection against T. gondii infection can be provided simply via the maintenance of good hygiene standards, particularly regarding the storage and preparation of meat products. Pregnant women are further advised to avoid cleaning cat litter trays and gardening, in order to minimise the risk of exposure to infectious oocysts (Kim & Weiss 2007).

1.2.2.3.1. Screening for toxoplasmosis

Despite the risk of harm to a developing foetus, very few countries routinely screen pregnant women for toxoplasmosis. In those that do - France, Italy and the USA, for example - the identification of a recently acquired infection is followed by treatment with spiramycin.

1.2.2.3.2. Vaccination: current and future prospects

Based upon what is known regarding the life cycle and modes of transmission of T. gondii, current vaccination strategies should include prevention of acute infection and congenital transmission, prevention of cyst formation (i.e. chronic infection) and prevention of oocyst shedding by cats (Innes & Vermeulen 2006). At present the only available vaccine against T. gondii consists of live tachyzoites, derived from an attenuated strain which lacks the capacity to form tissue cysts or oocysts (Buxton & Innes 1995). This is commercially available for veterinary use and has served to reduce abortion rates in sheep. However the vaccine is expensive, has a relatively short life-span and could potentially revert to a pathogenic form, hence it is not suitable for human use (Kur et al. 2009).

Due to the stringent safety requirements for human vaccines, a live vaccine is unlikely to be considered suitable in the future (Innes & Vermeulen 2006). Instead the search for a human vaccine is focused upon the identification of immuno-dominant antigens (i.e. ‘subunit vaccines’). Several cell-surface and secretory proteins have been identified as potential vaccine candidates. DNA-based vaccines encoding the microneme proteins AMA1 and MIC8 have proved effective in protecting mice against a subsequent challenge with T. gondii tachyzoites (Dautu et al. 2007; Liu et al. 2010). Meanwhile, mice immunised with the microneme proteins MIC1 and MIC4 purified from tachyzoite lysates show significantly reduced numbers of tissue cysts and increased survival rates (Lourenco et al. 2006).

1.3. Apicomplexan host cell invasion

Most apicomplexans - T. gondii included - are obligate intracellular parasites (i.e. propagate within host cells) and are uniquely adapted for host cell invasion and intracellular survival. This chapter describes the numerous processes which govern the apicomplexan lifestyle.

1.3.1. Cell ultrastructure and morphology

A typical apicomplexan cell contains ubiquitous eukaryotic organelles (e.g. nucleus, ER) alongside several phylum-specific features: the apicoplast, apical complex, micronemes, rhoptries and dense granules, enclosed within a complex membranous system (the ‘pellicle’) (figure 1.2). The apicoplast is a non-photosynthetic plastid; a layer of 3-5 membranes enclosing a 35 kb circular DNA, encoding numerous duplicated tRNA and rRNA genes. It is an essential organelle for parasite survival; however, its function is poorly understood (Maréchal & Cesbron-Delauw 2001).

1.3.1.1. The apical complex

The apical complex is a network of anterior cytoskeletal structures, the central component of which is the conoid, a spiral assembly of protein fibres. These fibres emerge from a pre-conoidal ring and are capped by a polar ring, from which protrude twenty-two microtubule filaments, which span the length of the parasite underlying the pellicle (Dubey et al. 1998; Hu et al. 2006). The apical complex is thought to play an important role during parasite replication, with conoid duplication (the first visible step of replication) providing a scaffold for the assembly of the daughter cell cytoskeleton. Additionally the conoid is seen to undergo extension and retraction during host cell invasion - a Ca2+-dependent step (Mondragon & Frixione 1996) - and hence is expected to play a mechanical role during host cell attachment and/or penetration (Hu et al. 2006).

Figure 1.2: the ultrastructure of a typical Toxoplasma gondii tachyzoite. The cell contains a nucleus encircled by a rough endoplasmic reticulum. Anterior to the nucleus are the Golgi apparatus and a single mitochondrion. The diagram also depicts the hallmark features of an apicomplexan parasite, such as the apicoplast (green), dense granules (purple), micronemes (pink) and rhoptries (blue). Diagram taken from Ajioka et al. 2001.

1.3.1.2. Micronemes

Micronemes are small (<100 nm length), cylindrical secretory organelles which co-localise with the apical complex in apicomplexan cells. Abundance varies according to the genus and developmental stage of the parasite; in general, larger invasive zoites (e.g. P. falciparum sporozoites compared with merozoites) contain larger numbers of micronemes (Blackman & Bannister 2001). T. gondii tachyzoites typically contain up to 100 micronemes (Carruthers 2002), whilst Cryptosporidium parvum sporozoites have been found to bear as many as 167 micronemes (Tetley et al. 1998). Apical secretion of microneme proteins (MICs) occurs at a basal rate in the absence of host cells (Carruthers 1999) but is transiently up-regulated upon host-cell contact, with a co-incident increase in intracellular calcium levels (Carruthers & Sibley 1997). Once deployed MICs are responsible for establishing a strong apical host-cell attachment prior to invasion (Carruthers & Sibley 1997), and are also involved in parasite locomotion; experimental stimulation of intracellular calcium fluxes in T. gondii induces MIC secretion and increased parasite motility (Wetzel et al. 2004).

1.3.1.3. Rhoptries

Rhoptries are relatively large (2-3 μm long) club-shaped secretory organelles (Boothroyd & Dubremetz 2008). As with micronemes they are localised at the apex of the parasite and vary in number according to genus and developmental stage; Plasmodium sporozoites typically contain only two rhoptries, whilst T. gondii tachyzoites can bear up to twelve (Blackman & Bannister 2001; Boothroyd & Dubremetz 2008). Rhoptry protein secretion occurs at the onset of parasite engulfment, promptly following apical attachment (Carruthers & Sibley 1997). The organelle contains two distinct sub-compartments - the ‘bulb’ (a wide base) and ‘neck’ (a tapered peak). The compartmentalised proteins are sequentially secreted directly into the host cell, where they appear to fulfil distinct functions. Rhoptry neck proteins (RONs), secreted first, are believed to be involved in the assembly of a ‘moving junction’ (see section 1.3.3), whilst bulb proteins (ROPs) are involved in mediating parasite engulfment and survival within the host (Boothroyd & Dubremetz 2008).

1.3.1.4. Dense granules

Dense granules are small (200 nm diameter) spherical bodies characterised by their observed high density during transmission electron microscopy (TEM). Abundance varies in different Apicomplexa genera, with T. gondii typically bearing 20 evenly-distributed granules per cell (Mercier et al. 2005). The contained proteins (GRAs) are secreted diffusely across the parasite plasma membrane shortly after invasion (Carruthers & Sibley 1997). GRA functions are currently poorly understood; however, many proteins localise to the membrane or lumen of the parasitophorous vacuole (PV), and therefore may promote intra-vacuolar replication of the parasite (Mercier et al. 2005; Laliberte & Carruthers 2008).

1.3.2. The glideosome; a unique system for motility

Whilst most parasites utilise protrusive structures such as cilia, flagella or pseudopods for motility, apicomplexans possess a unique system termed the ‘glideosome’ (Keeley & Soldati 2004) (figure 1.3).

Figure 1.3: the T. gondii pellicle and glideosome. The subpellicular microtubules originate at the apical complex and span the length of the parasite. These underlie a membrane skeleton and inner membrane complex (IMC), a single layer of adjoining flattened vesicles, giving rise to a bi-membranous two-dimensional lattice of intramembranous particles (IMPs). Within the outer membrane of the IMC are two gliding-associated proteins (GAP45 & GAP50) which are coupled to a myosin light chain (MLC). This forms a base for TgMyoA, a motor protein which migrates across a rapidly polymerising (at the + end) and depolymerising (at the – end) actin filament. This filament is indirectly coupled to host adhesive parasite proteins, such as TgMIC2, via interaction with aldolase. The interaction of host-adhesive proteins with host receptors establishes a tight interaction between parasite and host, and thus the ATP-driven migration of TgMyoA across the actin filament serves to drag the parasite across the surface of the host cell, whilst the actin-coupled host adhesins are translocated towards the posterior of the parasite. Diagram adapted from Keeley & Soldati 2004.

The glideosome mediates gliding motility, a process of cell migration which occurs at relatively fast speeds (approximately 3-5 μm s⁻¹ in T. gondii) and is the basis of host-cell invasion and egress (Soldati & Meissner 2004). The pellicle consists of an outer plasma membrane and an inner membrane complex (IMC) - an array of adjoining flattened vesicles - sitting atop the microtubule protrusions of the apical complex (figure 1.3). The IMC and the plasma membrane are bridged by the glideosome, the central component of which is TgMyoA, a motor protein which ‘walks’ along a polymerising actin filament (Keeley & Soldati 2004).

Cell locomotion derives from the indirect coupling of the actin filament to host-adhesive surface proteins (i.e. microneme proteins). These adhere to host-cell receptors and thereby tether the parasite to the host cell. The action of TgMyoA therefore serves to drag the parasite across the surface of the host cell, its direction defined by the secretion of host-adhesins at the apical end of the parasite. The action of the glideosome progressively translocates the adhesins to the parasite posterior, where they are shed via protease activity (Soldati & Meissner 2004).

1.3.3. The moving junction

The term ‘moving junction’ describes the intimate association of parasite and host cell plasma membranes during cell invasion. This notion arose following electron microscopy studies of erythrocyte invasion by Plasmodium knowlesi and T. gondii, during which areas of high electron density emerge at the site of apical attachment, before migrating towards the posterior as the parasite is internalised (Aikawa et al. 1978; Schupp et al. 1980). The moving junction of T. gondii comprises the microneme protein TgAMA1, a conserved apicomplexan transmembrane protein which is essential for parasite survival (Hehl et al. 2000; Triglia et al. 2000), and several TgRONs (Alexander et al. 2005). It is believed that TgRONs are inserted into the host cell plasma membrane where they serve as receptor(s) for TgAMA1, which contacts the glideosome via its cytoplasmic tail (Besteiro et al. 2009).

1.3.4. The model of Apicomplexan host cell invasion

The combined activities of the glideosome, micronemes and rhoptries serve to drive invasion of host cells by apicomplexan parasites, as depicted in the step-wise scheme of T. gondii invasion (figure 1.4) (Carruthers & Boothroyd 2007). Driven by the glideosome, the parasite slides across the surface of a host cell, aided by reversible (i.e. low-affinity) attachments by GPI-anchored surface antigens (SAGs) to glycosaminoglycans such as heparin (Ortega-Barria & Boothroyd 1999) (figure 1.4, step 1), enabling ‘sampling’ of the host cell surface. Upon discovery of a susceptible site for invasion, intracellular calcium levels are increased, presumably as a result of a cell signalling cascade, initiating a transient up-regulation of microneme protein secretion at the apical tip of the cell (step 2). This invokes a strong apical attachment and consequent re-orientation of the parasite such that it lies perpendicular to the plane of the host cell membrane (step 3).

Figure 1.4: the mode of host cell invasion by T. gondii. 1) As the parasite glides across the surface of a host cell it forms low-affinity attachments via GPI-anchored surface antigen (SAG) proteins. 2) Polarised secretion of microneme proteins (MICs) initiates a strong apical attachment. 3) The parasite re-orientates such that it lies perpendicular to the host cell plasma membrane. 4) The discharge of rhoptry neck (RON) proteins initiates the establishment of a moving junction (MJ) between parasite and host cell. Rhoptry bulb proteins (ROPs) are also secreted into the host cell cytoplasm. 5) The glideosome propels the parasite into the host cell. As the parasite enters the parasitophorous vacuole (PV), the MJ is translocated towards the posterior, whilst surface MICs are shed via the action of rhomboid proteases (ROMs). 6&7) As the parasite is fully engulfed into the host cell, the PV is closed and separates from the host plasma membrane. Steps 2-5 occur rapidly - within 15-20 seconds - whereas closure and separation of the PV can take up to two minutes. Diagram modified from Carruthers & Boothroyd 2007.

Parasite re-orientation coincides with a transient increase in host plasma membrane conductivity (Suss-Toby et al. 1996), consistent with a perforation of the bilayer and, thereby, the injection of secreted rhoptry neck proteins (Besteiro et al. 2009). The interaction of AMA1 with the host membrane-inserted RONs gives rise to a tight association between the parasite and host plasma membranes; the moving junction (MJ) (step 4). Via gliding motility the parasite forcibly inserts itself into the cell, creating an invagination of the host plasma membrane. The moving junction is translocated to the posterior of the cell (step 5), acting as a ‘molecular sieve’, specifically precluding entry of many host and parasite proteins into the PV membrane (Charron & Sibley 2004). Host adhesive MIC proteins are shed from the parasite surface at the posterior of the cell by rhomboid-family proteases; an essential step for the completion of invasion (Brossier et al. 2005).

The parasite is engulfed into a parasitophorous vacuole (PV), encircled by a host-derived membrane (Suss-Toby et al. 1996). The PV membrane is sealed via formation of a fission pore and the vacuole is released into the cell (steps 6 & 7), leaving a moving junction-derived residue on the host cell surface (Carruthers & Boothroyd 2007). This is followed by the release of GRAs, peaking approximately 20 minutes post-invasion (Carruthers & Sibley 1997) which, together with ROPs, serve to promote intracellular parasite survival and replication. Some GRAs/ROPs are thought to facilitate nutrient scavenging via formation of PV membrane pores whilst others mediate the association of the PV with host organelles, such as mitochondria (Laliberte & Carruthers 2008).

The mechanism behind vacuole closure and separation is not well understood, although it is known to be a relatively lengthy process, taking up to two minutes (Suss-Toby et al. 1996). This contrasts with the rapid timescale of apical attachment to engulfment - taking 15-20 seconds - which is aided by the high processivity of the glideosome. Host cell invasion by T. gondii and other apicomplexans is therefore a dynamic and active process, contrasting with the passive mechanisms of cellular invasion employed by other intracellular microbes. Viruses and bacteria generally require recognition by the host cell and engulfment via endocytosis or phagocytosis (Carruthers 2002). By forcibly inducing host cell internalisation, apicomplexans subvert such pathways and thereby avoid host defences such as acidification and lysosomal degradation (Carruthers & Boothroyd 2007).

1.3.5. Targeting host cell invasion

The essentiality of host cell invasion for apicomplexan propagation makes it an attractive target for drug development. T. gondii, for example, is an auxotroph for several essential nutrients (including tryptophan, arginine, purines and cholesterol) which it scavenges from the host cell following invasion (Laliberte & Carruthers 2008). Inhibition of host cell invasion would deprive the parasite of these essential molecules and thereby prevent intracellular proliferation. Additionally it would prolong the exposure of the parasite to the host immune system. Detailed insights into the mechanisms of host cell recognition and attachment, gliding motility and cell invasion are therefore sought after, in order to inform how they might be targeted.

1.4. Apicomplexan microneme proteins

The subcellular fractionation of micronemes from Sarcocystis tenella enabled the first insight into their contents, identifying two proteins (Dubremetz & Dissous 1980). Subsequently a multitude of microneme proteins (MICs) have been discovered and characterised in various apicomplexan parasites, including T. gondii, Plasmodium falciparum, Eimeria tenella, Neospora caninum and Sarcocystis muris (Tomley & Soldati 2001). The topologies of selected MICs are pictured in figure 1.5. Some proteins are conserved across several apicomplexan genera; for example, T. gondii MIC2, N. caninum MIC2 and E. tenella MIC1 are orthologous to P. falciparum TRAP, whilst T. gondii MIC4, N. caninum MIC4, E. tenella MIC5 and S. muris SML-2 are orthologues.

Most MICs are modular arrays of sequences with homology to adhesive proteins from higher eukaryotes, such as thrombospondin (TSP), epidermal growth factor (EGF), von Willebrand factor type A (vWA), galectin and apple/PAN domains (Tomley & Soldati 2001). A novel fold - the microneme adhesive repeat (MAR) - was recently identified in T. gondii MIC1 and has subsequently been attributed to several additional proteins within the family (Blumenschein et al. 2007; Friedrich et al. 2010). Almost all MICs contain at least one adhesive domain, governing an interaction with a carbohydrate (e.g. a host cell surface receptor) or protein (e.g. another MIC). Several MICs also contain putative transmembrane (TM) and cytoplasmic sequences, which contain microneme sorting signals and associate with the glideosome (figure 1.3).

Figure 1.5: domain structures of selected apicomplexan microneme proteins. Included are T. gondii (Tg) microneme protein 1 (MIC1) (UniProt KB accession number O00834), MIC2 (O00816), MIC2-associated protein (M2AP) (Q967S9), MIC3 (Q9GRG4), MIC4 (Q9XZH7), MIC5 (P90611), MIC6 (Q9XYH7), MIC7 (Q9BIJ2), MIC8 (Q9BIM7), MIC9 (Q9BIM6), AMA1 (apical-membrane antigen) (O15681); P. falciparum (Pf) TRAP (thrombospondin-related anonymous protein) (P16893), AMA1 (Q7KQK5); N. caninum (Nc) MIC1 (Q8WRS0), MIC2 (Q9U8J9), MIC3 (Q9U483), MIC4 (Q2KHJ3); E. tenella (Et) MIC1 (O43981), MIC2 (Q9UAS1), MIC3 (Q6R5N1), MIC4 (Q9BI05), MIC5 (Q9U966); S. muris lectin (SML-2) (Q08668). MAR = microneme adhesive repeat, TSR = thrombospondin type I repeat, vWA = von Willebrand factor A, CB = chitin binding, EGF = epidermal growth factor, PAN = plasminogen apple nematode, TM = transmembrane, ER = endoplasmic reticulum.


1.4.1. Oligomeric microneme protein complexes

It has been demonstrated that certain MICs associate to form multi-valent host-adhesive heteromers. Three complexes have been identified in T. gondii, depicted in figure 1.6. The physical association of TgMIC1, TgMIC4 and TgMIC6 was first demonstrated via co-immunoprecipitation (Reiss et al. 2001). TgMIC1 and TgMIC4 are soluble proteins with cell-binding properties (Fourmaux et al. 1996; Brecht et al. 2001). Both lack sorting information and depend upon association with membrane-spanning TgMIC6 in the endoplasmic reticulum in order to reach the micronemes; in the absence of TgMIC6 (i.e. in mic6KO parasites) both proteins are incorrectly trafficked to the dense granules (Reiss et al. 2001), the default secretory organelle (Karsten et al. 1998). TgMIC6 does not contribute directly to cell-binding and has therefore been classified as an ‘escorter’ protein, guiding the transport of its host-adhesive counterparts to the micronemes via a cytoplasmic acidic (EIEYE) motif (Reiss et al. 2001).

Figure 1.6: MIC protein complexes in T. gondii. The TgMIC1/4/6 complex assembles via the simultaneous interactions of TgMIC1 with TgMIC4 and TgMIC6. The TgMIC2/TgM2AP complex is thought to be a hetero-hexamer, whilst dimeric TgMIC3 is believed to associate with the TM-anchored TgMIC8.

The interaction of TgMIC2 and TgM2AP has been demonstrated via co-immunoprecipitation (Rabenau et al. 2001). It is further believed that three TgMIC2/TgM2AP dimers associate to form a hexamer (Jewett & Sibley 2004). Like TgMIC6, membrane-spanning TgMIC2 serves as an escorter, with cytoplasmic tyrosine-rich (SYHYY) and acidic (EIEYE) motifs guiding trafficking of the complex (Di Cristina et al. 2000). However, TgMIC2 is additionally capable of binding to host cells via its vWA domain (Carruthers et al. 1999; Harper et al. 2004). Nonetheless, TgM2AP association is an essential event; TgM2AP-deficient parasites retain TgMIC2 in the early secretory compartments and cell invasion is significantly impaired (Huynh et al. 2003).

Data from immunofluorescence (IF) assays suggest that an interaction exists between soluble, adhesive TgMIC3 and transmembrane TgMIC8 (Garcia-Reguet et al. 2000; Meissner et al. 2002). It had been presumed that TgMIC8 functions as an escorter for TgMIC3, however it does not contain cytoplasmic sorting signals and is not essential for correct TgMIC3 trafficking (Kessler et al. 2008; Sheiner et al. 2010). On the contrary TgMIC3 appears to direct its own trafficking via an N-terminal propeptide, the deletion of which causes incorrect targeting to the dense granules (El Hajj et al. 2008). Similarly the propeptide of TgMIC5 determines its sub-cellular location (Brydges et al. 2008) and thus it appears that propeptides are also important trafficking determinants in MIC proteins (Gaji et al. 2011).

Both MIC2 and M2AP are conserved in E. tenella and N. caninum, and hence orthologous complexes are also expected to exist (Rabenau et al. 2001). However there is no homologue of M2AP in Plasmodium, and hence TRAP is believed to function alone (Brossier & Sibley 2005). TgMIC1, TgMIC4 and TgMIC6 are closely conserved in N. caninum, although early evidence suggests that NcMIC1 and NcMIC4 may not physically associate as in T. gondii (Keller et al. 2004). Meanwhile, EtMIC5 (a TgMIC4 orthologue) has been suggested to associate with EtMIC4 (Periz et al. 2007). It appears therefore that a variety of distinct complexes exist in different apicomplexan genera.

1.4.1.1. Transmembrane MICs bridge the host surface and parasite cytoskeleton

In addition to mediating microneme protein trafficking, MIC protein cytoplasmic sequences play crucial roles during gliding motility and host cell invasion. The cytoplasmic tails of TgMIC2, PfTRAP, TgMIC6, TgMIC8 and TgAMA1 mediate the association of host-adhered complexes with the glideosome, via an interaction with the actin-coupled glycolytic enzyme aldolase (figure 1.3). Mutagenesis studies have shown that a tryptophan residue located close to the C-terminus of each protein is essential for the interaction to occur (Jewett & Sibley 2003; Zheng et al. 2009; Sheiner et al. 2010).

As a parasite is engulfed into a host cell, the tight associations between surface molecules must be disengaged in order for the PV to close and separate from the host plasma membrane. MIC complexes are shed from the parasite surface via proteolytic cleavage (figure 1.4; step 5). Each of the aforementioned membrane-spanning TgMICs contains a rhomboid protease cleavage site within its putative transmembrane domain (Brossier et al. 2003; Sheiner et al. 2010). The rhomboid proteases TgROM4 and TgROM5 have been implicated as mediators of these cleavage events (Brossier et al. 2005; Buguliskis et al. 2010).

1.4.1.2. Host-cell adhesion by T. gondii

Together the three T. gondii MIC complexes identified to date incorporate the four known host-adhesive proteins; TgMIC1, TgMIC2, TgMIC3 and TgMIC4. Individual and combined knockout of the genes encoding these proteins has enabled assessment of their overall contributions to host cell invasion. Deletion of the mic2 gene is lethal to the parasite (Brossier & Sibley 2005) whereas suppression of TgMIC2 expression induces an 80% decline in human foreskin fibroblast (HFF) invasion rate, impaired gliding motility, and parasite avirulence in mice (Huynh & Carruthers 2006). Compared with wild-type, mic1KO parasites exhibit a 60% reduction in HFF invasion rate and slightly reduced virulence in mice (Cérède et al. 2005; Friedrich et al. 2010). Meanwhile mic3KO parasites bear HFF invasion properties akin to wild-type, yet also exhibit slightly reduced virulence in mice. Interestingly, a doubly-depleted parasite (i.e. mic1+mic3ko) is almost entirely avirulent (Cérède et al. 2005). Finally, deletion of the mic4 gene induces a 40% decline in HFF invasion (Friedrich et al. 2010).

Together these findings demonstrate that TgMIC2 is an essential factor in T. gondii (as is the orthologue TRAP in P. falciparum), whilst TgMIC1, TgMIC3 and TgMIC4 deficiencies are not lethal but result in diminished invasion. Additionally, these non-essential TgMICs appear to share a synergistic relationship. It has been postulated that TgMIC2 supplies a basal host cell affinity which is enhanced by additional TgMICs in a cell type-dependent manner (e.g. by TgMIC1/4 in HFF cells) (Cérède et al. 2005).

1.4.2. Apple/PAN domains

Apple domains are a sub-class of the PAN (plasminogen/apple/nematode) domain superfamily (Tordai et al. 1999). All PAN domains contain four conserved internal cysteine residues, giving rise to two disulphide linkages, whilst apple domains contain an additional linkage between N- and C-terminal cysteines. Proteolysis and mass spectrometry of various PAN domains have identified a conserved pattern of internal disulphide cross-links, between C1/C4 and C2/C3 (Wiman 1973; McMullen et al. 1991a; McMullen et al. 1991b; Zhou et al. 1998).

PAN domains have been identified in a wide range of organisms. The human blood plasma proteins factor XI (FXI) and prekallikrein (PK) contain multiple apple domains (McMullen et al. 1991a; McMullen et al. 1991b) whilst plasminogen/hepatocyte growth factor (HGF) family ‘N-domains’ bear PAN-like sequences (Zhou et al. 1998; Tordai et al. 1999). Proteins from various nematodes, including C. elegans, contain putative apple domains (Tordai et al. 1999), as do antithrombotic proteins from the leech species Haementeria officinalis and Hirudo medicinalis (Gronwald et al. 2008; Huizinga et al. 2001). Additionally, various apicomplexan microneme proteins contain apple-like sequences, namely T. gondii MIC4, N. caninum MIC4, S. muris SML-2 and E. tenella MIC5 (Brecht et al. 2001; Keller et al. 2004; Klein et al. 1998; Brown et al. 2000). Recent structural analyses of AMA1 from P. falciparum, P. vivax and T. gondii have identified divergent PAN domains, containing extended loop regions and altered disulphide linkage patterns (Bai et al. 2005; Pizarro et al. 2005; Crawford et al. 2010).

1.4.2.1. Sequences and structures of apple/PAN domains

The various identified PAN domains exhibit a high degree of sequence divergence, as demonstrated by the alignment in figure 1.7. Aside from the conserved cysteine residues, only a hydrophobic-alcohol-hydrophobic tripeptide downstream of C4, a leucine following C5, and an SG dipeptide upstream of C6 are commonly shared (Brown et al. 2000). In spite of this, the apple/PAN fold is well conserved across the known atomic structures of H. officinalis LAPP (Huizinga et al. 2001), H. medicinalis Saratin (Gronwald et al. 2008), HGF N-domain (Zhou et al. 1998), EtMIC5 apple9 (Brown et al. 2003) and FXI (Papagrigoriou et al. 2006) (figure 1.8A).

The apple/PAN domain fold consists of a core β-sheet of five anti-parallel strands (β1-5) flanked by an α-helix and a short two-stranded β-sheet (β1’, β4’), giving rise to an overall β1-β1’-β2-α-β3-β4-β4’-β5 sequential topology (figure 1.8B). The intersecting loop regions are of variable length and correlate with gaps in the sequence alignment. The strands β3 and β4 form a β-hairpin loop motif which is tethered to the preceding α-helix via the two internal disulphide linkages. The aforementioned hydrophobic-alcohol-hydrophobic tripeptide forms the β3 strand and the intersected hydrophobic side-chains pack against the inner face of the α-helix (figure 1.8C), providing a stable hydrophobic core (Huizinga et al. 2001).


Figure 1.7: a sequence alignment of thirty-five known apple/N domain sequences. Included are domains from human factor XI (hFXI), pre-kallikrein (hPK), hepatocyte growth factor (hHGF) and plasminogen (hPLA); Sarcocystis muris lectin-2 (SML-2); Toxoplasma gondii and Neospora caninum microneme protein 4 (TgMIC4 and NcMIC4); Eimeria tenella microneme protein 5 (EtMIC5); Haementeria officinalis leech anti-platelet protein (HoLAPP) and Hirudo medicinalis Saratin. Cysteine residues are highlighted in black and commonly conserved residues in grey. The secondary structural organisation of a typical apple/PAN domain, based upon known structures, is included; α-helices and β-strands are respectively denoted by blue and red shapes.


Figure 1.8: structural analysis of apple/N domains. A) Clockwise from top-left: the crystal structure of the single apple domain of leech anti-platelet protein (LAPP) from Haementeria officinalis (PDB ID 1I8N; blue); the solution structure of Saratin from Hirudo medicinalis (PDB ID 2K13; magenta); the crystal structure of human FXI-A1 (PDB ID 2F83; green); the solution NMR structure of the N-domain from human hepatocyte growth factor (HGF) (PDB ID 2HGF; orange); the solution NMR structure of the ninth apple domain (A9) from Eimeria tenella MIC5 (PDB ID 1HKY; purple). Sulphur atoms are coloured in yellow. B) The secondary structural topology of an apple/PAN domain. C) A superposition of the α1-β3 loop motifs from each apple/PAN domain structure, alongside an alignment of the β3 strand sequences. Conserved hydrophobic side-chains pack against the inner face of the α-helix.


1.4.2.2. The functions of apple/PAN domains

Functional characterisation of various apple/PAN domains has revealed a multitude of adhesive roles. For example, the apple domains of FXI are responsible for the recruitment of various proteins involved in the blood coagulation cascade. Thrombin and factor XII are recruited via adhesion to A1 and A4 respectively, whilst A2 and A3 adhere to high-molecular weight kininogen (HK), heparin and activated platelets (Emsley et al. 2010; Papagrigoriou et al. 2006). Although atomic structures of FXI/ligand complexes are yet to be attained, some of the key residues involved in ligand binding have been identified via mutagenesis (reviewed in Emsley et al. 2010). The positions of these residues are not conserved, giving rise to several distinct binding sites (figure 1.9A).

The leech proteins LAPP and Saratin are both believed to inhibit blood coagulation via binding to collagen (Huizinga et al. 2001; Gronwald et al. 2008). As for FXI, atomic structures of the complexes remain elusive; however, residues located at the collagen-binding interface of saratin have been identified via NMR titration experiments (Gronwald et al. 2008) (figure 1.9B). These residues are concentrated at the base of the α-helix and central β-sheet, in largely distinct positions from the key residues of FXI apple domains. Although LAPP and saratin share an orthologous relationship, and bear closely converging atomic structures, many of these key collagen-binding residues in saratin are not conserved in LAPP. Consequently it is not clear whether the mechanism of collagen binding is conserved.

The HGF N-domain binds to heparin - a crucial step for the binding and activation of the c-MET receptor, whose downstream signalling pathways modulate cell growth, cell motility and morphogenesis (Zhou et al. 1998). The crystal structure of an HGF-N/heparin complex reveals another distinct binding site, with side-chains of residues within the α-helix and β2-strand contacting the carbohydrate (figure 1.9C) (Lietha et al. 2001).

Together, the data collected to date suggest that apple/PAN domains bind to their respective ligands via a wide variety of mechanisms. This is consistent with the aforementioned high sequence disparity between the domains, giving rise to distinct surfaces with ligand-binding pockets/interfaces located in varying positions.


Figure 1.9: ligand-binding by apple/N domains. A) The crystal structures of FXI-A1 (left), -A2 (middle) and -A3 (right). The residues involved in ligand interactions are depicted as spheres. These include the thrombin-binding E63, K83 and Q84 in A1; HK-binding G155 in A2; and platelet-binding S248 and R250, and heparin-binding K252, K253 and K255 in A3. B) The NMR structure of saratin with the residues comprising the collagen-binding interface (T8, E39, Y40, Y42, E62 and Y78) depicted as spheres. C) the crystal structure of the HGF N-domain/heparin complex (PDB ID 1GMN). D) The sequence alignment of the five depicted domains with ligand-binding residues highlighted. The positions of the key residues are mostly not conserved.


1.4.3. The TgMIC1/4/6 complex

The physical association of TgMIC1, TgMIC4 and TgMIC6 was first demonstrated in 2001 by Reiss and co-workers, who co-precipitated the three proteins from parasite lysates via antibodies to TgMIC1 and TgMIC6 (Reiss et al. 2001). TgMIC1 forms the spine of the complex, bridging TgMIC4 and TgMIC6, as pictured in figure 1.6 (Reiss et al. 2001; Saouros et al. 2005). Both TgMIC1 and TgMIC4 contribute host-adhesive sites, giving rise to a multi-valent complex.

1.4.3.1. T. gondii microneme protein 1 (TgMIC1)

TgMIC1 is a soluble protein of 456 amino acids, giving rise to a predicted molecular mass of 48,672 Da. It was one of the first T. gondii microneme proteins to be discovered, via the use of parasite lysates to raise monoclonal antibodies to three microneme-localised proteins (the others being TgMIC2 and TgMIC3) (Achbarou et al. 1991). The cDNA encoding the protein was later cloned and sequenced and the host-adhesive properties of the protein demonstrated (Fourmaux et al. 1996). TgMIC1 contains structured domains at its N- and C-termini, intersected by a linker region which is predicted to be flexible (Fourmaux et al. 1996; Reiss et al. 2001). The X-ray crystal structure of a 246 amino acid N-terminal fragment (TgMIC1-NT) has identified two tandem microneme adhesive repeat (MAR) domains (Blumenschein et al. 2007), whilst the NMR solution structure of a 136 amino acid C-terminal fragment (TgMIC1-CT) has revealed a galectin-like fold (Saouros et al. 2005) (figure 1.10A).

Deletion of the mic1 gene results in TgMIC4 and TgMIC6 being retained in the early secretory compartments (Reiss et al. 2001) and hence effectively serves to knock out the entire complex. This leads to a 50-60% decline in host cell invasion rate by T. gondii (Cérède et al. 2005; Friedrich et al. 2010), caused by a dual loss of TgMIC1 and TgMIC4 activities. Although it is devoid of microneme sorting signals, TgMIC1 is therefore a vital component for transport of the complex. Further studies in mic1ko parasites have shown that TgMIC6 trafficking can be restored via complementation with recombinant TgMIC1-CT (Saouros et al. 2005). Whilst TgMIC1-CT bears structural homology to the galectin family of carbohydrate-binding proteins, many of the key residues which mediate galectin carbohydrate recognition are absent. Indeed, the domain is devoid of carbohydrate-binding activity, but has been found to interact with and stabilise the folding of the third EGF domain in TgMIC6 (Saouros et al. 2005). This interaction is considered to be a quality-control checkpoint, ensuring the exit of properly folded and assembled complexes from the ER.

Figure 1.10: atomic structures of TgMIC1. A) The atomic structures of TgMIC1-NT (pink, PDB 2JH1) and TgMIC1-CT (orange, PDB 2BVB). TgMIC1-NT consists of two MAR domains (pink), each comprised of two α-helices flanking five β-strands. In each MAR domain the fifth β-strand (highlighted in blue) comprises a binding pocket for sialylated oligosaccharides. TgMIC1-CT consists of a single globular domain; an 11-stranded β-sandwich with structural homology to the galectin protein family. TgMIC1-NT and -CT are connected via a 56 amino acid linker which is predicted to be unstructured. B) A close-up of the second MAR domain bound to 3’-sialyl-N-acetyl-D-lactosamine (NeuAcα2,3Galβ1,4GlcNAc). The aforementioned binding pocket is highlighted in blue. This image was created using the crystal structure of TgMIC1-NT bound to α2,3-sialyl-N-acetyllactosamine (PDB ID 2JH7) and the solution structure of TgMIC1-CT (PDB ID 2BVB).


Co-immunoprecipitation and in vitro cell-binding assays have established that TgMIC1-NT is dually responsible for TgMIC4 recruitment and host cell adherence (Saouros et al. 2005). Cell binding activity has been probed via carbohydrate microarray analysis (Liu et al. 2009), demonstrating an affinity for terminally sialylated oligosaccharides. Consistent with this, in vitro host cell invasion is abrogated in the presence of 10 mM sialic acid and following pre-treatment of cells with neuraminidase. Combined NMR and X-ray crystallography studies have identified a binding pocket on each MAR domain for 3’- and 6’-sialylated N-acetyl-D-lactosamine (Blumenschein et al. 2007; Garnett et al. 2009) (figure 1.10B).

1.4.3.2. T. gondii microneme protein 6 (TgMIC6)

TgMIC6 was discovered during a search of the T. gondii expressed sequence tag (EST) database (created by Ajioka et al. 1998) for proteins containing TRAP-like TM and cytoplasmic sequences (Reiss et al. 2001; Meissner et al. 2002). Isolation and sequencing of the cDNA enabled elucidation of a 349 amino acid sequence (with a predicted mass of 36,656 Da) containing three putative EGF-like domains and an acidic region preceding C-terminal TM and cytoplasmic sequences (figure 1.5) (Meissner et al. 2002). As described in chapter 1.4.1.1, these C-terminal regions are involved with microneme protein trafficking, glideosome association and surface shedding of MIC proteins post-invasion.

The EGF1 domain of TgMIC6 is liberated during transport to the micronemes (Reiss et al. 2001). The significance of this event is currently unknown; however, it may represent a propeptide-mediated targeting step, as observed for TgMIC3 and TgMIC5 (Gaji et al. 2011). In addition to the previously described association of TgMIC6-EGF3 and TgMIC1-CT, it has recently been demonstrated that the EGF2 domain is also capable of interacting with TgMIC1-CT. Determination of the solution NMR structure of the two proteins in complex (figure 1.11B) has enabled detailed characterisation of the interaction (Sawmynaden et al. 2008). It was also demonstrated that TgMIC1-CT binds TgMIC6-EGF3 via the same mechanism.

1.4.3.3. T. gondii microneme protein 4 (TgMIC4)

TgMIC4 was discovered via probing of parasite excretory-secretory antigen (ESA) with monoclonal antibodies. N-terminal sequencing of a recognised protein, followed by scanning of the T. gondii EST database, enabled the detection of a 580 amino acid sequence (of predicted molecular mass 63,021 Da) (Brecht et al. 2001). TgMIC4 is comprised of six apple domains, arranged in three pairs, flanked by linking sequences of variable length (figure 1.12A). Phylogenetic analysis, using Phylogeny.fr (Dereeper et al. 2008), demonstrates a clear clustering of odd and even numbered domains (figure 1.12B), suggesting that the domain pairs have evolved through duplication and divergence (Brown et al. 2000).

Figure 1.11: atomic structures of TgMIC6-EGF2. A) The solution structure of TgMIC6-EGF2 in its free form (PDB ID 2K2T). The domain bears a canonical EGF-like fold. B) The solution structure of a TgMIC6-EGF2/TgMIC1-CT complex (PDB ID 2K2S).

Figure 1.12: TgMIC4 domain organisation and structure. A) The domain topology of TgMIC4. Apple domains are arranged in widely-spaced pairs. The protein contains an N-terminal ER leader peptide (aa’s 1-25) which is cleaved during transport into the ER. The protein is additionally proteolysed at two sites (arrowed) following secretion from the micronemes; at residue 59, and at an unspecified site towards the C-terminus, likely located in-between domains apple4 and apple5. B) A cladogram of the six apple domains. The odd and even numbered domains cluster independently. Branch support values are provided in red. Values below 50% have been collapsed.


In spite of the high sequence conservation within TgMIC4, the domain pairs appear to be functionally distinct. For example, delivery of TgMIC4 to the micronemes occurs via association of apple-12 (A12) with TgMIC1-NT (Reiss et al. 2001; Saouros et al. 2005). Meanwhile, in vitro cell-binding assays have demonstrated that TgMIC4 is capable of binding to HFF cells (Brecht et al. 2001). This activity is exclusively retained by a C-terminal fragment yielded from proteolysis by TgSUB1 following secretion. Though the precise TgSUB1 cleavage site is not clear, it is strongly believed to be located between the fourth and fifth apple domains, thereby localising the host-adhesive function to apple-56 (A56) (Lagal et al. 2010). Furthermore, it has been previously observed that deletion of the final twelve residues of TgMIC4 abolishes host cell adhesion (Brecht et al. 2001).

Deletion of the mic4 gene decreases T. gondii invasivity by approximately 40%, suggesting that the protein makes a substantial contribution to invasion (Friedrich et al. 2010). Additionally, the liberation of the host-adhesive fragment following secretion potentiates a role in events downstream of host cell invasion. Although detailed insights into the mode of receptor binding by TgMIC4 remain elusive, evidence suggests that the C-terminal domain pair recognises a galactosylated receptor. In vitro host cell invasion is diminished in the presence of galactose and lactose, suggesting that galactose recognition is involved in host cell recognition (Blumenschein et al. 2007; Friedrich et al. 2010). Furthermore, the TgMIC1/4 sub-complex has been previously purified via lactose-affinity chromatography (Lourenco et al. 2001), which is presumably mediated by TgMIC4, since previous studies have established that TgMIC1 adheres only to sialylated oligosaccharides (Blumenschein et al. 2007). Competition cell-binding assays of TgMIC4 in the presence of galactose or N-acetylglucosamine have revealed a biphasic effect, with low competitor doses (1 mg/ml) enhancing adhesion whilst high doses (50 mg/ml) were inhibitory (Brecht et al. 2001). Furthermore, various apple domain-containing apicomplexan orthologues, namely SML-2, NcMIC4 and EtMIC5 (in complex with EtMIC4), have previously been purified via lactose-affinity chromatography (Klein et al. 1998; Keller et al. 2004; Periz et al. 2007).

1.4.3.4. The current model of the TgMIC1/4/6 complex

The identification of two TgMIC1-binding EGF-like domains in TgMIC6 provided the first insight into the overall architecture of the TgMIC1/4/6 complex. This prompted the formulation of a model (Sawmynaden et al. 2008) (figure 1.13), in which membrane-anchored TgMIC6 presents a fixed arrangement of four host-adhesive binding sites, provided by two TgMIC1 molecules.


Figure 1.13: the current model of the TgMIC1/4/6 complex. Solution/crystal structures are included where available. TgMIC6 is represented by grey and red cylinders (denoting TM and acidic sequences) alongside an EGF3 homology model (blue) and EGF2 solution structure (green). EGF1 is cleaved during transport to the micronemes and is therefore not pictured. The TgMIC6 ecto-domain is liberated from the parasite surface post-secretion (indicated by an arrow). TgMIC6 recruits TgMIC1 molecules via interactions of its EGF domains with TgMIC1-CT (orange). The solution structure of TgMIC6-EGF2/TgMIC1-CT complex is pictured alongside a homology model for TgMIC6-EGF3/TgMIC1-CT. TgMIC1-CT is connected to the host-adhesive TgMIC1-NT (magenta) via a linker of predicted flexibility. Crystal structures of TgMIC1-NT in complex with α2,3-sialyl-N-acetyllactosamine (magenta/pink) are pictured, and host-adhesive activity emphasised by pink arrow-heads. TgMIC1-NT is additionally responsible for recruitment of TgMIC4, believed to occur via an interaction with the A12 domain pair. TgMIC4 domain pairs are represented by yellow circles, intersected by undefined linker regions. The C-terminus of TgMIC4, cleaved following secretion (indicated by arrows), is believed to contain host cell-binding activity; emphasised by yellow arrow-heads. Diagram modified from Sawmynaden et al. 2008.


Additional host-adhesive sites are provided by TgMIC4. Of course this model is very much theoretical, based on various in vivo and in vitro studies using truncated proteins and sub-cloned domains. The total stoichiometry of the complex is currently unknown, but is likely to be of higher order than depicted here.

1.5. Project aims & objectives

Microneme proteins are important mediators of host cell attachment during host cell invasion by apicomplexan parasites. Detailed insights into their respective structures and functions are therefore sought in order to further our understanding of parasite biology, and aid the future development of therapeutic strategies. The TgMIC1/4/6 complex is one of the better characterised microneme protein complexes at present; however, TgMIC4 remains relatively poorly characterised on both a structural and functional level (figure 1.13).

In order to further our understanding of the mechanism of host cell adhesion by TgMIC4, recombinant TgMIC4-A56 has been previously produced in our laboratory, and demonstrated to be only partially structured via NMR spectroscopy (Prof. Steve Matthews, personal communication). Carbohydrate microarray analysis (i.e. a means of screening for binding to carbohydrate; reviewed in Liu et al. 2009) has revealed a highly specific affinity for terminally-galactosylated oligosaccharides, ruling out the notion that it can also adhere to glucose (Brecht et al. 2001). Limited proteolysis of the partially structured protein yielded a stable fragment, approximately the size of an apple domain, which retained lectin activity. However, the identity of this fragment, its mode of galactose recognition, and its overall role in vivo, remain unknown.

By primarily using NMR spectroscopy this project sets out to enhance our understanding of TgMIC4 host cell receptor binding and recognition, via the following objectives:

• Identification of the folded, galactose-binding fragment within recombinant TgMIC4-A56.
• Production of a recombinant form of this fragment for solution structure determination.
• Ligand-binding analysis of the adhesive fragment, with a view to structure determination of a protein/ligand complex.


Chapter 2: An Introduction to Nuclear Magnetic Resonance (NMR) Spectroscopy


2.1. Introduction

Nuclear Magnetic Resonance (NMR) is a spectroscopic technique with a broad range of applications in chemistry and biology. In this project it has been utilised for protein structure determination and studies of carbohydrate ligand binding. This chapter introduces some basic theoretical aspects of NMR, before discussing its applications to studying proteins and carbohydrates.

2.2. The origins of NMR signals

2.2.1. Nuclear spin angular momentum and magnetism

With regards to NMR spectroscopy, atomic nuclei are characterised by a fundamental property known as spin angular momentum, I. This is defined by the spin quantum number, I, the value of which depends upon the number of unpaired protons and neutrons within the nucleus; each unpaired nucleon contributes I = ½. For example, in 12C all protons and neutrons are paired, therefore the nucleus has zero spin (i.e. is ‘NMR silent’), whereas 1H and 13C each contain a single unpaired nucleon (a proton and a neutron respectively), giving rise to ‘NMR active’ nuclei of I = ½ (Hore 1995). The magnitude of spin angular momentum is given by

$$|\mathbf{I}| = \hbar\sqrt{I(I+1)}$$ (Equation 2.1)

where ħ is the reduced Planck constant (h/2π). Atomic nuclei also possess charge, which gives rise to an associated magnetic moment, μ, in nuclei possessing spin. The magnetic moment is directly proportional to spin angular momentum, as given by

$$\boldsymbol{\mu} = \gamma\mathbf{I}$$ (Equation 2.2)

where γ is the gyromagnetic ratio; an intrinsic proportionality constant (i.e. a measure of magneticity) (Hore 1995). Both μ and I are vector quantities, meaning their direction, as well as magnitude, is quantised. The orientation of a nucleus with respect to an arbitrary axis (e.g. z-axis) is given by

$$I_z = m\hbar \quad (\text{i.e. } \mu_z = m\hbar\gamma)$$ (Equation 2.3)

where m is the magnetic quantum number, which has 2I + 1 values in integer steps between -I and +I. Thus, for a spin-½ nucleus m equals -½ or +½, giving rise to two possible orientations (Hore 1995).
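As a brief illustration of the quantisation rules above, the following minimal Python sketch enumerates the 2I + 1 allowed values of m for a given spin quantum number, and evaluates the magnitude of spin angular momentum in units of ħ, taken here as √(I(I+1)). The function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from math import sqrt


def magnetic_quantum_numbers(spin):
    """Return the 2I + 1 allowed values of m, from -I to +I in integer steps."""
    spin = Fraction(spin)
    # I must be a non-negative integer or half-integer.
    if spin < 0 or (2 * spin).denominator != 1:
        raise ValueError("spin must be a non-negative multiple of 1/2")
    count = int(2 * spin) + 1
    return [-spin + k for k in range(count)]


def angular_momentum_magnitude(spin):
    """Magnitude of spin angular momentum in units of hbar: sqrt(I(I+1))."""
    spin = Fraction(spin)
    return sqrt(float(spin * (spin + 1)))


if __name__ == "__main__":
    for label, spin in (("1H", Fraction(1, 2)), ("2H", Fraction(1))):
        m_values = [str(m) for m in magnetic_quantum_numbers(spin)]
        print(label, "I =", spin, "m =", m_values,
              "|I|/hbar =", round(angular_momentum_magnitude(spin), 3))
```

For a spin-½ nucleus such as 1H this yields the two orientations described in the text, whereas a spin-1 nucleus (2H is used here purely as an additional example) would have three.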

2.2.2. Larmor precession and resonance

When placed in an external magnetic field (B0), the magnetic moment of a nucleus (i.e. the ‘spin’) aligns with the field via its 2I + 1 orientations. The energy, E, of the magnetic moment in the presence of a magnetic field is given by

$$E = -\mu_z B_0$$ (Equation 2.4)

substitution of which with equation 2.3 gives

$$E = -m\hbar\gamma B_0$$ (Equation 2.5)

Thus, a spin-½ nucleus can adopt two distinct energy states, representing m = +½ and m = -½. For a nucleus of positive γ these are often defined as corresponding to parallel (α-state) and anti-parallel (β-state) alignment with the magnetic field direction, the latter being of higher energy (figure 2.1A) (Hore 1995). The external magnetic field induces a torque on the magnetic moment, causing it to precess about the axis of the field (figure 2.1B). The rate of precession, termed the Larmor frequency, is given by

$$\omega_0 = -\gamma B_0 \ (\text{rad s}^{-1}) \quad \text{or} \quad \nu_0 = -\frac{\gamma B_0}{2\pi} \ (\text{Hz})$$ (Equation 2.6)

The direction of precession is therefore determined by the gyromagnetic ratio of the nucleus, with a positive γ (e.g. 1H, 13C) conferring negative precession (i.e. clockwise rotation in a right-handed axis). The Larmor frequency is directly proportional to magnetic field strength. At present NMR spectrometers employ field strengths up to 23.5 Tesla, equating to a 1H frequency of 1 GHz (i.e. in the radiofrequency region of the electromagnetic spectrum). It is standard practice to refer to an NMR spectrometer by the Larmor frequency of hydrogen, e.g. a “600 MHz spectrometer” (Hore 1995).
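The proportionality between Larmor frequency and field strength can be checked numerically. The short Python sketch below assumes the relation ν0 = -γB0/2π and uses approximate literature values for the gyromagnetic ratios of 1H and 13C (these numerical values are not taken from this text); the function name is hypothetical.

```python
import math

# Approximate gyromagnetic ratios in rad s^-1 T^-1 (assumed literature values).
GAMMA = {"1H": 267.522e6, "13C": 67.283e6}


def larmor_frequency_hz(nucleus, b0_tesla):
    """Signed Larmor frequency nu0 = -gamma * B0 / (2 pi), in Hz."""
    return -GAMMA[nucleus] * b0_tesla / (2 * math.pi)


if __name__ == "__main__":
    for b0 in (14.1, 23.5):
        for nucleus in ("1H", "13C"):
            mhz = abs(larmor_frequency_hz(nucleus, b0)) / 1e6
            print(f"B0 = {b0} T, {nucleus}: |nu0| ~ {mhz:.1f} MHz")
```

With these values a 14.1 T magnet corresponds to a 1H frequency of roughly 600 MHz and a 23.5 T magnet to roughly 1 GHz, in line with the figures quoted above. The negative sign returned for a positive γ reflects the sense of precession only.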


Figure 2.1: the effect of an external magnetic field upon a spin-½ nucleus. A) the nucleus aligns either parallel or antiparallel to the applied magnetic field, B0. The latter orientation is of higher energy (i.e. the ‘excited’ state). B) a depiction of Larmor precession using the conventional model of a vector in a Cartesian co-ordinate frame. The external field is applied along the z-axis, and induces a torque upon the nucleus (depicted in the α-state; aligned parallel to field), causing it to precess about the z-axis at the Larmor frequency.

Resonance occurs when the spin state of a nucleus is altered (e.g. transition from α- to β-state). This is achieved via the absorption of electromagnetic radiation such that the resonance condition, ΔE = hν, is satisfied, and is governed by the selection rule Δm = ±1, which states that only transitions between adjacent energy levels (termed ‘single-quantum’) are allowed. The frequency of the supplied electromagnetic radiation must therefore be equal to the Larmor frequency of the nucleus (Hore 1995), since

$\Delta E = h\nu = \hbar\gamma B_0$ (Equation 2.7)

2.3. The excitation of NMR signals

In terms of the conventional Cartesian co-ordinate model (figure 2.1B), the transition from α- to β-state corresponds to the rotation of the vector from the z-axis into the x-y (transverse) plane, where ‘magnetisation’ is ultimately detected (see chapter 2.3.6). In practice this is achieved by applying a small oscillating magnetic field B1 (commonly referred to as a ‘pulse’) along the x-axis, of frequency equal to that of Larmor precession (i.e. radiofrequency, rf). This oscillating field is capable of influencing magnetisation despite being significantly smaller than B0. In order to describe how this is the case, it is first necessary to introduce some key concepts, namely the vector model and the rotating frame.


2.3.1. The vector model

When a sample, containing multiple spins, is placed into an external magnetic field, the energy of the system is minimised if all magnetic moments align parallel to the field (i.e. in the α-state). In fact, the population of the energy levels at equilibrium is given by the Boltzmann distribution,

$\frac{N_\alpha}{N_\beta} = e^{\Delta E / k_B T}$ (Equation 2.8)

where Nα/Nβ represent the numbers of nuclei in either energy state, kB the Boltzmann constant and T the temperature (Hore 1995). Thus, in practice the relatively small difference between energy levels (ΔE) is eclipsed by the energy of thermal motion (kBT), resulting in only a minor excess of spins in the ground state at thermal equilibrium. This slight excess is commonly represented as a group of spins precessing about the z-axis, giving rise to a single ‘bulk magnetisation’ vector, M (figure 2.2) (Keeler 2005). This provides a simplified picture, termed the vector model, avoiding the more complicated visualisation of multiple magnetic moments.
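To give a sense of how small the equilibrium population excess is, the sketch below evaluates the Boltzmann ratio for 1H, assuming ΔE = ħγB0 and Nα/Nβ = exp(ΔE/kBT). The field strength (14.1 T) and temperature (298 K) are illustrative choices rather than values taken from this work.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
GAMMA_1H = 267.522e6     # 1H gyromagnetic ratio, rad s^-1 T^-1 (assumed literature value)

def population_ratio(b0_tesla: float, temperature_k: float, gamma: float = GAMMA_1H) -> float:
    """Return N_alpha / N_beta = exp(delta_E / kB T), with delta_E = hbar * gamma * B0."""
    delta_e = HBAR * gamma * b0_tesla
    return math.exp(delta_e / (K_B * temperature_k))

ratio = population_ratio(14.1, 298.0)
# Fractional excess of ground-state spins, (N_alpha - N_beta) / (N_alpha + N_beta)
excess_per_million = (ratio - 1) / (ratio + 1) * 1e6

if __name__ == "__main__":
    print(f"N_alpha/N_beta = {ratio:.7f}")
    print(f"excess ground-state spins per million: {excess_per_million:.1f}")
```

Under these assumptions the excess amounts to only a few tens of spins per million, which is the underlying reason for the inherently low sensitivity of NMR.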

Figure 2.2: the vector model of NMR. At thermal equilibrium (i.e. the steady-state) there exists a slight excess of spins in the ground state. These are represented as a single, bulk magnetisation vector.

2.3.2. The rotating frame

The interaction of M with B1 necessitates the rather complex visualisation of the interaction of two simultaneously precessing vectors in order to understand what is happening during an NMR experiment.


Consequently, this picture is simplified via visualisation of events in a rotating frame, where the Cartesian co-ordinate frame is rotated at the same rate as B1 precession (Keeler 2005). The linearly oscillating B1 field can be visualised as two counter-rotating vectors, B1− and B1+. Many of the nuclei most commonly studied by NMR (1H, 13C) bear positive gyromagnetic ratios, and thus undergo negative precession and interact with B1− (the lack of interaction with B1+ means it is ignored). Rotation of the co-ordinate frame at the same rate as B1− results in the field appearing stationary, enabling sole consideration of the position of M.

2.3.3. The effect of a pulse

Despite being much smaller than B0, B1 is capable of manipulating M provided that the resonance condition (equation 2.7) is satisfied. This can be explained using the rotating frame, in which the apparent frequency of Larmor precession, termed the ‘offset’, is given by

$\Omega = \omega_0 - \omega_{ro}$ (Equation 2.9)

where ω0 and ωro are the Larmor and rotating frame frequencies respectively (Keeler 2005). By substituting this into equation 2.6 and rearranging, a ‘reduced field’, ΔB, along the z-axis is given by:

$\Delta B = -\Omega / \gamma$ (Equation 2.10)

Additionally, B1 oscillates along the x-axis (stationary in the rotating frame, provided that the transmitter frequency, ωtx, is equal to ωro). ΔB and B1 combine vectorially to give the effective field, Beff, with which M interacts (figure 2.3). Thus, setting the transmitter frequency close to the Larmor frequency minimises Ω and ΔB and thereby enhances the contribution of B1 to Beff, enabling rotation of M further toward the x-y plane (i.e. if Ω equals zero then so does ΔB, hence Beff is equal to B1) (Keeler 2005). The angle through which M turns, known as the tilt angle, θ, is dependent on the amplitude and duration of the pulse. Rotation by 90°, into the x-y plane, corresponds to equalisation of the populations of α- and β-states, whilst a 180° pulse, to -z, corresponds to inversion of populations. Once B1 is switched off, the z-axis B0 field predominates once more, and M precesses about the field direction at its resonance frequency.
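The dependence of the effective field and tilt angle on offset, pulse amplitude and pulse duration can be sketched numerically. The example assumes the relations Ω = −γΔB and θ = γB1tp (on resonance), and uses a hypothetical 10 μs 1H pulse; the function names are illustrative only.

```python
import math

GAMMA_1H = 267.522e6  # 1H gyromagnetic ratio, rad s^-1 T^-1 (assumed literature value)

def effective_field(b1_tesla: float, offset_hz: float, gamma: float = GAMMA_1H):
    """Return (B_eff in tesla, angle between B_eff and the z-axis in degrees)."""
    delta_b = -2 * math.pi * offset_hz / gamma       # reduced field along z
    b_eff = math.hypot(b1_tesla, delta_b)            # vector sum of B1 (x) and delta_B (z)
    angle = math.degrees(math.atan2(b1_tesla, abs(delta_b)))
    return b_eff, angle

def tilt_angle_deg(b1_tesla: float, pulse_s: float, gamma: float = GAMMA_1H) -> float:
    """On-resonance tilt angle, theta = gamma * B1 * t_p, in degrees."""
    return math.degrees(gamma * b1_tesla * pulse_s)

# B1 amplitude required for a 90 degree rotation in 10 microseconds (on resonance)
b1_90 = (math.pi / 2) / (GAMMA_1H * 10e-6)

if __name__ == "__main__":
    print(f"B1 for a 10 us 90-degree pulse: {b1_90 * 1e3:.3f} mT")
    for offset in (0.0, 5e3, 25e3):
        b_eff, angle = effective_field(b1_90, offset)
        print(f"offset {offset:8.0f} Hz -> B_eff tilted {angle:5.1f} degrees from z")
```

On resonance the effective field lies along the x-axis (90° from z); as the offset grows, Beff tilts back toward the z-axis and the pulse becomes less effective at rotating M into the transverse plane.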


Figure 2.3: the effective field, Beff. In the rotating frame, the external field B0 is reduced to ΔB. The application of B1 gives rise to an effective field, Beff; the vector sum of ΔB and B1, with which M interacts.

2.3.4. Chemical shifts

In reality, not all nuclei of a particular type (e.g. 1H) resonate at an identical frequency. Differences in the chemical environments (see below) surrounding nuclei elicit deviations from the Larmor frequency, giving rise to ‘chemical shifts’ (Hore 1995).

2.3.4.1. The chemical shift scale

The resonance frequency of a nucleus depends on the strength of the magnetic field. For example, 1H resonance frequencies are approximately 600 MHz and 400 MHz in 14.1 T and 9.4 T fields respectively. Consequently, resonance frequencies are normalised with respect to the Larmor frequency, giving rise to values in parts per million (ppm), via

$\delta = 10^6 \times \frac{\nu - \nu_{ref}}{\nu_{ref}}$ (Equation 2.11)

where ν and νref are the resonance frequency and Larmor frequency (i.e. treated as a reference value) respectively (Hore 1995).
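The field-independence of the ppm scale can be illustrated with a short sketch, assuming the standard definition δ = 10^6 × (ν − νref)/νref. The frequencies used are hypothetical round numbers.

```python
def chemical_shift_ppm(freq_hz: float, ref_hz: float) -> float:
    """Chemical shift in ppm of a resonance at freq_hz relative to ref_hz."""
    return 1e6 * (freq_hz - ref_hz) / ref_hz

def offset_hz(shift_ppm: float, ref_hz: float) -> float:
    """Frequency offset in Hz corresponding to a shift in ppm at a given reference frequency."""
    return shift_ppm * ref_hz / 1e6

if __name__ == "__main__":
    # The same 1 ppm shift corresponds to different absolute offsets at different fields
    for ref in (400e6, 600e6):
        print(f"{ref / 1e6:.0f} MHz: 1 ppm = {offset_hz(1.0, ref):.0f} Hz")
```

A resonance 600 Hz from the reference on a 600 MHz spectrometer and one 400 Hz from the reference on a 400 MHz spectrometer therefore share the same chemical shift of 1 ppm.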


2.3.4.2. Nuclear shielding

The chemical shift of a nucleus depends on its surrounding chemical environment. The presence of the external field, B0, induces motions in the electrons surrounding the nucleus, generating a small opposing field, B’. This results in an experienced field, Bex, given by

$B_{ex} = B_0 - B' = B_0(1 - \sigma)$ (Equation 2.12)

where σ is the shielding constant, representing the proportionality between B’ and B0 (Hore 1995). As a result the resonance condition becomes

$\Delta E = h\nu = \hbar\gamma B_0(1 - \sigma)$ (Equation 2.13)

The degree of shielding is dictated by the electronic structure near the nucleus. In the case of hydrogen, bonding to a strongly electron-withdrawing atom, such as nitrogen, causes de-shielding, and thus an increased resonant frequency compared to, for example, a carbon-bonded hydrogen. Additionally, the external magnetic field induces a current in the circulating delocalised electrons of aromatic groups, which can induce significant shielding and de-shielding effects, causing relatively large deviations in the resonance frequencies of proximate nuclei (Gomes 2001; Hore 1995).

2.3.4.3. Exciting a range of frequencies

Theoretically the chemical shift effect causes a problem, as B1 is applied at a particular frequency, which may not be ‘on-resonance’ with the frequency of a strongly shielded/de-shielded nucleus. However, in practice a pulse can excite a range of frequencies depending on its amplitude and duration; Heisenberg’s Uncertainty Principle dictates that a pulse of ‘width’ Δt has a frequency uncertainty of approximately 1/Δt. Application of a short pulse at high amplitude (i.e. a ‘hard’ pulse) excites a wide range of frequencies, whilst a longer pulse of lower amplitude (i.e. a ‘soft’ pulse) excites a narrower range. A hard pulse of ~10 μs is usually employed to induce a 90° rotation of bulk magnetisation (Claridge 2009).
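As an order-of-magnitude illustration of the 1/Δt relationship, the sketch below compares the approximate frequency range covered by a hard and a soft pulse. The 600 MHz spectrometer frequency and the 10 ms soft pulse are assumed values, and 1/Δt is treated as a rough estimate rather than an exact excitation profile.

```python
def excitation_bandwidth_hz(pulse_s: float) -> float:
    """Approximate frequency spread (Hz) of a pulse of duration pulse_s, taken as 1/dt."""
    return 1.0 / pulse_s

def bandwidth_ppm(pulse_s: float, spectrometer_mhz: float) -> float:
    """The same spread expressed in ppm (Hz divided by MHz gives ppm)."""
    return excitation_bandwidth_hz(pulse_s) / spectrometer_mhz

if __name__ == "__main__":
    for label, duration in (("hard, 10 us", 10e-6), ("soft, 10 ms", 10e-3)):
        hz = excitation_bandwidth_hz(duration)
        ppm = bandwidth_ppm(duration, 600.0)
        print(f"{label}: ~{hz:.0f} Hz (~{ppm:.2f} ppm at 600 MHz)")
```

On this estimate a 10 μs pulse covers roughly 100 kHz, far wider than the 1H chemical shift range, whereas a 10 ms pulse covers only around 100 Hz and is therefore selective.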


2.3.4.4. Chemical shifts in the rotating frame

Following application of a 90° pulse, spins with distinct chemical shifts precess in the x-y plane at their distinct resonance frequencies, resulting in dispersion of the bulk magnetisation. In a model system of three spins I, II and III, with chemical shifts of I > II > III, visualisation of the respective vectors in a Cartesian co-ordinate frame is simplified in the rotating frame. If νro is set equal to νII then this vector appears stationary, and we therefore only observe the behaviour of spins I and III offset to this reference frequency (figure 2.4).

Figure 2.4: chemical shifts in the rotating frame. Following application of a 90° pulse, bulk magnetisation is rotated into the x-y plane. Individual magnetic moments then begin to precess at their characteristic resonance frequencies. If νrf is equal to νII then spins I and III evolve according to their offsets from this reference frequency.

This is analogous to the common mode of detection during an NMR experiment, where the transmitter frequency (i.e. in the centre of the spectrum) is subtracted from the resonance frequency. Thus, absolute frequencies, in MHz, are not detected, but offsets, in Hz, from the reference (Claridge 2009).

2.3.5. Relaxation

Following application of the rf pulse, the magnetisation vector precesses in the x-y plane at its resonance frequency. However, over time the energy conferred to the system is lost and the steady-state is restored; a process known as relaxation. Nuclear spin relaxation occurs over relatively long time periods (≤1 s) compared to other spectroscopic transitions (e.g. <1 ps for electrons), enabling detection (see chapter 2.3.6) or further manipulation of magnetisation (i.e. multi-pulse NMR; see chapter 2.6). Relaxation occurs via two pathways, characterised by time constants, T1 and T2 (Claridge 2009).

2.3.5.1. Longitudinal relaxation (T1)

Longitudinal relaxation describes the re-establishment of bulk magnetisation along the +z axis, and is characterised by the time constant T1. Recovery occurs at an exponential rate, and is effectively complete after a period of 5T1 following a 90° pulse (Claridge 2009). Based on the resonance condition (equation 2.7), the return to thermal equilibrium (i.e. β- to α-transition) must be stimulated by a transverse magnetic field oscillating close to the Larmor frequency. However, rather than being supplied by an rf pulse, such fields arise spontaneously as localised fields within a molecule, primarily via dipolar interactions and chemical shift anisotropy (CSA).
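The statement that recovery is effectively complete after 5T1 can be checked with a short sketch, assuming simple mono-exponential recovery, Mz(t) = M0(1 − exp(−t/T1)), following a 90° pulse. The T1 value of 1 s is illustrative only.

```python
import math

def mz_after_90(t_s: float, t1_s: float, m0: float = 1.0) -> float:
    """Longitudinal magnetisation at time t after a 90 degree pulse (mono-exponential recovery)."""
    return m0 * (1.0 - math.exp(-t_s / t1_s))

if __name__ == "__main__":
    t1 = 1.0  # seconds, hypothetical value
    for multiple in (1, 3, 5):
        recovered = mz_after_90(multiple * t1, t1)
        print(f"t = {multiple} x T1: {100 * recovered:.1f}% of M0 recovered")
```

Under this assumption more than 99% of the equilibrium magnetisation has been restored after 5T1, which is why recycle delays of this order are used when quantitative intensities are required.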

The dipolar mechanism describes the through-space interaction between neighbouring spins. Spin-½ nuclei are analogous to bar magnets (i.e. possessing magnetic North and South poles) and therefore generate localised magnetic fields, which fluctuate as the molecule tumbles in solution and stimulate relaxation of proximate spins at appropriate frequencies. The dipolar interaction is effective only over short distances (proportional to r⁻³); however, multiple nuclei can contribute local fields, hence the mechanism is the major pathway to relaxation in, for example, buried hydrogen nuclei in a protein (Claridge 2009). Additionally, the anisotropic electron distribution surrounding nuclei causes local field fluctuations (i.e. ‘chemical shift anisotropy’). Local fields average due to molecular tumbling, giving rise to a distinct chemical shift; however, the fluctuating field can still stimulate relaxation. This mechanism predominates for nuclei which exhibit stronger electron anisotropy (e.g. in aromatic rings), and its significance increases at higher field strengths. Sufficiently large chemical shift anisotropies cause signal broadening (Keeler 2005).

2.3.5.2. Transverse relaxation (T2)

Transverse relaxation describes the loss of magnetisation from the x-y plane, and is characterised by time constant T2. According to the vector model (chapter 2.3.1), at thermal equilibrium there exists a bulk magnetisation vector; the sum of multiple individual magnetic moment vectors. Following a 90° pulse, bulk magnetisation is rotated into the transverse plane, where spins precess at their characteristic frequencies. In theory, like spins should precess at identical frequencies and thereby maintain net magnetisation (i.e. ‘phase coherence’). Yet in practice, fluctuations in local fields lead to discrepancies in the precession frequencies of individual vectors. This causes a gradual loss of phase coherence over time, with individual vectors dispersing across the x-y plane, leading to a net zero transverse magnetisation (Keeler 2005; Claridge 2009).

Figure 2.5: spin relaxation in the transverse plane. Following a 90° (x) pulse all like spins possess phase coherence, and precess at an identical rate giving rise to a net magnetisation vector. However, over time variations in local fields between tumbling molecules cause discrepancies in precessional frequencies, causing individual vectors to become dispersed across the x-y plane, leading to a loss of net magnetisation.

2.3.5.3. Rotational correlation time (τC)

Both T1 and T2 are intimately associated with molecular motion, characterised by the rotational correlation time, τc (i.e. the average time it takes the molecule to rotate through one radian). A long correlation time therefore describes slow motion (i.e. characteristic of a large molecule) and vice versa (Claridge 2009). The rate of motion dictates the frequency distribution of a fluctuating local field (the ‘spectral density’), and thus directly influences T1. However the more significant effect occurs upon T2, since at slow motion the averaging of local fields is less effective, causing loss of phase coherence (i.e. relaxation) over a shorter time period. This ultimately gives rise to broadened NMR signals, and complicates the study of larger (i.e. slower-tumbling) molecules using NMR spectroscopy.


2.3.6. NMR signal detection

So far this section has described how application of a short pulse of rf energy induces rotation of bulk magnetisation from the z-axis into the x-y plane, where individual spins precess at their characteristic frequencies whilst undergoing relaxation. This precession induces a current in the detection coil (a small coil of wire aligned with the x-axis). The current is amplified and recorded, yielding a free induction decay (FID); an oscillating wave which decays over time. This time domain data bears the overlapping resonance frequencies and amplitudes of multiple nuclei and is impossible to interpret manually except in the most basic of instances. De-convolution into an interpretable frequency domain spectrum is carried out via Fourier transformation (figure 2.6) (Keeler 2005).
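The conversion of an FID into a frequency domain spectrum can be sketched with a synthetic signal and a direct discrete Fourier transform. This is a minimal illustration, not the processing scheme used in this work: the two offset frequencies, amplitudes, T2 value and sampling parameters are all hypothetical, and a plain O(N²) transform is used in place of the fast Fourier transform employed by processing software.

```python
import cmath
import math

def synth_fid(components, n_points, dwell_s, t2_s):
    """Build a complex FID from (amplitude, offset_hz) pairs with exponential T2 decay."""
    fid = []
    for n in range(n_points):
        t = n * dwell_s
        decay = math.exp(-t / t2_s)
        signal = sum(a * cmath.exp(2j * math.pi * f * t) for a, f in components)
        fid.append(signal * decay)
    return fid

def dft_magnitude(fid):
    """Magnitude spectrum from a direct discrete Fourier transform of the FID."""
    n_points = len(fid)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * n / n_points) for n, s in enumerate(fid)))
        for k in range(n_points)
    ]

N_POINTS = 200
DWELL_S = 1e-3
fid = synth_fid([(1.0, 50.0), (0.5, 120.0)], N_POINTS, DWELL_S, t2_s=0.05)
spectrum = dft_magnitude(fid)
hz_per_point = 1.0 / (N_POINTS * DWELL_S)

# Position of the most intense peak among the positive-frequency points
strongest_bin = max(range(N_POINTS // 2), key=lambda k: spectrum[k])

if __name__ == "__main__":
    print(f"digital resolution: {hz_per_point:.1f} Hz per point")
    print(f"strongest peak at {strongest_bin * hz_per_point:.1f} Hz")
```

The overlapping oscillations in the time domain are resolved into two separate peaks at the input offsets, with relative heights reflecting the input amplitudes.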

Figure 2.6: time and frequency domains. Nuclear precession in the transverse plane is detected as a function of time, giving rise to free induction decay (FID), which is converted into the frequency domain via Fourier transformation.

2.3.7. Putting it together; pulse-acquire NMR

The simplest NMR experiment - commonly referred to as a 1D (one-dimensional) experiment - brings together the concepts described in this chapter. The pulse sequence, depicted in figure 2.7, comprises three stages; beginning with a delay to ensure complete accrual of equilibrium magnetisation. A 90° pulse then rotates magnetisation into the transverse plane, where it precesses and relaxes whilst data is acquired. The sequence is repeated numerous times and the FIDs combined in order to improve the signal-to-noise ratio (S:N), since signal and noise are respectively enhanced by factors of N and √N, where N is the number of repetitions (Keeler 2005).
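Since signal accumulates as N and noise as √N, the net S:N improvement scales as √N. The practical consequence for experiment time can be sketched as follows; the function names are illustrative.

```python
import math

def snr_gain(n_scans: int) -> float:
    """Relative S:N improvement from co-adding n_scans FIDs (signal ~ N, noise ~ sqrt(N))."""
    return n_scans / math.sqrt(n_scans)

def scans_for_gain(target_gain: float) -> int:
    """Number of scans needed to improve S:N by target_gain relative to a single scan."""
    return math.ceil(target_gain ** 2)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"{n:3d} scans -> S:N improved {snr_gain(n):.0f}-fold")
```

Doubling the S:N therefore requires four times as many scans, and hence four times the acquisition time.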


Figure 2.7: a schematic depiction of a 1D NMR experiment. The experiment consists of a single 90° rf pulse along the x-axis (step 2), followed by FID acquisition (step 3). The process is repeated multiple times, with a delay period (step 1) to allow relaxation to the steady-state.

Although this pulse sequence can be employed for any spin-½ nucleus, it yields spectra of poor sensitivity for the likes of 15N and 13C, due to their small gyromagnetic ratios (γH:γC ≈ 4, γH:γN ≈ 10). Thus, one-dimensional spectra of such nuclei are usually acquired using more complicated pulse schemes, in which high-γ 1H nuclei are excited before magnetisation transfer to the low-γ nucleus (e.g. INEPT; see chapter 2.6.1.3) (Keeler 2005).

2.4. Nuclear coupling

The summary of basic NMR concepts provided in chapter 2.3 suggests that a precessing magnetic moment gives rise to a solitary signal in the frequency spectrum. In reality the presence of surrounding nuclei causes signals to be split into several components; an effect known as ‘coupling’ (Hore 1995). Couplings can be eradicated, giving rise to singlet resonances, via implementation of ‘decoupling’ pulses.

2.4.1. Scalar coupling (and decoupling)

Scalar (or J-) coupling occurs through chemical bonds and gives rise to multiplets in spectra. This occurs because a neighbouring nucleus can adopt two possible spin states (assuming it is spin-½), which confer slight differences in local chemical environment, giving rise to distinct resonance frequencies. These in turn give rise to distinct signals in the spectrum, the frequency separation between which is known as the coupling constant, J. J-values are independent of magnetic field strength and thus are always quoted in units of Hz. The relationship between the magnitude of J and the frequency separation between the coupled nuclei, Δν, defines the precise appearance of a split signal. Strong couplings (Δν ≈ J) give rise to close multiplets, tending towards singlets in extreme cases, whereas weak couplings (Δν > J) give rise to widely-spaced multiplet components, which can be difficult to identify manually (Hore 1995).

Whilst coupling patterns can prove informative in the chemical shift assignment of small molecules, in proteins they tend to complicate spectra by increasing signal overlap. Furthermore, since the area under a signal is proportional to the number of nuclei which give rise to it, splitting spreads signal over a wider area and reduces the signal-to-noise ratio. These issues are exacerbated in isotopically-enriched molecules (i.e. 15N/13C-labelled proteins) since the additional spin-½ nuclei cause further splitting. Consequently, NMR experiments of proteins usually employ decoupling pulses (Keeler 2005). For example, broadband decoupling involves a series of pulses at the Larmor frequency of the coupled nucleus during detection, causing rapid transition between α and β. Neighbouring nuclei therefore experience an average local field, giving rise to a single signal.

2.4.2. Dipolar coupling

The dipolar interaction between neighbouring nuclei was introduced in chapter 2.3.5.1. In addition to stimulating relaxation, fluctuations in local fields also confer distinct resonance frequencies in neighbouring nuclei, and can therefore enhance or reduce the signal splitting mediated by scalar coupling. However, in solution isotropic tumbling averages dipolar couplings to zero. Partial alignment of the molecule (i.e. anisotropy) is effective in yielding residual dipolar couplings (RDCs), which are becoming increasingly fundamental to structural studies by NMR (Tolman et al. 1995).

2.5. The nuclear Overhauser effect (NOE)

A further consequence of dipolar coupling is the nuclear Overhauser effect (NOE), defined as the change in intensity of a resonance resulting from the excitation (i.e. α- to β-transition) of another nucleus via through-space magnetisation transfer (Kaiser 1962). This occurs via cross-relaxation between dipolar-coupled nuclei, and is therefore a distance-dependent interaction, proportional to r⁻⁶ (where r is the inter-nuclear distance), equating to a maximum distance of approximately 5 Å (Claridge 2009). Correlation of the resonances of proximate nuclei in this manner, and determination of approximate inter-nuclear distances, is fundamental to molecular structure determination (Cavanagh 2007).
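The r⁻⁶ dependence allows approximate distances to be estimated by comparison with a reference pair of known separation. The sketch below assumes the isolated spin-pair approximation, in which NOE intensity is taken to be directly proportional to r⁻⁶; the reference distance and intensities are hypothetical values chosen for illustration.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance_angstrom: float) -> float:
    """Estimate an inter-nuclear distance from relative NOE intensities.

    Assumes intensity is proportional to r**-6 (isolated spin-pair approximation),
    so that r = r_ref * (I_ref / I) ** (1/6).
    """
    return ref_distance_angstrom * (ref_intensity / intensity) ** (1.0 / 6.0)

if __name__ == "__main__":
    # Hypothetical reference: intensity 1.0 for a pair separated by 2.5 angstroms
    for relative_intensity in (1.0, 0.25, 1.0 / 64.0):
        r = noe_distance(relative_intensity, 1.0, 2.5)
        print(f"relative NOE {relative_intensity:.4f} -> approx. {r:.2f} angstroms")
```

Because of the sixth-root relationship, a 64-fold drop in intensity corresponds to only a doubling of distance, which illustrates why NOEs become undetectable beyond approximately 5 Å.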


The classical description of the origin of the NOE utilises a simplified system, consisting of two dipolar-coupled spin-½ nuclei, I and S (Hore 1995). These give rise to four distinct energy states; αIαS, αIβS, βIαS and βIβS, for which the Boltzmann distribution dictates respective populations of n + 2Δ (αIαS), n (αIβS and βIαS) and n – 2Δ (βIβS) at thermal equilibrium (figure 2.8A). Selective saturation of spin S via a continuous rf pulse results in equalisation of the population difference between its spin states (figure 2.8B). Re-establishment of thermal equilibrium is thus sought, and can occur via six possible relaxation pathways (figure 2.8Ci). Four of these correspond to single quantum transitions, characterised by the ‘transition probability’ W1^X, where X denotes spin I or S. W1^S transitions remain saturated by rf energy, whilst the population differences pertaining to W1^I transitions remain at equilibrium, hence neither transition plays a role in NOE observance. The NOE instead derives from the two ‘cross-relaxation’ pathways, termed W0 (zero-quantum; direct transition between equivalent energy levels) and W2 (double-quantum; direct transition across two energy levels), which involve simultaneous flipping of both I and S spins. (It should be noted that since electromagnetic radiation is neither emitted nor absorbed, these pathways are exempt from the Δm = ±1 selection rule.) This is enabled via correlation of the local field experienced by each spin, due to their dipolar interaction and the motion of the molecule.

The W2 pathway serves to increase the population differences across spin I transitions, and therefore enhances its resonance intensity, giving rise to a positive NOE (figure 2.8Cii). Contrarily, the W0 pathway decreases this population difference, reducing the resonance intensity and giving rise to a negative NOE (figure 2.8Ciii). These pathways compete, with the winner ultimately dictated by the rotational correlation time of the molecule. In a rapidly-tumbling molecule (<1,000 Da) the high-energy transition (W2) predominates, whilst the lower energy transition is stimulated in a slow-tumbling molecule (>2,000 Da) (Claridge 2009). Therefore molecules in the region 1,000-2,000 Da, where W2 ≈ W0, do not commonly exhibit detectable NOEs.

NOEs derived from the selective saturation of a spin are termed ‘steady-state’, due to the establishment of a new equilibrium between the competing relaxation pathways during the saturation period. Typically a series of spectra would be recorded, saturating a range of nuclei, and NOEs accumulated (Claridge 2009). However steady-state NOEs cannot be translated into accurate inter-nuclear distances, since cross-relaxation is ‘diluted’ by the influences of other neighbouring spins. Therefore it is more common to detect ‘transient’ NOEs, via experiments using non-selective pulses (i.e. enabling observation of all NOEs in a molecule in a single experiment) (Claridge 2009). This involves instantaneous rotation of magnetisation into the -z axis (i.e. inversion of spin populations), followed by a time period during which magnetisation is transferred between dipolar-coupled nuclei, prior to detection.

Figure 2.8: the origin of the NOE. A) Depicting the four energy levels for a system containing two dipolar-coupled spins, I and S, labelled with spin populations (blue) and population differences (red) at equilibrium. B) Following saturation of spin S, the population differences between its energy states are equalised, i.e. equilibrium is perturbed. C)i) Depicting the six possible relaxation pathways, including zero-quantum (W0), single-quantum (W1) and double-quantum (W2) transitions. The NOE occurs due to the cross-relaxation pathways, W0 and W2. ii) Cross-relaxation via W2 restores equilibrium to spin S, and increases the population difference of spin I, giving rise to a positive NOE. iii) Cross-relaxation via W0 restores equilibrium to spin S, but decreases the population difference of spin I, thus giving rise to a negative NOE.


2.6. Multi-dimensional NMR

A one-dimensional NMR spectrum contains all of the resonances of a particular nucleus (i.e. commonly 1H) in a single frequency domain. In large molecules (e.g. proteins), this causes substantial signal overlap and makes interpretation (e.g. chemical shift assignment) impossible. Detailed studies of proteins have been made possible via the development of multi-dimensional NMR experiments. Here, resonances of proximate nuclei are correlated via magnetisation transfer through scalar or dipolar coupling. This reduces overlap by spreading data across multiple frequency domains.

2.6.1. Two-dimensional techniques

2.6.1.1. Generation of extra dimensions

All two-dimensional (2D) experiments have the same basic format, consisting of four stages; preparation (P), evolution (E), mixing (M) and detection (D) (figure 2.9A) (Keeler 2005). The preparation stage involves perturbation of bulk z-magnetisation, via one or more pulses. Magnetisation is then allowed to precess (i.e. evolve) at its characteristic frequency during a time period t1, before a second pulsing stage (termed ‘mixing’) and detection during a time period t2. A second dimension is created by incrementally increasing the duration of the period t1 over a series of experiments; t1 = 0, t1 = Δ, t1 = 2Δ, etc. The effect of this can be explained using the vector model, via referral to the most basic 2D pulse sequence; comprising solitary 90° pulses in the preparation and mixing stages (figure 2.9B) (Aue et al. 1976).

In the case of a single uncoupled spin X, the initial 90° pulse places the magnetisation vector into the transverse plane, where it evolves at its offset frequency during t1. The application of a second 90° pulse rotates the y-component of the transverse magnetisation on to the -z axis, whilst the x-component continues to precess and is detected in the transverse plane (figure 2.9C). Thus if t1 = 0 then the two 90° pulses add up to 180°, placing magnetisation directly onto the -z axis, hence no transverse signal is obtained. As t1 is increased, x-magnetisation is allowed to develop, and thus incrementally precesses through 360°, passing through maxima (positive signal), null and minima (negative signal) positions with respect to the receiver coil. This ultimately gives rise to a series of 1D spectra in which the amplitude fluctuates as a function of time, i.e. an interferogram (figure 2.9D). During longer t1 durations, x-magnetisation undergoes relaxation, causing the amplitude of the interferogram to diminish. The frequency of fluctuation depends upon the precessional frequency of the nucleus, and thus the magnetisation is said to be ‘frequency-labelled’ as a function of t1. Fourier transformation of the two time domains gives rise to a pair of frequency domains, f1 and f2, which are combined to yield a 2D spectrum (figure 2.9E) (Keeler 2005).
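The amplitude modulation underlying the interferogram can be sketched for a single uncoupled spin, assuming that the x-magnetisation surviving the second 90° pulse varies as sin(Ωt1) and decays exponentially with T2. The offset, t1 increment and T2 values are hypothetical.

```python
import math

def t1_interferogram(offset_hz: float, increment_s: float, n_increments: int, t2_s: float):
    """Amplitude of the detected signal for each t1 increment (single uncoupled spin).

    Assumes the x-component present at the end of t1 is sin(2*pi*offset*t1),
    damped by transverse relaxation, exp(-t1/T2).
    """
    amplitudes = []
    for i in range(n_increments):
        t1 = i * increment_s
        amplitudes.append(math.sin(2 * math.pi * offset_hz * t1) * math.exp(-t1 / t2_s))
    return amplitudes

# 100 Hz offset sampled every quarter period (2.5 ms), with T2 = 50 ms
amps = t1_interferogram(offset_hz=100.0, increment_s=2.5e-3, n_increments=9, t2_s=0.05)

if __name__ == "__main__":
    for i, a in enumerate(amps):
        print(f"t1 = {i * 2.5:5.1f} ms: amplitude {a:+.3f}")
```

The amplitude is zero at t1 = 0, then passes through positive maximum, null and negative minimum positions with gradually diminishing magnitude, as described above. Fourier transformation of this series with respect to t1 yields the frequency in the indirect dimension.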

Figure 2.9: generating a second dimension in a 2D NMR experiment. A) the basic scheme of all 2D experiments. B) the most basic 2D pulse sequence. 90° pulses are represented by solid black bars, and are applied along the x-axis unless stated otherwise (Aue et al. 1976). C) following the initial 90° pulse, x-magnetisation evolves during t1 until application of the second 90° pulse. D) incrementation of t1 duration (starting at t1 = 0) modulates the intensity of the detected signal, giving rise to an interferogram. E) a diagram of an example 2D spectrum; such spectra are commonly represented as contour plots. In the case of a single uncoupled spin, X, the spectrum contains a single peak where the resonance frequency meets on either axis.


Of course, the described example of an uncoupled spin does not provide any information which could not be gained from a simple 1D experiment. The value of multi-dimensional NMR experiments derives from the ability to transfer magnetisation between coupled nuclei during the mixing period, thereby frequency-labelling a different nucleus, and giving rise to cross-peaks of correlated resonance frequencies in the spectrum (Keeler 2005). Therefore, the information yielded from a two-dimensional NMR experiment is largely dictated by the pulse sequence employed during the mixing period. During this project various 2D NMR experiments have been applied to the study of carbohydrate molecules.

2.6.1.2. Homonuclear resonance correlation

2.6.1.2.1. 1H-1H Correlation Spectroscopy (COSY)

The basic 2D pulse sequence depicted in figure 2.9B in fact belongs to the correlation spectroscopy (COSY) experiment (Aue et al. 1976). During the mixing period, consisting of a single 90° pulse, magnetisation is transferred between J-coupled nuclei. Therefore the experiment is capable of correlating the resonances of nuclei separated by two or three bonds (e.g. geminal and vicinal hydrogens).

2.6.1.2.2. 1H-1H Total Correlation Spectroscopy (TOCSY)

The pulse sequence of a basic 2D TOCSY experiment is closely related to that of the COSY, but with a mixing period comprising a ‘spin-lock’ (figure 2.10) (Bax & Davis 1985). This consists of a series of low power 180° (y) pulses, typically applied for up to 100 ms. These pulses serve to continually refocus chemical shifts so that all spins are ‘locked’ along the y-axis. This prevents chemical shift evolution and hence forces the system into the strong-coupling condition (Δν ≈ J; see chapter 2.4.1). Magnetisation can therefore be sequentially propagated along a chain of nuclei (a ‘spin-system’) via J-coupling, meaning that TOCSY experiments are capable of correlating the resonances of entire aliphatic amino acid side-chains or carbohydrate saccharide rings.


Figure 2.10: the pulse sequence of a basic 2D homonuclear 1H-1H TOCSY experiment. The sequence is very similar to that of the 2D 1H-1H COSY, but with a mixing period consisting of a spin-lock (a series of 180° y-axis pulses).

2.6.1.2.3. 1H-1H Nuclear Overhauser Effect Spectroscopy (NOESY)

The pulse sequence of a NOESY experiment is similar to those employed by the 2D TOCSY and COSY experiments. As described in chapter 2.5, NOE build-up is enabled via inversion of the spin populations (i.e. rotation of magnetisation into the -z axis), followed by a fixed delay period, τm, during which z-magnetisation is transferred between dipolar-coupled spins. A final 90° pulse places magnetisation back into the transverse plane for detection (figure 2.11) (Jeener et al. 1979).

Figure 2.11: the pulse sequence of a basic 2D NOESY experiment. The sequence is very similar to those of the 2D COSY and TOCSY, but with a mixing period consisting of a time delay, τm, intersecting two 90° pulses.

2.6.1.3. Heteronuclear resonance correlation

The most common 2D experiment involving correlation of heteronuclei is the HSQC (Heteronuclear Single Quantum Coherence), in which magnetisation is transferred via J-coupling between hydrogen and a heteronucleus (i.e. 15N or 13C) (Keeler 2005). As described in chapter 2.3.7, detection of 15N and 13C is inherently insensitive due to their small gyromagnetic ratios. Consequently experiments such as the HSQC utilise the INEPT (Insensitive Nuclei Enhanced by Polarisation Transfer) experiment, enabling excitation and detection of the higher sensitivity, frequency-labelled 1H nucleus (figure 2.12A) (Morris & Freeman 1979).

Figure 2.12: the pulse sequence of a HSQC experiment. A) The pulse sequence itself, with preparation (P), evolution (E), mixing (M) and detection (D) stages labelled. 90° and 180° pulses are represented by filled and empty bars respectively. Magnetisation is transferred from 1H to X (i.e. 15N or 13C) via INEPT, followed by frequency-labelling and transfer back to 1H (via reverse INEPT) for detection (Keeler 2005). B) demonstrating the manipulation of 1H magnetisation during INEPT, resulting in anti-phase alignment along the x-axis.

The HSQC pulse sequence contains a relatively complex preparation period, consisting of two 90° pulses separated by a time period, Δ. The first pulse rotates 1H magnetisation into the transverse plane, where the coupling and chemical shift evolve. After a period of Δ/2, simultaneous 180° (x) pulses on 1H and X (i.e. 13C/15N) refocus the chemical shift offset whilst the coupling continues to evolve (termed a “spin-echo”). This results in anti-phase magnetisation along the x-axis at the end of the period, provided that Δ = 1/(2JH-X) (figure 2.12B). Magnetisation is then transferred via simultaneous 90° 1H/X pulses, and frequency labelling of 13C/15N occurs during the incremented evolution period (t1). Magnetisation is then transferred back to 1H via a reverse INEPT sequence for detection (Keeler 2005). During detection, broadband decoupling is applied throughout t2, as described in chapter 2.4.1. Additionally, the coupling effect of 1H nuclei upon a directly-attached 15N nucleus is eradicated via a 180° 1H pulse during the evolution period. This inverts the 1H spin state, ensuring that 15N nuclei experience averaged local fields.
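As a rough numerical illustration of the INEPT delay condition, the short sketch below evaluates Δ = 1/(2J) for two one-bond couplings. The coupling constants used (approximately 92 Hz for 1H-15N and 140 Hz for 1H-13C) are typical literature values assumed here for illustration, not values measured in this work, and the function name is hypothetical.

```python
def inept_delay(j_hz: float) -> float:
    """Return the total INEPT delay, delta = 1/(2J), in seconds."""
    if j_hz <= 0:
        raise ValueError("coupling constant must be positive")
    return 1.0 / (2.0 * j_hz)


# Approximate one-bond coupling constants (assumed typical values, in Hz)
couplings = {"1H-15N": 92.0, "1H-13C": 140.0}

for pair, j in couplings.items():
    delta_ms = inept_delay(j) * 1e3
    # The 180 degree pulses are applied at the midpoint, delta/2
    print(f"{pair}: J = {j:.0f} Hz, delta = {delta_ms:.2f} ms, "
          f"delta/2 = {delta_ms / 2:.2f} ms")
```

In practice the delay is often set slightly shorter than this value to limit relaxation losses, so the numbers should be read as an upper guide rather than a prescription.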

2.6.2. Three-dimensional NMR

Three-dimensional NMR experiments adopt the same basic principles as 2D NMR, but with additional evolution and mixing periods prior to detection (t3), giving rise to two ‘indirectly-acquired’ interferograms (t1 and t2) (figure 2.13A). The two evolution periods are individually incremented, and the resulting time domains are Fourier transformed and combined, yielding a ‘3D box’ (figure 2.13B). 3D NMR experiments are routinely employed for resonance assignment of protein molecules, since they significantly reduce signal overlap compared to 2D spectra. Many backbone and side-chain resonances can be deciphered from ‘triple-resonance’ experiments, with additional side-chain and NOE data yielded from heteronuclear-edited data-sets (Cavanagh 2007).

Figure 2.13: Three-dimensional NMR spectroscopy. A) the general scheme for a 3D experiment. The additional evolution and mixing periods enable frequency labelling with two nuclei. B) the three frequency domains are combined to give a 3D box.

2.6.2.1. Triple-resonance

Triple-resonance NMR experiments usually correlate the resonance frequencies of three different nuclei (i.e. 1H, 15N and 13C). This involves rapid magnetisation transfer via J-couplings, commonly via the previously-described INEPT pulse sequence (figure 2.12A). Experiments are typically named after the nuclear resonances which are correlated, with nuclei through which magnetisation is only transferred denoted in brackets. For example, the CBCA(CO)NH experiment correlates Cβ, Cα, NH and HN resonances via magnetisation transfer through the carbonyl (Grzesiak & Bax 1992).

The magnetisation transfer schemes of various triple-resonance experiments are provided in figure 2.14. Typically pairs of experiments are used to elucidate resonances from sequentially connected residues. For example, the HNCO experiment correlates the NH/HN from residue i with the carbonyl from residue i-1, whilst the HN(CA)CO additionally correlates the residue i carbonyl (Kay et al. 1990). Thus it is possible to decipher the respective carbonyl resonances via comparison of the datasets. Cα and Cβ resonances can also be identified using a pair of analogous experiments: HNCACB (Wittekind & Mueller 1993) and CBCA(CO)NH (Grzesiak & Bax 1992), with further correlations enabled by (H)CC(CO)NH (Grzesiak et al. 1993) and HBHA(CO)NH (Grzesiak & Bax 1993) data. Together the experiments enable complete resonance assignment of a protein backbone.
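The pairing logic can be sketched in a few lines. The example below links spin systems by matching the residue i-1 Cα/Cβ shifts observed in a CBCA(CO)NH-type dataset against the intra-residue shifts from an HNCACB-type dataset. All peak values, the spin-system labels and the 0.2 ppm tolerance are hypothetical, and the sketch ignores complications such as glycine (no Cβ), the weaker i-1 peaks that also appear in HNCACB spectra, and degenerate shifts.

```python
from typing import Dict, List, Tuple

Shifts = Tuple[float, float]  # (C-alpha, C-beta) chemical shifts in ppm

# Hypothetical intra-residue shifts (residue i), as from an HNCACB spectrum
own: Dict[str, Shifts] = {
    "A": (58.2, 30.1),
    "B": (62.5, 69.8),
    "C": (54.9, 41.0),
}

# Hypothetical preceding-residue shifts (residue i-1), as from CBCA(CO)NH
prev: Dict[str, Shifts] = {
    "B": (58.3, 30.0),
    "C": (62.4, 69.9),
}


def find_sequential_links(own: Dict[str, Shifts],
                          prev: Dict[str, Shifts],
                          tol: float = 0.2) -> List[Tuple[str, str]]:
    """Return (predecessor, successor) pairs of spin systems.

    A spin system j is taken to precede spin system i when the i-1 shifts
    recorded for i match the intra-residue shifts of j within `tol` ppm.
    """
    links = []
    for successor, (ca_prev, cb_prev) in prev.items():
        for candidate, (ca, cb) in own.items():
            if candidate == successor:
                continue
            if abs(ca - ca_prev) <= tol and abs(cb - cb_prev) <= tol:
                links.append((candidate, successor))
    return links


print(find_sequential_links(own, prev))
```

With real data several candidates frequently match within tolerance, which is why additional experiments (e.g. the carbonyl pair) are used to resolve ambiguities.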

2.6.2.2. Heteronuclear-edited experiments

Additional chemical shift data is obtained from heteronuclear-edited experiments. These typically combine a homonuclear 2D step (such as those described in chapter 2.6.1.2) and a heteronuclear correlation (e.g. HSQC). For example, a NOESY-HSQC experiment involves resonance correlation between dipolar-coupled hydrogen nuclei, followed by J-coupling mediated magnetisation transfer to 15N or 13C (Marion et al. 1989; Fesik et al. 1989). Overlap of the NOE signals in a 2D NOESY spectrum is therefore reduced via dispersion across a third dimension. A similar principle is applied in the (H)CCH-TOCSY and H(C)CH-TOCSY experiments (Bax et al. 1990), which are commonly employed to carry out side-chain assignment.

Figure 2.14: magnetisation transfer during triple-resonance NMR experiments. Hα, Hβ, Cx (where x can denote α, β and beyond), NH and HN resonances are correlated using a combination of triple-resonance experiments. Nuclei whose resonances are ultimately detected are highlighted in red, and those through which magnetisation is passed but not detected are highlighted in yellow. In each experiment magnetisation is passed from NH onto HN for detection, and therefore each effectively gives rise to a 1H-15N HSQC spectrum with 13C (or 1H in the case of HBHA(CO)NH) resonances resolved along the third dimension.

2.7. Summary

NMR spectroscopy is a powerful technique which is commonly applied to study protein structure, dynamics and ligand binding. The method utilises the fundamental nuclear property ‘spin’, giving rise to a magnetic moment which can be ‘excited’ and manipulated via applied radiofrequency fields, and detected, ultimately yielding a frequency spectrum. Nuclei have characteristic resonance frequencies (i.e. chemical shifts) which provide information on local chemical environment and therefore structure. Through-bond and through-space resonance correlation is carried out using relatively complex pulse sequences, giving rise to multi-dimensional spectra which are fundamental to the study of large molecules.

Here, two- and three-dimensional NMR spectroscopy has been utilised for chemical shift assignment and structure determination of the T. gondii MIC4 A5 domain, and for the detailed study of its mode of binding to carbohydrate ligands. These studies are discussed in detail in chapters 4-6.

Chapter 3: Identification and production of a stable, adhesive fragment from TgMIC4-A56

3.1. Introduction

Prior to the onset of these studies, the DNA sequence encoding TgMIC4 residues 410-580 (encompassing putative domains A5 and A6) had been sub-cloned and expressed in bacterial cells. Protein purification yielded a reagent, hereafter known as TgMIC4-A56, which NMR analysis demonstrated to be partially folded, but capable of binding to galactosylated ligands. Limited proteolysis yielded a stable protein of approximately 12 kDa which retained lectin activity.

These studies aimed to provide new insights into the mode of host-receptor recognition and binding by TgMIC4. In order to achieve this, it was first necessary to determine the identity of the previously isolated protein fragment. To this end, TgMIC4-A56 protein expression and purification were repeated, followed by proteolysis and proteomic analysis of the resulting fragment. Additionally, the individual domains A5 and A6 were sub-cloned, expressed and purified, and their folded states and ligand-binding capabilities were analysed using NMR spectroscopy.

3.2. Considerations for MIC protein production

A protocol for the routine production of recombinant Apicomplexan microneme proteins has been established in our laboratory (Saouros et al. 2007). Protein expression is carried out in E. coli, a popular and widely-used expression host due to its fast growth rate and straightforward, efficient transformation and expression protocols. However, many MICs contain disulphide bonds. Heterologous expression of correctly folded disulphide-linked proteins, such as TgMIC4 apple domains, is often problematic in bacterial hosts (de Marco 2009), because the bacterial cytoplasm is maintained in a reduced state by thioredoxin and glutathione reductases (Bessette et al. 1999). Endogenous disulphide-linked proteins are folded in the oxidative periplasm (Rietsch et al. 1997). It would therefore be logical to direct recombinant polypeptides to this compartment; however, in practice this often results in low yields (de Marco 2009) and thus cytoplasmic expression is preferable.

Previous attempts at cytoplasmic MIC protein expression in widely-used E. coli strains such as BL21 (DE3) led to poor expression or low yields of functional proteins (Saouros et al. 2007). Consequently DNA sequences, including that encoding TgMIC4-A56, have been cloned into a specialised expression vector, and expressed in an E. coli strain adapted for enhanced disulphide formation in the cytoplasm.

3.2.1. pET-32 Xa/LIC vector

The pET-32 Xa/LIC vector (figure 3.1) is adapted for increased production of correctly-folded disulphide-linked proteins in the bacterial cytoplasm, via incorporation of an N-terminal His-tagged thioredoxin (trx) tag (figure 3.2). Thioredoxin is a small (~12 kDa) redox enzyme, catalysing thiol-disulphide exchange via two cysteine residues in its active site. Its presence is therefore aimed at promoting disulphide ‘shuffling’ in the fusion partner, in order to increase the yield of active protein. Additionally it has been observed that a thioredoxin tag can significantly improve solubility and yield of proteins which are otherwise insoluble when expressed in E. coli (LaVallie et al. 1993).

Figure 3.1: the pET-32 Xa/LIC vector map. The DNA sequence encoding thioredoxin (trxA) is positioned so as to precede a multiple-cloning site, which includes the LIC site. Additionally the vector possesses an ampicillin (Ap) resistance marker, the bla (β-lactamase) gene.

Figure 3.2: the features of a pET-32 Xa/LIC vector expression product. The protein of interest is inserted at the ‘Xa/LIC site’. The vector introduces N-terminal thioredoxin-, His-, and S-tags, removable via digestion using thrombin or factor Xa (Xa) proteases.

3.2.2. Origami cells

Expression of disulphide-linked proteins in the bacterial cytoplasm is commonly carried out using Origami™ B (DE3) E. coli cells. The notation “DE3” indicates that the cell is a lysogen of the λDE3 prophage, and thus harbours a chromosomal copy of the T7 RNA polymerase gene. This lies under the control of a lacUV5 promoter, whilst the pET-32 Xa/LIC plasmid harbours a hybrid T7/lac promoter. Expression from both promoters is tightly regulated by the lac repressor, LacI. Addition of IPTG displaces the repressor, permitting expression of T7 RNA polymerase, which in turn transcribes the gene of interest.

Origami™ B (DE3) cells are a derivative of the widely-used BL21 (DE3) strain, and thus are deficient in Lon and OmpT proteases, enhancing the stability of expressed proteins. They are further deficient in thioredoxin reductase (trxB) and glutathione reductase (gor), which lowers the reducing potential of the cell cytoplasm, favouring the formation of disulphide bonds (Derman et al. 1993; Prinz et al. 1997). The trxB and gor gene deletions are selectable via resistance to kanamycin and tetracycline, and Origami cell lines are therefore compatible with the β-lactamase-producing pET-32 Xa/LIC plasmid.

3.3. Materials & methods

3.3.1. Sequence analysis

A secondary structure prediction for TgMIC4-A56 was performed using PsiPred (Jones 1999). Estimated chemical and physical parameters (e.g. molecular mass, molar extinction coefficient) were derived using the ExPASy ProtParam server (Wilkins et al. 1999).

3.3.2. Gene cloning of TgMIC4-A5 and TgMIC4-A6

A recombinant pET-32 Xa/LIC DNA construct encoding TgMIC4-A56 was kindly provided by Dr Jan Marchant (Imperial College London). Additionally, pET-32 Xa/LIC constructs for individual expression of TgMIC4-A5 and TgMIC4-A6 were created, as described below.

3.3.2.1. Construct design

In order to be consistent with the previous studies of the TgMIC4-A56 construct, S410 was selected as the first residue of TgMIC4-A5. This enabled the incorporation of a substantial linker (9 amino acids) upstream of the first cysteine residue, which may have structural/functional significance. Similarly, the construct was designed to terminate following the three residues which separate TgMIC4-A5 and TgMIC4-A6 (residues 489-491). The next residue, C492, marks the onset of the sixth apple domain, hence the construct was terminated at this point in order to prevent possible interference with the disulphide-linkage pattern of TgMIC4-A5. Likewise, TgMIC4-A6 was designed to include residues 489-491 prior to its first cysteine. The construct was extended to the end of the protein, incorporating fifteen amino acids downstream of the final cysteine, as in TgMIC4-A56. These final residues are reported to be important for TgMIC4 host-cell binding (Brecht et al. 2001).

During the course of further studies of TgMIC4-A5 (described in chapters 4-6) a new plasmid was created, containing an additional sequence encoding a tobacco etch virus (TEV) protease digestion site adjacent to the LIC/FXa site. This change was economically motivated, since His-tagged TEV protease could be produced in-house.

3.3.2.2. Ligation-Independent cloning (LIC)

The design of the pET-32 Xa/LIC vector enables plasmid construction via ligation-independent cloning (LIC), in which PCR products are cloned directly into vectors without the need for restriction digests or ligation reactions (Aslanidis & De Jong 1990). The method uses a linearised vector with non-complementary, GTP-free overhangs of 15 nucleotides. PCR primers are designed to incorporate compatible sequences onto the ends of the target insert, and complementary overhangs are generated via the 3’→5’ exonuclease activity of T4 DNA polymerase (in the presence of dGTP) (figure 3.3). The annealed insert and vector are then transformed into competent E. coli cells where ligation occurs to yield a circular plasmid.

Figure 3.3: the ligation-independent cloning (LIC) strategy. 5’ and 3’ primers are designed with specific 15 nucleotide overhangs, which are thus incorporated into the PCR product. Treatment of the PCR product with T4 DNA polymerase in the presence of dGTP results in formation of sticky ends which are compatible with the linearised LIC plasmid.
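The generation of LIC overhangs can be illustrated with a simple model. In the presence of dGTP alone, the 3’→5’ exonuclease activity of T4 DNA polymerase is assumed to remove nucleotides from each 3’ end until the first G is reached, at which point re-incorporation of G balances its removal. The sketch below applies this rule to one end of a blunt duplex. The sequence is entirely hypothetical (it is not the pET-32 Xa/LIC primer extension), and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def chew_back_3prime(strand: str, stop_base: str = "G") -> str:
    """Trim a strand (written 5'->3') from its 3' end until it ends in stop_base."""
    trimmed = strand.upper()
    while trimmed and trimmed[-1] != stop_base:
        trimmed = trimmed[:-1]
    return trimmed


def five_prime_overhang(top: str) -> str:
    """Single-stranded 5' overhang left on the top strand of a blunt duplex.

    The bottom strand's 3' end (paired with the top strand's 5' end) is
    chewed back; the top-strand bases it no longer covers form the overhang.
    Only this one end of the duplex is modelled.
    """
    bottom = reverse_complement(top)
    removed = len(bottom) - len(chew_back_3prime(bottom))
    return top.upper()[:removed]


# Hypothetical PCR product end: a 15-nt C-free primer tail followed by insert
pcr_product = "GATGAGTTGATAGGACATGTCTGAA"
print(five_prime_overhang(pcr_product))
```

The model makes clear why the primer tail must lack C on the retained strand: the first C encountered (a G on the strand being digested) defines where the overhang ends.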

3.3.2.3. PCR and vector annealing

Template TgMIC4 cDNA was provided by Dr Jan Marchant. Oligonucleotide primers were designed manually and purchased from Invitrogen. PCR reactions were performed using a Mastercycler Gradient (Eppendorf) and Thermococcus kodakaraensis (KOD) DNA polymerase (Novagen). Details of all PCR reactions, including primer sequences, reaction mixtures, and protocols, are provided in appendix A1.

Reactions were analysed via agarose gel electrophoresis (appendix A3.1). Amplified inserts were purified using a QIAquick Gel Extraction kit (Qiagen), according to the manufacturer’s instructions.

In order to generate 5’ and 3’ overhangs for plasmid annealing, purified PCR products were treated with T4 DNA polymerase in the presence of dGTP (25 mM) and DTT (100 mM) (Novagen). Reaction mixtures were incubated at 22 °C for 30 minutes, after which the enzyme was inactivated via incubation at 75 °C for 20 minutes. T4 DNA polymerase-treated inserts were annealed to the pET-32 Xa/LIC vector via incubation of the two components at 22 °C for 5 minutes, followed by the addition of EDTA (25 mM) and incubation for a further 5 minutes.

3.3.2.4. Bacterial transformation & colony PCR

Bacterial transformation was initiated via addition of 1 μl of annealing reaction to 50 μl of NovaBlue GigaSingles™ competent E. coli cells. Cells were incubated on ice for 5 minutes, heat-shocked for 30 seconds at 42 °C, and further incubated on ice for 2 minutes, before addition of 250 μl SOC medium (Novagen) and gentle agitation at 37 °C for 60 minutes. Cells were then spread onto LB-agar (see appendix A2) plates containing carbenicillin (50 μg/ml) and incubated at 37 °C for 18 hours.

Bacterial transformants (i.e. colonies) were further assessed for the presence of inserted genes via PCR. Selected colonies were picked using a sterile tip and transferred to 50 μl sterile water (with a copy made by further touching the tip in 6 ml LB media containing carbenicillin). Cells were then lysed via vortexing and incubation at 100 °C for 5 minutes, before centrifugation at 10,000 RPM for 1 minute to remove cell debris. 10 μl of supernatant was used as a template for a PCR reaction, using Thermus aquaticus (Taq) DNA polymerase and primers corresponding to the target insert (see Appendix A1). Reactions were analysed via agarose gel electrophoresis.

3.3.2.5. Plasmid purification

Copies of positive bacterial transformants from colony PCR in 6 ml LB were incubated at 37 °C for 18 hours. Cells were then harvested via centrifugation at 4,000 RPM for 15 minutes. Plasmids were purified using a QIAprep Spin Miniprep kit (Qiagen), according to the manufacturer’s instructions. DNA sequence integrity was verified by Beckman Coulter Genomics (formerly Cogenics) or GATC Biotech. Forward and reverse reactions were carried out using primers to the vector S-tag and T7 terminator respectively.

3.3.3. Protein expression

Expression of TgMIC4-A56, TgMIC4-A5 and TgMIC4-A6 was carried out using identical methods, which are in general accordance with previously described protocols (Saouros et al. 2007).

3.3.3.1. Bacterial transformation

2-3 μl (approximately 200 ng) of plasmid was added to 50 μl chilled Origami™ B (DE3) E. coli cells (Novagen) in a sterile cryotube. Cells were incubated on ice for 30 minutes before heat-shock at 42 °C for 60 seconds. Cells were returned to ice for 5 minutes before addition of 300 μl LB media (see appendix A2) and gentle agitation at 37 °C for 60 minutes. 50 μl of culture was then spread onto an LB-agar plate containing carbenicillin (50 μg/ml), kanamycin (15 μg/ml) and tetracycline (12.5 μg/ml), which was incubated at 37 °C for 18 hours.

3.3.3.2. Expression trials

A fresh colony from a bacterial transformation was used to inoculate 6 ml LB medium containing antibiotics. Cultures were incubated at 37 °C and the optical density at 600 nm (OD600) monitored until it reached 0.6. A 1 ml sample was removed (the un-induced sample) before induction of protein expression via addition of isopropyl-β-D-thiogalactopyranoside (IPTG) (to a final concentration of 0.5 mM). Cultures were further incubated at 28 °C for 16-18 hours, before removal of 1 ml (the induced sample). Expression levels were assessed using the B-PER II Protein Extraction Reagent (Thermo Scientific) as per the manufacturer’s instructions. Briefly, chemical-induced cell lysis of the un-induced and induced samples was followed by separation of soluble and insoluble proteins via centrifugation. Samples were analysed via SDS-PAGE (see appendix A3.2).

3.3.3.3. Large-scale expression

A single colony from bacterial transformation was used to inoculate 50 ml LB medium containing antibiotics. This was incubated at 37 °C for 16-18 hours before inoculation of 1 L LB medium to an OD600 of 0.05-0.1. Cultures were incubated at 37 °C until an OD600 of 0.6 was observed, at which point expression was induced via addition of 0.5 mM IPTG. Cultures were incubated at 28 °C for a further 16-18 hours, after which cells were harvested via centrifugation at 4,000 RPM for 15 minutes. Cell pellets were then snap-frozen in dry ice or liquid nitrogen, and stored at -20 °C until protein purification.
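The volumes involved in inoculation and induction follow from the usual dilution relation, C1V1 = C2V2. The sketch below is a minimal worked example: the starter culture OD600 of 3.0 and the 1 M IPTG stock are assumed values chosen for illustration, and the small change in final volume on addition is ignored.

```python
def stock_volume(final_conc: float, final_volume: float,
                 stock_conc: float) -> float:
    """Volume of stock needed so that C1 * V1 = C2 * V2.

    final_conc and stock_conc must share one unit; the result is returned
    in the same unit as final_volume.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * final_volume / stock_conc


CULTURE_ML = 1000.0  # 1 L expression culture

# Starter culture needed for a starting OD600 of 0.05
# (overnight culture OD600 of 3.0 is an assumed value)
starter_ml = stock_volume(0.05, CULTURE_ML, 3.0)

# IPTG needed for 0.5 mM final from an assumed 1 M (1000 mM) stock
iptg_ml = stock_volume(0.5, CULTURE_ML, 1000.0)

print(f"Starter culture: {starter_ml:.1f} ml")
print(f"IPTG stock:      {iptg_ml:.2f} ml")
```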

The study of proteins via two- and three-dimensional NMR spectroscopy requires enrichment with 15N and/or 13C isotopes. In order to achieve this, expression was carried out in M9 minimal medium containing 0.07% (w/v) 15NH4Cl (see appendix A2). 1 L cultures were incubated at 37 °C and the OD595 monitored until reaching 0.8, at which point expression was induced via addition of 0.5 mM IPTG, followed by incubation at 28 °C for a further 16-18 hours.

3.3.4. Protein purification

Purification of TgMIC4-A56, TgMIC4-A5 and TgMIC4-A6 was carried out using largely identical methods, in general accordance with the previously described protocols (Saouros et al. 2007). Deviations from the standard protocol are highlighted and described.

3.3.4.1. Cell lysis and Ni2+-affinity chromatography

Cell pellets were thawed on ice and resuspended in 25 ml lysis buffer (appendix A2). Cells were supplemented with an EDTA-free protease inhibitor tablet (Roche) and lysozyme (1 mg/ml), before lysis via sonication on ice, using a Sonopuls HD 2200 (Bandelin) at 60% intensity for a net 4 minute time period. Cell lysate was centrifuged at 15,000 RPM for 30 minutes at 4 °C in order to remove unbroken cells and debris.

Cell supernatant was passed through a polypropylene column (Qiagen) containing 2 ml Ni-NTA resin (Generon), pre-equilibrated with 5 column volumes (CV) of lysis buffer. Flow-through was collected before washing with 6 CV wash buffer (containing 20 mM imidazole; see Appendix A2), and elution of Ni-bound TgMIC4-A56/trx fusion protein in 6 CV elution buffer (containing 200 mM imidazole). Collected fractions were analysed via sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) (see appendix A3.2).

3.3.4.2. Fusion protein digestion using factor Xa

Purified fusion proteins were dialysed for 18 hours at 4 °C against 4 L Factor Xa reaction buffer (appendix A2). Factor Xa (FXa) protease was then added (1-10 units (U) per mg of fusion protein) and the sample agitated at room temperature for 18 hours. Samples were removed at increasing time intervals for analysis via SDS-PAGE, before termination of the reaction via addition of 1 mM PMSF.

3.3.4.2.1. Using thrombin

TgMIC4-A6/trx samples were dialysed for 18 hours at 4 °C against 4 L Thrombin reaction buffer (appendix A2). Thrombin protease was added (1-10 U per mg of fusion protein) and the sample agitated at room temperature for 18 hours. The reaction was terminated via addition of PMSF.

3.3.4.2.2. Using TEV

In the case of TgMIC4-A5/trx bearing a TEV cleavage site, samples were dialysed for 18 hours at 4 °C against 4 L TEV reaction buffer (appendix A2). TEV protease was added at a ratio of 1:100 (protease:protein). The reaction was allowed to proceed for 18 hours.

3.3.4.3. Reverse Ni2+-affinity chromatography

Following proteolytic digestion, thioredoxin was removed via passage through a polypropylene column packed with a 2 ml bed volume of Ni-NTA resin, pre-equilibrated with 6 CV digestion reaction buffer. Proteins were collected in the flow-through, and the resin washed with a further 5 CV FXa buffer. Ni-bound trx was then eluted in 5 CV FXa buffer containing 250 mM imidazole. Collected fractions were analysed via SDS-PAGE.

3.3.4.4. Size-exclusion chromatography

Proteins were further purified via gel filtration, using a HiLoad™ 16/60 Superdex™ 75 column and an ÄKTA™ FPLC system (GE Healthcare). Samples were concentrated via centrifugation at 4,500 RPM and 4 °C, using Vivaspin 3 kDa MWCO concentrators (Generon), to <5 ml volume. The column was pre-equilibrated with 1.5 volumes of NMR buffer before sample injection and protein elution at a flow rate of 1 ml/min. Protein elution was monitored via UV absorbance at 280 nm and SDS-PAGE analysis.

3.3.4.5. Protein quantitation

Whenever necessary, protein quantitation was carried out using a NanoDrop 2000c (Thermo Scientific), using the molar extinction coefficient and molecular mass values calculated by ExPASy ProtParam.
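The conversion from absorbance to concentration rests on the Beer-Lambert law, A = εcl. The sketch below shows the arithmetic using the ProtParam-derived values reported for free TgMIC4-A5 in table 3.2 (ε = 7,825 M-1 cm-1; mass = 9,131 Da). The A280 reading of 0.50 and the 1 cm path length are hypothetical inputs chosen for illustration, and the function name is an assumption rather than part of any instrument software.

```python
from typing import Tuple


def concentration_from_a280(a280: float, ext_coeff: float,
                            mol_mass: float,
                            path_cm: float = 1.0) -> Tuple[float, float]:
    """Apply the Beer-Lambert law, A = epsilon * c * l.

    Returns (molar concentration in mol/L, mass concentration in mg/ml).
    ext_coeff is in M^-1 cm^-1 and mol_mass in Da (g/mol).
    """
    molar = a280 / (ext_coeff * path_cm)
    mg_per_ml = molar * mol_mass  # g/L is numerically equal to mg/ml
    return molar, mg_per_ml


# Values for free TgMIC4-A5 from table 3.2; the A280 reading is hypothetical
molar, mg_per_ml = concentration_from_a280(a280=0.50,
                                           ext_coeff=7825.0,
                                           mol_mass=9131.0)
print(f"{molar * 1e6:.1f} uM, {mg_per_ml:.3f} mg/ml")
```

Because the extinction coefficient of TgMIC4-A5 is comparatively low, small errors in the absorbance reading translate into relatively large errors in the estimated concentration.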

3.3.5. NMR analysis

Following gel filtration, samples were concentrated via centrifugation at 4 °C and 4,500 RPM, using 20 ml Vivaspin 3 kDa MWCO concentrators (Generon) to approximately 500 μl volumes, and supplemented with D2O (10% v/v), 0.1% (v/v) sodium azide and 1% (v/v) EDTA-free protease-inhibitor cocktail. Protein folded states were assessed via 1D and 2D NMR spectroscopy, using the experiments detailed in table 3.1. In some instances lectin activity was assessed via acquisition of a 1H-15N HSQC spectrum in the presence of 10 molar equivalents of galactose. All experiments were recorded on an in-house Bruker Avance II DRX500 spectrometer equipped with a TXI cryoprobe.

Experiment     Nuclei   Data points   Spectral width (Hz)   Larmor frequency (MHz)
1H             1H       8192          12503.899             500.20
1H-15N HSQC    1H       2048          6265.664              500.20
               15N      256           1520.681              50.68

Table 3.1: acquisition parameters for the NMR spectra recorded for assessment of folded states and lectin activity.
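For reference, the spectral widths in table 3.1 can be expressed in ppm by dividing the width in Hz by the Larmor frequency in MHz. The short sketch below performs this conversion on the tabulated values; the function and dictionary names are illustrative only.

```python
def hz_to_ppm(width_hz: float, larmor_mhz: float) -> float:
    """Convert a spectral width in Hz to ppm at a given Larmor frequency (MHz)."""
    return width_hz / larmor_mhz


# (spectral width in Hz, Larmor frequency in MHz), taken from table 3.1
dimensions = {
    "1D 1H": (12503.899, 500.20),
    "HSQC 1H": (6265.664, 500.20),
    "HSQC 15N": (1520.681, 50.68),
}

for name, (width_hz, larmor_mhz) in dimensions.items():
    print(f"{name}: {hz_to_ppm(width_hz, larmor_mhz):.2f} ppm")
```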

3.3.6. Identification of the folded fragment of TgMIC4-A56

3.3.6.1. Trypsin digestion

TgMIC4-A56 was diluted in trypsin reaction buffer (Appendix A2) to a 10 ml volume. To this was added bovine pancreatic trypsin pre-treated with TPCK (Sigma Aldrich), at a ratio of 1:15 protease:protein. The reaction was agitated at room temperature for 60 minutes before termination with 1 mM PMSF. The liberated protein was then purified via gel filtration as described above.

3.3.6.2. Proteomic analysis

N-terminal sequencing (via Edman degradation) was carried out by Dr Jeff Keen (University of Leeds). Peptide mass fingerprinting was carried out by Dr Paul Hitchen (Imperial College London).

3.4. Results

3.4.1. Sequence analysis

The secondary structure prediction of TgMIC4-A56 is pictured in figure 3.4. It predicts, with high confidence, two motifs in which an α-helix is followed by two β-strands, resembling the disulphide-linked α-β-β hairpin loop observed in all known PAN/apple domain structures (see chapter 1.4.2). Several additional β-strands are predicted in positions which are consistent with known apple/PAN domain structures. Residues outside of the putative domain boundaries (based upon the domain delineation in UniProtKB Q9XZY7) are predicted to form loop regions, save for a short β-strand at the C-terminus, although this is predicted with low confidence. Overall the prediction suggests that the selected domain boundaries for the recombinant TgMIC4-A56, TgMIC4-A5 and TgMIC4-A6 proteins are sensible, and do not truncate any structured regions. Estimated chemical and physical parameters for TgMIC4-A56, TgMIC4-A5 and TgMIC4-A6 are presented in table 3.2.

                                    Molecular mass (Da)      Extinction coefficient (M-1 cm-1)    pI
Construct         TgMIC4 residues   Free      Trx-fused      Free      Trx-fused                  Free    Trx-fused
TgMIC4-A56        410-580           18,883    36,123         25,160    39,265                     5.36    5.71
TgMIC4-A5         410-491           9,131     26,383         7,825     21,930                     8.23    6.45
TgMIC4-A5(TEV)    410-491*          9,200     27,234         7,825     23,420                     8.27    6.30
TgMIC4-A6         489-580           10,061    27,301         17,335    31,440                     4.33    5.13

*digestion with TEV leaves a residual glycine residue at the N-terminus of the liberated protein.

Table 3.2: the estimated physical and chemical properties of TgMIC4-A5, TgMIC4-A6 and TgMIC4-A56. Values were calculated using the ExPASy ProtParam server (Wilkins et al. 1999) and are provided for both free and trx-fused forms.

Figure 3.4: secondary structure prediction of TgMIC4 residues 410-580. As expected the prediction yields two signature PAN/apple-domain α-β-β hairpin loops, predicted with high confidence. Residues within the putative domain boundaries (based upon the UniProtKB domain delineation of TgMIC4 (Q9XZY7)) are highlighted in yellow. The secondary structure prediction was created using PsiPred (Jones 1999).

3.4.2. Identification of the stable adhesive fragment within TgMIC4-A56

3.4.2.1. Protein expression

Expression of the TgMIC4-A56/trx fusion protein was assessed via SDS-PAGE analysis of cell lysates from culture samples taken before and after induction with IPTG (figure 3.5). IPTG-induction results in the expression of a soluble protein of the expected size of ~36 kDa (figure 3.5; lane 4). An additional species is present at ~31 kDa; possibly a product of TgMIC4-A56/trx degradation. Whilst a good yield of soluble protein appears to have been obtained, there is also a significant level of concomitant insoluble protein expression (figure 3.5; lane 3). Nonetheless, expression was scaled up in order to obtain sufficient yields for NMR studies.

Figure 3.5: SDS-PAGE analysis of TgMIC4-A56/trx expression. M = molecular weight markers; masses of interest are labelled. Lane 1 = uninduced sample, insoluble fraction; lane 2 = uninduced sample, soluble fraction; lane 3 = induced sample, insoluble fraction; lane 4 = induced sample, soluble fraction. Expression products are arrowed.

3.4.2.2. Protein purification

The incorporation of a poly-Histidine tag at the C-terminus of thioredoxin in pET-32 Xa/LIC expression products enables purification via Ni2+-affinity chromatography. TgMIC4-A56/trx was successfully purified via incubation of cell lysate with Ni2+-resin, and washing of non-binding components before elution of the Ni2+-bound protein in a high-imidazole buffer (see figure 3.6A). Two predominant species eluted from the column, corresponding with the two apparent products of protein expression.

Following initial purification, the thioredoxin tag was removed via proteolysis. To this end, the pET-32 Xa/LIC vector incorporates a factor Xa (FXa) protease target site (IEGR↓) at the ligation-independent cloning site, and an additional thrombin protease target site (LVPR↓GS) downstream of the His-tag (figure 3.2). Digestion with FXa is preferable, as it avoids the retention of a substantial flexible linker at the N-terminus.
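The position of these canonical target sites within a fusion protein can be located with a simple motif search, sketched below. The sequence shown is a short hypothetical stand-in for a tagged fusion, not the actual pET-32 Xa/LIC expression product, and the scan considers only the two consensus motifs quoted above; it would not predict cleavage at secondary, non-canonical sites.

```python
import re
from typing import List, Tuple

# Consensus motifs; cleavage occurs immediately after the matched residues
SITES = {
    "Factor Xa": re.compile(r"IEGR"),       # IEGR | X
    "Thrombin": re.compile(r"LVPR(?=GS)"),  # LVPR | GS
}


def find_cleavage_sites(seq: str) -> List[Tuple[str, int]]:
    """Return (protease, n) pairs, where n residues lie N-terminal to the cut."""
    hits = []
    for protease, pattern in SITES.items():
        for match in pattern.finditer(seq.upper()):
            hits.append((protease, match.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical tagged fusion sequence, for illustration only
fusion = "MSDKHHHHHHSSGLVPRGSGMKETAAIEGRSPTLQ"

for protease, cut in find_cleavage_sites(fusion):
    print(f"{protease}: {fusion[:cut]} | {fusion[cut:]}")
```

In this toy sequence, cleavage at the thrombin site leaves the residues between the two motifs attached to the target, which illustrates why digestion at the site nearest the insert is preferred.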

Treatment of TgMIC4-A56/trx with FXa protease was successful in removing thioredoxin (figure 3.6B). The reaction generated a ‘ladder’ of protein fragments, indicative of digestion occurring at multiple sites. Nonetheless the sample contains several more prominent species, two of which correspond approximately to the molecular weights of TgMIC4-A56 (~20 kDa) and thioredoxin tag (~17 kDa). Additional lower molecular weight species, at ~12 and ~14 kDa were also yielded, potentially corresponding to stable degradation products of either TgMIC4-A56 or thioredoxin.

Figure 3.6: SDS-PAGE analysis of TgMIC4-A56 purification. M = molecular weight markers. The identities of relevant protein fragments are labelled. A) Ni2+ affinity chromatography. Lane 1 = cell lysate supernatant; lane 2 = column flow-through; lane 3 = 20 mM imidazole wash fraction; lane 4 = 200 mM imidazole elution fraction. B) Factor Xa digestion of TgMIC4-A56/trx. Lane 1 = prior to protease addition; lanes 2-5 = 2, 4, 6 and 18 hours after protease addition. The reaction generates multiple protein fragments, as indicated. C) Reverse Ni2+ affinity chromatography. Lane 1 = column flow-through; lane 2 = column wash fraction; lane 3 = 200 mM imidazole elution fraction; lane 4 = pooled and concentrated flow-through and wash fractions (i.e. containing purified TgMIC4-A56). “Trx*” denotes a product of trx degradation.

Factor Xa digestion of TgMIC4-A56/trx liberated un-tagged TgMIC4-A56 from His-tagged trx. Trx was therefore removed via passage through a Ni2+ resin column (figure 3.6C), with TgMIC4-A56 collected in the flow-through and wash fractions (lanes 1 and 2). Following advice from colleagues with previous experience of pET-32 Xa/LIC constructs, the sample was passed through the column three or four times, in order to ensure complete removal of trx. Trx was retained on the Ni2+ resin, alongside the numerous contaminant proteins from FXa digestion (figure 3.6C; lane 3), including the aforementioned ~12 and ~14 kDa proteins. The apparent presence of a His-tag suggests that these represent degradation products of the linker at the trx C-terminus (i.e. the S-tag; see figure 3.2).

Due to the relatively high purity of the protein, and the fact that further purification steps were planned (i.e. trypsin digestion), gel filtration was not deemed to be necessary at this stage. Instead, the protein was buffer-exchanged and concentrated in preparation for NMR analysis.

3.4.2.3. NMR analysis of TgMIC4-A56

In order to comprehensively demonstrate the partially unfolded nature of TgMIC4-A56, a 15N-labelled sample was used to record 1H and 1H-15N HSQC spectra (figure 3.7). NMR spectroscopy is an effective technique for assessing protein folding (Hoffmann et al. 2005). Unfolded proteins are highly dynamic and undergo rapid exchange between random conformations. As a result, hydrogen nuclei experience an array of chemical environments, ultimately giving rise to high-intensity averaged signals. These averages are largely invariant, causing substantial overlap; amide-hydrogen resonances cluster at 7.5-8.5 ppm and methyl protons at ~1 ppm.

Whilst folded globular proteins also exhibit dynamic behaviour, they commonly exhibit a single minimum-energy conformation, in which nuclei experience a single chemical environment. Nuclear shielding or de-shielding caused by proximal functional groups can often give rise to unique chemical environments (Hoffmann et al. 2005). For example, in the presence of an applied magnetic field a ‘ring current’ is induced in the delocalised π-electrons of an aromatic ring (Gomes 2001). Hydrogen nuclei which lie above or below the plane of the aromatic ring are shielded by this electron density, and consequently they resonate up-field of their typical chemical shift (e.g. ring-current shifted methyl protons often resonate <0 ppm). Meanwhile amide protons which are involved in hydrogen-bonding (e.g. those in secondary structure) experience de-shielding, and as a consequence resonate down-field of their typical chemical shift (i.e. >9 ppm). These induced effects give rise to a wide dispersion of chemical shifts; a characteristic feature of a folded protein (Hore 1995).


The NMR spectra of TgMIC4-A56, pictured in figure 3.7, bear the hallmark features of both folded and unfolded protein. The 1H spectrum (figure 3.7A) bears multiple signals above 9 ppm, alongside several resonances between 5 and 6 ppm, where Hα nuclei in β-sheet conformation (i.e. secondary structure) typically resonate. However, there are multiple overlapping signals at 7.5-8.5 ppm and 1 ppm, which respectively represent amide and methyl protons from unfolded protein. Furthermore, the ‘structured’ peaks are rather broad, indicative of protein aggregation/self-association; high-molecular weight species possess longer rotational correlation times, causing enhanced signal relaxation and therefore broadened signals. This may cause the total disappearance of additional structured signals of TgMIC4-A56.

The partially-folded nature of TgMIC4-A56 is perhaps best encapsulated by its 1H-15N HSQC spectrum (figure 3.7B). This experiment provides an amide ‘fingerprint’, correlating the resonances of J-coupled hydrogen and nitrogen nuclei. The spectrum should therefore contain approximately one peak per residue; one per backbone amide (proline excepted) plus additional side-chain resonances. The 1H-15N HSQC spectrum of TgMIC4-A56 bears some discernible signals, but not enough to account for a fully-folded protein of 170 residues. Additional signals are overlapped in the centre of the spectrum, representative of unfolded protein.
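The ‘approximately one peak per residue’ estimate can be made explicit with a short calculation. The sketch below is illustrative only and was not part of the original analysis. It assumes one backbone amide peak per non-proline residue (excluding the N-terminal residue, whose amine is not normally observed), one peak per tryptophan indole NH and two peaks per asparagine or glutamine side-chain NH2 group; arginine, lysine and histidine side chains are ignored. The function name and these counting rules are assumptions.

```python
def expected_hsqc_peaks(sequence: str) -> dict:
    """Rough estimate of the 1H-15N HSQC peak count for a protein sequence."""
    seq = sequence.upper()
    # Backbone amides: every residue except prolines and the N-terminal residue.
    backbone = sum(1 for i, aa in enumerate(seq) if i > 0 and aa != "P")
    # Side chains (assumed rules): Trp indole NH = 1 peak, Asn/Gln NH2 = 2 peaks.
    side_chain = seq.count("W") + 2 * (seq.count("N") + seq.count("Q"))
    return {"backbone": backbone, "side_chain": side_chain,
            "total": backbone + side_chain}
```

Comparing the estimated total with the number of resolved peaks gives a quick indication of how much of the sequence contributes dispersed signals.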


Figure 3.7: NMR analysis of TgMIC4-A56. Spectra were acquired at a protein concentration of ~200 µM. A) the 1H NMR spectrum. B) the 1H-15N HSQC spectrum. Both spectra bear the diagnostic features of folded and unfolded protein, demonstrating that TgMIC4-A56 is partially-folded. The apparent broadening of signals further suggests that the protein may be self-associated.


3.4.2.4. Isolation of the stable, folded protein

In order to remove the unfolded part of the protein, TgMIC4-A56 was treated with trypsin protease. SDS-PAGE analysis revealed that the protein had undergone a degree of spontaneous degradation prior to trypsin addition (figure 3.8A; lane 1), demonstrating the intrinsic instability of the protein. Degradation was accelerated by the addition of trypsin, yielding a stable species of approximately 13 kDa (figure 3.8A; lane 9). The relative molecular weights of the digested and undigested species are consistent with the removal of the majority of an apple domain. Following termination of the digestion reaction, the ~13 kDa protein was purified via gel filtration (figure 3.9). The protein eluted within a pair of broad peaks, suggesting that several distinct species are present. Fractions containing the protein were pooled and concentrated for NMR and proteomic analysis.

Figure 3.8: SDS-PAGE analysis of the trypsin digestion reaction. A sample was taken prior to protease addition (lane 1) and then 2.5, 5, 10, 15, 20, 30, 40, 50 and 60 minutes afterwards (lanes 2-9). The reaction yields a stable protein fragment of approximately 14 kDa, via an intermediate species of approximately 17 kDa.


Figure 3.9: the gel filtration profile of trypsin-digested TgMIC4-A56. Gel filtration was carried out using a HiPrep S75 16/60 column of 120 ml volume (with a void volume of ~40 ml). The products of TgMIC4-A56 trypsin digestion elute across three peaks. SDS-PAGE analysis (gel inlay), under reducing conditions, shows that the stable fragment elutes predominantly in peak A. The apparent absence of proteins in peak C, coupled with its late elution, suggests that it contains the small cleaved peptides from the trypsin digestion reaction.

3.4.2.5. Identification of the stable protein via proteomic analysis

In order to identify the stable product of trypsin digestion, samples were submitted for proteomic analysis. N-terminal sequencing (via Edman degradation) identified the N-terminus of A5 (i.e. S410-P411-D412-F413-H414), suggesting that trypsin acts upon the C-terminus. Consistent with this, peptide mass fingerprinting (via MALDI-TOF mass spectrometry) identified several peptides from A5 along with a peptide corresponding to the A6 C-terminus (figure 3.10 and appendix A5).

Together, these findings demonstrate that the stable component of TgMIC4-A56 corresponds to TgMIC4-A5. A C-terminal peptide is retained, presumably as a result of a disulphide linkage involving C565. The C-terminus of the TgMIC4-A5 fragment has not been verified, but it is likely to extend to the first trypsin target site in the TgMIC4-A6 sequence (R493), thereby incorporating C491. Although TgMIC4-A6 appears to be unfolded, it is possible that C491 and C565 form a disulphide linkage, as expected from conventional apple domain disulphide linkage patterns (chapter 1.4.2).

410 SPDFHDEVEC VHTGNIGSKA QTIGEVKRAS SLSECRARCQ AEKECSHYTY NVKSGLCYPK

470 RGKPQFYKYL GDMTGSRTCD TSCLRRGVDY SQGPEVGKPW YSTLPTDCQV ACDAEDACLV

530 FTWDSATSRC YLIGSGFSAH RRNDVDGVVS GPYTFCDNGE NLQVLEAKDT E

Figure 3.10: peptide mass fingerprinting of trypsin-digested TgMIC4-A56. The sequence of TgMIC4 residues 410-580 is provided, with TgMIC4-A5 and TgMIC4-A6 coloured in green and blue respectively, and linker sequences coloured in grey. Potential trypsin digestion sites are highlighted in bold typeface. Sequences which have been successfully identified via N-terminal sequencing or mass fingerprinting are underlined (for selected MALDI-TOF data see Appendix A5).
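The potential trypsin sites and the peptide masses expected in a fingerprinting experiment can be predicted from the sequence alone. The following is a minimal sketch, not the procedure used for the proteomic analysis. It assumes the usual rule that trypsin cleaves C-terminal to lysine or arginine unless the next residue is proline, complete digestion (no missed cleavages), and linear, unmodified peptides with reduced cysteines. The function names are hypothetical; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence, first_residue=1):
    """Cleave C-terminal to K/R unless the next residue is P.

    Returns (start, end, peptide) tuples using residue numbers that
    begin at first_residue.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        last = i == len(sequence) - 1
        if last or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append((first_residue + start, first_residue + i,
                             sequence[start:i + 1]))
            start = i + 1
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
```

Applied to the sequence in figure 3.10 with first_residue=410, the function lists candidate peptides whose [M+H]+ values could be compared against MALDI-TOF peaks. Peptides held together by a disulphide bond would not appear at these masses under non-reducing conditions.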

3.4.2.6. NMR analysis of digested TgMIC4-A56

The removal of the majority of the TgMIC4-A6 sequence results in a significantly improved 1H-15N HSQC spectrum (figure 3.11). Much of the overlapping signal from flexible protein present prior to digestion (figure 3.7B) is now absent, with the remainder presumed to represent the residual TgMIC4-A6 residues. Additionally, the spectrum bears an increased number of discernible, well-dispersed signals, whose narrower line-widths suggest that the digested protein is less prone to self-association. The spectrum contains a larger-than-expected number of signals for a single apple domain, consistent with the presence of a retained portion of TgMIC4-A6.

As discussed in chapter 1.5, previous carbohydrate microarray analysis of undigested TgMIC4-A56 has revealed that it adheres to terminally-galactosylated oligosaccharides. In order to demonstrate that the liberated TgMIC4-A56 fragment retains lectin activity, the 1H-15N HSQC spectrum was acquired following addition of ten molar equivalents of galactose to the sample (figure 3.12). NMR provides a simple and effective method for detecting protein/ligand interactions. Upon ligand binding, the chemical environments of the atoms within (and surrounding) the binding interface will be altered; this is usually conveyed by a change in chemical shift. Thus, the observation of multiple chemical shift changes in the 1H-15N HSQC spectrum of digested TgMIC4-A56 following the addition of galactose is indicative of an interaction.
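Chemical shift changes of this kind are commonly summarised as a single combined value per peak. The sketch below illustrates one widely used form, in which the 15N change is down-weighted before being combined with the 1H change. The weighting factor (0.14), the threshold (0.05 ppm) and the function names are assumptions chosen for illustration; they are not values taken from this work.

```python
import math


def combined_csp(d_h, d_n, n_scale=0.14):
    """Combined 1H/15N shift change (ppm); n_scale down-weights the 15N change."""
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)


def perturbed_peaks(free, bound, threshold=0.05):
    """free/bound map a peak label to (1H ppm, 15N ppm).

    Returns {label: combined shift change} for peaks above the threshold.
    """
    hits = {}
    for label, (h0, n0) in free.items():
        if label not in bound:
            continue  # peak not tracked in the ligand-bound spectrum
        h1, n1 = bound[label]
        csp = combined_csp(h1 - h0, n1 - n0)
        if csp > threshold:
            hits[label] = csp
    return hits
```

Peaks that exceed the threshold are candidates for residues within, or close to, the binding interface.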

Figure 3.11: The 1H-15N HSQC spectrum of trypsin-digested TgMIC4-A56. The spectrum was acquired at a protein concentration of ~175 µM. Much of the unfolded protein within TgMIC4-A56 appears to have been removed, resulting in a much improved NMR spectrum compared with that depicted in figure 3.7B.


Figure 3.12: demonstrating the galactose-binding ability of digested TgMIC4-A56. Superimposed 1H-15N HSQC spectra of digested TgMIC4-A56 before (black) and after (red) the addition of ten molar equivalents of galactose. Spectra were acquired at a protein concentration of ~175 µM. The observation of clear differences between the overlaid spectra demonstrates that an interaction occurs.

3.4.3. Initial characterisation of recombinant TgMIC4-A5 and TgMIC4-A6

The work detailed in section 3.4.2 demonstrates that recombinant TgMIC4-A56 contains an unfolded A6 domain alongside a folded A5 domain with lectin activity. In order to study this activity, recombinant TgMIC4-A5 was produced, with a view to solution structure determination and functional analysis. Recombinant production of TgMIC4-A6 was simultaneously carried out in order to assess its ability to fold in isolation and, if possible, characterise its function.


3.4.3.1. Gene cloning

The desired TgMIC4 DNA sequences were successfully amplified via PCR, as verified via agarose gel electrophoresis (figure 3.13A). Inserts were then cloned into pET-32 Xa/LIC and transformed into NovaBlue GigaSingles™ E. coli cells with high efficiency. In all tested colonies the presence of the insert was verified by colony PCR (figure 3.13B) and sequence integrity verified (figure 3.13C).

Figure 3.13: sub-cloning of TgMIC4-A5 and TgMIC4-A6. A) Agarose gel electrophoresis of PCR products. M = molecular weight marker; relevant weights are indicated. Both fragments run at a weight of approximately 300 base pairs. B) Agarose gel electrophoresis of colony PCR products, verifying the presence of the genes of interest. C) An example chromatogram from the sequencing reaction of a TgMIC4-A5 construct. The depicted portion encodes TgMIC4-A5 residues 419-426.

3.4.3.2. Protein expression

Expression levels of TgMIC4-A5 and TgMIC4-A6 constructs were examined via SDS-PAGE (figure 3.14). In each case IPTG induction resulted in the expression of a protein of approximately 31 kDa, i.e. slightly larger than expected of either protein. In both cases expression of insoluble protein is also observed, most substantially in the case of TgMIC4-A6/trx (figure 3.14B; lane 3). Nonetheless, due to the significant level of concomitant soluble protein production, expression was scaled-up to 1 L culture volumes, in order to obtain sufficient yields for NMR studies.

Figure 3.14: SDS-PAGE analysis of TgMIC4-A5/trx (A) & TgMIC4-A6/trx (B) expression. M = molecular weight markers; masses of interest are labelled. Lane 1 = uninduced sample, insoluble fraction; lane 2 = uninduced sample, soluble fraction; lane 3 = induced sample, insoluble fraction; lane 4 = induced sample, soluble fraction. Expression products are arrowed.

3.4.3.3. Protein purification

3.4.3.3.1. Purification of TgMIC4-A5

TgMIC4-A5/trx was successfully purified from cell lysate via Ni2+-affinity chromatography (figure 3.15A), and digested using FXa (figure 3.15B). As for TgMIC4-A56/trx, the digestion reaction generated a ladder of trx fragments. Liberated TgMIC4-A5 was recovered at high purity (figure 3.15C; lanes 2 & 3); however, gel filtration was nonetheless performed to ensure complete removal of low-level contaminants and to verify the oligomeric state of the protein (figure 3.15D). TgMIC4-A5 eluted as a single major peak, at an elution volume consistent with a 9 kDa monomer.


Figure 3.15: purification of TgMIC4-A5. M = molecular weight markers; relevant masses are indicated. A) Ni2+-affinity chromatography. Lane 1 = column flow-through; lane 2 = 5 mM imidazole wash fraction; lane 3 = 20 mM imidazole wash fraction; lane 4 = 200 mM imidazole elution fraction. B) Factor Xa digestion of TgMIC4-A5/trx. Lane 1 = prior to protease addition; lanes 2-5 = 2, 4, 6 and 18 hours after protease addition. The reaction generates multiple protein fragments, as indicated. C) Reverse Ni2+-affinity chromatography. Lane 1 = 200 mM imidazole elution fraction; lane 2 = column flow-through fraction; lane 3 = column wash fraction. The identities of the separated products are indicated. D) The gel filtration profile of TgMIC4-A5. Gel filtration was carried out using a HiPrep S75 16/60 column of 120 ml volume (with a void volume of ~40 ml). The protein eluted at 87.5 ml, as verified by SDS-PAGE of pooled and concentrated fractions (see gel inlay).


During the course of these studies the decision was taken to incorporate a TEV protease digestion site (ENLYFQ↓G) into the TgMIC4-A5 construct. Initially this was merely an economically-motivated move, since recombinant TEV protease can be produced in-house. However the increased specificity of TEV over FXa also results in a ‘cleaner’ digestion, with no secondary products produced (see figure 3.16). The liberated TgMIC4-A5 domain was recovered via reverse Ni2+-affinity chromatography and gel filtration, as for unmodified TgMIC4-A5 (figure 3.15).

Figure 3.16: digestion of TgMIC4-A5/trx fusion protein with TEV protease. M = molecular weight markers. Lane 1 = prior to protease addition; lanes 2-5 = 2, 4 and 18 hours after protease addition. Liberated protein products are labelled.

3.4.3.3.2. Purification of TgMIC4-A6

TgMIC4-A6/trx purification from cell lysate via Ni2+-affinity chromatography yielded the expressed protein alongside multiple products of lower mass (figure 3.17A). These are likely to be products of degradation, indicating that TgMIC4-A6 is present in an unstable form. In spite of this, digestion of TgMIC4-A6/trx with FXa was highly inefficient (data not shown), and the thrombin cleavage site was therefore used instead. Consequently a substantial linker was retained at the TgMIC4-A6 N-terminus, giving rise to TgMIC4-A6 and trx fragments of similar size (~14 kDa) (figure 3.17B). The failure of FXa to digest an apparently unstable protein was somewhat surprising, but has been observed previously in our laboratory (Dr Jan Marchant, personal communication). Subsequent purification steps (described below) suggest that the protein is prone to aggregation, hence it is possible that this leads to occlusion of the protease from its target site (which lies only three residues from the domain boundary).


Figure 3.17: purification of TgMIC4-A6. M = molecular weight markers; relevant masses are indicated. A) Ni2+-affinity chromatography. Lane 1 = column flow-through; lane 2 = 5 mM imidazole wash fraction; lane 3 = 20 mM imidazole wash fraction; lane 4 = 200 mM imidazole elution fraction. B) Thrombin digestion. Lane 1 = prior to protease addition; lanes 2-4 = 2, 4 and 6 hours after protease addition. The reaction generates two distinct protein products. C) Reverse Ni2+-affinity chromatography. Lane 1 = 200 mM imidazole elution fraction; lane 2 = column flow-through fraction; lane 3 = column wash fraction. The identities of the separated products are indicated. TgMIC4-A6 runs significantly higher than its calculated molecular weight of ~13.3 kDa. D) The gel filtration profile of TgMIC4-A6. Gel filtration was carried out using a HiPrep S75 16/60 column of 120 ml volume (with a void volume of ~40 ml). The protein eluted at 65.9 ml as verified by SDS-PAGE of pooled, concentrated fractions (see gel inlay).


Liberated TgMIC4-A6 was recovered via reverse Ni2+-affinity chromatography and purified via gel filtration (figure 3.17C & D). Concentration of the protein prior to gel filtration appears to promote aggregation, with the gel filtration profile containing a substantial void-volume peak (i.e. high molecular weight species which are too large to enter the stationary phase). Additionally, TgMIC4-A6 elutes in a single major peak, at an elution volume expected of a ~30 kDa protein. The presence and stability of TgMIC4-A6 aggregates is demonstrated concisely by SDS-PAGE analysis of the corresponding (concentrated) fractions, in which several higher-order species are present despite the presence of ample DTT and SDS in the sample buffer (figure 3.17D; gel inlay).
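Elution volumes from columns of this type are conventionally compared via the partition coefficient, Kav = (Ve - V0)/(Vt - V0), where smaller values correspond to larger apparent size. The sketch below applies this to the elution volumes quoted for TgMIC4-A5 (87.5 ml) and TgMIC4-A6 (65.9 ml), using the void volume (~40 ml) and column volume (120 ml) given in the figure legends. It is an illustration only; converting Kav into a molecular weight would additionally require a calibration curve from protein standards, which is not reproduced here.

```python
def partition_coefficient(v_elution, v_void=40.0, v_total=120.0):
    """Kav = (Ve - V0) / (Vt - V0) for a gel filtration column (volumes in ml)."""
    return (v_elution - v_void) / (v_total - v_void)


kav_a5 = partition_coefficient(87.5)  # TgMIC4-A5 elution volume
kav_a6 = partition_coefficient(65.9)  # TgMIC4-A6 elution volume
```

The lower value obtained for TgMIC4-A6 is consistent with its earlier elution and larger apparent size.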

3.4.3.4. NMR analysis

3.4.3.4.1. NMR analysis of 15N-TgMIC4-A5

The 1H spectrum of TgMIC4-A5 (figure 3.18A) bears the characteristics of a folded protein. There is one particularly prominent ring current-shifted signal, at 0 ppm, and several discernible Hα resonances between 5 and 6 ppm. The protein amides also bear a wide dispersion of resonances; best represented in the 1H-15N HSQC spectrum in figure 3.18B. This spectrum contains a total of 89 peaks; a sensible figure for a protein of 82 residues including three prolines. The spectrum bears a particularly unusual resonance, at 1H = 11.6 ppm, which was not observable in the NMR spectra of TgMIC4-A56. The protein therefore appears to be folded and mono-disperse, and amenable to solution structure determination.

3.4.3.4.2. NMR analysis of 15N-TgMIC4-A6

The 1H and 1H-15N HSQC spectra of TgMIC4-A6 (figure 3.19) portray a sample which predominantly contains unfolded protein. Although the 1H spectrum (figure 3.19A) bears a clear ring current-shifted signal, at approximately -0.5 ppm, and the 1H-15N HSQC spectrum (figure 3.19B) possesses signals at 10.1 ppm, overall resonance dispersion is poor, with large clusters of intense signals observed at 7.7-8.7 ppm and at 1 ppm. The presence of the substantial flexible linker at the protein N-terminus means that observation of unfolded protein should be expected; however, the intensity of these signals coupled with the lack of structured features suggests that the domain itself is predominantly unfolded. The few observed ‘structured’ signals are relatively broad, consistent with the apparent self-associative nature of the protein.


Figure 3.18: NMR analysis of TgMIC4-A5. A) the 1H spectrum. B) the 1H-15N HSQC spectrum. The wide dispersion of resonances, extending >9 ppm and <1 ppm, indicates that structured protein is present. The spectra bear very little signal overlap, suggesting that the protein is fully folded.


Figure 3.19: NMR analysis of TgMIC4-A6. Spectra were acquired at a protein concentration of ~200 µM. A) the 1H spectrum. B) the 1H-15N HSQC spectrum. The spectra portray a protein which is predominantly unfolded and aggregated.


3.4.3.4.3. Comparison of TgMIC4-A5 and TgMIC4-A6 with TgMIC4-A56

A comparison of the 1H-15N HSQC spectra of TgMIC4-A5 and TgMIC4-A56 is provided in figure 3.20. As expected, almost all of the ‘structured’ resonances from TgMIC4-A56 are retained in TgMIC4-A5, confirming that it comprises the folded, galactose-adhesive portion of TgMIC4-A56. Additionally, the clustered, overlapping (i.e. ‘unfolded’) resonances of TgMIC4-A56 are absent in TgMIC4-A5, reaffirming that it is the TgMIC4-A6 sequence which gives rise to these signals in TgMIC4-A56.

Figure 3.20: Comparison of TgMIC4-A5 and TgMIC4-A56. Superimposed 1H-15N HSQC spectra of TgMIC4-A56 (black) and TgMIC4-A5 (red). Both spectra were recorded in identical conditions (303K, pH 6.5). Most of the structured resonances in digested TgMIC4-A56 are accounted for by TgMIC4-A5.

Interestingly, there is a small number of ‘structured’ resonances in TgMIC4-A56 which are not attributable to TgMIC4-A5 (e.g. at 1H = 10.2 ppm/15N = 129 ppm). These resonances are retained in trypsin-digested TgMIC4-A56 (figure 3.11), suggesting that a minority of residual TgMIC4-A6 residues are in fact structured. A similar pattern of resonances is also observed in this region of the 1H-15N HSQC spectrum of TgMIC4-A6 (figure 3.19B). This suggests that the domain exists in the same state (i.e. largely unfolded and self-associated) in either context.

3.5. Discussion & concluding remarks

This chapter has described the recombinant expression and purification of the TgMIC4-A56 domain pair from TgMIC4, which was demonstrated via NMR spectroscopy to be partially-folded and self-associated. Limited proteolysis with trypsin yielded a stable fragment, with a markedly improved 1H-15N HSQC spectrum demonstrating the removal of the majority of unfolded protein. During purification both the TgMIC4-A56/trx and liberated TgMIC4-A56 are susceptible to spontaneous degradation, indicative of instability at the C-terminus (i.e. TgMIC4-A6). This was confirmed via proteomic analysis of the trypsin-digested protein, revealing that most of TgMIC4-A6 is removed, aside from a residual cross-linked portion.

The ability of the TgMIC4-A5 fragment to bind galactose has also been demonstrated via NMR spectroscopy. The identification of such activity is contradictory to a previous report stating that TgMIC4 does not bind host cells in the absence of twelve residues at its C-terminus. Interestingly, much of the TgMIC4 C-terminal tail is retained in trypsin-digested TgMIC4-A56. Comparison of the 1H-15N HSQC spectra of TgMIC4-A5 and digested TgMIC4-A56 reveals that these residues are unstructured and do not contribute to galactose-binding. However, since TgMIC4-A6 is unfolded within TgMIC4-A56, it is probable that the residual C-terminal tail is not present in its native conformation. In order to investigate the role of TgMIC4-A6 and the C-terminal tail, this portion of TgMIC4 was produced recombinantly, yielding an unfolded, aggregated protein. Despite several attempts, including variation of expression conditions and protein refolding, it has not proved possible to produce a folded protein sample for functional studies. Therefore the precise role of TgMIC4-A6 and the C-terminus remains unknown.

In order to further study the lectin activity of TgMIC4-A5, the domain has been produced recombinantly. The 1H-15N HSQC spectrum of the protein bears close resemblance to those of TgMIC4-A56 in terms of their ‘structured’ signals. However, the absence of residual unstructured residues means that overall the sample is of substantially higher quality, as demonstrated by its amenability to solution structure determination; described in chapter 4.


Chapter 4: Solution structure determination of TgMIC4-A5


4.1. Introduction

Chapter 3 described the identification of A5 as the structured, adhesive domain within recombinant TgMIC4-A56. Subsequent recombinant production of TgMIC4-A5 yielded a sample with good NMR spectral properties (figure 3.18). This chapter describes the determination of the solution structure of TgMIC4-A5 using NMR spectroscopy.

4.2. Experimental requirements for solution structure determination

Protein structure determination via NMR spectroscopy first requires chemical shift assignment of all (or as many as possible) NMR-‘active’ 1H, 15N and 13C nuclei in the protein. Typically this involves recording a series of triple-resonance spectra (e.g. HNCACB; chapter 2.6.2.1) which together enable identification of protein backbone chemical shifts (NH, HN, CO, Cα, Cβ, Hα and Hβ). Additional aliphatic side-chain resonances are then deciphered using TOCSY experiments, which correlate all of the J-coupled 1H and 13C atoms in each amino acid side-chain, whilst side-chain amide and aromatic nuclei are typically assigned using NOESY data.

The bottleneck of protein structure determination using NMR is the generation of ‘restraints’, i.e. experimentally-derived parameters which guide structure calculation. For example, ‘distance restraints’ are approximations of inter-nuclear distances. These are derived from the successful interpretation of NOESY spectra, which correlate the chemical shifts of proximate hydrogen nuclei (see chapter 2.5). Prior knowledge of protein chemical shifts enables assignment of NOE cross-peaks, and the distance-dependence of NOE signal intensity allows approximate inter-nuclear distances to be calculated. Distance restraints are therefore an abundant and powerful source of structural information. Additional complementary restraints, concerning dihedral angles, disulphide and hydrogen-bonds, can also be implemented to aid the structure calculation.
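The distance-dependence of the NOE can be illustrated with the isolated spin-pair approximation, in which cross-peak intensity is taken to be proportional to r^-6, so that an unknown distance can be estimated from a reference cross-peak of known distance. The following is a minimal sketch under that assumption. The function names, the ±25% error margin and the 1.8 Å lower limit are illustrative choices and do not describe the calibration performed by ARIA.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance (in Å) from an NOE intensity, assuming I is proportional to r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def distance_bounds(distance, fraction=0.25, lower_limit=1.8):
    """Loose lower/upper bounds (in Å) around a distance estimate."""
    lower = max(lower_limit, distance * (1 - fraction))
    upper = distance * (1 + fraction)
    return lower, upper
```

Because of the sixth-root dependence, a 64-fold weaker cross-peak corresponds to only a two-fold longer distance, which is why NOE-derived distances are treated as approximate bounds rather than exact values.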

4.3. Materials & methods

4.3.1. Experimental set-up and data acquisition

Uniformly 15N/13C-labelled TgMIC4-A5 samples of concentration >200 μM were prepared as described in chapter 3.3. Two samples were used to record the NMR spectra, due to degradation of the initial sample over time. A series of experiments were conducted in NMR buffer (see Appendix A2), before lyophilisation and reconstitution of the sample in 100% D2O prior to acquisition of TOCSY and 13C-NOESY-HMQC experiments, in order to minimise artefacts caused by H2O in these spectra.

A summary of the complete set of NMR spectra acquired for the structure calculation process is provided in table 4.1. All spectra were acquired at 303K, using in-house Bruker Avance II DRX500 and DRX800 spectrometers equipped with TXI cryoprobes. Data was processed using NMRPipe (Delaglio et al. 1995) and visualised using NMRView (Johnson & Blevins 2000). Spectra were referenced in the direct 1H dimension according to the resonance frequency of water. Indirect 15N and 13C dimensions were referenced using the frequency ratios determined by Wishart et al. (1995).
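Indirect referencing amounts to multiplying the 1H zero-ppm frequency by a fixed ratio for each heteronucleus. The sketch below uses the ratios tabulated by Wishart et al. (1995) relative to DSS. As a simplifying assumption, the nominal 1H spectrometer frequency is used here in place of the exact zero-ppm frequency that would be obtained from the water reference; the function name is hypothetical.

```python
# Frequency ratios relative to 1H (DSS), as tabulated by Wishart et al. (1995).
XI = {"13C": 0.251449530, "15N": 0.101329118}


def indirect_zero_frequency(h1_zero_mhz, nucleus):
    """Frequency (MHz) corresponding to 0 ppm for the indirect nucleus."""
    return h1_zero_mhz * XI[nucleus]
```

For a 1H frequency of 500.20 MHz this gives approximately 50.68 MHz for 15N and 125.78 MHz for 13C, in agreement with the values listed in table 4.1.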

4.3.2. Chemical shift assignment

TgMIC4-A5 chemical shift assignment was carried out using in-house scripts for NMRView (Dr Jan Marchant, Imperial College London). Backbone assignment was facilitated by use of MARS (Zweckstetter & Jung, 2004).

4.3.3. Generation of restraints for structure calculation

4.3.3.1. NOE assignment; ARIA

NOESY peaks were picked using NMRView, and lists were moderated manually in order to remove spectral noise and water signals. NOE assignment was performed using ARIA (Ambiguous Restraints for Iterative Assignments), a computer program capable of automated NOE assignment and structure generation (Nilges et al. 1997). Armed with a chemical shift table and NOESY peak list, ARIA performs NOE assignment in a matter of seconds. Short-range couplings (e.g. involving atoms in adjacent residues) can often be unambiguously assigned, however the assignment of long-range couplings (i.e. involving atoms separated by >4 residues) can be complicated by the presence of multiple viable assignments, giving rise to ambiguity. ARIA initially implements ambiguous NOEs as ambiguous distance restraints (ADRs) (Nilges 1995), many of which are deconvoluted during the structure calculation.
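An ambiguous distance restraint treats a cross-peak as the sum of contributions from all candidate proton pairs, so that the restrained quantity is an r^-6-summed effective distance (Nilges 1995). The function below is a minimal illustration of that sum; its name is hypothetical and it is not taken from the ARIA source code.

```python
def adr_effective_distance(distances):
    """Effective distance for an ambiguous restraint: (sum of d**-6) ** (-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)
```

The effective distance is always shorter than the shortest contributing distance and is dominated by it, so an ADR can be satisfied by any one of its candidate assignments. This is what allows incorrect candidates to be discarded as the structures converge.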


Experiment | Nucleus | Data points | Spectral width (Hz) | Field strength (MHz) | Comments
1H-15N HSQC | 1H | 2048 | 7507.508 | 500.20 |
1H-15N HSQC | 15N | 256 | 1520.681 | 50.68 |
1H-13C-HSQC | 1H | 2048 | 6265.664 | 500.20 | Recorded in 100% D2O. Selective for aliphatic nuclei.
1H-13C-HSQC | 13C | 406 | 8802.817 | 50.68 |
1H-13C-HSQC (aromatic-selective) | 1H | 2048 | 6265.664 | 500.20 | Selective for aromatic side-chain nuclei.
1H-13C-HSQC (aromatic-selective) | 13C | 148 | 4528.986 | 50.68 |
HNCACB | 1H | 1024 | 7507.508 | 500.20 |
HNCACB | 15N | 74 | 1520.681 | 50.68 |
HNCACB | 13C | 96 | 8333.333 | 125.78 |
CBCA(CO)NH | 1H | 1024 | 7507.508 | 500.20 |
CBCA(CO)NH | 15N | 74 | 1520.681 | 50.68 |
CBCA(CO)NH | 13C | 96 | 8333.333 | 125.78 |
HN(CA)CO | 1H | 1024 | 7507.508 | 500.20 |
HN(CA)CO | 15N | 74 | 1520.681 | 50.68 |
HN(CA)CO | 13C | 110 | 1666.667 | 125.78 |
HNCO | 1H | 1024 | 7507.508 | 500.20 |
HNCO | 15N | 74 | 1520.681 | 50.68 |
HNCO | 13C | 104 | 1666.667 | 125.78 |
(H)CC(CO)NH | 1H | 1024 | 7507.508 | 500.20 |
(H)CC(CO)NH | 15N | 78 | 1520.681 | 50.68 |
(H)CC(CO)NH | 13C | 128 | 11947.431 | 125.78 |
HBHA(CO)NH | 1H | 1024 | 12019.231 | 800.32 |
HBHA(CO)NH | 15N | 120 | 2433.090 | 81.10 |
HBHA(CO)NH | 1H-in | 128 | 4803.074 | 300.32 |
H(C)CH-TOCSY | 1H | 1024 | 6250.000 | 500.20 | Recorded in 100% D2O.
H(C)CH-TOCSY | 13C | 80 | 4401.408 | 125.78 |
H(C)CH-TOCSY | 1H-in | 256 | 4752.852 | 500.20 |
(H)CCH-TOCSY | 13C | 256 | 8810.573 | 125.78 | Recorded in 100% D2O.
(H)CCH-TOCSY | 13C-J | 80 | 4402.377 | 125.78 |
(H)CCH-TOCSY | 1H | 1024 | 6250.000 | 500.20 |
15N-NOESY-HSQC | 1H | 2048 | 7507.508 | 500.20 | Mixing time = 100 ms.
15N-NOESY-HSQC | 15N | 80 | 1520.681 | 50.68 |
15N-NOESY-HSQC | 1H-in | 256 | 7507.508 | 500.20 |
13C-NOESY-HSQC | 1H | 2048 | 10000.000 | 800.32 | Recorded in 100% D2O. Mixing time = 100 ms.
13C-NOESY-HSQC | 1H-in | 256 | 14084.507 | 800.32 |
13C-NOESY-HSQC | 13C | 80 | 10000.000 | 201.23 |

Table 4.1: acquisition parameters of the NMR experiments recorded for the solution structure determination of TgMIC4-A5.


In its initial step (‘iteration 0’), ARIA uses only unambiguous distance restraints to calculate an ensemble of structures, via liaison with ‘Crystallography & NMR System’ (CNS) (Brünger et al. 1998). The lowest-energy structures are then used to analyse the NOESY data. Firstly, the spectra are calibrated using the average inter-nuclear distances across the low-energy ensemble. ADRs are then assessed for NOE violations, with assignments which are consistently violated discounted. A new restraints list is then created, and merged with the initial list, before calculation of a new ensemble of structures. The process is repeated for numerous iterations, during which the number of ADRs is minimised.

4.3.3.2. Dihedral angle restraints; TALOS

Dihedral angle restraints were obtained using the computer program TALOS (Cornilescu et al. 1999). The chemical shifts of backbone atoms in α-helical and β-sheet configurations deviate predictably from random coil values, giving rise to ‘secondary shifts’. (Incidentally, this forms the basis of chemical shift indexing; described in section 4.4.1.2). Secondary shifts are utilised by TALOS in predicting phi (φ) and psi (ψ) values for each residue. TALOS fragments the protein sequence into overlapping amino acid triplets, which are used to scan a database of 186 protein crystal structures for triplets with similar sequences and secondary shifts. If the φ and ψ values of the ten most closely-matched triplets are consistent (i.e. in the same region of the Ramachandran map), then the averages (and standard deviations) are used as predictions for the central residue in the template triplet.
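The final step, in which the dihedral angles of the best-matching triplets are averaged only if they agree, can be sketched as follows. This is a simplified illustration and not the TALOS algorithm itself: agreement is judged here by the circular standard deviation, with a 30° cut-off chosen arbitrarily, whereas TALOS assesses whether the matches fall in the same region of the Ramachandran map. The function names are hypothetical.

```python
import math


def circular_mean_sd(angles_deg):
    """Circular mean and standard deviation (degrees) of a list of angles."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)
    r = min(1.0, math.hypot(s, c))  # mean resultant length, clipped at 1
    mean = math.degrees(math.atan2(s, c))
    sd = math.degrees(math.sqrt(-2.0 * math.log(r))) if r > 0 else float("inf")
    return mean, sd


def consensus_dihedrals(matches, max_sd=30.0):
    """matches: (phi, psi) pairs from the best-matching database triplets.

    Returns ((phi, sd), (psi, sd)), or None if the matches disagree.
    """
    phi = circular_mean_sd([m[0] for m in matches])
    psi = circular_mean_sd([m[1] for m in matches])
    if phi[1] > max_sd or psi[1] > max_sd:
        return None  # inconsistent matches: no prediction for this residue
    return phi, psi
```

Circular statistics are used because dihedral angles wrap around at ±180°, where an arithmetic mean would be misleading.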

The reliability of TALOS has been assessed via sequential removal of proteins from the database and prediction of their dihedral angles using the remaining data. Overall it was found that TALOS makes predictions for an average of 72% of residues and, across all 186 proteins, only 1.8% of predictions were found to be incorrect relative to the crystal structure. It is believed that inaccuracies are mainly due to actual differences in protein structure in the crystalline state (Cornilescu et al. 1999).

4.3.3.3. Hydrogen-bond restraints

Following lyophilisation of the NMR sample and reconstitution in 100% D2O (i.e. for the TOCSY experiments), solvent exchange of NH nuclei leads to the disappearance of resonances from the 1H-15N HSQC spectrum. Protein secondary structure is maintained largely by hydrogen-bonds between backbone amide and carbonyl groups. Amides which form H-bonds undergo solvent exchange more slowly and therefore are retained in the 1H-15N HSQC spectrum for a longer time. By acquiring a series of 1H-15N HSQC spectra immediately after reconstitution in 100% D2O, it was therefore possible to identify residues which form secondary structure. H-bonds were invoked as distance restraints (i.e. specifying a 3-4 Å distance between non-hydrogen atoms).

4.3.3.4. Disulphide bond restraints

In proteins which are believed to contain disulphide bonds, disulphide restraints can also be implemented. As discussed in chapter 1.4.2, apple domains bear a conserved pattern of disulphide linkages. It was therefore expected that TgMIC4-A5 would bear C10/C79, C35/C57 and C39/C45 cross-links. Disulphide restraints can take the form of a distance restraint (i.e. specifying that Sγ atoms must be within 1.95-2.15 Å of each other) or a covalent linkage (i.e. governing removal of Hγ atoms and formation of the Sγ-Sγ bond).

4.3.4. Structure calculation

Structure calculations were performed using ARIA v2 (Nilges et al. 2007), in conjunction with CNS v1.2 (Brünger 2007).

4.3.4.1. The simulated annealing protocol

Structure calculations by ARIA/CNS make use of a simulated annealing protocol. TgMIC4-A5 structure calculations were carried out using the default protocol. This involves simulated heating of the starting molecule (e.g. an extended polypeptide in ARIA iteration 0) to 10,000 K, conferring a high-energy system (i.e. high atomic mobility) in which the molecule evolves via torsion-angle dynamics (TAD) (Stein et al. 1997), guided by topallhdg5.3.pro and parallhdg5.3.pro force-fields, defining torsion angle and non-bonded interaction parameters (Nilges & Linge 1999). The simulation is iterated for 1100 steps in search of a global energy minimum, before gradual cooling of the system to 2000 K over 550 steps. As the temperature (and thus atomic mobility) is decreased, the molecule continues to evolve via TAD, under the incrementally increased influence of the experimental restraints. The temperature is then further decreased to 0 K, during which the energy of the structure is minimised via molecular dynamics in Cartesian space (Brünger et al. 1998; Linge et al. 2003).
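The temperature profile described above can be written out as a simple schedule. The sketch below is illustrative only. It assumes linear cooling in both stages, and the length of the final cooling stage (n_cool2) is a placeholder, since it is not stated in the text. It does not reproduce the CNS protocol files.

```python
def annealing_schedule(t_high=10000.0, n_high=1100,
                       t_mid=2000.0, n_cool1=550,
                       t_final=0.0, n_cool2=1000):
    """Return a list of temperatures (K), one per simulation step."""
    # High-temperature search phase at constant temperature.
    temps = [t_high] * n_high
    # First cooling stage (torsion-angle dynamics): t_high -> t_mid.
    temps += [t_high + (t_mid - t_high) * (i + 1) / n_cool1
              for i in range(n_cool1)]
    # Second cooling stage (Cartesian dynamics): t_mid -> t_final.
    temps += [t_mid + (t_final - t_mid) * (i + 1) / n_cool2
              for i in range(n_cool2)]
    return temps
```

The long high-temperature phase allows the chain to cross energy barriers freely, whereas the gradual cooling stages let the experimental restraints progressively dominate.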


4.3.4.2. Calculation set-up

Data files describing the NOESY peaks, chemical shift assignments, dihedral angles, H-bonds, disulphide linkages and protein sequence were uploaded into ARIA v2. For initial TgMIC4-A5 structure calculations, NOE assignment tolerances were set to 0.1 ppm in both the direct and indirect 1H dimensions and 1 ppm in the 15N and 13C dimensions. During refinement, 1H and 15N tolerances were incrementally lowered to 0.04 ppm and 0.5 ppm respectively.
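The effect of the assignment tolerances can be illustrated by listing every pair of atoms whose chemical shifts match a given NOESY peak. The following is a simplified sketch, not ARIA's matching routine. The data layout (a list of dictionaries with 'atom', 'h' and 'x' keys) and the function name are assumptions made for the example.

```python
def candidate_assignments(peak, shifts, tol_h=0.1, tol_x=1.0):
    """List candidate (direct, indirect) atom pairs for one NOESY peak.

    peak: (direct 1H, indirect 1H, heteronucleus) shifts in ppm.
    shifts: list of dicts with keys 'atom', 'h' (1H ppm) and 'x'
    (ppm of the attached 15N/13C, or None if unknown).
    """
    h_dir, h_ind, x = peak
    # The direct proton must match in both 1H and heteronuclear dimensions.
    direct = [s for s in shifts
              if s["x"] is not None
              and abs(s["h"] - h_dir) <= tol_h
              and abs(s["x"] - x) <= tol_x]
    # The indirect proton is constrained by its 1H shift only.
    indirect = [s for s in shifts if abs(s["h"] - h_ind) <= tol_h]
    return [(d["atom"], i["atom"]) for d in direct for i in indirect
            if d["atom"] != i["atom"]]
```

Tightening the tolerances during refinement reduces the number of candidates per peak, and therefore the ambiguity of the resulting restraints, but only once the chemical shift assignments and peak positions are accurate enough to justify it.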

ARIA calculations were run for eight iterations (excluding the initial iteration 0), in which twenty structures were generated. The eight lowest-energy structures were then used to refine the NOESY data at the onset of the subsequent iteration (as described in chapter 4.3.3.1). Following iteration 8, the ten lowest-energy structures were subjected to a final water-refinement step (Linge et al. 2003).

4.3.4.3. Violation analysis

During NOE calibration, ARIA calculates inter-nuclear distances with error-bounds. If the corresponding nuclei are separated by a distance exceeding the error-bound in the resulting structure ensemble, then the NOE is said to be violated. The same is also said of dihedral angles which are outside of the standard deviation provided by TALOS. In order to enhance the validity of an NMR structure, it is necessary to minimise the number of significant violations (i.e. those above 0.5 Å for NOEs or above 5° for dihedrals). Although initial TgMIC4-A5 structure calculations gave rise to ensembles of reasonably well-converged apple domain folds, these bore multiple NOE violations over 0.5 Å. In all cases these were attributable to inaccurately picked NOE peak centres and incorrect (or missing) chemical shift assignments, giving rise to incorrect NOE assignments. After several rounds of structure calculation and NOESY data refinement, the finalised solution structure ensemble of TgMIC4-A5 was obtained.
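The definition of a significant NOE violation can be sketched as follows. The restraint names, distances and bounds are hypothetical; only the 0.5 Å threshold is taken from the text, and the functions are not those used by ARIA.

```python
NOE_THRESHOLD = 0.5  # Å, threshold for a significant violation (from the text)


def noe_violation(distance, lower, upper):
    """Amount (Å) by which a distance falls outside [lower, upper]; 0 if inside."""
    if distance > upper:
        return distance - upper
    if distance < lower:
        return lower - distance
    return 0.0


def significant_noe_violations(restraints, threshold=NOE_THRESHOLD):
    """Return names of restraints violated by more than the threshold.

    restraints maps a name to (measured distance, lower bound, upper bound).
    """
    return [name for name, (distance, lower, upper) in restraints.items()
            if noe_violation(distance, lower, upper) > threshold]


# Hypothetical restraints, for illustration only (all values in Å).
restraints = {
    "noe_1": (3.0, 1.8, 3.5),   # satisfied
    "noe_2": (4.3, 1.8, 3.5),   # violated by 0.8 Å
    "noe_3": (3.7, 1.8, 3.5),   # violated by 0.2 Å, below the threshold
}
print(significant_noe_violations(restraints))
```

An analogous check, with a 5° threshold, applies to dihedral angle restraints.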


4.4. Results

4.4.1. Chemical shift assignment

4.4.1.1. Backbone assignment

NH, HN, CO, Cα and Cβ cross-peaks were picked manually via analysis of 1H-15N HSQC, HNCACB, CBCA(CO)NH, HN(CA)CO and HNCO spectra using in-house scripts for NMRView, and assigned via a semi-automated approach using MARS. Armed with a chemical shift table, protein sequence, and secondary structure prediction, MARS is capable of identifying sequential connectives in a matter of minutes. The initial MARS run successfully assigned approximately 70% of the protein backbone. Gaps in the assignment were filled via the amendment of inaccurately picked cross-peaks and the inclusion of forced assignments, identified manually via inspection of the data, in subsequent MARS runs.

Following the final MARS run, all five nuclei (four in glycine) were assigned in 71 out of 79 (90%) non-proline residues (figure 4.1). Within the putative TgMIC4-A5 domain boundaries (C10-C79), 64 out of 68 (94%) non-proline residues were assigned. Sequential connectives were verified manually using NMRView (figure 4.2). It was not possible to assign residues S1-F4, as their resonances were not present in the NMR spectra, indicating that they are unstructured and flexible on an intermediate timescale (i.e. causing signal broadening). Residues H5-E9 and T81-S82 also lie outside the putative domain boundaries and may also be unstructured; however, closer proximity to the globular domain appears to confer slower motions, allowing their resonances to appear in the spectra.

The resonances of residues C10 and C79 (and their i+1 residues; V11 and D80 respectively) were also absent from the NMR spectra. According to apple/PAN domain conventions these cysteines are expected to be disulphide-linked. In native TgMIC4, A5 is separated from A6 by only three residues, hence it is likely that the lack of an adjacent domain in TgMIC4-A5 leaves the terminal disulphide-linked cysteines surface-exposed. This may lead to ‘crankshaft’ motions, where rotation of the χ1 bond of one cysteine residue causes isomerism of the disulphide linkage. This has been previously observed in disulphide-bonded proteins, leading to a loss of NMR signals (Dames et al. 2005; Grey et al. 2003).


Additionally, the amide resonance of residue S30 was missing from the 1H-15N HSQC spectrum. The remaining backbone resonances were detectable, suggesting that the residue is conformationally stable but with an amide hydrogen atom undergoing rapid exchange with water.

Figure 4.1: backbone amide chemical shift assignments for TgMIC4-A5. The 1H-15N HSQC of TgMIC4-A5 labelled with backbone amide chemical shift assignments. Pairs of resonances which correspond to side-chain amides – unassigned at this stage – are highlighted by red lines.


Figure 4.2: strips from the CBCA(CO)NH and HNCACB spectra of TgMIC4-A5. The depicted region encompasses residues T22-E25. HNCACB strips bear cross-peaks for the Cα (black) and Cβ (red) nuclei of residues i and i-1. The CBCA(CO)NH strips enable discrimination of these resonances as they only bear cross-peaks for i-1 nuclei. The Cαi-1 and Cβi-1 peaks in each CBCA(CO)NH strip align with the Cα and Cβ peaks in the HNCACB strip of the preceding residue, as they correspond to the same nuclei. This enables sequential alignment of pairs of strips.


4.4.1.2. Chemical shift indexing (CSI)

Backbone assignment was completed via the assignment of Hα and Hβ nuclei using HBHA(CO)NH data (or TOCSY data for proline nuclei). Selected assignments were then used to predict the secondary structure of TgMIC4-A5 via chemical shift indexing (CSI) (Wishart et al. 1992). This method is based upon the observed chemical shift differences of nuclei in secondary structure with respect to random coil values. For example, Hα/Cβ nuclei within α-helical or β-sheet configurations experience respective up-field and down-field shifts compared with the random coil resonance, whilst the opposite is true of Cα nuclei. Chemical shifts can therefore be used to provide information on protein secondary structure (figure 4.3).
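The principle of chemical shift indexing for Hα nuclei can be sketched as below. The ±0.1 ppm threshold and the minimum run lengths (four residues for a helix, three for a strand) follow the commonly quoted form of the Hα method and should be treated as assumptions here; the shift values are hypothetical, and the additional density criteria of the full method are omitted.

```python
def csi_index(observed, random_coil, threshold=0.1):
    """Hα index: -1 if up-field of random coil, +1 if down-field, else 0.

    threshold (ppm) is an assumed value, not taken from this thesis.
    """
    difference = observed - random_coil
    if difference > threshold:
        return 1
    if difference < -threshold:
        return -1
    return 0


def runs(indices, value, min_length):
    """Return inclusive (start, end) positions of runs of value >= min_length."""
    found, start = [], None
    for i, index in enumerate(list(indices) + [None]):  # None closes a final run
        if index == value:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                found.append((start, i - 1))
            start = None
    return found


# Hypothetical (observed, random coil) Hα shifts in ppm, for illustration only.
pairs = [(4.0, 4.3)] * 4 + [(4.35, 4.3)] + [(4.8, 4.3)] * 3
indices = [csi_index(observed, coil) for observed, coil in pairs]

helices = runs(indices, -1, min_length=4)  # up-field Hα runs suggest α-helix
strands = runs(indices, 1, min_length=3)   # down-field Hα runs suggest β-strand
print(indices, helices, strands)
```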

Figure 4.3: the chemical shift index (CSI) of TgMIC4-A5. Depicted are the chemical shift indexes for the Hα, Cα and Cβ atoms of residues C10-C79. Chemical shifts which are diagnostic of α-helical and β-sheet configurations are represented by red and blue ‘lollipops’ respectively, continuous sequences of which give rise to the underlying secondary structure prediction. Resonances which are suggestive of random coil configuration are represented by grey circles. Additional regions of expected secondary structure which are not predicted by CSI are encircled by boxes.

The CSI-derived secondary structure prediction of TgMIC4-A5 contains a single α-helix flanked by three β-strands, in positions which are consistent with the secondary structures of known apple domain structures (i.e. β1’, β3 and β5 in figure 1.8B). However, based on these known structures, the predicted α-helical region (spanning residues R36-Q40) is shorter than expected, whilst several β-strands (i.e. β1, β2, β4 and β4’ in figure 1.8B) are not predicted when Hα, Cα and Cβ chemical shifts are considered together. Nonetheless, isolated examination of Hα shift trends - the most sensitive indicator of secondary structure - suggests that these missing features may in fact occur in the expected positions (indicated in figure 4.3).

4.4.1.3. Side-chain assignment

Assignment of aliphatic, aromatic and amide side-chain nuclei was undertaken using a variety of methods (described below). The complete list of assignments (Appendix A6) has been deposited in the Biological Magnetic Resonance Data Bank (BMRB) (accession number 17667). A list of unusual assignments, where the chemical shift differs significantly from the database average, is provided in table 4.2.

The residues C10 and C79 remain completely unassigned. Due to the missing amide resonances of V11 and D80, it was not possible to assign their Hα/Cα and Hβ/Cβ resonances during backbone assignment, whilst further efforts to assign these nuclei using NOESY data (i.e. searching for NOEs from proximate assigned nuclei) were unsuccessful. The proposed crankshaft motions of the disulphide linkage may preclude the observation of NOEs.

Residue   Nucleus   Chemical shift (ppm)   BMRB average (ppm)   BMRB std. dev. (ppm)
R36       Hα        2.45                   4.27                 0.44
H47       C         182.5                  175.2                2.1
N51       Hδ22      9.6                    7.27                 0.52
K60       Hβ2       0.03                   1.79                 0.25
P64       Hβ2       0.36                   2.05                 0.39
P64       Hβ3       0.56                   2.05                 0.39
P64       Hγ3       0.42                   1.92                 0.37

Table 4.2: atypical chemical shifts in TgMIC4-A5. A list of TgMIC4-A5 nuclei whose resonances lie outside of the chemical shift range (average ± standard deviation) for the atom type stated in the Biological Magnetic Resonance Data Bank (BMRB).
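The extent to which each shift in table 4.2 departs from the database average can be expressed in units of the BMRB standard deviation. The following minimal sketch performs this calculation using the values from the table; the function name is illustrative.

```python
# (residue, nucleus, observed shift, BMRB average, BMRB std. dev.), all in ppm,
# transcribed from table 4.2.
ATYPICAL_SHIFTS = [
    ("R36", "Hα", 2.45, 4.27, 0.44),
    ("H47", "C", 182.5, 175.2, 2.1),
    ("N51", "Hδ22", 9.6, 7.27, 0.52),
    ("K60", "Hβ2", 0.03, 1.79, 0.25),
    ("P64", "Hβ2", 0.36, 2.05, 0.39),
    ("P64", "Hβ3", 0.56, 2.05, 0.39),
    ("P64", "Hγ3", 0.42, 1.92, 0.37),
]


def deviation_in_sd(observed, average, std_dev):
    """Signed deviation from the database average, in standard deviations."""
    return (observed - average) / std_dev


for residue, nucleus, observed, average, std_dev in ATYPICAL_SHIFTS:
    z = deviation_in_sd(observed, average, std_dev)
    direction = "down-field" if z > 0 else "up-field"
    print(f"{residue} {nucleus}: {z:+.1f} SD ({direction})")
```

A negative value corresponds to an up-field shift (towards 0 ppm) and a positive value to a down-field shift.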


4.4.1.3.1. Aliphatic assignments.

Aliphatic side-chain atoms were assigned via manual inspection of H(C)CH-TOCSY and (H)CCH-TOCSY datasets, using in-house scripts for NMRView (figure 4.4). This enabled the assignment of all aliphatic 1H and 13C nuclei within residues T13-R77. The remaining residues are either partially or totally unassigned.

4.4.1.3.2. Aromatic assignments

Due to their hydrophobicity, aromatic residues are often buried in the core of a protein. Consequently their nuclei often bear large numbers of NOEs, hence their assignments are particularly desirable. The chemical shifts of the γ-, δ- and ε-nuclei of the phenylalanine, histidine and five tyrosine residues were assigned using 13C-NOESY-HSQC data. This approach utilises the observance of NOEs between Hβ and Hδ/Hε nuclei, as described in figure 4.5.

4.4.1.3.3. Amide assignments

The core domain of TgMIC4-A5 contains multiple residues with nitrogen-containing side-chains; five arginines, three glutamines, two asparagines and one histidine. Attempts to assign their side-chain amide resonances were made using 15N-NOESY-HSQC data, utilising the observance of intra-residue NOEs from HN to assigned nuclei within the side-chain. This proved difficult in many cases, and as such the amide resonances of all five arginines, three glutamines and one asparagine are either unassigned or ambiguous. However, it did prove possible to assign two rather atypical HN resonances, at 9.6 ppm and 11.6 ppm (figure 4.1), to N51 Hδ (figure 4.6) and H47 Hδ/Hε respectively.


Figure 4.4: aliphatic 1H side-chain assignment using H(C)CH-TOCSY data. Depicted are the data strips corresponding to residue I23. TOCSY experiments correlate all J-coupled 1H and 13C atoms within an amino acid side-chain. Knowledge of the HX/CX resonances enables extraction of data strips which bear the resonances of all of the 1H atoms within the spin system, and thereby facilitates assignment.


Figure 4.5: assignment of aromatic side-chain nuclei. Described is the assignment of Y48. Knowledge of Hβ1 (2.72 ppm), Hβ2 (2.79 ppm) and Cβ (~40.9 ppm) resonances allows localisation of the section of the 13C-NOESY-HSQC spectrum which bears NOEs from Hβ’s (top-left panel). The signals at approximately 6.8 ppm represent NOEs from Hβ to Hδ, whilst NOEs from Hβ to Hε are also visible at 6.6 ppm. Theoretically, Hβ-to-Hε correlations are beyond the NOE detection limit; however, NOEs can arise due to spin diffusion. The Hδ resonance frequency of ~6.8 ppm can now be used to seek out the Cδ resonance (top-right panel), which in turn can verify the Hε (bottom-left panel) assignment. The Hε assignment can then be used to seek out the Cε chemical shift (bottom-right panel).


Figure 4.6: side-chain amide assignment using 15N-NOESY-HSQC data. Described is the assignment of the N51 amide. The 1H-15N HSQC (lower panel) provides the resonance frequencies of the amide atoms (HN = 9.6 ppm and 7.1 ppm, NH = 118 ppm). These are used to extract the relevant area of the 15N-NOESY-HSQC spectrum (upper panel). This spectral window bears two NOEs correlating the respective HN nuclei, with two further NOEs at approximately 2.8 ppm. These are likely to represent additional intra-residue correlations with an aliphatic 1H nucleus. Inspection of the 1H assignments for each asparagine and glutamine residue in the protein reveals that only N51 has an assignment at ~2.8 ppm (Hβ2 = 2.83 ppm - see Appendix A4). This enables unambiguous assignment of this particular side-chain amide to N51.


4.4.2. Solution structure calculation of TgMIC4-A5

4.4.2.1. Sequence submission

As described in section 4.4.1, residues S1-F4 were completely unassigned. Additionally, it was elucidated through manual inspection of NOESY data that residues H5-E9 and D80-S82 bore only intra-residue and sequential NOEs, and therefore would be largely ‘unrestrained’ during structure calculation. These residues were therefore omitted from the submitted sequence, in order to avoid erroneous interference with the structure of the core sequence. Although residues C10 and C79 were also unassigned, they were included in the submitted sequence due to their putative domain-capping disulphide linkage.

4.4.2.2. Experimental restraints

A summary of the implemented experimental restraints is provided in table 4.3. A total of 1008 NOEs were assigned by ARIA during the structure calculation (837 unambiguous and 171 ambiguous) over 73 residues, giving rise to 13.8 NOE-derived restraints per residue. A further 38 TALOS-derived dihedral angle restraints were implemented, for residues which were predicted to be in secondary structure (based on the chemical shift index; chapter 4.4.1.2).

During the process of sample lyophilisation and reconstitution in 100% D2O, the residues C35, R36, A37, R38, C39, Y48, T49, N51, L56, C57, Y58 and K60 were identified as forming H-bonds. However, restraints were omitted from initial structure calculations in order to avoid biasing of the protein conformation. After several rounds of data/structure refinement, and observance of proximity between putative residue pairs, restraints were implemented. Similarly, disulphide-bond restraints were also omitted from initial structure calculations. Covalent linkages between C35/C57 and C39/C45 were invoked following the assignment of Hβ-to-Hβ NOEs by ARIA. Additionally, the Cβ resonances of these residues are consistent with the residues being in the oxidised (i.e. disulphide-linked) state. However the failure to assign residues C10 and C79 means that NOEs between these residues were not detectable. Thus, due to the lack of experimental evidence supporting the presence of the C10/C79 disulphide bond, structure calculations were performed both with and without the linkage invoked.


Restraint type                    Number
NOE-derived                       1008
  Ambiguous                       171
  Unambiguous                     837
  Intra-residue                   386
  Sequential                      158
  Short range (|i-j| 2-3)         63
  Medium range (|i-j| 4-5)        9
  Long range (|i-j| >5)           221
Dihedral angles (φ/ψ)             38
H-bond                            12
Disulphide                        2/3
Total                             1060/1061

Table 4.3: a summary of the experimental restraints used for the structure calculation. A breakdown of the NOE-derived restraints (further broken down into type), dihedral angles, H-bonds and disulphide linkages. The total number of restraints varies depending on whether the C10/C79 disulphide linkage is implemented.
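The internal consistency of the restraint counts in table 4.3 can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the table and text; the variable and function names are illustrative. It is notable that the breakdown by range sums to the number of unambiguous NOEs, which suggests that the breakdown refers to the unambiguous set.

```python
# Counts transcribed from table 4.3.
NOE_BY_RANGE = {
    "intra-residue": 386,
    "sequential": 158,
    "short range": 63,
    "medium range": 9,
    "long range": 221,
}
UNAMBIGUOUS = 837
AMBIGUOUS = 171
DIHEDRAL = 38
HBOND = 12
RESIDUES = 73  # number of residues quoted in the text


def total_restraints(n_disulphides):
    """Total restraints for a calculation with the given number of disulphides."""
    return UNAMBIGUOUS + AMBIGUOUS + DIHEDRAL + HBOND + n_disulphides


noe_total = UNAMBIGUOUS + AMBIGUOUS
print(sum(NOE_BY_RANGE.values()))        # compare with the unambiguous count
print(noe_total)                         # total NOE-derived restraints
print(round(noe_total / RESIDUES, 1))    # NOE-derived restraints per residue
print(total_restraints(2), total_restraints(3))
```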

4.4.2.3. The tertiary structure of TgMIC4-A5.

Final ensembles of ten low-energy TgMIC4-A5 structures, and average energy-minimised structures, are pictured in figures 4.7 and 4.8. Structure statistics are additionally provided in table 4.4. The final structure ensembles have been deposited in the Protein Data Bank (PDB) (Berman et al. 2000) under the accession code 2LL3. TgMIC4-A5 bears a canonical apple/PAN fold, with a central three-stranded β-sheet flanked by an α-helix on one face, and a short two-stranded β-sheet on the other. Based on the apple/PAN domain secondary structure topology depicted in figure 1.8B, the helix (α1) encompasses residues L31-A41, whilst the strands β1’, β3, β4, β4’ and β5 encompass residues I16-S18, S46-N51, S55-R61, F66-K68 and D72-S76 respectively. The strands β1 and β2 are unrefined in the depicted structures; however, the positions of residues H12-N15 and K27-A29 with respect to the central β-sheet suggest that they form these strands.

Inclusion of a C10/C79 disulphide bond appears to be largely inconsequential in terms of the overall domain structure (figure 4.8), with the two cysteines lying relatively proximate even when the linkage is not invoked. Both ensembles converge closely across residues N15-R77, whilst the outlying residues are divergent, due to a lack of NOE restraints guiding their configuration (as depicted in figure 4.9). Inclusion of a C10/C79 bond reduces divergence in these regions, as reflected by the respective RMS deviations of all heavy atoms (table 4.4).

Figure 4.7: the solution structure of TgMIC4-A5. A) the average energy-minimised structure of TgMIC4-A5 with the C10-C79 disulphide bond included. Sulphur atoms are coloured in yellow. B) superimposed Cα traces of the ten low-energy structures. The structures converge with an RMS deviation of 0.27 ± 0.043 Å for backbone atoms in secondary structure.


Figure 4.8: the effects of invoking the C10-C79 disulphide linkage. A) a superimposition of the average energy-minimised TgMIC4-A5 structures when calculated with the C10-C79 disulphide bond included (green) and omitted (red). The structures converge with an RMS deviation of 0.335 Å over 721 atoms. B) superimposed Cα traces of the ten low-energy TgMIC4-A5 structures with the C10-C79 disulphide linkage omitted. The structures converge with an RMS deviation of 0.24 ± 0.049 Å for backbone atoms in secondary structure.


Structural parameter                               C10-C79 disulphide included   C10-C79 disulphide omitted

RMS deviations (Å)
  Backbone atoms in secondary structure            0.29 ± 0.041                  0.28 ± 0.042
  Heavy atoms in secondary structure               0.80 ± 0.081                  0.77 ± 0.082
  All backbone atoms                               0.71 ± 0.22                   0.98 ± 0.38
  All heavy atoms                                  1.15 ± 0.12                   1.27 ± 0.31
RMS deviations from idealized covalent geometry
  Bond lengths (Å)                                 0.0042 ± 0.00012              0.0041 ± 0.00018
  Bond angles (°)                                  0.50 ± 0.018                  0.48 ± 0.019
RMS deviations from experimental restraints
  Bond lengths (Å)                                 0.034 ± 0.0023                0.033 ± 0.0017
  Bond angles (°)                                  0.59 ± 0.15                   0.57 ± 0.082
Energies (kcal mol-1)
  Ebonds                                           19.32 ± 1.11                  18.27 ± 1.65
  Eangles                                          19.72 ± 5.43                  20.38 ± 5.93
  Eimpropers                                       192.27 ± 27.88                165.44 ± 9.52
  Edihedrals                                       337.46 ± 7.25                 335.62 ± 6.17
  Evdw                                             -626.62 ± 6.87                -618.728 ± 6.66
  Eelec                                            -2524.70 ± 51.50              -2539.01 ± 31.00
  Etotal                                           -2522.54 ± 63.54              -2568.03 ± 29.88

Table 4.4: a summary of TgMIC4-A5 structure statistics. RMS deviations and energies are expressed as the mean average (± standard deviation) across the ten structures. These figures were obtained from ARIA.


4.4.3. Solution structure validation.

In order to attest to the validity of the calculated structure of TgMIC4-A5, detailed analysis of the structure statistics has been carried out. Additional validation procedures, namely the analysis of dihedral angles, have been performed via submission to the Protein Structure Validation Suite (PSVS) (Bhattacharya et al. 2007), a server which integrates numerous reputable evaluative tools, including PROCHECK (Laskowski et al. 1993) and MolProbity (Davis et al. 2007). Due to strong similarities in the structure statistics and geometric properties of the calculated structures, data and discussion are only provided for the TgMIC4-A5 structure which includes a C10-C79 disulphide linkage.

4.4.3.1. Structure statistics and experimental restraints.

The ensemble of TgMIC4-A5 structures bears no NOE or dihedral violations over 0.5 Å or 5° respectively. Additionally, bond lengths and angles are in good agreement with idealized geometries and experimental restraints (table 4.4). The domain structure is well-defined overall, with respective RMS deviations of approximately 0.29 Å and 0.8 Å for secondary structured backbone and heavy atoms (i.e. N, C, O and S). The equivalent values are less favourable over the entire protein, due to the N- and C-terminal divergence in the structure ensemble.

Plots of RMSD and NOEs per residue (figures 4.9B and 4.9C) clearly demonstrate the pattern of convergence across the protein sequence, and its strong correlation with the degree of restraint. Residues C10-G14 and R77-C79 bear high individual RMSDs (>1 Å), due to an overall lack of restraints defining their configurations; the result of several residues being either partially or fully unassigned. The intersecting sequence generally converges closely (with RMS deviations of 0.2-0.4 Å), consistent with the observation of large numbers of NOEs. However the RMSD per residue occasionally fluctuates above 0.5 Å, namely for residues K43, K53, K63 and L70, each of which occurs in a surface-exposed loop (figure 4.9A). Consequently they bear relatively low numbers of NOEs (figure 4.9C), and are afforded increased configurational freedom during structure calculation compared with a better-restrained ‘buried’ residue.


Figure 4.9: RMSD and NOE-derived restraint per residue analysis. A) The secondary structure configuration of TgMIC4-A5, with the α-helix and β-sheets represented by a blue box and red arrows respectively. B) A graph showing the RMSD of each residue in the structure ensemble. RMSD values were calculated by ARIA, using the coordinates of HN, NH, Cα, C and O atoms. C) A graph showing the number of unambiguous NOEs per residue, broken down into type; intra-residue, sequential, short-range, medium-range and long-range.


4.4.3.2. Backbone (φ/ψ) angles.

The validity of backbone phi (φ) and psi (ψ) angles can be assessed by plotting a Ramachandran map; a graph of ψ vs φ, in which values usually lie within certain regions (Ramachandran et al. 1963). Ramachandran maps for the solution structure ensemble of TgMIC4-A5, derived using MolProbity, are pictured in figure 4.10. These show that only a single residue, R77, in just one of the ten structures bears a ψ/φ combination in a disallowed region. Meanwhile, the plot for the average energy-minimised structure shows that T13 lies in a disallowed region (figure 4.11). In both cases T13 also has a low phi-psi ‘G-factor’, a score calculated by PROCHECK which defines the ‘normality’ of a stereo-chemical parameter (a low score, such as T13 Gφψ = -3, denotes an unusual value).

The unusual nature of T13 and R77 dihedral angles is verified further by PROCHECK. Whilst MolProbity classifies dihedral angles using updated definitions (Lovell et al. 2003), PROCHECK utilises more stringent guidelines, and as a result finds additional dihedral angles in disallowed regions (table 4.5), all of which correspond to residues T13 and R77. Both residues lie within the divergent regions of the TgMIC4-A5 structure, where increased configurational freedom appears to give rise to unfavourable dihedral angles in a few instances amongst the structure ensemble.

Plot region             MolProbity     PROCHECK
Most favoured           645 (94.9%)    509 (86.3%)
Additionally allowed    34 (5%)        74 (12.5%)
(Generously allowed)    -              2 (0.3%)
Disallowed              1 (0.1%)       5 (0.8%)
Total                   680            590

Table 4.5: comparison of Ramachandran plot statistics derived using MolProbity and PROCHECK. The more stringent guidelines of PROCHECK mean that a lower percentage of residues are found in the most favoured regions. PROCHECK does not classify glycine and proline residues, hence the lower overall total of assessed residues.
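The percentages in table 4.5 follow directly from the residue counts, as the short sketch below illustrates. The counts are transcribed from the table; the dictionary layout and function name are illustrative.

```python
# Residue counts per Ramachandran plot region, transcribed from table 4.5.
COUNTS = {
    "MolProbity": {
        "most favoured": 645,
        "additionally allowed": 34,
        "disallowed": 1,
    },
    "PROCHECK": {
        "most favoured": 509,
        "additionally allowed": 74,
        "generously allowed": 2,
        "disallowed": 5,
    },
}


def percentages(counts):
    """Convert region counts into percentages of the total, to one decimal place."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 1) for region, n in counts.items()}


for tool, counts in COUNTS.items():
    print(tool, sum(counts.values()), percentages(counts))
```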


Figure 4.10: Ramachandran plots of backbone torsion angles (φ/ψ) from the TgMIC4-A5 structure ensemble. Ramachandran plots for all non-glycine/non-(pre-)proline residues (i.e. general) alongside those of glycine, proline and pre-proline residues. Light and dark blue lines respectively encircle the most favoured and additionally allowed regions of the plot. Angles which lie outside these regions are said to be of an unfavourable conformation. All ten structures from the final ensemble were submitted for analysis, with only a single ψ/φ combination found to be unfavourable. Plots were generated using MolProbity (Davis et al. 2007) via the PSVS (Bhattacharya et al. 2007).


Figure 4.11: Ramachandran plots of backbone torsion angles (φ/ψ) from the average energy-minimised TgMIC4-A5 structure. Ramachandran plots for general, glycine, proline and pre-proline residues. Light and dark blue lines respectively encircle the most favoured and additionally allowed regions of the plot. Only T13 is found to be in a disallowed region of the map. Plots were generated using MolProbity via the PSVS.


4.4.3.3. Sidechain (χ) angles.

The positions of amino acid side-chain atoms are defined by chi (χ) angles; χ1 and χ2 describe the N-(Cα-Cβ)-Xγ and Cα-(Cβ-Xγ)-Xδ angles respectively (where X = C, O, S or H). These tend to cluster at 60° (gauche+), -60° (gauche-) or 180° (trans), in order to avoid collisions of neighbouring atoms. Inspection of these values (i.e. via Ramachandran-like χ1 vs χ2 maps) is therefore another useful method of protein structure validation. This has been performed by PROCHECK, which additionally calculates a χ1-χ2 G-factor.
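The clustering of χ angles around the three staggered conformations can be illustrated by assigning an angle to its nearest rotamer and reporting the deviation from the ideal value. This is a minimal sketch using the rotamer naming given above; it is not the scoring scheme used by PROCHECK, and the function names are illustrative.

```python
# Ideal staggered χ angles (degrees), using the naming convention of the text.
ROTAMERS = {"gauche+": 60.0, "trans": 180.0, "gauche-": -60.0}


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_rotamer(chi):
    """Return (name, deviation in degrees) of the staggered rotamer nearest to chi."""
    name = min(ROTAMERS, key=lambda k: angular_difference(chi, ROTAMERS[k]))
    return name, angular_difference(chi, ROTAMERS[name])


for chi in (65.0, 300.0, -175.0, 120.0):
    print(chi, nearest_rotamer(chi))
```

An angle of 120° lies 60° from both of its neighbouring staggered positions, i.e. as far from an ideal rotamer as it is possible to be, which is why such a value is flagged as unusual.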

A representative selection of χ1 vs χ2 plots and G-factors is depicted in figure 4.13. For numerous residues the ensemble of angles clusters in a single favourable region of the plot, giving rise to a good G-factor (e.g. L56 and C57). In many other residues the angles are variable yet favourable (e.g. K27), and therefore maintain acceptable G-factors (i.e. above -2).

The most notably poor χ1-χ2 G-factor, of -2.75, belongs to residue L32. In this instance the ensemble of χ2 angles is largely unfavourable, but it is consistent (i.e. nine out of ten structures closely converge) to the extent that it is likely to be a genuine feature. This is corroborated by inspection of the TgMIC4-A5 structure, revealing that L32 lies at the top of the α-helix and protrudes into the core of the protein, flanked by Y48 and Y50, residues to which it bears multiple NOEs (figure 4.12).

Figure 4.12: the configuration of the L32 side-chain. An enlarged area of a representative TgMIC4-A5 structure, with the side-chains of L32, Y48 and Y50 pictured. The unusual L32 χ2 angle, of approximately 120°, directs the two methyl groups into the core of the protein, in close proximity to the aromatic rings of Y48 and Y50.


Figure 4.13: χ1 vs χ2 plots for a selection of TgMIC4-A5 residues. Gridlines are included at 60°, 180° and 300°. χ1/χ2 combinations are denoted by boxes, coloured depending on the favourability of the values (yellow = favourable, red = unfavourable). The most favourable χ1/χ2 combinations are at gridline intersections. cv = circular variance (i.e. the degree of clustering; low values = high clustering). Gf = G-factor. These plots were created using PROCHECK via the PSVS.


4.5. Discussion

4.5.1. The α1-β3-β4 hairpin loop

As described in chapter 1.4.2.1, all known apple/PAN domain structures bear an α1-β3-β4 hairpin loop (see figure 1.8C). This motif is also present in the structure of TgMIC4-A5, with the side-chains of residues C35 and C39 extending from the inner face of the α-helix and forming disulphide linkages with residues C57 (on strand β4) and C45 (preceding strand β3) respectively. The hydrophobic side-chains of residues Y48 and Y50 (on strand β3) pack against the inner face of the α-helix (figure 4.14).

Figure 4.14: the α1-β3-β4 hairpin loop of TgMIC4-A5. Helix α1 is tethered to strands β3 and β4 via two disulphide linkages. Tyrosine side-chains establish a hydrophobic core along the inner face of the helix.

4.5.2. Ring current shifts

As detailed in table 4.2, TgMIC4-A5 bears several unusual chemical shifts (i.e. values which lie outside the standard deviation in the BMRB). Knowledge of the atomic structure of the domain enables postulation of the causes of these shifts, serving to further validate the structure and the assignment data. As demonstrated in figure 4.15, each of the seven unusual shifts can be attributed to the influence of a proximate aromatic ring (i.e. ring current shifts; described in chapter 3.4.2.3). The imidazole ring of residue H47 influences the chemical shifts of several nuclei. K60-Hβ2 and P64-Hβ2 both lie above the plane of this ring, and are shielded such that their resonances shift towards 0 ppm (figure 4.15E & C). P64 is flanked on its opposite side by Y67, the aromaticity of which causes further shifts in the Hβ3 and Hγ3 atoms (figure 4.15C). The backbone carbon (i.e. C=O) nucleus of H47 is sandwiched between the ring currents of the imidazole ring and the benzene ring of the adjacent Y48 (figure 4.15B), and consequently is shifted significantly downfield (181.5 ppm). Y48 has an additional influence, serving to shield the Hα nucleus of R36 (figure 4.15A). Finally, the unusual side-chain amide resonance of N51 (at 9.6 ppm) can be explained by its positioning adjacent to the aromatic ring of Y69 (figure 4.15D), causing a significant downfield shift.


Figure 4.15: ring current shifted nuclei in TgMIC4-A5. A) the Hα nucleus of R36 is shielded by the aromatic ring of Y48. B) H47-C, pictured as a black sphere, lies adjacent to the planes of the imidazole ring and the phenol group of Y48. C) the various proline side-chain nuclei are flanked by the aromatic rings of H47 and Y67. D) The N51-Hδ22 nucleus lies adjacent to the plane of the Y69 aromatic ring. E) The K60-Hβ2 nucleus is shielded by the H47 aromatic ring.


4.5.3. Homology with other apple/PAN domains

As stated in chapter 1.4.2, apple/PAN domains have a highly conserved fold, even in cases of low sequence identity. It is evident from visual inspection of the TgMIC4-A5 structure that it retains the core structural features of an apple/PAN domain. Nonetheless, the structure was submitted to the DaliLite server v1.3 (Holm & Rosenström, 2010), in order to characterise its homology to other known apple/PAN domain structures (table 4.6). This reveals that TgMIC4-A5 bears close structural homology to many known apple/PAN domain structures in spite of generally low sequence identities. In all instances the core structure of the domain (i.e. the assembled α-helix and β-strands) is well conserved, leading to alignment of the majority of the 70 TgMIC4-A5 residues, and observation of relatively low RMS deviations. Structural divergence is mainly restricted to the varying lengths of intersecting loop regions, which are particularly substantial in FXI and AMA1.

Protein      PDB ID   Method   Z-score   RMSD (Å)   Aligned residues (out of total)   Sequence identity (%)
hFXI         2F83     X-ray    10.1      1.5        68 (583)                          13
hFXI-A4      2J8J     NMR      4.3       2.8        65 (90)                           15
hHGF-N       1GP9     X-ray    8.1       2.1        66 (170)                          9
PfAMA1       2Z8W     X-ray    8.3       1.8        68 (335)                          15
EtMIC5-A9    1HKY     NMR      6.7       2.3        65 (86)                           20
HoLAPP       1I8N     X-ray    5.5       2.5        66 (89)                           14
HmSar        2K13     NMR      3.3       2.9        62 (103)                          15

Table 4.6: structural homology of TgMIC4-A5 with other known apple/PAN domain structures. The average energy-minimised TgMIC4-A5 solution structure was used to search for homologous structures using the DaliLite server v1.3 (Holm & Rosenström 2010), identifying human Factor XI (hFXI) and hepatocyte growth factor N-domain (hHGF-N), P. falciparum apical membrane antigen (PfAMA1), the ninth apple domain from E. tenella microneme protein 5 (EtMIC5-A9), H. officinalis leech anti-platelet protein (HoLAPP) and Hirudo medicinalis saratin (HmSar). Homologues are ranked according to Z-score; an indicator of the difference between the actual alignment score and the average alignment score using randomised sequences (i.e. a Z-score >3 is indicative of homology). RMSD values correspond to the alignment of Cα atoms following rigid body superimposition.


4.5.4. The N-/C-terminal disulphide linkage

A contentious issue regarding the TgMIC4-A5 structure is whether or not the N- and C-terminal cysteines form a disulphide linkage. Whilst it is expected that these residues are cross-linked, the absence of chemical shift assignments means that their proximity has not been proved experimentally. In an attempt to resolve this issue a sample was submitted for peptide fragmentation and fingerprinting via MALDI-TOF mass spectrometry under non-reducing conditions; however, this failed to detect either crosslinked or non-crosslinked peptides. Consequently, the TgMIC4-A5 structure has been calculated with and without this linkage invoked. The latter yields an ensemble of structures in which the cysteines are proximate nonetheless, meaning that it is probable that they do in fact exist in disulphide-linked form. The isolated lack of resonances in residues C10, V11, C79 and D80 could be interpreted as evidence for the existence of a disulphide bond. The resonances of outlying linker residues are assignable, making it improbable that the disappearance of cysteine resonances is merely caused by random flexibility on an intermediate timescale. Instead it is likely that ‘crankshaft’ motions of a disulphide linkage lead to the disappearance of resonances.

4.5.5. Concluding remarks

The work presented in this chapter has described the chemical shift assignment and structure calculation of domain A5 from TgMIC4, followed by validation of the resulting structure. The domain bears a canonical apple domain fold, which is in good agreement with the experimental data and previously determined apple domain structures. The validation statistics for the structure ensembles are also largely favourable. The structure therefore provides a good basis for ligand binding studies, which are described in chapter 5.

It is worth noting that in the context of full-length TgMIC4, domain A5 exists within a domain pair and is separated from the adjacent A6 domain by only three residues. The solution structure of TgMIC4-A12 has recently been solved in our laboratory (Dr Jan Marchant), revealing the paired domains to be closely associated (described further in chapter 7.4.1). Assuming that the interface is conserved in all TgMIC4 apple domain pairs, sub-cloning of TgMIC4-A5 introduces a somewhat non-native context, and its structure at the A6 interface may be altered. Indeed, it is likely that the absence of the A6 domain leads to the observed instability at the N-/C-termini. However, as described in chapter 5, the ligand-binding site of TgMIC4-A5 is located at a site distant from the N-/C-termini, and hence these observations are not considered to be significant for further studies.


Chapter 5: Preliminary studies of carbohydrate-binding by TgMIC4-A5.


5.1. Introduction

Previous studies have identified that TgMIC4 and its Apicomplexan orthologues are capable of adhering to galactosylated carbohydrates (Brecht et al. 2001; Klein et al. 1998; Keller et al. 2004). In TgMIC4 this activity was initially localised to an unspecified site near the C-terminus (Brecht et al. 2001), which we have now further refined to apple domain 5 (see chapter 3). This chapter will describe the initial studies of carbohydrate-binding by TgMIC4-A5, predominantly via NMR methods, leading to discussion of observations in the context of the domain solution structure.

5.2. Prior collaborative studies; carbohydrate microarray analysis

Carbohydrate microarray technology is a relatively new development in the field of glycobiology, enabling high-throughput screening of protein binding to carbohydrates. Whilst numerous microarray methodologies exist (reviewed in Liu et al. 2009), carbohydrate-binding by TgMIC4-A5 was investigated using the ‘NGL’ system created in the laboratory of Professor Ten Feizi (Imperial College London) (Fukui et al. 2002). Here, glycan probes are assembled in tiny spots across a nominal surface area (i.e. a chip), allowing for simultaneous assessment of hundreds of potential ligands. Quantitative detection of protein binding is performed using a system of antibody-conjugates and fluorescent probes, the sensitivity of which means that only small quantities of protein are required.

‘Neoglycolipids’ (NGLs) are created via the conjugation of a lipid tail to an oligosaccharide (Feizi et al. 1994). The hydrophobicity of the lipid enables non-covalent immobilisation to nitrocellulose (Wang et al. 2002; Fukui et al. 2002), where NGL amphipathicity promotes receptor clustering. This is aided by the mobility conferred by non-covalent attachment, in contrast to the rigidity of covalently-attached probes (Liu et al. 2009). Receptor clustering may enhance the detection of often low-affinity protein/carbohydrate interactions (Liu et al. 2009), and is analogous to the in vivo mode of carbohydrate recognition by some proteins (Crocker & Feizi 1996).

Covalently-constructed microarrays often involve extensive chemical modifications; a limitation for natural oligosaccharides, which can only be purified in small quantities. By contrast, the production of NGL arrays is relatively straightforward. Large mixtures of natural and synthetic oligosaccharides are simultaneously lipid-conjugated, resolved via chromatography, and immobilised alongside natural and synthetic glycolipids, which require no modification (Liu et al. 2009). At present, microarrays can be constructed containing up to 600 neoglycolipid probes, including N-glycans, O-glycans, gangliosides, glycosaminoglycans, poly-sialyls, and various polysaccharide fragments (Liu et al. 2009).

5.2.1. Carbohydrate microarray data for TgMIC4-A56

As discussed in chapter 1.5, carbohydrate microarray data has been previously collected for the partially-folded TgMIC4-A56 sample. The results of the array, containing 503 probes, demonstrate that the protein adheres exclusively to galactose-terminated oligosaccharides (Appendix A7). These include numerous lactose-derived and ganglioside-based probes, along with several N- and O-linked glycans. The presence of a terminal galactose unit appears to be imperative, as the numerous probes containing only an internal galactose unit are not recognised by the protein. Since the structured/functional part of TgMIC4-A56 has been identified as A5, it was expected that sub-cloned TgMIC4-A5 would yield a very similar microarray profile.

5.2.2. Carbohydrate microarray data for TgMIC4-A5

Based on the results of the TgMIC4-A56 microarray a smaller, refined array was constructed for testing TgMIC4-A5. This contained 118 oligosaccharide probes, encompassing six groups: A (Lac-, LacNAc-, LNT-based), B (N-glycan related), C (ganglioside-related), D (polysialyl), E (O-glycan related) and F (miscellaneous). Data was collected by Dr Yen Liu (Imperial College London Glycosciences laboratory), in general accordance with previously described protocols (Campanero-Rhodes et al. 2006; Blumenschein et al. 2007). Briefly, 118 carbohydrate probes were arrayed in duplicate (2 fmol and 5 fmol spots) on a nitrocellulose matrix. TgMIC4-A5/trx fusion protein (20 μg/ml) was then pre-complexed with mouse anti-histidine and biotinylated goat anti-mouse IgG antibodies, and incubated with the arrayed chip for 1 hour. The array was then washed, before quantitative detection of ligand-bound protein via addition of streptavidin-labelled Alexa-Fluor 647 and measurement of fluorescence intensity. The array was performed in triplicate, enabling a mean (and standard deviation) to be calculated.

Since the detection of oligosaccharide-bound protein requires the presence of a histidine-tag, a 500 μl sample of TgMIC4-A5/trx fusion was extracted prior to factor Xa digestion. The protein was therefore assayed in factor Xa reaction buffer (Appendix A2).


5.2.2.1. TgMIC4-A5 binds to galactose-terminated oligosaccharides

Summarised results of the TgMIC4-A5 carbohydrate microarray are provided in figure 5.1 and table 5.1. A complete table of data is provided in appendix A8. The data demonstrate that TgMIC4-A5 binds significantly (i.e. over 1000 fluorescence units) to eighteen probes. As for TgMIC4-A56, these are primarily lactose-based and ganglioside-related. Each ‘hit’ bears a terminal galactose unit, whilst the numerous probes containing internal galactose saccharides do not elicit significant binding. The bound probes are structurally variable, ranging from simple disaccharides to complex, branched-chain oligosaccharides, some of which bear multiple galactose termini. However, increased mass and multi-denticity do not appear to give rise to tighter interactions, with some mono-dentate disaccharides producing among the highest fluorescence intensities. This may be due to mimicking of multi-denticity by mono-dentate probes, via amphipathy-mediated clustering on the nitrocellulose matrix.

It is noteworthy that TgMIC4-A5 does not bind to the galactose monosaccharides in the array (see appendix A8; probes #109 and #110), in contrast to the previous demonstration, via NMR spectroscopy, of in vitro galactose binding by trypsin-digested TgMIC4-A56 (i.e. TgMIC4-A5) (figure 3.12). This may suggest that TgMIC4-A5 possesses a relatively deep binding pocket, access to which by immobilised galactose is hindered by the proximate nitrocellulose matrix. Steric hindrance may also prevent TgMIC4-A5 from binding many of the galactose-terminated probes in which the preceding sugar unit is branched (probes #7, #10 and #57). Many of these probes also bear an N-acetylgalactosamine (GalNAc) terminus, which may also be less well accommodated. Contrarily, the protein does bind to the GalNAc-terminated SM2 ganglioside (probe #52), though this ligand is bound with a relatively low affinity amongst the eighteen hits (table 5.1).

5.2.2.2. Branched-chain negative charge may enhance oligosaccharide binding

Many of the ganglioside-related probes to which TgMIC4-A5 binds contain a negatively-charged (sulphate or sialic acid) branch from an internal galactose. It is clear from the microarray data that TgMIC4-A5 cannot bind to solely sulphate- or sialyl-terminated probes, yet the presence of such a group may enhance adhesion to galactosylated oligosaccharides. For example, GM1 and GM1-penta (probes #60 and #61) elicit strong fluorescence intensities, which are diminished in their asialylated derivatives (probes #58 and #59) (table 5.1). Additionally, the presence of a sulphate group on SM2 may aid adhesion to its GalNAc terminus, a moiety which is generally unrecognised by the protein otherwise.
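The apparent effect of the sialic acid branch can be illustrated by a simple ratio of the mean fluorescence intensities listed in table 5.1. The following Python sketch is illustrative only: the function name is hypothetical, and fluorescence intensity is treated as a rough proxy for binding, not as a measured affinity.

```python
# Mean fluorescence intensities (AFU, 5 fmol spots) taken from table 5.1.
INTENSITY = {
    "GM1": 20021,               # probe #60
    "Asialo-GM1": 2963,         # probe #58
    "GM1-penta": 29569,         # probe #61
    "Asialo-GM1-Tetra": 5403,   # probe #59
}


def fold_enhancement(sialylated, asialo):
    """Ratio of the sialylated probe signal to that of its asialo derivative."""
    return INTENSITY[sialylated] / INTENSITY[asialo]


gm1_fold = fold_enhancement("GM1", "Asialo-GM1")
penta_fold = fold_enhancement("GM1-penta", "Asialo-GM1-Tetra")

print(f"GM1 vs asialo-GM1: {gm1_fold:.1f}-fold")
print(f"GM1-penta vs asialo-GM1-tetra: {penta_fold:.1f}-fold")
```

Both pairs differ by roughly five- to seven-fold, which is consistent with (though not proof of) a contribution from the sialyl branch.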

Figure 5.1: carbohydrate microarray analysis of TgMIC4-A5. Depicting the fluorescence intensity of bound protein (in arbitrary fluorescence units: AFU) for each of the 118 probes in the microarray. Values correspond to 5 fmol spots. Selected oligosaccharide structures are annotated, with differing disaccharide termini colour coded. Gal = galactose, Glc = glucose, GalNAc = N-acetylgalactosamine, GlcNAc = N-acetylglucosamine, SU = sulphate, NeuAc = N-acetylneuraminic acid, NeuGc = N-glycolylneuraminic acid, Cer = ceramide, AO = aminooxy-lipid conjugate. A full list of probe names, structures and precise fluorescence intensities can be found in Appendix A8.


Probe #  Group                    Name              Structure                          Fluorescence
54       C (Ganglioside-related)  SM1a              Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer      40703 ± 304
                                                    branch: SU-3
62       C                        GM1(Gc)           Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer      32137 ± 680
                                                    branch: NeuGcα-3
2        A (Lactose-based)        LacNAc-AO         Galβ-4GlcNAc-AO                    30939 ± 1121
61       C                        GM1-penta         Galβ-3GalNAcβ-4Galβ-4Glc           29569 ± 239
                                                    branch: NeuAcα-3
8        A                        Lex-Tri-AO        Galβ-4GlcNAc-AO                    20131 ± 162
                                                    branch: Fucα-3
60       C                        GM1               Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer      20021 ± 876
                                                    branch: NeuAcα-3
67       C                        GD1b              Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer      18783 ± 601
                                                    branch: NeuAcα-8NeuAcα-3
63       C                        GM1(Gc)-penta     Galβ-3GalNAcβ-4Galβ-4Glc           17262 ± 999
                                                    branch: NeuGcα-3
1        A                        Lac-AO            Galβ-4Glc-AO                       15717 ± 256
4        A                        LacNAc(1-3)-AO    Galβ-3GlcNAc-AO                    14131 ± 1143
59       C                        Asialo-GM1-Tetra  Galβ-3GalNAcβ-4Galβ-4Glc           5403 ± 298
44       B (N-glycan related)     NA4               Manβ-4GlcNAcβ-4GlcNAc              4179 ± 15
                                                    branch: Galβ-4GlcNAcβ-6
                                                    branch: Galβ-4GlcNAcβ-2Manα-6
                                                    branch: Galβ-4GlcNAcβ-4Manα-3
                                                    branch: Galβ-4GlcNAcβ-2
6        A                        Lea-Tri-AO        Galβ-3GlcNAc-AO                    4129 ± 595
                                                    branch: Fucα-4
52       C                        SM2               GalNAcβ-4Galβ-4Glcβ-Cer            2097 ± 113
                                                    branch: SU-3
58       C                        Asialo-GM1        Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer      2963 ± 161
90       D (O-glycan related)     Notch-3           Galβ-4GlcNAcβ-3Fucα-Thr            2519 ± 50
5        A                        Lea-Tri           Galβ-3GlcNAc                       1409 ± 80
                                                    branch: Fucα-4
45       B                        NA2F-AO           Manβ-4GlcNAcβ-4GlcNAc-AO           1063 ± 94
                                                    branch: Galβ-4GlcNAcβ-2Manα-6
                                                    branch: Fucα-6
                                                    branch: Galβ-4GlcNAcβ-2Manα-3

Table 5.1: summarising the arrayed probes to which TgMIC4-A5 exhibits highest affinity. Probe numbers, names and topological structures are provided alongside fluorescence intensities and error bounds (i.e. standard deviation). Probes are ranked by fluorescence intensity, in descending order. Topological structures are colour-coded according to their differing disaccharide termini. Gal = galactose, Glc = glucose, GalNAc = N-acetylgalactosamine, GlcNAc = N-acetylglucosamine, SU = sulphate, NeuAc = N-acetylneuraminic acid, NeuGc = N-glycolylneuraminic acid, Cer = ceramide, AO = aminooxy-lipid conjugate, Fuc = fucose, Man = mannose.


5.3. Materials & methods

5.3.1. NMR titration experiments.

Detailed insights into protein/ligand interactions can be provided via incremental addition of ligand to protein (i.e. titration), and monitoring of the effects induced upon an NMR spectrum (i.e. chemical shift perturbations). This was carried out for a selection of TgMIC4-A5 ligands.

5.3.1.1. Selection of ligands for titration

The carbohydrate microarray identified numerous candidate ligands for further investigation via NMR titration analysis. Many of these are complex oligosaccharides, and hence a detailed evaluation of each would not be economically viable. However, examination of the topologies of the eighteen oligosaccharides reveals a number of similarities and trends; for example, galactose is usually preceded by an N-acetyl-D-glucosamine (GlcNAc) or N-acetyl-D-galactosamine (GalNAc) saccharide. In total, four disaccharide termini are encompassed; Galβ1-4Glc (lactose), Galβ1-4GlcNAc (N-acetyl-D-lactosamine), Galβ1-3GalNAc (galacto-N-biose) and Galβ1-3GlcNAc (lacto-N-biose). These, together with galactose monosaccharide, were selected for NMR titration analysis. 3’-sialyl-N-acetyl-D-lactosamine (NeuAcα2-3Galβ1-4GlcNAc) was also titrated, in order to confirm that TgMIC4-A5 is incapable of binding to non-terminal galactose. Additionally, GM1-penta (Galβ1-3GalNAcβ1-4[NeuAcα2-3]Galβ1-4Glc) was selected, in order to investigate the mode of binding to a more complex oligosaccharide, and a potential in vivo receptor. Though SM1a produced the highest fluorescence intensity, its lack of commercial availability precluded its selection for further studies. The molecular structures of the selected ligands are provided in Appendix A9.

5.3.1.2. Experimental procedure

Based upon data from the carbohydrate microarray, chemical shift perturbation analysis was carried out for seven putative apple5 ligands; galactose (Melford Laboratories), lactose (Melford Laboratories), N-acetyl-D-lactosamine (Dextra Laboratories), lacto-N-biose (Dextra Laboratories), galacto-N-biose (Dextra Laboratories), 3’-sialyl-N-acetyl-D-lactosamine (Sigma Aldrich) and GM1-penta (Enzo Life Sciences). Each interaction was examined via incremental addition of a concentrated ligand solution to 15N-labelled TgMIC4-A5 (of at least 100 μM concentration) followed by acquisition of a 1H-15N HSQC spectrum. At each addition step, a ~100 μl volume of sample was decanted into an Eppendorf tube, ligand added, and the tube inverted several times to ensure thorough mixing before replacement in the NMR tube. The process was repeated until chemical shift changes were no longer observed (i.e. protein saturation was achieved). The pH of the sample was checked at regular intervals, in order to ensure that any observed chemical shift perturbations were not the result of pH fluctuations. All experiments were performed in NMR buffer, at 303 K, using an in-house Bruker Avance III DRX600 spectrometer equipped with a TCI cryoprobe.
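The volume of ligand stock required at each addition step follows from simple molar arithmetic. The Python sketch below assumes a 300 μl sample of 200 μM protein and a 50 mM ligand stock; these values and the function name are hypothetical, chosen purely for illustration, and the small dilution caused by each addition is ignored. The increments follow a series of 1, 2, 4, 6 and 8 cumulative molar equivalents.

```python
def ligand_volume_ul(protein_uM, sample_ul, stock_mM, add_equiv):
    """Volume of ligand stock (µl) that adds `add_equiv` molar equivalents."""
    protein_nmol = protein_uM * sample_ul / 1000.0  # µM x µl = pmol; /1000 gives nmol
    ligand_nmol = add_equiv * protein_nmol
    return ligand_nmol / stock_mM  # 1 mM is 1 nmol per µl


# Hypothetical sample and stock (assumptions, not values from the protocol).
PROTEIN_UM, SAMPLE_UL, STOCK_MM = 200.0, 300.0, 50.0

targets = [1, 2, 4, 6, 8]  # cumulative molar equivalents
previous = 0
volumes = []
for target in targets:
    step = target - previous  # equivalents to add at this step
    volumes.append(ligand_volume_ul(PROTEIN_UM, SAMPLE_UL, STOCK_MM, step))
    previous = target

for target, vol in zip(targets, volumes):
    print(f"to reach {target} Meq: add {vol:.1f} µl")
print(f"total added: {sum(volumes):.1f} µl")
```

Keeping the stock concentrated keeps the total added volume small relative to the sample, so that protein dilution over the titration remains minor.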

5.3.2. Determination of dissociation constants

The affinity between a protein and ligand (i.e. protein, DNA/RNA, carbohydrate or small molecule) is quantified in terms of the dissociation constant, Kd, for the interaction. This is defined as the ligand concentration (in molar units) at which the concentrations of bound and unbound protein are equal.

Protein/ligand interaction Kd values usually fall between 10⁻³ M (i.e. weak) and 10⁻¹² M (i.e. very strong), with carbohydrate ligands typically binding weakly (i.e. 10⁻³ - 10⁻⁶ M) (Rao et al. 1998). Kd values for various TgMIC4-A5/ligand interactions were determined using a combination of NMR titration data analysis and isothermal titration calorimetry (ITC).

5.3.2.1. Via NMR titration data

5.3.2.1.1. Protein/ligand interactions and exchange-regimes

Via incremental addition of ligand to protein, one can observe the behaviour of the protein in various pre-saturated states. This provides insight into the kinetics of the interaction, as the nature of a chemical shift perturbation is determined by the relationship between the dissociation rate constant (koff) and the frequency separation between the resonances of the free and bound states, Δ (Hz) (Meyer & Peters, 2003). The relationship koff > Δ describes ‘fast-exchange’, where protein rapidly exchanges between free and bound states (i.e. diagnostic of a weak interaction), such that neither exists long enough for its true resonance frequencies to be detected. Instead, weighted average chemical shifts are yielded, giving rise to incremental chemical shift perturbations with increasing ligand concentration (see figure 5.2).


Conversely, koff < Δ describes slow-exchange, during which ligand-bound protein exists for a time sufficient for its true chemical shifts to be detected (i.e. diagnostic of a strong interaction). This results in two distinct NMR signals, representing free and bound protein and with relative intensities proportional to their respective populations. As ligand concentration increases the bound-state signal becomes predominant (figure 5.2). Finally, koff ≈ Δ describes intermediate-exchange, which combines the characteristics of fast- and slow-exchange; resonances shift incrementally, but are simultaneously broadened before narrowing as the protein becomes saturated (Meyer & Peters, 2003).
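The three regimes can be summarised as a comparison of koff with Δ. The Python sketch below is a minimal illustration: the function name, the numerical values and the factor of ten used to delimit ‘approximately equal’ are all assumptions, since the boundaries between regimes are not sharp in practice.

```python
def exchange_regime(k_off, delta_hz, band=10.0):
    """Classify an exchange regime from k_off (s^-1) and delta (Hz).

    `band` is an assumed factor defining how close k_off and delta must be
    for the exchange to be called intermediate.
    """
    ratio = k_off / delta_hz
    if ratio > band:
        return "fast"
    if ratio < 1.0 / band:
        return "slow"
    return "intermediate"


# One hypothetical dissociation rate, two hypothetical frequency separations:
# within a single titration, a resonance with a large separation can leave
# the fast-exchange regime while the others remain in it.
print(exchange_regime(1000.0, 20.0))   # fast
print(exchange_regime(1000.0, 500.0))  # intermediate
print(exchange_regime(1.0, 100.0))     # slow
```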

Figure 5.2: the effect of exchange on NMR signals. The diagnostic characteristics of fast, intermediate and slow-exchanging protein/ligand interactions. Whilst the starting (free-state) and ending (bound-state) resonances appear the same regardless, the resonances of interim states appear differently depending on the rate of protein exchange between free and bound state (Meyer & Peters, 2003).

5.3.2.1.2. Protocol

In cases where sizeable chemical shift perturbations occur in fast-exchange on the NMR time-scale, the observation of incremental chemical shift changes (Δδn) enables the Kd to be estimated (Fielding 2007).

For a 1:1 interaction, the relationship between Kd and Δδn is given by

\[ \Delta\delta_n = \Delta\delta_{max} \, \frac{([P]_n + [L]_n + K_d) - \sqrt{([P]_n + [L]_n + K_d)^2 - 4[P]_n[L]_n}}{2[P]_n} \]    (Equation 5.1)

where [P]n and [L]n represent the protein and ligand concentrations at point (n) in the titration, whilst Δδn and Δδmax respectively represent the chemical shift change from the unbound state at point (n), and the total chemical shift change between the unbound and saturated states. A complete derivation of this equation is provided in appendix A10.

Where possible, Kd values were determined by plotting Δδn versus [L]n for a series of perturbed residues, and fitting binding isotherms via modulation of Δδmax and Kd values and least-squares refinement (i.e. minimisation of the differences between observed and calculated Δδn values). This was performed using the Microsoft Excel 2007® Solver module. Isotherms were plotted for five different residues per ligand, and average Kd values (and standard deviations) calculated.
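The fitting procedure can be sketched in a few lines of Python. This is not the Excel Solver routine used here but a stand-in that illustrates the same least-squares principle, assuming the standard 1:1 fast-exchange binding isotherm: Kd is scanned over a logarithmic grid and, for each trial Kd, the best Δδmax is obtained analytically because the model is linear in Δδmax. The function names, the grid limits and the synthetic, noise-free data (Kd = 500 μM, Δδmax = 0.30 ppm) are hypothetical.

```python
import math


def fraction_bound(p_tot, l_tot, kd):
    """Fraction of protein bound for 1:1 binding (concentrations in M)."""
    s = p_tot + l_tot + kd
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)


def fit_isotherm(p_tot, l_tot, shifts, kd_lo=1e-6, kd_hi=1e-1, steps=2000):
    """Least-squares estimate of (Kd, delta_max) by a log-spaced scan of Kd."""
    best = None
    for i in range(steps + 1):
        kd = kd_lo * (kd_hi / kd_lo) ** (i / steps)
        f = [fraction_bound(p, l, kd) for p, l in zip(p_tot, l_tot)]
        # Linear least squares for delta_max at this trial Kd.
        d_max = sum(fi * di for fi, di in zip(f, shifts)) / sum(fi * fi for fi in f)
        sse = sum((di - d_max * fi) ** 2 for fi, di in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    return best[1], best[2]


# Synthetic titration: 200 µM protein, 0-8 molar equivalents of ligand.
protein = [200e-6] * 6
ligand = [eq * 200e-6 for eq in (0, 1, 2, 4, 6, 8)]
observed = [0.30 * fraction_bound(p, l, 500e-6) for p, l in zip(protein, ligand)]

kd_fit, dmax_fit = fit_isotherm(protein, ligand, observed)
print(f"Kd = {kd_fit * 1e6:.0f} µM, delta_max = {dmax_fit:.3f} ppm")
```

In practice the fit would be repeated for each perturbed residue and the resulting Kd values averaged, with dilution of the protein accounted for in [P]n.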

5.3.2.2. Isothermal Titration Calorimetry (ITC)

In cases where it was not possible to estimate Kd using chemical shift perturbation data, values were sought via isothermal titration calorimetry (ITC), a biophysical technique which measures the thermodynamic properties of interactions in solution (Wiseman et al. 1989). ITC data analysis yields the total enthalpy (ΔH), stoichiometry (n) and association constant (Ka) of the interaction, from which Gibbs free energy (ΔGθ), entropy (ΔSθ) and dissociation constant (Kd) can be determined.

5.3.2.2.1. The ITC experiment

A standard ITC experiment comprises a sample cell (containing protein sample) and a reference cell (containing buffer or water), between which a thermal equilibrium is established. Ligand is incrementally injected into the sample cell, where interaction with the protein causes heat to be released or absorbed (depending on the nature of the interaction). Importantly, a separate ‘blank’ run is also carried out, where ligand is injected into buffer without protein, in order to detect thermal changes occurring due to the ligand dilution (i.e. the ‘baseline’). During titration thermal changes are monitored by the calorimeter and the power supply to the sample cell adjusted accordingly in order to maintain thermal equilibrium. This gives rise to a plot of the power required to maintain thermal equilibrium versus time (i.e. consisting of a ‘spike’ at each injection point if the molecules interact). Integration of the raw data gives the heat exchanged per mole of injectant at each point, which is plotted against molar ratio (ligand:protein) to yield a binding isotherm. Data is fitted via least-squares refinement, via modulation of ΔH, Ka and n (Wiseman et al. 1989), which are used to determine ΔGθ, ΔSθ and Kd via the equations

\[ \Delta G^{\theta} = -RT \ln K_a = \Delta H - T\Delta S^{\theta} \]    (Equation 5.2)

where R and T represent the gas constant and temperature respectively, and

\[ K_d = \frac{1}{K_a} \]    (Equation 5.3)
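The conversion of fitted ITC parameters into the remaining thermodynamic quantities can be illustrated as follows. The Python sketch assumes the standard relations ΔGθ = -RT ln Ka, ΔGθ = ΔH - TΔSθ and Kd = 1/Ka; the function name and the input values (ΔH = -40 kJ/mol, Ka = 1 × 10⁴ M⁻¹) are hypothetical and are not measured data.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def itc_thermodynamics(delta_h, k_a, temperature):
    """Return (delta_G in J/mol, delta_S in J/mol/K, Kd in M).

    delta_h in J/mol, k_a in M^-1, temperature in K.
    """
    delta_g = -R * temperature * math.log(k_a)
    delta_s = (delta_h - delta_g) / temperature
    k_d = 1.0 / k_a
    return delta_g, delta_s, k_d


# Hypothetical fitted parameters at 303 K (illustrative values only).
dg, ds, kd = itc_thermodynamics(delta_h=-40000.0, k_a=1.0e4, temperature=303.0)
print(f"dG = {dg / 1000:.1f} kJ/mol, dS = {ds:.1f} J/mol/K, Kd = {kd * 1e6:.0f} µM")
```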

5.3.2.2.2. The feasibility of ITC experiments

The feasibility of studying an interaction by ITC is governed by the ‘c-value’ (Wiseman et al. 1989), given by

\[ c = K_a [P] \]    (Equation 5.4)

where [P] is the initial protein concentration and Ka is the association constant. c defines the shape of the binding isotherm, with a clear sigmoid curve required for accurate data extraction. It is widely believed that useful isotherms, providing reliable data, are obtained within a ‘c-window’ of 1-1000; however, various studies have suggested more refined windows of 10-100 or 20-100 (Myszka et al. 2003; Wiseman et al. 1989). Whilst these may optimise the sigmoidal shape of the isotherm, it has been demonstrated that thermodynamic parameters can be derived from experiments where c < 10 (Turnbull & Daranas 2003).
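A short numerical check shows how the c-value places an experiment relative to these windows. The Python sketch below uses the general Wiseman form, c = n·Ka·[P], with the stoichiometry n set to 1; the function names and the input values (a 650 μM protein sample and an association constant of 1 × 10⁴ M⁻¹) are assumptions chosen for illustration.

```python
def c_value(protein_molar, k_a, n=1.0):
    """Wiseman c-value; n is the binding stoichiometry (1 for a 1:1 complex)."""
    return n * k_a * protein_molar


def in_window(c, low, high):
    """True if c lies within the inclusive window [low, high]."""
    return low <= c <= high


# Illustrative inputs (assumed): 650 µM protein, Ka = 1e4 M^-1.
c = c_value(650e-6, 1.0e4)
print(f"c = {c:.1f}")
print("within 1-1000:", in_window(c, 1, 1000))
print("within 10-100:", in_window(c, 10, 100))
```

For a weak interaction of this kind, c falls inside the broad window but below the more refined ones, which is why a high protein concentration is needed.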

5.3.2.2.3. ITC protocol for the TgMIC4-A5/lacto-N-biose complex

In order to achieve a desirable c-value, TgMIC4-A5 was expressed and purified from eight 1 l cell cultures, as previously described (chapter 3.3.3). This yielded a sample of concentration 650 μM, resulting in an estimated c value of 6.5. Due to the relatively weak nature of the interaction it was decided that lacto-N-biose would be titrated at a 15-fold higher concentration, in order to ensure saturation of the protein. Both samples were constituted in NMR buffer (Appendix A2) and the experiment was conducted at 303 K; identical conditions to the NMR titrations. Data was collected using a VP-ITC Microcalorimeter (MicroCal™). TgMIC4-A5 (650 μM) was incrementally titrated with 5 μl volumes of lacto-N-biose (9.75 mM) at 5 minute intervals for 10 injections, followed by 10 μl volumes for a further 24 injections. Data was analysed using Origin® v.7.0 software.

5.3.3. Assessing co-operative GM1-penta binding by TgMIC4-A5 and TgMIC1-NT

Analysis of the oligosaccharide binding profiles of TgMIC4-A5 and TgMIC1-NT (described by Blumenschein et al. 2007) prompted a dual titration of these proteins with GM1-penta. This section describes the preparation of TgMIC1-NT protein samples and the titration protocol.

5.3.3.1. Production and initial NMR analysis of recombinant TgMIC1-NT

A previously prepared pET-32 Xa/LIC construct expressing TgMIC1 residues 17-262 (MIC1-NT) was kindly provided by Dr Savvas Saouros (Imperial College London). Uniformly 15N-labelled protein was expressed and purified as previously described (Saouros et al. 2007), i.e. as for TgMIC4 apple domains (chapter 3.3.3 & 3.3.4). The protein was exchanged into NMR buffer (appendix A2) during size-exclusion chromatography, and concentrated to a 300 μl volume, supplemented with D2O (10% v/v), 0.1% (v/v) sodium azide and 1% (v/v) EDTA-free protease-inhibitor cocktail. The folded state of purified TgMIC1-NT was verified via acquisition of a 1H-15N HSQC spectrum, before titration with GM1-penta (via the protocol described in chapter 5.3.1.2).

5.3.3.2. Titration of TgMIC1-NT/GM1-penta with TgMIC4-A5

Observing the behaviour of two proteins during an NMR titration is likely to be complicated by resonance overlap. This potential problem was partially negated via titration of 15N-labelled TgMIC1-NT with uniformly 13C/15N-labelled TgMIC4-A5. This enabled selective observation of TgMIC4-A5 via a HNCO spectrum, with acquisition of a single data point in the 13C dimension giving rise to a “pseudo-2D” (i.e. 1H-15N HSQC-like) spectrum.


The 15N-MIC1-NT/GM1-penta sample was saturated with two molar equivalents of 13C/15N-apple5 over two increments. Details of acquired spectra are provided in table 5.2. All spectra were acquired at 303 K, using the in-house Bruker Avance III 600 spectrometer equipped with a TCI cryoprobe.

Experiment    Nuclei  Data points  Spectral width (Hz)  Field strength (MHz)  Comments
1H-15N-HSQC   1H      2048         7500.000             600.05
              15N     140          1945.904             60.81
HNCO          1H      2048         19230.769            600.05                A pseudo-2D experiment, selective for 13C/15N-labelled protein.
              15N     92           1945.904             60.81
              13C     1            -                    150.895

Table 5.2: the NMR experiments recorded during the dual titration of MIC1-NT and MIC4-apple5 with GM1-penta.

5.4. Results

5.4.1. NMR titration data

The titrations of TgMIC4-A5 with galactose, lacto-N-biose and GM1-penta are depicted in figures 5.3-5.5. These respectively depict interactions in fast, intermediate and slow-exchange regimes. Additional data for the titrations with lactose, N-acetyl-D-lactosamine, galacto-N-biose and 3’-sialyl-N-acetyl-D-lactosamine are provided in appendix A11 (figure A11.1 – A11.4). The results are additionally summarised in table 5.3. As expected, TgMIC4-A5 was found to adhere to all ligands bearing a terminal galactose unit, including galactose monosaccharide (figure 5.3). The titration with 3’-sialyl-N-acetyl-D-lactosamine also confirmed that TgMIC4-A5 cannot bind to internal galactose (figure A11.4).

5.4.1.1. Binding affinity appears proportional to ligand size

Contrary to the microarray data, the NMR titrations suggest that TgMIC4-A5 binds to larger oligosaccharides with increasing affinity. The interaction with galactose monosaccharide appears to be the weakest, occurring in fast-exchange. The interactions with disaccharides tend towards intermediate-exchange, whilst GM1-penta forms a relatively strong, slow-exchanging interaction (table 5.3). Incidentally, the protein appears to be capable of distinguishing β1,3 and β1,4-linked disaccharides, with the former appearing to bind with higher affinity.


Ligand                            Number of saccharides  Saturation ratio (L:P)  Exchange-regime    Comments
Galactose                         1                      8:1                     Fast               See fig 5.3. Weak interaction.
Lactose                           2                      6:1                     Fast/intermediate  See fig A11.1.
N-acetyl-D-lactosamine            2                      6:1                     Fast/intermediate  See fig A11.2.
Lacto-N-biose                     2                      5:1                     Intermediate       See fig 5.4.
Galacto-N-biose                   2                      5:1                     Intermediate       See fig A11.3.
3’-sialyl-N-acetyl-D-lactosamine  3                      -                       -                  See fig A11.4. No interaction.
GM1-penta                         5                      2:1                     Slow               See fig 5.5. Strong interaction.

Table 5.3: summarising the titrations of TgMIC4-A5. Stated for each ligand are the number of component saccharides, the ligand/protein molar ratio (L:P) at saturation and the observed exchange-regime. The strength of the interaction appears to be proportional to the size of the ligand.

5.4.1.2. Ligands elicit a near-conserved pattern of chemical shift perturbations

Comparison of the precise chemical shift perturbations induced by each ligand reveals a striking pattern of conservation (figure 5.6). Compared with galactose monosaccharide (figure 5.6; green), no additional residues are perturbed by the various galactose-terminated oligosaccharides. The positions of bound-state resonances are also largely consistent, save for several unique shifts induced by lacto-N-biose (figure 5.6; black), where a subtle reorganisation of the TgMIC4-A5 interface may be required to accommodate the ligand. This suggests that galactose is intimately associated with TgMIC4-A5, perhaps docking within a cavity, whilst preceding units are relatively peripheral. This is consistent with the inability of TgMIC4-A5 to recognise non-galactose-terminated probes in the carbohydrate microarray, discussed in chapter 5.2.2.1.

It could be further interpreted that saccharides preceding galactose do not contribute to TgMIC4-A5 binding through intermolecular contacts. Instead the proportionality between binding affinity and ligand size could merely be driven by the decreased entropic penalties associated with binding of larger molecules. Whilst this may be a contributing factor, the ability of TgMIC4-A5 to discriminate β1,3 and β1,4 linked disaccharides (i.e. molecules of identical size), and the apparent enhancement of binding by negatively-charged branched saccharides in the microarray (see chapter 5.2.2.2), suggest that intermolecular contacts are established. It must be remembered that a 1H-15N HSQC-monitored titration only reports the effects of ligand-binding upon the amide resonances of each residue, and may not always convey the participation of side-chain atoms. Thus it is feasible that preceding saccharides stabilise the interaction with galactose through side-chain contacts.

Figure 5.3: NMR titration of TgMIC4-A5 with galactose. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black), 1 (green), 2 (magenta), 4 (blue), 6 (cyan) and 8 (red) molar equivalents (Meq) of galactose, at which point the protein was saturated. Spectra were acquired at a protein concentration of ~200 µM. The interaction occurs in fast-exchange, with most perturbed resonances undergoing clear incremental shift changes with increased galactose concentration. The chemical shift perturbations of selected residues are annotated. In instances where the frequency separation between the unbound and saturated states is particularly large, characteristics of intermediate (S54) and slow exchange (S18, Y67) are observed.


Figure 5.4: NMR titration of TgMIC4-A5 with lacto-N-biose. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black), 1 (green), 3 (magenta), 4 (blue) and 5 (red) molar equivalents (Meq) of lacto-N-biose, at which point the protein was saturated. Spectra were acquired at a protein concentration of ~200 µM. The interaction occurs in intermediate exchange, with many resonances broadened beyond detection at pre-saturating lacto-N-biose concentrations.


Figure 5.5: NMR titration of TgMIC4-A5 with GM1-penta. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black), 1 (green) and 2 (red) molar equivalents (Meq) of GM1-penta. Spectra were acquired at a protein concentration of ~200 µM. The interaction occurs in slow-exchange; at half-saturation (1 Meq; green) free-state and bound-state resonances are simultaneously observed in many cases.


Figure 5.6: Comparison of TgMIC4-A5 bound to six different ligands. Superimposed 1H-15N HSQC spectra of TgMIC4-A5 saturated with lacto-N-biose (black), galactose (green), lactose (cyan), galacto-N-biose (magenta), N-acetyl-D-lactosamine (blue) and GM1-penta (red). The positions of bound-state resonances are largely conserved.


5.4.2. Determination of the ligand-binding interface of TgMIC4-A5

NMR titration data is highly complementary in instances where the atomic structure and chemical shifts of the unbound protein have been determined. By deciphering the chemical shifts of ligand-bound protein, chemical shift perturbations can be quantitatively mapped onto the protein surface, generating a ‘map’ of the ligand-binding interface. The 1H-15N HSQC experiment is commonly utilised for chemical shift mapping, as it only requires re-assignment of backbone chemical shifts. Usefully, the observed conservation of backbone amide chemical shift perturbations ensures that a chemical shift map derived from the galactose titration data is applicable to the whole family of TgMIC4-A5 ligands.

The reliability of chemical shift mapping depends upon the overall structure of the ligand-bound protein being largely unchanged from the unbound form, though this is often the case for carbohydrate-bound proteins (Deane et al. 2011; Moothoo & Naismith 1998; Toone 1994). Additionally, use of the 1H-15N HSQC experiment confers reliance upon the contribution of a residue to ligand-binding being conveyed by a change in its amide resonance frequency.

5.4.2.1. Re-assignment of the galactose-bound TgMIC4-A5 backbone

The interaction of TgMIC4-A5 and galactose occurs predominantly in fast-exchange, meaning that the positions of most amide resonances of the bound-state protein can be elucidated via visual inspection of the superimposed 1H-15N HSQC spectra. Nonetheless, since several resonances undergo intermediate or slow-exchange, the backbone atoms of galactose-bound TgMIC4-A5 were re-assigned using conventional triple-resonance NMR spectroscopy, i.e. using the same methods as for backbone assignment of the unbound protein (see chapter 4.4.1.1). All spectra were acquired using the in-house DRX600 Bruker Avance II spectrometer, equipped with a TCI cryoprobe.

As for unbound TgMIC4-A5, backbone assignment was completed for 94% of non-proline residues within the putative domain boundaries (i.e. C10-C79), with assignments missing for residues C10, V11, S30 and C79. The assigned 1H-15N HSQC spectrum of ligand-bound TgMIC4-A5 is pictured in figure 5.7. In addition to the various backbone amide chemical shift changes, the side-chain amide resonances of residue N51 (one of which is particularly unusual) are also perturbed.

Figure 5.7: The assigned 1H-15N HSQC spectrum of galactose-bound TgMIC4-A5. Superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of increasing concentrations of galactose, culminating in saturation with 8 molar equivalents (red); the conditions used for backbone chemical shift re-assignment. Assignments are annotated, including that of the side-chain amide of N51.

5.4.2.2. Determination of total chemical shift perturbations (Δδtot)

To determine the total backbone amide chemical shift perturbation (Δδtot) for each residue, 1H (ΔδH) and 15N (ΔδN) chemical shift changes were weighted, combined, and then ‘normalised’ with respect to the maximum combined chemical shift change (i.e. giving values ranging 0 - 1), via the equation

(Equation 5.5)

Firstly, a graph of ΔδH vs ΔδN was plotted (figure 5.8), demonstrating that residue Y67 undergoes unusually large chemical shift changes. The values are significantly greater than for all other residues, such that scaling Δδtot with respect to Y67 would yield a largely-compressed range of values. Therefore Y67 was excluded from the weighting process and attributed a maximum Δδtot value of 1 from the outset. Inspection of the remaining data points revealed the maximum 1H and 15N chemical shift changes to be approximately 0.4 ppm and 1.5 ppm respectively, and ΔδH values were therefore weighted via multiplication by four. The highest combined ΔδH/ΔδN value of 1.91 (corresponding to K60) was then employed as a scaling factor, yielding normalised Δδtot values.
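The weighting, outlier exclusion and scaling steps described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used: the form of the combination in Equation 5.5 is not reproduced here, so a Euclidean combination of the weighted ΔδH and ΔδN values is assumed, and the function name, argument names and example values are all hypothetical.

```python
import math

def normalise_perturbations(shifts, h_weight=4.0, outliers=("Y67",)):
    """Return normalised perturbations in the range 0-1.

    shifts:   dict mapping residue label -> (delta_H_ppm, delta_N_ppm).
    h_weight: factor applied to the 1H changes (four, as in the text).
    outliers: residues excluded from scaling and assigned a value of 1.
    """
    # ASSUMPTION: Euclidean combination; the thesis equation may differ.
    combined = {res: math.hypot(h_weight * dh, dn)
                for res, (dh, dn) in shifts.items()}
    # Scale by the largest combined value among the non-outlier residues.
    scale = max(v for res, v in combined.items() if res not in outliers)
    return {res: 1.0 if res in outliers else v / scale
            for res, v in combined.items()}

# Hypothetical shift changes for illustration only (not measured data).
example = {"resA": (0.10, 1.50), "resB": (0.05, 0.20), "Y67": (1.00, 4.00)}
normalised = normalise_perturbations(example)
```

Treating the outlier separately keeps the remaining residues spread across the full 0 - 1 range, which is the stated reason for excluding Y67 from the scaling.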

Figure 5.8: ΔδH vs ΔδN for galactose-induced chemical shift perturbations of TgMIC4-A5. A plot of ΔδH vs ΔδN concisely depicts the relative chemical shift perturbations of each residue. The identities of the most significantly perturbed residues are annotated. The data points are concentrated at the origin and become increasingly dispersed along the axes. Y67 is a strong outlier from this trend and was excluded from the scaling process. The maximum ΔδH and ΔδN values of approximately 0.4 ppm (S18) and 1.5 ppm (K60) led to the weighting of ΔδH values via multiplication by four.

A plot of Δδtot per residue is presented in figure 5.9. Particularly large perturbations are observed in residues towards the N- and C-termini, most notably in I16, K19, Q21, S54, K60, Y67 and G71. The effect induced upon N51 is also notable, considering that the perturbations of its side-chain amide resonances (figure 5.7) are not accounted for during this process.

Figure 5.9: ligand-induced chemical shift perturbations in TgMIC4-A5. A plot of the normalised, combined, and scaled chemical shift perturbations for TgMIC4-A5 residues C10-C79 in the presence of galactose.

5.4.2.3. The ligand-binding interface of TgMIC4-A5

The chemical shift mapped solution structure of TgMIC4-A5 is presented in figure 5.10. It demonstrates that the residues undergoing the most significant perturbations are concentrated at the junction between the two- and three-stranded β-sheets. There lies a relatively deep cavity, which likely represents the docking site for galactose. This is consistent with previous speculation regarding steric occlusion of certain probes (e.g. galactose monosaccharide) during microarray analysis, and supports the proposed model where an intimately-bound galactose is preceded by peripherally-bound saccharides. Binding at this site would represent a new binding mode for the apple/PAN domain family, with studies of other domains identifying ligand-binding interfaces or key residues in other regions of the fold (figure 1.7).

The putative galactose-binding pocket is encircled by several residues which may form crucial contacts with ligand atoms, namely K19, N51, Y58, K60, Y67 and Y69 (see figure 5.10B). Y58 demonstrates the limitations of chemical shift mapping, as its proximity to the binding site is not reflected by a backbone amide chemical shift perturbation. Together, these residues present multiple functional groups which could mediate interaction with an (oligo-)saccharide.

5.4.2.4. The biochemistry of protein/carbohydrate interactions

Protein/carbohydrate complexes are typically formed via docking of the carbohydrate into a groove on a protein surface, such as the putative binding pocket of TgMIC4-A5. This represents a relatively loose-fit binding mode, which contributes to the regular observance of weak binding affinities. Instances where the carbohydrate is buried within the protein core are less-commonly observed and are usually characterised by a tighter affinity (Rao et al. 1998).

Protein/carbohydrate interactions are characterised by non-covalent forces. Carbohydrates are hydrophilic by nature, containing an abundance of hydroxyl groups, which readily act as donor or acceptor groups in formation of hydrogen-bonds with protein atoms. Many well-characterised protein/carbohydrate interfaces contain vast networks of direct and indirect (i.e. via water) hydrogen-bonds (Vyas et al. 1988; Gaiser et al. 2006). The putative ligand-binding site of TgMIC4-A5 contains an abundance of potential hydrogen-bonding groups, including various hydroxyl groups from tyrosine residues and amine groups of asparagine and lysine (figure 5.10).

Figure 5.10: the ligand-binding surface of TgMIC4-A5. Normalised Δδtot values were incorporated into the average energy-minimised structure of TgMIC4-A5 (including the C10-C79 disulphide) as B-factors in the .pdb file. Residues were then coloured according to B-factor, using a white-to-red gradient of eight increments (i.e. white; Δδtot = 0, red; Δδtot = 1). A) Molecular surface (left-panel) and cartoon (right-panel) representations of the chemical shift-mapped TgMIC4-A5 structure, viewed from the same perspective. B) Enlarged views of the putative ligand-binding pocket, once again each viewed from the same perspective. Amino acid side-chains are included in the cartoon representation (right-panel).
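The B-factor substitution described in the caption of figure 5.10 might be carried out along the following lines. This is a sketch under stated assumptions, not the script used for the figure: it relies only on the fixed-column PDB layout (residue number in columns 23-26, temperature factor in columns 61-66), and the function name and the example values are hypothetical. Colouring by B-factor would then be done in a molecular viewer, which is not covered here.

```python
def set_bfactors(pdb_lines, value_by_resnum, default=0.0):
    """Write per-residue values into the B-factor column of ATOM records.

    pdb_lines:       iterable of PDB-format lines.
    value_by_resnum: dict mapping residue number -> normalised perturbation.
    default:         value for residues with no entry (e.g. unassigned).
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            body = line.rstrip("\n")
            newline = line[len(body):]      # keep any original line ending
            body = body.ljust(66)           # pad short records to column 66
            resnum = int(body[22:26])       # columns 23-26: residue number
            value = value_by_resnum.get(resnum, default)
            # Columns 61-66 hold the temperature factor, formatted as 6.2f.
            line = body[:60] + f"{value:6.2f}" + body[66:] + newline
        out.append(line)
    return out

# Hypothetical single-atom record and value, for illustration only.
record = "ATOM".ljust(22) + "  10" + " " * 28 + "  1.00" + "  0.00\n"
mapped = set_bfactors([record], {10: 0.85})
```

Non-atom records pass through unchanged, so the output remains a valid input for the viewer.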

In spite of their overall hydrophilicity, some saccharides contain hydrophobic patches, depending on the relative orientations of their C-H groups. For example, in galactose, the hydrogen atoms of C1 and C3-C5 are oriented on the same face of the ring, forming an extended hydrophobic surface (Rao et al. 1998). Such surfaces can mediate “ring-stacking” against protein aromatic side-chains (Vyas et al. 1988; Rao et al. 1998), which can also serve as hydrogen-bond acceptors (Levitt et al. 1988). Ring-stacking is enhanced by CH-π interactions; small van der Waals forces between partially positively-charged aliphatic protons and delocalised aromatic π-electrons (Muraki 2002). This is the basis of galactose discrimination by galectin family proteins and other galactose-specific lectins (Nesmelova et al. 2008). TgMIC4-A5 contains three tyrosine residues within its putative ligand-binding site which may partake in analogous ring-stacking interactions (figure 5.10).

5.4.3. Determination of dissociation constants

An estimate of the dissociation constant, Kd, for each interaction was obtained where possible via analysis of NMR titration data. Where this was not possible (i.e. intermediate- or slow-exchange), ITC was carried out, enabling a more comprehensive thermodynamic characterisation of the interaction.

5.4.3.1. NMR titration data

The NMR titrations of TgMIC4-A5 with galactose, lactose and N-acetyl-D-lactosamine all exhibit properties of fast-exchange, enabling Kd to be estimated. Superimposed plots of normalised (i.e. negating differences between 1H and 15N chemical shift scales) chemical shift changes, Δδn, versus [L]n for five residues are provided in figures 5.11-5.13. Fitting of the data revealed a Kd of ~2.70 × 10⁻⁴ M for the TgMIC4-A5/galactose interaction, with a standard deviation of 5 × 10⁻⁶ M for the five analysed residues (figure 5.11). Binding more strongly, lactose and N-acetyl-D-lactosamine returned similar values of ~1.60 × 10⁻⁴ (± 7 × 10⁻⁶) M (figure 5.12) and ~1.67 × 10⁻⁴ (± 8 × 10⁻⁶) M (figure 5.13).
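To illustrate how a Kd can be extracted from a fast-exchange titration curve, the sketch below fits a 1:1 binding model that allows for ligand depletion. The fitting software and exact model used for figures 5.11-5.13 are not stated in this section, so the model, the search routine, the function names and the synthetic data points are all assumptions made for illustration.

```python
import math

def fraction_bound(p_tot, l_tot, kd):
    """Fraction of protein bound for 1:1 binding (all values in M)."""
    b = p_tot + l_tot + kd
    return (b - math.sqrt(b * b - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)

def sse_for_kd(kd, p_tot, ligand, shifts):
    """Squared error at this Kd, with the plateau solved in closed form."""
    f = [fraction_bound(p_tot, l, kd) for l in ligand]
    plateau = sum(x * y for x, y in zip(f, shifts)) / sum(x * x for x in f)
    sse = sum((plateau * x - y) ** 2 for x, y in zip(f, shifts))
    return sse, plateau

def fit_kd(p_tot, ligand, shifts, lo=1e-7, hi=1e-1, grid=121, iterations=60):
    """Coarse scan over log10(Kd), refined by golden-section search."""
    step = (math.log10(hi) - math.log10(lo)) / (grid - 1)
    logs = [math.log10(lo) + i * step for i in range(grid)]
    errs = [sse_for_kd(10 ** x, p_tot, ligand, shifts)[0] for x in logs]
    best = min(range(grid), key=errs.__getitem__)
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if (sse_for_kd(10 ** c, p_tot, ligand, shifts)[0]
                < sse_for_kd(10 ** d, p_tot, ligand, shifts)[0]):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    kd = 10 ** ((a + b) / 2.0)
    return kd, sse_for_kd(kd, p_tot, ligand, shifts)[1]

# Synthetic, noise-free titration (hypothetical points, not thesis data).
protein = 200e-6
ligand = [eq * protein for eq in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)]
shifts = [fraction_bound(protein, l, 2.6e-4) for l in ligand]
kd_fit, plateau_fit = fit_kd(protein, ligand, shifts)
```

Because the protein concentration enters the model directly, any error in its measurement propagates into the fitted Kd.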

Figure 5.11: binding isotherms for the TgMIC4-A5/galactose interaction. Overlaid plots of normalised Δδn vs galactose concentration for a variety of TgMIC4-A5 residues. Fitting of the data yields a Kd of ~2.70 × 10⁻⁴ M (± 5 × 10⁻⁶ M). The experiment was performed at a protein concentration of ~200 µM.

Figure 5.12: a binding isotherm for the TgMIC4-A5/lactose interaction. Overlaid plots of normalised Δδn vs lactose concentration for a variety of TgMIC4-A5 residues. Fitting of the data yields a Kd of ~1.60 × 10⁻⁴ (± 7 × 10⁻⁶) M. The experiment was performed at a protein concentration of ~200 µM.

Figure 5.13: a binding isotherm for the TgMIC4-A5/N-acetyl-D-lactosamine interaction. Overlaid plots of normalised Δδn vs N-acetyl-D-lactosamine concentration for a variety of TgMIC4-A5 residues. Fitting of the data yields a Kd of ~1.67 × 10⁻⁴ (± 8 × 10⁻⁶) M. The experiment was performed at a protein concentration of ~200 µM.

Calculation of Kd values in this manner places a strong emphasis on precise protein/ligand concentration determination. Protein concentrations of TgMIC4-A5 NMR samples were determined using a NanoDrop® instrument, i.e. via measurement of UV light absorbance at 280 nm, the extent of which is governed by tryptophan and (to a lesser extent) tyrosine content. Although TgMIC4-A5 contains five tyrosine residues, a lack of tryptophan limits its absorptivity, which may impact on the accuracy of determined protein concentrations. Several other methods of protein quantitation were tried (e.g. the Bradford assay), however A280 measurement was found to provide the most consistent and reliable results. Nonetheless, inaccuracies in protein concentration determination are possible, meaning the calculated Kd values must be treated as estimates rather than as definitive.

5.4.3.2. Isothermal Titration Calorimetry (ITC)

The number of parameters obtainable by ITC would make it a preferable method for characterisation of TgMIC4-A5 binding to galactose, lactose and N-acetyllactosamine. However, c-value projections make for a less attractive option. For example, the NMR-derived TgMIC4-A5/galactose Kd of ~2.6 × 10⁻⁴ M would necessitate a TgMIC4-A5 concentration of 2.6 mM in order to achieve c = 10. In addition to potential solubility limitations, the requirement of a 1.4 ml sample volume equates to ~33 mg of a protein which typically yields 1-2 mg per 1 L culture.
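The c-value projection above follows from c = n[P]/Kd, where n is the number of binding sites and [P] the protein concentration in the cell. A short sketch of the arithmetic is given below, assuming n = 1. The molecular weight is a hypothetical round figure of about 9.1 kDa, back-calculated from the ~33 mg quoted above rather than taken from the text, and the function names are illustrative.

```python
def protein_conc_for_c(kd_molar, c_value=10.0, n_sites=1.0):
    """Cell protein concentration (M) that gives c = n_sites * [P] / Kd."""
    return c_value * kd_molar / n_sites

def protein_mass_mg(conc_molar, volume_ml, mol_weight_da):
    """Mass of protein (mg) needed to fill the cell at this concentration."""
    moles = conc_molar * volume_ml / 1000.0   # ml -> L
    return moles * mol_weight_da * 1000.0     # g -> mg

# ASSUMPTION: molecular weight back-calculated from the quoted ~33 mg.
ASSUMED_MW_DA = 9100.0

conc = protein_conc_for_c(2.6e-4)                  # galactose Kd from NMR
mass = protein_mass_mg(conc, 1.4, ASSUMED_MW_DA)   # 1.4 ml sample volume
```

The same functions show why a tighter-binding ligand eases the requirement: a ten-fold lower Kd reduces both the concentration and the mass ten-fold.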

Due to their increased binding affinities, lacto-N-biose, galacto-N-biose and GM1-penta represented more viable ligands for ITC. Based on the NMR-derived Kd values for the β1,4-linked disaccharides (~1.6 × 10⁻⁴ M), it was estimated that TgMIC4-A5 interacts with lacto-N-biose or galacto-N-biose with a Kd of ~1.0 × 10⁻⁴ M, necessitating a protein concentration of 1 mM to achieve c = 10. However, the requirement of high protein concentrations confers a need for high ligand concentrations, since ligand is injected into the sample chamber at a concentration 10-fold greater than the protein. For this reason, the interaction of TgMIC4-A5 and the particularly expensive GM1-penta was not characterised by ITC. Similarly, the technique was only applied to one of the β1,3-linked disaccharides - lacto-N-biose - since the Kd and thermodynamic parameters are likely to be similar.

The ITC profile for the TgMIC4-A5/lacto-N-biose titration is demonstrative of an interaction (figure 5.14). The data was fitted to a one-site binding model, calculating Ka ~9.2 × 10³ M⁻¹, equating to Kd ~1.1 × 10⁻⁴ M. The interaction is spontaneous (ΔG = -5568.5 cal mol⁻¹), with a favourable change in enthalpy (ΔH = -5642 ± 358 cal mol⁻¹) offset by only a minor unfavourable entropy change (ΔS = -0.241 cal mol⁻¹ K⁻¹).
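As a brief check on how these quantities relate, the standard relations Kd = 1/Ka and ΔG = ΔH - TΔS can be applied to the reported values. The arithmetic below uses only the numbers quoted above; the experimental temperature is not stated in this section and is therefore left as T.

```latex
\begin{align*}
K_d &= \frac{1}{K_a} = \frac{1}{9.2 \times 10^{3}\ \mathrm{M^{-1}}}
      \approx 1.1 \times 10^{-4}\ \mathrm{M} \\
\Delta G &= \Delta H - T\Delta S \\
T\Delta S &= \Delta H - \Delta G
      \approx -5642 - (-5568.5) = -73.5\ \mathrm{cal\ mol^{-1}}
\end{align*}
```

The entropic term is thus a little over 1% of the magnitude of the enthalpy change, consistent with the description of binding as enthalpy-driven.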

5.4.3.3. Summary

The estimated dissociation constants for various TgMIC4-A5/ligand interactions are summarised in table 5.4. The values are consistent with the trends observed in the NMR titration data; galactose binds less strongly than β1,4-linked disaccharides (i.e. lactose/N-acetyl-D-lactosamine), which themselves bind less strongly than β1,3-linked disaccharides (i.e. lacto-N-biose). The affinity of GM1-penta has not been quantified, however the occurrence of slow-exchange suggests a Kd of the order ~10⁻⁶ M.

Ligand                    Dissociation constant, Kd    Technique
Galactose                 2.60 × 10⁻⁴ M (260 μM)       NMR
Lactose                   1.60 × 10⁻⁴ M (160 μM)       NMR
N-acetyl-D-lactosamine    1.67 × 10⁻⁴ M (167 μM)       NMR
Lacto-N-biose             1.10 × 10⁻⁴ M (110 μM)       ITC

Table 5.4: estimated TgMIC4-A5/ligand dissociation constants.

Figure 5.14: isothermal titration calorimetry (ITC) data for the TgMIC4-A5/lacto-N-biose interaction. A) The raw data depicts a series of negative spikes, denoting heat release due to a protein/ligand interaction. The size of the thermal changes decreases towards the end of the titration, as the protein reaches saturation. B) Integration of the raw data reveals the energy released (in kcal) per mole of injectant, plotted against molar ratio to give a binding isotherm. The curve was fitted to a one-site binding model, yielding a dissociation constant Kd ~1.10 × 10⁻⁴ M. The experiment was carried out at a protein concentration of ~650 µM.

5.4.4. Co-operative GM1-penta binding by TgMIC4-A5 and TgMIC1-NT

As detailed in chapter 1.4.3.1, binding of TgMIC1-NT to sialylated oligosaccharides has been previously demonstrated, including the resolution of crystal structures of complexes with several ligands (Blumenschein et al. 2007; Garnett et al. 2009). One such ligand, 3’-sialyl-N-acetyllactosamine (NeuAcα2,3Galβ1,4GlcNAc), is almost identical to a trisaccharide sequence found within GM1-penta (Galβ1,3GalNAcβ1,4[NeuAcα2,3]Galβ1,4Glc) (appendix A9). (In fact, the fragment within GM1-penta differs by the absence of an N-acetyl substituent in the Glc saccharide, however crystal structures show that TgMIC1-NT does not closely contact this unit; see figure 1.10B).

Although carbohydrate microarray experiments of TgMIC1-NT demonstrate a very weak affinity for GM1-penta (Blumenschein et al. 2007), it was postulated, based on the observations discussed above, that the protein and pentasaccharide should interact strongly. Furthermore, the demonstration that TgMIC4-A5 binds a different moiety of GM1-penta - the branched-chain galactose terminus - encouraged speculation that TgMIC1-NT and TgMIC4-A5 could bind the oligosaccharide in tandem. These notions were assessed via NMR titration experiments.

5.4.4.1. Production and NMR analysis of recombinant TgMIC1-NT

The purification profile for recombinant TgMIC1-NT is depicted in figure 5.15. The protein was extracted from cell lysate via Ni2+-affinity chromatography (figure 5.15A), liberated from the thioredoxin-tag using factor Xa protease (figure 5.15B) and recovered via a second Ni2+-affinity chromatography step (figure 5.15C). The folded state of purified TgMIC1-NT was verified via acquisition of a 1H-15N HSQC spectrum, before titration with GM1-penta. GM1-penta induces numerous chemical shift perturbations in the 1H-15N HSQC (figure 5.16A). As for TgMIC4-A5, the interaction occurs in slow-exchange for all observable shifts, with saturation achieved by two molar equivalents of GM1-penta.

5.4.4.2. Titration of TgMIC1-NT/GM1-penta with TgMIC4-A5

The NMR spectra depicted in figure 5.16 demonstrate that TgMIC4-A5 displaces TgMIC1-NT from GM1-penta. Following addition of one molar equivalent of TgMIC4-A5 to TgMIC1-NT/GM1-penta, TgMIC4-A5 converts entirely to the bound-state, as demonstrated by the 2D HNCO spectrum (figure 5.16B).

Figure 5.15: TgMIC1-NT purification and NMR analysis. SDS-PAGE analysis of purification. M = molecular weight markers. A) Ni2+ affinity chromatography. Lane 1 = cell lysate supernatant; lane 2 = column flow-through; lane 3 = 20 mM imidazole wash; elution of TgMIC1-NT/trx in 200 mM imidazole. B) Factor Xa digestion of TgMIC1-NT/trx. Lane 1 = prior to protease addition; lane 2 = 18 hours after protease addition. The reaction yields TgMIC1-NT and trx fragments (arrowed). C) Reverse Ni2+ affinity chromatography. Lane 1 = digested sample; lane 2 = column flow-through; lane 3 = column wash; lane 4 = elution of column-bound components in 200 mM imidazole; lane 5 = the NMR sample (pooled, gel filtered and concentrated).

Concurrent reversion of approximately 50% of TgMIC1-NT to the free state is detectable in the 1H-15N HSQC spectrum (figure 5.16Aii). Since TgMIC4-A5 saturation requires an excess of GM1-penta (i.e. a 2:1 ratio), a residual quantity of free ligand exists at any given time, resulting in the presence of some retained bound-state TgMIC1-NT. Addition of a further molar equivalent of TgMIC4-A5 forces complete displacement of TgMIC1-NT from the residual ligand (see figure 5.16Aiii).

The NMR titrations with GM1-penta of TgMIC4-A5 (figure 5.5) and TgMIC1-NT both describe an interaction occurring in slow-exchange, with saturation at approximately 2 molar equivalents. Whilst this indicates that the proteins bind the ligand with approximately equal affinities, the displacement of TgMIC1-NT by TgMIC4-A5 suggests that the latter exhibits a greater affinity. This may arise due to overlap of the saccharides contacted by each protein. As previously described, carbohydrate microarray data suggests that recognition of galactosylated oligosaccharides by TgMIC4-A5 is enhanced in the presence of branched chain sialic acid units (see chapter 5.2.2.2). Thus, it is possible that adhesion to the galactose terminus of GM1-penta by TgMIC4-A5 is coupled with ‘stripping’ of the sialic acid terminus from the surface of TgMIC1-NT.

Figure 5.16: Non-synergistic GM1-penta binding by TgMIC1-NT and TgMIC4-A5. A) The 1H-15N HSQC of TgMIC1-NT in the absence (black) and presence (red) of 2 molar equivalents of GM1-penta. The enlarged area (i) depicts a section of the spectrum in which the free and bound-states of the protein are clearly represented. Addition of 1 molar equivalent of TgMIC4-A5 initiates TgMIC1-NT displacement from GM1-penta (ii; green). Addition of a further molar equivalent (i.e. ‘saturation’ of GM1-penta) leads to total displacement of TgMIC1-NT (iii; green). B) Superimposed pseudo-2D HNCO spectra of TgMIC4-A5 before (black) and after (red) addition of one molar equivalent to the TgMIC1-NT/GM1-penta complex. The protein converts almost entirely to the bound state. The correlation of 1HN and 15NH resonances with 13C enables selective detection of TgMIC4-A5.

5.5. Discussion

5.5.1. Comparison of NMR and microarray data

The microarray and NMR titration data are consistent in that they demonstrate that TgMIC4-A5 adheres only to terminally-galactosylated oligosaccharides. However, there are several discrepancies in terms of the titrated ligands. For example, the microarray data suggest that incorporation of an N-acetyl group into lactose (giving N-acetyl-D-lactosamine) causes a significant increase in binding affinity, with an approximate two-fold increase in fluorescence intensity (table 5.1, probes #1 and #2). In contrast, the NMR titration data suggest that the interactions are analogous. Additionally, the microarray reports that N-acetyl-D-lactosamine (Galβ1,4GlcNAc) binds significantly tighter than its β1,3-linked analogue (lacto-N-biose). This is in direct contrast with the NMR titration data, which suggests that β1,3-linked disaccharides bind significantly tighter than β1,4-linked analogues. Finally, the proportional relationship between ligand size and binding affinity apparent in the NMR data is not reflected by the microarray data.

These discrepancies may arise due to the immobilisation of carbohydrate probes during microarray analysis. This may restrict ligand dynamics or cause steric hindrance of protein binding, whilst, as previously mentioned, amphipathy-mediated receptor clustering may enhance binding affinities. NMR titrations were performed in solution, providing an isotropic environment. Additionally, the proportionality between ligand size and binding affinity observed via NMR represents a logical and rational trend. Together, these factors suggest that the NMR data is more credible. Therefore, whilst the microarray procedure has proved invaluable in identifying candidate ligands of TgMIC4-A5, conclusions in this study regarding the nature/affinity of the interactions are based upon the NMR data. For example, it is believed that TgMIC4-A5 binds β1,3-linked galactose with a greater affinity than β1,4-linked analogues, although at this stage it is not possible to comment on the mechanism behind this discrimination.

5.5.2. Sequence & structure comparison with other apple domains

Previous studies of TgMIC4 orthologues S. muris lectin (SML-2) and N. caninum MIC4 have demonstrated that these proteins are also capable of binding to lactose (Klein et al. 1998; Keller et al. 2004). Although the host-adhesive activity of NcMIC4 has yet to be localised, the high sequence identity of its fifth apple domain with TgMIC4-A5 (59%) suggests that it will share its structure, complete with galactose-binding pocket, and mode of ligand-binding. The same applies to the solitary full-length apple domain of SML-2, which also shares a high sequence identity (43%) with TgMIC4-A5. Consistent with these observations, the residues encircling the putative ligand-binding pocket of TgMIC4-A5 are conserved or similar in the equivalent domains (figure 1.7). N51, K60, Y67 and Y69 are conserved throughout, and K19 is conserved in NcMIC4, with an equivalent arginine residue in SML-2 ensuring conservation of residue type. Although Y58 is substituted in both cases, aromaticity is retained in the form of a histidine residue, which likely maintains adhesive properties.

It has previously been stated that the E. tenella MIC4/5 complex is capable of binding to a lactose-affinity column (Periz et al. 2007). Based on these studies of TgMIC4-A5, and the previous studies of NcMIC4 and SML-2, it appears likely that this occurs via the orthologous EtMIC5, which contains 11 apple domains (figure 1.5). The precise location(s) of this activity have yet to be determined, however TgMIC4-A5 bears the highest sequence identity to EtMIC5-A3 (30%), which retains many of the key galactose-binding residues (figure 1.7).

Within TgMIC4, A5 shares high sequence identities with A1 (47%) and A3 (34%), yet the N-terminal fragment of TgMIC4 (containing the first four apple domains), liberated during in vivo proteolysis, does not bind to host cells (Brecht et al. 2001). This suggests that A1 and A3 do not share the ability of A5 to bind galactosylated oligosaccharides. Indeed, microarray analysis of TgMIC4-A12 performed in collaboration with our laboratory confirms that the domains are devoid of lectin-like activity (Dr Yan Liu & Dr Jan Marchant, Imperial College London, personal communications).

Interestingly, many of the residues surrounding the putative ligand-binding site of TgMIC4-A5 (and its orthologues) are conserved, or similar, in A1 and A3 (figure 5.17A). N51, K60 and Y67 are conserved throughout, whilst A1 contains arginine and histidine residues at positions 19 and 58; substitutions which appear to be tolerable for ligand-binding based on the sequence comparison of SML-2 and NcMIC4 with TgMIC4-A5. Y58 is additionally conserved in A3, which also sees vague conservation of residue type (i.e. extended aliphatic chain) at position 19, in the form of a methionine residue. The conservation/similarity of these residues in the non-adhesive domains suggests that they may not be key mediators of the TgMIC4-A5/galactose interaction. Contrarily, Y69 is substituted for rather different residue types in both domains, which suggests that it may be a key residue for galactose-binding.

As stated previously, the solution structure of TgMIC4-A12 has recently been solved in our laboratory (Dr Jan Marchant). This enables structural comparison of TgMIC4-A5 with the closely-related A1 domain and rationalisation of their differing functions (figure 5.17B). This reveals that the presence of a leucine residue at position 69 leaves the pocket more exposed than in TgMIC4-A5, and results in the loss of an aromatic ring against which a galactose ring may stack and establish crucial contacts.

Figure 5.17: structural comparison of A1 and A5. A) An alignment of A1, A3 and A5 sequences. The A5 residues which encircle the binding pocket (i.e. labelled in figure 5.10B) are highlighted red. B) The molecular surface of A1 (left-panel) alongside superimposed stick representations of the relevant side-chains of A1 (blue) and TgMIC4-A5 (green). A1 residues are labelled. Both images are viewed from the same perspective as the images in figure 5.10B.

5.5.3. Concluding remarks

The results presented in this chapter provide the first insight into the mode of host cell recognition and binding by TgMIC4-A5. Potential ligands were first identified via carbohydrate microarray assays, demonstrating, as expected, a propensity for adhesion to galactose-terminated oligosaccharides. This enabled the selection of various ligands for further studies, namely titration and chemical shift perturbation analyses, revealing a conserved mode of galactose recognition. Chemical shift mapping identified a putative ligand-binding pocket, containing numerous aromatic and hydrogen-bonding groups; consistent with previously established modes of carbohydrate (i.e. particularly galactose) recognition by proteins. This has further enabled rationalisation of the apparent abilities of other apicomplexan MICs/apple domains to bind galactose.

The NMR titration data demonstrates a proportional relationship between oligosaccharide size and binding affinity, with the pentasaccharide fragment of GM1 ganglioside binding significantly more strongly (slow-exchange) than galactosyl monosaccharide and disaccharides (fast/intermediate-exchange). Additionally, Kd measurements from NMR and ITC demonstrate that TgMIC4-A5 discriminates β1,3 and β1,4-linked galactose. Neither of these trends is corroborated by the carbohydrate microarray data, however this data does strongly suggest that binding affinity is enhanced for oligosaccharides containing a branched negative group. Together these observations imply that saccharides preceding galactose make significant contributions to protein interactions. The fact that these contributions are not conveyed by the 1H-15N HSQC-monitored titrations is consistent with a binding model where galactose sits within the putative ligand-binding cavity of TgMIC4-A5 whilst preceding units span the surface of the protein.

The microarray procedure incorporates numerous probes which are found on host cell surfaces, namely an array of glycosphingolipids; ceramide-conjugated oligosaccharides which are prevalent on vertebrate cell surfaces (Lopez & Schnaar 2009). Though many of the ligands selected for more detailed studies represent fragments of larger oligosaccharides in the microarray, it is known that both lactose and GM1-penta are commonly incorporated into glycosphingolipids, giving rise to lactosylceramide and GM1 ganglioside, both of which have been identified as receptors for microbial proteins (Karlsson 1998). These therefore represent potential cognate in vivo receptors for TgMIC4.

Although the results presented in this chapter provide useful insight into galactose-binding by TgMIC4-A5, the conclusions drawn are largely speculative and based on circumstantial evidence at this stage. In order to establish definitively the mechanism of ligand-binding by TgMIC4-A5, the atomic structure of a TgMIC4-A5/ligand complex has been sought, as described in chapter 6.

Chapter 6: Calculation of a TgMIC4-A5/lacto-N-biose complex structure.

6.1. Introduction

Chapter 5 describes the initial characterisation of the interactions of TgMIC4-A5 with various galactosylated ligands, encouraging conjecture regarding the mechanism of galactose recognition by the domain. This chapter describes the attempts made to characterise the TgMIC4-A5/galactose interaction in atomic detail.

6.2. Considerations for structure determination of protein/carbohydrate complexes

In terms of protein complex structure determination, carbohydrates represent particularly challenging binding partners. Interactions are typically of weak affinity, making for relatively unstable, dynamic complexes which can prove difficult to characterise. For example, as reported in chapter 5.4.3, TgMIC4-A5 binds to its various ligands with dissociation constants of the order 10⁻⁴ M. Although several of these complexes bear characteristics of slow-exchange on the NMR timescale, they represent weak affinities in the overall context of protein/ligand interactions (where Kd can reach 10⁻¹² M). Nonetheless, there are several methods by which detailed structural insights might be gained.

6.2.1. X-ray crystallography

The most common route to solving a protein/carbohydrate complex structure is via X-ray crystallography. This is partly due to the limited applicability of NMR spectroscopy to protein/carbohydrate systems (discussed below). However, X-ray crystallography is highly productive in its own right, largely due to the ability to introduce ligand molecules via soaking of existing well-diffracting protein crystals. This provides a routine means of high-throughput ligand screening and X-ray structure determination (Hassell et al. 2007; Nienaber et al. 2000; Schieborr et al. 2005). However, there are problems associated with the method, such as the disruption of protein crystals during soaking (Schieborr et al. 2005) or, more fundamentally, failure to produce high-quality protein crystals.

6.2.2. NMR spectroscopy

Solution structure determination of a protein/ligand complex typically involves assignment of each molecule in its bound-state and, crucially, measurement and assignment of intermolecular NOEs (iNOEs), giving rise to distance restraints to guide a structure calculation. iNOE detection is maximal in cases where molecules interact across large interfaces with high affinity, therefore this protocol is regularly employed for slow-exchanging complexes involving protein or DNA/RNA ligands. However, its application to carbohydrate ligands is limited for three reasons: they are small molecules which usually bind proteins with weak affinity; the abundance of hydroxyl groups means that protein contacts are often made via hydrogen-bonds through protons which are unobservable via NMR spectroscopy using aqueous solvents; and their NMR spectra are usually heavily overlapped due to the relative degeneracy of heterocyclic ring hydrogen chemical shifts, complicating chemical shift assignment. Consequently, there are very few examples of protein/carbohydrate complex solution structures calculated using iNOEs (Asensio et al. 1995; Espinosa et al. 2000).

There are several alternative NMR-based methods used to study protein/carbohydrate interactions. For example, measurement of transferred NOEs (trNOE), where cross-relaxation between bound-state ligand nuclei is ‘transferred’ to the more easily observed free state via chemical exchange, provides information on the bound-state conformation of the ligand (Casset et al. 1997; Clore & Gronenborn 1982). Detection is optimal for particularly weak interactions (i.e. mM Kd, fast-exchange) involving larger proteins (>30 kDa) (Clore & Gronenborn 1982; Yamaguchi 2008). Meanwhile, saturation-transfer difference (STD) NMR aims to identify the binding epitope(s) within a ligand structure (Mayer et al. 1999). This involves selective saturation of protein resonances, leading to magnetisation transfer to nuclei on the bound ligand via spin diffusion. The technique is therefore best suited to fast-exchange interactions involving large proteins, in which spin diffusion is more efficient (Xia et al. 2010).

TrNOE and STD-NMR methods can provide complementary data (Angulo et al. 2006; Angulo et al. 2008), and are generally advantageous for carbohydrate systems in that they are best suited to fast-exchange (i.e. weak) interactions. However, neither method provides information regarding the orientation of the ligand with respect to the protein.

In the past decade the laboratory of James Prestegard has aimed to implement novel NMR-based methods to characterise protein/carbohydrate interactions. This began with determination of a galectin-3/N-acetyllactosamine complex using restraints derived from paramagnetic relaxation enhancement (PRE); the dampening of NMR signals due to increased T2 relaxation induced by a paramagnetic centre (i.e. spin-labelled carbohydrate) (Jain et al. 2001). Subsequently, measurement of residual dipolar couplings (RDCs) has proved effective in yielding orientational data for protein/carbohydrate complexes (Jain et al. 2003; Zhuang et al. 2008; Zhuang et al. 2006). In fast-exchanging complexes, this usually requires enhanced alignment of the protein (and thereby the protein/carbohydrate complex) in order to distinguish averaged free- and bound-state ligand data. This necessitates chemical modification of the protein, via conjugation of a sulphated lipid tail to a free cysteine residue (Zhuang et al. 2006) or incorporation of a lanthanide-binding tag enabling chelation of a paramagnetic ion (Zhuang et al. 2008). In the latter case, the observation of pseudo-contact shifts (PCS) further aids structure calculation.

6.2.3. Application to the study of TgMIC4-A5/carbohydrate complexes

Unfortunately, attempts to crystallise TgMIC4-A5 have been unsuccessful. The protein was entered into six Imperial College London crystallisation screens at the optimal concentration of 8 mg/ml (determined via the PCT™ pre-crystallisation test (Hampton Research)), yielding no hits. As previously noted (see chapter 4.4.1.1), TgMIC4-A5 is dynamic at its disulphide-linked N- and C-termini, which may prevent crystallisation.

In the absence of protein crystals, focus was maintained upon NMR-based methods. Since both trNOE and STD-NMR methods are best utilised for fast-exchanging complexes involving large proteins, TgMIC4-A5, at approximately 10 kDa, is not best suited to these techniques. Additionally, the modifications required to obtain RDC-derived structure restraints appear to be unsuitable for TgMIC4-A5; the introduction of a free cysteine residue would likely perturb the folding of the heavily disulphide-linked domain, whilst a lanthanide-binding tag would require N- or C-terminal stability and rigidity (which does not appear to exist in TgMIC4-A5) in order to be effective. Similarly, measurement of paramagnetic relaxation enhancements would require spin-labelled ligand, involving a rather extensive organic chemical synthesis (Jain et al. 2001).

Due to the unsuitability of these techniques to the various TgMIC4-A5/ligand systems, efforts were focused on detection of iNOEs in order to generate distance restraints for structure calculation via docking (using HADDOCK). Due to its relatively low cost and status as one of the tighter-binding TgMIC4-A5 ligands, lacto-N-biose (Galβ1→3GlcNAc) was selected for these studies.

6.3. Materials & Methods

6.3.1. Intermolecular NOE detection and assignment

Intermolecular NOE detection and assignment were carried out via acquisition of multiple NMR spectra, as detailed below. All spectra were acquired in NMR buffer, at 303 K, using an in-house Bruker Avance III DRX600 spectrometer equipped with a TCI cryoprobe, unless stated otherwise. Data were processed, visualised and referenced as previously described (chapter 4.3.1).

6.3.1.1. The 13C-edited NOESY experiment

Intermolecular NOEs can be detected by recording a 13C-edited NOESY-HSQC spectrum of a complex, using one 13C-labelled component and one unlabelled component. The experiment is the same in principle as a conventional 13C-NOESY-HSQC but with suppression of signal from 13C-coupled 1H nuclei prior to the NOE step, ensuring selective cross-relaxation from ligand to protein 1H nuclei, i.e. ‘filtering’ of protein intra-molecular NOEs. Following NOE transfer, protein 1H nuclei resonances are additionally correlated with the J-coupled 13C nucleus (Zwahlen et al. 1997).

The spectrum was therefore acquired using a uniformly 13C-labelled TgMIC4-A5 sample saturated with five molar equivalents of lacto-N-biose. Spectral parameters were identical to those of the 13C-NOESY-HSQC detailed in table 4.1. The spectrum was acquired using an in-house Bruker Avance II DRX800 spectrometer equipped with a TXI cryoprobe.

6.3.1.2. TgMIC4-A5 aromatic chemical shift re-assignment

As will be discussed in chapter 6.4.1, the detected intermolecular NOEs appeared to exclusively correlate aromatic TgMIC4-A5 nuclei. It was therefore decided that only chemical shift re-assignment of ligand-bound aromatic nuclei was necessary at this stage. This was achieved via NMR titration of 13C-labelled TgMIC4-A5 with lacto-N-biose using the protocol described in chapter 5.3.1.2, but with monitoring of the chemical shift perturbations of aromatic 1H/13C nuclei via the 1H-13C HSQC (aromatic-selective) spectrum (detailed in table 4.1).

6.3.1.3. Lacto-N-biose 1H chemical shift assignment

Lacto-N-biose chemical shift assignment was carried out using conventional 2D NMR methods (reviewed in Bubb 2003). Since the 13C-edited NOESY-HSQC experiment does not correlate carbon resonances of the unlabelled component (i.e. lacto-N-biose), efforts were focused primarily on 1H assignment. The NMR spectra acquired in order to achieve this are detailed in table 6.1. Spectra were acquired using a 10 mM solution of lacto-N-biose in 100% D2O unless stated otherwise.

Experiment                      Nucleus   Data points   Spectral width (Hz)   Larmor frequency (MHz)   Comments
1H                              1H        32768         7211.539              600.053
F1,F2-13C-reject 1H-1H TOCSY    1H        8192          5411.255              600.053                  1 mM LNB + 250 μM 13C-apple5.
                                1H        512           5411.255              600.053
1H-1H TOCSY                     1H        8192          5411.255              600.053                  Recorded with mixing times of 10, 30, 70 and 100 milliseconds.
                                1H        512           5411.255              600.053
1H-1H COSY                      1H        8192          5411.255              600.053
                                1H        512           5411.255              600.053
1H-1H NOESY                     1H        8192          5411.255              600.053
                                1H        512           5411.255              600.053
1H-13C HSQC                     1H        2048          4795.396              600.053
                                13C       178           9049.817              150.895
13C DEPT-135                    13C       32768         16666.667             150.895

Table 6.1: acquisition parameters for the NMR spectra acquired for chemical shift assignment of lacto-N-biose.
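The spectral widths in table 6.1 are quoted in Hz; dividing by the Larmor frequency in MHz gives the equivalent width in ppm. The following is a minimal sketch of that conversion using the tabulated values. The function and dictionary names are illustrative only and do not correspond to any processing software used in this work.

```python
def width_ppm(width_hz: float, larmor_mhz: float) -> float:
    """Convert a spectral width from Hz to ppm (1 ppm = Larmor frequency in MHz, in Hz)."""
    return width_hz / larmor_mhz


# (spectral width in Hz, Larmor frequency in MHz), copied from table 6.1
TABLE_6_1 = {
    "1H (1D)": (7211.539, 600.053),
    "1H-1H TOCSY/COSY/NOESY (1H)": (5411.255, 600.053),
    "1H-13C HSQC (1H)": (4795.396, 600.053),
    "1H-13C HSQC (13C)": (9049.817, 150.895),
    "13C DEPT-135 (13C)": (16666.667, 150.895),
}

for name, (hz, mhz) in TABLE_6_1.items():
    print(f"{name}: {width_ppm(hz, mhz):.2f} ppm")
```

For instance, the 1D 1H spectrum covers roughly 12 ppm, whilst the 13C dimension of the HSQC covers roughly 60 ppm.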

6.3.2. Structure calculation of a TgMIC4-A5/lacto-N-biose model

Chemical shift assignment of ligand-bound TgMIC4-A5 aromatic nuclei and lacto-N-biose 1H nuclei enabled assignment of intermolecular NOEs, which were used to drive docking of the molecules using HADDOCK (Dominguez et al. 2003).

6.3.2.1. HADDOCK (High Ambiguity Driven protein-protein Docking)

HADDOCK is a computer program which performs molecular docking driven by experimental data (Dominguez et al. 2003). Upon release, it aimed to provide a means of protein/protein complex structure modelling in instances where atom-specific restraints (i.e. from iNOE or RDC data) are unavailable. The authors describe the use of chemical shift perturbations and site-directed mutagenesis data to identify “active” residues (i.e. those which undergo large ligand-induced chemical shift perturbation, or mutation of which abrogates binding). These are translated into ambiguous interaction restraints (AIRs) (specifying a maximum distance of 3 Å between any atom from an active residue and any atom from any residue on its binding partner) to guide molecular docking. The authors concluded by suggesting that iNOEs could be implemented to aid docking, and there are several recent instances which demonstrate this (Long et al. 2010; Matta-Camacho et al. 2009; Nudelman et al. 2011).

Whilst the software was originally designed for protein/protein docking, the latest versions (v2.0 and above) of HADDOCK have been expanded to enable docking of protein/carbohydrate complexes (de Vries et al. 2007), and this has recently been applied with some success in our laboratory (Livia Lai, personal communication).

6.3.2.2. Docking protocol

HADDOCK shares many similarities with ARIA (described in chapter 4.3.4); both utilise CNS as the structure generation engine, and employ torsion-angle dynamics simulated annealing (TAD-SA) guided by parallhdg5.3.pro (parameter) and topallhdg5.3.pro (topology) force-fields. However, whilst ARIA begins with an extended polypeptide, HADDOCK performs docking of rigid molecules.

TgMIC4-A5/lacto-N-biose docking was carried out using the default HADDOCK protocol. This consists of three stages, beginning with the generation of 1000 structures via rigid body docking and energy minimisation (iteration 0). These structures are ranked based on a variety of parameters and the best 200 structures are carried into iteration 1, where they are refined via a three-step TAD-SA protocol. First, the system is heated to 2000 K before cooling to 50 K over 1000 steps whilst the relative orientation of rigid molecules is optimised. This is then repeated, but with flexibility of active residue side-chain atoms permitted, and an additional 3000 cooling steps. Finally, the system is heated to 500 K and cooled to 50 K over 1000 steps, with flexibility permitted in both side-chain and backbone atoms of active residues. The 200 structures are then energy-minimised before water-refinement in Cartesian space, during which flexibility is permitted in the backbone atoms of non-active residues (Dominguez et al. 2003).

6.3.2.3. Calculation set-up

6.3.2.3.1. PDB files

The TgMIC4-A5 solution structure containing an N-/C-terminal disulphide-bond was selected for molecular docking. Since the putative ligand binding site of TgMIC4-A5 lies on the opposing face of the protein to this linkage, the debated presence of this linkage is irrelevant with regards to ligand docking. Ten .pdb files were uploaded, representing the complete ensemble of lowest-energy conformers. A lacto-N-biose .pdb file was generated using the GlyCaNS webserver (http://haddock.chem.uu.nl/glycans) created by Alexandre Bonvin and Michael Krzeminski (University of Utrecht). This server also provides parameter and topology files, adapted for carbohydrate molecules, which were implemented in the calculation.

6.3.2.3.2. Restraints files

Intermolecular NOEs were implemented as distance restraints between the protein and ligand atom pairs. Each restraint was attributed an upper distance bound of 5.0 Å, the approximate upper limit of the NOE (since I ∝ 1/r⁶), and a lower bound of 1.8 Å, in order to avoid steric clashes. In addition, a selection of chemical shift perturbation-derived AIRs was assigned. Since chemical shift perturbations are transmitted across a large surface area of TgMIC4-A5 (figure 5.10), a refined set of residues was selected, with K19, N51, Y58, K60, Y67 and Y69 defined as ‘active’. The assignment of the intermolecular NOEs enabled exclusion of distant residues.
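As an illustration of how such bounds might be written out, the sketch below formats a single unambiguous restraint in the CNS `assign` syntax, in which a target distance d is followed by the allowed deviations below (d_minus) and above (d_plus) it. The bounds of 1.8 and 5.0 Å are those quoted above; the segment identifiers, residue numbering of the ligand and atom names are hypothetical placeholders, not those of the actual restraint files.

```python
def cns_distance_restraint(sel_a: str, sel_b: str,
                           lower: float = 1.8, upper: float = 5.0) -> str:
    """Format one CNS-style distance restraint line.

    CNS reads 'assign (sel1) (sel2) d d_minus d_plus' and allows distances in
    the range [d - d_minus, d + d_plus]. Setting d = upper, d_minus = upper - lower
    and d_plus = 0 therefore gives the range lower..upper.
    """
    if not 0.0 <= lower <= upper:
        raise ValueError("bounds must satisfy 0 <= lower <= upper")
    return f"assign ({sel_a}) ({sel_b}) {upper:.1f} {upper - lower:.1f} 0.0"


# Hypothetical selections: protein as segment A, ligand as segment B,
# with the Gal unit numbered as residue 1 (an assumption for illustration).
line = cns_distance_restraint("segid A and resid 67 and name HE*",
                              "segid B and resid 1 and name H2")
print(line)
```

The wildcard in the protein selection reflects the fact that the two Hε (or Hδ) protons of a tyrosine ring are not distinguished in the spectrum.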

In terms of lacto-N-biose, the involvement of both Gal and GlcNAc saccharides prompted the definition of both “residues” as active. This ensured flexibility about rotatable bonds, such as the glycosidic linkage, aimed at preventing conformational bias of the docked disaccharide towards the structure generated by GlyCaNS.

6.3.2.4. Definition of the flexible interface

In addition to residues governed by distance-based restraints, it is often desirable to permit flexibility in additional residues at the molecular interface during simulated annealing. Whilst previously such regions have had to be assigned manually, the latest versions of HADDOCK (i.e. v2.0 and beyond) enable automation of this process (de Vries et al. 2007); a feature utilised for TgMIC4-A5/lacto-N-biose docking. In this mode, flexibility is conferred to all residues within 5 Å of the partner molecule.
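The distance criterion described above can be sketched as follows. This is an illustrative reimplementation rather than HADDOCK's own code: it assumes that the 5 Å cut-off is evaluated between individual atoms, and the coordinates are invented for the example.

```python
from math import dist


def interface_residues(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return the sorted ids of residues with any atom within `cutoff` Å of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples
    ligand_atoms:  iterable of (x, y, z) tuples
    """
    ligand_atoms = list(ligand_atoms)
    selected = set()
    for resid, xyz in protein_atoms:
        if resid in selected:
            continue  # residue already flagged by one of its other atoms
        if any(dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            selected.add(resid)
    return sorted(selected)


# Hypothetical coordinates (Å), for illustration only
protein = [(58, (0.0, 0.0, 0.0)), (58, (1.5, 0.0, 0.0)),
           (67, (4.0, 3.0, 0.0)), (12, (20.0, 0.0, 0.0))]
ligand = [(3.0, 0.0, 0.0), (4.0, 0.0, 1.0)]
print(interface_residues(protein, ligand))
```

In this toy example residues 58 and 67 fall within the cut-off and would be treated as flexible, whilst residue 12 would remain rigid.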

6.3.2.5. Analysis of HADDOCK output

Analysis of HADDOCK calculations was carried out using the provided scripts. First, the 200 structures were organised into clusters based upon their RMS deviation. Clusters were then individually assessed for a variety of parameters; buried surface area, RMSD and various energy terms. The ‘HADDOCK score’ is a weighted sum of the van der Waals, electrostatic, desolvation and restraint violation energies, and the buried surface area. The cluster with the most favourable statistics was taken to represent the calculated structure of TgMIC4-A5/lacto-N-biose.

6.3.3. Mutagenesis studies

Based on the calculated structure of TgMIC4-A5/lacto-N-biose, several key residues were identified and mutated in order to observe the effect on ligand binding. TgMIC4-A5(TEV) pET-32 Xa/LIC constructs encoding individual amino acid substitutions K19A, K60M and Y69L were generated using a QuikChange II Site-Directed Mutagenesis kit (Agilent) according to the manufacturer’s instructions. Nucleotide substitutions were introduced into TgMIC4-A5(TEV) DNA via PCR (appendix A4) with mutagenic primers designed using the QuikChange Primer Design Application (Agilent). Reaction mixtures were then incubated with DpnI at 37 °C for 1 hour before transformation of XL-1 blue super-competent E. coli (using the protocol described in chapter 3.3.2.4). Plasmids were purified via mini-prep (chapter 3.3.2.5) and mutated DNA sequences were verified by GATC-Biotech. Expression and purification of 15N-labelled proteins was carried out as for wild-type (wt) TgMIC4-A5 (chapter 3.3.3-4). Protein folding was assessed via 1D and 2D NMR spectroscopy (chapter 3.3.5), and the lacto-N-biose binding ability of each protein was assessed via NMR titration (chapter 5.3.1.2).

6.4. Results

6.4.1. The 13C-edited NOESY spectrum of TgMIC4-A5/lacto-N-biose

The 13C-edited NOESY-HSQC spectrum of TgMIC4-A5/lacto-N-biose detected a total of seven intermolecular NOEs (figure 6.1 & table 6.2). Four different protein 1H nuclei appear to be encompassed, with chemical shifts of 7.21, 6.92, 6.78 and 6.52 ppm, and J-coupled 13C chemical shifts of approximately 48 or 65 ppm (i.e. aliased from 117 or 134 ppm in the ‘folded’ spectrum). These values are diagnostic of aromatic nuclei, consistent with the chemical shift mapped structure of TgMIC4-A5 (figure 5.10), where several tyrosine residues encircle the putative galactose-binding pocket. In terms of lacto-N-biose, the iNOEs correlate 1H chemical shifts at 3.5-4 ppm; the spectral window within which carbohydrate ring protons usually resonate, and approximately 2 ppm; likely to represent the methyl group of the N-acetylglucosamine saccharide. This aside, the correlated nuclei cannot be unambiguously assigned at this stage (table 6.2).
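The relationship between the apparent and true 13C shifts quoted above can be sketched as a simple unfolding calculation. The 13C sweep width of 69 ppm used here is an assumption, inferred only from the differences 117 − 48 and 134 − 65 ppm; it is not a stated acquisition parameter. The sketch also assumes simple aliasing (peaks shifted by whole multiples of the sweep width) rather than mirror-image folding.

```python
def unfold(observed_ppm: float, sweep_width_ppm: float, n_folds: int = 1) -> float:
    """Return the true shift of a peak aliased n_folds sweep widths below its real position."""
    return observed_ppm + n_folds * sweep_width_ppm


SWEEP_WIDTH = 69.0  # ppm; assumed, inferred from the quoted shift pairs

for observed in (48.0, 65.0):
    print(f"{observed:.0f} ppm (apparent) -> {unfold(observed, SWEEP_WIDTH):.0f} ppm (true)")
```

Under these assumptions the apparent shifts of 48 and 65 ppm map back to 117 and 134 ppm, the regions in which tyrosine Cε and Cδ nuclei respectively resonate (cf. table 6.3).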

Figure 6.1: the 13C-edited NOESY-HSQC of TgMIC4-A5/lacto-N-biose. Seven intermolecular NOEs were detected (numbered 1-7), depicted here in spectral strips. Each NOE correlates a protein 1H resonance (horizontal axis) with a carbohydrate 1H resonance (vertical axis). In the native spectrum, NOE peaks are resolved along a 13C axis.

         TgMIC4-A5                                          Lacto-N-biose
iNOE     1H (ppm)   J-coupled 13C (ppm)   Assignment        1H (ppm)   Assignment
#1       7.21       134.6                 Aromatic          3.93       Ring proton
#2       7.21       134.6                 Aromatic          3.66       Ring proton
#3       6.92       133.8                 Aromatic          3.78       Ring proton
#4       6.92       133.8                 Aromatic          2.04       GlcNAc-CH3
#5       6.78       116.8                 Aromatic          3.78       Ring proton
#6       6.78       116.8                 Aromatic          2.04       GlcNAc-CH3
#7       6.52       116.8                 Aromatic          3.53       Ring proton

Table 6.2: TgMIC4-A5/lacto-N-biose intermolecular NOEs. Summarising the chemical shifts of correlated protein and ligand nuclei. At this stage, very few unambiguous assignments can be made.

6.4.2. TgMIC4-A5 aromatic chemical shift re-assignment

Chemical shift assignment of ligand-bound TgMIC4-A5 aromatic nuclei was achieved via titration and observation of chemical shift perturbations in the aromatic-selective 1H-13C HSQC spectrum. Y58-CHε aside, the chemical shift perturbations induced upon TgMIC4-A5 aromatic nuclei occur in fast-exchange, enabling manual assignment of bound-state resonances (table 6.3 and figure 6.2).

6.4.3. Lacto-N-biose 1H chemical shift assignment

6.4.3.1. Unbound- vs bound-state lacto-N-biose

In the absence of 13C-labelled lacto-N-biose (which is not commercially available) the logical way to assign the protein-bound carbohydrate is via acquisition and analysis of 13C-edited NMR spectra (i.e. selective for unlabelled ligand resonances) of a 13C-TgMIC4-A5/lacto-N-biose complex. However, in the saturated complex the presence of a five-fold ligand excess means that approximately 80% of lacto-N-biose molecules will be unbound at any given time. Therefore the lacto-N-biose chemical shifts detected during 13C-edited NOESY-HSQC acquisition (i.e. weighted averages of bound and unbound chemical shifts) may simply represent the unbound state, making the resonances of unbound lacto-N-biose applicable, thus avoiding the need for less sensitive 13C-edited experiments.
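The figure of approximately 80% follows from simple 1:1 binding arithmetic, which can be sketched as below. The quadratic expression is the standard solution of the mass-action law for a single binding site; the Kd of 0.1 mM and the bound-state shift used in the example are hypothetical values chosen purely for illustration, and concentrations are expressed in arbitrary but consistent units.

```python
from math import sqrt


def complex_conc(p_total: float, l_total: float, kd: float) -> float:
    """Concentration of the 1:1 complex from total protein, total ligand and Kd."""
    b = p_total + l_total + kd
    return (b - sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def fraction_ligand_unbound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of ligand molecules that are free at equilibrium."""
    return 1.0 - complex_conc(p_total, l_total, kd) / l_total


def averaged_shift(delta_free: float, delta_bound: float, f_unbound: float) -> float:
    """Population-weighted chemical shift observed under fast exchange."""
    return f_unbound * delta_free + (1.0 - f_unbound) * delta_bound


# Five molar equivalents of ligand; Kd -> 0 is the fully saturated limit
f_saturated = fraction_ligand_unbound(1.0, 5.0, 0.0)
# Hypothetical Kd of 0.1 (same units as the concentrations)
f_weak = fraction_ligand_unbound(1.0, 5.0, 0.1)
print(f"unbound fraction: {f_saturated:.2f} (saturated limit), {f_weak:.2f} (hypothetical Kd)")

# Hypothetical bound-state shift 0.10 ppm downfield of a free-state shift of 3.93 ppm
print(f"observed shift: {averaged_shift(3.93, 4.03, f_saturated):.2f} ppm")
```

Even if the protein were fully saturated, at least four in five ligand molecules would be free, so a bound-state perturbation of 0.1 ppm would be scaled down to roughly 0.02 ppm in the averaged spectrum.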

Figure 6.2: Assignment of ligand-bound TgMIC4-A5 aromatic nuclei. Superimposed 1H-13C HSQC (aromatic-selective) spectra of TgMIC4-A5 in the presence of 0 (black), 2 (green), 4 (blue) and 5 (red) molar equivalents of lacto-N-biose. Resonance assignments for the unbound state are annotated. The observation of incremental chemical shift changes enables simple elucidation of bound-state assignment.

Residue   Type   1H (ppm)   13C (ppm)
Y48       δ      6.73       132.9
Y48       ε      6.57       116.8
Y50       δ      6.92       131.2
Y50       ε      6.80       117.6
Y58       δ      7.20       134.0
Y58       ε      -          -
Y67       δ      7.02       132.9
Y67       ε      6.52       117.0
Y69       δ      6.92       133.0
Y69       ε      6.78       117.4

Table 6.3: chemical shifts of lacto-N-biose-bound TgMIC4-A5 tyrosine aromatic resonances. Summarising the chemical shifts of TgMIC4-A5 tyrosine δ and ε nuclei in the lacto-N-biose-bound state. Y58ε cannot be assigned as the resonance is exchange-broadened beyond detection.

In order to ascertain whether the averaged lacto-N-biose chemical shifts detected in the context of the TgMIC4-A5/lacto-N-biose complex are perturbed from those of unbound lacto-N-biose, the 13C-edited (i.e. F1,F2-13C-reject) 1H-1H TOCSY spectrum of saturated 13C-TgMIC4-A5/lacto-N-biose and the 1H-1H TOCSY spectrum of lacto-N-biose were recorded. The spectra are near-identical (figure 6.3), confirming that whilst the chemical shifts of lacto-N-biose may be perturbed upon protein-binding, any perturbations are averaged out by the substantial excess of unbound ligand. Consequently, lacto-N-biose 1H assignment was carried out via conventional NMR spectroscopy of the unbound molecule.

Figure 6.3: lacto-N-biose bound state vs unbound state. The 1H-1H TOCSY spectrum of lacto-N-biose (black) superimposed with the F1,F2-13C-reject 1H-1H TOCSY spectrum of 13C-TgMIC4-A5/lacto-N-biose (red). The chemical shifts of the ligand in bound and unbound states are almost identical.

6.4.3.2. Assignment strategy and implementation

Due to the chemical similarities of saccharide ring protons (visible in the molecular structure of lacto-N-biose; figure 6.4), the NMR spectra of carbohydrates are usually complicated by resonance overlap. This is demonstrated by the 1H NMR spectrum of lacto-N-biose (figure 6.5), in which ring proton resonances cluster at 3.5 - 4 ppm. Excepted from this region are anomeric proton (i.e. H1) signals, which typically shift downfield due to nuclear de-shielding by the proximate ring oxygen atom. The 1H NMR spectrum of lacto-N-biose contains anomeric signals at 4.42, 4.47 and 5.18 ppm. Additionally, the N-acetylglucosamine (GlcNAc) saccharide of lacto-N-biose contains a methyl group, giving rise to a distinct singlet at 2 ppm.

Due to the substantial resonance overlap and ambiguity associated with the 1H NMR spectrum of lacto-N-biose, ring proton chemical shift assignment was carried out using 2D NMR spectroscopy. An annotated 1H-1H TOCSY spectrum is pictured in figure 6.6, with a complete list of chemical shift assignments provided in table 6.4. Assignments were made by first identifying the anomeric (H1) resonances, which represent good starting points for the assignment of subsequent ring protons (Gheysen et al. 2008). The 1H-1H TOCSY spectrum demonstrates that the resonance correlation patterns of the protons at 4.47 and 4.42 ppm are near-identical, suggesting that these signals represent a single nucleus undergoing chemical exchange. Furthermore, these resonances are correlated with only three subsequent ring protons (i.e. H2-H4). This is diagnostic of a galactose saccharide, where the very small J-coupling (1-1.5 Hz) between H4 and H5 results in inefficient magnetisation transfer (Inagaki et al. 1989). The signals at 4.42 and 4.47 ppm are therefore attributable to Gal-H1.

The 1H-1H TOCSY spectrum also demonstrates the presence of an additional anomeric proton resonance at 4.75 ppm (i.e. overlapped with the water signal in the 1H NMR spectrum). This, along with the signal at 5.18 ppm, represents GlcNAc-H1, which is free to exchange between the equatorial (α-anomer) and axial (β-anomer) positions, giving rise to the two distinct resonances. These are easily distinguished, as an equatorial H1 undergoes increased nuclear de-shielding (induced by the proximate ring oxygen atom) compared to an axial H1, causing it to resonate further downfield. Thus, the signals at 5.18 and 4.75 ppm can be attributed to GlcNAc-H1α and GlcNAc-H1β respectively.

Figure 6.4: the molecular structure of lacto-N-biose. Carbons are numbered according to convention for carbohydrate rings; in ascending order from the anomeric centre. The protons which should be assignable when the molecule is constituted in 100% D2O are highlighted in red.

Figure 6.5: the 1H NMR spectrum of lacto-N-biose. The spectrum was recorded in D2O, and therefore the amide proton resonance (typically found downfield, at 8-10 ppm) is not visible due to hydrogen-deuterium exchange. The broad signal at approximately 4.72 ppm likely represents the resonance of residual water protons.

Chemical shift assignment of additional ring protons was relatively straightforward via inspection of 1H-1H TOCSY spectra. TOCSY resonance correlation occurs during a ‘spin-lock’ period, the duration of which dictates the extent of correlation. In the case of lacto-N-biose, a spin-lock of 100 milliseconds is sufficient to correlate the resonances of H1-H6 in GlcNAc and H1-H4 in Gal (figure 6.6). Influenced by previous studies by Bax & Davis (1985) and Gheysen et al. (2008), a series of 1H-1H TOCSY spectra were acquired with shorter spin-lock durations (down to 10 ms), enabling selective exclusion of resonance correlations of more distant nuclei (figure 6.7). Thus, beginning at H1, sequential assignment of subsequent ring protons was carried out, aided by referral to a 1H-1H COSY spectrum (figure 6.8). Inspection of signal multiplicities in the 1H-1H TOCSY spectra helped verify assignments; for example, GlcNAc-H6α gives rise to a “double-doublet” at 3.84 ppm.

Figure 6.6: the 1H-1H TOCSY spectrum of lacto-N-biose. Assignments are annotated. The experiment correlates the resonances of all of the aliphatic protons within each spin system (i.e. saccharide). The spectrum demonstrates that resonance correlation patterns of the anomeric signals at 4.42 and 4.47 ppm are near-identical, suggesting that they represent a single, exchanging nucleus. The spectrum was acquired with a spin-lock duration of 100 milliseconds, enabling correlation of the entire GlcNAc ring.

Figure 6.7: the effect of varying TOCSY spin-lock duration. Depicted are aligned strips, extracted at the GlcNAc-H1α resonance frequency (5.18 ppm), from the 1H-1H TOCSY spectrum of lacto-N-biose, acquired with spin-lock durations of 100 ms (black), 70 ms (red), 30 ms (green) and 10 ms (blue). Whilst 100 ms is sufficient to correlate the resonances of H1-H6, incrementally decreasing the duration results in a gradual loss of resonance correlation. This facilitates sequential assignment of the resonances of H2-H6. For example, the triplet at 4.07 ppm is the only signal retained throughout, demonstrating that it represents GlcNAc-H2α.

Figure 6.8: the 1H-1H COSY spectrum of lacto-N-biose. Selected assignments are annotated. The spectrum demonstrates the presence of an anomeric proton at 4.75 ppm, in addition to those at 4.42, 4.47 and 5.18 ppm. The experiment correlates the chemical shifts of vicinal and geminal protons, facilitating sequential assignment of ring proton resonances, beginning at the easily-identified anomeric centres (H1).

In the Gal saccharide, the lack of correlation of H1 with H5/6 meant that these nuclei could not be assigned as described above. Instead, assignments were sought via the 1H-1H NOESY spectrum. Though Gal-H1 is distant from Gal-H5/6 in the saccharide ring chain, the nuclei are spatially proximate (figure 6.4), giving rise to observable NOEs (figure 6.9A). Gal-H5 and H6 resonances were distinguished via 13C NMR spectroscopy, with detection of the 13C isotope at natural abundance. Acquisition of a DEPT-135 spectrum, in which CH2 groups appear anti-phase to CH and CH3 groups (Claridge 1999), enabled identification of C6 chemical shifts (figure 6.9B), which were used to probe a 1H-13C HSQC spectrum to identify the Gal-H6 chemical shift (figure 6.9C).

Figure 6.9: Gal-H5 and H6 assignment. A) a section of the 1H-1H NOESY spectrum of lacto-N-biose, depicting NOEs from the Gal-H1 resonance frequencies (4.42 and 4.47 ppm). Gal-H1 bears NOEs to numerous assigned nuclei, in addition to previously-undetected nuclei at 3.72 and 3.78 ppm; believed to represent Gal-H5 and H6. B) A section of the DEPT-135 spectrum of lacto-N-biose, in which CH2 groups appear anti-phase with respect to CH and CH3 groups, enabling identification of C6 resonances, at ~61 ppm. C) C6 resonances can be used to probe the 1H-13C HSQC spectrum of lacto-N-biose for J-coupled H6 resonances. GlcNAc(α/β)-H6 is already assigned, hence Gal-H6 can be easily identified at 3.78 ppm.

           1H chemical shift (ppm)
Nucleus    Gal          GlcNAc-α    GlcNAc-β
H1         4.42/4.47    5.18        4.75
H2         3.53         4.07        3.81
H3         3.66         3.94        3.55
H4         3.93         3.60        3.50
H5         3.72         3.89        3.92
H6         3.78         3.84        3.80

Table 6.4: 1H chemical shift assignments of lacto-N-biose.

6.4.4. Completed iNOE assignments

Armed with the chemical shift assignments of ligand-bound TgMIC4-A5 tyrosine Hδ and Hε nuclei (table 6.3) and lacto-N-biose (table 6.4), it proved possible to unambiguously assign all seven intermolecular NOEs, as summarised in table 6.5. These were implemented as distance restraints for the structure calculation of the TgMIC4-A5/lacto-N-biose complex.
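The ligand side of this assignment amounts to looking up each iNOE 1H shift in the list of assigned lacto-N-biose resonances. The sketch below illustrates that lookup using the values of table 6.4, plus the methyl shift of 2.04 ppm observed in the iNOE spectrum. The matching tolerance is an assumption made for this example; no numerical tolerance is stated in the text.

```python
# 1H shifts (ppm) of lacto-N-biose from table 6.4; the methyl value is the
# shift observed in the 13C-edited NOESY-HSQC (table 6.2).
ASSIGNED = [
    ("Gal-H1", 4.42), ("Gal-H1", 4.47), ("Gal-H2", 3.53), ("Gal-H3", 3.66),
    ("Gal-H4", 3.93), ("Gal-H5", 3.72), ("Gal-H6", 3.78),
    ("GlcNAc-α H1", 5.18), ("GlcNAc-α H2", 4.07), ("GlcNAc-α H3", 3.94),
    ("GlcNAc-α H4", 3.60), ("GlcNAc-α H5", 3.89), ("GlcNAc-α H6", 3.84),
    ("GlcNAc-β H1", 4.75), ("GlcNAc-β H2", 3.81), ("GlcNAc-β H3", 3.55),
    ("GlcNAc-β H4", 3.50), ("GlcNAc-β H5", 3.92), ("GlcNAc-β H6", 3.80),
    ("GlcNAc-CH3", 2.04),
]


def candidates(observed_ppm: float, tolerance: float = 0.005) -> list:
    """Return the assigned nuclei whose shift lies within `tolerance` ppm of the observed shift."""
    return sorted({label for label, shift in ASSIGNED
                   if abs(shift - observed_ppm) <= tolerance})


# Ligand 1H shifts of the seven iNOEs (five distinct values)
for shift in (3.93, 3.66, 3.78, 2.04, 3.53):
    print(shift, candidates(shift))
```

With a tolerance of 0.005 ppm (i.e. agreement to two decimal places) each shift returns a single nucleus. Loosening the tolerance to 0.015 ppm returns additional GlcNAc candidates for the resonance at 3.93 ppm, which shows how sensitive such a lookup is to the precision of the measured shifts.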

         TgMIC4-A5                                          Lacto-N-biose
iNOE     1H (ppm)   J-coupled 13C (ppm)   Assignment        1H (ppm)   Assignment
#1       7.21       134.6                 Y58ε              3.93       Gal-H4
#2       7.21       134.6                 Y58ε              3.66       Gal-H3
#3       6.92       133.8                 Y69δ              3.78       Gal-H6
#4       6.92       133.8                 Y69δ              2.04       GlcNAc-CH3
#5       6.78       116.8                 Y69ε              3.78       Gal-H6
#6       6.78       116.8                 Y69ε              2.04       GlcNAc-CH3
#7       6.52       116.8                 Y67ε              3.53       Gal-H2

Table 6.5: Intermolecular NOE assignments. A complete list of assigned iNOEs.

The observance of iNOEs involving Y58, Y67 and Y69 is consistent with the chemical shift-mapped structure of TgMIC4-A5 (see figure 5.10), in which these residues encircle the putative binding pocket. In terms of lacto-N-biose, each correlated ring proton resonance denotes a galactose proton, suggesting that this saccharide unit sits within the binding pocket, as suggested previously (chapter 5.4.1.2).

6.4.5. TgMIC4-A5/lacto-N-biose structure calculation using HADDOCK

Conventionally, solution structure determination of a protein/ligand complex first requires complete chemical shift assignment of the molecules in their bound states. This facilitates iNOE assignment, but also enables re-calculation of molecular structures in the context of the complex (e.g. using ARIA); important in cases where bound-state structures are significantly altered from the free state. However, protein structures are often largely unperturbed upon binding by carbohydrate ligands (Toone 1994; Moothoo et al. 1998; Deane et al. 2011), due to their relatively small interfaces and tendency to bind shallow grooves on protein surfaces. Furthermore, it proved possible to assign the TgMIC4-A5/lacto-N-biose intermolecular NOEs merely via chemical shift re-assignment of TgMIC4-A5 aromatic nuclei. Consequently, re-assignment and structure re-calculation of ligand-bound TgMIC4-A5 appeared superfluous. Instead, lacto-N-biose was docked onto the previously-calculated solution structure of TgMIC4-A5 using HADDOCK, guided by ambiguous interaction restraints (AIRs) along with unambiguous distance restraints derived from the intermolecular NOEs.

6.4.5.1 The calculated structure of the TgMIC4-A5/lacto-N-biose complex

An ensemble of ten TgMIC4-A5/lacto-N-biose structures, comprising the most favourable statistics as calculated by HADDOCK, is pictured in figure 6.10, alongside stick-model and molecular-surface depictions of a representative structure (i.e. the structure closest to the average). A summary of structure statistics for this ensemble is provided in table 6.6. The structure ensemble has been submitted to the Protein Data Bank (Berman et al. 2000) under the accession code 2LL4. Consistent with previous speculation, HADDOCK positions the galactose saccharide within the cavity on the surface of TgMIC4-A5, whilst the preceding GlcNAc saccharide spans the protein surface. The interaction appears to be driven by the commonly-observed ring-stacking and hydrogen-bonding forces (see chapter 5.4.2.4).

Energies (kcal mol⁻¹)
  Evdw                        -20.32 ± 1.79
  Eelectrostatic              -68.09 ± 12.66
  Edesolvation                5.46 ± 3.48
  EAIR                        30.71 ± 3.31
  Einterface                  -57.71 ± 13.70
  Enon-bonded                 -88.42 ± 11.43
  Ebinding                    -474.21 ± 51.05
Buried surface area (Å²)      499.627 ± 13.74
HADDOCK score                 -25.42 ± 3.45
RMS deviation (Å)             0.67 ± 0.45

Table 6.6: a summary of statistics for the TgMIC4-A5/lacto-N-biose structure ensemble. RMS deviations and energies are expressed as mean averages (± standard deviation) across the best-ranked structure cluster.
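The relationship between the individual energy terms and the HADDOCK score in table 6.6 can be checked with a short calculation. The weights below are an assumption: they are the defaults commonly quoted for the final (water-refinement) stage of HADDOCK 2.x, with the buried surface area carrying zero weight at that stage, and are not stated in the text. The additive relations for the non-bonded and interface energies are likewise inferred from the tabulated numbers.

```python
# Mean energy terms (kcal/mol) copied from table 6.6
E_VDW, E_ELEC, E_DESOLV, E_AIR = -20.32, -68.09, 5.46, 30.71

# Assumed weights (HADDOCK 2.x defaults for the water-refinement stage)
W_VDW, W_ELEC, W_DESOLV, W_AIR = 1.0, 0.2, 1.0, 0.1


def haddock_score(e_vdw: float, e_elec: float, e_desolv: float, e_air: float) -> float:
    """Weighted sum of energy terms, using the assumed weights above."""
    return W_VDW * e_vdw + W_ELEC * e_elec + W_DESOLV * e_desolv + W_AIR * e_air


score = haddock_score(E_VDW, E_ELEC, E_DESOLV, E_AIR)
non_bonded = E_VDW + E_ELEC      # inferred relation: Enon-bonded = Evdw + Eelectrostatic
interface = non_bonded + E_AIR   # inferred relation: Einterface = Enon-bonded + EAIR

print(f"score {score:.2f}, non-bonded {non_bonded:.2f}, interface {interface:.2f}")
```

Under these assumptions the recomputed values agree with the tabulated HADDOCK score, Enon-bonded and Einterface to within rounding of the mean energies.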

Figure 6.10: the TgMIC4-A5/lacto-N-biose structure ensemble. A) The ten TgMIC4-A5/lacto-N-biose structures from the most favourable cluster generated by HADDOCK. The structures converge with an RMS deviation of 0.67 ± 0.45 Å. In the right-hand panel, the lacto-N-biose structures have been extracted and expanded, better demonstrating the degree of variability across the calculated structures. B) A representative TgMIC4-A5/lacto-N-biose structure. In the left-hand panel, chemical shift perturbation data is mapped onto the molecular surface of TgMIC4-A5. In the right-hand panel the molecular surface is partially transparent, revealing the positions of the relevant amino acid side-chains.

6.4.5.2. Agreement with intermolecular NOEs

The calculated structure is in good overall agreement with the iNOE-derived distance restraints (figure 6.11). Both Gal-H3 and -H4 point towards the aromatic ring of Y58, whilst Gal-H2 and Gal-H6 are respectively proximate to Y67 and Y69, ensuring that the corresponding distance restraints are satisfied.

Figure 6.11: TgMIC4-A5/lacto-N-biose structure agreement with experimental restraints. A) Demonstrating the inter-nuclear separations of Y58-Hε with Gal-H3/H4 and Y67-Hε with Gal-H2. All are within 5 Å and therefore satisfy the corresponding intermolecular NOEs. B) Demonstrating the inter-nuclear separations of Y69-Hδ/Hε with Gal-H6 and GlcNAc-CH3.

Whilst GlcNAc-CH3 and Y69-Hε nuclei are sufficiently proximate to satisfy the corresponding restraint, GlcNAc-CH3 and Y69-Hδ nuclei are positioned no closer than 5.4 Å (figure 6.11B), causing a violation with respect to the upper distance bound of 5 Å. This upper bound was selected as it represents the approximate NOE detection limit; however, it is possible to observe NOEs beyond this limit if they involve equivalent, rotationally-exchanging nuclei. In such instances the individual contributions of equivalent nuclei give rise to an ‘effective distance’ (reff) which is less than the actual distance of the most proximate nucleus (Fletcher et al. 1996). Here, it is likely that the presence of site-exchangeable methyl and aromatic hydrogen nuclei enhances the NOE to a detectable level. In fact, since each NOE involves an aromatic hydrogen, this effect is likely to enhance signal intensity for each of the observed NOEs.
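The effective-distance argument can be illustrated numerically. The sketch below assumes the r⁻⁶ summation commonly used for groups of equivalent nuclei, reff = (Σ r⁻⁶)^(-1/6); this particular form is an assumption and is not necessarily the treatment applied by Fletcher et al. The six separations (three methyl hydrogens to each of two equivalent aromatic hydrogens) are hypothetical values chosen only to lie beyond 5 Å, and are not measurements from the calculated structure.

```python
def effective_distance(distances):
    """Effective distance for a group of equivalent nuclei, using r^-6 summation."""
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)


# Hypothetical separations (in Angstroms) between three methyl hydrogens and
# two equivalent aromatic hydrogens; every individual pair exceeds 5 Angstroms.
pair_distances = [5.4, 5.6, 5.9, 5.5, 5.8, 6.1]

r_eff = effective_distance(pair_distances)
print(f"shortest individual separation: {min(pair_distances):.1f} A")
print(f"effective distance: {r_eff:.2f} A")
```

With these illustrative values the effective distance falls below both the shortest individual separation and the 5 Å bound, which is the behaviour invoked above to explain a detectable NOE.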

6.4.5.3. Validation of the TgMIC4-A5/lacto-N-biose model

The implementation of unambiguous intermolecular NOEs improves the accuracy of HADDOCK structure generation compared to the sole use of AIRs. However, it is also arguable that these seven NOEs are not sufficient to comprehensively characterise the complex. For example, the calculation does not incorporate any data (e.g. trNOEs, glycosidic torsion angles) describing the bound-state conformation of lacto-N-biose. Consequently this is dictated entirely by the force-fields and energy-minimisation protocols of HADDOCK, guided by limited distance restraints.

Although only a single initial lacto-N-biose conformer was provided for docking, the flexibility conferred upon the molecule enables a range of conformers to be sampled during the calculation. Indeed, comparison of the lacto-N-biose before and after docking demonstrates that significant conformational changes occur (figure 6.12A). However, whilst the docked structure represents an energetically-favourable state, it may not precisely represent the native bound-state. Whilst the galactose ring is relatively well-restrained, the relative orientation of the GlcNAc saccharide may be inaccurate due to a lack of restraints involving its ring atoms.

As for proteins, the validity of carbohydrate structures can be assessed based upon their glycosidic bond torsion angles. A Ramachandran-type plot of allowed φ/ψ angles for a β1,3 glycosidic linkage between galactose and N-acetylated glucose, downloaded from GlycoMapsDB (Frank et al. 2007), is provided in figure 6.12B. Mapped on to this are the φ/ψ values of the 44 Galβ1,3GlcNAc disaccharide fragments contained within the Protein Data Bank, retrieved using GlyTorsion (Lütteke et al. 2005). The angles within the lacto-N-biose structure before and after docking are also indicated; both lie within the ‘allowed’ regions of the plot. Although the docked structure adopts a higher-energy conformation, it has been noted previously that this is often the case for a protein-bound oligosaccharide (Homans 1993).
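For readers wishing to reproduce such an analysis, the sketch below shows how a torsion angle can be computed from four sets of atomic coordinates. The choice of atoms defining φ and ψ varies between conventions (for example O5-C1-O3'-C3' and C1-O3'-C3'-C4' for a 1,3 linkage, or the hydrogen-based NMR definitions); which definition GlycoMapsDB and GlyTorsion apply is not stated here and should be checked before values are compared. The coordinates used are an arbitrary test geometry, not atoms from the calculated structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle (degrees) about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Arbitrary test geometry: the first and last points are offset by a
# quarter turn when viewed along the central bond.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                 (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(f"torsion angle: {angle:.1f} degrees")
```

Applying the function twice, once for each set of four atoms, gives a φ/ψ pair which can then be placed on a plot such as that in figure 6.12B.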

The calculated structure of ligand-bound TgMIC4-A5 is only nominally altered from the average solution structure of free TgMIC4-A5, superimposing with an RMS deviation of 0.76 Å (over 797 atoms). Nonetheless, validation of the ligand-bound structure was carried out via submission to the Protein Structure Validation Suite (PSVS) (Bhattacharya et al. 2007), which reported no major concerns. Due to strong similarities with the validation data for the TgMIC4-A5 solution structure (presented in chapter 4.4.2), validation data for ligand-bound TgMIC4-A5 are not described here.

Figure 6.12: analysis of lacto-N-biose glycosidic torsion angles. A) Comparison of the φ and ψ angles of lacto-N-biose prior to (blue) and following (green) docking, demonstrating that HADDOCK can sample multiple conformations and therefore is not biased by the starting structure. B) A Ramachandran-type plot, depicting the ‘allowed’ φ/ψ values for a Galβ1,3GlcNAc (i.e. lacto-N-biose) disaccharide. Non-white regions of the plot denote ‘allowed’ values, and are coloured via a green-to-pink gradient according to energy. The φ/ψ values of the 44 Galβ1,3GlcNAc fragments from structures in the Protein Data Bank are mapped onto the plot (denoted by +’s) alongside the φ/ψ values of lacto-N-biose before (blue circle) and after (dark green circle) docking. This plot was downloaded from GlycoMapsDB (Frank et al. 2007).

6.4.6. Mutagenesis studies.

6.4.6.1. Selection of residues for mutation.

Studies of the mechanism of ligand-binding by TgMIC4-A5 have so far highlighted six residues which may be critical: K19, N51, Y58, K60, Y67 and Y69 (figures 5.10 & 6.10). It would be of interest to mutate each of these residues individually and in various combinations, to observe the consequences upon ligand binding. In some instances there are several different mutations which might provide valuable findings; for example, mutation of Y67 to phenylalanine or leucine would respectively reveal the contributions of the tyrosine hydroxyl group and aromatic ring.

Due to constraints of time and resources it was not possible to create and assess each desired permutation. Instead, three individual mutations were selected: K19A, to assess whether the presence of the large protruding side-chain is important for ligand binding; K60M, to assess the contribution of the side-chain amine at the base of the binding pocket; and Y69L, to assess the contribution of the aromatic ring. As detailed in chapter 5.5.2, many of the TgMIC4-A5 residues which appear to mediate adhesivity are conserved or similar in A1, with the exception of a Y/L substitution at residue 69 (see figure 5.17A). It was therefore postulated that such a mutation would significantly impair ligand binding. The position of each residue relative to docked lacto-N-biose is evident in figure 6.10B.

6.4.6.2. Production of TgMIC4-A5 mutant samples.

Expression and purification of 15N-labelled proteins was carried out as for wild-type (wt) TgMIC4-A5, with no notable differences in terms of the behaviour of the protein (figures 6.13A, 6.15A and 6.17A).

6.4.6.3. NMR analysis of protein structure and adhesivity.

1H-15N HSQC spectra were recorded in order to verify the folded state of each protein (figures 6.13B, 6.15B and 6.17B). This also enabled broad assessment of the effect of each mutation upon TgMIC4-A5 structure, thereby verifying whether any resulting loss of function could merely be due to perturbation of the ligand binding site. Since the 1H-15N HSQC experiment only reports on the chemical shifts of amide nuclei, its application in assessing the wider structural differences between TgMIC4-A5 mutants must be tempered with caution. NMR spectroscopy is a highly sensitive technique, in which very minor changes in chemical environment can induce large chemical shift changes. Nonetheless, 1H-15N HSQC spectra were used as a broad marker of structure integrity.

1H-15N HSQC-monitored titrations of TgMIC4-A5 mutants with lacto-N-biose were also carried out in order to assess the effect of each mutation upon ligand binding. Such titrations report only on overall binding affinity and not on its thermodynamic components. Occurrences of enthalpy-entropy compensation, where a change in binding enthalpy is offset by an opposing change in entropy such that affinity is preserved (Richieri et al. 1997), would therefore not be observed. For this reason ITC would provide a more thorough analysis; however, this has not been carried out due to constraints of time and resources.
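The limitation of an affinity-only measurement can be made concrete using the standard relationships ΔG = ΔH − TΔS and ΔG = RT ln Kd. In the sketch below, the two ΔH/TΔS pairs are hypothetical values chosen purely for illustration, and the temperature of 298.15 K is an assumption; none of these figures are taken from the experiments described.

```python
import math

R = 1.987e-3   # gas constant, kcal mol-1 K-1
T = 298.15     # assumed temperature, K


def kd_from_thermodynamics(delta_h, t_delta_s, temp=T):
    """Dissociation constant (M) from dH and T*dS (both in kcal mol-1)."""
    delta_g = delta_h - t_delta_s
    return math.exp(delta_g / (R * temp))


def delta_g_from_kd(kd, temp=T):
    """Binding free energy (kcal mol-1) from a dissociation constant (M)."""
    return R * temp * math.log(kd)


# Hypothetical case: the second variant loses 2 kcal mol-1 of favourable
# enthalpy but pays a 2 kcal mol-1 smaller entropic penalty.
kd_a = kd_from_thermodynamics(delta_h=-8.0, t_delta_s=-3.0)
kd_b = kd_from_thermodynamics(delta_h=-6.0, t_delta_s=-1.0)

print(f"Kd (variant A): {kd_a:.2e} M")
print(f"Kd (variant B): {kd_b:.2e} M")
```

Both hypothetical variants yield the same Kd, so a titration would not distinguish them, whereas ITC measures ΔH directly and would.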

6.4.6.3.1. TgMIC4-A5K19A retains full adhesive function.

The 1H-15N HSQC spectrum of TgMIC4-A5K19A bears several notable differences to that of TgMIC4-A5wt (figure 6.13B). These are localised to the mutation site and proximate residues; expected consequences of the changes in local chemical environment. Otherwise the majority of peaks overlap (at least partially) with those of TgMIC4-A5wt, indicating that the protein retains the wild-type structure with an intact ligand-binding site. Indeed, NMR titration with lacto-N-biose demonstrated that ligand-binding activity is retained (figure 6.14). As for TgMIC4-A5wt, the interaction occurs in intermediate-exchange, with saturation achieved upon addition of five molar equivalents of lacto-N-biose. This suggests that K19A mutation has no effect on ligand-binding, although enthalpy-entropy compensation cannot be ruled out.

6.4.6.3.2. TgMIC4-A5K60M exhibits abrogated adhesive function.

In contrast to TgMIC4-A5K19A, TgMIC4-A5K60M is totally deficient for lacto-N-biose binding (figure 6.16). However, it must be noted that the 1H-15N HSQC spectrum of TgMIC4-A5K60M deviates rather substantially from that of TgMIC4-A5wt (figure 6.15B). This is to be expected considering that the mutated residue is largely buried, resulting in more disseminated changes in local chemical environment. Residues which are most distant from the mutated site are largely unaffected, suggesting that the protein retains the overall TgMIC4-A5wt structure. However, more discrete distortion of the ligand-binding site cannot be confidently ruled out, hence the apparent importance of residue K60 must be considered in context.

Figure 6.13: expression, purification and NMR analysis of 15N-TgMIC4-A5K19A. A) SDS-PAGE analysis of TgMIC4-A5K19A expression and purification. M = molecular weight markers. The relevant protein is indicated by an arrow at each stage. i) Expression of thioredoxin-fused TgMIC4-A5K19A. Lane 1 = whole cell lysate prior to IPTG-induction; lanes 2 & 3 = insoluble and soluble fractions following IPTG-induction. ii) Nickel-ion affinity chromatography. Lane 1 = cell lysate; lane 2 = flow-through; lane 3 = 20 mM imidazole wash; lane 4 = elution in 250 mM imidazole. iii) TEV cleavage and protein purification. Lane 1 = prior to TEV addition; lane 2 = after 18 hours incubation with TEV; lane 3 = Ni2+-column flow-through; lane 4 = column-bound protein. iv) Gel purified TgMIC4-A5K19A; the NMR sample. B) Overlaid 1H-15N HSQC spectra of TgMIC4-A5wt (black) and TgMIC4-A5K19A (red), demonstrating the consequences of mutation upon amide resonances. TgMIC4-A5wt resonances which are significantly altered are annotated.

Figure 6.14: NMR titration of TgMIC4-A5K19A and lacto-N-biose. Superposed 1H-15N HSQC spectra of TgMIC4-A5K19A in the presence of 0 (black), 1 (green), 3 (blue), 4 (magenta) and 5 (red) molar equivalents of lacto-N-biose, at which point the protein was saturated. The interaction occurs in intermediate exchange, with most resonances undergoing broadening, often beyond detection, at pre-saturating lacto-N-biose concentrations.

Figure 6.15: expression, purification and NMR analysis of 15N-TgMIC4-A5K60M. A) SDS-PAGE analysis of TgMIC4-A5K60M expression and purification. M = molecular weight markers. The relevant protein is indicated by an arrow at each stage. i) Expression of thioredoxin-fused TgMIC4-A5K60M. Lane 1 = whole cell lysate prior to IPTG-induction; lanes 2 & 3 = insoluble and soluble fractions following IPTG-induction. ii) Nickel-ion affinity chromatography. Lane 1 = cell lysate; lane 2 = flow-through; lane 3 = 20 mM imidazole wash; lane 4 = elution in 250 mM imidazole. iii) TEV cleavage and protein purification. Lane 1 = prior to TEV addition; lane 2 = after 18 hours incubation with TEV; lane 3 = Ni2+-column flow-through; lane 4 = column-bound protein. iv) Gel purified TgMIC4-A5K60M; the NMR sample. B) Overlaid 1H-15N HSQC spectra of TgMIC4-A5wt (black) and TgMIC4-A5K60M (green), demonstrating the consequences of mutation upon amide chemical shifts.

Figure 6.16: NMR titration of TgMIC4-A5K60M and lacto-N-biose. Superposed 1H-15N HSQC spectra of TgMIC4-A5K60M in the presence of 0 (black) and 15 (red) molar equivalents (Meq) of lacto-N-biose. The spectra indicate that the protein and disaccharide do not interact.

6.4.6.3.3. TgMIC4-A5Y69L exhibits diminished adhesive function.

Like that of TgMIC4-A5K19A, the 1H-15N HSQC spectrum of TgMIC4-A5Y69L exhibits only a small number of notable differences to that of TgMIC4-A5wt, limited to residues near the mutated site (figure 6.17B). This suggests that the protein retains wild-type structure and a relatively intact binding-site. The titration of TgMIC4-A5Y69L with lacto-N-biose demonstrates that binding to lacto-N-biose is significantly impaired (figure 6.18A). The protein saturates at 30 molar equivalents of ligand and the interaction occurs in fast-exchange, enabling calculation of a dissociation constant of 1.64 × 10⁻³ (± 7.5 × 10⁻⁵) M (figure 6.18B). The weak nature of this interaction suggests that residue Y69 plays an important role in binding of lacto-N-biose. The position of its side-chain in the calculated structure of TgMIC4-A5/lacto-N-biose suggests that it may contact both the Gal and GlcNAc saccharide units (figure 6.10).
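A dissociation constant can be extracted from fast-exchange titration data by fitting the observed shift changes to a binding isotherm. The sketch below assumes a 1:1 binding model with a constant protein concentration (i.e. no dilution upon ligand addition) and uses a simple logarithmic grid search over Kd; it is an illustration of the general approach and not necessarily the fitting procedure used to obtain the value quoted above. The data are synthetic and noise-free, generated from assumed values of Kd (1.6 mM) and saturating shift change (0.25 ppm), with the protein concentration and ligand equivalents chosen to resemble the described titration.

```python
import math


def fraction_bound(p_tot, l_tot, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in M)."""
    s = p_tot + l_tot + kd
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)


def fit_kd(p_tot, l_tots, shifts, kd_lo=1e-6, kd_hi=1e-1, steps=2000):
    """Least-squares estimate of Kd and the saturating shift change.

    Kd is scanned on a logarithmic grid. For each trial Kd the model is
    linear in the saturating shift, which is solved in closed form.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_lo * (kd_hi / kd_lo) ** (i / steps)
        f = [fraction_bound(p_tot, l, kd) for l in l_tots]
        d_max = sum(x * y for x, y in zip(f, shifts)) / sum(x * x for x in f)
        sse = sum((y - d_max * x) ** 2 for x, y in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    return best[1], best[2]


# Synthetic data (not the thesis data): assumed Kd and saturating shift.
P_TOT = 200e-6                       # protein concentration, M
EQUIVALENTS = [0, 2, 5, 10, 20, 30]  # molar equivalents of ligand
TRUE_KD, TRUE_DMAX = 1.6e-3, 0.25    # M, ppm (both assumed)

l_tots = [eq * P_TOT for eq in EQUIVALENTS]
shifts = [TRUE_DMAX * fraction_bound(P_TOT, l, TRUE_KD) for l in l_tots]

kd_fit, dmax_fit = fit_kd(P_TOT, l_tots, shifts)
print(f"fitted Kd: {kd_fit:.2e} M, fitted saturating shift: {dmax_fit:.3f} ppm")
```

In practice several residues would be fitted simultaneously with a shared Kd, and a gradient-based optimiser would usually replace the grid search; the grid is used here only to keep the example dependency-free.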

Figure 6.17: expression, purification and NMR analysis of TgMIC4-A5Y69L. A) SDS-PAGE analysis of TgMIC4-A5Y69L expression and purification. M = molecular weight markers. The relevant protein is indicated by an arrow at each stage. i) Expression of thioredoxin-fused TgMIC4-A5Y69L. Lane 1 = whole cell lysate prior to IPTG-induction; lanes 2 & 3 = insoluble and soluble fractions following IPTG-induction. ii) Nickel-ion affinity chromatography. Lane 1 = cell lysate; lane 2 = flow-through; lane 3 = 20 mM imidazole wash; lane 4 = elution in 250 mM imidazole. iii) TEV cleavage and protein purification. Lane 1 = prior to TEV addition; lane 2 = after 18 hours incubation with TEV; lane 3 = Ni2+-column flow-through; lane 4 = column-bound protein. iv) Gel purified TgMIC4-A5Y69L; the NMR sample. B) Overlaid 1H-15N HSQC spectra of TgMIC4-A5wt (black) and TgMIC4-A5Y69L (blue), demonstrating the consequences of mutation upon amide resonances. TgMIC4-A5wt resonances which are significantly altered are annotated.

Figure 6.18: NMR titration of TgMIC4-A5Y69L and lacto-N-biose. A) Superposed 1H-15N HSQC spectra of TgMIC4-A5Y69L in the presence of 0 (black), 2 (green), 5 (blue), 10 (magenta), 20 (cyan) and 30 (red) molar equivalents (Meq) of lacto-N-biose, at which point the protein was saturated. The interaction occurs primarily in fast-exchange, with most perturbed resonances undergoing clear incremental shift changes with increasing lacto-N-biose concentration. B) Superimposed plots of Δδn vs lacto-N-biose concentration for five TgMIC4-A5Y69L residues. Curves were fitted via least squares refinement, obtaining a Kd of ~1.61 × 10⁻³ M. The experiment was carried out at a protein concentration of ~200 µM.

6.5. Discussion.

6.5.1. A possible mechanism of galactose binding and discrimination by TgMIC4-A5.

Based on the calculated model of the TgMIC4-A5/lacto-N-biose complex, the mechanism of galactose-recognition by TgMIC4-A5 appears closely-related to that of other galactose-specific lectins (e.g. Galectins), where a conserved aromatic residue (usually a tryptophan) stacks against the non-polar face of the galactopyranose (comprised of H1 and H3-H5) (Nesmelova et al. 2008). This hydrophobicity-mediated association is stabilised by CH-π interactions, of comparable strength to a hydrogen-bond (Sujatha et al. 2005). In the TgMIC4-A5/lacto-N-biose structure, the relative orientations of the galactopyranose ring of lacto-N-biose and TgMIC4-A5 residue Y58 suggest that ligand recognition occurs via this mechanism (figure 6.19). The aromatic ring of residue Y69 may also partake in stacking against H6 atoms (to which it bears intermolecular NOEs).

Figure 6.19: a potential mechanism of galactose recognition by TgMIC4-A5. Structural comparison of TgMIC4-A5/lacto-N-biose (green) and human galectin-9/N-acetyllactosamine (magenta) (PDB accession number 3NV2) (Yoshida et al. 2010). Galactose H1, H3, H4 and H5 atoms form a contiguous hydrophobic surface which stacks against an aromatic ring (i.e. tyrosine/tryptophan). The interactions are stabilised by hydrogen-bonds; potential H-bonded atom pairs are indicated by dashed lines. To aid clarity, in both structures the saccharide unit preceding galactose is omitted, as are hydrogen atoms (with the exception of Gal-H1 & H3-H5).

The strength of the Galectin/galactose interaction is enhanced by hydrogen-bonds involving the ring hydroxyl groups. C4-OH and C6-OH are co-ordinated by conserved arginine and asparagine residues encircling the binding pocket. TgMIC4-A5 bears Y67 (perpendicular to galactose, rather than stacking) and N51 in analogous positions, whilst the side-chain amine of K60 also protrudes into the binding pocket, and may hydrogen-bond with the C3-OH group of galactose. TgMIC4-A5 also contains residue K19 in a similar position to an arginine residue of galectin-9. This residue may form contacts with the ligand or may simply perform a structural role, forming a steep wall to the side of the binding pocket.

The ability of TgMIC4-A5 and Galectin-like proteins to distinguish galactose derives from the axial orientation of its C4-OH group, which confers extended non-polarity on the opposing face. In the case of glucose, mannose and fucose, C4-OH is equatorially-oriented, preventing hydrogen-bond formation and disrupting the hydrophobic surface such that adherence to binding pockets of this nature is not possible (Nesmelova et al. 2008).

6.5.2. The contribution of saccharide units preceding galactose.

As previously discussed, the amide chemical shift perturbations induced by galactose monosaccharide and galactose-terminated di- and penta-saccharides are largely conserved, raising questions over the contributions of preceding saccharides (see chapter 5.4.1.2). The calculated TgMIC4-A5/lacto-N-biose structure supports the proposed model, where galactose is intimately bound (i.e. within the surface pocket) whilst preceding units provide stabilising contacts. The 13C-edited NOESY spectrum detects an NOE between Y69 and GlcNAc, demonstrating that GlcNAc adopts a rigid conformation in close proximity to the protein surface. Additionally, the glucopyranose ring is oriented such that H1, H3, H5 and H6 sit above the aromatic ring of Y67 (figure 6.10B), suggesting that ring stacking interactions may occur.

It is also interesting to examine the electrostatic surface of the calculated structure, and the presence of several substantial patches of positive charge surrounding the docked ligand (figure 6.20). This may lead to stabilisation of interactions with negatively-charged (e.g. sialylated) receptors, such as GM1, as suggested by the carbohydrate microarray data (see chapter 5.2.2.2). This is consistent with the ability of TgMIC4-A5 to displace TgMIC1-NT from GM1-penta (see chapter 5.4.4).

Figure 6.20: the electrostatic surface of ligand-docked TgMIC4-A5. Regions of negative, neutral and positive charge are coloured via a red-to-white-to-blue gradient. The docked disaccharide is encircled by several regions of positive charge, potentiating recruitment of a sialic acid moiety.

6.5.3. Discrimination of β1,3 and β1,4-linked Gal/GlcNAc.

An interesting feature to arise from the initial carbohydrate binding studies of TgMIC4-A5 is its ability to distinguish β1,3 and β1,4-linked Gal/GlcNAc disaccharides. Calculation of a structure of the TgMIC4-A5/lacto-N-biose complex enables rationalisation of this observation. Should N-acetyllactosamine (Galβ1,4GlcNAc) bind TgMIC4-A5 in the same orientation as lacto-N-biose (Galβ1,3GlcNAc) then the relative orientation of the GlcNAc saccharide would naturally be inverted (figure 6.21). The hydrophobic surface of GlcNAc (comprised of H1, H3, H5 and H6 atoms) would therefore point away from the aromatic ring of Y67, preventing ring stacking from occurring and conferring a weaker interaction.

Figure 6.21: Alignment of N-acetyl-D-lactosamine and docked lacto-N-biose. Demonstrating the orientation of the GlcNAc saccharide in N-acetyllactosamine (purple) with respect to lacto-N-biose (green). The inverted orientation would result in a loss of hydrophobicity-mediated ring stacking interactions with Y67.

6.5.4. Comparison with the crystal structure of a SML-2/galactose complex.

As previously mentioned, the TgMIC4 orthologue Sarcocystis muris lectin (SML-2) has known affinity for galactose (Klein et al. 1998). Its single complete apple domain bears 43% sequence identity to TgMIC4-A5, prompting comparative discussion upon discovery of the key galactose-binding residues of TgMIC4-A5, concluding that the proteins are likely to share a mode of galactose binding (chapter 5.5.2).

Following the completion of these studies, the crystal structure of SML-2 in complex with 1-thio-β-D-galactose at 2.1 Å was published (Muller et al. 2011). Overall, the general mode of galactose binding by SML-2 closely resembles that of TgMIC4-A5. The protein structures converge with an RMS deviation of 0.8 Å (across 53 Cα atoms), and the positions of the key residues lining the binding pocket of TgMIC4-A5 are particularly well-conserved in SML-2, giving rise to an equivalent binding pocket within which the thio-substituted galactose lies (figure 6.22). The saccharide ring stacks against a histidine side-chain (equivalent to Y58) and is stabilised by H-bonds; akin to the mode of galactose recognition by Galectins.

Figure 6.22: structural comparison of galactose-binding by SML-2 and TgMIC4-A5. The TgMIC4-A5/lacto-N-biose structure superimposed with the crystal structure of SML-2 in complex with 1-thio-β-D-galactose (PDB ID 2YIO). The positions of the residues encircling the binding pocket are closely-conserved, however the precise orientation of the respective galactose saccharides differs.

However, the precise orientation of the galactose saccharide differs slightly between the structures, with the SML-2-bound ligand sitting marginally deeper within the pocket and rotated counter-clockwise by approximately 40°. Given the credibility associated with a crystal structure at this resolution, this may appear to expose the limitations of the calculated TgMIC4-A5/lacto-N-biose structure discussed in the previous section. However, it is worth noting that the SML-2 ligand incorporates a sulphur group at position C1, which may induce a subtle re-orientation of the saccharide compared to non-derivatised galactose. Furthermore, the accommodation of the preceding GlcNAc saccharide of lacto-N-biose may also require reorganisation. Indeed, the relative counter-clockwise rotation of the saccharide in the SML-2/1-thio-β-D-galactose complex means that any preceding saccharide would be rotated away from residue Y69 (to which GlcNAc bears an NOE in the TgMIC4-A5/lacto-N-biose complex).

6.5.5. Concluding remarks.

This chapter describes the molecular docking of TgMIC4-A5 and lacto-N-biose driven by NMR data, namely chemical shift perturbations and intermolecular NOEs. Whilst the credibility of such a structure may not be equal to that of an X-ray structure or a conventionally-determined solution structure, the application of unambiguous distance restraints, with which the resulting structure is in good agreement, enhances the credibility compared to ambiguity-driven docking procedures. Nonetheless, the limitations of the implemented protocol are potentially exposed by the very recent crystal structure of the orthologous SML-2 in complex with thio-substituted galactose, where the orientation of galactose differs noticeably from that within lacto-N-biose. However, it is possible that these orientational differences are caused, or at least exacerbated, by the thiol substitution of the galactose bound to SML-2, and the presence of a preceding saccharide in lacto-N-biose.

Irrespective of the precise ligand orientation, the overall mode of galactose recognition by TgMIC4-A5 and SML-2 is consistent. The hydrophobic face of the galactopyranose ring stacks against a conserved aromatic residue, and is reinforced by numerous hydrogen-bonds. This is analogous to the mode of Galectins and other galactose-specific lectins, but represents yet another new binding mode for the apple/PAN domain family (i.e. compared to the interfaces depicted in figure 1.9).

The calculated structure of TgMIC4-A5/lacto-N-biose is consistent with the NMR titration data, suggesting that saccharide units preceding galactose (i.e. N-acetylglucosamine in the case of lacto-N-biose) establish intermolecular contacts across the protein surface and thereby enhance binding affinity. This has further enabled rationalisation of the ability of TgMIC4-A5 to distinguish β1,3 and β1,4-linked galactose. Although the lacto-N-biose disaccharide may not represent a cognate in vivo receptor for TgMIC4-A5, the calculated structure of the complex provides insight into potentially more relevant ligands (e.g. GM1-penta) since the mode of galactose recognition appears to be conserved between ligands. Furthermore, the surface charge distribution of the calculated structure provides evidence supporting a charge-mediated interaction between TgMIC4-A5 and negatively-charged branched-chain oligosaccharides (i.e. gangliosides), as suggested by the carbohydrate microarray data.

To summarise, the work presented provides detailed insights into the mode of galactosylated receptor recognition by TgMIC4. The biological significance of these findings is discussed in chapter 7.

Chapter 7: Conclusions & future perspectives.

7.1. Summarising the biological problem and project aims

Toxoplasma gondii is an Apicomplexan parasite with the unique ability to invade nearly all nucleated cell types (Carruthers 1999). This enables infection of a broad range of host organisms, including humans, giving rise to the disease ‘toxoplasmosis’. Infection is usually asymptomatic in healthy individuals; however, the parasite is capable of forming dormant cysts within infected tissues, maintaining infection for the lifespan of the host. Encysted parasites often reawaken in immune-compromised individuals, with potentially-fatal consequences, whilst a new infection can cause birth defects in pregnant females. Additionally, due to its amenability to laboratory manipulation, T. gondii is widely considered to be a model organism for the study of the Apicomplexa (Roos et al. 1999), which includes several notable human and veterinary pathogenic genera, namely Eimeria, Neospora, Cryptosporidium and Plasmodium.

Apicomplexan parasites contain several phylum-specific features which drive host cell invasion (Carruthers & Boothroyd 2007), such as the glideosome; a unique locomotive system, and the moving junction; a tight association between host and parasite membranes, comprised of secreted proteins from micronemes and rhoptries; organelles containing an array of adhesive proteins which mediate host cell attachment and engulfment of the parasite. Together these features enable Apicomplexan parasites to forcibly invade host cells, thereby avoiding degradation via lysosome fusion inside the host cell, and hence they represent viable targets for drug therapy.

Microneme proteins (MICs) are secreted at the onset of invasion and form a strong apical attachment to host cells, providing a basis for parasite internalisation. MICs are modular proteins commonly containing repeating sequences with homology to adhesive eukaryotic proteins (Tomley & Soldati 2001). Some MICs have been shown to form multimeric host-adhesive complexes (Carruthers & Tomley 2008), the assembly and functions of which are becoming increasingly well characterised. For example, several molecular interactions underlying the MIC1/4/6 complex in T. gondii have now been characterised (figure 1.13), including atomic resolution insight into sialylated oligosaccharide (i.e. putative host cell receptor) binding by TgMIC1. Whilst TgMIC6 functions only as an intracellular trafficking determinant and a scaffold protein on the parasite surface (Reiss et al. 2001), host cell binding activity has also been localised to the TgMIC4 C-terminus (Brecht et al. 2001). However, the cognate in vivo receptor(s) for TgMIC4, its precise mode of attachment and the functional significance of the interaction remain undiscovered; gaps in our understanding of host cell invasion by T. gondii which this study has aimed to fill. Since close orthologues of TgMIC4 have been identified in several other Apicomplexan parasites, it was further hoped that functional characterisation of TgMIC4 would provide insight into host cell invasion by a broader range of organisms.

7.2. Summarising the findings of these studies

Prior to the onset of these studies, the C-terminus of TgMIC4 had been recombinantly produced in our laboratory. This yielded a partially-folded protein with galactose-binding activity, from which limited proteolysis yielded an unidentified stable fragment with retained lectin activity. Through proteomic analyses these studies have identified the stable component to be TgMIC4-A5 (chapter 3). This prompted production of recombinant TgMIC4-A5 in isolation, determination of its solution structure and detailed characterisation of binding to a range of galactosylated saccharides via carbohydrate microarray and NMR methods. Studies have culminated in the calculation of a structure for the TgMIC4-A5/lacto-N-biose complex via data-driven modelling, yielding a structure in which the mode of galactose adhesion appears akin to that of Galectin-family proteins and other galactose-specific lectins. The microarray and NMR data suggest that galactose recognition is conserved for all identified TgMIC4-A5 ligands, and that preceding saccharide units enhance affinity via specific protein contacts.

The calculated structure is broadly consistent with the recently published structure of TgMIC4 orthologue Sarcocystis muris SML-2 in complex with a thiolated galactose monosaccharide. This suggests that, as expected, the mode of galactose recognition is conserved across TgMIC4-related Apicomplexan microneme proteins.

7.3. The in vivo role of TgMIC4

7.3.1. Binding to ganglioside receptors

These studies have identified several gangliosides as in vitro receptors for TgMIC4-A5. Gangliosides are glycosphingolipids; lipid-carbohydrate conjugates, which are characterised by the presence of varying numbers of sialic acid saccharides. They often contain branched-chain saccharides, and commonly sport galactosyl/sialyl (e.g. GM1, GD1b) or sialyl/sialyl (e.g. GT1b, GQ1b) termini (Lopez & Schnaar 2009).

TgMIC4-A5 binding to both GM1 and GD1b has been demonstrated in these studies (table 5.1), and the interaction of the pentasaccharide fragment of GM1 has been probed in some detail.

Insertion of the ceramide tails of gangliosides into the plasma membrane presents the oligosaccharide portions on the cell surface, where they serve as receptors modulating cell-cell recognition and immune and metabolic functions (Lopez & Schnaar 2009). Gangliosides are present on the surfaces of all vertebrate cells, but are particularly prevalent on nerve cell surfaces. Since the central nervous system is a common site of T. gondii encystation, it is probable that terminally galactosylated gangliosides such as GM1 represent cognate in vivo receptors for TgMIC4. This correlation has been previously noted in the context of TgMIC1 activity (Blumenschein et al. 2007), the affinity of which for sialic acid also governs binding to an array of gangliosides (including GM1, as demonstrated in these studies).

There are numerous examples in nature of microbial proteins binding to glycoconjugate receptors, including gangliosides (reviewed by Imberty & Varrot 2008). For example, a secreted toxin from Clostridium tetani binds to GT1b on the surface of neuronal cells, ultimately giving rise to tetanus (Fotinou et al. 2001). The putative TgMIC4 receptor GM1 has also been previously identified as a receptor for several microbial proteins. The secreted toxin of Vibrio cholerae binds to GM1 on epithelial cell surfaces in the small intestinal wall, leading to uptake into the cell where it modulates plasma membrane permeability, ultimately giving rise to the symptoms of cholera (Holmgren et al. 1975). Structural characterisation by X-ray crystallography using GM1-penta reveals a symmetrical pentameric arrangement of toxin molecules, with the ligand bound at each subunit interface (Merritt et al. 1994). The mode of galactose recognition is analogous to that of Galectin-family proteins and TgMIC4-like proteins, involving CH/π stacking interactions and H-bonds, whilst the protein also makes substantial contacts with the sialic acid saccharide, as is expected of TgMIC4-A5 (figure 7.1A).

The major capsid protein VP1 from Simian virus 40 (SV40) has also been found to adhere to GM1, leading to virion uptake and successful infection, which has been implicated as a cause of cancer in animals (Eddy et al. 1962; Tsai et al. 2003). This interaction has also been structurally characterised by X-ray crystallography (Neu et al. 2008). Like cholera toxin, the protein forms a pentamer and binds the ligand at the junction between subunits. However, at an atomic level the ligand-binding site of VP1 is distinct from that of cholera toxin, with the terminal galactose lying proximate to the protein surface but not co-ordinated by an aromatic group as in galactose-specific lectins (figure 7.1B). Instead it appears that the sialic acid saccharide of GM1 is the primarily recognised saccharide in this instance, as is the case for TgMIC1. This is evidenced by the increased affinity of VP1 for GM1 bearing an N-glycolylated, rather than N-acetylated, sialic acid.

Figure 7.1: the crystal structures of pathogenic proteins in complex with GM1-penta. A) Cholera toxin/GM1. The galactose saccharide of GM1-penta (white) stacks against a tryptophan indole ring of a toxin subunit (brown), whilst the sialic acid protrudes towards the interface with a second subunit (pale green). B) SV40 VP1/GM1. Once again the ligand binds at the interface between two subunits (coloured pink and cyan), but in a different orientation to cholera toxin; sialic acid protrudes into a binding pocket whilst galactose spans the protein surface and does not stack against an aromatic group.

Thus it is clear that GM1 is a diverse receptor for microbial proteins, recruiting proteins via distinct epitopes, and capable of mediating internalisation of an invading pathogen (SV40) or a single secreted protein for modulation of host cellular properties (cholera toxin). The significance of TgMIC4 binding to GM1 and other ganglioside receptors is currently unclear, and there are several possibilities.

7.3.2. Contribution to host-cell invasion

Although the ability of TgMIC4 to bind host cells has been clearly demonstrated (Brecht et al. 2001), the extent of its contribution to host cell attachment and invasion by T. gondii is currently not clear. It has been demonstrated that a mic1KO strain is 60% deficient in host cell invasion compared to wild-type; however, the dependence of TgMIC4 and TgMIC6 upon TgMIC1 for exit from the ER means that this deficiency results from a total loss of the entire TgMIC1/4/6 complex, hence the individual contributions of TgMIC1 and TgMIC4, if any, are unclear (Cerede et al. 2005). Attempts to resolve this have been made via the creation of a mic4KO strain (Friedrich et al. 2010). Initial studies demonstrated a 40% decline in invasion rate compared to wild-type; however, repeat experiments have observed more nominal effects (Dominique Soldati-Favre, University of Geneva, personal communications).

Irrespective of these host cell invasion assays, it is evident from these studies that TgMIC4 contains a domain towards its C-terminus (A5) which is capable of binding to molecules which are widely present on host cell surfaces (gangliosides). It follows that the presence of such a domain is likely to lead to attachment to such receptors on host cell surfaces following the discharge of microneme proteins, and therefore a contribution to invasion. Of course, it has also been demonstrated that the adhesive C-terminus is liberated from the parasite-tethered complex by TgSUB1 activity following secretion (Brecht et al. 2001; Lagal et al. 2010). Therefore the extent of the contribution of TgMIC4 to host cell invasion is likely to be dictated by the duration for which it remains intact, which is currently unclear.

It is possible that TgMIC4 proteolysis is regulated according to the dependence of the parasite upon galactose recognition for invasion. For example, in cell types which are rich in terminally galactosylated receptors, binding by TgMIC4-A5 may elicit a TgMIC4 conformation in which the TgSUB1 cleavage site is protected, whilst for cell types rich in terminally sialylated receptors, the lack of TgMIC4 binding may leave the protein prone to digestion. The TgSUB1 processing site is believed to be located within the extended linker region intersecting the TgMIC4-A34 and TgMIC4-A56 domain pairs, which would permit such conformational flexibility of the adhesive domain. Such a model would also be consistent with the previous suggestion that the host-adhesive proteins of T. gondii share a synergistic relationship, functioning in a cell type-dependent manner and underlying the broad cell tropism of the parasite (Cerede et al. 2005). Interestingly, a varying dependence upon sialic acid has been previously demonstrated for erythrocyte invasion by P. falciparum (Dolan et al. 1990).

It is noteworthy that the orthologous N. caninum MIC4 does not appear to associate with NcMIC1 (Keller et al. 2004), whilst a transmembrane binding partner for SML-2 has yet to be identified. It is possible therefore that these soluble galactose-binding proteins are secreted into the extracellular medium and do not contribute to host cell invasion, instead fulfilling a role which T. gondii could recover via liberation of the adhesive fragment of TgMIC4. The lack of a mechanism for host cell invasion via galactose recognition in N. caninum and S. muris would help explain the reduced tissue and host tropism of these parasites compared to the remarkably promiscuous T. gondii.

7.3.3. A possible modulator of downstream events

The galactose-specific binding profile of TgMIC4-A5 raises similarities with Galectins, and structural analyses of ligand binding by TgMIC4-A5 and SML-2 suggest a similar mechanism of galactose recognition. Galectins are important regulators of the immune and inflammatory responses and apoptosis (Rabinovich & Gruppi 2005). It is therefore plausible that TgMIC4 proteolysis liberates a Galectin ‘mimic’ (TgMIC4-A56) which serves to block Galectin function, potentially causing down-regulation of the immune response and ultimately promoting parasite propagation.

There are numerous examples in nature of host immune system manipulation by pathogens, including the inhibition of complement pathway components via expression of mimicking proteins in viral and protozoan organisms (reviewed in Cooper 1991). In Entamoeba histolytica this occurs via a surface galactose-specific lectin (Braga et al. 1992). Various protozoan pathogens have been shown to disrupt Galectin activity; for example, host cell adhesion by Trypanosoma cruzi is enhanced by Galectin-3 (Kleshchenko et al. 2004), whilst a Leishmania major surface protein recruits Galectin-3, leading to cleavage and inactivation (Pelletier & Sato 2002). Similarly, it has recently been demonstrated that glycosylphosphoinositols on the surface of T. gondii recruit and induce degradation of Galectin-3 (Niehus et al. 2012). This provides a precedent for the usurpation of the immune system during T. gondii infection.

Additionally, the moving junction of T. gondii has been found to act as a ‘molecular sieve’, controlling the entry of host cell membrane components into the PV membrane (Mordue et al. 1999). This structure permits the entry of numerous GPI-linked proteins and, intriguingly, GM1 has also been detected within the parasitophorous vacuole membrane. Thus it is possible that the adhesive TgMIC4 C-terminal fragment is carried through the moving junction via GM1, where it could modulate events governing parasite replication and egress. It is also interesting to note that a ‘MIC4-like’ protein is expressed and secreted from the dense granules during the sexual stages of parasite development, and is associated with the oocyst wall (Ferguson et al. 2000). It is therefore possible that MIC4 continues to be expressed and released in an analogous manner from an internalised parasite.

7.4. Additional studies of TgMIC4 and the TgMIC1/4/6 complex

In addition to the mode of host cell adhesion by TgMIC4, several additional features regarding TgMIC4 and the TgMIC1/4/6 complex as a whole were poorly understood at the onset of these studies (as stated in chapter 1.5). Alongside these studies, various additional projects have been undertaken in our laboratory, which have collectively enhanced our understanding of TgMIC4 structure and TgMIC1/4/6 complex assembly. These studies, summarised briefly here, are documented in detail in a manuscript submitted for publication in 2012.

7.4.1. Structural and functional analysis of TgMIC4-A12

The solution structure of TgMIC4 residues 58-231 (i.e. comprising the putative A12 domain pair) has been solved in our laboratory (Dr Jan Marchant) (figure 7.2). Each domain bears a canonical apple/PAN fold with the three residue linker conferring a close interface, in which the α-helix of A2 stacks against A1.

Figure 7.2: the solution structure of TgMIC4-A12. A1 and A2 domains are labelled and coloured in dark and light blue respectively. Sulphur atoms are coloured in yellow. This structure was solved by Dr Jan Marchant, and submitted under PDB accession code 4A5V.

Although the domains adopt similar overall folds, the divergence of the domains within pairs is demonstrated by structural alignment of either domain with the solution structure of TgMIC4-A5. TgMIC4-A1 and TgMIC4-A5 bear 47% sequence identity and their structures superimpose closely, with an RMSD of 0.79 Å (for Cα atoms of 59 aligned residues). Meanwhile, the structures of TgMIC4-A2 and TgMIC4-A5, which share only 16% sequence identity, superimpose with an RMSD of 1.58 Å (for Cα atoms of 41 aligned residues).
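
For readers less familiar with the RMSD values quoted above, the following is a minimal Python sketch of how a Cα RMSD is computed, RMSD = sqrt((1/N) Σ |a_i − b_i|²), over N aligned residue pairs. It assumes the two structures have already been superimposed and the residues paired by a separate alignment step; the function name ca_rmsd and the toy coordinates are illustrative only and are not taken from the structures or software used in this work.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD over paired C-alpha positions, in the units of the input (e.g. angstroms).

    Assumes both coordinate sets are ALREADY superimposed and that the i-th
    entries of the two lists form an aligned residue pair; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical, not from any PDB entry):
# every atom in toy_b is displaced by exactly 1 along z relative to toy_a.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(toy_a, toy_b))  # 1.0
```

Because the sum runs only over aligned pairs, the number of aligned residues (59 versus 41 above) should always be quoted alongside the RMSD.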

Previous studies have suggested that domain pair A12 mediates recruitment of TgMIC4 into the TgMIC1/6 sub-complex via an association with TgMIC1-NT (i.e. the host-adhesive region). Exhaustive attempts to demonstrate the interaction in our laboratory, via gel filtration, ITC, and NMR, were unsuccessful, prompting the suggestion that recombinant TgMIC1-NT contains a pair of mismatched disulphide linkages from a β-finger region at the C-terminus of MAR2 (highlighted in figure 7.7). A ‘native’ β-finger peptide was therefore produced in isolation, and titration into TgMIC4-A12 demonstrated an interaction, with numerous chemical shift perturbations observed for A2 residues (Dr Jan Marchant). Consistent with these findings, studies of mic1KO parasites complemented with truncated TgMIC1, missing the β-finger region, reveal that this region is required for correct trafficking of TgMIC4 to the micronemes (Dominique Soldati-Favre, University of Geneva, personal communications).

7.4.2. Domain organisation of TgMIC4

The relatively high degree of sequence conservation between TgMIC4-A12 and TgMIC4-A34 (39%) and TgMIC4-A56 (42%) suggests that the overall structure of TgMIC4-A12 will be conserved in the additional domain pairs. The outer face of the α-helix of A2 is punctuated by alanine residues, which together with surrounding residues form hydrophobic contacts with A1 residues (figure 7.3). This hydrophobicity is generally maintained in A4 and A6 suggesting that the domain interface is conserved.

Figure 7.3: the hydrophobic domain interface of TgMIC4-A12. The α-helix of TgMIC4-A2 (cyan) contains numerous alanine residues which, together with a proximate phenylalanine side-chain, establish a hydrophobic core with several residues from A1 (dark blue). Residue type is largely conserved at each position (denoted a-i) amongst the odd- and even-numbered domains, suggesting that this interface is conserved in each domain pair.

However, the architecture of TgMIC4 (i.e. the relative orientations of domain pairs) has yet to be characterised. The only structure of a multi-apple domain protein published to date is that of factor XI (FXI), in which the four apple domains associate in a disk-like arrangement (figure 7.4). Interestingly, the domain interface within pairs of adjacent domains is broadly similar to that observed in TgMIC4-A12, although the relative orientation of each domain is inverted (i.e. the α-helix of the odd-numbered domain stacks against the even-numbered domain) and the presence of a six-residue linker permits a looser association.

It is possible that TgMIC4 domain pairs assemble in a similar manner to FXI domains. However, the presence of extended linker sequences between domain pairs may give rise to a more extended, flexible conformation. In order to characterise the shape of TgMIC4, recombinant full-length protein was prepared in our laboratory (Lloyd Liew) and submitted for small angle X-ray scattering (SAXS) at Diamond Light Source, the results of which are pending.

Figure 7.4: the crystal structure of factor XI. The domains are coloured red (A1), pink (A2), orange (A3) and sand (A4). The domains are evenly-spaced by six residue linkers, and assemble into a disk-like structure. The interface between odd- and even-numbered domains is similar to that of TgMIC4-A12, but with inversion of the respective orientation of domains.

7.4.3. The architecture of the TgMIC1/4/6 complex

In order to characterise the native higher-order structure of the TgMIC1/4/6 complex, it was purified from parasite cell lysates via lactose-affinity chromatography (i.e. via TgMIC4 activity) as previously described (Lourenco et al. 2001) (Camila Pinzan & Livia Lai). Blue-native gel electrophoresis revealed a species of ~660 kDa, which SDS-PAGE under reducing conditions showed to contain TgMIC1 and TgMIC4 (figure 7.5). Incubation of the ~660 kDa species with 8 M urea or 6 M guanidinium chloride yields products of ~440 kDa and ~220 kDa. Extraction and SDS-PAGE analysis of the latter species demonstrates that it predominantly contains TgMIC1 (data not shown).

Figure 7.5: gel electrophoresis analysis of the native purified TgMIC1/4/6 complex. M = molecular weight markers; relevant weights are labelled. A) SDS-PAGE under reducing conditions. Lane 1 = the lactose-affinity column elution fraction. Protein bands believed to represent TgMIC1 and TgMIC4 are labelled. B) Blue-native gel electrophoresis. Lane 1 = the lactose-affinity column elution fraction; lane 2 = incubated with 8M urea; lane 3 = incubated with 6M guanidinium chloride. Protein bands believed to represent TgMIC1/4 and TgMIC1 are labelled.

The data therefore suggest that multimers of TgMIC1 and TgMIC1/TgMIC4 associate non-covalently, giving rise to a ~660 kDa adhesive complex. The stability of the sub-complexes in the presence of high denaturant concentrations suggests that covalent linkages are involved. In terms of the ~220 kDa species, the existence of a TgMIC1 trimer is the most plausible explanation, since the transmembrane escorter TgMIC6 contains three EGF domains which could provide a scaffold for assembly and covalent linkage of three TgMIC1 molecules. The interaction of the TgMIC1 C-terminal domain with both EGF3 and EGF2 domains has been previously demonstrated (Saouros et al. 2005; Sawmynaden et al. 2008). The EGF1 domain is liberated during trafficking to the micronemes; however, it is intact within the ER lumen (where the complex is assembled), and contains many of the residues which appear to mediate TgMIC1 association in EGF3/EGF2.

Covalent linkage of TgMIC4 molecules to TgMIC1, giving rise to a hetero-hexamer, is also feasible, since the TgMIC4-A3 domain contains a ‘spare’ cysteine which could mediate inter-molecular cross-linking. Interestingly, this cysteine is absent in NcMIC4, which does not co-purify NcMIC1 from parasite lysates (Keller et al. 2004). Previous studies have demonstrated that TgMIC4-A12 is sufficient for ER exit alongside TgMIC1/6 (Reiss et al. 2001), presumably via the association with the TgMIC1 β-finger demonstrated in our laboratory. It is possible that this interaction is stabilised by formation of a covalent linkage from TgMIC4-A3 following the initial association of the two proteins.

7.4.4. The role of TgMIC4-A3

In order to probe the role of the ‘spare’ cysteine of TgMIC4-A3, TgMIC4 residues 225-304 were cloned, expressed and purified (Ben Cowper), using the methods described in this thesis for recombinant production of TgMIC4-A5/6 proteins. Analysis via NMR spectroscopy, including the acquisition of 1H-15N HSQC and triple resonance spectra, revealed a significantly polydispersed sample, with TgMIC4-A3 present in three observable isoforms (figure 7.6B & C). Peptide fingerprinting via MALDI-TOF mass spectrometry under non-reducing conditions (carried out by Dr Jeff Keen, University of Leeds) demonstrated these species to derive from varying disulphide-linkage patterns in the region containing the ‘spare’ cysteine (figure 7.6D). Interestingly, mutation of the spare cysteine (via site-directed mutagenesis; see Appendix A4) failed to fully eradicate sample polydispersity, indicating that the domain bears a natural propensity for disulphide shuffling which is not apparent in the other TgMIC4 apple domains studied. This may facilitate covalent linkage to TgMIC1-NT, which itself appears to be capable of disulphide shuffling when produced recombinantly.

The activity of TgMIC4-A3 was probed via NMR titration with the TgMIC1 β-finger peptide. This failed to detect an interaction, suggesting that any association of TgMIC4-A3 with TgMIC1-NT occurs outside of this region. This is consistent with a model where TgMIC4 is recruited via an A2/β-finger interaction, followed by stabilisation of the complex via a covalent linkage involving A3.

Figure 7.6: NMR and MALDI-TOF spectra demonstrating the polydispersity of recombinantly-produced TgMIC4-A3. A) The amino acid sequence of TgMIC4-A3. Cysteines are highlighted (‘spare’ cysteine in black). B) Superimposed sections from the 1H-15N HSQC spectra of TgMIC4-A3 (black) and TgMIC4-A3mut (red), which respectively comprise three and two observable isoforms. C) Selected data strips from the HNCACB spectrum of TgMIC4-A3, comprising two sets of near-identical correlated chemical shift patterns, corresponding to E27 and A28 in their distinct states. D) MALDI-TOF mass spectra of TgMIC4-A3 (upper) and TgMIC4-A3mut (lower). Major peaks are annotated; observed and excised sequences are coloured in blue and grey respectively. Data were collected under non-reducing conditions, in order to preserve disulphide linkages. Multiple peaks are detected due to the presence of numerous disulphide bond arrangements.

7.4.5. Implications for the model of the TgMIC1/4/6 complex

The studies described in this thesis, in conjunction with collaborative work, prompt an update of the model of the TgMIC1/4/6 complex initially depicted in figure 1.13 (figure 7.7A). The solution structure of TgMIC4-A12 has been solved and shown to interact with the β-finger of MAR2, although precise details of this interaction remain unknown. The calculated structure of the TgMIC4-A5/lacto-N-biose complex has also been added.

The model remains theoretical; still based upon in vivo and in vitro studies using truncated proteins and sub-cloned domains. However, the first insights into the higher-order structure of the complex have been provided by the described studies of the natively purified complex. The findings are broadly consistent with the model, suggesting that entities comprising TgMIC1 and TgMIC4 associate giving rise to a large polyvalent heteromer (figure 7.7Bi). It is possible that these entities interact further to form a large adhesive matrix on the parasite surface (figure 7.7Bii). Polyvalency is known to be a common feature of host-cell attachment by pathogens, particularly those involving carbohydrate receptors, where commonly observed weak binding affinities are enhanced by the cumulative activity of multiple adhesins (Mammen et al. 1998; Lee & Lee 2001).

Figure 7.7: the updated model of the TgMIC1/4/6 complex. A) The model from figure 1.13 with TgMIC4 detail added. Solution structures of TgMIC4-A12 and TgMIC4-A5/lacto-N-biose are included. Processing sites are indicated by arrows. The β-finger region of TgMIC1-NT, which interacts with TgMIC4-A2 (cyan), is circled. Diagram adapted from Sawmynaden et al. 2008. B) Diagrams were created by Professor Stephen Matthews. i) A schematic diagram of a possible arrangement of TgMIC1 and TgMIC4 molecules giving rise to a ~660 kDa species. ii) A possible arrangement of the TgMIC1/TgMIC4 complex on the parasite surface, giving rise to a multi-valent matrix.

7.5. Future perspectives

The results described in this thesis have advanced our overall understanding of the mode of host cell recognition by the T. gondii MIC1/4/6 complex, revealing that TgMIC4 binds galactosylated receptors and providing detailed insights into the mode of galactose recognition. This, along with previous studies of sialylated receptor binding by TgMIC1, may encourage the development of therapeutics aimed at inhibition of T. gondii host cell attachment and invasion. This would be facilitated by structural characterisation of TgMIC4-A5 binding to more complex oligosaccharides. Ultimately this may require generation of new TgMIC4-A5 constructs which are amenable to crystallisation and X-ray diffraction, towards high-throughput structure determination of TgMIC4-A5 in complex with an array of ligands via crystal soaking.

It is also desirable to elucidate fully the in vivo role of TgMIC4. Several theories have been postulated here, and these could be tested using relatively straightforward assays. For example, the capacity for galactose-recognition events to contribute to host cell invasion by T. gondii could be assessed via cell invasion assays in the presence of β-galactosidase. Analogous assays have been previously carried out using neuraminidase, demonstrating that the binding of sialylated receptors (presumably by TgMIC1) is required for efficient host cell invasion (Blumenschein et al. 2007). The assay could be expanded to incorporate a variety of host cell lines with different oligosaccharide expression profiles, in order to comprehensively characterise the respective roles of TgMIC1 and TgMIC4 and the synergistic relationship, if one exists, between them.

The proteolytic liberation of the host-adhesive portion of TgMIC4 also prompts the need to assess its contribution to events downstream of host cell invasion. For example, by complementing a mic4KO parasite strain with C-terminally epitope-tagged TgMIC4 it should be possible to monitor its sub-/extra-cellular location in an invaded parasite. This would provide more conclusive evidence for a post-invasive role of the galactose-specific lectin in promoting parasite survival.

Several aspects regarding the structure and assembly of TgMIC1/4/6 remain poorly understood. The overall structure of TgMIC4 is unknown and whilst forthcoming SAXS data may prove informative, X-ray crystallography would be required to obtain a high resolution structure. The mode of incorporation of TgMIC4 into the TgMIC1/6 sub-complex is also yet to be fully characterised. This could be addressed by solving the structure of a TgMIC4-A12/TgMIC1 β-finger complex via NMR spectroscopy or X-ray crystallography. The spare cysteine of TgMIC4-A3 appears to have interesting activity, and may also be involved in covalent association with TgMIC1. This activity is currently poorly understood, and could be investigated further via co-expression or re-folding experiments with TgMIC1-NT.

The precise stoichiometry of the complex remains unclear; further attempts to characterise this would ideally be made via X-ray crystallography or electron microscopy. However, to date purification of the native complex has yielded only very small quantities of protein, insufficient for structural studies.

Appendices

A1. Polymerase chain reaction (PCR).

A1.1. PCR primers

Construct          Primers (5'→3')*                                                    Tm∏ (°C)
TgMIC4-A5          F: GGTATTGAGGGTCGCTCTCCTGATTTTCACCACGAAG                             68
                   R: AGAGGAGAGTTAGAGCCTCAACTTGTATCACATGTTC                             64
TgMIC4-A5 (TEV)    F: GGTATTGAGGGTCGCGAGAACCTGTACTTCCAGGGCTCTCCTGATTTTCACCACGAAG┼       75
                   R: AGAGGAGAGTTAGAGCCTCAACTTGTATCACATGTTC                             64
TgMIC4-A6          F: GGTATTGAGGGTCGCGATACAAGTTGCCTTAGGAGGG                             69
                   R: AGAGGAGAGTTAGAGCCTCATTCTGTGTCTTTCGCTTCAAGCACCTG                   71

*LIC overhangs are highlighted in blue. ∏Tm denotes primer melting (i.e. annealing) temperature. ┼Additional overhang (encoding TEV target site) highlighted in grey.
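
As a quick sanity check on the primer table, the additional overhang in the TgMIC4-A5 (TEV) forward primer can be recovered and translated. The sketch below is an illustration rather than part of the original cloning workflow: it takes the two forward primer sequences from the table, extracts the bases present only in the TEV version, and translates them with the standard genetic code. The helper names (translate, extra_overhang) are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def extra_overhang(short, long):
    """Return the bases inserted into `long` after its shared 5' prefix with `short`."""
    n = 0
    while n < len(short) and short[n] == long[n]:
        n += 1
    extra = len(long) - len(short)
    insert = long[n:n + extra]
    # Removing the insert must give back the shorter primer exactly.
    assert long[:n] + long[n + extra:] == short
    return insert


# Forward primers copied from the table above.
A5_FWD = "GGTATTGAGGGTCGCTCTCCTGATTTTCACCACGAAG"
A5_TEV_FWD = "GGTATTGAGGGTCGCGAGAACCTGTACTTCCAGGGCTCTCCTGATTTTCACCACGAAG"

tev_insert = extra_overhang(A5_FWD, A5_TEV_FWD)
print(tev_insert)             # GAGAACCTGTACTTCCAGGGC
print(translate(tev_insert))  # ENLYFQG
```

The translated insert, ENLYFQG, matches the canonical TEV protease recognition sequence, consistent with the footnote describing this overhang as encoding the TEV target site.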

A1.2. PCR reaction mixtures.

Reactions using KOD DNA polymerase:

Component                            Volume
KOD DNA polymerase buffer (10X)      5 μl
dNTPs (2 mM each)                    3 μl
Template DNA                         1 μl*
Forward primer (5 pmol/μl)           4 μl
Reverse primer (5 pmol/μl)           4 μl
MgCl2 (25 mM)                        5 μl
Sterile water                        27.6 μl
KOD DNA polymerase (2.5 U/μl)        0.4 μl

Reactions using Taq DNA polymerase:

Component                            Volume
Taq DNA polymerase buffer (10X)      5 μl
dNTPs (10 mM each)                   1 μl
Template DNA                         10 μl┼
Forward primer (5 pmol/μl)           1 μl
Reverse primer (5 pmol/μl)           1 μl
MgCl2 (25 mM)                        5 μl
Sterile water                        26.5 μl
Taq DNA polymerase (2.5 U/μl)        0.25 μl

*of template DNA of approximate concentration 10 ng/μl.

┼Cell supernatant used as DNA template.

A1.3. PCR protocols

Protocol for KOD DNA polymerase:

Step    Temperature (°C)    Duration (mm:ss)
1       98                  2:00
2       60-70*              0:05
3       72                  0:30
4       98                  0:15
5       60-70*              0:05
6       72                  0:30
        Repeat steps 4-6 30x
7       98                  0:15
8       60-70*              0:05
9       72                  1:00
10      15                  Hold

Protocol for Taq DNA polymerase:

Step    Temperature (°C)    Duration (mm:ss)
1       94                  2:00
2       60-70*              1:00
3       72                  2:00
4       94                  1:00
5       60-70*              1:00
6       72                  2:00
        Repeat steps 4-6 30x
7       94                  1:00
8       60-70*              1:00
9       72                  4:00
10      15                  Hold

*Annealing temperature depends on primer melting temperature (Tm)

A2. Media & buffer compositions.

A2.1. Lysogeny broth (LB) media (1 L):
• 10 g tryptone
• 5 g yeast extract
• 5 g NaCl
• Add 15 g agar for LB-agar

A2.2. M9 Minimal media (1 L):
• 6 g Na2HPO4
• 3 g KH2PO4
• 0.5 g NaCl
Adjust pH to 7.0-7.4 and autoclave.
• 1 ml MgSO4 (1 M)
• 10 μl CaCl2 (1 M)
• 1 ml FeSO4 (0.01 M)
• 1 ml 1000X vitamins:
    ◦ 0.4 g/L choline chloride
    ◦ 0.5 g folic acid
    ◦ 0.5 g pantothenic acid
    ◦ 0.5 g nicotinamide
    ◦ 1 g myo-inositol
    ◦ 0.5 g pyridoxal-HCl
    ◦ 0.5 g thiamine-HCl
    ◦ 0.05 g riboflavin
    ◦ 1 g biotin
• 1 ml 1000X micronutrients:
    ◦ 0.4 mM H3BO3
    ◦ 30 μM CoCl2
    ◦ 10 μM CuSO4
    ◦ 80 μM MnCl2
    ◦ 3 μM ammonium molybdate
• 2 g D-glucose (13C if desired)
• 0.7 g NH4Cl (15N if desired)

A2.3. Ni2+-affinity chromatography buffers.

Component            Lysis     Wash      Elution
Tris-HCl (pH 7.9)    50 mM (all buffers)
NaCl                 500 mM (all buffers)
Imidazole            5 mM      20 mM     200 mM

A2.4. Factor Xa reaction buffer:
• 50 mM Tris-HCl (pH 8)
• 100 mM NaCl
• 5 mM CaCl2

A2.5. Thrombin reaction buffer:
• 20 mM Tris-HCl (pH 8.4)
• 150 mM NaCl
• 2.5 mM CaCl2

A2.6. TEV reaction buffer:
• 50 mM Tris-HCl (pH 8.0)
• 100 mM NaCl
• 3 mM glutathione
• 0.3 mM oxidised glutathione

A2.7. NMR buffer:
• 20 mM KH2PO4/K2HPO4 (pH 6.5)
• 100 mM NaCl

A2.8. Trypsin reaction buffer:
• 50 mM Tris-HCl (pH 8)
• 100 mM NaCl
• 20 mM CaCl2

A3. Gel electrophoresis.

A3.1. Agarose gels (DNA):

Gel recipe (makes 1 gel):
• 1 g agarose
• 100 ml 1X TAE buffer
• 1 μl SYBR® Safe DNA stain

50X TAE buffer (1 L):
• 242 g Tris
• 100 ml 0.5 M EDTA pH 8.0
• 57.2 ml acetic acid
Make up to 1 L with water.

Samples were mixed with 6X loading dye (Promega) and loaded into gels alongside 1 kb DNA ladder (Promega). Gels were run in 1X TAE buffer for 60-80 minutes at 100 V, before visualisation using the SYBR® Safe DNA stain system (Invitrogen).

A3.2. SDS-PAGE (Protein):

Stacking gel (4%) recipe (makes 4 gels):
• 2 ml 1.25 M bisTris-HCl (pH 6.65)
• 0.79 ml 37.5:1% acrylamide/bis-acrylamide
• 4.3 ml sterile water
• 48 μl 10% (w/v) ammonium persulphate
• 24 μl TEMED

Resolving gel (12%) recipe (makes 4 gels):
• 7.6 ml 1.25 M bisTris-HCl (pH 6.65)
• 8.1 ml 37.5:1% acrylamide/bis-acrylamide
• 11.3 ml sterile water
• 100 μl ammonium persulphate
• 28 μl TEMED

5X MOPS gel running buffer:
• 250 mM MOPS
• 250 mM Tris
• 5 mM EDTA
• 0.5% (w/v) SDS

2X Laemmli reducing sample buffer:
• 120 mM Tris-HCl (pH 6.8)
• 20% (v/v) glycerol
• 4% SDS
• 0.02% bromophenol blue
• 50 mM DTT
(NOTE: For non-reduced samples, DTT was omitted.)

Samples were mixed with 2X Laemmli buffer, incubated at 100 °C for 2 minutes and loaded into gels alongside Mark12™ protein markers (Invitrogen).

Gels were run in 1X MOPS buffer for 50 minutes at 180 V, before staining with Instant Blue (Generon).

A4. Site-directed mutagenesis

A4.1. PCR primers

Construct          Mutation    Primers (5'→3')*
TgMIC4-A5(TEV)     K19A        F: GTCCACACGGGCAACATTGGGTCAGCAGCACAAACCATTG
                               R: CAATGGTTTGTGCTGCTGACCCAATGTTGCCCGTGTGGAC
                   K60M        F: CTTACAATGTAAAATCCGGTTTGTGTTATCCAATGAGAGGAAAGCCTCAATTT
                               R: AAATTGAGGCTTTCCTCTCATTGGATAACACAAACCGGATTTTACATTGTAAG
                   Y67A        F: TCCAAAAAGAGGAAAGCCTCAATTTGCAAAGTATCTTGGCGACATGACGG
                               R: CCGTCATGTCGCCAAGATACTTTGCAAATTGAGGCTTTCCTCTTTTTGGA
                   Y69L        F: AGAGGAAAGCCTCAATTTTATAAGCTTCTTGGCGACATGACGGG
                               R: CCCGTCATGTCGCCAAGAAGCTTATAAAATTGAGGCTTTCCTCT
TgMIC4-A3(TEV)     C39A        F: GGAACGGTGCCGCGCTGATGGAAGATGC
                               R: GCATCTTCCATCAGCGCGGCACCGTTCC

*mutagenic codons are underlined.
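
Each forward/reverse pair in the mutagenesis table appears to be fully complementary, i.e. the reverse primer is the reverse complement of the forward primer. A minimal Python sketch of that check is given below for three of the pairs, with sequences copied from the table; the function and variable names are illustrative only.

```python
# Complement lookup for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (forward, reverse) primer pairs copied from the table above.
MUTAGENESIS_PRIMERS = {
    "K19A": ("GTCCACACGGGCAACATTGGGTCAGCAGCACAAACCATTG",
             "CAATGGTTTGTGCTGCTGACCCAATGTTGCCCGTGTGGAC"),
    "Y69L": ("AGAGGAAAGCCTCAATTTTATAAGCTTCTTGGCGACATGACGGG",
             "CCCGTCATGTCGCCAAGAAGCTTATAAAATTGAGGCTTTCCTCT"),
    "C39A": ("GGAACGGTGCCGCGCTGATGGAAGATGC",
             "GCATCTTCCATCAGCGCGGCACCGTTCC"),
}

for mutation, (forward, reverse) in MUTAGENESIS_PRIMERS.items():
    is_pair = reverse_complement(forward) == reverse
    print(mutation, "complementary pair" if is_pair else "MISMATCH")
```

The same check can be extended to the remaining pairs, and is a convenient way to catch transcription errors when primer sequences are re-typed for ordering.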

A4.2. PCR reaction mixture.

Component                                  Volume
Pfu Ultra DNA polymerase buffer (10X)      5 μl
dNTPs (2 mM each)                          1 μl
Template DNA                               1 μl
Forward primer (5 pmol/μl)                 1 μl
Reverse primer (5 pmol/μl)                 1 μl
Sterile water                              40 μl
Pfu Ultra DNA polymerase (2.5 U/μl)        1 μl

A4.3. PCR reaction protocol.

Step    Temp (°C)    Duration (mm:ss)
1       98           1:00
2       98           1:00
3       55           1:00
4       68           6:00
        Repeat steps 2-4 16x
5       98           1:00
6       55           1:00
7       68           6:00
8       15           Hold

A5. Peptide mass fingerprinting of digested TgMIC4-A56.

Trypsin-digested TgMIC4-A56 was analysed via SDS-PAGE under reducing conditions. The gel band corresponding to the stable product of digestion was then excised and submitted for peptide mass fingerprinting. Briefly, this involves extraction of the protein from the gel, before denaturation and digestion with trypsin. Fragmented peptides are then detected via MALDI-TOF mass spectrometry. This procedure was carried out by Dr Paul Hitchen at the Imperial College London, Biopolymer Mass Spectrometry laboratory.

Figure A5: MALDI-TOF mass spectrometry of digested TgMIC4-A56. The presence of several peptides, corresponding to core regions of A5 and the C-terminus of apple6 sequence, has been detected via mass spectrometry (spectra were provided by Dr Paul Hitchen, Imperial College London).

A6. TgMIC4-A5 chemical shift assignments

H5.C 174.436 I16.HB 1.61 Q21.HG2 2.48 V26.CG2 22.7 L32.HA 3.06 H5.CA 56.372 I16.HG11 1.19 Q21.C 175.451 V26.N 126.700 L32.HB1 1.69 H5.CB 30.120 I16.HG12 1.09 Q21.CA 57.057 K27.HN 8.052 L32.HB2 1.57 D6.HN 8.157 I16.HG21 0.63 Q21.CB 29.747 K27.HA 4.66 L32.HG 1.61 D6.C 176.052 I16.HD11 0.53 Q21.CG 33.9 K27.HB1 1.93 L32.HD11 1.05 D6.CA 54.589 I16.C 175.078 Q21.N 120.576 K27.HB2 1.65 L32.HD21 0.93 D6.CB 41.565 I16.CA 58.549 T22.HN 8.419 K27.HG1 1.37 L32.C 178.625 D6.N 121.137 I16.CB 39.447 T22.HA 4.77 K27.HG2 1.35 L32.CA 57.845 E7.HN 8.310 I16.CG1 26.3 T22.HB 4.1 K27.HD1 1.68 L32.CB 42.030 E7.CA 56.684 I16.CG2 18.2 T22.HG21 1.36 K27.HD2 1.68 L32.CG 26.7 E7.CB 30.642 I16.CD1 10.9 T22.C 174.720 K27.HE1 2.9 L32.CD1 25.7 E7.N 120.657 I16.N 119.365 T22.CA 62.280 K27.HE2 2.9 L32.CD2 26.1 V8.HN 8.187 G17.HN 8.814 T22.CB 69.891 K27.C 176.108 L32.N 122.752 V8.C 176.033 G17.HA1 4.25 T22.CG2 23.5 K27.CA 53.998 S33.HN 8.221 V8.CA 62.418 G17.HA2 3.66 T22.N 122.967 K27.CB 36.463 S33.HA 3.87 V8.CB 32.728 G17.C 172.750 I23.HN 9.007 K27.CG 24.4 S33.HB1 3.79 V8.N 120.844 G17.CA 44.223 I23.HA 4.124 K27.CD 29.1 S33.HB2 3.79 E9.HN 8.479 G17.N 116.651 I23.HB 1.51 K27.CE 42.1 S33.C 177.003 E9.C 175.884 S18.HN 7.266 I23.HG11 0.76 K27.N 126.282 S33.CA 61.721 E9.CA 57.057 S18.HA 5.56 I23.HG12 1.48 R28.HN 8.685 S33.CB 62.5 E9.CB 29.971 S18.HB1 3.66 I23.HG21 0.76 R28.HA 4.61 S33.N 113.328 E9.N 123.333 S18.HB2 3.66 I23.HD11 0.60 R28.HB1 1.75 E34.HN 7.815 V11.C 174.795 S18.C 175.585 I23.C 174.630 R28.HB2 1.75 E34.HA 4.04 V11.CA 61.534 S18.CA 54.371 I23.CA 60.361 R28.HG1 1.61 E34.HB1 1.98 V11.CB 34.000 S18.CB 65.340 I23.CB 40.567 R28.HG2 1.61 E34.HB2 1.95 H12.HN 8.949 S8.N 107.852 I23.CG1 27.0 R28.HD1 3.22 E34.HG1 2.4 H12.C 175.003 K19.HN 9.292 I23.CG2 17.4 R28.HD2 3.22 E34.HG2 2.13 H12.CA 55.416 K19.HA 4.38 I23.CD1 14.9 R28.C 177.048 E34.C 178.197 H12.CB 30.195 K19.HB1 1.85 I23.N 128.189 R28.CA 57.057 E34.CA 58.98 H12.N 126.122 K19.HB2 1.52 G24.HN 8.168 R28.CB 31.090 E34.CB 29.782 T13.HN 8.288 
K19.HG1 1.07 G24.HA1 3.97 R28.CG 27.1 E34.CG 36.6 T13.HA 4.43 K19.HG2 0.95 G24.HA2 3.75 R28.CD 43.2 E34.N 123.940 T13.HB 4.31 K19.HD1 1.27 G24.C 172.944 R28.N 122.239 C35.HN 7.423 T13.HG21 1.21 K19.HD2 1.27 G24.CA 44.596 A29.HN 9.559 C35.HA 4.47 T13.C 172.467 K19.HE1 2.38 G24.N 111.782 A29.HA 4.89 C35.HB1 2.66 T13.CA 61.683 K19.HE2 2.29 E25.HN 8.186 A29.HB1 1.40 C35.HB2 2.66 T13.CB 69.891 K19.C 175.675 E25.HA 4.28 A29.C 177.122 C35.C 174.719 T13.CG2 21.6 K19.CA 55.789 E25.HB1 1.88 A29.CA 50.715 C35.CA 56.139 T13.N 114.233 K19.CB 32.284 E25.HB2 1.87 A29.CB 22.435 C35.CB 35.519 G14.HN 8.706 K19.CG 25.5 E25.HG1 2.14 A29.N 126.043 C35.N 118.374 G14.HA1 4.29 K19.CD 28.9 E25.HG2 2.28 S30.HA 4.37 R36.HN 8.191 G14.HA2 3.67 K19.CE 41.2 E25.C 176.719 S30.HB1 3.98 R36.HA 2.45 G14.C 174.108 K19.N 127.211 E25.CA 55.639 S30.HB2 3.98 R36.HB1 1.76 G14.CA 46.984 A20.HN 7.720 E25.CB 31.016 S30.C 173.184 R36.HB2 1.43 G14.N 111.187 A20.HA 4.2 E25.CG 36.4 S30.CA 60.170 R36.HG1 1.26 N15.HN 8.385 A20.HB1 1.69 E25.N 119.285 S30.CB 64.046 R36.HG2 1.26 N15.HA 4.92 A20.C 175.675 V26.HN 8.584 S31.HN 7.154 R36.HD1 3.22 N15.HB1 3.25 A20.CA 52.804 V26.HA 3.78 S31.HA 4.5 R36.HD2 3.22 N15.HB2 2.68 A20.CB 20.345 V26.HB 1.89 S31.HB1 4.00 R36.C 178.435 N15.C 174.332 A20.N 121.840 V26.HG11 0.42 S31.HB2 4.10 R36.CA 59.370 N15.CA 55.042 Q21.HN 8.422 V26.HG21 0.94 S31.C 173.153 R36.CB 28.852 N15.CB 36.94 Q21.HA 4.03 V26.C 175.033 S31.CA 56.527 R36.CG 26.9 N15.N 117.800 Q21.HB1 1.96 V26.CA 63.444 S31.CB 65.984 R36.CD 43.4 I16.HN 7.206 Q21.HB2 1.96 V26.CB 31.592 S31.N 110.737 R36.N 120.865 I16.HA 4.70 Q21.HG1 2.41 V26.CG1 21.0 L32.HN 8.731 A37.HN 7.75


A37.HA 4.05 K43.HA 4.04 Y47.NE2 109.4 V52.CG1 20.2 Y58.HB1 3.46 A37.HB1 1.46 K43.HB1 1.88 Y48.HN 8.680 V52.CG2 21.5 Y58.HB2 2.78 A37.C 180.405 K43.HB2 1.96 Y48.HA 5.52 V52.N 120.584 Y58.HD1 7.23 A37.CA 55.192 K43.HG1 1.52 Y48.HB1 2.72 K53.HN 7.844 Y58.HD2 7.23 A37.CB 17.734 K43.HG2 1.54 Y48.HB2 2.79 K53.HA 4.37 Y58.HE1 6.69 A37.N 121.230 K43.HD1 1.74 Y48.HD1 6.77 K53.HB1 1.95 Y58.HE2 6.69 R38.HN 7.771 K43.HD2 1.74 Y48.HD2 6.77 K53.HB2 2.06 Y58.C 172.467 R38.HA 4.05 K43.HE1 3.05 Y48.HE1 6.58 K53.HG1 1.42 Y58.CA 53.9 R38.HB1 2.14 K43.HE2 3.05 Y48.HE2 6.58 K53.HG2 1.47 Y58.CB 39.224 R38.HB2 1.96 K43.C 178.003 Y48.C 172.482 K53.HD1 1.68 Y58.CD1 64.4 R38.HG1 1.72 K43.CA 59.743 Y48.CA 54.520 K53.HD2 1.68 Y58.CD2 64.4 R38.HG2 1.96 K43.CB 32.657 Y48.CB 40.865 K53.HE1 3.0 Y58.CE1 48.6 R38.HD1 3.14 K43.CG 24.6 Y48.CD1 63.7 K53.HE2 3.0 Y58.CE2 48.6 R38.HD2 3.37 K43.CD 29.0 Y48.CD2 63.7 K53.C 177.883 Y58.N 128.589 R38.C 179.092 K43.CE 41.8 Y48.CE1 47.3 K53.CA 57.654 P59.HA 4.64 R38.CA 59.519 K43.N 128.833 Y48.CE2 47.3 K53.CB 32.583 P59.HB1 2.37 R38.CB 29.747 E44.HN 8.917 Y48.N 114.233 K53.CG 24.7 P59.HB2 2.15 R38.CG 26.9 E44.HA 4.36 T49.HN 8.466 K53.CD 28.5 P59.HG1 2.44 R38.CD 43.9 E44.HB1 2.31 T49.HA 5.54 K53.CE 41.7 P59.HG2 1.66 R38.N 120.562 E44.HB2 1.87 T49.HB 3.66 K53.N 119.982 P59.HD1 3.82 C39.HN 7.712 E44.HG1 2.26 T49.HG21 1.25 S54.HN 8.843 P59.HD2 4.05 C39.HA 4.39 E44.HG2 2.26 T49.C 172.497 S54.HA 4.42 P59.C 175.973 C39.HB1 2.79 E44.C 176.078 T49.CA 61.833 S54.HB1 3.98 P59.CA 63.5 C39.HB2 3.22 E44.CA 56.162 T49.CB 72.2 S54.HB2 3.87 P59.CB 32.284 C39.C 175.884 E44.CB 30.046 T49.CG2 23.2 S54.C 176.108 P59.CG 27.7 C39.CA 58.326 E44.CG 37.2 T49.N 117.868 S54.CA 58.326 P59.CD 51.4 C39.CB 41.462 E44.N 113.761 Y50.HN 9.865 S54.CB 64.444 K60.HN 9.077 C39.N 115.961 C45.HN 7.642 Y50.HA 5.79 S54.N 114.167 K60.HA 5.20 Q40.HN 8.398 C45.HA 4.71 Y50.HB1 3.13 G55.HN 8.207 K60.HB1 0.03 Q40.HA 3.813 C45.HB1 4.01 Y50.HB2 2.91 G55.HA1 4.19 K60.HB2 1.03 Q40.HB1 2.10 C45.HB2 3.23 Y50.HD1 
6.91 G55.HA2 3.74 K60.HG1 1.05 Q40.HB2 2.10 C45.C 174.228 Y50.HD2 6.91 G55.C 173.108 K60.HG2 1.18 Q40.HG1 2.31 C45.CA 56.386 Y50.HE1 6.79 G55.CA 46.014 K60.HD1 1.1 Q40.HG2 2.51 C45.CB 44.596 Y50.HE2 6.79 G55.N 112.983 K60.HD2 1.23 Q40.C 176.958 C45.N 118.165 Y50.C 173.123 L56.HN 7.645 K60.HE1 2.81 Q40.CA 58.251 S46.HN 9.717 Y50.CA 55.192 L56.HA 4.03 K60.HE2 3.35 Q40.CB 27.73 S46.HA 4.86 Y50.CB 43.626 L56.HB1 1.68 K60.C 174.407 Q40.CG 34.3 S46.HB1 4.33 Y50.CD1 62.1 L56.HB2 0.68 K60.CA 55.042 Q40.N 116.460 S46.HB2 4.13 Y50.CD2 62.1 L56.HG 1.46 K60.CB 35.940 A41.HN 7.333 S46.C 176.152 Y50.CE1 48.2 L56.HD11 0.86 K60.CG 24.7 A41.HA 4.24 S46.CA 58.027 Y50.CE2 48.2 L56.HD21 0.58 K60.CD 29.7 A41.HB1 1.52 S46.CB 65.713 Y50.N 129.736 L56.C 174.168 K60.CE 43.5 A41.C 177.033 S46.N 125.842 N51.HN 8.111 L56.CA 56.012 K60.N 124.051 A41.CA 53.102 H47.HN 8.725 N51.HA 5.62 L56.CB 42.357 R61.HN 8.039 A41.CB 19.002 H47.HA 5.06 N51.HB1 3.05 L56.CG 26.4 R61.HA 5.28 A41.N 118.732 H47.HB1 3.55 N51.HB2 2.83 L56.CD1 24.9 R61.HB1 1.92 E42.HN 7.736 H47.HB2 3.25 N51.C 175.272 L56.CD2 22.4 R61.HB2 1.64 E42.HA 4.66 H47.HD1 11.65 N51.CA 52.505 L56.N 118.971 R61.HG1 1.52 E42.HB1 2.11 H47.HD2 6.34 N51.CB 41.089 C57.HN 7.222 R61.HG2 1.56 E42.HB2 2.11 H47.HE1 6.74 N51.N 126.737 C57.HA 5.638 R61.HD1 3.16 E42.HG1 2.40 H47.HE2 11.65 V52.HN 8.489 C57.HB1 3.06 R61.HD2 3.19 E42.HG2 2.07 H47.C 182.539 V52.HA 3.69 C57.HB2 2.90 R61.C 176.600 E42.C 176.496 H47.CA 55.789 V52.HB 2.33 C57.C 172.497 R61.CA 53.102 E42.CA 54.296 H47.CB 28.56 V52.HG11 1.16 C57.CA 53.923 R61.CB 34.448 E42.CB 30.792 H47.CD2 57.3 V52.HG21 1.19 C57.CB 40.194 R61.CG 26.2 E42.CG 35.0 H47.CE1 69.6 V52.C 176.943 C57.N 117.419 R61.CD 43.5 E42.N 120.472 H47.N 121.863 V52.CA 63.698 Y58.HN 10.107 R61.N 116.032 K43.HN 8.890 Y47.ND1 109.4 V52.CB 31.836 Y58.HA 5.01 G62.HN 8.831


G62.HA1 4.14 F66.CE2 60.8 L70.C 177.227 R77.HB2 1.92 G62.HA2 3.87 F66.N 123.218 L70.CA 57.057 R77.HG1 1.71 G62.C 174.660 Y67.HN 8.622 L70.CB 41.238 R77.HG2 1.71 G62.CA 44.521 Y67.HA 4.87 L70.CG 26.8 R77.HD1 3.24 G62.N 106.234 Y67.HB1 3.13 L70.CD1 23.6 R77.HD2 3.67 K63.HN 8.341 Y67.HB2 3.07 L70.CD2 24.1 R77.C 175.735 KG63.HA 4.37 Y67.HD1 7.06 L70.N 128.629 R77.CA 60.490 K63.HB1 1.65 Y67.HD2 7.06 G71.HN 9.272 R77.CB 30.642 K63.HB2 1.65 Y67.HE1 6.53 G71.HA1 4.18 R77.CG 27.3 K63.HG1 1.46 Y67.HE2 6.53 G71.HA2 3.81 R77.CD 42.9 K63.HG2 1.46 Y67.C 172.228 G71.C 173.646 R77.N 123.400 K63.HD1 1.68 Y67.CA 56.012 G71.CA 45.268 T78.HN 7.863 K63.HD2 1.68 Y67.CB 40.417 G71.N 114.781 T78.HA 4.51 K63.HE1 2.98 Y67.CD1 63.6 D72.HN 7.680 T78.HG21 1.22 K63.HE2 2.98 Y67.CD2 63.6 D72.HA 5.04 T78.C 173.690 K63.CA 54.371 Y67.CE1 47.9 D72.HB1 2.53 T78.CA 59.669 K63.CB 32.657 Y67.CE2 47.9 D72.HB2 2.31 T78.CB 70.264 K63.CG 24.5 Y67.N 119.988 D72.C 176.824 T78.N 106.934 K63.CD 28.9 K68.HN 8.431 D72.CA 55.639 D80.C 176.182 K63.CE 41.8 K68.HA 4.85 D72.CB 41.089 D80.CA 54.669 K63.N 121.546 K68.HB1 1.76 D72.N 122.560 D80.CB 41.313 P64.HA 3.58 K68.HB2 1.76 M73.HN 8.826 T81.HN 8.090 P64.HB1 0.36 K68.HG1 1.31 M73.HA 5.29 T81.C 173.899 P64.HB2 0.56 K68.HG2 1.57 M73.HB1 2.0 T81.CA 61.683 P64.HG1 1.54 K68.HD1 1.71 M73.HB2 2.30 T81.CB 69.667 P64.HG2 0.42 K68.HD2 1.71 M73.HG1 2.85 T81.N 113.693 P64.HD1 3.26 K68.HE1 3.01 M73.HG2 2.85 S82.HN 7.986 P64.HD2 3.49 K68.HE2 3.21 M73.C 175.720 S82.C 178.749 P64.C 175.929 K68.C 175.824 M73.CA 55.117 S82.CA 60.191 P64.CA 62.952 K68.CA 55.714 M73.CB 36.313 S82.CB 64.892 P64.CB 31.463 K68.CB 32.881 M73.CG 32.5 S82.N 123.528 P64.CG 26.37 K68.CG 24.6 M73.N 118.999 P64.CD 50.1 K68.CD 28.9 T74.HN 8.411 P65.HN 6.998 K68.CE 41.7 T74.HA 5.83 Q65.HA 4.51 K68.N 121.648 T74.HB 3.73 Q65.HB1 1.98 Y69.HN 7.808 T74.HG21 1.29 Q65.HB2 1.55 Y69.HA 4.10 T74.C 173.690 Q65.HG1 2.15 Y69.HB1 2.70 T74.CA 61.460 Q65.HG2 2.19 Y69.HB2 2.39 T74.CB 71.831 Q65.C 175.600 Y69.HD1 6.97 T74.CG2 21.5 
Q65.CA 53.475 Y69.HD2 6.97 T74.N 118.609 Q65.CB 31.463 Y69.HE1 6.79 G75.HN 8.230 Q65.CG 32.3 Y69.HE2 6.79 G75.HA1 5.07 Q65.N 122.330 Y69.C 174.839 G75.HA2 3.31 F66.HN 8.332 Y69.CA 59.594 G75.C 171.766 F66.HA 5.59 Y69.CB 40.194 G75.CA 43.924 F66.HB1 3.07 Y69.CD1 63.9 G75.N 113.677 F66.HB2 2.86 Y69.CD2 63.9 S76.HN 7.808 F66.HD1 7.18 Y69.CE1 47.9 S76.HA 4.38 F66.HD2 7.18 Y69.CE2 47.9 S76.HB1 3.20 F66.HE1 7.00 Y69.N 126.386 S76.HB2 3.05 F66.HE2 7.00 L70.HN 7.711 S76.C 174.615 F66.C 176.660 L70.HA 3.76 S76.CA 57.803 F66.CA 58.176 L70.HB1 1.31 S76.CB 64.743 F66.CB 39.8 L70.HB2 1.31 S76.N 112.778 F66.CD1 60.9 L70.HG 1.39 R77.HN 7.927 F66.CD2 60.9 L70.HD11 0.82 R77.HA 3.99 F66.CE1 60.8 L70.HD21 0.79 R77.HB1 1.97
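
The assignments above are listed as "residue.atom shift" pairs (for example "H5.CA 56.372"). As an illustration only, the following Python sketch shows one way such a listing could be read into a per-residue lookup. The function name `parse_shifts` and the regular expression are assumptions made for this example and are not part of the original analysis; the sample string reuses the first few entries of the table.

```python
import re
from collections import defaultdict

# Assumed token layout: one-letter residue type, residue number, a dot,
# the atom name, whitespace, then the chemical shift in ppm.
ENTRY = re.compile(r"([A-Z])(\d+)\.([A-Z0-9]+)\s+(-?\d+(?:\.\d+)?)")

def parse_shifts(text):
    """Group 'RESIDUE.ATOM shift' pairs by (residue number, residue type)."""
    shifts = defaultdict(dict)
    for res_type, res_num, atom, value in ENTRY.findall(text):
        shifts[(int(res_num), res_type)][atom] = float(value)
    return dict(shifts)

# First few entries of the assignment list, used here as sample input.
sample = "H5.C 174.436 H5.CA 56.372 H5.CB 30.120 D6.HN 8.157 D6.N 121.137"
table = parse_shifts(sample)
for (num, res), atoms in sorted(table.items()):
    print(res, num, atoms)
```

Keying on the residue number and type together makes mislabelled entries easy to spot, because a residue number that appears with two different residue types shows up as two separate keys.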


A7. Carbohydrate microarray data for TgMIC4-A56

During the initial purification of TgMIC4-A56 (i.e. prior to the onset of these studies), a sample of thioredoxin-fused protein was submitted for carbohydrate microarray analysis. The microarray protocol is briefly described in chapter 5.2.2. The array contained 503 probes (in duplicate spots of 2 and 5 fmol) encompassing ten groups: A (Gal-, Glc-, Lac- and LacNAc-based), B (LNnT- and LNT-based tetrasaccharides), C (polylactosamine-based), D (N-glycan related), E (polysialyl), F (ganglioside-related), G (O-glycan related), H (glycosaminoglycan-related), I (homo-oligomers) and K (miscellaneous). The protein was added at a concentration of 20 μg/ml. The array was performed in triplicate, enabling a mean fluorescence intensity (and standard deviation) to be calculated for each probe.
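
As a minimal sketch of the per-probe summary described above, the following Python snippet computes a mean and standard deviation from three replicate readings. The probe names and intensity values are hypothetical placeholders, not array data, and the use of the sample (n-1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def summarise(replicates):
    """Return (mean, sample standard deviation), rounded to whole units."""
    return round(mean(replicates)), round(stdev(replicates))

# Hypothetical triplicate fluorescence readings for two placeholder probes.
readings = {
    "probe_x": [100.0, 110.0, 120.0],
    "probe_y": [0.0, 0.0, 0.0],
}
for name, values in readings.items():
    fluor, error = summarise(values)
    print(name, fluor, error)
```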

The microarray data, presented in table A7, demonstrate that TgMIC4-A56 binds with high specificity to galactosyl-terminated oligosaccharides. Data were collected by Dr Yan Liu (Imperial College London, Glycosciences laboratory).
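
The trend can be checked informally against the linear probe sequences, in which the leftmost residue is the non-reducing terminus. The Python sketch below is an illustration under that assumption and was not part of the original analysis: the helper `has_terminal_gal` is hypothetical, it handles linear sequences only (branched probes would need a proper glycan parser), and the six rows are copied from linear entries of table A7.

```python
import re

# The table uses two glyphs for beta (ß and β); accept both, plus alpha.
# "GalNAc..." is deliberately excluded because an anomer must follow "Gal".
TERMINAL_GAL = re.compile(r"^Gal[αβß]-")

def has_terminal_gal(sequence):
    """True if a linear sequence starts with an unsubstituted galactose."""
    return bool(TERMINAL_GAL.match(sequence))

# (probe name, sequence, mean fluorescence) for linear probes in table A7.
rows = [
    ("Lac-AO", "Galß-4Glc-AO", 13463),
    ("LacNAc-AO", "Galß-4GlcNAc-AO", 19269),
    ("LNT", "Galß-3GlcNAcß-3Galß-4Glc", 681),
    ("NeuAcα-(3')Lac-AO", "NeuAcα-3Galß-4Glc-AO", 0),
    ("LNFP-I", "Fucα-2Galß-3GlcNAcß-3Galß-4Glc", 0),
    ("GN2", "GlcNAcß-4GlcNAc", 0),
]
for name, seq, fluor in rows:
    print(f"{name:20s} terminal Gal: {has_terminal_gal(seq)!s:5s} fluorescence: {fluor}")
```

In this small selection, capping the galactose with sialic acid or fucose coincides with loss of signal. The full table also contains galactose-terminated probes with no signal (for example the lactocerebrosides), so a terminal galactose appears necessary but not sufficient for binding on the array.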

Probe number/name Structure Fluor. Error

Group A: Gal-, Glc-, Lac-, LacNAc- based

1 Lac Galß-4Glc 121 6

2 Lactocerebrosides Galß-4Glcß-Cer 0 6

3 GalNAcα-3Galß-4Glc GalNAcα-3Galß-4Glc 0 0

4 Galα-4Galß-4GlcNAc Galα-4Galß-4GlcNAc 0 14

5 Ceramide trihexoside Galα-4Galß-4Glcß-Cer 0 3

6 Globoside (P-antigen) GalNAcß-3Galα-4Galß-4Glcß-Cer 0 18

7 Forssmann glycolipid GalNAcα-3GalNAcß-3Galα-4Galß-4Glcß-Cer 0 9

8 B-Tri Galα-3Gal [branch: Fucα-2] 300 3

9 GM4 NeuAcα-3Galβ-Cer 0 24

10 NeuAcα-(3')Lac NeuAcα-3Galß-4Glc 0 19

11 GM3 NeuAcα-3Galß-4Glcß-Cer 0 14

12 NeuAcα-(6')Lac NeuAcα-6Galß-4Glc 0 6

13 GM3(Gc) NeuGcα-3Galß-4Glc-Cer 0 80

14 NeuAcα-(3')LN NeuAcα-3Galß-4GlcNAc 0 13

15 NeuAcα-(6')LN NeuAcα-6Galß-4GlcNAc 0 22

16 Neu5,9Ac-(6')LN Neu5,9Acα-6Galß-4GlcNAc 0 17

17 GD3 NeuAcα-8NeuAcα-3Galß-4Glcß-Cer 0 33

18 GSC-210 SU-3GlcAß-3Galß-Cer42 0 11


19 GSC-150 SU-3Galß-4Glcß-C30 [branch: Fucα-3] 0 1

20 SU(3')-Lea-Tri SU-3Galß-3GlcNAc [branch: Fucα-4] 0 16

21 SU(3')-Lex-Tri SU-3Galß-4GlcNAc [branch: Fucα-3] 0 23

22 GSC-209 GlcAß-3Galß-Cer42 0 20

23 GSC-432 3-deoxy,3-carboxymethyl-Galß-4Glcß-C30 0 12

24 GSC-430 3-deoxy,3-carboxymethyl-Galß-3Glcß-C30 [branch: Fucα-4] 0 8

25 Glucocerebrosides Glcß-Cer? 0 23

26 Galactocerebrosides Galß-Cer 0 7

27 Lac-AO Galß-4Glc-AO 13463 221

28 LacNAc-AO Galß-4GlcNAc-AO 19269 242

29 Lea-Tri Galß-3GlcNAc [branch: Fucα-4] 601 11

30 Lea-Tri-AO Galß-3GlcNAc-AO [branch: Fucα-4] 5468 377

31 Lex-Tri Galß-4GlcNAc [branch: Fucα-3] 89 8

32 Lex-Tri-AO Galß-4GlcNAc-AO [branch: Fucα-3] 15008 90

33 NeuAcα-(3')Lac-AO NeuAcα-3Galß-4Glc-AO 0 14

34 NeuAcα-(6')Lac-AO NeuAcα-6Galß-4Glc-AO 0 59

35 NeuAcß-(3')Lac NeuAcß-3Galß-4Glc 0 0

36 NeuAcß-(3')Lac-AO NeuAcß-3Galß-4Glc-AO 0 30

37 NeuAcß-(6')Lac NeuAcß-6Galß-4Glc 0 20

38 NeuAcß-(6')Lac-AO NeuAcß-6Galß-4Glc-AO 0 45

39 Neuα-(3')Lac Neuα-3Galß-4Glc 0 28

40 Neuα-(3')Lac-AO Neuα-3Galß-4Glc-AO 0 1

41 Neuα-(6')Lac Neuα-6Galß-4Glc 0 4

42 Neuα-(6')Lac-AO Neuα-6Galß-4Glc-AO 0 5

43 Neu4,5Ac-(3')Lac Neu4,5Acα-3Galß-4Glc 0 30

44 Neu4,5Ac-(3')Lac-AO Neu4,5Acα-3Galß-4Glc-AO 0 3

45 Sulfatide SU-3Galβ-Cer 0 2

46 GSF-1 SU-3Galβ-C30 0 1

47 GSF-19 SU-6Glcβ-C30 0 7

48 SM3 SU-3Galβ-4Glcß-Cer 0 6

49 Haematoside NeuAcα-3Galß-4Glcß-Cer 0 7

50 GSC-260 3-deoxy,3-carboxymethyl-Galß-4Glcß-C30 [branch: Fucα-3] 0 4

51 H-Di Fucα-2Gal 0 26

52 A-Tri GalNAcα-3Gal [branch: Fucα-2] 0 14

53 SA(3')-Lea-Tri NeuAcα-3Galß-3GlcNAc [branch: Fucα-4] 0 18

54 SU(3')-LN SU-3Galß-4GlcNAc 0 1

55 LacNAc Galß-4GlcNAc 2137 84

56 NeuAcα-(3')LN-AO NeuAcα-3Galß-4GlcNAc-AO 0 15

57 Lex-Tri-(Me)AO Galß-4GlcNAc-(Me)AO [branch: Fucα-3] 0 22

58 LacNAc(1-3) Galß-3GlcNAc 37 5

59 LacNAc(1-3)-AO Galß-3GlcNAc-AO 6268 108

60 GSC-426 3-deoxy,3-carboxymethyl-Galß-C30 0 10

61 GD3-tetra NeuAcα-8NeuAcα-3Galß-4Glc 0 5

62 GD3-tetra-AO NeuAcα-8NeuAcα-3Galß-4Glc-AO 0 13

63 GSC-17 NeuAcα-3Galß-4Glcß-Cer36 0 11

64 GSC-511 Neu5,9Acα-3Galß-4GlcNAcß-C30 [branch: Fucα-3] 0 14

65 GSC-513 Neu5,9Acα-3Galß-3GlcNAcß-C30 [branch: Fucα-4] 0 1

66 GSC-75 (4-deoxy)NeuAcα-3Galß-4Glcß-Cer36 0 9

67 GSC-76 (7-deoxy)NeuAcα-3Galß-4Glcß-Cer36 0 1

68 GSC-77 (8-deoxy)NeuAcα-3Galß-4Glcß-Cer36 0 5

69 GSC-51 (9-deoxy)NeuAcα-3Galß-4Glcß-Cer36 0 6

70 GSC-153 (4,8-deoxy)NeuAcα-3Galß-4Glcß-Cer36 0 0

71 GSC-78 (4-OMe)NeuAcα-3Galß-4Glcß-Cer36 0 14

72 GSC-79 (9-OMe)NeuAcα-3Galß-4Glcß-Cer36 0 24

73 GSC-61 NeuAcα-6Galß-4Glcß-Cer36 0 3

74 GSC-437 NeuAcα-8NeuAcα-8NeuAcα-3Galβ-4Glcß-Cer36 0 20

75 GSC-144 KDNα-6Galß-Cer36 0 12

76 GSC-197 KDNα-3Galβ-4Glcß-Cer28 0 8

77 GSC-198 KDNα-3Galβ-4Glcß-Cer34 0 4

78 GSC-96 NeuAcα-9NeuAcα-3Galβ-4Glcß-Cer36 0 4

79 Galα-6Glc-AO Galα-6Glc-AO 384 4

80 GSC-23 (C7)NeuAcα-3Galß1-4Glcß-Cer36 0 6

81 GSC-24 (C8)NeuAcα-3Galß1-4Glcß-Cer36 0 15

82 GSC-131 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 [branch: Quvα-3] 0 15

83 GSC-163 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 [branch: Rhaα-3] 0 20

84 GSC-199 KDNα-3Galβ-4Glcß-C30 0 7

85 GSC-229 NeuAcα-8NeuAcα-3Galß-4Glcß-Cer36 0 1

86 GSC-230 NeuAcα-8NeuAcα-3Galß-Cer36 29 10

87 GSC-231 NeuAcα-8NeuAcα-6Galß-Cer36 0 8

88 GSC-232 NeuAcα-8NeuAcα-6Glcß-Cer36 0 4

89 GSC-234 NeuAcα-(S)-6Gal(S)β-4Glcß-Cer36 0 10

90 GSC-236 Galß-4GlcNAcß-3Galß-C30 [branches: SU3, Fucα-3] 0 13


91 GSC-257 NeuAcα-3(4,6-deoxy)Galß-4GlcNAcß-3Galß-Cer36 [branch: Fucα-3] 0 1

92 GSC-284 GalNAcβ-6Galβ-4Glcß-Cer36 [branch: NeuAcα-3] 0 11

93 GSC-296 GlcAß-3Galß-4Glcß-C30 0 14

94 GSC-353 SU-3GlcAß-3Galβ-4Glcß-C30 0 12

95 GSC-439 NeuAcα-8NeuAcα-8NeuAcα-6Galß-Cer36 0 20

96 GSC-440 NeuAcα-3Galß-4GlcNAcß-C30 [branch: Fucα-3] 0 25

97 GSC-512 Neu4,5Acα-3Galß-4GlcNAcß-C30 [branch: Fucα-3] 0 22

98 GSC-575 GalNAcβ-4Galβ-3Galß-C30 [branch: NeuAcα-3] 0 1

99 GSC-9 NeuAcα-(S)-6Glcß-Cer36 0 4

100 GSC-12 NeuAcα-(S)-6Galß-4Glcß-Cer36 0 10

101 GSC-16 NeuAcα-3Galß-4Glcß-Cer32 0 19

102 GSC-18 NeuAcα-3Galß-4Glcß-Cer42 0 9

103 GSC-27 NeuAcα-6Galß-Cer36 0 8

104 GSC-40 NeuAcα-(S)-3Galß-Cer42 0 3

105 GSC-50 (C8 diastereoisomer)NeuAcα-3Galß-4Glcß-Cer36 0 11

106 GSC-59 NeuAcα-6GlcNAcß-Cer36 0 15

107 GSC-60 NeuAcα-6Glcß-Cer36 0 1

108 GSC-62 NeuAcα-2Glcß-Cer36 0 3

109 GSC-72 NeuAcα-(S)-6Galβ-(S)-Cer36 0 13

110 GSC-73 NeuAcα-(S)-6Galβ-4Glcβ-(S)-Cer36 0 4

111 GSC-95 NeuAcα-(S)-6GlcNAcß-Cer36 0 30

112 GSC-121 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 [branch: (3-deoxy)Fucα-3] 0 29

113 GSC-133 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 [branch: (2-OMe)Fucα-3] 0 9

114 GSC-160 SU-3Galß-4Glcß-Cer36 [branch: Fucα-3] 0 3

115 GSC-161 NeuAcα-3Galß-4Glcß-C30 [branch: Fucα-3] 0 2

116 GSC-162 NeuAcα-3Galß-4Glcß-Cer36 [branch: Fucα-3] 0 7

117 GSC-178 NeuAcα-3Galß-4Glcß-Cer34 0 1

118 GSC-187 NeuAcα-3Galß-C29 0 5

119 PI-1 NeuAcα-3(6-NAc)Galß-4GlcNAc 0 10

120 PI-1-AO NeuAcα-3(6-NAc)Galß-4GlcNAc-AO 0 13

121 PI-2 NeuAcα-3(6-NBz)Galß-4GlcNAc 0 15

122 PI-2-AO NeuAcα-3(6-NBz)Galß-4GlcNAc-AO 0 1

Group B: LNnT-, LNT- based; tetrasaccharides

123 LNT Galß-3GlcNAcß-3Galß-4Glc 681 4

124 LNnT Galß-4GlcNAcß-3Galß-4Glc 328 58

125 Paragloboside Galß-4GlcNAcß-3Galß-4Glcß-Cer 457 89

126 B-like pentaosylceramide Galα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer 0 2

Klaus 127 Galß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer 184 19 glycolipid 128 LNFP-I Fucα-2Galß-3GlcNAcß-3Galß-4Glc 0 4 Galß-3GlcNAcß-3Galß-4Glc 129 LNFP-II │ 0 6 Fucα-4 Galß-4GlcNAcß-3Galß-4Glc 130 LNFP-III │ 0 16 Fucα-3 GalNAcα-3Galß-3GlcNAcß-3Galß-4Glc 131 A-Hexa │ 0 10 Fucα-2 Fucα-2Galß-4GlcNAcß-3Galß-4Glc 132 LNnDFH-I │ 0 1 Fucα-3 Galß-4GlcNAcß-3Galß-4Glc 133 LNnDFH-II │ │ 0 12 Fucα-3 Fucα-3 Galß-3GlcNAcß-3Galß-4Glc 134 LNDFH-II │ │ 0 22 Fucα-4 Fucα-3 Fucα-2Galß-4GlcNAcß-3Galß-4Glc 135 LNnTFH-I │ │ 0 17 Fucα-3 Fucα-2 Fucα-2Galß-3GlcNAcß-3Galß-4Glc 136 LNTFH-I │ │ 0 17 Fucα-4 Fucα-2 137 LSTa NeuAcα-3Galß-3GlcNAcß-3Galß-4Glc 0 2 Sialylparagl 138 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer 0 10 oboside NeuAcα-3Galß-3GlcNAcß-3Galß-4Glc SA(3')-LNFP- 139 │ 0 22 II Fucα-4 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glc SA(3')-LNFP- 140 │ 0 12 III Fucα-3 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 141 GSC-64 │ 0 22 Fucα-3 NeuAcα-3/6Galß-3GlcNAcß-3Galß-4Glc SA(3/6)LNFP- 142 │ 0 6 I Fucα-2 Neuα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 143 GSC-472 │ 0 16 Fucα-3 NeuAcα-3Galß-3GlcNAcß-3Galß-4Glc 144 DSLNT │ 0 2 NeuAcα-6 145 GSC-190 SU-3GlcAß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer42 0 5

146 GSC-208 SU-3GlcAß-3Galß-4GlcNAcß-3Galß-4Glcß-C30 0 13 SU-3Galß-4GlcNAcß-3Galß-4Glc SU(3')-LNFP- 147 │ 0 13 III Fucα-3 SU-6 │ SU(3',6)- 148 SU-3Galß-4GlcNAcß-3Galß-4Glc 0 10 LNFP-III │ Fucα-3 SU-6Galß-4GlcNAcß-3Galß-4Glc SU(6')-LNFP- 149 │ 0 4 III Fucα-3 SU-6 │ 150 GSC-406 Neuα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 11 │ Fucα-3 SU-6 GSC-268 151 │ 0 12 deNAc Neuα-3Galß-4GlcNß-3Galß-4Glcß-Cer36 │


Fucα-3

SU-6 │ 152 GSC-269 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 6 │ Fucα-3 SU-6 SU-6 │ │ 153 GSC-270 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 5 │ Fucα-3 154 GSC-189 GlcAß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer42 0 1

155 GSC-207 GlcAß-3Galß-4GlcNAcß-3Galß-4Glcß-C30 0 11 SU-6 │ 156 GSC-268 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 11 │ Fucα-3 NeuAcα-6Galß-4GlcNAcß-3Galß-4Glc SA(6')-LNFP- 157 │ 0 11 VI Fucα-3 Galß-4GlcNAcß-3Galß-4Glc-AO 158 LNFP-III-AO │ 0 0 Fucα-3 SU-3Galß-3GlcNAcß-4Galß-4Glc SU(3')-LNFP- 159 │ 0 19 II Fucα-4 160 LSTc NeuAcα-6Galβ4-GlcNAcβ3-Galβ4-Glc 0 4 Galβ-3GlcNAcβ-3Galβ-4Glc 161 LSTb │ 7039 377 NeuAcα-6 Galß-4GlcNAcß-3Galß-4Glc 162 LNnDFH-V │ │ 0 3 Fucα-3 Fucα-2 Fucα-2Galß-3GlcNAcß-3Galß-4Glc 163 LNDFH-I │ 0 5 Fucα-4 GalNAcα-3Galß-3GlcNAcß-3Galß-4Glc 164 A-Hepta │ │ 0 20 Fucα-2 Fucα-4 Leb- Fucα-2Galß-3GlcNAcß-3Galß-4Glcß-Cer 165 hexaosylcera │ 0 13 mide Fucα-4 B- Galα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer 166 hexaosylcera │ 0 2 mide Fucα-2 167 GSC-31 NeuAcα-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 20

168 SU(3')-Tri SU-3Galß-4GlcNAcß-3Gal 0 21 SU-6Galß-3GlcNAcß-3Galß-4Glc SU(6')-LNFP- 169 │ 0 0 II Fucα-4 Led-II 170 pentaosylcer Fucα-2Galß-3GlcNAcß-3Galß-4Glcß-CerA 0 24 amide Led-I 171 pentaosylcer Fucα-2Galß-3GlcNAcß-3Galß-4Glcß-CerB 0 39 amide 172 DLNN GlcNAcß-3Galß-4Glc 0 1 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 173 GSC-105 │ 0 3 Fucα-3 NeuAcα-3Galß-4GlcNAcß-6Galß-4Glcß-Cer36 174 GSC-154 │ 0 13 Fucα-3 NeuGcα-3Galβ-4GlcNAcβ-3Galß-Cer36 175 GSC-177 │ 0 0 Fucα-3 176 GSC-396 NeuGcα-3Galβ-3GlcNAcβ-3Galβ-4Glcß-C30 0 0

177 GSC-147 KDNα-3Galβ-3GlcNAcβ-3Galβ-4Glcß-Cer36 0 6


KDNα-3Galβ-4GlcNAcβ-3Galß-C30 178 GSC-341 │ 0 7 Fucα-3 KDNα-3Galβ-4GlcNAcβ-3Galβ-4Glcß-Cer36 179 GSC-149 │ 0 14 Fucα-3 (6P)- 180 P-6Fructose-AO 0 2 Fructose-AO 181 GSC-191 GlcAß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 17

182 GSC-192 SU-3GlcAß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 4 (3-carboxymethyl)Galß-4GlcNAcß-3Galß-Cer36 183 GSC-225 │ 0 5 Fucα-3 184 GSC-272 NeuAcα-3Galβ-3GlcNAcβ-3Galβ-4Glcß-C30 0 13

185 GSC-273 NeuAcα-3Galβ-4GlcNAcβ-3Galβ-4Glcß-C30 0 40 KDNα-3Galß-4GlcNAcß-3Galß-4Glcß-C30 186 GSC-311 │ 0 6 Rhaα-3 KDNα-3Galß-4GlcNAcß-3Galß-4Glcß-C30 187 GSC-314 │ 0 15 Fucα-3 188 GSC-397 NeuGcα-6Galβ-3GlcNAcβ-3Galβ-4Glcß-C30 0 13 NeuAcα-3Galß-4GlcNAcß-3Galß-C30 189 GSC-479 │ 0 10 Fucα-3 NeuAcα-3Galß-4GlcNß-3Galß-4Glcß-Cer36 190 GSC-533 │ 0 4 Fucα-3 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 191 GSC-123 │ 0 17 (4-deoxy)Fucα-3 NeuAcα-3Galß-4GlcNAcß-3Galß-Cer36 192 GSC-127 │ 0 8 (6-deoxy)L-Talα-3 NeuAcα-3(4-deoxy)Galß-4GlcNAcß-3Galß-Cer36 193 GSC-175 │ 0 1 Fucα-3 NeuAcα-3(6-deoxy)Galß-4GlcNAcß-3Galß-Cer36 194 GSC-176 │ 0 5 Fucα-3

Group C: polylactosamine (linear,branched)-based

195 pLNH Galß-3GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glc 78 28

196 pLNnH Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glc 1048 17 Galß-4GlcNAcß-6 │ 197 LNnH Galß-4Glc 3666 322 │ Galß-4GlcNAcß-3 Galß-4GlcNAcß-6 │ 198 LNH Galß-4Glc 824 13 │ Galß-3GlcNAcß-3 Galß-3GlcNAcß-3Galß-4GlcNAcß-6 │ 199 iLNO Galß-4Glc 2437 75 │ Galß-3GlcNAcß-3 Galß-4GlcNAcß-6 I- │ 200 octaosylcera Galß-4GlcNAcß-3Galß-4Glcß-Cer 863 27 mide │ Galß-4GlcNAcß-3 Galα-3Galß-4GlcNAcß-6 B-like │ 201 decaosylcera Galß-4GlcNAcß-3Galß-4Glcß-Cer 10 9 mide │ Galα-3Galß-4GlcNAcß-3

202 LND Galß-4GlcNAcß-6 1258 54 │


Galß-4GlcNAcß-6 │ │ Galß-3GlcNAcß-3 Galß-4Glc │ Galß-3GlcNAcß-3 Galα-3Galß-4GlcNAcß-6 │ B-like Galα-3Galß-4GlcNAcß-6 Galß-4GlcNAcß-3Galß-4Glcß-Cer 203 pentadecaosy │ │ 0 7 lceramide Galß-4GlcNAcß-3 │ Galα-3Galß-4GlcNAcß-3

Galα-3Galß-4GlcNAcß-6 │ Galα-3Galß-4GlcNAcß-6 Galß-4GlcNAcß-3Galß-4Glcß-Cer B-like │ │ 204 eicosaosylce Galα-3Galß-4GlcNAcß-6 Galß-4GlcNAcß-3 0 8 │ │ ramide Galß-4GlcNAcß-3 │ Galα-3Galß-4GlcNAcß-3 Galß-3GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glc 205 pLNFH-IV │ 1253 35 Fucα-3 Galß-3GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glc 206 DFpLNH-II │ │ 0 7 Fucα-4 Fucα-3 Galß-4GlcNAcß-6 │ │ 207 MFLNH-III Fucα-3 Galß-4Glc 443 1 │ Galß-3GlcNAcß-3 Galß-4GlcNAcß-6 │ │ 208 MFLNnH(a) Fucα-3 Galß-4Glc 4704 103 │ Galß-4GlcNAcß-3 Galß-4GlcNAcß-6 │ Galß-4Glc 209 DFLNH(c) │ 0 16 Fucα-2Galß-3GlcNAcß-3 │ Fucα-4 Fucα-3 │ Galß-4GlcNAcß-6 │ 210 DFLNnH Galß-4Glc 0 7 │ Galß-4GlcNAcß-3 │ Fucα-3 Galß-4GlcNAcß-6 │ │ Fucα-3 Galß-4Glc 211 TFLNH │ 0 1 Fucα-2Galß-3GlcNAcß-3 │ Fucα-4 Galß-4GlcNAcß-6 │ │ 212 DFLNH(a) Fucα-3 Galß-4Glc 0 1 │ Fucα-2Galß-3GlcNAcß-3 Galß-3GlcNAcß-3Galß-4GlcNAcß-6 │ │ 213 MFiLNO-IV Fucα-3 Galß-4Glc 1915 153 │ Galß-3GlcNAcß-3 Galß-3GlcNAcß-3Galß-4GlcNAcß-6 │ │ │ Fucα-4 Fucα-3 Galß-4Glc 214 TFiLNO │ 0 5 Galß-3GlcNAcß-3 │ Fucα-4 Galß-4GlcNAcß-6 │ │ 215 MFLND Fucα-3 Galß-4GlcNAcß-6 547 45 │ │ Galß-3GlcNAcß-3 Galß-4Glc │


Galß-3GlcNAcß-3

Galß-4GlcNAcß-6 │ 216 MSLNnH-I Galß-4Glc 14 13 │ NeuAcα-6Galß-3GlcNAcß-3 Galß-4GlcNAcß-6 │ │ 217 MFMSLNnH Fucα-3 Galß-4Glc 0 15 │ NeuAcα-6Galß-3GlcNAcß-3 Galß-4GlcNAcß-6 │ │ 218 MSMFLNH Fucα-3 Galß-4Glc 0 6 │ NeuAcα-3Galß-3GlcNAcß-3 NeuAcα-6Galß-4GlcNAcß-6 │ 219 DSLNnH Galß-4Glc 0 35 │ NeuAcα-6Galß-4GlcNAcß-3 220 GSC-219 SU-3GlcAß-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 16

221 GSC-216 GlcAß-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer42 0 17

222 GSC-218 GlcAß-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 0 4 NeuAcα-6Galß-4GlcNAcß-6 │ 223 MSLNH Galß-4Glc 527 81 │ Galß-3GlcNAcß-3 NeuAcα-3Galβ-4GlcNAcβ-3Galβ-3GlcNAc 224 C4U │ │ │ 0 12 SU-6 SU-6 SU-6 Fucα-3 │ 225 FucC4U NeuAcα-3Galβ-4GlcNAcβ-3Galβ-3GlcNAc 0 12 │ │ │ SU-6 SU-6 SU-6 Fucα-2Galß-3GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glc 226 TFpLNH-I │ │ 0 3 Fucα-4 Fucα-3 Fucα-3 │ Galß-4GlcNAcß-6 │ 227 DFLNH(b) Galß-4Glc 0 8 │ Galß-3GlcNAcß-3 │ Fucα-4 Galß-4GlcNAcß-6 │ I- Galß-4GlcNAcß-6 Galß-4GlcNAcß-3Galß-4Glcß-Cer 228 dodecaosylce │ │ 2060 290 ramide Galß-4GlcNAcß-3 │ Galß-4GlcNAcß-3 Galß-4GlcNAcß-6 │ Galß-4GlcNAcß-6 Galß-4GlcNAcß-3Galß-4Glcß-Cer I- │ │ 229 hexadecaosyl Galß-4GlcNAcß-6 Galß-4GlcNAcß-3 468 35 │ │ ceramide Galß-4GlcNAcß-6 │ Galß-4GlcNAcß-3

Galß-4GlcNAcß-6 │ Galß-4GlcNAcß-6 Galß-4GlcNAcß-3Galß-4Glcß-Cer │ │ I- Galß-4GlcNAcß-6 Galß-4GlcNAcß-3 230 eicosaosylce │ │ 302 7 Galß-4GlcNAcß-6 Galß-4GlcNAcß-6 ramide │ │ Galß-4GlcNAcß-3 │ Galß-4GlcNAcß-3

Galß-4GlcNAcß-6 │ Galß-4GlcNAcß-6 Galß-4GlcNAcß-3Galß-4Glcß-Cer │ │ B-like Galß-4GlcNAcß-6 Galß-4GlcNAcß-3 231 pentaeicosao │ │ 0 16 Galα-3Galß-4GlcNAcß-6 Galß-4GlcNAcß-6 sylceramide │ │ Galß-4GlcNAcß-3 │ Galα-3Galß-4GlcNAcß-3


Galα-3Galß-4GlcNAcß-6 │ │ B-III Fucα-2 Galß-4GlcNAcß-3Galß-4Glcß-Cer 232 dodecaosylce │ 0 6 ramide Galα-3Galß-4GlcNAcß-3 │ Fucα-2 Galα-3Galß-4GlcNAcß-6 │ │ B-IV Fucα-2 Galß-4GlcNAcß-3Galß-4Glcß-Cer 233 tetradecaosy │ 0 13 lceramide Galα-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3 │ Fucα-2 234 GSC-217 SU-3GlcAß-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer42 0 9 NeuAcα-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 235 GSC-220 │ │ 0 9 Fucα-3 Fucα-3 NeuAcα-3Galß-4GlcNAcß-3Galß-4GlcNAcß-3Galß-4Glcß-Cer36 236 GSC-221 │ 0 6 Fucα-3

Group D: N-glycan related

237 Man2aGN2 Manα-6Manß-4GlcNAcß-4GlcNAc 0 5 Manα-3Manα-6 │ 238 Man4aGN2 Manß-4GlcNAcß-4GlcNAc 0 3 │ Manα-3 Manα-6 │ 239 Man4bGN2 Manα-3Manα-6 0 2 │ Manß-4GlcNAcß-4GlcNAc Manα-6 │ Manα-3Manα-6 240 Man5GN2 │ 0 8 Manß-4GlcNAcß-4GlcNAc │ Manα-3 Manα-6 │ Manα-3Manα-6 241 Man7(D1)GN2 │ 0 6 Manß-4GlcNAcß-4GlcNAc │ Manα-2Manα-2Manα-3 Manα-6 Fucα-6 │ │ 242 Man3FGN2 Manß-4GlcNAcß-4GlcNAc 0 10 │ Manα-3 Manα-2Manα-6 │ Manα-3Manα-6 243 Man7(D3)GN2 │ 0 26 Manß-4GlcNAcß-4GlcNAc │ Manα-2Manα-3 Manα-6 │ Manα-3Manα-6 244 Man6GN2 │ 0 11 Manß-4GlcNAcß-4GlcNAc │ Manα-2Manα-3 Manα-2Manα-6 │ Manα-3Manα-6 Man8(D1D3)GN 245 │ 0 27 2 Manß-4GlcNAcß-4GlcNAc │ Manα-2Manα-2Manα-3 Manα-2Manα-6 │ 246 Man9GN2 Manα-2Manα-3Manα-6 0 11 │ Manß-4GlcNAcß-4GlcNAc


│ Manα-2Manα-2Manα-3 Manα-6 │ 247 Man3XylGN2 Xylß-2Manß-4GlcNAcß-4GlcNAc 0 7 │ Manα-3 Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 248 N1 Manß-4GlcNAcß-4GlcNAc 867 178 │ Manα-3 Manα-6 │ 249 N2 Manß-4GlcNAcß-4GlcNAc 2579 85 │ Galß-4GlcNAcß-2Manα-3 GlcNAcß-2Manα-6 │ 250 NGA2 Manß-4GlcNAcß-4GlcNAc 0 1 │ GlcNAcß-2Manα-3 GlcNAcß-2Manα-6 │ 251 NGA2B GlcNAcß-4Manß-4GlcNAcß-4GlcNAc 0 28 │ GlcNAcß-2Manα-3 Galß-4GlcNAcß-2Manα-6 │ 252 NA2 Manß-4GlcNAcß-4GlcNAc 2851 193 │ Galß-4GlcNAcß-2Manα-3 GlcNAcß-2Manα-6 Fucα-6 │ │ 253 NGA2F Manß-4GlcNAcß-4GlcNAc 0 7 │ GlcNAcß-2Manα-3 Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 254 NA2F Manß-4GlcNAcß-4GlcNAc 7010 36 │ Galß-4GlcNAcß-2Manα-3 Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 255 NA2FB GlcNAcß-4Manß-4GlcNAcß-4GlcNAc 3038 179 │ Galß-4GlcNAcß-2Manα-3 NeuAcα-6Galß-4GlcNAcß-2Manα-6 │ 256 A2(2-6) Manß-4GlcNAcß-4GlcNAc 0 15 │ NeuAcα-6Galß-4GlcNAcß-2Manα-3 NeuAcα-3Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 257 A2F(2-3) Manß-4GlcNAcß-4GlcNAc 0 13 │ NeuAcα-3Galß-4GlcNAcß-2Manα-3 GlcNAcß-2Manα-6 │ GlcNAcß-4Manß-4GlcNAcß-4GlcNAc 258 NGA3B │ 0 3 GlcNAcß-4Manα-3 │ GlcNAcß-2 Galß-4GlcNAcß-2Manα-6 │ Manß-4GlcNAcß-4GlcNAc-Fucα-3 259 NA3-Lex │ 2046 14 Galß-4GlcNAcß-4Manα-3 │ Galß-4GlcNAcß-2 Galß-4GlcNAcß-2Manα-6 │ Manß-4GlcNAcß-4GlcNAc 260 NA3 │ 9006 322 Galß-4GlcNAcß-4Manα-3 │ Galß-4GlcNAcß-2


NeuAcα-3Galß-4GlcNAcß-2Manα-6 │ Manß-4GlcNAcß-4GlcNAc 261 A3 │ 0 7 NeuAcα-3Galß-4GlcNAcß-4Manα-3 │ NeuAcα-6Galß-4GlcNAcß-2 Galß-4GlcNAcß-6 │ Galß-4GlcNAcß-2Manα-6 │ 262 NA4 Manß-4GlcNAcß-4GlcNAc 8825 58 │ Galß-4GlcNAcß-4Manα-3 │ Galß-4GlcNAcß-2 GlcNAcß-6 │ GlcNAcß-2Manα-6 │ 263 NGA4 Manß-4GlcNAcß-4GlcNAc 0 6 │ GlcNAcß-2Manα-3 │ GlcNAcß-4 GlcNAcß-2 │ GlcNAcß-4Manα-6 │ │ GlcNAcß-6 │ │ 264 NGA5B GlcNAcß-4Manß-4GlcNAcß-4GlcNAc 638 20 │ │ GlcNAcß-6 │ │ GlcNAcß-4Manα-3 │ GlcNAcß-2 Manα-6 │ Manα-3Manα-6 265 GNMan5BGN2 │ 0 4 GlcNAcß-4Manß-4GlcNAcß-4GlcNAc │ GlcNAcß-2Manα-3 Manα-6Man 266 Man3(α3,α6) │ 0 10 Manα-3 Manα-6Manα-6Man 267 Man5(α3,α6) │ │ 0 2 Manα-3 Manα-3 268 Man2(α6) Manα-6Man 0 33 Galß-4GlcNAcß-2Manα-6 │ 269 N4 Manß-4GlcNAcß-4GlcNAc 1751 654 │ Manα-3 Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 270 NA2F-AO Manß-4GlcNAcß-4GlcNAc-AO 3815 176 │ Galß-4GlcNAcß-2Manα-3 Manα-2Manα-6 │ Manα-2Manα-3Manα-6 271 Man9GN2-AO │ 0 19 Manß-4GlcNAcß-4GlcNAc-AO │ Manα-2Manα-2Manα-3 Manα-2Manα-6 │ Manα-6 272 Glc1Man9GN2 │ │ 0 7 Manα-2Manα-3 Manβ-4GlcNAcβ-4GlcNAc │ Glcα-3Manα-2Manα-2Manα-3 Manα-6 273 Man3FXylGN2 │ 0 20 Xylβ-2Manα-4GlcNAcβ-4GlcNAc │ │


Manα-3 Fucα-3

NeuAcα-Galß-4GlcNAcß-2Manα-6 │ 274 AGP-Bi-Ac2 Manß-4GlcNAcß-4GlcNAc 0 4 │ NeuAcα-Galß-4GlcNAcß-2Manα-3 NeuGcα-Galß-4GlcNAcß-2Manα-6 ? │ 275 AGP-Bi-AcGc Manß-4GlcNAcß-4GlcNAc 0 9 │ NeuAcα-Galß-4GlcNAcß-2Manα-3 NeuGcα-Galß-4GlcNAcß-2Manα-6 │ 276 AGP-Bi-Gc2 Manß-4GlcNAcß-4GlcNAc 0 9 │ NeuGcα-Galß-4GlcNAcß-2Manα-3 277 Man1GN1 Manß-4GlcNAc 0 1 (Galß-4) GlcNAcß-2Manα-6 │ 278 N3 Manß-4GlcNAcß-4GlcNAc 731 45 │ ? GlcNAcß-2Manα-3 279 Man2GN1 Manα-3Manß-4GlcNAc 0 15

280 Fuc-GlcNAc Fucα-6GlcNAc 0 13

281 Man2(α3) Manα-3Man 0 31

282 Man2(α2) Manα-2Man 0 8 Manα-6 │ Manα-3Manα-6 Man7(D1)GN2- 283 │ 0 11 AO Manß-4GlcNAcß-4GlcNAc-AO │ Manα-2Manα-2Manα-3 Manα-2Manα-6 │ Manα-6 Glc1Man9GN2- 284 │ │ 0 2 AO Manα-2Manα-3 Manβ-4GlcNAcβ-4GlcNAc-AO │ Glcα-3Manα-2Manα-2Manα-3 Manα-6 │ Manα-3Manα-6 Glc2Man7(D1) 285 │ 0 32 GN1-AO Manß-4GlcNAc-AO │ Glcα-3Glcα-3Manα-2Manα-2Manα-3 Manα-6 │ Manα-3Manα-6 Glc3Man7(D1) 286 │ 0 13 GN1-AO Manß-4GlcNAc-AO │ Glcα-2Glcα-3Glcα-3Manα-2Manα-2Manα-3

Group E: polysialyl

287 SA2(α8) NeuAcα-8NeuAc 0 39

288 SA3(α8) NeuAcα-8NeuAcα-8NeuAc 0 2

289 SA4(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAc 0 15

290 SA5(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 20

291 SA6(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 16

292 SA7(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 10

293 SA8(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 14

294 SA9(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 26

295 SA10(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 30

296 SA11(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 2


297 NeuAc NeuAc 0 20

298 NeuAc-AO NeuAc-AO 0 2

299 NeuGc NeuGc 0 5

300 NeuGc-AO NeuGc-AO 0 10

Group F: Ganglioside-related

301 Asialo-GM2 GalNAcβ-4Galβ-4Glcβ-Cer 0 11 Asialo-GM1- 302 Galß-3GalNAcß-4Galß-4Glc 3008 116 Tetra 303 Asialo-GM1 Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 889 5 GalNAcβ-4Galβ-4Glcβ-Cer 304 GM2 │ 0 5 NeuAcα-3 305 GM1b NeuAcα-3Galβ-3GalNAcß-4Galß-4Glcß-Cer* 10204 184 Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 306 GM1(Gc) │ 10163 919 NeuGcα-3 Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 307 GM1 │ 4264 116 NeuAcα-3 GalNAcß-4Galß-4Glcß-Cer 308 GD2 │ 0 19 NeuAcα-8NeuAcα-3 NeuAcα-3Galß-3GalNAcß-4Galß-4Glcß-Cer 309 GD1a │ 48 24 NeuAcα-3 Galß-3GalNAcß-4Galß-4Glcß-Cer 310 GD1b │ 3827 360 NeuAcα-8NeuAcα-3 GalNAcß-4Galß-3GalNAcß-4Galß-4Glcß-Cer │ │ NeuGcα-3 NeuAcα-3 GalNAc- 311 0 20 GD1a(Ac,Gc) GalNAcß-4Galß-3GalNAcß-4Galß-4Glcß-Cer │ │ NeuAcα-3 NeuGcα-3 NeuAcα-8NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 312 GT1a │ 0 0 NeuAcα-3 NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 313 GT1b │ 0 12 NeuAcα-8NeuAcα-3 NeuAcα-8NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 314 GQ1b │ 0 1 NeuAcα-8NeuAcα-3 Galß-3GalNAcß-4Galß-4Glcß-Cer 315 SM1a │ 10491 54 SU-3 SU-3GalNAcß-4Galß-4Glcß-Cer 316 SB2 │ 0 11 SU-3 Galβ-3GalNAcβ-4Galβ-4Glc 317 GM1-penta │ 11172 420 NeuAcα-3 Galβ-3GalNAcβ-4Galβ-4Glc GM1(Gc)- 318 │ 8200 549 penta NeuGcα-3 NeuAcα-3Galß-3GalNAcß-4Galß-4Glc 319 GD1a-hexa │ 0 7 NeuAcα-3 GalNAcß-4Galß-4Glcß-Cer 320 SM2 │ 271 27 SU-3 321 GSC-68 NeuAcα-6Galß-3GalNAcß-4Galß-4Glcß-Cer36 0 5 GalNAcβ-4Galβ-4Glcß-Cer36 322 GSC-442 │ 0 1 NeuAcα-6

323 GSC-155 Galβ-3GalNAcβ-4Galβ-4Glcß-Cer36 4717 172 │


NeuAcα-6

NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcß-Cer36 324 GSC-118 │ 0 7 NeuAcα-6 NeuAcα-6Galβ-3GalNAcβ-4Galβ-4Glcß-Cer36 325 GSC-107 │ 0 7 NeuAcα-6 GalNAcβ-4Galβ-4Glcß-Cer36 326 GSC-193 │ 0 3 KDNα-3 KDNα-3Galβ-3GalNAcβ-4Galβ-4Glcß-Cer36 327 GSC-195 │ 0 1 KDNα-3 SU-6 328 GSC-335 │ 0 2 NeuAcα-3Galß-3GalNAcß-4Galß-4Glcß-Cer36 329 GSC-488 NeuAcα-3Galß-3GalNAcß-C30 0 2 SU-6 330 GSC-489 │ 0 25 NeuAcα-3Galß-3GalNAcß-C30 NeuAcα-3Galß-3GalNAcß-C30 331 GSC-490 │ 0 18 NeuAcα-6 332 GSC-491 NeuAcα-3Galß-3(6-deoxy-6-CH2COOH)GalNAcß-C30 0 2 GalNAcβ-4Galβ-3Glcß-C30 333 GSC-576 │ 0 14 NeuAcα-3 GalNAcβ-4Galβ-4Glcß-Cer36 334 GSC-108 │ 0 8 NeuAcα-3

Group G: O-glycan related

335 Galß-3GalNAc Galß-3GalNAc 0 3

336 Galß-6GalNAc Galß-6GalNAc 1724 23

337 GalNAcα-3GalNAc GalNAcα-3GalNAc 0 30

338 A8/1 GlcNAcα-4Galβ-OX 0 33

339 A15/3 GlcNAcα-4Galβ-3Galβ-OX [branch: Fucα-2] 0 10

340 B12/3 GalNAcβ-4Galβ-OX [branch: NeuGcα-3] 19 1

341 B13/b GalNAcβ-4Galβ-3GlcNAcβ-OX [branch: KDNα-3] 0 12

342 DST NeuAcα-3Galß-3GalNAc [branch: NeuAcα-6] 0 2

343 A15/1 SU-6GlcNAcß-OY 0 4

344 A8/2 Fucα-3GlcNAcß-OY [branch: SU-6] 0 3

345 B13/a GlcAβ-3Galβ-3GlcNAcβ-OX 0 8

346 Man-Ser Manα-Ser 0 15

347 Notch-1 Fucα-Thr 0 39

348 Notch-2 GlcNAcβ-3Fucα-Thr 0 7

349 Notch-3 Galβ-4GlcNAcβ-3Fucα-Thr 2906 274

350 Man-Thr Man-Thr 0 9

351 GalNAc-Thr GalNAc-Thr 0 15

352 GalNAc-Ser GalNAc-Ser 0 10

353 Man-Thr-Succ Man-Thr-Succ 0 8

354 Man-Ser-Succ Man-Ser-Succ 0 21


355 Galß-3GalNAc-AO Galß-3GalNAc-AO 2774 11

356 Galß-6GalNAc-AO Galß-6GalNAc-AO 11389 434

357 BSM-Di-A1-AO NeuGcα-6GalNAc-AO 0 2

Group H: Glycosaminoglycan (GAG)-related

ΔUA-3GalNAcß-4GlcAß-3GalNAcß-4GlcAß-3GalNAcß-4GlcAß- | | | SU-4 SU-4 SU-4 358 CSA-14 0 16 3GalNAcß-4GlcAß-3GalNAcß-4GlcAß-3GalNAcß-4GlcAß-3GalNAc* | | | | SU-4 SU-4 SU-4 SU-4 ΔUA-3GalNAcß-4IdoAß-3GalNAcß-4IdoAß-3GalNAcß-4IdoAß- | | | SU-4 SU-4 SU-4 359 CSB-14 0 5 3GalNAcß-4IdoAß-3GalNAcß-4IdoAß-3GalNAcß-4IdoAß-3GalNAc* | | | | SU-4 SU-4 SU-4 SU-4 ΔUA-3GalNAcß-4GlcAß-3GalNAcß-4GlcAß-3GalNAcß-4GlcAß- | | | SU-6 SU-6 SU-6 360 CSC-14 0 4 3GalNAcß-4GlcAß-3GalNAcß-4GlcAß-3GalNAcß-4GlcAß-3GalNAc* | | | | SU-6 SU-6 SU-6 SU-6 ΔUA-4GlcNS-AO 361 Hep-Di-IS-AO │ │ 0 25 SU-2 6-SU ΔUA-3GalNAcß-4GlcAß-3GalNAc** 362 CSA-4 │ │ 0 3 SU-4 SU-4 ΔUA-3GalNAcß-4IdoAα-3GalNAc** 363 CSB-4 │ │ 0 23 SU-4 SU-4 ΔUA-3GalNAcß-4GlcAß-3GalNAc* 364 CSC-4 │ │ 0 24 SU-6 SU-6 365 HA-S4 GlcAß-3GlcNAcß-4GlcAß-3GlcNA* 0 11 GlcAß-3GlcNAcß-4GlcAß-3GlcNAcß-4GlcAß-3GlcNAcß-4GlcAß- 366 HA-S14 0 18 3GlcNAcß-4GlcAß-3GlcNAcß-4GlcAß-3GlcNAcß-4GlcAß-3GlcNA* ΔUA-4GlcNS 367 Hep-Di IS │ │ 0 23 SU-2 6-SU 368 HS-8 ΔUA-4GlcNAcα-4HexA-4GlcNAcα-4HexA-4GlcNAcα-4HexA-4aMan* 0 2

Group I: Homo-oligomers

369 Glc2(α2) Glcα-2Glc 0 31

370 Lam-5 Glcß-3Glcß-3Glcß-3Glcß-3Glc 0 6

371 Lam-6 Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glc 0 20

372 Malto-2 Glcα-4Glc 0 5

373 Malto-3 Glcα-4Glcα-4Glc 0 3

374 Malto-4 Glcα-4Glcα-4Glcα-4Glc 0 7

375 Cello-5 Glcß-4Glcß-4Glcß-4Glcß-4Glc 0 1

376 Dextran-2 Glcα-6Glc 0 14

377 Dextran-4 Glcα-6Glcα-6Glcα-6Glc 0 19

378 Dextran-5 Glcα-6Glcα-6Glcα-6Glcα-6Glc 0 11

379 Dextran-6 Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc 0 16

380 Pust-13-AD Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glc-AD* 0 22

381 GN2 GlcNAcβ-4GlcNAc 0 9


382 GN3 GlcNAcß-4GlcNAcß-4GlcNAc 0 8

383 Man4(ß4) Manß-4Manß-4Manß-4Man 0 14

384 Man5(ß4) Manß-4Manß-4Manß-4Manß-4Man 0 25

385 Man6(ß4) Manß-4Manß-4Manß-4Manß-4Manß-4Man 0 17

386 Xyl5(ß4) Xylß-4Xylß-4Xylß-4Xylß-4Xyl 0 9

387 Xyl6(ß4) Xylß-4Xylß-4Xylß-4Xylß-4Xylß-4Xyl 0 41

388 Ara6(α5) Araα-5Araα-5Araα-5Araα-5Araα-5Ara 0 40

389 Ara7(α5) Araα-5Araα-5Araα-5Araα-5Araα-5Araα-5Ara 0 5

390 Curd-11-AO Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc-AO* 0 8

391 Pust-11-AO Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glc-AO* 0 11

392 Pust-10-AO Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glc-AO* 0 12

393 NSG-11-AO Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc-AO* 0 7

394 Pust-7-AO Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glc-AO 0 5

395 Dextran-7 Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc 0 3

396 Curd-13 Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc* 0 6

397 Curd-13-AO Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc-AO* 0 3

398 Curd-11 Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc* 0 12

399 Dextran-7-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO 0 20

400 Dextran-3-AO Glcα-6Glcα-6Glc-AO 0 5

401 Glc2(α3) Glcα-3Glc 0 33

402 Cello-2 Glcß-4Glc 0 6

403 Glc2(α2)-AO Glcα-2Glc-AO 0 11

404 Glc2(α3)-AO Glcα-3Glc-AO 0 40

405 Cello-2-AO Glcß-4Glc-AO 0 1

406 Malto-5 Glcα-4Glcα-4Glcα-4Glcα-4Glc 0 16

407 Malto-6 Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc 0 25

408 Malto-7 Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc 0 4

409 Malto-2-AO Glcα-4Glc-AO 0 10

410 Malto-3-AO Glcα-4Glcα-4Glc-AO 0 15

411 Malto-4-AO Glcα-4Glcα-4Glcα-4Glc-AO 0 2

412 Malto-5-AO Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO 0 5

413 Malto-6-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO 0 10

414 Malto-7-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO 0 12

415 Dextran-3 Glcα-6Glcα-6Glc 0 8

416 Dextran-2-AO Glcα-6Glc-AO 0 12

417 Dextran-4-AO Glcα-6Glcα-6Glcα-6Glc-AO 0 12

418 Dextran-5-AO Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO 0 3

419 Dextran-6-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO 0 10

420 Lam-2 Glcß-3Glc 0 4

421 Lam-3 Glcß-3Glcß-3Glc 0 1

422 Lam-4 Glcß-3Glcß-3Glcß-3Glc 0 16

423 Lam-7 Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glc 0 10

424 Lam-2-AO Glcß-3Glc-AO 0 13

425 Lam-3-AO Glcß-3Glcß-3Glc-AO 0 63


426 Lam-4-AO Glcß-3Glcß-3Glcß-3Glc-AO 0 26

427 Lam-5-AO Glcß-3Glcß-3Glcß-3Glcß-3Glc-AO 0 13

428 Lam-6-AO Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glc-AO 0 5

429 Lam-7-AO Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glc-AO 0 11

430 Curd-8-AO Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glcß-3Glc-AO* 0 10

431 Curd-9-AO Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc-AO* 0 1

432 Curd-10-AO Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc-AO* 0 3

433 Curd-12-AO Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glcβ-3Glc-AO* 0 1

434 Pust-2-AO Glcβ-6Glc-AO 0 9

435 Pust-3-AO Glcß-6Glcß-6Glc-AO 0 10

436 Pust-4-AO Glcß-6Glcß-6Glcß-6Glc-AO 0 11

437 Pust-5-AO Glcß-6Glcß-6Glcß-6Glcß-6Glc 0 12

438 Pust-6-AO Glcß-6Glcß-6Glcß-6Glcß-6Glcß-6Glc-AO 0 5

439 Pust-8-AO Glcß-6Glcß-6Glcß-6Glcß-6Glcß-6Glcß-6Glcß-6Glc-AO* 0 10

440 Pust-9-AO Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glc-AO* 0 8

441 Pust-12-AO Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glcβ-6Glc-AO* 0 1

442 Malto-8-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO* 0 6

443 Malto-9-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO* 0 15

444 Malto-10-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO* 0 4

445 Malto-11-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO* 0 21

446 Malto-12-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO* 0 12

447 Malto-13-AO Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glcα-4Glc-AO* 0 21

448 Dextran-8-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO* 0 20

449 Dextran-9-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO* 0 10

450 Dextran-10-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO* 0 3

451 Dextran-11-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO* 0 3

452 Dextran-12-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO* 0 11

453 Dextran-13-AO Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glcα-6Glc-AO* 0 17

454 Cello-3 Glcβ-4Glcβ-4Glc 0 11

455 Cello-6 Glcß-4Glcß-4Glcß-4Glcß-4Glcß-4Glc 0 2

456 Cello-3-AO Glcß-4Glcß-4Glc-AO 0 35

457 Cello-4-AO Glcß-4Glcß-4Glcß-4Glc-AO 0 11

458 Cello-5-AO Glcß-4Glcß-4Glcß-4Glcß-4Glc-AO 0 9

459 Cello-6-AO Glcβ-4Glcβ-4Glcβ-4Glcβ-4Glcβ-4Glc-AO 0 1

460 Cello F7 mix-AO ?unknown 0 5

461 Cello F8 mix-AO ?unknown 0 3

462 Cello F9 mix-AO ?unknown 0 15

463 Cello F10 mix-AO ?unknown 0 13

464 Cello F11 mix-AO ?unknown 0 6

465 Grifo-3-AO ?unknown 0 7

466 Grifo-4-AO ?unknown 0 19


467 Grifo-5-AO ?unknown 0 16

468 Grifo-6-AO ?unknown 0 33

469 Grifo-7-AO ?unknown 0 14

470 Grifo-8-AO ?unknown 0 10

471 Grifo-9-AO ?unknown 0 46

472 Grifo-10-AO ?unknown 0 2

473 Grifo-11-AO ?unknown 0 6

474 Grifo-12-AO ?unknown 0 22

475 Grifo-13-AO ?unknown 0 17

Group K: Miscellaneous

476 (6P)-Man5 P-6Manα-3Manα-3Manα-3Manα-2Man 0 5

477 Glc(α6,α4,α4) Glcα-6Glcα-4Glcα-4Glc 0 7

478 Xyl3Glc4 Glcβ-4Glcβ-4Glcβ-4Glc │ │ │ Xylα-6 Xylα-6 Xylα-6 0 13

479 Glc Glc 0 5

480 Glc-AO Glc-AO 0 13

481 Gal Gal 0 3

482 Gal-AO Gal-AO 0 36

483 Man Man 0 0

484 Man-AO Man-AO 0 2

485 Fuc Fuc 0 16

486 Fuc-AO Fuc-AO 0 19

487 Rha Rha 0 13

488 Rha-AO Rha-AO 0 16

489 GN GlcNAc 0 15

490 GN-AO GlcNAc-AO 0 6

491 GalNAc GalNAc 0 2

492 GalNAc-AO GalNAc-AO 0 42

493 SU-Tyr SU-Tyr 0 1

494 (6P)-Man-AO P-6Man-AO 0 1

495 (6P)-Man P-6Man 0 13

496 SU-Cholesterol SU-Cholesterol 0 14

497 GSC-441 NeuAcα-3Galβ-4GlcNAcβ-6GalNAcα-3Galβ-4Glcβ-C30 0 8

498 GSC-70 NeuAcα-6Galβ-6GalNAcβ-4Galβ-4Glcβ-Cer36 0 14

499 (6P)-Glc-AO P-6Glc-AO 0 1

500 GN-Asn GlcNAc-Asn 0 12

501 GlcNAcβ1-4Fuc-AO GlcNAcβ1-4Fuc-AO 0 16

502 GSC-384 NeuAcα-3Galβ-4GlcNAcβ-4GalNAcβ-3Galβ-4Glcβ-C30 │ Fucα-3 0 2

503 GSC-446 NeuAcα-3Galβ-4GlcNAcβ-6GalNAcα-3Galβ-4Glc-C30 0 21

Table A7: carbohydrate microarray data for TgMIC4-A56. Results correspond to 5 fmol spots. The name and topological structure of each probe are provided, alongside the fluorescence intensity and error-bound (i.e. standard deviation) of adhered protein, in arbitrary fluorescence units (AFU). Gal = galactose, Glc = glucose, GalNAc = N-acetylgalactosamine, GlcNAc = N-acetylglucosamine, SU = sulphate, NeuAc = N-acetylneuraminic acid, NeuGc = N-glycolylneuraminic acid, Cer = ceramide, AO = aminooxy-lipid conjugate, Fuc = fucose, Man = mannose, Rha = rhamnose, P = phosphate, Asn = asparagine, Ser = serine, Thr = threonine, Xyl = xylose, Ara = arabinose, Hex = hexose, KDN = deaminoneuraminic acid, Succ = succinate.


A8. Carbohydrate microarray data for TgMIC4-A5.

The microarray data for TgMIC4-A5 are presented in Table A8. Data were collected by Dr Yan Liu (Imperial College London, Glycosciences Laboratory).

Probe number, Probe name, Structure, Fluorescence (AFU), Error (standard deviation, AFU)

Group A: Lac-, LacNAc-, LNnT based

1 Lac-AO Galβ-4Glc-AO 15717 256

2 LacNAc-AO Galβ-4GlcNAc-AO 30939 1121

3 LacNAc(1-3) Galβ-3GlcNAc 18 5

4 LacNAc(1-3)-AO Galβ-3GlcNAc-AO 14131 1143

5 Lea-Tri Galβ-3GlcNAc │ Fucα-4 1409 80

6 Lea-Tri-AO Galβ-3GlcNAc-AO │ Fucα-4 4129 595

7 Lex-Tri Galβ-4GlcNAc │ Fucα-3 69 31

8 Lex-Tri-AO Galβ-4GlcNAc-AO │ Fucα-3 20131 162

9 Lex-Tri-(Me)AO Galβ-4GlcNAc-(Me)AO │ Fucα-3 367 357

10 LNFP-III-AO Galβ-4GlcNAcβ-3Galβ-4Glc-AO │ Fucα-3 31 26

11 NeuAcα-(3')Lac-AO NeuAcα-3Galβ-4Glc-AO 0 20

12 NeuAcβ-(3')Lac NeuAcβ-3Galβ-4Glc 2 33

13 NeuAcβ-(3')Lac-AO NeuAcβ-3Galβ-4Glc-AO 71 9

14 Neuα-(3')Lac Neuα-3Galβ-4Glc 37 1

15 Neuα-(3')Lac-AO Neuα-3Galβ-4Glc-AO 0 15

16 NeuAcα-(3')LN-AO NeuAcα-3Galβ-4GlcNAc-AO 6 2

17 Neu4,5Ac-(3')Lac Neu4,5Acα-3Galβ-4Glc 0 1

18 Neu4,5Ac-(3')Lac-AO Neu4,5Acα-3Galβ-4Glc-AO 33 20

19 NeuAcα-(6')Lac-AO NeuAcα-6Galβ-4Glc-AO 28 26

20 NeuAcβ-(6')Lac NeuAcβ-6Galβ-4Glc 13 3

21 NeuAcβ-(6')Lac-AO NeuAcβ-6Galβ-4Glc-AO 0 7

22 Neuα-(6')Lac Neuα-6Galβ-4Glc 10 2

23 Neuα-(6')Lac-AO Neuα-6Galβ-4Glc-AO 14 20

Group B: N-glycan related

24 Man2(α3) Manα-3Man 8 12

25 Man2(α2) Manα-2Man 22 18

26 Man3(α3,α6) Manα-6Man │ Manα-3 7 21

27 Man5(α3,α6) Manα-6Manα-6Man │ │ Manα-3 Manα-3 17 7


28 Man1GN1 Manß-4GlcNAc 0 8 29 Man2GN1 Manα-3Manß-4GlcNAc 0 11 Manα-2Manα-6 │ Manα-2Manα-3Manα-6 30 Man9GN2-AO │ 2 2 Manß-4GlcNAcß-4GlcNAc-AO │ Manα-2Manα-2Manα-3 Manα-6 │ Manα-3Manα-6 31 Man7(D1)GN2-AO │ 8 9 Manß-4GlcNAcß-4GlcNAc-AO │ Manα-2Manα-2Manα-3 Manα-2Manα-6 │ Manα-6 32 Glc1Man9GN2 │ │ 0 9 Manα-2Manα-3 Manβ-4GlcNAcβ-4GlcNAc │ Glcα-3Manα-2Manα-2Manα-3 Manα-2Manα-6 │ Manα-6 33 Glc1Man9GN2-AO │ │ 4 16 Manα-2Manα-3 Manβ-4GlcNAcβ-4GlcNAc-AO │ Glcα-3Manα-2Manα-2Manα-3 Manα-6 │ Glc2Man7(D1)GN Manα-3Manα-6 34 │ 17 10 1-AO Manß-4GlcNAc-AO │ Glcα-3Glcα-3Manα-2Manα-2Manα-3 Manα-6 │ Glc3Man7(D1)GN Manα-3Manα-6 35 │ 1 4 1-AO Manß-4GlcNAc-AO │ Glcα-2Glcα-3Glcα-3Manα-2Manα-2Manα-3 Manα-6 │ 36 Man3FXylGN2 Xylβ-2Manα-4GlcNAcβ-4GlcNAc 0 13 │ │ Manα-3 Fucα-3 GlcNAcß-2Manα-6 │ 37 NGA2 Manß-4GlcNAcß-4GlcNAc 0 3 │ GlcNAcß-2Manα-3 GlcNAcß-2Manα-6 │ 38 NGA2B GlcNAcß-4Manß-4GlcNAcß-4GlcNAc 26 19 │ GlcNAcß-2Manα-3 GlcNAcß-2Manα-6 │ GlcNAcß-4Manß-4GlcNAcß-4GlcNAc 39 NGA3B │ 0 5 GlcNAcß-4Manα-3 │ GlcNAcß-2 GlcNAcß-6 │ GlcNAcß-2Manα-6 │ 40 NGA4 Manß-4GlcNAcß-4GlcNAc 0 2 │ GlcNAcß-2Manα-3 │ GlcNAcß-4 GlcNAcß-2 │ 41 NGA5B 23 3 GlcNAcß-4Manα-6 │ │


GlcNAcß-6 │ │ GlcNAcß-4Manß-4GlcNAcß-4GlcNAc │ GlcNAcß-4Manα-3 │ GlcNAcß-2 GlcNAcß-2Manα-6 Fucα-6 │ │ 42 NGA2F Manß-4GlcNAcß-4GlcNAc 3 3 │ GlcNAcß-2Manα-3 Manα-6 │ Manα-3Manα-6 43 GNMan5BGN2 │ 0 15 GlcNAcß-4Manß-4GlcNAcß-4GlcNAc │ GlcNAcß-2Manα-3 Galß-4GlcNAcß-6 │ Galß-4GlcNAcß-2Manα-6 │ 44 NA4 Manß-4GlcNAcß-4GlcNAc 4179 15 │ Galß-4GlcNAcß-4Manα-3 │ Galß-4GlcNAcß-2 Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 45 NA2F-AO Manß-4GlcNAcß-4GlcNAc-AO 1063 94 │ Galß-4GlcNAcß-2Manα-3 NeuAcα-3Galß-4GlcNAcß-2Manα-6 │ Manß-4GlcNAcß-4GlcNAc 46 A3 │ 14 0 NeuAcα-3Galß-4GlcNAcß-4Manα-3 │ NeuAcα-6Galß-4GlcNAcß-2 NeuAcα-3Galß-4GlcNAcß-2Manα-6 Fucα-6 │ │ 47 A2F(2-3) Manß-4GlcNAcß-4GlcNAc 4 7 │ NeuAcα-3Galß-4GlcNAcß-2Manα-3 NeuAcα-6Galß-4GlcNAcß-2Manα-6 │ 48 A2(2-6) Manß-4GlcNAcß-4GlcNAc 0 2 │ NeuAcα-6Galß-4GlcNAcß-2Manα-3 NeuAcα-Galß-4GlcNAcß-2Manα-6 │ 49 AGP-Bi-Ac2 Manß-4GlcNAcß-4GlcNAc 18 1 │ NeuAcα-Galß-4GlcNAcß-2Manα-3 NeuGcα-Galß-4GlcNAcß-2Manα-6 ? │ 50 AGP-Bi-AcGc Manß-4GlcNAcß-4GlcNAc 3 5 │ NeuAcα-Galß-4GlcNAcß-2Manα-3 NeuGcα-Galß-4GlcNAcß-2Manα-6 │ 51 AGP-Bi-Gc2 Manß-4GlcNAcß-4GlcNAc 0 5 │ NeuGcα-Galß-4GlcNAcß-2Manα-3 Group C: Ganglioside-related GalNAcß-4Galß-4Glcß-Cer 52 SM2 │ 2097 113 SU-3 SU-3GalNAcß-4Galß-4Glcß-Cer 53 SB2 │ 32 15 SU-3 Galß-3GalNAcß-4Galß-4Glcß-Cer 54 SM1a │ 40703 304 SU-3 SU-3Galß-3GalNAcß-4Galß-4Glcß-Cer 55 SB1a │ 3 6 SU-3 56 Asialo-GM2 GalNAcβ-4Galβ-4Glcβ-Cer 15 24 GalNAcβ-4Galβ-4Glcβ-Cer 57 GM2 0 3 │


NeuAcα-3

58 Asialo-GM1 Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer 2963 161

59 Asialo-GM1-Tetra Galβ-3GalNAcβ-4Galβ-4Glc 5403 298

60 GM1 Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-3 20021 876

61 GM1-penta Galβ-3GalNAcβ-4Galβ-4Glc │ NeuAcα-3 29569 239

62 GM1(Gc) Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuGcα-3 32137 680

63 GM1(Gc)-penta Galβ-3GalNAcβ-4Galβ-4Glc │ NeuGcα-3 17262 999

64 GD1a NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-3 370 5

65 GD1a-hexa NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glc │ NeuAcα-3 4 20

66 GalNAc-GD1a(Ac,Gc) GalNAcβ-4Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ │ NeuGcα-3 NeuAcα-3; GalNAcβ-4Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ │ NeuAcα-3 NeuGcα-3 0 15

67 GD1b Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-8NeuAcα-3 18783 601

68 GD2 GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-8NeuAcα-3 0 4

69 GD3-tetra NeuAcα-8NeuAcα-3Galβ-4Glc 20 25

70 GD3-tetra-AO NeuAcα-8NeuAcα-3Galβ-4Glc-AO 11 6

71 GT1a NeuAcα-8NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-3 51 25

72 GT1b NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-8NeuAcα-3 18 6

73 GQ1b NeuAcα-8NeuAcα-3Galβ-3GalNAcβ-4Galβ-4Glcβ-Cer │ NeuAcα-8NeuAcα-3 189 8

Group D: Polysialyl

74 SA2(α8) NeuAcα-8NeuAc 2 1

75 SA3(α8) NeuAcα-8NeuAcα-8NeuAc 5 2

76 SA4(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAc 0 2

77 SA5(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 8

78 SA6(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 3 1

79 SA7(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 3 9

80 SA8(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 2

81 SA9(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 0 4

82 SA10(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 11 17

83 SA11(α8) NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAcα-8NeuAc* 5 12

Group E: O-glycan related

84 GalNAc-Ser GalNAcα-Ser 0 9


85 GalNAc-Thr GalNAcα-Thr 0 8

86 B12/3 GalNAcβ-4Galβ-OX │ NeuGcα-3 61 7

87 DST NeuAcα-3Galβ-3GalNAc │ NeuAcα-6 6 3

88 Notch-1 Fucα-Thr 0 8

89 Notch-2 GlcNAcβ-3Fucα-Thr 16 11

90 Notch-3 Galβ-4GlcNAcβ-3Fucα-Thr 2519 50

91 Man-Ser Manα-Ser 1 14

92 Man-Thr Manα-Thr 0 2

93 Man-Ser-Succ Manα-Ser-Succ 5 22

94 Man-Thr-Succ Manα-Thr-Succ 0 5

Group F: Miscellaneous

95 Glc Glc 0 12

96 Glc-AO Glc-AO 0 2

97 Rha Rha 2 2

98 Rha-AO Rha-AO 27 11

99 Fuc Fuc 3 28

100 Fuc-AO Fuc-AO 129 18

101 Man Man 0 2

102 Man-AO Man-AO 0 4

103 (6P)-Man P-6Man 21 18

104 (6P)-Man-AO P-6Man-AO 17 10

105 GN GlcNAc 0 11

106 GN-AO GlcNAc-AO 22 13

107 GalNAc GalNAc 3 24

108 GalNAc-AO GalNAc-AO 27 7

109 Gal Gal 0 1

110 Gal-AO Gal-AO 0 5

111 NeuAc NeuAc 3 8

112 NeuAc-AO NeuAc-AO 16 23

113 NeuGc NeuGc 0 27

114 NeuGc-AO NeuGc-AO 0 31

115 Hep-Di IS ΔUA-4GlcNS │ │ SU-2 6-SU 5 18

116 Hep-Di-IS-AO ΔUA-4GlcNS-AO │ │ SU-2 6-SU 0 16

117 Glc2(α2) Glcα-2Glc 0 16

118 Fuc-GlcNAc Fucα-6GlcNAc 0 7

Table A8: carbohydrate microarray data for TgMIC4-A5. Results correspond to 5 fmol spots. The name and topological structure of each probe are provided, alongside the fluorescence intensity and error-bound (standard deviation) of adhered protein, in arbitrary fluorescence units (AFU). Gal = galactose, Glc = glucose, GalNAc = N-acetylgalactosamine, GlcNAc = N-acetylglucosamine, SU = sulphate, NeuAc = N-acetylneuraminic acid, NeuGc = N-glycolylneuraminic acid, Cer = ceramide, AO = aminooxy-lipid conjugate, Fuc = fucose, Man = mannose, P = phosphate, Rha = rhamnose, Ser = serine, Thr = threonine, Succ = succinate.


A9. The molecular structures of TgMIC4-A5 ligands.



A10. Calculation of dissociation constants, Kd, for apple5/carbohydrate interactions using NMR chemical shift perturbations.

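For a 1:1 protein/ligand interaction:

$$\mathrm{P} + \mathrm{L} \rightleftharpoons \mathrm{PL}$$

the dissociation constant, $K_d$, is defined as

$$K_d = \frac{[\mathrm{P}][\mathrm{L}]}{[\mathrm{PL}]} \qquad \text{(Equation A10.1)}$$

where [P], [L] and [PL] are the respective concentrations of free protein, free ligand and protein/ligand complex at equilibrium. The total protein and ligand concentrations at point (n) in the titration are given by

$$[\mathrm{P}]_n = [\mathrm{P}] + [\mathrm{PL}] \quad \text{and} \quad [\mathrm{L}]_n = [\mathrm{L}] + [\mathrm{PL}]$$

$$\Rightarrow \quad [\mathrm{P}] = [\mathrm{P}]_n - [\mathrm{PL}] \quad \text{and} \quad [\mathrm{L}] = [\mathrm{L}]_n - [\mathrm{PL}]$$

Substitution of these into equation A10.1 gives

$$K_d = \frac{([\mathrm{P}]_n - [\mathrm{PL}])([\mathrm{L}]_n - [\mathrm{PL}])}{[\mathrm{PL}]}$$

expansion of which gives

$$[\mathrm{PL}]^2 - ([\mathrm{P}]_n + [\mathrm{L}]_n + K_d)[\mathrm{PL}] + [\mathrm{P}]_n[\mathrm{L}]_n = 0$$

This takes the general form of a quadratic equation, $ax^2 + bx + c = 0$, where $x = [\mathrm{PL}]$, $a = 1$, $b = -([\mathrm{P}]_n + [\mathrm{L}]_n + K_d)$ and $c = [\mathrm{P}]_n[\mathrm{L}]_n$, and its roots can be determined using the standard formula:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Only the smaller root is physically meaningful, because [PL] cannot exceed the total protein or ligand concentration. This gives

$$[\mathrm{PL}] = \frac{([\mathrm{P}]_n + [\mathrm{L}]_n + K_d) - \sqrt{([\mathrm{P}]_n + [\mathrm{L}]_n + K_d)^2 - 4[\mathrm{P}]_n[\mathrm{L}]_n}}{2} \qquad \text{(Equation A10.2)}$$

For a protein/ligand interaction which occurs in fast exchange, the ratio of bound-to-total protein is proportional to the observed chemical shift perturbation at point (n) in the titration, i.e. given by

$$\frac{[\mathrm{PL}]}{[\mathrm{P}]_n} = \frac{\Delta\delta_n}{\Delta\delta_{max}}$$

where $\Delta\delta_n$ is the difference between the chemical shift at point (n) and that of the unbound state, and $\Delta\delta_{max}$ is the difference in chemical shift between the unbound and saturated states. Substituting this into equation A10.2 gives

$$\frac{\Delta\delta_n}{\Delta\delta_{max}}[\mathrm{P}]_n = \frac{([\mathrm{P}]_n + [\mathrm{L}]_n + K_d) - \sqrt{([\mathrm{P}]_n + [\mathrm{L}]_n + K_d)^2 - 4[\mathrm{P}]_n[\mathrm{L}]_n}}{2}$$

This equation can be rearranged to give:

$$\Delta\delta_n = \Delta\delta_{max}\,\frac{([\mathrm{P}]_n + [\mathrm{L}]_n + K_d) - \sqrt{([\mathrm{P}]_n + [\mathrm{L}]_n + K_d)^2 - 4[\mathrm{P}]_n[\mathrm{L}]_n}}{2[\mathrm{P}]_n}$$

The incremental addition of ligand to protein during titration leads to accumulation of multiple $\Delta\delta_n$ points, enabling determination of the $K_d$ as described in chapter 5.4.3.1.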
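As an illustration only, and not the fitting procedure used in chapter 5.4.3.1, the following Python sketch shows one way a Kd could be estimated from a set of Δδn points. It assumes the fast-exchange 1:1 model described above, in which Δδn equals Δδmax multiplied by the bound fraction [PL]/[P]n, with [PL] taken as the physical root of the quadratic in [PL]. The function names, the Kd search range and the synthetic titration values (Kd = 1 mM, Δδmax = 0.25 ppm) are hypothetical; only the ~200 µM protein concentration and the 0 to 6 molar-equivalent range echo the titrations in appendix A11.

```python
import math


def fraction_bound(p_tot, l_tot, kd):
    """Return [PL]/[P]n for 1:1 binding (all concentrations in mol/L)."""
    s = p_tot + l_tot + kd
    # Physical (smaller) root of [PL]^2 - s*[PL] + [P]n*[L]n = 0, divided by [P]n.
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)


def _profile(log_kd, p_tots, l_tots, shifts):
    """Best delta_max and residual sum of squares for a fixed log10(Kd)."""
    kd = 10.0 ** log_kd
    f = [fraction_bound(p, l, kd) for p, l in zip(p_tots, l_tots)]
    # The model is linear in delta_max, so its least-squares value is closed-form.
    dmax = sum(fi * yi for fi, yi in zip(f, shifts)) / sum(fi * fi for fi in f)
    sse = sum((yi - dmax * fi) ** 2 for fi, yi in zip(f, shifts))
    return dmax, sse


def fit_kd(p_tots, l_tots, shifts, log_lo=-7.0, log_hi=-1.0, step=0.01):
    """Fit Kd and delta_max: coarse grid in log10(Kd), then golden-section refinement."""
    n = int(round((log_hi - log_lo) / step))
    grid = [log_lo + i * step for i in range(n + 1)]
    best = min(grid, key=lambda x: _profile(x, p_tots, l_tots, shifts)[1])
    a, b = best - step, best + step
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c, d = b - g * (b - a), a + g * (b - a)
        if _profile(c, p_tots, l_tots, shifts)[1] < _profile(d, p_tots, l_tots, shifts)[1]:
            b = d
        else:
            a = c
    log_kd = 0.5 * (a + b)
    dmax, sse = _profile(log_kd, p_tots, l_tots, shifts)
    return 10.0 ** log_kd, dmax, sse


# Synthetic, noise-free titration (hypothetical values, not thesis data).
protein = [200e-6] * 5
ligand = [0.0, 200e-6, 400e-6, 800e-6, 1200e-6]  # 0, 1, 2, 4, 6 molar equivalents
observed = [0.25 * fraction_bound(p, l, 1e-3) for p, l in zip(protein, ligand)]
kd_fit, dmax_fit, sse_fit = fit_kd(protein, ligand, observed)
print(f"Kd = {kd_fit:.3e} M, delta_max = {dmax_fit:.3f} ppm")
```

Because the model is linear in Δδmax, only Kd needs a numerical search, which keeps the sketch free of external fitting libraries. With real data, per-residue noise and dilution of the protein during the titration would need to be accounted for, and a dedicated non-linear least-squares routine would normally be preferred.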


A11. Additional NMR titration data.

The NMR titrations of TgMIC4-A5 with lactose, N-acetyl-D-lactosamine, galacto-N-biose and 3’-sialyl-N-acetyl-D-lactosamine are depicted in figures A11.1 to A11.4.

Figure A11.1: NMR titration of TgMIC4-A5 with lactose. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black), 1 (green), 2 (magenta), 4 (blue) and 6 (red) molar equivalents (Meq) of lactose, at which point the protein was saturated. Spectra were acquired at a protein concentration of ~200 µM. The interaction occurs in fast-to-intermediate exchange, with many resonances broadened at pre-saturating lactose concentrations.


Figure A11.2: NMR titration of TgMIC4-A5 with N-acetyl-D-lactosamine. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black), 1 (green), 2 (magenta), 4 (blue) and 6 (red) molar equivalents (Meq) of N-acetyl-D-lactosamine, at which point the protein was saturated. Spectra were acquired at a protein concentration of ~200 µM. The interaction occurs in fast-to-intermediate exchange, with a selection of resonances strongly broadened at pre-saturating N-acetyl-D-lactosamine concentrations.


Figure A11.3: NMR titration of TgMIC4-A5 with galacto-N-biose. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black), 1 (green), 3 (magenta), 4 (blue) and 5 (red) molar equivalents (Meq) of galacto-N-biose, at which point the protein was saturated. Spectra were acquired at a protein concentration of ~200 µM. The interaction occurs in intermediate exchange, with most resonances undergoing broadening, often beyond detection, at pre-saturating galacto-N-biose concentrations. As in the titrations of the other disaccharides, the S18 bound-state resonance is exchange-broadened.


Figure A11.4: NMR titration of TgMIC4-A5 with 3’-sialyl-N-acetyl-D-lactosamine. The superimposed 1H-15N HSQC spectra of TgMIC4-A5 in the presence of 0 (black) and 10 (red) molar equivalents (Meq) of 3’-sialyl-N-acetyl-D-lactosamine. Spectra were acquired at a protein concentration of ~200 µM. No chemical shift perturbations occur, demonstrating that the molecules do not interact under these conditions.


Table of References

Aikawa, M., Miller, L. H., Johnson, J., & Rabbege, J. (1978). Erythrocyte entry by malarial parasites. A moving junction between erythrocyte and parasite. J. Cell Biol., 77(1), 72-82.

Ajioka, J. W., Boothroyd, J. C., Brunk, B. P., Hehl, A., Hillier, L., Manger, I. D., Marra, M., Overton, G. C., Roos, D. S., Wan, K-L, Waterson, R., & Sibley L. D. (1998). Gene Discovery by EST Sequencing in Toxoplasma gondii. Reveals Sequences Restricted to the Apicomplexa. Genome Research, 8, 18-28.

Ajioka, J. W., Fitzpatrick, J. M., & Reitter, C. P. (2001). Toxoplasma gondii genomics: shedding light on pathogenesis and chemotherapy. Exp. Rev. Mol. Med., 1-19.

Alexander, D., Mital, J., Ward, G., Bradley, P., & Boothroyd, J. (2005). Identification of the Moving Junction Complex of Toxoplasma gondii: A Collaboration between Distinct Secretory Organelles. PLoS Pathogens, 1, 137-149.

Angulo, J., Rademacher, C., Biet, T., Benie, A. J., Blume, A., Peters, H., Palcic, M., Parra, F., & Peters, T. (2006). NMR analysis of carbohydrate–protein interactions. Methods Enzymol., 416, 12–30.

Angulo, Jesús, Díaz, I., Reina, J. J., Tabarani, G., Fieschi, F., Rojo, J., & Nieto, P. M. (2008). Saturation transfer difference (STD) NMR spectroscopy characterization of dual binding mode of a mannose disaccharide to DC-SIGN. ChemBioChem, 9(14), 2225-7.

Asensio, J. L., Canada, F. J., Bruix, M., Rodriguez-Romero, A., & Jimenez-Barbero, J. (1995). The interaction of hevein with N-acetylglucosamine-containing oligosaccharides. Solution structure of hevein complexed to chitobiose. Eur. J. Biochem., 230(2), 621-33.

Aslanidis, C., & De Jong, P. J. (1990). Ligation-independent cloning of PCR products (LIC-PCR). Nucleic Acids Res., 18(20), 6069–6074. Oxford Univ Press.

Aue, W.P, Bartholdi, E., & Ernst, R.R. (1976). Two dimensional spectroscopy. Application to nuclear magnetic resonance. J Chem Phys., 64(5), 2229–2246.

Bai, T., Becker, M., Gupta, A., Strike, P., Murphy, V. J., Anders, R. F., & Batchelor, A. H. (2005). Structure of AMA1 from Plasmodium falciparum reveals a clustering of polymorphisms that surround a conserved hydrophobic pocket. Proc. Natl. Acad. Sci. USA, 102(36), 12736-41.

Bax, A. D., & Davis, D. G. (1985). MLEV-17-Based Two-Dimensional Homonuclear Magnetization Transfer Spectroscopy. J. Magn. Res., 360, 355-360.

Bax, A., Clore, G.M., & Gronenborn, A.M. (1990). 1H-1H correlation via isotropic mixing of 13C magnetisation, a new three-dimensional approach for assigning 1H and 13C spectra of 13C-enriched proteins. J Magn Res., 88, 425-431.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., & Bourne, P. E. (2000). The Protein Data Bank. Nuc. Acids Res., 28, 235-242.

Bessette, P. H., Åslund, F., Beckwith, J., & Georgiou, G. (1999). Efficient Folding of Proteins with Multiple Disulfide Bonds in the Escherichia coli Cytoplasm. Proc. Natl. Acad. Sci. USA, 96, 13703-13708.

281

Besteiro, S., Michelin, A., Poncet, J., Dubremetz, J.-F., & Lebrun, M. (2009). Export of a Toxoplasma gondii Rhoptry Neck Protein Complex at the Host Cell Membrane to Form the Moving Junction during Invasion. PLoS Pathogens, 5, 1-14.

Bhattacharya, A., Tejero, R., Montelione, G. T., Wood, R., & Medical, J. (2007). Evaluating Protein Structures Determined by Structural Genomics Consortia. Proteins: Struc. Func. & Bioinf., 66, 778- 795.

Black, M., & Boothroyd, J. (2000). Lytic Cycle of Toxoplasma gondii. Microbiol. Mol. Biol. Rev., 64, 607- 623.

Blackman, M J, & Bannister, L. H. (2001). Apical organelles of Apicomplexa: biology and isolation by subcellular fractionation. Mol. Biochem. Parasitol., 117(1), 11-25.

Blumenschein, T., Friedrich, N., Childs, R., Saouros, S., Carpenter, E., Campanero-Rhodes, M., Simpson, P., Chair, W., Koutroukides, T., Blakman, M. J., Feizi, T., Soldati-Favre, D., & Matthews, S. (2007). Atomic resolution insight into host cell recognition by Toxoplasma gondii. EMBO J, 26, 2808-20.

Bohne, W., Heesemann, J., & Gross, U. (1993). Induction of bradyzoite-specific Toxoplasma gondii antigens in gamma interferon-treated mouse macrophages. Infect. Immun., 61(3), 1141-5.

Boothroyd, John, & Dubremetz, J.-F. (2008). Kiss and spit: the dual roles of Toxoplasma rhoptries. Nature Rev. Microbiol., 6, 79-88.

Bornand, J., & Piguet, J. (1991). Toxoplasma infestation: prevalence, risk of congenital infection and development in Geneva from 1973 to 1987]. Schweiz. Med. Wochenschr., 121(1-2), 21.

Braga, L. L., Ninomiya, H., Mccoy, J. J., Eacker, S., Wiedmer, T., Pham, C., Wood, S., Sims, P. J., & Petri Jr, W. A. (1992). Inhibition of the Galactose-specific Adhesin of Entamoeba histolytica. J. Clin. Invest., 90, 1131-1137.

Brecht, S, Carruthers, V., Ferguson, D., Giddings, O., Wang, G., Jakle, U., Harper, J., Sibley, L. D., & Soldati, D. (2001). The Toxoplasma Micronemal Protein MIC4 is an Adhesin Composed of Six Conserved Apple Domains. J Biol. Chem., 276, 4119-27.

Brossier, F., & Sibley, D. (2005). Toxoplasma gondii: microneme protein MIC2. Int. J Biochem. Cell Biol., 37, 2266-2272.

Brossier, F., Jewett, T. J., Lovett, J. L., & Sibley, L. D. (2003). C-terminal Processing of the Toxoplasma Protein MIC2 Is Essential for Invasion into Host Cells. Biochemistry, 278(8), 6229 -6234.

Brossier, F., Jewett, T. J., Sibley, L. D., & Urban, S. (2005). A spatially localized rhomboid protease cleaves cell surface adhesins essential for invasion by Toxoplasma. Proc. Natl. Acad. Sci. USA, 102(11), 4146-51.

Brown, P J, Billington, K. J., Bumstead, J. M., Clark, J. D., & Tomley, F. M. (2000). A microneme protein from Eimeria tenella with homology to the Apple domains of coagulation factor XI and plasma pre- kallikrein. Mol. Biochem. Parasitol., 107(1), 91-102.

Brown, Philip J, Mulvey, D., Potts, J. R., Tomley, F. M., & Campbell, I. D. (2003). Solution structure of a PAN module from the apicomplexan parasite Eimeria tenella. J Struct. Func. Genomics, 4, 227-234.

282

Brunger, A. T. (2007). Version 1.2 of the Crystallography and NMR system. Nature Protocols, 2(11), 2728- 33.

Brydges, S. D., Harper, J. M., Parussini, F., Coppens, I., & Carruthers, V. (2008). A transient forward- targeting element for microneme-regulated secretion in Toxoplasma gondii. Biol. Cell, 100(4), 253- 64.

Brünger, Axel T, Adams, P. D., Clore, G. M., Delano, W. L., & Gros, P. (1998). Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination. Med. Microbiol. & Immunol., 54, 905-921.

Bubb, W. a. (2003). NMR spectroscopy in the study of carbohydrates: Characterizing the structural complexity. Con. Magn. Res., 19A(1), 1-19.

Buguliskis, J. S., Brossier, F., Shuman, J., & Sibley, L. D. (2010). Rhomboid 4 (ROM4) affects the processing of surface adhesins and facilitates host cell invasion by Toxoplasma gondii. PLoS Pathogens, 6(4), e1000858.

Buxton, D., & Innes, E. (1995). A commercial vaccine for ovine toxoplasmosis. Parasitology, 110, 11. Cambridge Univ Press.

Carruthers, VB. (1999). Armed and dangerous: Toxoplasma gondii uses an arsenal of secretory proteins to infect host cells. Parasitol. Int., 48, 1-10.

Carruthers, VB. (2002). Host cell invasion by the opportunistic pathogen Toxoplasma gondii. Acta Tropica, 81, 111- 122.

Carruthers, VB, & Boothroyd, J. (2007). Pulling together: an integrated model of Toxoplasma cell invasion. Curr. Opin. Microbiol., 10, 83-9.

Carruthers, VB, & Sibley, L. (1997). Sequential protein secretion from three distinct organelles of Toxoplasma gondii accompanies invasion of human fibroblasts. Eur. J. Cell Biol., 73, 114-23.

Carruthers, VB, & Tomley, F. M. (2008). Receptor-ligand interaction and invasion: Microneme proteins in apicomplexans. Subcell. Biochem., 47, 33-45.

Carruthers, VB, Giddings, O. K., & Sibley, L. D. (1999). Secretion of micronemal proteins is associated with toxoplasma invasion of host cells. Cell Microbiol., 1, 225-235.

Casset, F., Imberty, A., Pérez, S., Etzler, M. E., Paulsen, H., & Peters, T. (1997). Transferred nuclear Overhauser enhancement (NOE) and rotating-frame NOE experiments reflect the size of the bound segment of the Forssman pentasaccharide in the binding site of Dolichos biflorus lectin. Eur. J. Biochem., 244(1), 242-50.

Cavanagh, J., Fairbrother, W., Palmer, A., Skelton, N., & Rance, M. (2007). Protein NMR Spectrscopy: Principles & Practise (2nd ed.). Academic Press.

Charron, A. J., & Sibley, L. D. (2004). Molecular partitioning during host cell penetration by Toxoplasma gondii. Traffic, 5(11), 855-67.

Claridge, T. (2009). High-Resolution Techniques in Organic Chemistry (2nd ed.). Elsevier.

283

Clore, G., & Gronenborn, A. (1982). Theory and applications of the transferred nuclear Overhauser effect to the study of the conformations of small ligands bound to proteins. J. Magn. Res., 417, 402-417.

Cooper, N. R. (1991). Complement evasion strategies of mlcroorgamsms. Immunol. Today, 12(0), 327- 331.

Cornilescu, G., Delaglio, F., & Bax, A. (1999). Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol. NMR, 13, 289-302.

Crawford, J., Tonkin, M. L., Grujic, O., & Boulanger, M. J. (2010). Structural characterization of apical membrane antigen 1 (AMA1) from Toxoplasma gondii. J. Biol. Chem., 285(20), 15644-52.

Crocker, P. R., & Feizi, T. (1996). Carbohydrate recognition systems: functional triads in cell-cell interactions. Curr. Opin. Struc. Biol., 6(5), 679-91.

Cérède, O., Dubremetz, J. F., Soête, M., Deslée, D., Vial, H., Bout, D., & Lebrun, M. (2005). Synergistic role of micronemal proteins in Toxoplasma gondii virulence. J. Exp. Med., 201(3), 453-63. 2

Dames, S. a, Mulet, J. M., Rathgeb-Szabo, K., Hall, M. N., & Grzesiek, S. (2005). The solution structure of the FATC domain of the protein kinase target of rapamycin suggests a role for redox-dependent structural and cellular stability. J. Biol. Chem., 280(21), 20558-64.

Dautu, G., Munyaka, B., Carmen, G., Zhang, G., Omata, Y., Xuenan, X., & Igarashi, M. (2007). Toxoplasma gondii: DNA vaccination with genes encoding antigens MIC2, M2AP, AMA1 and BAG1 and evaluation of their immunogenic potential. Exp. Parasitol., 116(3), 273-82.

Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, J. N., Kapral, G. J., Wang, X., Murray, L. W., Arendall III, B., Snoeynik, J., Richardson, J. S., & Richardson D. C. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res., 35(Web Server issue), W375- 83. de Marco, A. (2009). Strategies for Successful Recombinant Expression of Disulfide Bond-Dependent Proteins in Escherichia Coli. Microb. Cell Fact., 8, 26. de Medeiros, B. C., de Medeiros, C. R., Werner, B., Loddo, G., Pasquini, R., & Bleggi-Torres, L. F. (2001). Disseminated toxoplasmosis after bone marrow transplantation: report of 9 cases. Transpl. Infect. Dis., 3(1), 24-8. de Vries, S., Dijk, M. V., Vries, S. J. D., Dijk, A. D. J. V., Thureau, A., Hsu, V., Wassenaar, T., & Bonvin, A. M. J. J. (2007). HADDOCK versus HADDOCK: New features and performance of HADDOCK2.0 on the CAPRI targets. Proteins: Struc. Func. & Bioinf., 726-733.

De Paschale, M., Agrappi, C., Clerici, P., Mirri, P., Manco, M. T., Cavallari, S., & Vigano, E. F. (2008). Seroprevalence and incidence of Toxoplasma gondii infection in the Legnano area of Italy. Clin. Microbiol. Infect., 14(2), 186-189.

Deane, J. E., Graham, S. C., Kim, N. N., Stein, P. E., McNair, R., Cachón-González, M. B., Cox, T. M., & Read, R. J. (2011). Insights into Krabbe disease from structures of galactocerebrosidase. Proc. Natl. Acad. Sci. USA, 108(37), 15169-73.

284

Delaglio, F, Grzesiek, S., Vuister, G. W., Zhu, G., Pfeifer, J., & Bax, A. (1995). NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR, 6(3), 277-93.

Dereeper, a, Guignon, V., Blanc, G., Audic, S., Buffet, S., Chevenet, F., Dufayard, J.-F., Guindon, S., Lefort, V., Lescot, M., Claverie, J-M, & Gascuel, O. (2008). Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res., 36(Web Server issue), W465-9.

Derman, A., Prinz, W., Belin, D., & Beckwith, J. (1993). Mutations that allow disulfide bond formation in the cytoplasm of Escherichia coli. Science, 262, 1744-47.

Desmonts, G., & Couvreur, J. (1974). Toxoplasmosis in pregnancy and its transmission to the fetus. Bull. N. Y. Acad. Med., 50(2), 146-59.

Di Cristina, M, Ghouze, F., Kocken, C. H., Naitza, S., Cellini, P., Soldati, D., Thomas, a W., & Crisanti, A. (1999). Transformed Toxoplasma gondii tachyzoites expressing the circumsporozoite protein of Plasmodium knowlesi elicit a specific immune response in rhesus monkeys. Infect. Immun., 67(4), 1677-82.

Di Cristina, Manlio, Spaccapelo, R., Soldati, D., Crisanti, A., & Bistoni, F. (2000). Two Conserved Amino Acid Motifs Mediate Protein Targeting to the Micronemes of the Apicomplexan Parasite Toxoplasma gondii. Mol. Cell. Biol., 20, 7332-7341.

Diza, E., Frantzidou, F., Souliou, E., Arvanitidou, M., Gioula, G., & Antoniadis, A. (2005). Seroprevalence of Toxoplasma gondii in northern Greece during the last 20 years. Clin. Microbiol. Infect., 11(9), 719-23.

Dolan, S. a, Miller, L. H., & Wellems, T. E. (1990). Evidence for a switching mechanism in the invasion of erythrocytes by Plasmodium falciparum. J. Clin. Invest., 86(2), 618-24.

Dominguez, C., Boelens, R., & Bonvin, A. M. J. J. (2003). HADDOCK: A Protein-Protein Docking Approach Based on Biochemical or Biophysical Information. J. Am. Chem. Soc., 125, 1731-1737.

Dubey, J. P. (2004). Toxoplasmosis - a waterborne zoonosis. Vet. Parasitol., 126(1-2), 57-72.

Dubey, J. P., Lindsay, D. S., & Speer, C. A. (1998). Structures of Toxoplasma gondii tachyzoites, bradyzoites, and sporozoites and biology and development of tissue cysts. Clin. Microbiol. Rev., 11(2), 267-99.

Dubremetz, J.-F., & Dissous, C. (1980). Characteristic Proteins of Micronemes and Dense Granules from Sarcocystis tenella Zoites (Protozoa, Coccidia). Mol. Biochem. Parasitol., 1, 279-289.

Eddy, B., Borman, G., Grubbs, G., & Young, R. (1962). Identification of the oncogenic substance in rhesus monkey kidney cell culture as simian virus 40. Virology, 17, 65-75.

El Hajj, H., Garcia-Reguet, N., Papoin, J., Soete, M., Dubremetz, J.-F., & Lebrun, M. (2008). Molecular signals in the trafficking of Toxoplasma gondii protein MIC3 to the micronemes. Eukaryotic Cell, 7(6), 1019-1028.

Emsley, J., McEwan, P. A., & Gailani, D. (2010). Structure and function of factor XI. Blood, 115(13), 2569-2577.

Espinosa, J. F., Asensio, J. L., García, J. L., Laynez, J., Bruix, M., Wright, C., Siebert, H. C., Gabius, H.-J., Canada, F. J., & Jimenez-Barbero, J. (2000). NMR investigations of protein-carbohydrate interactions: binding studies and refined three-dimensional solution structure of the complex between the B domain of wheat germ agglutinin and N,N',N''-triacetylchitotriose. Eur. J. Biochem., 267(13), 3965-78.

Evengård, B., Petersson, K., Engman, M. L., Wiklund, S., Ivarsson, S. A., Teär-Fahnehjelm, K., Forsgren, M., Gilbert, R., & Malm, G. (2001). Low incidence of toxoplasma infection during pregnancy and in newborns in Sweden. Epidemiol. Infect., 127(1), 121-7.

Feizi, T., Stoll, M., Yuen, C.-T., Chai, W., & Lawson, A. (1994). Neoglycolipids: Probes of Oligosaccharide Structure, Antigenicity and Function. Methods Enzymol., 230, 484-519.

Ferguson, D., Brecht, S., & Soldati, D. (2000). The microneme protein MIC4, or an MIC4-like protein, is expressed within the macrogamete and associated with oocyst wall formation in Toxoplasma gondii. Int. J. Parasitol., 30, 1203-09.

Fesik, S. W., Eaton, H. L., Olejniczak, E. T., & Zuiderweg, E. R. P. (1989). 2D and 3D NMR spectroscopy employing 13C-13C magnetisation transfer by isotropic mixing. Spin system identification in large proteins. J. Am. Chem. Soc., 112(2), 886-889.

Fielding, L. (2007). NMR methods for the determination of protein–ligand dissociation constants. Prog. NMR Spec., 51(4), 219-242.

Fletcher, C. M., Jones, D. N., Diamond, R., & Neuhaus, D. (1996). Treatment of NOE constraints involving equivalent or nonstereoassigned protons in calculations of biomacromolecular structures. J. Biomol. NMR, 8(3), 292-310.

Fotinou, C., Emsley, P., Black, I., Ando, H., Ishida, H., Kiso, M., Sinha, K., Fairweather, N., & Isaacs, N. W. (2001). The crystal structure of tetanus toxin Hc fragment complexed with a synthetic GT1b analogue suggests cross-linking between ganglioside receptors and the toxin. J. Biol. Chem., 276, 32274-81.

Fourmaux, M., Achbarou, A., Mercereau-Puijalon, O., Biderre, C., Briche, I., Loyens, A., Odberg-Ferragut, C., Camus, D., & Dubremetz, J.-F. (1996). The MIC1 microneme protein of Toxoplasma gondii contains a duplicated receptor-like domain and binds to host cell surface. Mol. Biochem. Parasitol., 83, 201-210.

Frank, M., Lütteke, T., & von der Lieth, C.-W. (2007). GlycoMapsDB: a database of the accessible conformational space of glycosidic linkages. Nucleic Acids Res., 35(Database issue), 287-90.

Friedrich, N., Santos, J. M., Liu, Y., Palma, A. S., Leon, E., Saouros, S., Kiso, M., Blackman, M. J., Matthews, S., Feizi, T., & Soldati-Favre, D. (2010). Members of a novel protein family containing microneme adhesive repeat domains act as sialic acid-binding lectins during host cell invasion by apicomplexan parasites. J. Biol. Chem., 285(3), 2064-76.

Fukui, S., Feizi, T., Galustian, C., Lawson, A. M., & Chai, W. (2002). Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions. Nature Biotechnology, 20(10), 1011-7.

Fusco, G., Rinaldi, L., Guarino, A., Proroga, Y. T. R., Pesce, A., Giuseppina, D. M., & Cringoli, G. (2007). Toxoplasma gondii in sheep from the Campania region (Italy). Vet. Parasitol., 149, 271-274.

Gaiser, O. J., Piotukh, K., Ponnuswamy, M. N., Planas, A., Borriss, R., & Heinemann, U. (2006). Structural basis for the substrate specificity of a Bacillus 1,3-1,4-beta-glucanase. J. Mol. Biol., 357(4), 1211-25.

Gaji, R. Y., Flammer, H. P., & Carruthers, V. (2011). Forward targeting of Toxoplasma gondii proproteins to the micronemes involves conserved aliphatic amino acids. Traffic, 12(7), 840-53.

Garcia-Reguet, N., Lebrun, M., Fourmaux, M.-N., Mercereau-Puijalon, O., Mann, T., Beckers, C., Samyn, B., Van Beeumen, J., Bout, D., & Dubremetz, J-F. (2000). The microneme protein MIC3 of Toxoplasma gondii is a secretory adhesin that binds to both the surface of the host cells and the surface of the parasite. Cell. Microbiol., 2, 353-364.

Garnett, J., Liu, Y., Leon, E., Allman, S. A., Friedrich, N., Saouros, S., Curry, S., Soldati-Favre, D., Davis, B. G., Feizi, T., & Matthews, S. (2009). Detailed insights from microarray and crystallographic studies into carbohydrate recognition by microneme protein 1 (MIC1) of Toxoplasma gondii. Protein Sci., 18, 1935-1947.

Gaskell, E., Smith, J. E., Pinney, J. W., Westhead, D. R., & McConkey, G. (2009). A unique dual activity amino acid hydroxylase in Toxoplasma gondii. PLoS One, 4(3), e4801.

Gheysen, K., Mihai, C., Conrath, K., & Martins, J. C. (2008). Rapid identification of common hexapyranose monosaccharide units by a simple TOCSY matching approach. Chem. Eur. J., 14(29), 8869-78.

Gomes, J. A. N. F. (2001). Aromaticity and ring currents. Chem. Rev., 101(5), 1349-1384.

Grey, M. J., Wang, C., & Palmer, A. G. (2003). Disulfide bond isomerization in basic pancreatic trypsin inhibitor: multisite chemical exchange quantified by CPMG relaxation dispersion and chemical shift modeling. J. Am. Chem. Soc., 125(47), 14324-35.

Gronwald, W., Bomke, J., Maurer, T., Domogalla, B., Huber, F., Schumann, F., Kremer, W., Fink, F., Rysiok, T., Frech, M., & Kalbitzer, H. R. (2008). Structure of the leech protein saratin and characterization of its binding to collagen. J. Mol. Biol., 381(4), 913-27.

Grzesiek, S., & Bax, A. (1992). Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc., 114(16), 6291-6294.

Grzesiek, S., Anglister, J., & Bax, A. (1993). Correlation of backbone amide and aliphatic side-chain resonances in 13C/15N-enriched proteins by isotropic mixing of 13C magnetisation. J. Magn. Res. B, 101, 114-119.

Grzesiek, S., & Bax, A. (1993). Amino acid type determination in the sequential assignment procedure of uniformly 13C/15N-enriched proteins. J. Biomol. NMR, 3, 185-204.

Harper, J. M., Hoff, E. F., & Carruthers, V. (2004). Multimerization of the Toxoplasma gondii MIC2 integrin-like A-domain is required for binding to heparin and human cells. Mol. Biochem. Parasitol., 134(2), 201-12.

Hassell, A. M., An, G., Bledsoe, R. K., Bynum, J. M., Carter, H. L., Deng, S.-J. J., Gampe, R. T., Grisard, T. E., Madauss, K. P., Nolte, R. T., Rocque, W. J., Wang, L., Weaver, K. L., Williams, S. P., Wisely, G. B., Xu, R., & Shewchuk, L. M. (2007). Crystallization of protein-ligand complexes. Acta Cryst. D63, 63(Pt 1), 72-9.

Hehl, A. B., Lekutis, C., Grigg, M. E., Bradley, P. J., Dubremetz, J.-F., Ortega-Barria, E., & Boothroyd, J. C. (2000). Toxoplasma gondii Homologue of Plasmodium Apical Membrane Antigen 1 Is Involved in Invasion of Host Cells. Infect. Immun., 68, 7078-7086.

Hill, D., & Dubey, J. P. (2002). Toxoplasma gondii: transmission, diagnosis and prevention. Clin. Microbiol. Infect., 8(10), 634-40.

Hoffmann, B., Eichmuller, C., Steinhauser, O., & Konrat, R. (2005). Rapid assessment of protein structural stability and fold variation by NMR. Methods Enzymol., 394, 142-175.

Holliman, R. E. (1997). Toxoplasmosis, behaviour and personality. J. Infect., 35(2), 105-10.

Holm, L., & Rosenström, P. (2010). Dali server: conservation mapping in 3D. Nucleic Acids Res., 38(Web Server issue), W545-9.

Holmgren, J., Lönnroth, I., Månsson, J.-E., & Svennerholm, L. (1975). Interaction of cholera toxin and membrane GM1 ganglioside of small intestine. Proc. Natl. Acad. Sci. USA, 72(7), 2520-2524.

Homans, S. W. (1993). Conformation and dynamics of oligosaccharides in solution. Glycobiology, 3(6), 551-5.

Hore, P. (1995). Nuclear Magnetic Resonance (1st ed.). Oxford University Press.

Hu, K., Johnson, J., Florens, L., Fraunholz, M., Suravajjala, S., DiLullo, C., Yates, J., Roos, D. S., & Murray, J. M. (2006). Cytoskeletal components of an invasion machine - the apical complex of Toxoplasma gondii. PLoS Pathogens, 2(2), 121-138.

Huizinga, E. G., Schouten, A., Connolly, T. M., Kroon, J., Sixma, J. J., & Gros, P. (2001). The structure of leech anti-platelet protein, an inhibitor of haemostasis. Acta Cryst. D57, 1071–1078.

Huynh, M.-H., & Carruthers, V. (2006). Toxoplasma MIC2 is a major determinant of invasion and virulence. PLoS Pathogens, 2(8), e84.

Huynh, M.-H., Rabenau, K. E., Harper, J. M., Beatty, W. L., Sibley, L., & Carruthers, V. (2003). Rapid invasion of host cells by Toxoplasma requires secretion of the MIC2-M2AP adhesive protein complex. EMBO J., 22(9), 2082-90.

Imberty, A., & Varrot, A. (2008). Microbial recognition of human cell surface glycoconjugates. Curr. Opin. Struc. Biol., 18(5), 567-76.

Inagaki, F., Shimada, I., Kohda, D., Suzuki, A., & Bax, A. (1989). Relayed HOHAHA, a Useful Method for Extracting Subspectra of Individual Components of Sugar Chains. J. Magn. Res., 190, 186-190.

Innes, E., & Vermeulen, A. N. (2006). Vaccination as a control strategy against the coccidial parasites Eimeria, Toxoplasma and Neospora. Parasitology, 133, S145-68.

Jain, N. U., Noble, S., & Prestegard, J. H. (2003). Structural Characterization of a Mannose-binding Protein–Trimannoside Complex using Residual Dipolar Couplings. J. Mol. Biol., 328(2), 451-462.

Jain, N., Venot, A., Umemoto, K., Leffler, H., & Prestegard, J. (2001). Distance mapping of protein-binding sites using spin-labeled oligosaccharide ligands. Protein Sci., 10, 2393-2400.

Jeener, J., Meier, B. H., Bachmann, P., & Ernst, R. R. (1979). Investigation of exchange processes by two-dimensional NMR spectroscopy. J. Chem. Phys., 71, 4546-4554.

Jewett, T. J., & Sibley, L. D. (2003). Aldolase Forms a Bridge between Cell Surface Adhesins and the Actin Cytoskeleton in Apicomplexan Parasites. Mol. Cell, 11, 885-894.

Jewett, T. J., & Sibley, L. D. (2004). The Toxoplasma proteins MIC2 and M2AP form a hexameric complex necessary for intracellular survival. J Biol. Chem., 279(10), 9362-9.

Johnson, B. A., & Blevins, R. (2000). NMRView: A computer program for the visualization and analysis of NMR data. J. Biomol. NMR, 4(1994), 603-614.

Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292(2), 195-202.

Jones, J. L., Kruszon-Moran, D., Sanders-Lewis, K., & Wilson, M. (2007). Toxoplasma gondii infection in the United States, 1999-2004, decline from the prior decade. Am. J. Trop. Med. Hyg., 77(3), 405-10.

Karlsson, K. (1998). On the Character and Functions of Sphingolipids. Acta Biochim. Pol., 425, 429-438.

Karsten, V., Qi, H., Beckers, C. J. M., Reddy, A., Dubremetz, J.-F., Webster, P., & Joiner, K. A. (1998). The Protozoan Parasite Toxoplasma gondii Targets Proteins to Dense Granules and the Vacuolar Space Using Both Conserved and Unusual Mechanisms. J. Cell Biol., 141(6), 1323-1333.

Kay, L.E., Ikura, M., Tschudin, R., & Bax, A. (1990). Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J Magn Res., 89, 496-514.

Keeler, J. (2005). Understanding NMR Spectroscopy (1st ed.). John Wiley & Sons Ltd.

Kaiser, R. (1962). Use of the Nuclear Overhauser Effect in the Analysis of High‐Resolution Nuclear Magnetic Resonance Spectra. J. Chem. Phys., 39(1), 2435-2443.

Keeley, A., & Soldati, D. (2004). The glideosome: a molecular machine powering motility and host-cell invasion by Apicomplexa. Trends Cell. Biol., 14, 528-532.

Keller, N., Riesen, M., Naguleswaran, A., Vonlaufen, N., Stettler, R., Leepin, A., Wastling, J. M., & Hemphill, A. (2004). Identification and Characterization of a Neospora caninum Microneme-Associated Protein (NcMIC4) That Exhibits Unique Lactose-Binding Properties. Infect. Immun., 72(8), 4791-4800.

Kessler, H., Herm-Götz, A., Hegge, S., Rauch, M., Soldati-Favre, D., Frischknecht, F., & Meissner, M. (2008). Microneme protein 8 - a new essential invasion factor in Toxoplasma gondii. J. Cell Sci., 121(Pt 7), 947-56.

Kim, K., & Weiss, L. (2007). Toxoplasma gondii: The Model Apicomplexan. Perspectives and Methods. (1st ed.). Academic Press.

Klein, H., Loschner, B., Zyto, N., Portner, M., & Montag, T. (1998). Expression, purification, and biochemical characterization of a recombinant lectin of Sarcocystis muris (Apicomplexa) cyst merozoites. Glycoconj. J., 15, 147-153.

Kleshchenko, Y. Y., Moody, T. N., Furtak, V. A., Ochieng, J., Lima, M. F., & Villalta, F. (2004). Human Galectin-3 Promotes Trypanosoma cruzi Adhesion to Human Coronary Artery Smooth Muscle Cells. Infect. Immun., 72(11), 6717-6721.

Kur, J., Holec-Gasior, L., & Hiszczynska-Sawicka, E. (2009). Current status of toxoplasmosis vaccine development. Expert Rev. Vaccines, 8(6), 791–808.

LaVallie, E., DiBlasio, E., Kovacic, S., & Grant, K. (1993). A thioredoxin gene fusion expression system that circumvents inclusion body formation in the E. coli cytoplasm. Biotechnology, 11, 187-193.

Lagal, V., Binder, E. M., Huynh, M.-H., Kafsack, B. F. C., Harris, P. K., Diez, R., Chen, D., Cole, R. N., Carruthers, V. B., & Kim, K. (2010). Toxoplasma gondii protease TgSUB1 is required for cell surface processing of micronemal adhesive complexes and efficient adhesion of tachyzoites. Cell. Microbiol., 12(12), 1792-808.

Laliberte, J., & Carruthers, V. B. (2008). Host cell manipulation by the human pathogen Toxoplasma gondii. Cell. Mol. Life Sci., 65(12), 1900-15.

Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Applied Cryst., 26(2), 283-291.

Lee, R. T., & Lee, Y. C. (2001). Affinity enhancement by multivalent lectin-carbohydrate interaction. Glycoconj. J., 17(7-9), 543-51.

Lietha, D., Chirgadze, D., Mulloy, B., Blundell, T., & Gherardi, E. (2001). Crystal structures of NK1-heparin complexes reveal the basis for NK1 activity and enable engineering of potent agonists of the MET receptor. EMBO J., 20(20), 5543-5555.

Linge, J. P., Williams, M. A., Spronk, C. A. E. M., Bonvin, A. M. J. J., & Nilges, M. (2003). Refinement of protein structures in explicit solvent. Proteins: Struc. Func. & Genetics, 50(3), 496-506.

Liu, M. M., Yuan, Z. G., Peng, G. H., Zhou, D. H., He, X. H., Yan, C., Yin, C. C., He, Y., Lin, R. Q., Song, H. Q., & Zhu, X. Q. (2010). Toxoplasma gondii microneme protein 8 (MIC8) is a potential vaccine candidate against toxoplasmosis. Parasitol. Res., 106(5), 1079-84.

Liu, Y., Palma, A. S., & Feizi, T. (2009). Carbohydrate microarrays: key developments in glycobiology. Biol. Chem., 390(7), 647-56.

Long, J., Garner, T. P., Pandya, M. J., Craven, C. J., Chen, P., Shaw, B., Williamson, M. P., Layfield, R., & Searle, M. (2010). Dimerisation of the UBA domain of p62 inhibits ubiquitin binding and regulates NF-kappaB signalling. J. Mol. Biol., 396(1), 178-94.

Lopez, P. H. H., & Schnaar, R. L. (2009). Gangliosides in cell recognition and membrane protein regulation. Curr. Opin. Struc. Biol., 19(5), 549-57.

Lourenço, E. V., Bernardes, E., Silva, N., Mineo, J., Panunto-Castelo, A., & Roque-Barreira, M. C. (2006). Immunization with MIC1 and MIC4 induces protective immunity against Toxoplasma gondii. Microbes & Infect., 8, 1244-1251.

Lourenço, E. V., Pereira, S. R., Faça, V. M., Coelho-Castelo, A. A. M., Mineo, J. R., Roque-Barreira, M. C., Greene, L. J., & Panunto-Castelo, A. (2001). Toxoplasma gondii micronemal protein MIC1 is a lactose-binding lectin. Glycobiology, 11(7), 541.

Lovell, S. C., Davis, I. W., Arendall, W. B., III, de Bakker, P. I. W., Word, J. M., Prisant, M. G., Richardson, J. S., & Richardson, D. C. (2003). Structure Validation by C-alpha Geometry: phi, psi and C-beta Deviation. Proteins: Struc. Func. & Genetics, 50, 437-450.

Lütteke, T., Frank, M., & von der Lieth, C.-W. (2005). Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res., 33(Database issue), D242-6.

Mammen, M., Choi, S.-K., & Whitesides, G. M. (1998). Polyvalent Interactions in Biological Systems: Implications for Design and Use of Multivalent Ligands and Inhibitors. Angew. Chem. Int. Ed., 37, 2754-2794.

Maréchal, E., & Cesbron-Delauw, M. F. (2001). The apicoplast: a new member of the plastid family. Trends Plant Sci., 6(5), 200-5.

Marion, D., Kay, L. E., Sparks, S. W., Torchia, D. A., & Bax, A. (1989). Three-dimensional heteronuclear NMR of 15N-labeled proteins. J. Am. Chem. Soc., 111, 1515-1518.

Matta-Camacho, E., Kozlov, G., Trempe, J.-F., & Gehring, K. (2009). Atypical binding of the Swa2p UBA domain to ubiquitin. J. Mol. Biol., 386(2), 569-77.

McMullen, B., Fujikawa, K., & Davie, E. (1991a). Location of the Disulfide Bonds in Human Coagulation Factor XI: The Presence of Tandem Apple Domains. Biochemistry, 30, 2056-2060.

McMullen, B., Fujikawa, K., & Davie, E. (1991b). Location of the Disulfide Bonds in Human Plasma Prekallikrein: The Presence of Four Novel Apple Domains in the Amino-Terminal Portion of the Molecule. Biochemistry, 30, 2050-2056.

Meissner, M., Reiss, M., Viebig, N., Carruthers, V., Toursel, C., Tomavo, S., Ajioka, J. W., & Soldati, D. (2002). A family of transmembrane microneme proteins of Toxoplasma gondii contain EGF-like domains and function as escorters. J. Cell Sci., 115, 563-574.

Mercier, C., Adjogble, K. D. Z., Däubener, W., & Delauw, M-F-C. (2005). Dense granules: are they key organelles to help understand the parasitophorous vacuole of all apicomplexa parasites? Int. J. Parasitol., 35(8), 829-49.

Merritt, E. A., Sarfaty, S., Martial, J. A., Van Den Akker, F., Hoir, C. L., & Hol, W. (1994). Crystal structure of cholera toxin B-pentamer bound to receptor GM1 pentasaccharide. Protein Sci., 3, 166-175.

Meyer, B., & Peters, T. (2003). NMR Spectroscopy Techniques for Screening and Identifying Ligand Binding to Protein Receptors. Angewandte Chemie, 42(8), 864-890.

Mondragon, R., & Frixione, E. (1996). Ca(2+)-dependence of conoid extrusion in Toxoplasma gondii tachyzoites. J. Euk. Microbiol., 43, 120-127.

Montoya, J. G., & Liesenfeld, O. (2004). Toxoplasmosis. Lancet, 363(9425), 1965-76.

Moothoo, D. N., & Naismith, J. H. (1998). Concanavalin A distorts the beta-GlcNAc-(1,2)-Man linkage of beta-GlcNAc-(1,2)-alpha-Man-(1,3)-[beta-GlcNAc-(1,2)-alpha-Man-(1,6)]-Man upon binding. Glycobiology, 8(2), 173-181.

Mordue, D., Desai, N., Dustin, M., & Sibley, D. (1999). Invasion by Toxoplasma gondii Establishes a Moving Junction That Selectively Excludes Host Cell Plasma Membrane Proteins on the Basis of Their Membrane Anchoring. J Exp. Med., 190, 1783-1792.

Morris, G., & Freeman, R. (1979). Enhancement of nuclear magnetic resonance signals via polarisation transfer. J. Am. Chem. Soc., 101, 760-762.

Muller, J., Muller, E., Montag, T., & Zyto, N. (2001). Characterization and crystallization of a novel Sarcocystis muris lectin, SML-2. Acta Cryst. D57, 1042-1045.

Muraki, M. (2002). The importance of CH/pi interactions to the function of carbohydrate binding proteins. Prot. Pep. Lett., 9(3), 195–209.

Myszka, D., Abdiche, Y., Arisaka, F., Byron, O., Eisenstein, E., Hensley, P., Thomson, J., Lombardo, C. R., Schwarz, F., Stafford, W., & Doyle, M. L. (2003). The ABRF-MIRG'02 study: assembly state, thermodynamic, and kinetic analysis of an enzyme/inhibitor interaction. J. Biomol. Tech., 14(4), 247-269.

Nduati, E., Diriye, A., Ommeh, S., Mwai, L., Kiara, S., Masseno, V., Kokwaro, G., & Nzila, A. (2008). Effect of folate derivatives on the activity of drugs used against malaria and cancer. Parasitol. Res., 102, 1227-1234.

Nesmelova, I. V., Dings, R. P. M., & Mayo, K. H. (2008). Understanding galectin structure–function relationships to design effective antagonists. Galectins (1st ed., pp. 33–69). Wiley.

Neu, U., Woellner, K., Gauglitz, G., & Stehle, T. (2008). Structural basis of GM1 ganglioside recognition by simian virus 40. Proc. Natl. Acad. Sci. USA, 105, 5219-24.

Nicolle, C., & Manceaux, L. (1908). Sur une infection à corps de Leishman (ou organismes voisins) du gondi. C. R. Acad. Sci, 147, 736.

Nicolle, C., & Manceaux, L. (1909). Sur un protozoaire nouveau du gondi. C. R. Acad. Sci, 148, 369.

Niehus, S., Elass, E., Coddeville, B., Guérardel, Y., Schwarz, R. T., & Debierre-Grockiego, F. (2012). Glycosylphosphatidylinositols of Toxoplasma gondii induce matrix metalloproteinase-9 production and degradation of galectin-3. Immunobiology, 217(1), 61-4.

Nienaber, V. L., Richardson, P. L., Klighofer, V., Bouska, J. J., Giranda, V. L., & Greer, J. (2000). Discovering novel ligands for macromolecules using X-ray crystallographic screening. Nature Biotech., 18(10), 1105-8.

Nilges, M., & Linge, J. P. (1999). Influence of non-bonded parameters on the quality of NMR structures: A new force field for NMR structure calculation. J. Biomol. NMR, 13, 51-59.

Nilges, M. (1995). Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulphide connectivities. J. Mol. Biol., 245(5), 645-60.

Nilges, M., Macias, M., O’Donoghue, S., & Oschkinat, H. (1997). Automated NOESY Interpretation with Ambiguous Distance Restraints: The Refined NMR Solution Structure of the Pleckstrin Homology Domain from β-Spectrin. J. Mol. Biol., 269, 408-422.

Nilges, M., Rieping, W., Habeck, M., Bardiaux, B., Bernard, A., & Malliavin, T. (2007). ARIA2: automated NOE assignment and data integration in NMR structure calculation. Bioinformatics, 23, 381-2.

Nudelman, I., Akabayov, S., Schnur, E., Biron, Z., Levy, R., Xu, Y., Yang, D., & Anglister, J. (2011). Intermolecular interactions in a 44 kDa interferon-receptor complex detected by asymmetric reverse-protonation and 2D NOESY. Biochemistry, 49(25), 5117-5133.

Ortega-Barria, E., & Boothroyd, J. C. (1999). A Toxoplasma Lectin-like Activity Specific for Sulfated Polysaccharides Is Involved in Host Cell Infection. J. Biol. Chem., 274(3), 1267-76.

Papagrigoriou, E., McEwan, P. A., Walsh, P. N., & Emsley, J. (2006). Crystal structure of the factor XI zymogen reveals a pathway for transactivation. Nature Struct. Mol. Biol., 13(6), 557-8.

Pelletier, I., & Sato, S. (2002). Specific recognition and cleavage of galectin-3 by Leishmania major through species-specific polygalactose epitope. J. Biol. Chem., 277(20), 17663-70.

Periz, J., Gill, A. C., Hunt, L., Brown, P., & Tomley, F. M. (2007). The microneme proteins EtMIC4 and EtMIC5 of Eimeria tenella form a novel, ultra-high molecular mass protein complex that binds target host cells. J. Biol. Chem., 282(23), 16891-8.

Pizarro, J. C., Vulliez-Le Normand, B., Chesne-Seck, M.-L., Collins, C. R., Withers-Martinez, C., Hackett, F., Blackman, M. J., Faber, B. W., Remarque, E. J., Kocken, C. H. M., Thomas, A. W., & Bentley, G. A. (2005). Crystal structure of the malaria vaccine candidate apical membrane antigen 1. Science, 308(5720), 408-11.

Prinz, W., Aslund, F., Holmgren, A., & Beckwith, J. (1997). The role of the thioredoxin and glutaredoxin pathways in reducing protein disulfide bonds in the Escherichia coli cytoplasm. J. Biol. Chem., 272(25), 15661-7.

Rabenau, K., Sohrabi, A., Ajioka, I., Tripathy, A., Reitter, C., James, W., Tomley, F. M., & Carruthers, V. B. (2001). TgM2AP participates in Toxoplasma gondii invasion of host cells and is tightly associated with the adhesive protein TgMIC2. Mol. Microbiol., 41, 537-547.

Rabinovich, G., & Gruppi, A. (2005). Galectins as immunoregulators during infectious processes: from microbial invasion to the resolution of the disease. Parasite Immunol., 27, 103-114.

Ramachandran, G., Ramakrishnan, C., & Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. J. Mol. Biol., 7, 95.

Rao, V., Qasba, P., Balaji, P., & Chandrasekaran, R. (1998). Conformation of Carbohydrates (1st ed.). Overseas Publishers Association.

Reiss, M., Viebig, N., Brecht, S., Fourmaux, M-N, Soete, M., Cristina, M. D., Dubremetz, J. F., & Soldati, D. (2001). Identification and Characterization of an Escorter for Two Secretory Adhesins in Toxoplasma gondii. J. Cell. Biol., 152, 563-578.

Rietsch, A., Bessette, P., Georgiou, G., & Beckwith, J. (1997). Reduction of the periplasmic disulfide bond isomerase, DsbC, occurs by passage of electrons from cytoplasmic thioredoxin. J. Bacteriol., 179(21), 6602-8.

Roos, D.S., Crawford, M. J., Donald, R. G. K., Fohl, L. M., Hager, K. M., Kissinger, J. C., Reynolds, M. G., Striepen, B., & Sullivan Jr, W. J. (1999). Transport and trafficking: Toxoplasma as a model for Plasmodium. Novartis Foundation Symposium (pp. 176–198). Wiley Online Library.

Sabin, A. B., & Feldman, H. A. (1948). Dyes as microchemical indicators of a new immunity phenomenon affecting a protozoon parasite (Toxoplasma). Science, 108(2815), 660.

Saouros, S., Blumenschein, A., Sawmynaden, K., Marchant, J., Koutroukides, T., Liu, B., Simpson, P., Carpenter, E. P., & Matthews, S. J. (2007). High-Level Bacterial Expression and Purification of Apicomplexan Micronemal Proteins for Structural Studies. Prot. & Pep. Lett., 44, 411-415.

Saouros, S., Edwards-Jones, B., Reiss, M., Sawmynaden, K., Cota, E., Simpson, P., Dowse, T. J., Jakle, U., Ramboarina, S., Shivarattan, T., Matthews, S., & Soldati-Favre, D. (2005). A novel galectin-like domain from Toxoplasma gondii micronemal protein 1 assists the folding, assembly, and transport of a cell adhesion complex. J. Biol. Chem., 280, 38583-91.

Sawmynaden, K., Saouros, S., Friedrich, N., Marchant, J., Simpson, P., Bleijlevens, B., Blackman, M., Soldati-Favre, D., & Matthews, S. (2008). Structural insights into microneme protein assembly reveal a new mode of EGF domain recognition. EMBO Rep., 9, 1149-55.

Schieborr, U., Vogtherr, M., Elshorst, B., Betz, M., Grimme, S., Pescatore, B., Langer, T., Saxena, K., & Schwalbe, H. (2005). How much NMR data is required to determine a protein-ligand complex structure? ChemBioChem, 6(10), 1891-8.

Schupp, K., Michel, R., Riether, W., & Bierther, F. (1980). Formation of a close junction during invasion of erythrocytes by Toxoplasma gondii in vitro. Int. J Parasitol., 10, 309-313.

Sheiner, L., Santos, J. M., Klages, N., Parussini, F., Jemmely, N., Friedrich, N., Ward, G. E., & Soldati-Favre, D. (2010). Toxoplasma gondii transmembrane microneme proteins and their modular design. Mol. Microbiol., 77(June), 912-929.

Soldati, D., & Meissner, M. (2004). Toxoplasma as a novel system for motility. Curr. Opin. Cell Biol., 16, 32-40.

Stein, E. G., Rice, L. M., & Brünger, A. T. (1997). Torsion-angle molecular dynamics as a new efficient tool for NMR structure calculation. J. Magn. Res., 124(1), 154-64.

Sujatha, M. S., Sasidhar, Y. U., & Balaji, P. V. (2005). Insights into the role of the aromatic residue in galactose-binding sites: MP2/6-311G++** study on galactose- and glucose-aromatic residue analogue complexes. Biochemistry, 44(23), 8554-62.

Suss-Toby, E., Zimmerberg, J., & Ward, G. (1996). Toxoplasma invasion: The parasitophorous vacuole is formed from host cell plasma membrane and pinches off via a fission pore. Proc. Natl. Acad. Sci. USA, 93, 8413-8418.

Suzuki, Y., Conley, F. K., & Remington, J. S. (1989). Importance of endogenous IFN-gamma for prevention of toxoplasmic encephalitis in mice. J. Immunol., 143(6), 2045-50.

Tenter, A. M., Heckeroth, A. R., & Weiss, L. M. (2000). Toxoplasma gondii: from animals to humans. Int. J. Parasitol., 30(12-13), 1217-58.

Tetley, L., Brown, S. M., McDonald, V., & Coombs, G. H. (1998). Ultrastructural analysis of the sporozoite of Cryptosporidium parvum. Microbiol., 144, 3249-55.

Tolman, J. R., Prestegard, J. H., Flanagan, J. M., & Kennedy, M. A. (1995). Nuclear magnetic dipole interactions in field-oriented proteins: Information for structure determination in solution. Proc. Natl. Acad. Sci. USA, 92, 9279-9283.

Tomley, F. M., & Soldati, D. S. (2001). Mix and match modules: structure and function of microneme proteins in apicomplexan parasites. Trends Parasitol., 17(2), 81-8.

Toone, E. J. (1994). Structure and energetics of protein-carbohydrate complexes. Curr. Opin. Struc. Biol., 4(5), 719–728.

Tordai, H., Banyai, L., & Patthy, L. (1999). The PAN module: the N-terminal domains of plasminogen and hepatocyte growth factor are homologous with the apple domains of the prekallikrein family and with a novel domain found in numerous nematode proteins. FEBS Lett., 461, 63-67.

Torrey, E. F., & Yolken, R. H. (2003). Toxoplasma gondii and schizophrenia. Emerging Infectious Diseases, 9, 1375-1380.

Triglia, T., Healer, J., Caruana, S. R., Hodder, A. N., Anders, R. F., Crabb, B. S., & Cowman, A. F. (2000). Apical membrane antigen 1 plays a central role in erythrocyte invasion by Plasmodium species. Mol. Microbiol., 38(4), 706-18.

Tsai, B., Gilbert, J. M., Stehle, T., Lencer, W., Benjamin, T. L., & Rapoport, T. A. (2003). Gangliosides are receptors for murine polyoma virus and SV40. EMBO J., 22(17), 4346-55.

Turnbull, W. B., & Daranas, A. H. (2003). On the value of c: can low affinity systems be studied by isothermal titration calorimetry? J. Am. Chem. Soc., 125(48), 14859-66.

Vyas, N., Vyas, M., & Quiocho, F. (1988). Sugar and signal-transducer binding sites of the Escherichia coli galactose chemoreceptor protein. Science, 242, 1290-5.

Wang, D., Liu, S., Trummer, B. J., Deng, C., & Wang, A. (2002). Carbohydrate microarrays for the recognition of cross-reactive molecular markers of microbes and host cells. Nature Biotechnology, 20(3), 275-81.

Webster, J. P., Lamberton, P. H. L., Donnelly, C. A., & Torrey, E. F. (2006). Parasites as causative agents of human affective disorders? The impact of anti-psychotic, mood-stabilizer and anti-parasite medication on Toxoplasma gondii’s ability to alter host behaviour. Proc. Royal Soc., 273(1589), 1023-30.

Wetzel, D. M., Chen, L. A., Ruiz, F. A., Moreno, S. N. J., & Sibley, L. D. (2004). Calcium-mediated protein secretion potentiates motility in Toxoplasma gondii. J. Cell Sci., 117(Pt 24), 5739-48.

Wilkins, M. R., Gasteiger, E., Bairoch, A., Sanchez, J. C., Williams, K. L., Appel, R. D., & Hochstrasser, D. F. (1999). Protein identification and analysis tools in the ExPASy server. Methods Mol. Biol., 112, 531-52.

Wiman, B. (1973). Primary structure of peptides released during activation of human plasminogen by urokinase. Eur. J. Biochem., 39(1), 1-9.

Wiseman, T., Williston, S., Brandts, J. F., & Lin, L. N. (1989). Rapid measurement of binding constants and heats of binding using a new titration calorimeter. Anal. Biochem., 179(1), 131-7.

Wishart, D. S., Sykes, B. D., & Richards, F. M. (1992). The Chemical Shift Index: A Fast and Simple Method for the Assignment of Protein Secondary Structure through NMR Spectroscopy. Biochemistry, 31, 1647-1651.

Wishart, D. S., Bigam, C. G., Yao, J., Abildgaard, F., Dyson, H. J., Oldfield, E., Markley, J. L., & Sykes, B. D. (1995). 1H, 13C and 15N chemical shift referencing in biomolecular NMR. J. Biomol. NMR, 6, 135-140.

Wittekind, M., & Mueller, L. (1993). HNCACB, a high-sensitivity 3D NMR experiment to correlate amide-proton and nitrogen resonances with alpha- and beta-carbon resonances in proteins. J. Magn. Res. B, 101, 201-205.

Xia, Y., Zhu, Q., Jun, K.-Y., Wang, J., & Gao, X. (2010). Clean STD-NMR spectrum for improved detection of ligand-protein interactions at low concentration of protein. Magn. Res. Chem., 48(12), 918-24.

Yamaguchi, Y., & Kato, K. (2008). Analyses of Sugar-Protein Interactions by NMR. Experimental Glycoscience: Glycochemistry (1st ed., Vol. Part I, Se, pp. 121-123). Springer.

Yoshida, H., Teraoka, M., Nishi, N., Nakakita, S.-ichi, Nakamura, T., Hirashima, M., & Kamitori, S. (2010). X-ray structures of human galectin-9 C-terminal domain in complexes with a biantennary oligosaccharide and sialyllactose. J. Biol. Chem., 285(47), 36969-76.

Zheng, B., He, A., Gan, M., Li, Z., He, H., & Zhan, X. (2009). MIC6 associates with aldolase in host cell invasion by Toxoplasma gondii. Parasitol. Res.

Zhou, H., Mazzulla, M., Kaufman, J., Stahl, S., Wingfield, P., Rubin, J., Bottaro, D., & Byrd, R. A. (1998). The solution structure of the N-terminal domain of hepatocyte growth factor reveals a potential heparin-binding site. Structure, 6, 109-116.

Zhuang, T., Lee, H. S., Imperiali, B., & Prestegard, J. H. (2008). Structure determination of a Galectin-3–carbohydrate complex using paramagnetism-based NMR constraints. Protein Sci., 17(7), 1220–1231.

Zhuang, T., Leffler, H., & Prestegard, J. H. (2006). Enhancement of bound-state residual dipolar couplings: Conformational analysis of lactose bound to Galectin-3. Protein Sci., 15, 1780-1790.

Zwahlen, C., Legault, P., Vincent, J. F., Greenblatt, J., Konrat, R., & Kay, L. E. (1997). Methods for Measurement of Intermolecular NOEs by Multinuclear NMR Spectroscopy: Application to a Bacteriophage λ N-Peptide/boxB RNA Complex. J. Am. Chem. Soc., 119, 6711-6721.

Zweckstetter, M., & Jung, Y.-S. (2004). MARS – robust automatic backbone assignment of proteins. J Biomol. NMR, 30, 11-23.
