
Unlocking nature’s glycosylation potential: characterization and engineering of novel sucrose/trehalose synthases

ir. Margo Diricks

Thesis submitted in fulfillment of the requirements for the degree of

Doctor (PhD) in Applied Biological Sciences

Academic year: 2016-2017

Promotor

Prof. dr. Tom Desmet (Ghent University)

Faculty of Bioscience Engineering

Department of Biochemical and Microbial Technology

Centre for Synthetic Biology

Coupure Links 653, 9000 Ghent – Belgium

Dean

Prof. dr. ir. Marc Van Meirvenne

Rector

Prof. dr. Anne De Paepe

Examination committee

Prof. dr. ir. Dirk Reheul (Ghent University, Chairman)

Prof. dr. ir. Matthias D’hooghe (Ghent University, Secretary)

Prof. dr. ir. Bartel Vanholme (Ghent University, VIB)

Prof. dr. ir. Wim Van den Ende (KU Leuven)

Prof. dr. ir. Yves Briers (Ghent University)

Prof. dr. Tom Desmet (Ghent University)

To refer to this thesis

Diricks, M. (2017) Unlocking nature’s glycosylation potential: characterization and engineering of novel sucrose/trehalose synthases. PhD thesis, Faculty of Bioscience Engineering, Ghent University, Ghent.

Acknowledgements

This work was supported by the Special Research Fund (BOF, doctoral scholarship) of Ghent University as well as the European Commission FP7-project ‘SuSy’ (grant nr. 613633).

Cover illustration

Designed by Margo Diricks

ISBN 978-94-6357-018-3

Copyright © 2017 by Margo Diricks. All rights reserved.

The author and the promotor give the authorization to consult and to copy parts of this work for personal use only. Every other use is subject to the copyright laws. Permission to reproduce any material contained in this work should be obtained from the author.

Table of contents

List of abbreviations

Introduction

CHAPTER 1: Literature review

1 Carbohydrate active enzymes

1.1 Introduction

1.2 Classification

1.3 Glycosyltransferases

1.4 GT4 family

2

2.1 Reaction, classification and mechanism

2.2 promiscuity

2.3 Structure

2.4 Factors influencing expression and activity

2.5 Biotechnological applications

3 Construction of phylogenetic trees

4 engineering

4.1 Engineering substrate specificity

4.2 Engineering protein stability

4.3 Engineering of GTs: case studies

5 Supplementary materials

CHAPTER 2: Identification and characterization of novel bacterial SuSy enzymes

1 Abstract

2 Introduction

3 Materials and methods

3.1 Materials

3.2 Phylogenetic analysis

3.3 Cloning of novel SuSy genes in a constitutive expression system

3.4 Construction of truncation mutants of SuSyAc

3.5 Enzyme production and purification

3.6 Enzyme assays

3.7 SDS-PAGE analysis

3.8 Western blot analysis

3.9 Effect of pH, temperature and divalent cations on SuSy activity

3.10 sequence accession numbers

3.11 Statistical analysis

4 Results and discussion

4.1 Phylogenetic and taxonomic analysis of prokaryotic SuSy sequences

4.2 Recombinant expression of novel prokaryotic SuSys

4.3 Effect of pH, temperature and divalent cations on the activity of novel prokaryotic SuSys

4.4 Kinetic properties and substrate specificity of novel prokaryotic SuSys

4.5 Production of UDP- with SuSyAc

4.6 Truncation mutants of SuSyAc

4.7 Genomic organization of SuSy genes in

4.8 Phosphorylation of prokaryotic SuSys

5 Conclusions

6 Supplementary materials

CHAPTER 3: Improving the affinity of a bacterial SuSy for the nucleotide acceptor UDP

1 Abstract

2 Introduction

3 Materials and methods

3.1 Amino acid distribution

3.2 Site-directed mutagenesis

3.3 Enzyme production and purification

3.4 Characterization of variant SuSys

3.5 Coupled reactions

3.6 Homology modeling

4 Results and discussion

4.1 Nucleotide preference and its relation to the QN motif

4.2 Mutational analysis of the QN motif

4.3 Coupled reaction between SuSy and a C-

5 Conclusions

6 Supplementary materials

CHAPTER 4: Driving the donor specificity of SuSy towards GalFru, for the efficient one-step production of UDP-galactose

1 Abstract

2 Introduction

3 Materials and methods

3.1 Materials

3.2 PCR protocols

3.3 Construction of site-directed mutants and enzyme libraries

3.4 Transformation

3.5 Enzyme production and purification

3.6 Enzyme assays

3.7 Screening protocol for site-directed mutants

3.8 Screening protocol for enzyme libraries

3.9 Experiments on consumption by E. coli

3.10 Production of iCLEAs

3.11 Visualization of interactions between enzyme and substrate

4 Results and discussion

4.1 Activity of SuSy on UDP-galactose and GalFru

4.2 Development of a screening protocol to screen enzyme libraries for improved activity on GalFru

4.3 Selection of target residues for mutagenesis

4.4 Production of iCLEAs

5 Conclusions

6 Supplementary materials

CHAPTER 5: Engineering the stability of SuSyAc

1 Abstract

2 Introduction

3 Materials and methods

3.1 In silico prediction of stabilizing mutations: foldX and Rosetta

3.2 Construction of site-directed mutants

3.3 Enzyme production and purification

3.4 Determination of kinetic and thermodynamic stability

3.5 Statistical analysis

4 Results and discussion

4.1 Stability of wild-type SuSyAc

4.2 Increasing the stability of flexible regions

4.3 Predicting stabilizing mutations in silico

5 Conclusions

6 Supplementary materials

CHAPTER 6: Introducing activity into a glycosyltransferase

1 Abstract

2 Introduction

3 Materials and methods

3.1 Alignments and phylogenetic analyses

3.2 Cloning of TreTFp from an inducible expression system (pET21a) into a constitutive one (pCXP34)

3.3 Construction of site-directed mutants and chimeric enzymes

3.4 Expression and purification of TreTFp (mutants) from the pET21a and pCXP34 vector

3.5 Enzyme assays

3.6 Screening protocol for site-directed mutants of TreTFp

3.7 Homology modeling

4 Results and discussion

4.1 Comparison of TreT and TreP: functional properties, reaction mechanism, phylogenetic analysis and structure

4.2 Characterization of TreTFp

4.3 Mutagenesis of TreTFp

5 Conclusion

6 Supplementary materials

CHAPTER 7: General discussion and future perspectives

1 Wrongly annotated protein sequences highlight the need for the development of an automated validation system

2 Choice of SuSy enzyme depends on the application

3 Structure-function relationships within the GT4 family

4 Correlated positions as hotspots for mutagenesis

5 The quest for stable enzymes

6 Outlook

References

Summary

Samenvatting

Curriculum vitae

List of abbreviations

ABTS: 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid)
AN: acetonitrile
ADP: adenosine diphosphate
AS: ammonium sulphate
ATP: adenosine triphosphate
BCA: bicinchoninic acid
BP: back-up plate
CAZymes: carbohydrate active enzymes
CCE: crude cell extract (supernatant after centrifugation of a bacterial culture)
CDP: cytidine diphosphate
CLEA: cross-linked enzyme aggregate
CPEC: circular polymerase extension cloning
DMSO: dimethyl sulfoxide
DNA: deoxyribonucleic acid
dNTP: deoxyribose nucleoside triphosphate
EDTA: ethylenediaminetetraacetic acid
Fru: fructose
Frk: Fructokinase (EC 2.7.1.4)
GDP: guanosine diphosphate
GA: glutaraldehyde
Gal: galactose
GalE: UDP-glucose 4′-epimerase (EC 5.1.3.2)
GalFru: α-D-galactopyranosyl-(1→2)-β-D-fructofuranoside
GalT: galactosyltransferase
Glc: glucose
Glc1P: α-D-glucose 1-phosphate
GlcT: glucosyltransferase
GT: glycosyltransferase
HGT: horizontal gene transfer
LB-miller: lysogeny broth-Miller (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl)
LCN: lichenan, a glucosyl analogue
LCN*: LCN but with an axial C4-OH group (instead of equatorial)
MP: master plate
MSA: multiple sequence alignment
MTP: microtiter plate
NDP: nucleoside diphosphate (e.g. ADP, UDP, CDP, GDP, TDP)
NHF: 1,5-anhydrofructose
Ni-NTA: nickel-nitrilotriacetic acid
PBS: phosphate buffered saline (composed of NaH2PO4, Na2HPO4 and NaCl)
PMSF: phenylmethylsulfonyl fluoride
SPS: Sucrose-Phosphate Synthase (EC 2.4.1.14)
Suc: sucrose (α-D-glucopyranosyl-(1→2)-β-D-fructofuranoside)
SuSy: Sucrose Synthase (EC 2.4.1.13)
SuSyAc: SuSy of ATCC 51756 (UniProt ID: A0A059ZV61)
SuSyAt1: SuSy isoform 1 of (UniProt ID: P49040)
SuSyCt: SuSy of Chroococcidiopsis thermalis PCC 7203 (UniProt ID: K9U774)
SuSyDa: SuSy of Denitrovibrio acetiphilus DSM 12809 (UniProt ID: D4H6M0)
SuSyMr: SuSy of Melioribacter roseus JCM 17771 (UniProt ID: I7A3T6)
SuSyNe: SuSy of Nitrosomonas europaea ATCC 19718 (UniProt ID: Q820M5)
SuSyTe: SuSy of Thermosynechococcus elongatus BP-1 (UniProt ID: Q8DK23)
TLC: thin layer chromatography
Tre: trehalose
TreP: retaining Trehalose Phosphorylase (EC 2.4.1.231)
TrePGf: TreP of Grifola frondosa (UniProt ID: O75003)
TreT: Trehalose glycosylTransferring synthase (EC 2.4.1.245)
TreTFp: TreT of Fervidobacterium pennivorans (WP_014451106)
TreTPh: TreT of Pyrococcus horikoshii ATCC 700860 (UniProt ID: O58762)
TPS: Trehalose 6-phosphate Synthase (EC 2.4.1.15)
TPSEc: TPS of Escherichia coli K12 (OtsA, UniProt ID: P31677)
UDP: uridine diphosphate
UDP-Gal: uridine diphosphate galactose
UDP-Glc: uridine diphosphate glucose
WPP: whole plasmid PCR
WT: wild-type

Introduction


Industrial or white biotechnology involves the use of living organisms (such as bacteria, yeast and algae) or parts thereof to generate industrial products from renewable carbon sources or waste streams. It includes centuries-old practices such as the manufacturing of cheese, bread, beer and wine, as well as more recent examples like the production of bulk chemicals (e.g. bioethanol or building blocks for polymers) and high-value compounds (e.g. antibiotics, therapeutic proteins or prebiotics). Some of these newly developed technologies will contribute to the shift from a fossil-based economy to a sustainable biobased economy and provide a means to deal with the finite supply of fossil resources and climate change1.

One of the cornerstones of white biotechnology is biocatalysis, which can be defined as the use of enzymes to perform chemical conversion of molecules. Enzymes are typically highly efficient, regio- and stereoselective and able to catalyze reactions under mild operational conditions in an aqueous environment. They thus present a green alternative for classical chemical synthesis and are currently widely used in the textile (e.g. cellulase for bio-stoning2), food (e.g. thermolysin for the production of the sweetener aspartame3), feed (e.g. phytase for improved digestibility4), chemical (e.g. nitrile hydratase for the production of acrylamide5), medical (e.g. glucose oxidase as biosensor6) and pulp/paper industries (e.g. cellulase to facilitate de-inking7)8–10.

This work focuses on the use of enzymes for the conversion and production of compounds containing glycosidic bonds, such as sucrose (table sugar, α-D-glucopyranosyl-(1→2)-β-D-fructofuranoside), nucleotide-activated sugars and glycosides with therapeutic significance (e.g. the anti-oxidant nothofagin) (Figure 1). Nature’s most efficient biocatalysts for the creation of these glycosidic linkages are GlycosylTransferases (GTs), which catalyze the transfer of a sugar group from an activated donor substrate onto an acceptor11. This process, known as glycosylation, can be used to improve the activity, solubility, stability, flavor and/or pharmacokinetic behavior of small (hydrophobic) compounds12. Unfortunately, low expression yields and/or activity, poor long-term stability and, perhaps most importantly, the need for nucleotide sugars as donor substrates have hampered the large-scale application of GTs13–16. Nucleotide sugars such as uridine diphosphate glucose (UDP-Glc) are indeed rarely available in large amounts and are very expensive (e.g. 150 €/g for UDP-Glc), making the GT reaction economically unfeasible. This issue was addressed in this work using two different strategies: the use of Sucrose Synthase (SuSy) as intermediate enzyme to produce nucleotide sugars from the cheap substrate sucrose (300 €/ton), and the engineering of GTs to alter the donor substrate specificity towards cheaper glycosyl-phosphates.
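To put the two quoted prices on the same scale, the short sketch below converts the sucrose price to €/g and computes the per-gram price ratio. It assumes a metric ton (1,000,000 g) and compares mass only; a molar comparison would additionally need the molecular weights, which are not given here.

```python
# Prices as quoted in the text; the unit conversion is the only added step.
UDP_GLC_EUR_PER_G = 150.0      # UDP-Glc, EUR per gram
SUCROSE_EUR_PER_TON = 300.0    # sucrose, EUR per ton
GRAMS_PER_TON = 1_000_000      # assumption: metric ton

# Bring both prices to EUR per gram before comparing them.
sucrose_eur_per_g = SUCROSE_EUR_PER_TON / GRAMS_PER_TON
price_ratio = UDP_GLC_EUR_PER_G / sucrose_eur_per_g

print(f"sucrose: {sucrose_eur_per_g:.4f} EUR/g")
print(f"UDP-Glc costs about {price_ratio:,.0f} times more per gram")
```

On a mass basis the nucleotide sugar is thus several hundred thousand times more expensive than sucrose, which is the economic argument for producing it in situ.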

A general overview of this thesis is presented in Figure 1. Chapter 1 consists of a concise literature review to provide the reader with some background knowledge related to this work. Research results are presented and discussed in Chapters 2-6. Finally, a general discussion and future perspectives are covered in Chapter 7.


Figure 1 Outline of this PhD dissertation. Target products or donor substrates are colored red. Suc: sucrose, NDP: nucleoside diphosphate (e.g. ADP or UDP), Glc: glucose (blue colored structure), Fru: fructose, Glc1P: glucose 1-phosphate, Tre: trehalose, Pi: inorganic phosphate, *: mutant enzyme, GT: GlycosylTransferase, SuSy: Sucrose Synthase, TreT: Trehalose glycosylTransferring synthase.


SuSy is an atypical GT that catalyzes the readily reversible conversion of sucrose (Suc) and a nucleoside diphosphate (NDP) into NDP-Glc and fructose (Fru). Chapter 2 describes the identification of novel bacterial SuSy enzymes and the characterization of their properties (e.g. expression yield, stability and kinetic parameters). This information was used to identify the best candidate for the production of UDP-Glc as end-product. Furthermore, new insights into the sucrose metabolism of bacteria were acquired by phylogenetic, taxonomic and genomic analysis of sucrose metabolizing enzymes. Next, the most promising enzyme from Chapter 2 was engineered to improve its catalytic parameters for the nucleotide acceptor UDP (Chapter 3), which was required to use the enzyme efficiently in coupled reactions involving a second GT enzyme. In these one-pot reactions, the nucleotide sugar produced by SuSy is used as donor substrate for the other GT, resulting in the cost-effective production of valuable glycosides with in situ regeneration/recycling of the nucleotide acceptor. In addition, several attempts were made to alter the donor substrate specificity towards the sucrose analogue GalFru (Chapter 4) and to further enhance the stability of SuSy (Chapter 5). Efficient conversion of GalFru (α-D-galactopyranosyl-(1→2)-β-D-fructofuranoside) as non-natural substrate would result in the one-step production of the nucleotide sugar UDP-galactose (UDP-Gal: 2500 €/g), thereby providing a more general glycosylation platform. In Chapter 6, the possibility to alter the donor specificity of a GT from nucleotide sugars towards cheaper glycosyl-phosphates, such as glucose 1-phosphate (Glc1P: 15 €/g), is scrutinized. Trehalose glycosylTransferring synthase (TreT), which produces trehalose from NDP-glucose and glucose, was used as a test case to this end. Experiments described in Chapters 3-6 mainly consist of (semi-)rational protein engineering techniques (site-directed mutagenesis, site-saturation mutagenesis and the construction of chimeric enzymes), which make use of structural and/or sequence information to identify hotspots for mutagenesis.

It has to be noted that a significant part of this research was conducted in the context of a collaborative European FP7 project: ‘SuSy – Sucrose Synthase as Effective mediator of Glycosylation’ (http://www.glycosusy.eu/). Eight different partners from five European countries were involved in this project, among which Ghent University, and in particular the group of my promotor Prof. Dr. Tom Desmet. The main objective of this project was, just like mine, the development of a generic technology for the cost-efficient application of GT enzymes at an industrial scale. To this end, production of glucosides (glycosides with a glucose moiety) using SuSy and a second Leloir GT would be optimized, as well as the production of other glycosides (e.g. quercetin-galactoside) by the integration of a third enzyme: levansucrase (LS), which produces sucrose analogues (e.g. GalFru) from sucrose and a monosaccharide (Figure 2). Although the work presented in this thesis is clearly relevant for the European project, it also contributed to our fundamental knowledge about sucrose metabolism in bacteria and catalytic mechanisms of retaining GTs.


Figure 2 Schematic overview of the major objective of the European FP7 project ‘SuSy’: cost-efficient production of glycosides using LevanSucrase (LS), Sucrose Synthase (SuSy) and a second Leloir GlycosylTransferase (GT) as biocatalysts.


CHAPTER 1: Literature review


1 Carbohydrate active enzymes

1.1 Introduction

Carbohydrates (also called saccharides or sugars, e.g. cellulose or sucrose), glycosides (e.g. resveratrol glucoside) and glycoconjugates (e.g. glycoproteins) are molecules that play an important role in many biological processes and are therefore indispensable for living creatures. For example, cells use them as an energy source, they are key molecules in different metabolic pathways, they function as coenzymes, signaling molecules (e.g. for intra- and interspecies communication) and structural components, and they are involved in biological recognition processes17,18. In addition, many bacteria produce toxic glycosylated small molecules to gain a selective advantage in their habitat18. Besides this biological significance, the synthesis of glycosidic bonds is also of high commercial value, because the produced compounds can be used for a wide range of applications in the food, feed, pharmaceutical and cosmetic industries (Figure S2). Oligosaccharides, for example, have great potential in the pharmaceutical industry, as antibiotics preventing the entrance of bacterial or viral invaders19, and in the food industry, as health-promoting prebiotics and low-caloric or non-cariogenic sweeteners20–22. Furthermore, glycosylation of non-carbohydrate acceptors such as pharmaceuticals or hydrophobic compounds can be used to improve their activity, solubility, stability, bioactivity, flavor and/or pharmacokinetic behaviour12. Ascorbic acid (vitamin C), for example, well known for its anti-oxidant activity and used in the food, cosmetic and pharmaceutical industries, is more stable and can be stored for longer when a glucosyl moiety is attached to it23. Resveratrol, on the other hand, a molecule which exhibits cardioprotective effects and anti-cancer properties, is much more soluble when glycosylated24–26. Galactosylation of quercetin, a common flavonoid in food and a potent anti-oxidant/anti-inflammatory agent, provides a means to avoid the unwanted side-effects attributed to the aglycon, to target the molecule to the liver and to increase its bioavailability and solubility27,28.

Glycosides can be extracted directly from natural sources such as but this method is often highly laborious and low-yielding. Alternatively, glycosylation can be performed either by conventional chemical synthesis (e.g. Koenigs-Knorr reaction with glycosyl halides activated with silver salts29) or by a biocatalytic reaction with enzymes. Although chemical glycosylation is used intensively in the field of glycochemistry, it suffers from a number of drawbacks, including labor-intensive activation and protection procedures to allow for regioselectivity, multistep synthetic routes with low overall yields, the use of toxic catalysts and solvents, the production of a large amount of waste and incompatibility with EC food regulations14,30. Therefore, carbohydrate active enzymes (CAZymes), which are able to synthesize, degrade or modify carbohydrates under milder conditions and with high regio- and/or stereoselectivity, present an attractive ‘green’ alternative for glycosylation reactions14,30. Indeed, it has been shown that optimized enzymatic glycosylations can result in a fifteen-fold higher space-time yield and a five-fold reduction of waste generation31.

1.2 Classification

CAZymes can be classified in different ways, depending on the criterion used. A well-known database of these enzymes is the CAZy database, which currently covers five enzyme classes: glycoside hydrolases, glycosyltransferases, polysaccharide lyases, carbohydrate esterases and auxiliary activities32. The latter are redox enzymes, such as lytic polysaccharide monooxygenases, that work in conjunction with CAZymes. Each of these enzyme classes consists of a (large) number of families, and classification of sequences within these families is based on hydrophobic cluster analysis and/or Hidden Markov Model and BLAST-based sequence similarity methods. As a corollary, enzymes within one family are expected to share a similar sequence, three-dimensional fold and reaction mechanism, as well as a common ancestor32,33.

Other widely used classification systems are based on structural information (SCOP) or on the reaction the enzymes catalyze. The latter is expressed as the Enzyme Commission (EC) number, which consists of four numbers and is associated with a recommended name for the enzyme34. Six EC classes are described: oxidoreductases (EC 1), transferases (EC 2), hydrolases (EC 3), lyases (EC 4), isomerases (EC 5) and ligases (EC 6). Within one family of the CAZy classification, different specificities (or EC numbers) can be found.
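As a small illustration of how an EC number encodes the reaction class, the sketch below reads the first of the four fields and maps it to the standard top-level class name. The function name `ec_class` is a hypothetical helper introduced for this example; the EC numbers used are those given in the list of abbreviations.

```python
# Top-level EC classes, keyed by the first field of the EC number.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}

def ec_class(ec_number: str) -> str:
    """Return the top-level class name for an EC number such as '2.4.1.13'."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return EC_CLASSES[int(parts[0])]

# EC numbers taken from the list of abbreviations.
examples = {"SuSy": "2.4.1.13", "TreT": "2.4.1.245", "GalE": "5.1.3.2"}
for name, ec in examples.items():
    print(name, ec, ec_class(ec))
```

SuSy and TreT both fall under the transferases, while the epimerase GalE is an isomerase, so enzymes that cooperate in one reaction cascade can belong to different EC classes.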

1.3 Glycosyltransferases

Nature’s most efficient catalysts for the creation of glycosidic bonds are GTs. They are present in and and catalyze the transfer of a sugar group from an activated donor substrate onto an acceptor. Most GTs use nucleoside diphosphate (NDP) sugars such as UDP-glucose, ADP-glucose (ADP-Glc), UDP-galactose (UDP-Gal), UDP-glucuronate, UDP-xylose, UDP-arabinose, UDP-rhamnose, UDP-N-acetylglucosamine or GDP-mannose (GDP-Man) as donors, and these enzymes are often referred to as Leloir glycosyltransferases11,13,35. UDP-dependent Leloir GTs (UGTs) constitute the largest group within this class. Non-Leloir GTs, on the other hand, use sugar donors with a nucleoside monophosphate (e.g. CMP-NeuAc), lipid phosphate (e.g. dolichol phosphate oligosaccharides) or unsubstituted phosphate (e.g. glucose 1-phosphate, Glc1P) as activating moiety. Possible acceptor substrates include other sugars, lipids, proteins, nucleic acids, antibiotics or other small molecules. Although the sugar unit is most commonly attached to the nucleophilic oxygen of a hydroxyl substituent of the acceptor substrate, it can also be attached to other nucleophiles such as nitrogen (e.g. formation of N-linked glycoproteins), sulfur (e.g. formation of thioglycosides) and carbon (e.g. C-glycoside antibiotics)11,36.

1.3.1 Structural folds of GTs

The majority of GTs adopt either a GT-A or a GT-B fold (Figure 3)11. In addition, a few enzymes with a GT-C or a GT-D fold have been reported37–39, but these structures fall outside the scope of this review.


GT-A enzymes consist of two closely abutting β/α/β Rossmann-like domains, typical for nucleotide binding proteins. The N-terminal domain mainly binds the nucleotide sugar, while the C-terminal domain is important for acceptor binding. Most enzymes with a GT-A fold share a common Asp-X-Asp (DXD) motif, which is involved in the coordination of an essential divalent cation and/or ribose11,40.

The GT-B fold is characterized by two loosely associated β/α/β Rossmann-like domains connected by a linker region. In contrast to GT-A enzymes, the N-terminal domain (GT-BN) is involved in acceptor binding and the C-terminal one (GT-BC) in sugar donor (e.g. NDP-Glc) binding. Both domains face each other, resulting in a cleft containing the active site. Ligand binding is associated with conformational changes in the relative orientation of these two domains11,41. Metal ions are not implicated in the catalytic mechanism of these enzymes, although they can have an influence on enzyme activity11,42–44.

Figure 3 Cartoon representation of the two major folds observed for glycosyltransferases. The GT-A fold (left) is represented by an inverting enzyme from Bacillus subtilis (PDB 1QGQ) and the GT-B fold (right) by a bacteriophage T4 β-glucosyltransferase (PDB 1JG7).

1.3.2 Reaction mechanism of GTs

Similar to other CAZymes, GTs can be either retaining or inverting depending on their reaction mechanism. Indeed, the configuration at the anomeric carbon (C1) of the donor substrate can either be retained (e.g. α to α) or inverted (e.g. α to β) in the resulting product. Both types can adopt either a GT-A or GT-B fold.

Inverting GTs display an SN2-like direct displacement mechanism involving an enzymatic base catalyst (typically Glu or Asp), which deprotonates the acceptor to facilitate its nucleophilic attack (Figure S1A)11. This type of reaction leads to an altered configuration at the anomeric carbon (C1) of the donor substrate (e.g. α to β). The departure of the nucleoside diphosphate is facilitated by a coordinated divalent cation (e.g. Mg2+ or Mn2+) in GT-A enzymes and by a positively charged side chain and/or hydroxyls and helix dipoles in GT-B enzymes11.

A general consensus about the exact reaction mechanism of retaining GTs has not yet been reached45, but two plausible catalytic mechanisms have been proposed: a double displacement mechanism involving a covalent glycosyl-enzyme intermediate46,47 and an internal SNi front-side mechanism, in which the nucleophile attacks from the same side as that from which the leaving group departs11,48–54 (Figure S1B and Figure S1C). The latter can occur either in a concerted fashion, involving only one oxocarbenium ion transition state, or stepwise, with a short-lived oxocarbenium phosphate ion pair intermediate. Although covalent glycosyl-enzyme intermediates have been experimentally observed by mass spectrometry46, the absence of such an intermediate or of an appropriately positioned enzymatic nucleophile, together with computational studies, has suggested an SNi mechanism for several retaining GTs11,48–54.

1.3.3 Conserved domains/motifs of GTs

The sequences of acceptor-binding domains of GTs are typically highly variable, as they have evolved to accommodate a very diverse range of acceptor molecules44,55. In contrast, three conserved nucleotide recognition domains (NRD) have been identified: a C-terminal NRD1α domain in retaining GT-Bs, a corresponding C-terminal NRD1β domain present in inverting GT-Bs and an N-terminal NRD2 domain, exclusively found in membrane-bound GT-As (Figure 4)56.

Figure 4 Conserved domains of glycosyltransferases. PSPG box: conserved motif of secondary product GTs. The HX7E (NRD1β) and EX7E (NRD1α) motifs are bold and underlined. The PQ and (D)Q motifs (NRD1β) are underlined.

Members of the NRD1α family can accept both purine- and pyrimidine-containing sugar donors and are characterized by the presence of an EX7E motif within the NRD. This motif consists of two glutamate residues (E) separated by seven non-conserved residues (depicted as X)56,57. The two glutamate residues are thought to be involved in catalysis, as a nucleophile, proton donor or stabilizer of the glucose moiety of the nucleotide sugar, but the exact role and the relative importance of the residues is still a matter of debate56–62. Nevertheless, both have proven to be critical for the activity of several different GTs. The inverting GT-Bs are grouped in the NRD1β family. Here, a conserved region similar to that in NRD1α, HX7E, is present, in addition to two other motifs, PQ and/or (D)Q (Figure 4). As all NRD1β GTs can only accept nucleotide sugars with a pyrimidine moiety (UDP or TDP), the Q residues are thought to play a role in the recognition of this unit, while the first residue of the H/EX7E motif could discriminate between retaining and inverting types56. In plant GTs involved in the glycosylation of secondary metabolites (e.g. flavonoids), part of the NRD1β motif is also known as the PSPG box (plant secondary product GTs). It consists of 44 amino acids in the C-terminal region, including fourteen conserved positions, and is thought to be involved in sugar donor binding (Figure 4)63. Despite its name, this motif was also identified in vertebrate, insect and bacterial glycosyltransferases64. The third family, NRD2, consists of membrane-bound inverting GTs capable of using both purine- and pyrimidine sugar donors56. Based on the CAZy classification, most NRD1α enzymes belong to GT4 (GT-B fold), NRD1β enzymes to GT1 (GT-B fold) and NRD2 enzymes to GT2 or GT21 (GT-A fold)56,65. It has to be noted that many other GTs do not contain any of these three domains56.
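The EX7E and HX7E motifs described above are simple enough to scan for with a regular expression. The following is a minimal sketch, not a validated annotation tool: the function name `find_motifs` and the toy sequence are hypothetical, and a hit only means that the residue pattern occurs, not that it lies within a nucleotide recognition domain or is functional.

```python
import re

# Lookahead patterns so that overlapping occurrences are also reported.
# "X7" in the motif names means seven arbitrary residues.
MOTIFS = {
    "EX7E (NRD1alpha)": re.compile(r"(?=(E.{7}E))"),
    "HX7E (NRD1beta)": re.compile(r"(?=(H.{7}E))"),
}

def find_motifs(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return 1-based start positions and matched 9-mers for each motif."""
    seq = sequence.upper()
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical toy sequence, not taken from any real enzyme.
toy = "MKTAEGGLPLVLEAMSHAAGTPVVESGG"
hits = find_motifs(toy)
for name, found in hits.items():
    print(name, found)
```

In practice such a scan would be restricted to the C-terminal region and combined with a multiple sequence alignment, since a nine-residue pattern with seven wildcards can easily occur by chance.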

1.3.4 Applications of GTs

GTs are generally considered useful synthetic tools for the production of natural oligosaccharides and glycosides, with a diverse range of applications in various industries (Figure S2)13,66. Glycosylation of specialized metabolites mostly involves UDP-glucose, UDP-glucuronate, UDP-xylose, UDP-arabinose and UDP-rhamnose, while formation of oligosaccharides typically involves UDP-galactose or UDP-N-acetylglucosamine13. GT enzymes are highly selective, resulting in high yields. In addition, numerous different substrate specificities have been identified14. However, low expression yields and/or activity, lack of long-term stability and the need for expensive nucleotide sugars have hampered their large-scale application13–16.

Currently, two major strategies are applied to make GT reactions more cost-efficient: (i) in vitro production of nucleotide sugars and regeneration/recycling of the expensive cofactors using coupled enzymatic reactions, or (ii) the use of whole-cell in vivo systems13. The latter rely on the intracellular UDP-sugar pool of the microbial host and eliminate the need for extensive enzyme purification. However, in vivo systems often suffer from acceptor toxicity or solubility issues, low conversion yields/titers and problems regarding scale-up, and more complex downstream processing is typically necessary to purify the target glycosides13.

Three major routes exist for the formation of nucleotide sugars: the kinase, phosphorylase and synthase pathway (Figure 5)13. The first two involve the production of a sugar 1-phosphate, either starting from a monosaccharide using a kinase enzyme or from a disaccharide or polysaccharide using a phosphorylase. This sugar phosphate is subsequently converted to the corresponding UDP-sugar using uridylyltransferases and UTP. The other pathway produces the nucleotide sugar in one step using a synthase enzyme and a nucleoside diphosphate (e.g. UDP). To that end, Sucrose Synthase or Trehalose synthase are frequently used because of their favorable equilibrium constants and cheap donor substrates (sucrose and trehalose)13,67,68. Glycosylation of an acceptor compound by a GT using the nucleotide sugar as donor results in the release of the nucleoside diphosphate (e.g. UDP). The latter can be reused directly by the synthase enzyme or can be converted back to UTP using ATP and another kinase enzyme (Figure 5). The synthase and phosphorylase routes are mainly restricted to the production of UDP-Glc. However, many other sugar nucleotides can be formed starting from UDP-Glc using the (combined) action of epimerases, dehydrogenases, reductases or decarboxylases13. A commonly applied reaction is the epimerization of UDP-Glc to UDP-Gal by a UDP-glucose 4-epimerase (GalE)69–71. However, this step is often rate-limiting in the coupled (three-enzyme) reaction cycle with a galactosyltransferase (GalT)13,72. In addition, the equilibrium of GalE favors UDP-Glc formation, which means that the coupled GalT must have a strong preference for galactosyl transfer over glucosyl transfer to efficiently obtain the galactosylated compound. The galactokinase route, which involves the phosphorylation of galactose and subsequent conversion to UDP-Gal, has therefore been proposed as a more interesting alternative13. This pathway does, however, require the supply of expensive UTP and typically involves at least three enzymes, making reaction optimization more difficult73,74.

Finally, it should be noted that the abovementioned enzymatic in vitro routes for UDP-sugar synthesis can also be introduced in whole cells by genetic engineering to increase the natural intracellular nucleotide sugar pool, thereby boosting the production of the glycosides13.

Figure 5 Routes towards the formation of UDP-sugars and the coupled reaction with a GT for the production of glycosides. T: target acceptor compound, S: sugar moiety, Pi: inorganic phosphate, PPi: pyrophosphate. S1-S2: a disaccharide (e.g. trehalose or sucrose). S1-Sn: a disaccharide (e.g. sucrose) or a polysaccharide (e.g. maltodextrin). S1: a monosaccharide (e.g. glucose).

1.4 GT4 family

Campbell and colleagues started with the classification of GTs based on their amino acid sequence in 1997, resulting in 26 families65. Nowadays, GTs are classified into more than 100 different families32. Family four (GT4) constitutes one of the largest families and contains sequences from bacteria, , fungi, plants and . The enzymes within this family follow a retaining mechanism and adopt a GT-B fold. They use either nucleotide sugars or glycosyl-phosphates as donors, suggesting an evolutionary link between the two classes75. Most of the members are hexosyltransferases (EC 2.4.1.-), which transfer activated hexoses (e.g. (UDP-)glucose or (GDP-)mannose) to an acceptor molecule, but pentosyltransferases using e.g. UDP-xylose are also present76. Lipopolysaccharide N-acetylglucosaminyltransferase (EC 2.4.1.56), on the other hand, transfers GlcNAc from UDP-GlcNAc to a lipopolysaccharide, while N,N′-diacetylbacillosaminyl-diphospho-undecaprenol α-1,3-N-acetylgalactosaminyltransferase (EC 2.4.1.290) uses UDP-GalNAc as sugar donor. The family also contains enzymes with a pharmaceutical significance such as WaaG and AviGT4. WaaG is an α-1,3-glucosyltransferase that transfers a glucose molecule from UDP-Glc to the L-glycero-D-manno-heptose II residue of the inner core of lipopolysaccharides (LPS) present on the cell membrane of Gram-negative bacteria like Escherichia coli77,78. Inhibition of LPS biosynthesis, e.g. by inhibiting WaaG, has been highlighted as a promising strategy to make pathogenic and multi-drug resistant strains of Gram-negative bacteria susceptible to antibiotics that are normally used against Gram-positive bacteria78. AviGT4, on the other hand, is part of the antibiotic avilamycin A pathway79.

Three enzyme specificities from the GT4 family are particularly important for this work: Sucrose Synthase (SuSy, EC 2.4.1.13), Trehalose glycosylTransferring synthase (TreT, EC 2.4.1.245) and Trehalose Phosphorylase (TreP, EC 2.4.1.231). The first will be discussed in this chapter, while the other two are covered in Chapter 6.

2 Sucrose Synthase

2.1 Reaction, classification and mechanism

Sucrose Synthase (SuSy, EC 2.4.1.13) is a member of the GT-B fold family and more precisely of the GT4 retaining subfamily. In vitro, it catalyzes the reversible conversion of sucrose (α-D-glucopyranosyl-(1→2)-β-D-fructofuranoside) and a nucleoside diphosphate (NDP) into D-fructose and NDP-Glc (Figure 6). However, SuSy has also been suggested to be involved in the breakdown and/or synthesis of sucrose in vivo80–82. Since its discovery in 195583, only SuSys from plants and cyanobacteria have been characterized. However, several non-photosynthetic organisms also harbor a SuSy enzyme (Chapter 2).

Figure 6 Reaction scheme of Sucrose Synthase (SuSy)84. NDP: nucleoside diphosphate (e.g. ADP or UDP).

The pH of the solution influences both the equilibrium of the SuSy reaction and the optimal activity. In the sucrose synthesis direction, the enzyme displays maximal activity between pH 7.5 and 9.5, while pH optima in the breakdown direction range between 5.5 and 7.584. Reported equilibrium constants (Keq = ([Fru]*[NDP-Glc])/([Suc]*[NDP])) range between 0.125 and 0.6 at pH 7.568,83,85, indicating that sucrose formation is thermodynamically favored. However, at more acidic pH (pH≈5), Keq was found to be >1, thereby favoring nucleotide sugar synthesis86. As expected for a GT-catalyzed reaction87,88, protons are released or consumed in the synthesis or breakdown direction, respectively.
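To make the consequence of these equilibrium constants more tangible, the short sketch below estimates the fraction of sucrose that is converted at equilibrium. It rests on simplifying assumptions that are not part of the cited studies: sucrose and NDP are supplied in equimolar amounts, no products are present initially, and activities equal concentrations. Under these assumptions, Keq = x²/(1−x)² with x the converted fraction, so that x = √Keq/(1+√Keq). The function name is illustrative only.

```python
import math


def equilibrium_conversion(keq: float) -> float:
    """Fraction of sucrose converted at equilibrium.

    Assumes equimolar sucrose and NDP at the start and no products,
    so that Keq = x**2 / (1 - x)**2 and x = sqrt(Keq) / (1 + sqrt(Keq)).
    """
    if keq < 0:
        raise ValueError("Keq must be non-negative")
    root = math.sqrt(keq)
    return root / (1.0 + root)


# Keq values at the lower and upper end of the reported range (pH 7.5),
# plus Keq = 1 as a reference point.
for keq in (0.125, 0.6, 1.0):
    print(f"Keq = {keq:5.3f} -> conversion = {equilibrium_conversion(keq):.1%}")
```

In this idealized setting, a Keq below 1 corresponds to less than 50% conversion of sucrose, which is why an excess of sucrose or a lower pH is attractive when NDP-Glc is the desired product.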

Crystal structure data indicate that SuSy uses a stepwise SNi-like reaction mechanism involving a stabilized oxocarbenium phosphate ion pair intermediate (Figure S1C). Experiments with inhibitors, on the other hand, revealed an ordered mechanism: depending on the direction of the reaction, UDP-Glc or UDP is the first substrate to bind, and UDP or UDP-Glc, respectively, is the last product to be released48,89.

2.2 Substrate promiscuity

Besides its natural substrates D-fructose and NDP-glucose, SuSy is known to accept a diverse range of acceptors and donors in the synthesis direction. High conversion yields could be obtained under optimized conditions, although catalytic efficiencies for most of the non-natural substrates were severely reduced compared to UDP-Glc and fructose. For a complete overview, the reader is referred to the review on SuSy by Schmölzer and colleagues84. In summary, SuSy from potato and/or rice could use various D/L-ketoses, D/L-aldoses, 1,6-anhydro-beta-D-glucose and di- and trisaccharides as alternative acceptors for fructose, with relative activities ranging between 2 and 100%84,90–93. Furthermore, different UDP-sugars could serve as donor substrates for potato SuSy. UDP-Gal, for example, could be used with 23% of the activity compared to UDP-Glc. SuSy of rice and wheat germ, on the other hand, did not show any detectable activity on UDP-Gal90,94. To the best of our knowledge, donor promiscuity in the cleavage direction has only been tested for SuSy from rice90. Sixteen di- and trisaccharides were evaluated, but only 2-deoxysucrose (α-2-deoxyGlc-(1→2)-β-Fru) appeared to be promising, with a relative activity of 55% compared to sucrose.

2.3 Structure

Currently, two types of crystal structures of SuSy are available: an open form from the bacterium Nitrosomonas europaea (SuSyNe)95 without substrates and closed structures from the plant Arabidopsis thaliana (SuSyAt1), which were obtained by crystallizing the enzyme in the presence of substrates (UDP and fructose or UDP-Glc) (Figure 7). According to the induced fit model, structural changes occur upon binding of the substrates leading to a closed conformation suited for catalysis. These rearrangements reshape the active site, resulting in stronger interactions with the substrates and proper positioning of critical residues95.

SuSyAt1 consists of four domains: a cellular targeting domain (CTD), the ENOD40 peptide binding domain (EPBD) and two typical GT-B catalytic Rossmann-fold-like domains (Figure 8A). The ENOD40 peptides A and B, involved in root nodule organogenesis in legumes, bind to the EPBD domain96. The CTD domain is responsible for the association with certain cell organelles. Indeed, SuSy can be present in different compartments of a plant cell such as the cytosol or mitochondria, but it can also be associated with the plasma membrane, plastid membranes or proteins such as actin97–100.

Figure 7 Surface representation (created in PyMol388) of SuSyNe (PDB 4RBN) and SuSyAt1 (PDB 3S28). Substrates in the active site are represented by sticks. In SuSyNe, the visualized substrates are those from the superposed structures of SuSyAt1 (PDB 3S27 and 3S28). In SuSyAt1, the substrates are not visible as the active site is shielded from the solvent by the enzyme’s residues. LCN: lichenan, a glucose analogue, Fru: fructose, UDP: uridine diphosphate.

Figure 8 (A) Domain organization of SuSyAt1. CTD: cellular targeting domain, EPBD: ENOD40 peptide binding domain. GT-BN and GT-BC: catalytic N- and C-terminal domains, respectively. (B) Tetrameric crystal structure of SuSyAt1 with color code according to the domain organization presented in A. (C) Schematic diagram of the active site of SuSyAt1. Hydrogen bonds are shown as dotted lines. W: water molecule, LCN: a glucose analogue, Fru: fructose. Parts B and C are reproduced from48.


The substrates are bound in the cleft between GT-BN and GT-BC and interactions with each other and with the enzyme residues occur predominantly through an extensive hydrogen-bond network and not through hydrophobic stacking interactions (Figure 8C). As expected for a GT-B enzyme, UDP-glucose mainly interacts with the donor GT-BC domain, although a hydrogen bond between the backbone of Gly-303 (GT-BN) and the beta phosphate of UDP was also observed. In addition, the backbone carbonyl oxygen of His-438 is perfectly positioned to stabilize the partial positive charge at the C1 carbon of the glucose moiety. Fructose is solely bound through interactions with the acceptor GT-BN domain (Figure 8C). Two phosphorylation sites in the N-terminal domain were identified in plant SuSys: Ser-13 (SuSyAt1) in the CTD domain and Ser-167 (SuSyAt1) in the EPBD domain84,101.

The active form of SuSy is typically homotetrameric (Figure 8B), consisting of four identical subunits each with a molecular mass of about 90 kDa84. However, phosphorylation, sucrose concentration, ionic strength and pH were found to influence the oligomeric state of the enzyme102,103. Consequently, SuSy enzymes consisting of fewer or more than four subunits have been described (e.g. monomeric104, dimeric102, trimeric93, hexameric103 and octameric103). Two different interfaces between monomers, A:B and A:D, could be distinguished (Figure 8B). The A:D interface is created by the N-terminal half of the CTD-EPBD domain linker and the C-terminal domain. The much less hydrophobic A:B interface is mainly composed of EPBD domains.

An exclusive feature of SuSy among GTs is an extremely long helix Nα1 in the GT-BN domain, which extends from the EPBD domain into the active site, contacting fructose and UDP. It is believed that conformational changes in the CTD (e.g. induced by phosphorylation) of one subunit could be transmitted to the active site of a neighboring monomer via EPBD and helix Nα184.

2.4 Factors influencing expression and activity

2.4.1 Presence of different isoforms within one species

In plants and cyanobacteria, multiple SuSy genes have been identified in one species105,106. These isoforms display clear differences in their spatial and temporal expression patterns in plants107–112. Three isoforms of pea, for example, were found to have different expression patterns in different organs of the plant and during organ development113, while isoforms of maize SuSy differed in their intracellular location (association with the membrane)114. In addition, their expression can be differentially regulated by exogenous factors such as (low) temperature, salt concentration, anaerobiosis or limiting carbohydrate supply107. The different isoforms are also suggested to have different metabolic roles113. Cytosolic SuSys could supply products for general metabolism (energy production through ) or ADP-Glc for starch synthesis115,116, whereas plasmalemma-associated forms are thought to supply UDP-Glc for cellulose synthesis by the membrane-integrated cellulose synthase complex97,113. It has to be noted, however, that the SuSy-based pathway of starch synthesis in plants, which involves the production of ADP-Glc in the cytosol by SuSy, transport of the nucleotide sugar into plastids through a translocator and subsequent conversion of ADP-Glc into starch by , is still highly controversial. Indeed, although several experiments have linked the activity of SuSy to starch synthesis117–124, other reports indicate that SuSy is not (or only to a minor extent) involved in this process116,125–129. In addition, there is still some disagreement about the existence of an ADP-Glc transporter mediating the import of the sugar nucleotide into the plastid127,128,130.

Expression of sucrose metabolizing enzymes in cyanobacteria was also found to depend on the isoform and external stimuli131,132. In Anabaena sp. PCC 7120, expression of both susy genes and one gene coding for a Sucrose-Phosphate Synthase (spsA) was increased by salt stress (addition of NaCl), while expression of spsB was decreased. Despite the increased expression of both sucrose degrading (susy) and synthesizing (sps) genes, net accumulation of sucrose was observed under salt stress and it was suggested that this could be attributed to transcriptional and post-translational regulation of SPS activity by NaCl. Such sucrose cycles, which are characterized by a permanent process of formation and degradation of sucrose, have also been observed in plants. Although the exact role of these cycles has not yet been elucidated, it has been proposed that they could allow plants to respond with a high degree of sensitivity to factors influencing sugar accumulation, osmotic potential, respiration and sugar signaling133–135. Next to exogenous factors, expression can also be regulated by intracellular proteins. Indeed, NtcA, a global nitrogen regulator in cyanobacteria, acts as a transcriptional activator of the spsB and invB genes and as an inhibitor of susy expression136. Together with other experiments, these observations indicate a clear link between sucrose metabolism and nitrogen fixation in cyanobacteria136–139.

2.4.2 Post-translational modifications: phosphorylation and S-thiolation

One of the most important post-translational modifications of plant SuSys is phosphorylation, which was found to occur at a serine residue in the CTD domain and a serine residue in the EPBD domain84. Phosphorylation of the EPBD serine site (Ser-167 in SuSyAt1) has been proposed to promote SuSy degradation via a ubiquitin/proteasome pathway140,141. Phosphorylation at the CTD serine site (Ser-13 in SuSyAt1) has been found to promote the formation of tetramers102 and also affects intracellular location by promoting or inhibiting membrane association98,142,143. In addition, it could also stimulate the sucrose cleavage activity (but not the synthesis activity) of some SuSys, mostly due to an increase in affinity for sucrose and/or UDP101,144. For others, phosphorylation at this site did not affect the activity significantly98,143. It has to be noted that plant SuSys are only phosphorylated if extracted from their natural hosts or if they are expressed in Saccharomyces cerevisiae93, but not if Escherichia coli (E. coli) is used as expression host. However, phosphate groups can still be attached in vitro by protein kinases. In addition, phosphorylation can possibly also be mimicked if the serine residue is replaced by acidic amino acids such as Asp or Glu, although other reports contradict this84,93. Expression of SuSy1 from potato in yeast resulted in higher activities but unaltered affinities compared to expression in E. coli, and both enzymes also differed in their acceptor substrate spectrum93. The yeast SuSy1 S11A (non-phosphorylated) mutant displayed markedly improved affinities for all substrates, albeit with a much lower Vmax, while the S11D mutant expressed in E. coli did not have improved activities although affinity for UDP-Glc and fructose was increased93. These results indicate that the effect of phosphorylation on SuSy is still not fully understood.

Another post-translational modification in plant SuSy involves binding of ENOD40 peptides. These peptides are hormone-like molecules which play an important role in root nodule organogenesis in legumes, but homologs have also been identified in non-legumes, indicating a more general role in plants145,146. In legumes, sucrose breakdown by SuSy is necessary for normal nodule development and nitrogen fixation145. It has been shown that peptide A can bind to SuSy from soybean by disulfide bond formation at Cys-264 (S-thiolation), activating sucrose cleavage activity but not synthesis activity. The ENOD40 peptides were also found to bind tightly to SuSy from non-legumes such as maize, even though it lacks this particular Cys residue. This interaction inhibited protein phosphorylation at Ser-170, thereby preventing SuSy degradation via the ubiquitin/proteasome pathway140.

2.5 Biotechnological applications

Typically, GTs catalyze the transfer of a sugar from an activated sugar donor to an acceptor molecule in a virtually irreversible way. In contrast, the SuSy reaction is readily reversible because of the high-energy glycosidic linkage of sucrose (~27.6 kJ/mol)147, which is comparable to that of nucleotide-activated sugars. Consequently, SuSy is well suited for both the production of valuable disaccharides (synthesis direction) and the production of nucleotide-activated sugars (breakdown direction)84.

Production of sucrose analogues

Because of the broad acceptor and donor spectrum of plant SuSys in the synthesis reaction, production of various sucrose analogues was achieved successfully with these enzymes84. These analogues can serve as starting materials for the synthetic production of valuable glycostructures with applications in pharmaceutical, cosmetic, food and feed industry148–150. They are also useful probes to study sugar signaling and sugar transport in plants151,152. However, with a few exceptions, most produced sucrose analogues had a modified fructosyl unit (e.g. α-D-glucopyranosyl-α-L-sorbofuranoside)84.

Production of nucleotide sugars and coupled reactions with GTs for the production of valuable glycosides

Nucleotide sugars are involved in several important processes in vivo. UDP/TDP-sugars are commonly used in biosynthetic pathways of glycosylated natural products, CDP-sugars are required for some pathogenic bacterial antigens, GDP-sugars are used in a range of glycobiological processes and ADP-sugars are indispensable for intracellular trafficking, post-translational modification of proteins, DNA repair, programmed cell death, and metabolism153. To study these processes and to produce valuable glycosylated compounds with Leloir GT enzymes, nucleotide-activated sugars are of utmost importance. However, the high price of these molecules remains a major bottleneck. In this respect, SuSy presents a promising biocatalyst for the production of nucleotide sugars, NDP-Glc in particular, starting from the cheap and abundantly available sucrose. Expansion of the substrate range to other NDP-sugars (e.g. UDP-Gal) can be achieved by combination with additional glucosyl-modifying enzymes84. In addition, SuSy can also be coupled with other (natural product) glucosyltransferases (GlcTs) to produce glycosylated compounds with in situ regeneration of the expensive nucleotide (sugar)66,84. The use of a coupled system offers several advantages. Laborious isolation of NDP-sugars, which can also lead to partial degradation of the labile compounds, is bypassed. Moreover, expensive nucleotides (e.g. UDP) only have to be applied in catalytic instead of stoichiometric amounts. The latter also aids in preventing product inhibition of GTs by UDP and reverse glycosylation, thereby improving the overall conversion efficiency84.

For a complete overview of examples in which SuSy is used for production purposes, the reader is referred to the recently published review about SuSy by Schmölzer and coworkers84.

3 Construction of phylogenetic trees

In Chapter 2 and Chapter 6, evolutionary relationships between enzymes are discussed using phylogenetic trees. To understand the methods section involving the construction and validation of the phylogenetic trees some background knowledge is required and will be provided here.

Several methods exist to infer phylogenetic trees, such as the maximum parsimony (MP), maximum likelihood (ML) and neighbor-joining (NJ) methods154. In brief, the most parsimonious tree is the one that minimizes the number of evolutionary changes required to explain the observed data155. According to the maximum likelihood approach, the ‘best’ tree is the one with the highest probability of giving rise to the observed data, taking into account a certain model of evolution156. The neighbor-joining method, on the other hand, is a distance-based method that creates trees based on the percentage similarity between pairs of input sequences and an evolutionary model157,158. Maximum parsimony does not take into account a model of evolution and also neglects unobservable substitutions. Consequently, this method is hardly used anymore. Furthermore, several computer simulation experiments have revealed that ML is more efficient and robust compared to NJ. Robust here means that it is better at estimating the correct phylogeny even when the assumptions of the method are violated (e.g. when an incorrect evolutionary model is used in the analysis)159–161.
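As an illustration of the input on which distance-based methods such as NJ operate, the sketch below computes uncorrected pairwise distances (the fraction of differing positions, or p-distance) from a small alignment. The toy sequences and function names are hypothetical and serve only as an example. Note that, in practice, these raw distances are usually corrected with an evolutionary model before tree building, a step that is not shown here.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances for all sequence pairs, keyed by sorted name pair."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment of three short protein fragments.
toy_alignment = {"A": "MKTLV", "B": "MKTIV", "C": "MRSIV"}
for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 2))
```

In this toy example, sequences A and B are the closest pair and would therefore be joined first by a distance-based clustering method.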

Constructing an accurate ML tree requires the use of a substitution model, i.e. a statistical model of molecular evolution that specifies the relative rates of all possible substitutions. For DNA sequences this comprises a 4x4 nucleotide substitution matrix, while for protein sequences a 20x20 amino acid substitution matrix is used. The latter thus contains for every amino acid i the probability that it is replaced by another amino acid j, if a mutation occurs during evolution. Indeed, it is well known that some kinds of amino acid substitutions are more likely to occur than others because the involved amino acids share similar properties (polarity, structure,…)162. One of the first amino acid substitution models, developed by Dayhoff and coworkers, was the point accepted mutation or PAM matrix163. It is derived from comparisons of existing sequences of multiple protein families, and it simply counts the number of times a specific mutation is accepted. The probability is subsequently derived from the frequency of occurrence of a specific amino acid and a specific substitution. Another so-called ‘counting model’ is the JTT matrix developed by Jones, Taylor and Thornton, which is based on a much larger database of closely related sequences (>85% sequence identity)164. These counting models, however, do not take into account evolutionary distances between sequences, a downside that was countered by applying the maximum likelihood (ML) principle, which includes branch lengths. A well-known example of such a substitution model is the WAG matrix developed by Whelan and Goldman165. Currently, several variants on these models have been developed, but none of them is universally preferred for all alignments166. Consequently, a best-fitting evolutionary model for the input alignment has to be selected ad hoc based on statistical criteria such as the likelihood ratio test, Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) with specialized programs (e.g. ProtTest167 or MEGA168).
In addition to the model, different parameters can be included: ‘+I’ (invariable sites), ‘+G’ (gamma-distributed rate heterogeneity of variable sites), and/or ‘+F’ (observed amino acid frequencies). Invariable sites are positions in the sequence which have remained unchanged during the course of evolution, most probably because of functional constraints, while variable sites can be assumed to vary in substitution rate according to a (discrete) gamma distribution169.
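The information criteria mentioned above can be illustrated with a minimal sketch. It uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, with k the number of free parameters, n the alignment length and ln L the maximized log-likelihood. The log-likelihoods and parameter counts below are invented for illustration and do not stem from any analysis in this thesis; dedicated programs compute these values from the actual alignment.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: penalizes parameters more strongly."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical results: model name -> (maximized log-likelihood, free parameters).
candidates = {
    "JTT": (-10250.0, 39),
    "WAG": (-10180.0, 39),
    "WAG+G": (-10020.0, 40),
}
n_sites = 500  # assumed alignment length

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], n_sites))
print("Best model by AIC:", best_by_aic)
print("Best model by BIC:", best_by_bic)
```

In this invented example the extra gamma parameter is easily justified by the gain in likelihood, so both criteria select the same model. With smaller likelihood gains, BIC tends to favor the simpler model.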

Once the substitution model is known, a maximum likelihood tree can be constructed. To validate the inferred tree topology (including the nodes and clusters within a tree), the bootstrap method170 can be used (Figure 9). This statistical resampling technique consists of several subsequent steps. First, the input multiple sequence alignment (MSA), which can be regarded as a matrix consisting of m sequences and an alignment length of n positions (amino acids, gaps or nucleotides), is used to infer a phylogenetic tree using a particular algorithm (e.g. ML). Next, n columns are randomly chosen from the MSA, with replacement, giving rise to a new m x n pseudodataset. Some of the original columns will not be present in this dataset while others will occur multiple times. The pseudodataset is then used to build a new phylogenetic tree using the same algorithm as before and its topology is compared to the original tree. This process of site sampling and tree building is repeated several hundred times (= number of bootstrap replicates z). The bootstrap value at each node of the original tree is then calculated as the percentage of bootstrap trees in which that particular node (cluster of sequences) is recovered. It thus indicates the reliability of the cluster descending from the node or how well that node is supported. As a general rule, only nodes with bootstrap values higher than 70% should be considered reliable171.
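The two bookkeeping steps of the bootstrap procedure, column resampling and the calculation of support values, can be sketched as follows. Tree inference itself is not implemented here. As a simplifying assumption, each bootstrap tree is represented by the set of clusters (sets of sequence names) it contains; this representation and all names in the code are hypothetical.

```python
import random


def resample_columns(msa: list, rng: random.Random) -> list:
    """Draw n alignment columns with replacement to build a pseudodataset."""
    n_columns = len(msa[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    return ["".join(seq[i] for i in picks) for seq in msa]


def bootstrap_support(cluster: frozenset, replicate_trees: list) -> float:
    """Percentage of bootstrap trees in which the given cluster is recovered."""
    hits = sum(1 for tree in replicate_trees if cluster in tree)
    return 100.0 * hits / len(replicate_trees)


# Hypothetical toy alignment (m = 3 sequences, n = 5 positions).
msa = ["MKTLV", "MKTIV", "MRSIV"]
pseudo = resample_columns(msa, random.Random(42))
print(pseudo)

# Three bootstrap trees, each reduced to the clusters it contains;
# two of them group sequences A and B together.
replicate_trees = [
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
]
print(round(bootstrap_support(frozenset({"A", "B"}), replicate_trees)))
```

In this sketch, the cluster of sequences A and B is recovered in two out of three replicate trees, which corresponds to a bootstrap value of about 67%.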

In this thesis, phylogenetic reconstructions were conducted with MEGA172. This is a very user-friendly program that can be used for several purposes, including the construction of MSAs, visualization and manipulation of those alignments, determination of a best-fit substitution model, and inference of phylogenetic trees (including bootstrap analysis) or ancestral sequences. The use of one environment eliminates the need for interconverting file formats, although the program also accepts input files from other software packages or online tools such as Clustal Omega.

Figure 9 General overview of the bootstrap method. In this example, two out of three bootstrap trees cluster sequence A and B together, hence, the bootstrap value for that node is 67% (2/3). Seq: sequence, MSA: multiple sequence alignment.

4 Enzyme engineering

Protein engineering, a process that aims at altering the structure of an enzyme (or other protein) to improve or change its properties, such as the activity, specificity and (thermo)stability, has become a very important tool to overcome the limitations of natural enzymes as biocatalysts173–175. Generally, two different strategies can be discerned: directed evolution and rational design.


The term ‘directed evolution’ is used to describe all molecular biology techniques that mimic natural evolution in a laboratory setup176. Applied to enzyme engineering, this involves the diversification of the target gene, followed by selection or screening (e.g. with activity assays) of huge libraries of variants. The best hit can subsequently be used in further rounds of mutagenesis until the desired level of change is reached. Two different strategies for the generation of enzyme variants during directed evolution can be used: either random mutagenesis (‘asexual evolution’) or recombination methods (‘sexual evolution’). Recombination methods make new combinations from a pool of parent genes to create variant enzymes177. An example of a random recombination method is DNA shuffling178,179. In this technique, parent genes are first fragmented by DNase. Subsequently, the pieces having a sufficient overlap anneal to each other and are extended. In asexual evolution, on the other hand, random mutations are introduced in one protein-coding parent gene, typically by means of error-prone PCR (epPCR). EpPCR uses non-optimal reaction conditions to increase the error rate of DNA polymerases177.

Although directed evolution is simple to perform and little knowledge and understanding of the enzyme is needed, it suffers from a major drawback, i.e. it is very expensive, labor-intensive and time-consuming. Therefore, rational approaches are of primary interest. In rational protein design, site-directed mutagenesis is performed to replace predetermined amino acids. Residues are chosen based on the mechanism and molecular basis of key properties of the protein. This information can be obtained from crystal structures, homology models, multiple sequence alignments, or specialized software packages175,180. The determination of the exact mutations that should bring about the desired change has however proved to be difficult. Therefore in practice, most researchers use semi-rational design, a combination of rational design and directed evolution. Like in rational design, only specific residues are targeted for mutagenesis, but instead of one predetermined mutation, several alternative amino acids are evaluated (e.g. site-saturation mutagenesis), resulting in much smaller and ‘smarter’ libraries compared to directed evolution.

In this work the focus lies on the use of (semi-)rational design to get a better understanding of the structure and function of GT4 enzymes, to change substrate specificity and to improve protein stability. Only techniques used in this thesis will be discussed in the following sections as extensive reviews about protein engineering are already abundantly available175,181–183.

4.1 Engineering substrate specificity

4.1.1 CASTing

Active-site redesign is still one of the most commonly applied techniques to alter the specificity or broaden the substrate range of an enzyme. Well known is the approach of Combinatorial Active Site Saturation Testing (CASTing)184, in which spatially close positions (two or three) in several sets around the active site are mutated simultaneously by site-saturation mutagenesis, allowing for potential synergistic effects to occur. The best hit from one randomized set can be used as starting template for another set, hits from separate sets can be combined, or directed evolution can be used to further improve the desired property184.

Site-saturation libraries, which consist of a mix of mutant and wild-type (WT) enzyme coding plasmids, are typically made using degenerate primer mixes in specific PCR protocols. The NNK primer (N: any base, K: T or G), for example, is a mix of 32 primers coding for all 20 naturally occurring amino acids and 1 stop codon at the target position(s) in the enzyme (Table S1). Because of the codon redundancy and the stochastic nature of sampling, some variants will be more represented than others during screening experiments. In case of the NNK primer, serine, leucine and arginine mutations can each result from three different codons, while the other amino acids are represented by only one or two codons (Table S1). Therefore, to evaluate all possible mutants, more colonies should be screened than the theoretical number of possible variants, a process called oversampling. The amount of oversampling is determined by the desired degree of coverage and can be calculated according to the formula185

$$O = \frac{T}{V} = -\ln(1 - PC) \qquad (1)$$

With O: oversampling factor, V: number of theoretically possible variants, T: the number of transformants that have to be screened for a certain percentage coverage (PC). Consequently, to statistically cover 95% of all possible sequence variants, a threefold excess of colonies should be screened (-ln(1-0.95)≈3)185. This means that at least 96 transformants (T≈3*32) have to be considered if an NNK primer (V=32) is used. Simultaneous saturation of two positions requires 3072 colonies (T≈3*32*32) to be screened. To reduce the screening effort, smaller primer sets can be used such as the NDT (small set of amino acids with different properties), VRK (hydrophilic amino acids) or the NDT/VHG/TGG primer trio (Table S1). The use of the latter is called the 22c trick186, as the primer mix comprises 22 codons, coding for all 20 amino acids. With the 22c trick, roughly 50% fewer colonies have to be screened if two positions are saturated simultaneously, compared to the NNK primer (22*22=484 versus 32*32=1024 codon combinations). It has to be mentioned that preferential binding of some of the primers within the set can lead to a biased library. The quality of the library can be checked by sequencing, and increasing the oversampling factor can compensate for low-quality libraries.
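The codon redundancy of the NNK primer and the resulting screening effort can be verified with a short sketch. It assumes the standard genetic code and applies formula (1) without rounding the oversampling factor to three, so the numbers for larger libraries differ slightly from the approximate values given above. All function and variable names are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons() -> list:
    """All codons encoded by NNK (N: A, C, G or T; K: G or T)."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def transformants_needed(n_variants: int, coverage: float) -> int:
    """Number of transformants T = -V * ln(1 - PC), rounded up."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return math.ceil(-n_variants * math.log(1.0 - coverage))


codons = nnk_codons()
counts = Counter(CODON_TABLE[codon] for codon in codons)
print(len(codons), "codons;", counts["*"], "stop codon")
print("Encoded by three codons:", sorted(aa for aa, n in counts.items() if n == 3))

# Screening effort for 95% coverage with NNK (one or two positions)
# and with the 22-codon primer trio (two positions).
for label, n_variants in (("NNK, 1 position", 32),
                          ("NNK, 2 positions", 32 * 32),
                          ("22c, 2 positions", 22 * 22)):
    print(label, "->", transformants_needed(n_variants, 0.95), "transformants")
```

The enumeration makes the bias of a degenerate primer explicit, while the second part shows how quickly the screening effort grows when several positions are saturated simultaneously.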

4.1.2 Domain swapping

A rational method of protein recombination that does not involve the creation of huge random libraries is the design of hybrid enzymes, also called chimeras, by means of ‘domain swapping’. Strictly speaking, domain swapping can be regarded as the replacement of secondary/tertiary elements or even whole domains of one protein by the corresponding homologous regions of another protein187,188. This process also exists in nature as a mechanism of natural evolution of protein function, illustrating the possibility to use it as a protein engineering strategy189. Domain swapping can offer valuable information about structure-function relationships but can also be used to modify the specificity or thermal stability of an enzyme188. A nice example was demonstrated by van Beek and coworkers190. They replaced the C-terminal part of a stable Baeyer-Villiger monooxygenase (BVMO) by the respective domains of other BVMOs, which are less stable but display a much wider substrate range. By blending BVMOs in this way, a large part of the thermostable enzyme is conserved, while a significant part of the substrate-binding region is exchanged, resulting in a more promiscuous but stable BVMO190.

4.1.3 Correlated mutations

Correlated positions within a sequence (alignment) are also known as correlated mutations or co-evolving residues. These residues are typically functionally related, and only certain combinations of amino acids at the corresponding positions will allow the protein to maintain its activity, folding and/or stability191. A nice example of two correlated residues are those forming stabilizing (non-covalent) interactions such as hydrogen bonds or salt bridges. Such co-evolving residues are found in close spatial proximity in the three-dimensional structure of the protein. If one of these residues is mutated (e.g. negatively charged glutamate to positively charged arginine), the correlated position should mutate concomitantly (e.g. arginine to glutamate) to preserve the interaction (e.g. salt bridge) (Figure 10, green). Another well-known set of correlated positions are specificity-determining residues (SDR). These are not necessarily located in the vicinity of each other, but they tend to be in functional sites conferring specificity, typically (but not exclusively) the active site192. These larger groups of residues are specifically co-conserved within a particular protein subfamily, but vary between subfamilies with different enzyme specificities (Figure 10, blue)191,193,194. Consequently, SDR are potential targets to alter the activity or change the specificity192.

Correlated positions (including SDR) can be detected based on sequence alignments using sequence co-variation analysis. To this end, several free online software programs such as I-COMS or CMAT are available. In this thesis, the Cornet tool of 3DM, proprietary software developed by Bioprodict, was used. 3DM information systems are databases that combine several types of protein-related data (e.g. mutational data automatically extracted from literature) with protein alignments (Figure 11)195. Conserved structural regions within the protein superfamily are extracted from available crystal structures of its members and used to guide a much larger multiple sequence alignment. The program uses its own family-specific numbering scheme, based on these conserved core positions (Figure 11B and Figure 11C). Consequently, residues from different enzymes with the same 3DM number are located at equivalent positions in their tertiary structures. Next to correlated mutation analysis, 3DM offers several other functionalities such as phylogeny reconstruction, data statistics (e.g. calculation of amino acid conservation, average B-factor, hydrophobicity,…) and the possibility to visualize data (e.g. correlated positions) in 3D structures (Figure 11A).
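The principle of sequence co-variation analysis can be illustrated with a mutual information score between two alignment columns. The sketch below is a generic textbook measure and is not the algorithm implemented in 3DM, I-COMS or CMAT; the toy alignment and the function names are invented for illustration, and practical tools additionally correct for phylogenetic bias, gaps and small sample sizes.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    score = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        score += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return score


# Hypothetical toy alignment: columns 0 and 2 are conserved, while
# columns 1 and 3 swap charge together (E/R versus R/E).
toy_alignment = ["AEKR", "AEKR", "ARKE", "ARKE"]

if __name__ == "__main__":
    print(mutual_information(toy_alignment, 1, 3))  # co-varying pair
    print(mutual_information(toy_alignment, 0, 1))  # conserved vs variable
```

In this toy case the charge-swapping pair scores 1 bit, whereas any pair involving a fully conserved column scores 0, which mirrors the distinction between correlated and conserved positions in Figure 10.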


Figure 10 (Top) Conserved and correlated positions in a protein multiple sequence alignment. (Bottom) Visualization of correlated positions in a protein structure. This figure was kindly provided by Jorick Franceus191.

Figure 11 3DM protein-superfamily platform. (A) Functionalities available in 3DM. (B) Part of a structure-based sequence alignment of glycosyltransferases with superfamily specific 3DM numbering and consensus residues. The different subfamilies within the large protein superfamily are named after the crystal structures that were used to guide the alignment (e.g. 3S28: PDB ID of SuSyAt1). Consensus residues are those that occur most frequently in a specific subset of sequences. (C) Part of the sequence from SuSyAt1 in 3DM, including the EX7E motif. Both numbering based on full sequence (sequence numbering) and numbering based on conserved core regions (3DM numbering) are shown. Core residues are colored red.


4.1.4 Molecular imprinting of enzymes

Changing the substrate specificity of an enzyme can also be achieved without altering the primary sequence of the protein, e.g. by imprinting. Imprinting of ligands, also called ligand-induced enzyme memory, was first described by Russell and Klibanov in 1988. They observed that the transesterification activity of powdered subtilisin in organic media could be improved up to 100-fold if the enzyme was lyophilized (from an aqueous solution) in the presence of competitive inhibitors (the imprinting molecules, which were removed afterwards). It was postulated that the inhibitors induced conformational changes in the enzyme’s active site, locking it during the lyophilization process into a conformation that resembles the enzyme-substrate complex196. In addition to improved catalytic performance, molecular imprinting has also led to enzymes with altered substrate selectivity and enantioselectivity (e.g. D vs L-substrate)197–200. However, this new conformational state (imprinted memory) can only be retained in anhydrous organic solvents (not in an aqueous solution), as the enzyme is much less flexible in this type of environment196. Non-aqueous media such as organic solvents offer many advantages: altered regio- and enantioselectivity, altered substrate specificity, increased thermostability, a thermodynamic equilibrium shifted towards synthesis (e.g. for hydrolases), increased solubility of hydrophobic substrates, and avoidance of water-dependent side reactions and/or bacterial contamination196,201–203. However, the activity of many enzymes is severely reduced in organic solvents. Furthermore, reactions using polar substrates such as sugars are preferably performed in aqueous solutions (although a biphasic system or ionic liquids can also be used). To maintain the changed conformational state of the enzyme in an aqueous environment after imprinting, chemical crosslinking can be applied. To this end, the imprinting molecules are first incubated with the enzyme. The enzyme is subsequently recovered by precipitation (e.g. using agents such as tert-butanol, ammonium sulphate or polyethylene glycol) and crosslinked, giving rise to imprinted crosslinked enzyme aggregates (iCLEAs) (Figure 12).

Figure 12 General scheme for the production of imprinted cross-linked enzyme aggregates (iCLEAs). AS: ammonium sulphate, GA: glutaraldehyde.


The production of CLEAs without prior imprinting has also been used extensively as a strategy to increase the stability of the enzyme towards denaturation by heat, organic solvents and proteolysis and to increase the volumetric productivity and recoverability204–208. To achieve crosslinking, the enzyme can e.g. be derivatized with functional groups (e.g. vinyl groups), which can be connected by radical polymerization under UV using ethylene glycol dimethacrylate. The latter is also known as the CLIP strategy. It was successfully used to convert a glucose oxidase that is unable to use D-galactose into an enzyme that could use this substrate with a catalytic efficiency comparable to that of the native enzyme in its wild-type reaction. It was also used to change the substrate specificity of cyclodextrin glycosyltransferase, the substrate selectivity of proteases and the enantioselectivity of an epoxide hydrolase198–200,209. Next to the derivatization strategy, the crosslinking reagent glutaraldehyde has found widespread use for enzyme immobilization. It is readily available commercially, rather inexpensive, highly reactive and more efficient than other aldehydes in generating stable crosslinks.

Next to its use in biocatalysis, imprinting has also been applied for the creation of molecularly imprinted polymers (MIPs), which are used to separate and purify bioactive molecules such as oligopeptides or as sensing elements in biomedical devices210,211.

4.2 Engineering protein stability

4.2.1 Kinetic versus thermodynamic stability

Protein denaturation is often a very complex process but can generally be approximated by a classic two-step process: N ↔ U → D, where N is the native, folded and functional state, U the reversibly inactivated, (partially) unfolded state and D the irreversibly denatured, inactive enzyme. The unfolded protein may refold to its native state or undergo irreversible denaturation by protein aggregation or misfolding212–214. Consequently, stability can be approached from both a thermodynamic and a kinetic point of view. Thermodynamic stability is related to the difference in Gibbs free energy between the folded and unfolded states (∆Gunfold), while kinetic stability is related to the energy barrier separating the native state from the unfolded and irreversibly denatured proteins212 (Figure 13).

A thermodynamically stable protein is characterized by a low amount of (partially) unfolded states in equilibrium with the native state (Kunfold < 1, ∆Gunfold > 0). Its stability can be evaluated by its melting temperature Tm, which is the temperature at which 50% of the proteins are unfolded. Tm values can be determined e.g. with differential scanning fluorimetry (DSF), differential scanning calorimetry (DSC) or circular dichroism (CD)212,214–216. CD is defined as the difference in absorption of left-handed and right-handed circularly polarized light. When proteins unfold during a temperature increase, the highly structured secondary elements such as α helices and β sheets are lost, leading to a change in CD bands215. DSF relies on the fluorescence of a dye such as SYPRO Orange, which binds to the hydrophobic parts of an unfolded protein, while DSC is based on the absorption of heat that occurs when a protein unfolds214,215.


Kinetic stability, on the other hand, reflects the time a protein remains active before undergoing irreversible inactivation at a given temperature217. The most commonly reported measure for kinetic stability is the half-life of denaturation (t50) at a specific temperature (e.g. 60°C), i.e. the time it takes for the activity of the protein to be reduced to half of its original activity.

Depending on the Gibbs free energies of the different states and the rate-limiting step of the unfolding/denaturation process212, thermodynamic and kinetic stability can be positively correlated (e.g. an increase in both Tm and t50 upon mutation)218,219, negatively correlated213 or independent of each other212. Although Tm values are often easier to determine, biotechnological applications often require kinetically stable proteins, e.g. pharmaceuticals with a long shelf-life or enzymes that can withstand extreme process conditions in a bioreactor long enough to make the process economically attractive212.
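The two stability measures can be made concrete with a small numerical sketch. It uses the standard relation ∆Gunfold = -RT ln(Kunfold) for thermodynamic stability and, as a simplifying assumption that is not claimed in the text, first-order inactivation kinetics (A(t) = A0·exp(-kt)) for kinetic stability, in which case t50 = ln(2)/k. The function names and example values are illustrative only.

```python
import math

R = 8.314  # gas constant in J mol^-1 K^-1


def delta_g_unfold(k_unfold, temperature_k):
    """Gibbs free energy of unfolding (J/mol) from the equilibrium constant."""
    return -R * temperature_k * math.log(k_unfold)


def half_life(rate_constant):
    """Half-life t50 assuming first-order irreversible inactivation."""
    return math.log(2) / rate_constant


def residual_activity(rate_constant, time):
    """Fraction of the initial activity left after `time` (first-order decay)."""
    return math.exp(-rate_constant * time)


if __name__ == "__main__":
    # Hypothetical values: K_unfold = 0.01 at 60 degrees C, k = 0.05 per minute
    print(delta_g_unfold(0.01, 333.15))  # positive: the native state is favoured
    print(half_life(0.05))               # t50 in minutes
```

A Kunfold below 1 gives a positive ∆Gunfold, in line with the definition of a thermodynamically stable protein, while the half-life depends only on the inactivation rate constant.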

Figure 13 Difference between kinetic and thermodynamic stability. Thermodynamic stability is related to the difference in Gibbs free energy between the native (N) and unfolded (U) states (∆Gunfold), while kinetic stability depends on the energy barrier separating the native state from the (partially) unfolded and denatured states (D). The latter is thus related to the rate of irreversible denaturation (k). If ∆Gfold is negative, folding occurs spontaneously (thermodynamically stable native state with lower Gibbs free energy). ΔH: enthalpy, T: temperature, ΔS: entropy, K: equilibrium constant, R: gas constant, k0: front factor. Figure drawn based on212.

4.2.2 Strategies to improve the stability

Increasing the stability of an enzyme can be achieved by physical immobilization220 or protein engineering techniques. Next to directed evolution (e.g. using error-prone PCR), there are several (semi-)rational ways to improve the stability by enzyme engineering, such as site-saturation mutagenesis of flexible residues (B-fit method221), tunnel-forming residues and residues at multimeric interfaces. ‘Entropic stabilization’ (rigidification) can be obtained by mutating (glycine) residues to alanine or proline, thereby decreasing conformational flexibility, or changing arginine to lysine, which has more rotamers in the folded state222,223. Introduction of additional salt bridges, disulfide bridges, hydrogen bonds, hydrophobic interactions or clusters of aromatic-aromatic interactions can also increase the resistance against unfolding224–227. Stabilizing mutations can occur both in the interior of the protein and at its surface224.


Computational methods can be used to predict stabilizing mutations in silico, for example by consensus design, ancestral protein reconstruction or calculation of changes in folding free energies upon mutation (e.g. FoldX and Rosetta)228–236. According to the consensus approach, stability can be increased by introducing the most frequently occurring amino acid (consensus residue) at one or more positions, based on a sequence alignment of all known homologous enzymes. These residues have already been selected through evolution and should be beneficial to the protein228,237,238. Over the last decade, various computer algorithms (e.g. FoldX and Rosetta) have been optimized to calculate the difference in free energy of unfolding between the mutant and the WT enzyme (∆∆Gunfold = ∆Gmutant - ∆GWT). FoldX236 describes the energetic contributions to protein stability in simple empirical terms, as shown in Equation 2.

$$\Delta G = W_{vdw}\,\Delta G_{vdw} + W_{solvH}\,\Delta G_{solvH} + W_{solvP}\,\Delta G_{solvP} + \Delta G_{wb} + \Delta G_{hbond} + \Delta G_{el} + W_{mc}\,T\,\Delta S_{mc} + W_{sc}\,T\,\Delta S_{sc} \qquad (2)$$

∆Gvdw stands for the sum of the van der Waals contributions of all atoms. Interactions of apolar and polar groups with the bulk solvent are included by the desolvation terms ∆GsolvH and ∆GsolvP, respectively. ∆Ghbond accounts for the hydrogen bonds based on simple geometric considerations, while ∆Gwb stands for the extra stabilizing effect of water molecules that form more than one hydrogen bond with the protein. ∆Gel incorporates the electrostatic contributions (salt bridges), while ∆Smc is the entropy cost for fixing the protein backbone in the folded state. Finally, the different side-chain conformations are captured by the ∆Ssc term. The terms of the equation are weighted by the different W factors. Rosetta235 uses another energy function and is computationally more demanding than FoldX239.
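The consensus approach described in this section lends itself to a short illustration. The sketch below derives the consensus residue for every column of a toy alignment and lists the substitutions that would bring a target sequence to the consensus. The alignment, the function names and the mutation notation are assumptions made for this example, ties are resolved in favour of the residue that appears first in the column, and no claim is made that the resulting substitutions are actually stabilizing.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent non-gap residue for each alignment column."""
    consensus = []
    for col in zip(*alignment):
        counts = Counter(res for res in col if res != gap)
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


def consensus_mutations(sequence, consensus, gap="-"):
    """List substitutions (e.g. 'T3S') that convert `sequence` to the consensus."""
    mutations = []
    position = 0  # residue number in the ungapped target sequence
    for res, cons in zip(sequence, consensus):
        if res == gap:
            continue
        position += 1
        if cons != gap and res != cons:
            mutations.append(f"{res}{position}{cons}")
    return mutations


# Hypothetical toy alignment of four homologues (aligned, equal length)
toy_alignment = ["MKTLV", "MKSLV", "MRSLI", "MKSLV"]

if __name__ == "__main__":
    consensus = consensus_sequence(toy_alignment)
    print(consensus)
    print(consensus_mutations(toy_alignment[0], consensus))
```

For the toy alignment the consensus is MKSLV, so the first sequence would receive the single substitution T3S.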

4.3 Engineering of GTs: case studies

The substrate specificity for both the donor and the acceptor has been successfully modified or improved for both GT-A and GT-B enzymes, even by single point mutations. A few examples that can be related to the research conducted in this thesis are discussed here.

β1,4-galactosyltransferase-I (β4Gal-T1, inverting GT-A) transfers the galactosyl moiety from UDP-Gal to various GlcNAc or Glc acceptors in vivo. Introducing a Y289L mutation created space for an additional N-acetyl group on the donor substrate, leading to an increase in GalNAc-T activity of about 200 times without impairing Gal-T activity240. Replacing the isoleucine at the corresponding position in a β1–4-N-acetylgalactosaminyltransferase by a Tyr, on the other hand, reduced its GalNAc-T activity by nearly 1000-fold, while enhancing its Gal-T activity by 80-fold241.

In VvGT1, a glucosyltransferase, the third position of the HX7E motif (a Trp residue) interacts with the O4 hydroxy group of glucose242. In Lamiales F7GATs, which preferentially use UDP-glucuronic acid (UDP-GlcA) as donor, the corresponding position is occupied by an arginine residue. This residue was hypothesized to play a crucial role in the specific recognition of the anionic carboxylate of GlcA, as mutation to a Trp shifted the sugar donor specificity to UDP-glucose243. Indeed, the kcat of R350W for UDP-Glc increased 10 times, while that for UDP-GlcA decreased 280 times. The Km, on the other hand, increased three times for both substrates.

In addition, another arginine positioned far outside the PSPG box in the N-terminus (acceptor binding domain) of UGT94B1 was identified to interact with the negatively charged carboxylate group of UDP-GlcA. Mutation to smaller (positively charged or polar) residues more than doubled the kcat for UDP-Glc and halved the Km compared to the WT, whereas activity on UDP-GlcA was abolished244.

UDP-galactose:anthocyanin galactosyltransferase (ACGaT), isolated from Aralia cordata, and UDP-glucose:flavonoid glucosyltransferase (UBGT) preferentially use UDP-Gal and UDP-Glc, respectively, although they can use the other nucleotide sugar to a certain extent245. Sequence alignment between several glucosyltransferases and galactosyltransferases revealed a conserved glutamine residue at the last position of the PSPG motif of GlcTs, while a histidine residue was present in GalTs. Mutation of Gln-328 to a histidine in UBGT led to a significant decrease in activity with UDP-Glc, but catalysis with UDP-Gal was also four times less efficient compared to the WT. In contrast, the affinity for UDP-Glc increased roughly forty times for the H374Q mutant of ACGaT, while maintaining similar activity levels. However, WT activity with UDP-Gal was not significantly altered. This clearly illustrates the importance of the residue in catalysis, although a single point mutation apparently was not sufficient to obtain a true specificity switch.

In contrast, Claus and coworkers were able to switch the donor specificity of two capsule polymerases (SiaD) of Neisseria meningitidis by mutating a residue in the EX7E motif that differed between GalTs and GlcTs of the retaining GT4 family. SiaDY prefers UDP-Glc as donor substrate but can also use UDP-Gal (about 18% of the activity with UDP-Glc). Mutating Gly-310 of the glucosyltransferase SiaDY into a proline improved the activity on UDP-Gal three times, while the activity on UDP-Glc was reduced to only 3% of that of the WT. Similarly, introducing a glycine at position 310 in the galactosyltransferase SiaDW-135 led to a 21-fold increase in activity on UDP-Glc while retaining only 13% of the activity on UDP-Gal compared to the parent246.

In summary, the successes of rationally designed mutants highlight the important roles played by the residues within the active site, although true specificity switches proved to be challenging in some cases. Next to rational mutagenesis, directed evolution has led to improved GTs. Directed evolution of OleD (GT1), for example, revealed four single mutants with increased activity towards natural and new acceptors247,248. In addition, several triple mutants had a broadened donor substrate range with activity towards nucleotide sugars (modified on C2, C3, C4 or C6)247,249. The positions involved were located in the active site, in both the N- and C-terminal domains247.

In addition to single point mutations, domain swapping experiments involving the GT-BN and GT-BC domains of related enzymes with different acceptors and donors successfully created chimeras with acceptor and donor substrate specificity dictated by the N-terminal and C-terminal domain parent, respectively250–253. Some of them also showed a broadened acceptor range254,255, a broadened donor range256 or altered regiospecificity255. Functionally active chimeras have been made starting from highly similar enzymes (sharing e.g. 78% amino acid sequence identity) as well as from more distantly related genes (e.g. only 22% identity)251,257,258. In some cases, however, the point of fusion appeared to be critical to obtain soluble chimeras, and the expression yield seems to be positively correlated with sequence identity among the parents255. In addition, prediction of the functional consequences of domain swapping often remains difficult, as the N-terminal acceptor binding domain can also influence the sugar donor specificity and the C-terminal nucleotide donor domain the acceptor specificity251,259.


5 Supplementary materials

[Figure S1: reaction schemes. Species labelled in the panels: (A) oxocarbenium ion-like transition state; (B) glycosyl-enzyme intermediate; (C) enzyme-substrate complex, oxocarbenium ion-like transition state, stabilized oxocarbenium-(substituted) phosphate short-lived ion pair intermediate, oxocarbenium ion-like transition state and enzyme-product complex.]

Figure S1 Mechanism of inverting (A) and retaining (B,C) GTs. (A) SN2 single displacement mechanism. (B) Double displacement mechanism involving a covalent glycosyl-enzyme intermediate. (C) Stepwise SNi-like mechanism involving a short-lived oxocarbenium phosphate ion pair intermediate. Adapted from11.


Figure S2 Schematic representation of the glycosylation of diverse classes of small molecules by uridine diphosphate glycosyltransferases (UGTs), resulting in glycosides with various applications. The UDP-sugar is used as an activated donor for the regio- and stereoselective glycosylation, thereby releasing UDP. Adapted from13.


Table S1 Codons included in different degenerate primers. These primer mixes can be ordered similarly as regular primers (e.g. 5’-AAGTANNKCTGAA-3’).

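| Amino acid | NNK | NDT | VRK | NDT/VHG/TGG |
|---|---|---|---|---|
| Ala [A] | GCG/GCT | - | - | GCG |
| Arg [R] | CGT/CGG/AGG | CGT | CGT/CGG/AGG | CGT |
| Asn [N] | AAT | AAT | AAT | AAT |
| Asp [D] | GAT | GAT | GAT | GAT |
| Cys [C] | TGT | TGT | - | TGT |
| Gln [Q] | CAG | - | CAG | CAG |
| Glu [E] | GAG | - | GAG | GAG |
| Gly [G] | GGT/GGG | GGT | GGG/GGT | GGT |
| His [H] | CAT | CAT | CAT | CAT |
| Ile [I] | ATT | ATT | - | ATT |
| Leu [L] | TTG/CTT/CTG | CTT | - | CTT/CTG |
| Lys [K] | AAG | - | AAG | AAG |
| Met [M] | ATG | - | - | ATG |
| Phe [F] | TTT | TTT | - | TTT |
| Pro [P] | CCG/CCT | - | - | CCG |
| Ser [S] | TCG/TCT/AGT | AGT | AGT | AGT |
| Thr [T] | ACT/ACG | - | - | ACG |
| Trp [W] | TGG | - | - | TGG |
| Tyr [Y] | TAT | TAT | - | TAT |
| Val [V] | GTT/GTG | GTT | - | GTT/GTG |
| Stop | TAG | - | - | - |
| Total # Codons | 32 | 12 | 12 | 22 |
| Total # AA | 20 | 12 | 9 | 20 |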
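The codon and amino acid totals in Table S1 can be reproduced by expanding each degenerate codon with the IUPAC nucleotide codes and translating the result with the standard genetic code. The sketch below is a self-contained illustration; the helper names are chosen here and only the IUPAC symbols needed for these primers are included.

```python
from itertools import product

# IUPAC nucleotide codes needed for the primers in Table S1
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "D": "AGT", "V": "ACG", "R": "AG", "H": "ACT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon):
    """Return all concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def summarize(degenerate_codons):
    """Return (number of codons, number of amino acids, stop codon present?)."""
    codons = [c for d in degenerate_codons for c in expand(d)]
    encoded = {CODON_TABLE[c] for c in codons}
    return len(codons), len(encoded - {"*"}), "*" in encoded


if __name__ == "__main__":
    for name, mix in (("NNK", ["NNK"]), ("NDT", ["NDT"]), ("VRK", ["VRK"]),
                      ("NDT/VHG/TGG", ["NDT", "VHG", "TGG"])):
        print(name, summarize(mix))
```

Only the NNK mix contains a stop codon (TAG), and the 22c mix reaches all 20 amino acids with 22 codons, in agreement with the totals in the table.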


CHAPTER 2: Identification and characterization of novel bacterial SuSy enzymes

This work has partly been published as ‘Identification of Sucrose Synthase in non-photosynthetic bacteria and characterization of the recombinant enzymes’ (Diricks et al., 2015)43 and ‘Sucrose synthase: A unique glycosyltransferase for biocatalytic glycosylation process development’ (Schmolzer et al., 2016)84.


1 Abstract

Sucrose Synthase (SuSy) catalyzes the reversible conversion of sucrose and NDP into fructose and NDP-glucose. Until recently, only SuSys from plants and cyanobacteria, both photosynthetic organisms, have been characterized. Here, four prokaryotic SuSy enzymes from the non-photosynthetic organisms Nitrosomonas europaea (SuSyNe), Acidithiobacillus caldus (SuSyAc), Denitrovibrio acetiphilus (SuSyDa) and Melioribacter roseus (SuSyMr) were recombinantly expressed in E. coli and thoroughly characterized by studying properties such as the optimum pH, optimum temperature, thermostability and nucleotide preference. The physiological relevance of this enzyme specificity is discussed in the context of ecological niches, metabolic pathways and genomic organization. In addition, sequence alignments were used to identify possible sites of phosphorylation in bacteria, as this type of post-translational modification has been shown to alter the kinetic parameters of various plant SuSys.


2 Introduction

Sucrose is one of the major end-products of photosynthesis and has long been assumed to occur only in phototrophic species such as cyanobacteria and plants260. In these organisms, CO2 is fixed through the Calvin cycle, resulting in glyceraldehyde 3-phosphate (G3P), which can be converted in several steps to sucrose (Figure 14). In plants, sucrose plays an important role in development, growth, carbon storage, stress protection and signal transduction261,262. In cyanobacteria, sucrose serves as a compatible solute to protect against (osmotic) stress, as a transport molecule in filamentous species or as a signal molecule134,136,263–266. It could also be used as a storage reserve to survive periods with unfavorable environmental conditions. This would allow the cell to grow and divide quickly when conditions improve267. Reports about the intracellular accumulation of sucrose in non-cyanobacterial prokaryotes are currently still limited. So far, the disaccharide has only been detected in Gammaproteobacteria (e.g. Thioalkalivibrio268, Methylobacter269, Methylocaldum270, Methylophaga271 and Methylomicrobium272), Betaproteobacteria (e.g. Methylobacillus) and in a species belonging to the 273. In these organisms, sucrose has been shown to act as a primary (main) or secondary compatible solute, protecting the cells against osmotic (salt) stress, and/or as a thermoprotectant270,274. It is suggested that the origins of sucrose metabolism probably lie in the proteobacteria or an ancestral type common to both the proteobacteria and cyanobacteria. Plants could have acquired the potential to metabolize sucrose from cyanobacteria, which are believed to be the ancestors of chloroplasts136,267,275–277.

The first step of sucrose synthesis is typically performed by Sucrose-Phosphate Synthase (SPS, EC 2.4.1.14), which generates sucrose 6-phosphate (Suc6P) from fructose 6-phosphate (Fru6P) and an activated sugar donor, such as UDP-Glc. The phosphate group of Suc6P is then cleaved off by Sucrose-Phosphate Phosphatase (SPP, EC 3.1.3.24) to irreversibly yield sucrose277–280 (Figure 14). Depending on the organism, sucrose can be metabolized by hydrolases, phosphorylases, transglycosidases or glycosyltransferases such as SuSy281. The latter produces nucleotide sugars from sucrose, which are possibly directed towards cell wall (UDP-Glc) or starch biosynthesis (ADP-Glc) in plants, whereas they play an important role in the synthesis of glycogen (ADP-Glc) and other (structural) polysaccharides in cyanobacteria119,282–284. In addition, SuSy has also been considered to be a major contributor to sucrose synthesis in some plants80–82.

Besides its biological significance, SuSy has also proven to be a versatile biocatalyst for practical applications. In 1993, Elling and coworkers demonstrated the production of expensive nucleotide sugars (NDP-Glc) starting from the abundant and cheap substrate sucrose. Moreover, SuSy can also be coupled with a glycosyltransferase (GT), which has resulted in a cost-effective method for the glycosylation of small molecules16,285–289.


Figure 14 Carbon flow starting from CO2 fixation in phototrophic plants and cyanobacteria, including sucrose production and degradation through the SPS/SPP/SuSy pathway. SPS: Sucrose-Phosphate Synthase, SuSy: Sucrose Synthase, SPP: Sucrose-Phosphate Phosphatase. Adapted from277–280.

Since its discovery in 1955 by Cardini and coworkers, various SuSys from plants and cyanobacteria, which are both phototrophic organisms, have been characterized68,106,120,290–294. However, low activities and poor stability of the reported SuSy enzymes have impeded their commercial exploitation so far. In contrast, SuSys from non-cyanobacterial prokaryotes have not yet been characterized, despite the identification of their coding sequences by genome analysis in several organisms267,295,296. In this contribution, novel SuSys from non-photosynthetic/non-cyanobacterial prokaryotes belonging to different phyla (Proteobacteria, Deferribacteres and Ignavibacteriae) are expressed recombinantly and characterized for the first time. It has to be noted that shortly after the publication of this work, the crystal structure of one of the enzymes described here was also published95.

3 Materials and methods

3.1 Materials

Unless otherwise stated, all chemicals were bought from Sigma-Aldrich, Merck or Carbosynth and were of the highest purity.

3.2 Phylogenetic analysis

All available SuSy sequences (1504 at the time of writing in 2013) were retrieved from the UniProtKB database by using the advanced search function with Sucrose Synthase as ‘protein name’ query. Sequences that were not annotated as SuSy (e.g. SPS), were not unique, did not start with a methionine, were too long (>2000 amino acids) or too short (<600 amino acids), or contained undefined amino acids were removed using the Python script ‘SuSy_allIsoforms.py’ or ‘SuSy_1Isoform’ (see below).


#TITLE PYTHONSCRIPT: SuSy_1Isoform #DESCRIPTION: This script is written to clean a set of protein sequences that are downloaded from the UniProtKb database (saved in fasta format) according to rules defined by the author

# your fasta file fasta_file = "uniprot_susy_proteinname_20131106.fasta" min_length = 600 max_length = 2000 undef = 0

#normally no change required below this point #======import time from Bio import SeqIO def sequence_cleaner(fasta_file, min_length): sequencesinfile0 = SeqIO.parse(fasta_file, "fasta") #delete sequences with similar in heading (to avoid problems with command header= ) output_file2 = open("fasta_file2.fasta","w+") for seq_record0 in sequencesinfile0: sequence0=str(seq_record0.seq).upper() header0 = str(seq_record0.description) if "similar" in header0 or "Similar" in header0 or "Genome" in header0: print ("similar") print header0 else: output_file2.write(">"+header0+"\n"+sequence0+"\n") output_file2.close() #Define fasta file without sequences with similar in heading fasta_file2="fasta_file2.fasta" sequencesinfile = SeqIO.parse(fasta_file2, "fasta") susy_organisms = list() susy_organisms_AA = list() susy_organisms_length = list() count1 = 0 removedseqlength_min = list() removedseqlength_max = list() removedsequndef = list() removedseqnotunique = list() removedseqdouble_org = list() #removes isoforms removedseqnotsusy = list () #to remove sequences which are not annotated as susy removedseqnotM = list() removedseqRue =list() #create hash table to add the sequences sequences={} #read fasta input file for seq_record in sequencesinfile: #take the current sequence sequence=str(seq_record.seq).upper() # seq_record.seq count1 += 1 #get header header = str(seq_record.description).split("|") #extract the accesion code from the header accession = header[1] #extract organism from header temp = str(header[2]).split("OS=")[1] temp2 = str(temp).split("=")[0] organism = temp2[:len(temp2)-2] #check if the current sequence is according to the user's parameters if not('sucrose synthase' in header[2] or 'Sucrose synthase' in header[2]): removedseqnotsusy.append(header[2]) elif ('Ruegeria' in organism or 'Clavibacter' in organism or 'K9QEC1' in accession): removedseqRue.append(organism) elif sequence[0]!= 'M':

43 Chapter 2: Identification and characterization of novel bacterial SuSy enzymes

            removedseqnotM.append(accession +" "+ organism + sequence)
        elif len(sequence) < min_length:
            removedseqlength_min.append(accession+" "+organism+"\t(%d AA)" %len(sequence))
        elif len(sequence) > max_length:
            removedseqlength_max.append(accession+"\t(%d AA)" %len(sequence))
        elif sequence.count("X") > undef:
            removedsequndef.append(accession)
        elif sequence in sequences:
            #get header
            header2 = str(sequences[sequence]).split("|")
            #extract the accession code from the header
            accession2 = header2[1]
            removedseqnotunique.append(accession2+" = "+accession)
        elif organism in susy_organisms:
            removedseqdouble_org.append(organism)
        else:
            #add sequence to the hash table
            susy_organisms.append(organism)
            susy_organisms_AA.append(organism+"\t(%d AA)" %len(sequence))
            susy_organisms_length.append(len(sequence))
            sequences[sequence] = seq_record.description
    #split the file name from the extension
    filename = str(fasta_file).split(".")
    #write a summary file
    summary2 = open(filename[0]+"_summary2.txt","w+")
    summary2.write("summary \n======\n\n")
    summary2.write("input file: "+fasta_file+"\n")
    summary2.write("minimum required length: %d\n" %min_length)
    summary2.write("max undefined AA allowed: %d\n\n" %undef)
    summary2.write("number of input sequences: %d\n\n" %count1)
    summary2.write("number of retained sequences: %d\n" %len(sequences))
    summary2.write("\t max length of retained susy's: %d \n" %max(susy_organisms_length))
    summary2.write("\t min length of retained susy's: %d \n\n" %min(susy_organisms_length))
    summary2.write("number of removed sequences: %d\n" %(count1-len(sequences)))
    summary2.write(" below minimum length: %d\n" %len(removedseqlength_min))
    summary2.write(" above max length: %d\n" %len(removedseqlength_max))
    summary2.write(" with undefined amino acids: %d\n" %len(removedsequndef))
    summary2.write(" not unique: %d\n" %len(removedseqnotunique))
    summary2.write(" not starting with Methionine: %d\n\n" %len(removedseqnotM))
    summary2.write("Sequences to be removed after manual inspection of alignment: %d\n\n\n\n" %len(removedseqRue))
    summary2.write("sequences not starting with Met (M)\n------\n")
    for removedlength_notM in removedseqnotM:
        summary2.write(removedlength_notM+"\n")
    summary2.write("sequences below minimum length\n------\n")
    for removedlength_min in removedseqlength_min:
        summary2.write(removedlength_min+"\n")
    summary2.write("\n\nsequences above max length\n------\n")
    for removedlength_max in removedseqlength_max:
        summary2.write(removedlength_max+"\n")
    summary2.write("\n\nsequences containing undefined amino acids\n------\n")
    for removedundef in removedsequndef:
        summary2.write(removedundef+"\n")
    summary2.write("\n\nsequences that are not unique\n------\n")
    for removednotunique in removedseqnotunique:
        summary2.write(removednotunique+"\n")
    summary2.write("\n\nsequences removed because there was already a susy present from that organism\n------\n")
    for removeddouble_org in removedseqdouble_org:
        summary2.write(removeddouble_org+"\n")
    summary2.write("\n\nsequences removed because not annotated as susy\n------\n")
    for removednotsusy in removedseqnotsusy:
        summary2.write(removednotsusy+"\n")
    summary2.write("\n\n Sequences that does not align very well (after manual MSA inspection) \n------\n")
    for removedRue in removedseqRue:
        summary2.write(removedRue+"\n")
    summary2.write("\n\nretained organisms\n------\n")
    for element in susy_organisms_AA:
        summary2.write(element +"\n")
    summary2.close()
    output_file4 = open(time.strftime("%Y-%m-%d")+" susy_of_all_organisms_.fasta","w+")
    #read the hash table and write file in fasta format
    for sequence in sequences:
        #get header
        header = str(sequences[sequence]).split("|")
        #extract the accession code from the header
        accession = header[1]
        #extract the organism from the header
        temp = str(header[2]).split("OS=")[1]
        temp2 = str(temp).split("=")[0]
        organism = temp2[:len(temp2)-2]
        organism_=organism.replace(" ","_") #to have the full names in clustal omega
        if "(" in organism_:
            organism_=organism_.split("(")[0] #to clean up the names
        organism_=organism_[:len(organism_)-1] #to remove last _
        output_file4.write(">"+accession+"|"+organism_+"\n"+sequence+"\n")
    output_file4.close()

sequence_cleaner(fasta_file, min_length)

In total, 63 prokaryotic sequences (considering only one isoform per organism) were retained and aligned with Clustal Omega (default parameters)297. ‘MEGA 6.0’172 was used to create a maximum likelihood (ML) unrooted phylogenetic tree, based on the LG+G+I+F model (best substitution model according to the AIC criterion as determined by the program ProtTest167), with 1000 bootstrap replications, five discrete gamma categories, a Nearest-Neighbor-Interchange heuristic ML method and a strong branch swap filter.

To determine the gene organization of sucrose-metabolizing genes in prokaryotic organisms, UniProtKB, the Prokaryotic Operon DataBase (ProOpDB) and the Database of prokaryotic Operons (DOOR) were used298,299.

3.3 Cloning of novel SuSy genes in a constitutive expression system

The putative SuSy sequences from Acidithiobacillus caldus ATCC 51756 (SuSyAc, UniProt ID: A0A059ZV61), Denitrovibrio acetiphilus DSM 12809 (SuSyDa, UniProt ID: D4H6M0) and Melioribacter roseus JCM 17771 (SuSyMr, UniProt ID: I7A3T6) were codon optimized for E. coli, provided with a C-terminal His6-tag and chemically synthesized by GenScript (Piscataway, NJ, USA). The putative sucrose synthase (SuSyNe, UniProt ID: Q820M5), sucrose-phosphate synthase/phosphatase (SPS/SPPNe, UniProt ID: Q82V85) and fructokinase (FrkNe, UniProt ID: Q82V86) encoding sequences of Nitrosomonas europaea ATCC 19718 were amplified from genomic DNA that was extracted from the organism and kindly provided by Prof. Nico Boon (Ghent University). The SuSy encoding sequences were cloned into the constitutive expression vector pCXP34300 by means of a Gibson assembly procedure301. Primers used to amplify the genes and backbone are summarized in Table 1. The pCXP34 backbone for SuSyNe, FrkNe and SPS/SPPNe was amplified using BB_cTERM_Fw and oMEMO1804_Rv, while primers 13 and 14 were used in case of SuSyAc, SuSyDa and SuSyMr.

Table 1 List of primers used to clone the putative susy, sps/spp and frk genes from A. caldus, D. acetiphilus, M. roseus and N. europaea into a constitutive pCXP34 vector. BB: backbone, Fw: forward primer, Rv: reverse primer, FDB: primers constructed by Frederik De Bruyn.

Nr.   Name            Sequence (5’ → 3’)
13    pCXP34_BB_Rv    CTTTGTTTCCTCCGAATTCGAGGTC
14    pCXP34_BB_Fw    CTGCAGGTCGACCATATGGG
FDB   BB_cTERM_Fw     CACCACCATCATCACCATTAAC
FDB   oMEMO1804_Rv    CTTTGTTTCCTCCGAATTCG
15    SuSyAc_Fw       CGAATTCGGAGGAAACAAAGATGATTGAAGCCCTGCGCCAAC
16    SuSyAc_Rv       CCCATATGGTCGACCTGCAGTTAGTGGTGGTGGTGGTGGTGTTC
FDB   SuSyNe_Fw       CCGTCGACCTCGAATTCGGAGGAAACAAAGATGACCACGATTGACACACTC
FDB   SuSyNe_Rv       GACCTGCAGTTAATGGTGATGATGGTGGTGTATCTCATGGGCCAGCCTGTTTG
21    SuSyDa_Fw       CGAATTCGGAGGAAACAAAGATGAATCTGTCGAATAAAGAACTGG
22    SuSyDa_Rv       CCCATATGGTCGACCTGCAGTTAATGATGATGATGATGATGATATTC
23    SuSyMr_Fw       CGAATTCGGAGGAAACAAAGATGATTAAAGACATCTACAAAACC
24    SuSyMr_Rv       CCCATATGGTCGACCTGCAGTTAGTGATGGTGATGGTGGTG
FDB   FrkNe_Fw        CCGTCGACCTCGAATTCGGAGGAAACAAAGATGTCTATCGATTCTTACAGTACGCTCACAAAAC
FDB   FrkNe_Rv        GACCTGCAGTTAATGGTGATGATGGTGGTGATTGATCATCCCCCAGTCTTTGAG
FDB   SPS/SPPNe_Fw    GACCTGCAGTTAATGGTGATGATGGTGGTGTTGGTCGAAGTGGTAGTGTTTCATC
FDB   SPS/SPPNe_Rv    CCGTCGACCTCGAATTCGGAGGAAACAAAGATGATGACAGATCAGAAACTTTATATTTTG

For amplification of the SuSyAc, SuSyDa and SuSyMr genes and the corresponding pCXP34 backbone, the reaction mixture was composed of PrimeSTAR premix (Westburg), 2.5 µM forward and reverse primer and ~3 ng/µL template, in a total volume of 50 µL. The following program was used: initial denaturation of 5 min at 98°C and 30 cycles of denaturation at 98°C for 10 sec, annealing at 55°C for 5 sec and elongation at 72°C for 1 min/kb. For SPS/SPPNe, FrkNe and SuSyNe, the reaction mixture was composed of gDNA or pCXP34 plasmid (~0.01 ng/µL), 5x Q5 reaction buffer, Q5 High-fidelity DNA polymerase (0.02 U/µL), dNTP mix (0.2 mM), forward primer (0.5 µM) and reverse primer (0.5 µM). The following program was used: initial denaturation of 30 sec at 98°C, 29 cycles of denaturation at 98°C for 10 sec, annealing at 65°C (3°C above the minimal melting temperature of all primers) for 10 sec and elongation at 72°C for 10 sec/kb, followed by a final elongation of 2 min at 72°C. Next, PCR products were treated with DpnI (Westburg) to remove template DNA, purified using the Qiagen or Analytik Jena purification kit and checked on a 1% agarose gel. The DNA concentration was measured with a Nanodrop ND-1000 (Thermo Scientific) at 260 nm. To ligate the SuSy encoding sequences and the pCXP34 backbone, a Gibson assembly mix (20 µL) containing 100 ng backbone and an equimolar amount of gene product was incubated for 1 hour at 50°C. Finally, the resulting expression plasmids (2 µL) were transformed into 20-40 µL electrocompetent E. coli BL21 (DE3) cells in a sterile electroporation cuvette (Westburg, 2 mm). The applied electric pulse had a capacitance of 25 μF, a resistance of 200 Ω and a field strength of 2.0 kV, ensuring a time constant around 4.7 msec. All constructs were subjected to nucleotide sequencing (by LGC Genomics or Macrogen) to confirm that the ligation was correct and to exclude the presence of undesirable mutations.
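The two quantities mentioned above, the equimolar amount of gene product and the pulse time constant, follow from simple arithmetic. The sketch below is only an illustration: the fragment lengths are hypothetical placeholders (the lengths of the pCXP34 backbone and inserts are not stated here), and the time constant is the nominal R × C product, which need not equal the value reported by the instrument.

```python
def equimolar_insert_mass_ng(backbone_ng, backbone_bp, insert_bp, molar_ratio=1.0):
    """Mass of insert (ng) that gives the requested insert:backbone molar ratio.

    For double-stranded DNA the molar mass scales with length, so equal
    moles means a mass proportional to the fragment length.
    """
    return backbone_ng * (insert_bp / backbone_bp) * molar_ratio


def rc_time_constant_ms(resistance_ohm, capacitance_uF):
    """Nominal RC time constant (tau = R * C) in milliseconds."""
    return resistance_ohm * capacitance_uF * 1e-6 * 1e3


# Hypothetical fragment lengths, chosen only for illustration.
backbone_bp = 3000
insert_bp = 2400

insert_ng = equimolar_insert_mass_ng(100, backbone_bp, insert_bp)
tau_ms = rc_time_constant_ms(200, 25)  # 200 ohm and 25 uF as in the text

print(f"insert needed for 100 ng backbone: {insert_ng:.0f} ng")
print(f"nominal time constant: {tau_ms:.1f} ms")
```

The nominal value is an upper estimate; a measured time constant slightly below it, as reported above, is plausible because the conductivity of the sample also contributes to the discharge.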

3.4 Construction of truncation mutants of SuSyAc

Deletion mutants were made by amplifying part of the SuSyAc gene with Q5 polymerase and cloning this fragment into the pCXP34 vector by means of a Gibson assembly procedure301. Primers are listed in Table 2. The backbone was amplified using primers 13 and 14, while gene fragments were amplified using reverse primer 16 and forward primers 60-62.

Table 2 List of primers used to make SuSyAc deletion mutants. BB: backbone, Fw: forward primer, Rv: reverse primer.

Nr.   Name                 Sequence (5’ → 3’)
13    pCXP34_BB_Rv         CTTTGTTTCCTCCGAATTCGAGGTC
14    pCXP34_BB_Fw         CTGCAGGTCGACCATATGGG
16    SuSyAc_Rv            CCCATATGGTCGACCTGCAGTTAGTGGTGGTGGTGGTGGTGTTC
60    CTD_del_Fw           CGAATTCGGAGGAAACAAAGATGGGTGCAGAAGGTGAAGC
61    CTD/linker_del_Fw    CGAATTCGGAGGAAACAAAGATGGACGGTCTGACGCATCTG
62    CTD/EPBD_del_Fw      CGAATTCGGAGGAAACAAAGATGATCAGTCGCATTCTGATC

3.5 Enzyme production and purification

For enzyme production, a preculture was first inoculated from a cryovial (35% glycerol stock solution) in 5 mL lysogeny broth containing 10 g/L tryptone, 10 g/L NaCl, 5 g/L yeast extract (LB-Miller) and 100 µg/mL ampicillin. This preculture was incubated overnight at 37°C with continuous shaking at 200 rpm. Next, 1% (v/v) of the overnight culture was inoculated in shake flasks with fermentation medium (250 mL LB-Miller containing 100 µg/mL ampicillin) and incubated with continuous shaking at 200 rpm for at least 6 hours at 37°C, until an OD600 of about 3.5 was reached. The produced biomass was harvested by centrifugation for 15 minutes at 8000 rpm in a Thermo Scientific Sorvall RC6+ centrifuge at 4°C and the obtained cell pellets were stored at -20°C. The cell pellet from 250 mL culture was then resuspended in 10 mL cold lysis buffer, which consisted of phosphate buffered saline pH 7.4 (PBS: 50 mM NaH2PO4/Na2HPO4 with 500 mM NaCl) supplemented with phenylmethylsulfonyl fluoride (PMSF, in ) and lysozyme in final concentrations of 100 µM and 1 mg/mL, respectively. This cell suspension was kept on ice and sonicated 3 times for 2.5 min (Branson sonifier 250, power level 3, 50% duty cycle). After sonication, cell debris was removed by centrifugation at 9000 rpm for 45 min in a Thermo Scientific Sorvall RC6+ centrifuge. The resulting supernatant, containing the soluble fraction of the protein, was collected and filtered using a syringe and a 0.2 µm filter (polyethersulfone membrane, VWR).

The His6-tagged proteins (except for SuSyDa) were purified by nickel-nitrilotriacetic acid (Ni-NTA) chromatography. First, 1 mL of regenerated Thermo Scientific Ni-NTA resin was added to a 0.8 cm polypropylene column, resulting in a bed volume of ~500 µL. Equilibration was performed using 4 mL buffer composed of 10 mM imidazole in PBS. Next, the protein solution was applied to the column, which was then washed with 12 mL buffer containing 80 mM imidazole in PBS. Afterwards, the protein was eluted with 3 mL buffer composed of 250 mM imidazole in PBS into a 30K Amicon Ultra centrifugal filter, which had been pre-equilibrated with 4 mL 100 mM MOPS pH 7.0 (centrifugation of about 5 min at 4500 rpm). The volume of the eluate was increased to 4 mL with 100 mM MOPS pH 7.0 buffer and the Amicon filter was centrifuged in a swing bucket centrifuge for 10-20 min, depending on the time needed to lower the buffer level to ~250 µL. This was repeated five times until a 4000- to 6000-fold dilution was achieved (buffer exchange). For SuSyDa, MCLAB Ni-NTA resin was used and the protein was purified according to the MCLAB protocol with 20 mM imidazole in the lysis and equilibration buffer, 60 mM imidazole in the wash buffer and 250 mM imidazole in the elution buffer. Protein concentrations were measured with a Nanodrop ND-1000 (Thermo Scientific) using extinction coefficients and molecular weights as calculated with the online ExPASy ProtParam tool302.
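The conversion from an absorbance reading to a protein concentration relies on the Beer-Lambert law. The following minimal sketch illustrates that calculation; the absorbance and extinction coefficient are hypothetical example values (not ProtParam output for any of the SuSys), and the reading is assumed to be normalised to a 1 cm path length.

```python
def protein_conc_mg_per_ml(a280, ext_coeff_per_M_cm, mw_da, path_cm=1.0):
    """Protein concentration (mg/mL) from A280 via the Beer-Lambert law.

    A = epsilon * c * l, so c (mol/L) = A / (epsilon * l).
    Multiplying by the molecular weight (g/mol) gives g/L, i.e. mg/mL.
    """
    molar = a280 / (ext_coeff_per_M_cm * path_cm)
    return molar * mw_da


# Hypothetical inputs, for illustration only.
a280 = 1.2            # absorbance at 280 nm (1 cm path equivalent)
ext_coeff = 100_000   # M-1 cm-1, placeholder value
mw = 92_000           # Da, roughly the size of a SuSy monomer

conc = protein_conc_mg_per_ml(a280, ext_coeff, mw)
print(f"protein concentration: {conc:.2f} mg/mL")
```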

3.6 Enzyme assays

3.6.1 TLC

To analyze kinase activity of FrkNe on fructose, 0.3 mg/mL of the enzyme was incubated with 10 mM Fru, 1 mM ATP and 1 mM MgCl2 for 3 hours at 37°C. Afterwards, the different components of the mixture were separated and visualized using ascending thin layer chromatography (TLC).

TLC was conducted with precoated silica gel plates (Silica gel 60 F254 from Merck KGaA, EMD Millipore corporation) in closed glass tanks saturated with the developing solvent consisting of 85% acetonitrile and 15% H2O. Samples (2 µL) were spotted on the silica gel plate at 1 cm above the bottom edge of the plate. The spots were dried and the plate was developed in the glass chamber at room temperature until the solvent front had migrated up to 1 cm from the upper edge. After this run, the plate was dried with a common hair dryer and developed a second time in the same solvent. The plate was dried again with a hair dryer, soaked in a 10% (v/v) H2SO4 solution and heated with a hot air gun (Bosch PHG 500-2 level 1, 300°C) to visualize the separated spots.

3.6.2 BCA assay

Besides high-performance liquid chromatography (HPLC), four assays (UDP-Glc DH, UGPase/PGM/Glc6P DH, hexokinase/PGI/Glc6P DH and arsenomolybdate assay) are commonly used to analyze SuSy activity in the breakdown direction of sucrose. The PK/LDH assay, on the other hand, can be used to monitor sucrose synthesis activity (Figure 15). These assays are all based on enzymatic conversions of the reaction products (fructose, UDP-Glc or UDP) and/or redox reactions involving NAD+, NADP+ or Cu species, which can be measured spectrophotometrically at 340 nm or 520 nm. However, the enzymes used in most of these assays are very expensive.

Figure 15 Commonly used methods to measure SuSy activity based on UDP or UDP-Glc release (A) or Fru release (B). The reaction catalyzed by SuSy and the assay used in this work to detect SuSy activity in the breakdown direction of sucrose are colored grey.

In this work, the cheap, easy and robust bicinchoninic acid (BCA) assay was used to detect SuSy activity in the sucrose breakdown direction (Figure 15). This biochemical assay is commonly used to determine the amount of protein in a solution, but it can also be used to monitor reactions catalyzed by enzymes that release reducing sugars such as fructose or glucose43,303–305. These sugars reduce Cu2+ ions to Cu+, which is subsequently chelated by two molecules of bicinchoninic acid sodium salt (Figure 15). This results in the formation of a deep purple-colored complex that strongly absorbs light at a wavelength of 540-560 nm.


The color reagent is prepared by combining 23 parts of a solution containing 1.5 g/L 4,4’-dicarboxy-2,2’-biquinoline dipotassium salt and 62.3 g/L anhydrous Na2CO3, 1 part of a solution composed of 23 g/L aspartic acid, 33 g/L anhydrous Na2CO3 and 7.3 g/L CuSO4 and 6 parts ethanol. Sample (25 µL) is added to 150 µL of assay solution. Afterwards the microtiter plate (MTP) is covered with a plastic foil and incubated for 30 min at 70°C. After cooling to room temperature, the absorbance is measured at 540 or 560 nm. The resulting OD values are proportional to the amount of fructose present in the sample. One unit of SuSy activity is defined as the amount of enzyme that releases 1 μmol of fructose per minute under the specified conditions. Kinetic parameters (apparent Km and Vmax values) were calculated by non-linear regression of the Michaelis-Menten equation using SigmaPlot. Alternatively, substrate inhibition was fitted according to the equation $v = \frac{V_{max} \cdot S}{K_m + S + S^2/K_i}$ with Vmax: maximal reaction velocity (U/mg), S: substrate concentration (mM), Ki: inhibition dissociation constant (mM), Km: Michaelis-Menten constant (mM)306. The kcat (s-1) is the turnover number per SuSy monomer.
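The two rate equations used for fitting can be written out as plain functions. This is a minimal sketch of the models only, not of the SigmaPlot regression itself; the parameter values in the example are hypothetical. For the substrate inhibition model, setting the derivative of the rate to zero shows that the rate peaks at S = sqrt(Km × Ki).

```python
import math


def michaelis_menten(s, vmax, km):
    """Reaction rate for classical Michaelis-Menten kinetics."""
    return vmax * s / (km + s)


def substrate_inhibition(s, vmax, km, ki):
    """Reaction rate with substrate inhibition: Vmax*S / (Km + S + S^2/Ki)."""
    return vmax * s / (km + s + s ** 2 / ki)


def inhibition_optimum(km, ki):
    """Substrate concentration at which the inhibited rate is maximal."""
    return math.sqrt(km * ki)


# Hypothetical parameters (Vmax in U/mg, Km and Ki in mM), for illustration only.
vmax, km, ki = 50.0, 1.0, 25.0

s_opt = inhibition_optimum(km, ki)
v_half = michaelis_menten(km, vmax, km)           # rate at S = Km
v_peak = substrate_inhibition(s_opt, vmax, km, ki)

print(f"rate at S = Km (no inhibition): {v_half:.1f} U/mg")
print(f"optimal S with inhibition: {s_opt:.1f} mM, rate {v_peak:.1f} U/mg")
```

At substrate concentrations well below Ki the two models give nearly the same rate, which is why inhibition only becomes visible at the upper end of a concentration series.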

3.7 SDS-PAGE analysis

An SDS-PAGE gel with 5% stacking gel and 10% separating gel was used. The stacking gel was prepared by adding 2.85 mL deionised water, 0.85 mL 30% acrylamide/bis solution, 1.25 mL 0.5 M Tris-HCl pH 6.8, 50 μL 10% sodium dodecyl sulphate (SDS) solution, 50 μL 10% ammonium persulphate (APS) solution and 5 μL tetramethylethylenediamine (TEMED). The separating gel consisted of 4.1 mL deionised water, 3.3 mL 30% acrylamide/bis, 2.5 mL 0.5 M Tris-HCl pH 8.8, 100 μL 10% SDS, 50 μL 10% APS and 5 μL TEMED. To visualize proteins in the insoluble fraction, the cell pellet was resuspended again in 10 mL PBS buffer. Samples were prepared by mixing 10 μL sample (soluble fraction, insoluble fraction or 1 mg/mL of purified protein) with 20 μL Laemmli buffer (355 μL deionised water, 125 μL 0.5 M Tris-HCl pH 6.8, 250 μL glycerol, 200 μL 10% SDS, 20 μL 0.5% bromophenol blue, and prior to use 25 μL β-mercaptoethanol), followed by heating at 95°C for 5 minutes. Subsequently, 5 μL of each sample was loaded on the gel, and the gel was run for about 30-45 minutes at 200 V in running buffer composed of 3 g/L Tris base, 14.4 g/L glycine and 1 g/L SDS. The gel was removed from the spacer plate and stained with the QC Colloidal Coomassie stain from Bio-Rad, following the manufacturer’s instructions. A prestained protein marker (Thermo Scientific) was used as a molecular weight reference.
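The nominal gel percentages can be checked against the listed volumes with a short dilution calculation. The sketch below uses only the volumes given in the recipe above; the helper name is introduced here for illustration.

```python
def final_acrylamide_percent(stock_percent, stock_ml, other_volumes_ml):
    """Final acrylamide percentage after diluting the stock in the gel mix."""
    total_ml = stock_ml + sum(other_volumes_ml)
    return stock_percent * stock_ml / total_ml


# Volumes (mL) taken from the recipe: water, Tris-HCl, SDS, APS, TEMED.
stacking = final_acrylamide_percent(30, 0.85, [2.85, 1.25, 0.050, 0.050, 0.005])
separating = final_acrylamide_percent(30, 3.3, [4.1, 2.5, 0.100, 0.050, 0.005])

print(f"stacking gel: {stacking:.2f} % acrylamide")
print(f"separating gel: {separating:.2f} % acrylamide")
```

Both values land close to the nominal 5% and 10%.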

3.8 Western blot analysis

First, the SDS-PAGE protocol was followed as described above, but instead of staining the gel, the proteins were transferred to a nitrocellulose membrane by making the following sandwich: (-) sponge - Whatman paper - SDS-gel - membrane - Whatman paper - sponge (+). Proteins were transferred for 60 min at 100 V in CAPS buffer (100 mL 10x CAPS containing 22.13 g/L CAPS at pH 11, 100 mL methanol and 800 mL water). Afterwards, the membrane was blocked under gentle swirling (orbital shaker) for at least one hour in PBS (5.84 g/L NaCl, 11.91 g/L Na2HPO4·12H2O, 2.03 g/L NaH2PO4, pH 7.2) with 1% casein. The membrane was washed three times with PBS containing 0.2% Triton X-100. Next, the membrane was incubated for two hours under gentle swirling with an anti-polyhistidine antibody derived from mice (H1029 Sigma-Aldrich, 2000x diluted in 20 mL PBS with 1% casein and 0.2% Triton X-100). The membrane was washed again three times and subsequently incubated for one hour under gentle swirling with an anti-mouse alkaline phosphatase antibody from goat (A3562 Sigma-Aldrich, 5000x diluted in 20 mL PBS with 1% casein and 0.2% Triton X-100). The membrane was then washed with PBS one final time. In the last step, the membrane was incubated for 30 min at 37°C in the dark with a solution containing 50 µL nitroblue tetrazolium (NBT)/5-bromo-4-chloro-3-indolyl phosphate (BCIP) stock solution in 10 mL buffer (10 mM Tris pH 9.5, 100 mM NaCl and 50 mM MgCl2).

3.9 Effect of pH, temperature and divalent cations on SuSy activity

A universal Britton-Robinson (BR) buffer system, consisting of 25 mM H3BO3, H3PO4 and CH3COOH, was used to determine pH profiles of SuSyAc, SuSyDa, SuSyMr and SuSyNe in the sucrose cleavage direction at 40°C. One part of 50 mM BR buffer was mixed with one part of substrate mix (sucrose and ADP in Milli-Q water) and titrated to the desired pH with NaOH. Concentrations of sucrose and ADP in the final reaction mixture were 200 mM and 5 mM, respectively.

Temperature profiles were made by determining the activity in the direction of sucrose cleavage (200 mM Suc, 5 mM ADP, 2 mM MgCl2, 100 mM MOPS pH 7.0) from 30°C to 90°C. The initial reaction velocity was determined by taking eight samples during a period of ten minutes. The thermal stability was evaluated by incubating the enzyme (~0.17 mg/mL), without the presence of any substrate, for 15 min at 60°C in 100 mM MOPS pH 7.0. After incubation, residual activity in the sucrose cleavage direction was determined (200 mM Suc, 5 mM ADP, 100 mM MOPS pH 7.0).
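An initial velocity derived from a series of time points is typically obtained as the slope of a linear fit, and residual activity is the ratio of the rates after and before incubation. The sketch below illustrates both steps under stated assumptions: the time course, the enzyme concentration in the assay and the two rates are hypothetical numbers, and an ordinary least-squares fit is assumed (the fitting procedure is not specified in the text).

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def residual_activity_percent(rate_after, rate_before):
    """Activity remaining after incubation, relative to the untreated enzyme."""
    return 100.0 * rate_after / rate_before


# Hypothetical time course: eight samples taken within ten minutes.
time_min = [1, 2, 3, 4, 5, 6, 8, 10]
fructose_mM = [0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.7, 2.1]

rate_mM_per_min = least_squares_slope(time_min, fructose_mM)

# 1 mM/min equals 1 umol/(mL*min); dividing by mg/mL enzyme gives U/mg.
enzyme_mg_per_ml = 0.01  # hypothetical enzyme concentration in the assay
specific_activity = rate_mM_per_min / enzyme_mg_per_ml

residual = residual_activity_percent(19.2, 20.0)  # hypothetical rates

print(f"initial velocity: {rate_mM_per_min:.2f} mM/min")
print(f"specific activity: {specific_activity:.1f} U/mg")
print(f"residual activity: {residual:.0f} %")
```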

The influence of MgCl2 on SuSy was determined by measuring the activity at 60°C in the presence of 100 mM MOPS pH 7.0, 200 mM Suc, 5 mM ADP and concentrations of MgCl2 ranging from 0 to 10 mM.

3.10 Nucleotide sequence accession numbers

The DNA sequences of the codon optimized genes have been submitted to GenBank (ID 1782677) under accession numbers KP284426 (SuSyAc), KP284427 (SuSyDa) and KP284428 (SuSyMr). Plasmids carrying these genes have been deposited at the Belgian Co-ordinated Collections of Micro-organisms (BCCM/LMBP) and are thus publicly available.

3.11 Statistical analysis

Sample standard deviations were determined using the STDEV.S function of Excel. At least three replications were performed in each case. The statistical significance of the difference between parameters was determined in R using the Wilcoxon rank sum test. The null hypothesis (parameters are not statistically different) was rejected if p<0.05.
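The two statistics mentioned above can be illustrated in a few lines. The sketch below is not the Excel or R workflow used in this work: it uses Python's `statistics.stdev` (the sample standard deviation, the same estimator as STDEV.S) and a small exact rank sum test written out by enumeration, which assumes there are no tied values. The replicate values are hypothetical.

```python
from itertools import combinations
from statistics import stdev


def rank_sum_exact_p(a, b):
    """Two-sided exact Wilcoxon rank sum p-value (assumes no tied values).

    Enumerates every way of assigning len(a) of the pooled ranks to group a
    and counts the assignments whose rank sum deviates from its expected
    value at least as much as the observed one.
    """
    pooled = sorted(a + b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, n = len(a), len(pooled)
    observed = sum(rank[v] for v in a)
    expected = n_a * (n + 1) / 2
    deviation = abs(observed - expected)
    sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-12)
    return extreme / len(sums)


# Hypothetical replicate measurements (e.g. Vmax in U/mg), for illustration.
group_a = [20.1, 20.8, 21.0, 20.5]
group_b = [27.0, 27.9, 26.8, 27.4]

sd_a = stdev(group_a)
p_value = rank_sum_exact_p(group_a, group_b)

print(f"sample standard deviation of group a: {sd_a:.3f}")
print(f"exact two-sided p-value: {p_value:.4f}")
```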

4 Results and discussion

4.1 Phylogenetic and taxonomic analysis of prokaryotic SuSy sequences

Over the past few years, an increasing amount of genomic data has become available. This called for a revision of the taxonomic distribution of putative prokaryotic SuSy enzymes. Hence, a phylogenetic tree was constructed with all available prokaryotic sequences from the UniProtKB database that were annotated as SuSy (63 at the time of writing, 2015) (Figure 16). Most of the prokaryotic hosts of SuSy belonged to the Cyanobacteria and Proteobacteria, which is in good agreement with other reports100,267,307. However, our phylogenetic analysis revealed that organisms belonging to other phyla also contain predicted SuSys. Indeed, Nitrospina gracilis, Denitrovibrio acetiphilus, Desulfurispirillum indicum, Dethiobacter alkaliphilus and Melioribacter roseus belong to the phyla , Deferribacteres, Chrysiogenetes, and Ignavibacteriae, respectively.

Figure 16 Phylogenetic tree of all putative prokaryotic SuSys. All 37 cyanobacterial sequences (only 1 isoform is considered) are compressed but listed in Table S4. The phylum is mentioned between brackets, unless the organism belongs to the Proteobacteria. Organisms are preceded by their UniProtKB accession number. SuSys selected for recombinant expression and characterization are colored red.


The non-cyanobacterial prokaryotic organisms harboring a SuSy enzyme have a diverse range of energy metabolisms and use different carbon sources (Table S2). Indeed, some are phototrophic organisms using light as energy source while others are chemotrophic, using organic (organotrophs) or inorganic compounds (lithotrophs) as electron donors. Many of them are autotrophic, able to derive their carbon for biomass (and possibly sucrose) production from CO2 fixation. This occurs through the Calvin cycle, similar to plants and cyanobacteria, or through other autotrophic pathways such as the reductive tricarboxylic acid cycle308,309. However, several SuSy hosts use organic compounds as carbon source (heterotrophs)310. This indicates that CO2 fixation is not a prerequisite for sucrose metabolism, and that the intermediates required by the SPS enzyme (e.g. Fru6P) must be derived from other pathways (possibly glycolysis) or from the environment.

Some of the hosts also have applications in various industries. Acidithiobacillus, for example, is widely used in the biomining industry, which uses microbial communities to leach out precious metals (e.g. copper) from mineral ores. This is accomplished by oxidation of the sulfide containing insoluble minerals to soluble metal sulphates311,312. However, this microbial process is also partly responsible for the generation of acid mine drainage, which has a severely negative impact on the environment313,314. Several nitrifying marine bacteria such as the ammonia-oxidizing bacteria (AOB) Nitrosospira, Nitrosomonas and Nitrosococcus and the nitrite oxidizer Nitrospina gracilis also harbor a SuSy enzyme. Nitrifiers increase the N availability for plants, they are important for the treatment of wastewater and they are involved in bioremediation of sites contaminated with chlorinated aliphatic hydrocarbons296,315.

All non-cyanobacterial prokaryotic SuSy hosts are mesophilic or moderately thermophilic with optimal growth temperatures between 25°C and 55°C. Some of them prefer acidic environments while others are moderately or highly alkaliphilic. The organisms are either aerobic or (facultatively) anaerobic and thrive in several ecological niches such as mines, marine environments, sediments of (hypersaline) soda lakes, saline anoxic hot springs, streams of hydrothermal water, terrestrial environments and freshwater. As could be expected, those that inhabit saline environments are halophilic (requiring salt) or at least moderately halotolerant, supporting the idea that sucrose acts as a compatible solute to protect bacteria against (moderate) osmotic stress316,317. In Thioalkalivibrio species, it was already shown that sucrose is a minor compatible solute, complementing the major osmolyte glycine-betaine318. Soil AOB and AOB from fluctuating freshwater ponds and sediments live in a discontinuous environment subject to rapid changes in water potential due to evaporative drying319–321. Under these desiccating conditions, which also lead to a limited substrate supply, sucrose could potentially act as a compatible solute or as a carbon reserve. For those that grow optimally at higher temperatures, such as Melioribacter roseus and to a lesser extent also Acidithiobacillus caldus, sucrose could also act as a thermoprotectant. As the enzymes (e.g. SuSy) occurring in these organisms are also likely to be more thermostable than their mesophilic counterparts, they present interesting targets for industrial applications.


4.2 Recombinant expression of novel prokaryotic SuSys

Recombinant expression of new SuSys from different organisms is of interest from both a fundamental and an industrial point of view. Indeed, identification of functional SuSys can teach us something about the metabolic potential and coping mechanisms of the host organism. In addition, exploration of nature’s arsenal of SuSys can reveal enzymes with interesting properties such as high activity, stability and expression yield, which are required for economically feasible reactions on an industrial scale. Consequently, SuSys from the industrially relevant A. caldus (SuSyAc) and N. europaea (SuSyNe) species, which belong to the Proteobacteria, SuSy from the moderately thermophilic M. roseus (SuSyMr) and SuSy from Denitrovibrio acetiphilus (SuSyDa) were selected for characterization. The latter two belong to the rather under-represented phyla Ignavibacteriae and Deferribacteres, respectively. Sucrose metabolism in these phyla is still unexplored and thus highly interesting to investigate.

The sequences, provided with a C-terminal His6-tag, were expressed in E. coli BL21 (DE3) and purified by Ni-NTA metal affinity chromatography to apparent homogeneity (>95%) under optimized purification conditions (Figure 17).

Figure 17 SDS-PAGE analysis of the recombinantly expressed prokaryotic SuSys from A. caldus (SuSyAc), N. europaea (SuSyNe), M. roseus (SuSyMr) and D. acetiphilus (SuSyDa). Lane 1-4: purified enzymes, lane 5-8: crude cell extract (soluble fraction), lane 1 and 5: SuSyAc, lane 2 and 6: SuSyNe, lane 3 and 7: SuSyMr, lane 4 and 8: SuSyDa.

Their electrophoretic behavior corresponded well with their predicted molecular mass of about 92 kDa. All enzymes were mainly present in the soluble fraction but high expression yields in the crude cell extract were only observed for SuSyDa and SuSyAc. Starting with 250 mL expression cultures, final yields after purification were about 2, 0.7, 0.1 and 2 mg for SuSyAc, SuSyNe, SuSyMr and SuSyDa, respectively.


4.3 Effect of pH, temperature and divalent cations on the activity of novel prokaryotic SuSys

To determine the optimal conditions of the new SuSys in the sucrose cleavage direction, the effect of temperature, pH and MgCl2 on the activity was studied using the BCA assay. Results are summarized in Table 3.

Table 3 Properties of bacterial SuSy enzymes: pH optimum, temperature (T) optimum, Stability: % residual activity after incubation of the enzyme without substrates for 15 min at 60°C and the effect of 10 mM MgCl2 on the activity of the enzyme (+: increased activity, - : decreased activity).

          pH optimum   T optimum (°C)   Stability (%)   MgCl2
SuSyAc    5.5          60               96 ± 3          +
SuSyDa    6            65               0 ± 0           --
SuSyMr    7            80               38 ± 1          --
SuSyNe    5            75               54 ± 4          -

The pH optima of SuSyAc, SuSyDa, SuSyMr and SuSyNe were 5.5, 6.0, 7.0 and 5.0, respectively. All SuSys displayed at least 40% of their maximal activity in a pH range between 5.5 and 7.5 (Figure 18 and Table 3).

Figure 18 Effect of pH on the activity of SuSyAc, SuSyDa, SuSyMr and SuSyNe, determined with a Universal Britton-Robinson buffer system at 40°C (sucrose cleavage direction).


Temperature profiles were determined in the presence of 200 mM Suc, 5 mM ADP and 2 mM MgCl2 at pH 7.0 (Figure 19 and Table 3). Highest activities were obtained at 60°C, 65°C, 80°C and 75°C for SuSyAc, SuSyDa, SuSyMr and SuSyNe, respectively. These are remarkably high temperature optima, especially since the host organisms of these SuSys only have optimal growth temperatures between 25 and 55°C (Table S2). The cyanobacterial SuSyTe also displays an optimum of 70°C, whereas plant SuSys have optima between 40°C and 55°C322–325.

Figure 19 Effect of temperature on the activity of SuSyAc, SuSyDa, SuSyMr and SuSyNe at pH 7.0 (sucrose cleavage direction).

In addition, the thermostability of the selected SuSys was assessed by determining the residual activity after incubating the enzymes for 15 min at 60°C. It should be noted that the enzymes were incubated without the presence of any substrates, since sucrose is known to act as a stabilizing agent263,326. Unlike the others, SuSyDa was completely inactivated within 15 min. The most thermostable SuSy appeared to be SuSyAc with a residual activity of 96% (Table 3).

Mg2+ or other cations have been frequently reported to either positively (Morell and Copeland 1985) or negatively291,327 influence the activity of SuSy in the sucrose cleavage direction. To scrutinize the effect of cations on the different SuSy enzymes, the activity was determined in the presence of 200 mM Suc, 5 mM ADP and varying concentrations of MgCl2 (Figure 20). For SuSyDa, SuSyMr and SuSyNe, a decrease in activity was observed for increasing concentrations of MgCl2. In contrast, MgCl2 slightly stimulated the activity of SuSyAc.


[Figure 20 plot: % activity (y-axis, 0-140) versus [MgCl2] (mM, x-axis, 0-10) for SuSyAc, SuSyDa, SuSyMr and SuSyNe]

Figure 20 Effect of MgCl2 on the activity of SuSyAc, SuSyDa, SuSyMr and SuSyNe (sucrose cleavage direction). The activity measured in the absence of MgCl2 (0 mM) is considered 100% for all SuSys.

4.4 Kinetic properties and substrate specificity of novel prokaryotic SuSys

Studies have shown that plant SuSys preferentially use UDP as nucleotide substrate, although ADP, CDP, GDP and TDP can serve to a lesser extent as alternative acceptors. Conversely, the SuSy from the thermophilic cyanobacterium Thermosynechococcus elongatus (SuSyTe) showed a clear preference for ADP, as reflected by the 7-fold lower Km compared to UDP325.

To investigate the nucleotide preference of SuSyNe, apparent kinetic parameters were determined for sucrose, ADP, UDP, GDP and CDP at 60°C and pH 7.0 in the sucrose breakdown direction (Table 4 and Figure S3). Substrate inhibition occurred in the presence of GDP at concentrations above 10 mM (Ki≈50 mM) whereas typical Michaelis-Menten kinetics were observed for the other substrates. A significant difference was observed between the affinity for sucrose in the presence of either ADP or UDP. Apparently, the Km for sucrose is about 8 times lower with ADP as co-substrate instead of UDP. Conversely, Km values for ADP and UDP are in the same range. For plant SuSys, reported Km values for sucrose are also dependent on the used co-substrate but for these enzymes the affinity for sucrose was highest with UDP119,120,290.

Table 4 Apparent kinetic parameters for SuSyNe at 60°C (100 mM MOPS pH 7.0) in the sucrose cleavage direction. A fixed concentration of 200 mM sucrose was used if the concentration of NDP was varied, while 5 mM NDP was used if sucrose was varied.

| Varied substrate (co-substrate) | Km (mM) | Vmax (U/mg) | kcat/Km (M⁻¹ s⁻¹) |
|---|---|---|---|
| Suc (ADP) | 40.1 ± 8.2 | 27.4 ± 7.5 | 1.0 × 10³ |
| Suc (UDP) | 321 ± 40 | 63.1 ± 8.8 | 3.0 × 10² |
| ADP | 0.44 ± 0.15 | 20.8 ± 0.4 | 7.1 × 10⁴ |
| UDP | 0.69 ± 0.04 | 67.7 ± 2.2 | 15.0 × 10⁴ |
| GDP | 1.56 ± 0.17 | 40.1 ± 3.3 | 3.9 × 10⁴ |
| CDP | 1.28 ± 0.12 | 11.5 ± 0.8 | 1.4 × 10⁴ |
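The substrate inhibition reported for GDP can be illustrated with the textbook rate law v = Vmax·S / (Km + S + S²/Ki). The sketch below is only an illustration: it assumes that this standard model applies (the text does not state which equation was fitted) and plugs in the apparent GDP parameters of SuSyNe from Table 4 together with Ki≈50 mM. The function and variable names are hypothetical.

```python
import math


def michaelis_menten(s_mM, vmax, km_mM):
    """Plain Michaelis-Menten rate (same units as vmax)."""
    return vmax * s_mM / (km_mM + s_mM)


def substrate_inhibition(s_mM, vmax, km_mM, ki_mM):
    """Michaelis-Menten rate with an additional S^2/Ki term (assumed model)."""
    return vmax * s_mM / (km_mM + s_mM + s_mM ** 2 / ki_mM)


# Apparent GDP parameters of SuSyNe (Table 4) and the Ki quoted in the text.
GDP_VMAX = 40.1   # U/mg
GDP_KM_MM = 1.56  # mM
GDP_KI_MM = 50.0  # mM

# In this model the rate reaches its maximum at S = sqrt(Km * Ki).
s_opt_mM = math.sqrt(GDP_KM_MM * GDP_KI_MM)

if __name__ == "__main__":
    print(f"rate maximum at about {s_opt_mM:.1f} mM GDP")
    for s in (1, 5, 10, 20, 50):
        v_mm = michaelis_menten(s, GDP_VMAX, GDP_KM_MM)
        v_si = substrate_inhibition(s, GDP_VMAX, GDP_KM_MM, GDP_KI_MM)
        print(f"{s:>3} mM GDP: {v_mm:5.1f} U/mg without, {v_si:5.1f} U/mg with inhibition")
```

Under these assumptions the predicted rate maximum lies just below 10 mM GDP, in line with the observation that inhibition becomes apparent above 10 mM.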


Based on the affinities for the different nucleotides, SuSyAc also showed a clear preference for ADP. Indeed, the enzyme displayed Km values of 0.17, 7.8, 8.5 and 16.9 mM for ADP, UDP, GDP and CDP, respectively (Table 5 and Figure S4). The Km for ADP is thus at least 45 times lower compared to the other nucleotides although substrate inhibition was observed above 2 mM (Ki≈19 mM) (Figure S4).

Table 5 Apparent kinetic parameters for SuSyAc at 60°C (100 mM MOPS pH 7.0) in the sucrose cleavage direction. A fixed concentration of 200 mM sucrose was used if the concentration of NDP was varied, while 5 mM NDP was used if sucrose was varied.

| Varied substrate (co-substrate) | Km (mM) | Vmax (U/mg) | kcat/Km (M⁻¹ s⁻¹) |
|---|---|---|---|
| Suc (UDP) | 72.6 ± 2.8 | 39.6 ± 1.1 | 1.0 × 10³ |
| ADP | 0.17 ± 0.05 | 63.3 ± 2.9 | 5.7 × 10⁵ |
| UDP | 7.8 ± 1.1 | 53.5 ± 5.4 | 1.1 × 10⁴ |
| GDP | 8.5 ± 0.4 | 9.2 ± 0.3 | 1.6 × 10³ |
| CDP | 16.9 ± 0.2 | 15.8 ± 0.4 | 1.4 × 10³ |
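The relation between the columns of Table 5 can be sketched as follows. This is a minimal illustration, not the procedure used in this work: it assumes one active site per monomer and uses the 92 kDa monomer weight of SuSyAc listed in Table 7 to convert Vmax (U/mg) into kcat (s⁻¹). All function names are hypothetical.

```python
MW_SUSYAC_KDA = 92.0  # monomer molecular weight of SuSyAc (Table 7)

# Nucleotide rows of Table 5: substrate -> (Km in mM, Vmax in U/mg)
TABLE5 = {
    "ADP": (0.17, 63.3),
    "UDP": (7.8, 53.5),
    "GDP": (8.5, 9.2),
    "CDP": (16.9, 15.8),
}


def kcat_per_s(vmax_u_per_mg, mw_kda):
    """Turnover number, assuming one active site per monomer.

    1 U = 1 umol/min and 1 kDa = 1 mg/umol, so U/mg times kDa gives min^-1.
    """
    return vmax_u_per_mg * mw_kda / 60.0


def catalytic_efficiency(km_mM, vmax_u_per_mg, mw_kda=MW_SUSYAC_KDA):
    """kcat/Km in M^-1 s^-1 (Km converted from mM to M)."""
    return kcat_per_s(vmax_u_per_mg, mw_kda) / (km_mM * 1e-3)


if __name__ == "__main__":
    km_adp = TABLE5["ADP"][0]
    for substrate, (km, vmax) in TABLE5.items():
        eff = catalytic_efficiency(km, vmax)
        print(f"{substrate}: kcat/Km = {eff:.1e} M^-1 s^-1, Km ratio to ADP = {km / km_adp:.0f}")
```

With these assumptions the nucleotide rows of Table 5 are reproduced to within rounding, and the Km ratio between UDP and ADP corresponds to the "at least 45 times" difference mentioned in the text.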

The Km of sucrose in the presence of UDP was 73 mM, while no clear Michaelis-Menten kinetics were observed with ADP. Indeed, the activity increased almost linearly after an initial saturation plateau until a second saturation plateau was obtained (Figure S4). Interestingly, Elling and coworkers also observed two hyperbolic curves for SuSy from rice grains, depending on the sucrose concentration322. The authors postulated that these curves could be explained by the aggregation of the hydrophobic tetrameric enzyme units at high sucrose concentrations. This aggregation would double the activity of the enzyme as needed for the efficient biosynthesis of nucleotide sugars at high sucrose concentrations.

Kinetic parameters for SuSyDa and SuSyMr are presented in Table 6, Figure S5 and Figure S6. Interestingly, the activity of SuSyDa was extremely high. Indeed, with a few exceptions119,120, typical values for Vmax of SuSy enzymes in the breakdown direction of sucrose are between 0.1 and 14 U/mg68,89,93,292,325,328,329 and are thus 10- to 100-fold lower than that observed for SuSyDa.

Table 6 Apparent kinetic parameters for SuSyDa and SuSyMr at 60°C (100 mM MOPS pH 7.0) in the sucrose cleavage direction. A fixed concentration of 200 mM sucrose was used if the concentration of ADP was varied, while 5 mM ADP was used if sucrose was varied.

| Enzyme | Km ADP (mM) | Vmax ADP (U/mg) | Km Suc (mM) | Vmax Suc (U/mg) |
|---|---|---|---|---|
| SuSyDa | 1.4 ± 0.2 | 137 ± 1 | 194 ± 73 | 125 ± 17 |
| SuSyMr | 0.37 ± 0.01 | 6.4 ± 0.01 | 56 ± 5 | 8.1 ± 0.6 |

Finally, the nucleotide specificity for SuSyDa and SuSyMr was evaluated using 10 mM NDP and 1 M sucrose (Figure 21). Clearly, the predilection for ADP could be extended towards SuSyDa, based on the 20-fold higher activity with this substrate compared to the other nucleotides. SuSyMr, on the other hand, did not display a significant preference (p<0.05) for the acceptor nucleotide.

Figure 21 Nucleotide specificity for SuSyDa and SuSyMr with 10 mM NDP and 1 M sucrose at 60°C (100 mM MOPS pH 7.0).

It was already suggested before that a preference for adenine nucleotides links sucrose metabolism directly to glycogen metabolism in vivo134,284. Production of glycogen inside bacteria is catalyzed by glycogen synthase, which uses ADP-glucose (ADP-Glc) as glucosyl donor to elongate an α-1,4-glucosidic chain. ADP-Glc is mainly generated from glucose 1-phosphate by ADP-Glc pyrophosphorylase (AGPase, EC 2.7.7.27), but a concomitant supply of ADP-Glc for glycogen biosynthesis should also be attributed to the sucrose cleavage action of SuSy in cyanobacteria134,284. The clear preference for ADP, observed for the SuSys from non-photosynthetic species, could thus indicate a similar function in regulating the C-flux between sucrose and glycogen. However, as the in vitro conditions used for the characterization studies differ from those in vivo (e.g. different temperature and the presence of additional metabolites inside the organism which could possibly influence the apparent kinetic parameters of the SuSy enzyme), additional in vivo experiments are required to confirm this hypothesis.

4.5 Production of UDP-glucose with SuSyAc

Owing to its excellent properties (high stability, high activity and good expression yield), SuSyAc would be a suitable biocatalyst for the production of high-value nucleotide sugars. This was demonstrated by a collaborating partner of our European FP7 SuSy project (http://www.glycosusy.eu/): the group of professor Nidetzky. They optimized a process for the production and downstream processing of UDP-Glc using permeabilized cells expressing SuSyAc330.

In summary, they boosted the expression of SuSyAc to ~350 mg per liter of medium by batch fermentation in an enriched LB medium in a controlled bioreactor. To overcome mass transfer limitations, the whole cells were permeabilized by freeze-thawing, and the reaction was performed under pH control at low pH to obtain higher conversion efficiencies. High yields (86% based on UDP) and concentrations (103 g/L) of UDP-Glc were obtained within 10 hours, resulting in a space-time yield (STY) of 10 g/L/h and a total turnover number (TTN) of 103 g UDP-Glc per g cell dry weight (CDW). Downstream processing of the nucleotide sugar was performed by a newly developed chromatography-free purification protocol. It consisted of the removal of cells by filtration, a phosphatase treatment to hydrolyse UDP/UMP and repeated precipitation of UDP-Glc by EtOH. The overall yield was 63% and a purity of ≥ 90% was achieved.
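The reported space-time yield follows directly from the final UDP-Glc concentration and the reaction time. A minimal sketch of that arithmetic, using only the two figures quoted above (the function name is hypothetical):

```python
def space_time_yield(titer_g_per_l, time_h):
    """Space-time yield in g/L/h: product concentration divided by reaction time."""
    return titer_g_per_l / time_h


# 103 g/L UDP-Glc obtained within 10 hours
sty = space_time_yield(103.0, 10.0)

if __name__ == "__main__":
    print(f"STY = {sty:.1f} g/L/h")
```

The result rounds to the 10 g/L/h quoted in the text.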

4.6 Truncation mutants of SuSyAc

Truncation of enzymes, which consists of deleting (N- or C-terminal) parts of the coding gene, has resulted in catalysts with increased expression yields, activity and/or thermostability331–335. In addition, deletion of whole domains can help elucidate the function of these entities336.

As described in Chapter 1, plant SuSys are composed of four different domains (Figure 8). Although certain functions are attributed to the CTD and EPBD domains of plant SuSys, the role of the corresponding regions in bacterial SuSys is still unknown. These domains could, for example, be important for the catalytic activity, protein folding and/or oligomericity. Hardin and coworkers already demonstrated that truncation of the C-terminus of maize SuSy resulted in formation of dimers instead of tetramers, consistent with the finding that the C-terminus is part of the interface between monomers48,337. In order to scrutinize the function of the N-terminal domains in prokaryotes, three deletion mutants of SuSyAc were made (Table 7).

Table 7 Overview of N-terminal deletion mutants of SuSyAc. The deleted regions, crossed out in the original table, are shown in square brackets. AA: amino acids, CTD: region in SuSyAc corresponding to the cellular targeting domain of SuSyAt1, EPBD: region in SuSyAc corresponding to the ENOD40 peptide binding domain of SuSyAt1, GTB: catalytic domains.

| Deleted region | Deleted AA | Name enzyme | Molecular weight (kDa) |
|---|---|---|---|
| CTD-linker-EPBD-GTBN-GTBC | / | SuSyAc WT | 92 |
| [CTD]-linker-EPBD-GTBN-GTBC | 2-116 | CTD_del | 78 |
| [CTD-linker]-EPBD-GTBN-GTBC | 2-145 | CTD/linker_del | 75 |
| [CTD-linker-EPBD]-GTBN-GTBC | 1-262 | CTD/EPBD_del | 62 |
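The molecular weights in Table 7 can be cross-checked against the number of deleted residues. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value from this work, so the estimates are approximate. All names are hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # rule-of-thumb average residue mass (assumption)
WT_KDA = 92.0            # SuSyAc wild-type monomer (Table 7)

# Table 7: mutant -> (first deleted residue, last deleted residue, reported kDa)
MUTANTS = {
    "CTD_del": (2, 116, 78),
    "CTD/linker_del": (2, 145, 75),
    "CTD/EPBD_del": (1, 262, 62),
}


def estimate_truncated_mw(first, last, wt_kda=WT_KDA):
    """Estimate the mass left after deleting residues first..last (inclusive)."""
    n_deleted = last - first + 1
    return wt_kda - n_deleted * AVG_RESIDUE_KDA


if __name__ == "__main__":
    for name, (first, last, reported) in MUTANTS.items():
        estimate = estimate_truncated_mw(first, last)
        print(f"{name}: estimated {estimate:.1f} kDa, reported {reported} kDa")
```

Under this assumption each estimate lies within 2 kDa of the reported value, which is the agreement one would expect from an average-mass approximation.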

After successful deletion of the different domains, expression in both the soluble and insoluble fractions was evaluated. While SuSyAc wild-type (WT) is expressed in high amounts in the soluble fraction (CCE), as can be concluded from the intense band around 92 kDa (molecular weight of the monomer), soluble or insoluble expression of the deletion mutants could not be confirmed by SDS-PAGE analysis (Figure 22A). Consequently, a Western blot analysis was performed (Figure 22B), which showed that the deletion mutants were present in the soluble fraction, although in small amounts. In summary, deletion of N-terminal domains of SuSy results in severely decreased expression yields, indicating their role in proper folding of the enzyme. Incorrect folding can induce a stress response in the producing E. coli host, activating housekeeping mechanisms that remove the enzymes and thereby avoid the toxic accumulation of unfolded proteins338. The latter could explain why the protein levels of the (misfolded) mutants in the insoluble fraction are not increased.


Figure 22 SDS-PAGE (A) and Western blot (B) analysis of truncation mutants of SuSyAc. CCE: crude cell extract, IF: insoluble fraction. 1: SuSyAc CTD_del, 2: SuSyAc CTD/linker_del, 3: SuSyAc CTD/EPBD_del, 4: SuSyAc WT, NC: negative control (CCE of E. coli containing a recombinantly expressed protein without his-tag). The arrow indicates the position of SuSyAc WT in the soluble fraction.

4.7 Genomic organization of SuSy genes in bacteria

To check whether the host organisms of SuSyDa, SuSyAc, SuSyNe and SuSyMr also possessed other sucrose-synthesizing enzymes, their genomes were screened for the occurrence of putative SPS and SPP encoding genes (Figure 23). Interestingly, in all cases, susy was clustered in an operon together with a putative fructokinase (frk or pfkB) and an sps/spp. The latter encodes a bifunctional enzyme consisting of both a Sucrose-Phosphate Synthase and a Sucrose-Phosphate Phosphatase domain. Such enzymes have been identified before in both proteobacterial and cyanobacterial organisms274,339. In the case of the putative sps/spp encoding sequences of M. roseus, D. acetiphilus, N. europaea and A. caldus, all HAD-phosphatase residues required for SPP activity were present and other homologous spp sequences were not found, indicating that they are probably functional bimodular enzymes (Figure 23 and Table S3). To test the identity of the clustered genes, the putative frk (EC 2.7.1.4) and sps/spp genes of N. europaea were cloned from genomic DNA into a constitutive pCXP34 vector. Unfortunately, expression of SPS/SPPNe in the soluble fraction could not be detected by Western blot. Soluble FrkNe, on the other hand, was expressed in high amounts (data not shown). Furthermore, kinase activity on fructose was confirmed by the presence of a spot at a similar height as fructose 6-phosphate (Fru6P) on TLC after 3 hours of incubation at 37°C (data not shown). These results support the annotation of this gene as a fructokinase in N. europaea.


Figure 23 Genomic organization of sucrose metabolizing genes in non-photosynthetic and photosynthetic prokaryotes. Position in the genome is indicated above the arrows. Blue box: seemingly futile cycle of sucrose metabolism.

In addition to the bacteria from which the SuSy enzymes were characterized in this study, the genomic organization of all the other bacterial SuSys presented in Figure 16 was scrutinized too. The cyanobacteria had a very diverse pattern of sucrose metabolizing genes including susy, spp, sps, invertase, amylase and genes, scattered all over the genome (Table S4 and Figure 23). Only one cyanobacterial species, Microcystis aeruginosa, had its sucrose genes clustered together into one operon. Several cyanobacterial species harbored multiple SuSy isoforms while only one susy gene could be identified in the genomes of non-cyanobacterial species. The occurrence of multiple isoforms in cyanobacteria was already reported for Anabaena sp. strain PCC 7119 (ref. 106) and it is also a general phenomenon in plants105,114,120. Remarkably, 70% of the cyanobacterial strains that possessed a susy gene did not have a putative sps sequence while they did harbor one or multiple spp sequences (e.g. Crinalium epipsammum, Figure 23). The presence of an SPP without an enzyme to synthesize its substrate (SPS) could indicate several things: (i) the SuSys from these organisms are promiscuous and can also use Fru6P to produce Suc6P, (ii) the SuSy enzymes are wrongly annotated and are actually SPS in vivo, or (iii) extracellular sucrose is imported via a phosphotransferase transport system (PTS), yielding intracellular Suc6P. Although sucrose is normally imported with phosphorylation on the glucose moiety261, phosphorylation on the fructose moiety during transport across the membrane has also been suggested recently340.

All non-cyanobacterial prokaryotes had their putative susy, sps/spp and frk genes clustered together into one operon except for four species of the genera Thioalkalivibrio, Ectothiorhodospira, Thiorhodospira, and Thiorhodovibrio (Figure 23). These organisms all belong to the order of Chromatiales and the latter three are capable of performing anoxic photosynthesis with electron donors other than H2O (Table S2). These results suggest that a separate location of sucrose metabolizing genes in the genome is a universal feature amongst phototrophic organisms, which could possibly be important or linked with the ability to perform photosynthesis.

This would not seem unlikely, as the energy of light is used to drive CO2 fixation, for which sucrose is known to be the major end-product. In addition, susy expression has been shown to be regulated by light in cyanobacteria284.

The occurrence of both sucrose-synthesizing and sucrose-degrading enzymes in one operon in most of the non-phototrophic organisms raises questions about the metabolic function of these enzymes in these organisms. The seemingly futile cycle of sucrose metabolism, resulting from these co-expressed enzymes, could be an ingenious mechanism to fine-tune the supply of sucrose and nucleotide sugars, depending on the cell’s demand under certain environmental conditions. Indeed, it has been suggested before that sucrose cycles in plants, characterized by a permanent process of formation and degradation, could allow organisms to respond with a high degree of sensitivity to factors influencing sugar accumulation, osmotic potential, respiration and sugar signaling133–135. In addition, although SuSy and SPS are typically not clustered in the genome of cyanobacteria, a simultaneous increase in expression of spsA, susA and susB was observed under salt stress for Anabaena sp. PCC 7120 (ref. 131). This led to an increase in sucrose synthesis, accompanied by an enhancement in sucrose degradation in this organism. Still, net accumulation of sucrose was observed and postulated to occur because of post-translational regulation of SPS activity (by NaCl). Similarly, the activity of prokaryotic SuSys could also be regulated by post-translational modifications (e.g. phosphorylation) like their plant counterparts.

4.8 Phosphorylation of prokaryotic SuSys

As outlined in the literature review, phosphorylation is a major post-translational modification mechanism of plant SuSys, potentially altering their activity, intracellular location and rate of degradation. In plants, two major phosphorylation sites (Ser-13 and Ser-167 in SuSyAt1) were identified based on sequence analysis and both in vivo and in vitro experiments with endogenous kinases and labeled phosphate substrates, antibodies or Fe3+-IMAC93,101,142. These serine residues (or threonine in some cases) are highly conserved in plants and are part of a minimal consensus motif (hydrophobic-X-basic-X-X-S/T), recognized by calcium-dependent (CDPKs) and SNF1-related Ser/Thr protein kinases142,341 (Figure 24).
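A scan for the minimal plant consensus motif (hydrophobic-X-basic-X-X-S/T) can be sketched with a regular expression. The example below is illustrative only: the residue classes chosen for "hydrophobic" and "basic" are assumptions, the function name is hypothetical and the test sequence is made up rather than taken from any SuSy.

```python
import re

# Residue classes are illustrative assumptions, not definitions from this work.
HYDROPHOBIC = "AVILMFWC"
BASIC = "KR"

# hydrophobic-X-basic-X-X-S/T; the lookahead also reports overlapping motifs.
MOTIF = re.compile(rf"(?=([{HYDROPHOBIC}].[{BASIC}]..[ST]))")


def find_candidate_sites(sequence):
    """Return (1-based position of the S/T, matched six-residue window) pairs."""
    return [(m.start() + 6, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Made-up toy sequence, NOT a real SuSy sequence.
TOY_SEQUENCE = "GGLARGGSGGAAKGGT"
sites = find_candidate_sites(TOY_SEQUENCE)

if __name__ == "__main__":
    for position, window in sites:
        print(f"candidate phosphoacceptor at position {position}: {window}")
```

A match only flags a candidate site; whether a residue is actually phosphorylated has to be established experimentally.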

Phosphorylation has long been considered as a process exclusively occurring in eukaryotes, but more recently, (Ser/Thr) protein kinases have also been identified in prokaryotes and are now considered ubiquitous in these organisms342–345. It was found that the phosphorylation site motifs of bacteria, archaea and eukaryotes are very different346,347. However, in contrast to eukaryotes, minimal motifs for the different bacterial protein kinases remain largely unknown. In addition, with a few exceptions, most online tools for the prediction of phosphorylation sites are developed for eukaryotic proteins347,348. Those that are available for prokaryotes, such as NetPhosBac347, are trained on datasets of only two model organisms, and could thus be biased.

Currently, no information regarding phosphorylation of prokaryotic SuSys is available. This prompted a search for residues corresponding to the plant CTD and EPBD serine residues in the sequences of these organisms. To this end, a sequence alignment was made with Clustal Omega between the N-terminal parts of plant SuSys and bacterial SuSys (Figure 24). Interestingly, many cyanobacteria had an N-terminal Ser residue in a conserved Leu-hydrophobic-X-X-hydrophobic-hydrophobic-X-Ser-X-Glu-basic-X-X-Leu motif, indicating that it could possibly function as a phosphoacceptor in cyanobacteria. The other cyanobacteria did not have a Ser residue at the corresponding position, but a Cys, Asn, Ala or Gly. The consensus sequence differs from other proposed bacterial phosphorylation site motifs which had conserved residues at position -1 and +1 relative to the phosphorylated serine347. In non-cyanobacterial prokaryotic SuSys, no clear motif or conserved residue could be identified. In the case of the second known phosphorylation site (Ser-167 in SuSyAt1), a serine or threonine occurs in plants, a serine or cysteine occupies the corresponding position in cyanobacteria, while a serine or alanine is present in the other bacteria (Figure 24). The residues surrounding this site are highly conserved in all bacterial and plant SuSys. However, as this region is located at the interface of two SuSy monomers, the conservation is probably due to structural restrictions rather than being a consensus motif for phosphorylation.


Figure 24 Sequence alignment of a selection of prokaryotic and eukaryotic SuSys around phosphorylation sites Ser-13 and Ser-167 from SuSyAt1. Consensus motifs (most occurring residues) are displayed above the alignment and are based on 107 plant, 41 cyanobacterial and 26 noncyanobacterial sequences (only 1 isoform per species). Blue: hydrophobic residues, yellow: polar residues, red: negatively charged (acidic) residues, green: positively charged (basic) residues, X: one of the 20 naturally occurring amino acids.

5 Conclusions

Taxonomic analysis of the hosts of all annotated prokaryotic SuSys from the UniProtKB website revealed that not only photosynthetic organisms such as plants and cyanobacteria harbor SuSys but also other phyla such as the Proteobacteria, Deferribacteres, Chrysiogenetes, Ignavibacteriae, Nitrospinae and Firmicutes. The host organisms of SuSy enzymes are physiologically very diverse and inhabit several ecological niches. Many of them thrive in saline or discontinuous environments, suggesting a role for sucrose as a compatible solute protecting the organisms against osmotic or desiccation stress. Alternatively, sucrose could act as a thermoprotectant or as a carbon reserve to survive periods of nutritional limitation. The presence of putative susy and sps/spp sequences in the genomes of chemoheterotrophic prokaryotes indicates that CO2 fixation (autotrophy) is not a prerequisite for sucrose metabolism. The required intermediates (e.g. Fru6P and UDP-Glc for SPS) must thus be extracted from other intracellular pathways or from the environment. In addition, several other interesting conclusions could be drawn from the analysis of the genomic organization of susy in bacteria. Some cyanobacterial species harbor multiple SuSy isoforms while only one susy gene could be identified in the genomes of non-cyanobacterial species. Many cyanobacteria do not possess an annotated SPS enzyme while they do have putative spp and susy sequences. If annotation occurred correctly, this would indicate a sucrose metabolism that does not involve the well-known SPS/SPP/SuSy pathway. Lastly, all non-cyanobacterial organisms had their putative susy, sps/spp and frk genes clustered together into one operon except for some photosynthetic purple bacteria, indicating a possible link between genome organization and photosynthetic capacity.

Phosphorylation is a well-studied post-translational modification of plant SuSys. To search for similar phosphoacceptor candidates, a sequence alignment was made between prokaryotic and plant SuSys with respect to the two major phosphorylation sites in plant SuSys. A serine residue corresponding to Ser-13 (SuSyAt1) was detected in some cyanobacteria and the surrounding residues could be part of a newly identified minimal bacterial consensus motif required for substrate recognition. A serine residue was present in several cyanobacteria and other prokaryotes at the second phosphorylation site (Ser-167 in SuSyAt1). However, this site is located at one of the interfaces between two monomers, indicating that conservation could also be the result of structural restrictions. Clearly, in vitro and in vivo experiments will be necessary to determine whether phosphorylation of bacterial SuSys is possible. Such results would not only improve our understanding of post-translational modifications of SuSys but could also provide valuable information about phosphorylation in bacteria, which is still poorly understood.

Finally, two proteobacterial SuSys (SuSyAc and SuSyNe), one deferribacterial SuSy (SuSyDa) and one ignavibacterial SuSy (SuSyMr) were expressed and characterized. Good expression yields in the soluble fraction, using a constitutive pCXP34 vector, were obtained for SuSyAc and SuSyDa, while expression was much lower for SuSyNe and SuSyMr. N-terminal truncation mutants were found to be poorly expressed, indicating that the non-catalytic domains are necessary for proper folding. The purified enzymes were found to display high temperature optima (up to 80°C), high activities (up to 125 U/mg) and high thermostability (up to 15 min at 60°C). While SuSyDa displayed the highest activity, it suffered from low thermostability. MgCl2 had an inhibiting effect on SuSyMr, SuSyNe and SuSyDa while the activity of SuSyAc was slightly stimulated. SuSyAc, SuSyDa and SuSyNe displayed a clear preference for ADP, which could indicate their role in controlling the carbon flux between sucrose and glycogen. SuSyDa displayed a 20-fold higher specific activity with ADP compared to the other nucleotides. In case of SuSyNe, the Km for sucrose was about 8 times lower with ADP as co-substrate compared to UDP. SuSyAc, on the other hand, had at least a 45-fold higher affinity for ADP compared to the other nucleotides. Nonetheless, SuSyAc can be regarded as a promising biocatalyst for the industrial production of UDP-Glc due to its high expression yield, high stability and high activity at high concentrations of UDP.

6 Supplementary materials

Figure S3 Kinetics of SuSyNe in 100 mM MOPS pH 7.0 at 60°C. A fixed concentration of 200 mM sucrose was used if the concentration of NDP was varied, while 5 mM NDP was used if sucrose was varied.


Figure S4 Kinetics of SuSyAc in 100 mM MOPS pH 7.0 at 60°C. A fixed concentration of 200 mM sucrose was used if the concentration of NDP was varied, while 5 mM NDP was used if sucrose was varied.


Figure S5 Kinetics of SuSyDa in 100 mM MOPS pH 7.0 at 60°C. A fixed concentration of 200 mM sucrose was used if the concentration of ADP was varied, while 5 mM ADP was used if sucrose was varied.

Figure S6 Kinetics of SuSyMr in 100 mM MOPS pH 7.0 at 60°C. A fixed concentration of 200 mM sucrose was used if the concentration of ADP was varied, while 5 mM ADP was used if sucrose was varied.


Table S2 Non-cyanobacterial organisms harboring a putative SuSy enzyme: taxonomy (Phylum/Class/Order/Family) and characteristics such as optimal growth temperature (T), optimal pH, ecological niche, NaCl range for growth (halophilicity), source of energy, carbon and electrons and O2 requirement. N.f.: not found in literature, AOB: ammonia-oxidizing bacteria, Ref: references, mes.: mesophilic.

| SuSy host | Taxonomy | T (°C) | pH | Ecological niche | NaCl range | Source of energy/C and electrons | O2 | Ref |
|---|---|---|---|---|---|---|---|---|
| Acidithiobacillus caldus ATCC 51756 | Proteobacteria/Acidithiobacillia/Acidithiobacillales/Acidithiobacillaceae | 45 | 2-2.5 | acidic (copper) mines | not halophilic | chemolithoautotrophic | n.f. | 349–351 |
| Acidithiobacillus ferrivorans SS3 | Proteobacteria/Acidithiobacillia/Acidithiobacillales/Acidithiobacillaceae | 25-32 | 2.5 | metal mines | n.f. | chemolithoautotrophs | facultative anaerobes | 352 |
| Acidithiobacillus ferrooxidans (strain ATCC 23270) | Proteobacteria/Acidithiobacillia/Acidithiobacillales/Acidithiobacillaceae | 30 | 1.5-2.5 | deep caves or acid mine drainage, such as coal waste | n.f. | chemolithoautotroph | facultative anaerobe | 353–355 |
| Denitrovibrio acetiphilus (strain DSM 12809 / N2460) | Deferribacteres/Deferribacteres/Deferribacterales/ | 35-37 | 6.5–8.6 | (marine) | 0-1 M, optimum at 0.5 M | chemoorganoheterotroph | obligately anaerobic | 356,357 |
| Desulfococcus multivorans DSM 2059 | Proteobacteria/ /Desulfobacterales/Desulfobacteraceae | n.f. | n.f. | n.f. | n.f. | n.f. | n.f. | 358 |
| Desulfonatronospira thiodismutans ASO3-1 | Proteobacteria/Deltaproteobacteria/Desulfovibrionales/Desulfohalobiaceae | mes. | 9.5-10 | sediment of hypersaline soda lakes (1-4 M Na) | 1-4 M, optimum between 1-2 M Na+ (extremely salt tolerant) | chemolithoautotrophically or organotroph | obligately anaerobic | 359 |
| Desulfurispirillum indicum (strain ATCC BAA-1389 / S5) | Chrysiogenetes/Chrysiogenetes/Chrysiogenales/ | 28 | 6.8–7.6 | sediment of an estuarine canal | 0.1-0.75, optimum at 0.4 M (slightly halophilic) | chemoorganoheterotroph | obligately anaerobic | 360 |
| Desulfurivibrio alkaliphilus (strain DSM 19089 / UNIQEM U267 / AHT2) | Proteobacteria/Deltaproteobacteria/Desulfobacterales/Desulfobulbaceae | 35 | 8.5–10 | hypersaline alkaline lake sediments | 0.2-2.5, optimum between 0.5-1.0 M (moderately salt-tolerant) | chemolithoautotrophic | obligately anaerobic | 361 |
| Desulfuromonas acetoxidans DSM 684 | Proteobacteria/Deltaproteobacteria/Desulfuromonadales/Desulfuromonadaceae | 30 | 6.5-8.5 | predominantly found in anoxic marine sediments | n.f. | chemoorganotrophic | obligately anaerobic | 310 |
| Dethiobacter alkaliphilus AHT 1 | Firmicutes/ /Clostridiales/Syntrophomonadaceae | mes. | 9.5 | soda lakes (sediment) | 0.2-1.8 M, optimum at 0.4 M (moderately salt-tolerant) | chemolithoautotrophically | obligately anaerobic | 362 |
| Ectothiorhodospira sp. PHS-1 | Proteobacteria/Gammaproteobacteria/Chromatiales/Ectothiorhodospiraceae (purple sulfur bacteria) | 43 | 8.7-9.3 | alkaline, saline and anoxic hot spring (45°C) in biofilms | 0.1 and 1.7 M, optimum at 0.6 M | photolithoautotrophic | anaerobic | 363–365 |
| Melioribacter roseus (strain P3M) | Ignavibacteria/Ignavibacteria/Ignavibacteriales/Melioribacteraceae | 52-55 | n.f. | microbial mats developing in streams of hydrothermal water | 0-1 M, optimum at 0.1 M | chemoorganotrophic | facultative anaerobic | 295 |
| Nitrosococcus halophilus (strain Nc4) | Proteobacteria/Gammaproteobacteria/Chromatiales/Chromatiaceae | 28–32 | 7.6-8 | marine | 0.7 M optimum (obligately halophilic) | chemolithoautotrophic | aerobic | 309 |
| Nitrosococcus oceani AFC27 | Proteobacteria/Gammaproteobacteria/Chromatiales/Chromatiaceae | 28–32 | 7.6-8 | marine and salt lakes | 0.5 M optimum (obligately halophilic) | chemolithoautotrophic | aerobic | 309 |
| Nitrosococcus watsoni (strain C-113) | Proteobacteria/Gammaproteobacteria/Chromatiales/Chromatiaceae | 28–32 | 7.6-8 | marine | 0.6 M optimum (obligately halophilic) | chemolithoautotrophic | aerobic | 309 |
| Nitrosomonas europaea (strain ATCC 19718 / NBRC 14298) | Proteobacteria/Betaproteobacteria/Nitrosomonadales/Nitrosomonadaceae (AOB) | 30 | 8.1 | soil, sewage, freshwater, the walls of buildings and on the surface of monuments, terrestrial | reduced growth above 0.1 M | chemolithoautotrophic (facultative chemolithoorganotroph) | aerobic | 296,366–369 |
| Nitrosomonas eutropha (strain C91) | Proteobacteria/Betaproteobacteria/Nitrosomonadales/Nitrosomonadaceae (AOB) | 30 | 8.1 | ammonium rich soil or water; eutrophicated environments | tolerant up to 0.4 M | chemolithoautotrophic | aerobic | 366,367 |
| Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849) | Proteobacteria/Betaproteobacteria/Nitrosomonadales/Nitrosomonadaceae (AOB) | mes. | n.f. | (agricultural) soil | n.f. | chemolithoautotrophy | aerobic | 320 |
| Nitrospina gracilis 3/211 | Nitrospinae/Nitrospinia/Nitrospinales/Nitrospinaceae | 25-30 | 7.5-8 | ocean surface waters | n.f. | chemolithoautotrophic | aerobic | 308,370 |
| Thioalkalivibrio sulfidiphilus (strain HL-EbGR7) | Proteobacteria/Gammaproteobacteria/Chromatiales/Ectothiorhodospiraceae | n.f. | 10 | soda lakes | 0.2-1.5 M, optimum at 0.4 (halophilic, moderately halotolerant) | chemolithoautotrophic | obligately aerobic | 268,371 |
| Thiorhodospira sibirica ATCC 700588 | Proteobacteria/Gammaproteobacteria/Chromatiales/Ectothiorhodospiraceae (purple sulfur bacteria) | 30 | 9 | low salinity soda lake (0.2% salinity) | 0-1 M, optimum at 0.1 M (moderately halotolerant) | photolithoautotrophic | obligately anaerobic | 318,372 |
| Thiorhodovibrio sp. 970 | Proteobacteria/Gammaproteobacteria/Chromatiales/Chromatiaceae (purple sulfur bacteria) | n.f. | n.f. | n.f. | n.f. | n.f. | n.f. | |
| Geovibrio thiophilus | Deferribacteres/Deferribacteres/Deferribacterales/Deferribacteraceae | 37-40 | n.f. | n.f. | growth not inhibited up to 0.3 M NaCl | chemolithotrophic | anaerobic | 373 |


Table S3 Conservation of HAD-phosphatase superfamily active site residues (required for SPP activity) in SPP from Synechocystis sp. PCC 6803 and in SPP- and SPS/SPP-like sequences from other species. Residue numbers refer to the Synechocystis sp. PCC 6803 SPP (table modified from267).

HAD-phosphatase active site residues

| Enzyme | D9XDXT13 | T41 | K163 | D186 | D190 |
|---|---|---|---|---|---|
| Synechocystis sp. PCC 6803 SPP | DLDNT | T | K | D | D |
| Melioribacter roseus SPS/SPP | DIDDT | T | K | D | D |
| Denitrovibrio acetiphilus SPS/SPP | DIDNT | T | K | D | D |
| Acidithiobacillus caldus SPS/SPP | DIDNT | T | K | D | D |
| Nitrosomonas europaea SPS/SPP | DIDNT | T | K | D | D |
| Synechococcus sp. PCC 7002 SPS/SPP | TIDQN | T | K | G | D |
| Synechococcus sp. PCC 7002 SPP | DLDRT | T | K | D | D |
| Ectothiorhodospira sp. PHS-1 SPS/SPP | DIDNT | T | K | D | D |
| Thiorhodospira sibirica SPS/SPP | DIDNT | T | K | D | D |
| Thioalkalivibrio sulfidiphilus SPS/SPP | DIDNT | T | K | D | D |
| Thiorhodovibrio sp. 970 SPS/SPP | DLDQN | T | K | G | D |
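The conservation check summarized in Table S3 can be sketched as a pattern match against the DXDXT motif. The example below is a minimal illustration that only uses the five-residue segments listed in the table; the names are hypothetical and the sketch does not reproduce the alignment procedure behind the table.

```python
import re

# DXDXT motif of the HAD-phosphatase superfamily (X = any residue)
HAD_MOTIF = re.compile(r"D.D.T")

# Residues aligned with D9-T13 of the Synechocystis SPP, copied from Table S3
TABLE_S3 = {
    "Synechocystis sp. PCC 6803 SPP": "DLDNT",
    "Melioribacter roseus SPS/SPP": "DIDDT",
    "Denitrovibrio acetiphilus SPS/SPP": "DIDNT",
    "Acidithiobacillus caldus SPS/SPP": "DIDNT",
    "Nitrosomonas europaea SPS/SPP": "DIDNT",
    "Synechococcus sp. PCC 7002 SPS/SPP": "TIDQN",
    "Synechococcus sp. PCC 7002 SPP": "DLDRT",
    "Ectothiorhodospira sp. PHS-1 SPS/SPP": "DIDNT",
    "Thiorhodospira sibirica SPS/SPP": "DIDNT",
    "Thioalkalivibrio sulfidiphilus SPS/SPP": "DIDNT",
    "Thiorhodovibrio sp. 970 SPS/SPP": "DLDQN",
}


def lacking_had_motif(table):
    """Return the sorted names of enzymes whose segment does not match DXDXT."""
    return sorted(name for name, segment in table.items()
                  if not HAD_MOTIF.fullmatch(segment))


non_conserved = lacking_had_motif(TABLE_S3)

if __name__ == "__main__":
    for name in non_conserved:
        print(f"{name}: {TABLE_S3[name]} does not match DXDXT")
```

In this table the two entries that fail the match are also the ones with a glycine at the position corresponding to D186.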


Table S4 Genome organization of sucrose metabolizing genes in bacteria. Genomic location of the genes is given between brackets. SuSy: Sucrose Synthase, inv: invertase, SPS: Sucrose Phosphate Synthase, SPP: Sucrose 6-Phosphate Phosphatase.

Organism | Phylum | Genomic location of sucrose metabolizing genes

Acaryochloris marina | Cyanobacteria | SPP (spp AM1_2884), SP (AM1_3649), SuSy (AM1_6048)
Anabaena cylindrica | Cyanobacteria | SPS/SPP (Anacy_5147), SPP (Anacy_4941), SuSy (Anacy_1886), SuSy (Anacy_4232)
Anabaena sp. 90 | Cyanobacteria | SPP (ANA_C10157), SuSy (ANA_C11946), SPP (ANA_C12945), SPP (ANA_C12946), inv (ANA_C12319), SPS (ANA_C12881)
Anabaena variabilis PCC 7937 | Cyanobacteria | SuSy (Ava_2283), SuSy (Ava_3753), SPP (Ava_2821)
Arthrospira maxima CS328 | Cyanobacteria | SuSy (AmaxDRAFT_0499), amylase (AmaxDRAFT_2182)
Arthrospira platensis | Cyanobacteria | Glycoside (NIES39_D05340), SuSy (NIES39_O01860)
Arthrospira sp. PCC 8005 | Cyanobacteria | SuSy (ARTHRO_9000), Putative glucosyltransferase (ARTHRO_910008)
Calothrix sp. PCC 6303 | Cyanobacteria | SPS (Cal6303_0199), SPP (Cal6303_0610), SuSy (Cal6303_2136)
Calothrix sp. PCC 7507 | Cyanobacteria | SPP (Cal7507_2549), SuSy (Cal7507_2215), SuSy (Cal7507_5465)
Chroococcidiopsis thermalis | Cyanobacteria | SuSy (Chro_5323), inv (Chro_0977), inv (Chro_4248), SPP (Chro_0680)
Coleofasciculus chthonoplastes | Cyanobacteria | SPP (MC7420_3029), SuSy (MC7420_7584), amylase (MC7420_5392)
Crinalium epipsammum | Cyanobacteria | SPP (Cri9333_1967), SuSy (Cri9333_2952)
Cyanothece sp. PCC 7424 | Cyanobacteria | SuSy (PCC7424_3776), SPP (PCC7424_4476)
Cyanothece sp. PCC 7425 | Cyanobacteria | SuSy (Cyan7425_1752), SuSy (Cyan7425_3916), SPP (Cyan7425_4592)
Cylindrospermum stagnale | Cyanobacteria | SPP (Cylst_0594), SuSy (Cylst_0728), SuSy (Cylst_1832), SPS/SPP (Cylst_1866)
Dactylococcopsis salina PCC 8305 | Cyanobacteria | SPP (Dacsa_0649), SuSy (Dacsa_2708)
Fischerella sp. jsc11 | Cyanobacteria | SPP (FJSC11DRAFT_2986), SuSy (FJSC11DRAFT_1280), SuSy (FJSC11DRAFT_0905), SuSy (FJSC11DRAFT_4368)
Geitlerinema sp. PCC 7407 | Cyanobacteria | SuSy (GEI7407_2715), SPP (GEI7407_1610), SPP (GEI7407_2223)
Gloeocapsa sp. PCC 7428 | Cyanobacteria | SPP (Glo7428_0776), SuSy (Glo7428_4260), SuSy (Glo7428_0408)
Halothece sp. PCC 7418 | Cyanobacteria | SuSy (PCC7418_2272), SPP (PCC7418_1364)
Microcoleus sp. PCC 7113 | Cyanobacteria | SPP (Mic7113_1367), SuSy (Mic7113_3751), SuSy (Mic7113_2737)


Microcoleus vaginatus | Cyanobacteria | spp (micvadraft_5193), susy (micvadraft_1609)
Microcystis aeruginosa DIANCHI905 | Cyanobacteria | spp (c789_5228), susy (c789_5227), sps (c789_5226)
Microcystis aeruginosa PCC 7806 | Cyanobacteria | spp (ipf_1566), susy (ipf_1565), sps (ipf_1564)
Moorea producens 3L | Cyanobacteria | spp (lyngbm3l_73410), inv (lyngbm3l_40680), susy (lyngbm3l_74640), susy (lyngbm3l_31670)
Nodularia spumigena ccy9414 | Cyanobacteria | spp (nsp_21420), susy (nsp_48150), susy (nsp_24720), sps (nsp_8740)
Nostoc azollae 0708 | Cyanobacteria | spp (aazo_2470), susy (aazo_0215)
Nostoc punctiforme PCC 73102 | Cyanobacteria | sps (npun_r4579), sps (npun_f3065), sps (npun_r1558), spp (npun_f3066), spp (npun_r3569), susy (npun_f1876), susy (npun_f4951), inv (npun_f4643), inv (npun_f1611), sps (npun_f5502), sps (npun_f4709)
Nostoc sp. PCC 7107 | Cyanobacteria | sps/spp (nos7107_1808), spp (nos7107_4134), susy (nos7107_1514), susy (nos7107_2802)
Nostoc sp. PCC 7524 | Cyanobacteria | spp (nos7524_0250), susy (nos7524_2894), susy (nos7524_5556), sps/spp (nos7524_3554)
Oscillatoria acuminata | Cyanobacteria | spp (oscil6304_5556), susy (oscil6304_3602)
Oscillatoria nigroviridis | Cyanobacteria | spp (osc7112_3132), susy (osc7112_4334)
Oscillatoriales cyanobacterium JSC12 | Cyanobacteria | spp (osccydraft_4427), susy (osccydraft_3064)
Pleurocapsa sp. PCC 7327 | Cyanobacteria | spp (ple7327_3241), glycosidase (ple7327_1355), susy (ple7327_0671), susy (ple7327_2003)
Pseudanabaena sp. PCC 7367 | Cyanobacteria | susy (pse7367_0339)
Rivularia sp. PCC 7116 | Cyanobacteria | spp (riv7116_6091), susy (riv7116_6202), susy (riv7116_3519)
Stanieria cyanosphaera | Cyanobacteria | spp (sta7437_2821), sp (sta7437_2017), susy (sta7437_0897)
Thermosynechococcus elongatus | Cyanobacteria | sps (tlr0582), susy (tlr1047)
Acidithiobacillus caldus ATCC 51756 | Proteobacteria | susy, sps, pfkb (acaty_c1477; acaty_c1478; acaty_c1479)
Acidithiobacillus ferrivorans SS3 | Proteobacteria | susy, sps, pfkb (acife_1676; acife_1677; acife_1678)
Acidithiobacillus ferrooxidans ATCC 23270 | Proteobacteria | susy, sps, pfkb (afe_1550; afe_1551; afe_1553)
Acidithiobacillus ferrooxidans ATCC 53993 | Proteobacteria | susy, sps, pfkb (lferr_1266; lferr_1267; lferr_1268)
Desulfococcus multivorans | Proteobacteria | susy, sps, pfkb (dsmv_1862; dsmv_1863; dsmv_1864)
Desulfurivibrio alkaliphilus | Proteobacteria | susy, sps, pfkb (daaht2_1335; daaht2_1336; daaht2_1337)


Desulfuromonas acetoxidans DSM 684 | Proteobacteria | susy, sps, pfkb (dace_1805; dace_1806; dace_1807)
Ectothiorhodospira sp. PHS-1 | Proteobacteria | sps (ectphs_06312), susy (ectphs_11954), pfkb (ectphs_02039, ectphs_08768 and ectphs_11984)
Nitrosococcus halophilus | Proteobacteria | susy, sps, pfkb, glucosamine/fructose-6-phosphate aminotransferase, udp-n-acetylglucosamine pyrophosphorylase (nhal_3940; nhal_3941; nhal_3942; nhal_3943; nhal_3944)
Nitrosococcus oceani | Proteobacteria | susy, sps, pfkb (noc_3068; noc_3069; noc_3070)
Nitrosococcus watsonii | Proteobacteria | susy, sps, pfkb (nwat_3123; nwat_3124; nwat_3125)
Nitrosomonas europaea | Proteobacteria | susy, sps, pfkb (ne1212; ne1213; ne1214)
Nitrosomonas eutropha | Proteobacteria | susy, sps, pfkb (neut_1079; neut_1080; neut_1081)
Nitrosospira multiformis | Proteobacteria | susy, sps, pfkb (nmul_a2266; nmul_a2267; nmul_a2268)
Thioalkalivibrio sulfidiphilus HL-EbGR7 | Proteobacteria | sps (tgr7_0708), susy (tgr7_0177), pfkb (tgr7_0173, tgr7_0113 and tgr7_0095)
Thiorhodospira sibirica ATCC 700588 | Proteobacteria | sps (thisidraft_2810), susy (thisidraft_1007), pfkb (thisidraft_0111)
Thiorhodovibrio sp. 970 | Proteobacteria | susy, sps, pfkb (thi970draft_03306, thi970draft_02291, thi970draft_02606)
Nitrospina gracilis | Nitrospinae | susy, sps, pfkb (nitgr_950071; nitgr_950072; nitgr_950073)
Denitrovibrio acetiphilus | Deferribacteres | susy, sps, pfkb (dacet_2942; dacet_2943; dacet_2944)
Melioribacter roseus | Ignavibacteria | susy, sps, pfkb (mros_1314; mros_1315; mros_1316)
Desulfurispirillum indicum | Chrysiogenetes | susy, sps, pfkb (selin_2399; selin_2400; selin_2401)
Dethiobacter alkaliphilus AHT 1 | Firmicutes | susy, sps, pfkb (dealdraft_1750; dealdraft_1751; dealdraft_1752)


CHAPTER 3: Improving the affinity of a bacterial SuSy for the nucleotide acceptor UDP

This work has partly been published as ‘Sequence determinants of nucleotide binding in Sucrose Synthase: improving the affinity of a bacterial Sucrose Synthase for UDP by introducing plant residues’ (Diricks et al., 2016)374.


1 Abstract

Biochemical characterization of several plant and bacterial SuSys has revealed that the eukaryotic enzymes preferentially use UDP whereas prokaryotic SuSys prefer ADP as acceptor. In this study, SuSy from the bacterium A. caldus (SuSyAc), which has a higher affinity for ADP as reflected by the 45-fold lower Km value compared to UDP, was used as a test case to scrutinize the effect of introducing plant residues at positions in a putative nucleotide binding motif surrounding the nucleobase ring of NDP. All eight single to sextuple mutants had activities similar to that of the wild-type enzyme but significantly reduced Km values for UDP (up to 55 times). In addition, we recognized that substrate inhibition by UDP is introduced by a methionine at position 637. The affinity for ADP also increased for all but one variant, although the improvement was much smaller compared to UDP. Further characterization of a double mutant also revealed a more than two-fold reduction in Km values for CDP and GDP. This demonstrates the general impact of the motif on nucleotide binding. Furthermore, this research also led to the establishment of a bacterial SuSy variant that is suitable for the recycling of UDP during glycosylation reactions. The latter was successfully demonstrated by combining this variant with a C-glycosyltransferase from Oryza sativa (OsCGT) in a one-pot reaction for the production of the C-glucoside nothofagin, a health-promoting flavonoid naturally found in rooibos (tea).


2 Introduction

Sucrose is a major photosynthetic end-product in plants and plays an important role in their development, growth, carbon storage, stress protection, and signal transduction262. One of the enzymes involved in its metabolism is Sucrose Synthase (SuSy, EC 2.4.1.13), which catalyzes the reversible conversion of NDP and sucrose into NDP-Glc and fructose. The first report of this enzyme dates back to 1955 and subsequent research was mainly focused on plant SuSys83. Forty-four years later, the first prokaryotic SuSy was purified from the cyanobacterium Anabaena, and recently SuSys from non-photosynthetic bacteria were also characterized (Chapter 2)43,106.

Besides its important physiological role, SuSy also has considerable potential in an industrial context. Indeed, SuSy is perfectly suited for the production of expensive nucleotide sugars, NDP-Glc in particular, starting from the cheap and abundant substrate sucrose, or for the production of sucrose analogues84,90,375. Furthermore, plant and cyanobacterial SuSys have been extensively used in coupled processes together with other GTs to synthesize glycosidic bonds in a cost-effective way16,69–71,84,285–289,376–383. The resulting products comprise valuable oligosaccharides and polysaccharides as well as glycosides and glycoconjugates with applications in the food, feed, pharmaceutical and cosmetic industries84,148,149.

Although several nucleoside diphosphates (UDP, CDP, GDP, ADP, TDP, dTDP) have been shown to act as acceptor nucleotides for SuSy in vitro, biochemical characterization has revealed that plant enzymes preferentially use UDP whereas the small amount of data available for bacterial SuSys points towards a preference for ADP. SuSyAc, for example, has a Km value for UDP (7.8 mM) which is 45 times higher compared to ADP (0.17 mM) (Chapter 2)43. In contrast, Km values for UDP determined with plant SuSys range between 0.005 and 0.4 mM (Table S5). The preference for ADP has been linked to a role of SuSy in controlling the carbon flux between sucrose and glycogen134,284. To identify the determinants affecting nucleotide binding, several residues in the vicinity of the nucleotide acceptor were mutated in SuSyAc. Besides its fundamental importance, this research also led to the creation of an enzyme variant with excellent properties for use in coupled glycosylation reactions.

3 Materials and methods

3.1 Amino acid distribution

All amino acid sequences annotated as Sucrose Synthase were retrieved from the UniProtKB database. If multiple isoforms were available for one species, they were all included in the analysis. Sequences that were not unique, did not start with a methionine, were too long (>2000 amino acids), too short (<600 amino acids) or contained undefined amino acids were removed. In total, 85 prokaryotic sequences and 413 plant sequences were retained and aligned separately with Clustal Omega (default parameters)297. To calculate the amino acid distribution at positions within this alignment, a self-written Python script was used.
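The self-written script itself is not reproduced in this chapter. As a minimal sketch of the kind of calculation involved, assuming the alignment is available as a list of equal-length gapped strings, the distribution at a single alignment column could be computed as shown below. The function name `column_distribution` is hypothetical, and the toy alignment is built from QN motifs quoted in this chapter rather than from the full UniProtKB data set.

```python
from collections import Counter


def column_distribution(aligned_seqs, column):
    """Return {residue: percentage} for one 0-based alignment column.

    Gap characters ('-') are counted as their own category, so the
    percentages always sum to 100.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(seq[column] for seq in aligned_seqs)
    total = len(aligned_seqs)
    return {residue: 100.0 * n / total for residue, n in counts.items()}


# Toy alignment: three plant QN motifs and the SuSyAc motif (illustration only)
toy_alignment = ["QMDRVRN", "QMNRARN", "QTNRARN", "LLDKTVA"]
first_column = column_distribution(toy_alignment, 0)
```

Applied separately to the plant and the prokaryotic alignment, a loop over the columns of interest would yield per-position distributions of the kind summarized in Figure 25.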


3.2 Site-directed mutagenesis

Template DNA from SuSyAc (Chapter 2) was used to construct all QN variants (Table 8). They are provided with a C-terminal His6-tag and cloned into a constitutive pXCP34h expression vector300.

Table 8 QN variants of SuSyAc and their specific mutations (underlined).

Name | Mutations

SuSyAc QLDKTVN | L636Q + A642N
SuSyAc LMDRVVA | L637M + K639R + T640V
SuSyAc LMDKVVA | L637M + T640V
SuSyAc LLDRTVA | K639R
SuSyAc LMDKTVA | L637M
SuSyAc LLDKVVA | T640V
SuSyAc QLDKTRN | L636Q + V641R + A642N
SuSyAc QLDRTRN | L636Q + K639R + V641R + A642N
SuSyAc QMDRVRN or QN6* | L636Q + L637M + K639R + T640V + V641R + A642N

For SuSyAc QLDRTRN and QMDRVRN, SuSyAc QLDKTRN was used as template, while the others were constructed using the SuSyAc WT plasmid. Site-directed mutations were introduced with a modified two-stage megaprimer-based whole plasmid PCR method384. In each case, oMEMO351_RV_5'rrnB T2 (5’-AAAGGGAATAAGGGCGACAC-3’) was used as reverse primer; the forward primers are listed in Table 9.

Table 9 List of forward primers used for site-directed mutagenesis of SuSyAc. Codons subjected to mutagenesis are underlined.

Nr. | Name | Sequence (5’ → 3’)

48 | SuSyAc QLDKTVN | CTGCGCTGGGTTGGTGCACAGCTGGATAAAACCGTCAACGGCGAACTGTATCGTG
59 | SuSyAc LMDRVVA | TGGGTTGGTGCACTGATGGATCGTGTTGTCGCTGGCGAACTG
105 | SuSyAc LMDKVVA | TGGGTTGGTGCACTGATGGATAAAGTGGTCGCTGGCGAACTG
102 | SuSyAc LLDRTVA | GGTGCACTGCTGGATCGTACCGTCGCTGGCGAAC
103 | SuSyAc LMDKTVA | GCGCTGGGTTGGTGCACTGATGGATAAAACCGTCGC
104 | SuSyAc LLDKVVA | GCACTGCTGGATAAAGTGGTCGCTGGCGAACTG
49 | SuSyAc QLDKTRN | CTGCGCTGGGTTGGTGCACAGCTGGATAAAACCCGTAACGGCGAACTGTATCGTG
99 | SuSyAc QLDRTRN | GGTGCACAGCTGGATCGTACCCGTAACGGCGAAC
100 | SuSyAc QMDRVRN | TGGGTTGGTGCACAGATGGATCGTGTTCGTAACGGCGAACTG
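As an illustrative check, and not part of the procedure described in this chapter, a forward primer can be translated with the standard genetic code to confirm that it encodes the intended QN motif. The sketch below uses primers 59 and 102 from Table 9; the helper names `translate` and `encodes_motif` are hypothetical, and the reading frame is found by simply trying all three frames. Primers that cover only part of the motif, such as primer 103, would need a partial-match check instead.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate the complete codons of a DNA string in the given frame."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def encodes_motif(primer, motif):
    """True if the motif occurs in any of the three forward reading frames."""
    return any(motif in translate(primer, frame) for frame in range(3))


primer_59 = "TGGGTTGGTGCACTGATGGATCGTGTTGTCGCTGGCGAACTG"  # SuSyAc LMDRVVA
primer_102 = "GGTGCACTGCTGGATCGTACCGTCGCTGGCGAAC"         # SuSyAc LLDRTVA

ok_59 = encodes_motif(primer_59, "LMDRVVA")
ok_102 = encodes_motif(primer_102, "LLDRTVA")
```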

The PCR mix contained 1x Q5 reaction buffer, 0.02 U/µL Q5 High-Fidelity DNA Polymerase (Bioke), 0.2 mM dNTP mix, 0.1-1 ng template plasmid DNA, and 0.5 µM forward and reverse primer in a total volume of 50 µL. The amplification program started with an initial denaturation of 30 sec at 98°C, followed by 5 cycles of denaturation for 10 sec at 98°C, annealing for 20 sec (3°C above the minimal melting temperature of all primers) and extension for 30 sec/kb (size megaprimer) at 72°C. The second stage consisted of 25 cycles of 10 sec at 98°C and extension for 1 min/kb (size whole plasmid) at 72°C and one final extension of 10 min at 72°C. After digestion of the template DNA by DpnI (Westburg) and PCR purification, mutant plasmids were transformed into electrocompetent E. coli BL21 (DE3) cells.

3.3 Enzyme production and purification

His6-tagged SuSyAc WT and SuSyAc variants were constitutively expressed in E. coli BL21 (DE3) and purified by Ni-NTA chromatography according to the protocol previously described (Chapter 2)43. The OsCGT (UniProt ID A1XFD9), cloned into an inducible pET-STRP3 vector, was kindly provided by the group of Prof. Robert Edwards (Centre for Bioactive Chemistry, Durham University, UK)385. The enzyme was obtained from E. coli BL21-Gold(DE3) expression cultures and purified to apparent homogeneity using Strep-tag affinity chromatography as described by Gutmann and coworkers36.

3.4 Characterization of variant SuSys

The BCA method (Chapter 2) was used to detect Fru, which is released by SuSy during the cleavage of Suc. One unit of SuSy activity is defined as the amount of enzyme that releases 1 µmol of fructose per minute under the specified conditions. Enzyme concentrations ranged from 0.5 to 2 mg/L. Due to these low protein concentrations, no significant background signal was observed with the BCA assay. Kinetic parameters for ADP and UDP were determined with 1 M or 200 mM sucrose at 60°C in 100 mM MOPS pH 7.0. After the addition of enzyme to the preheated substrate mix, six samples were taken over ten minutes. Apparent Km and Vmax values were calculated by non-linear regression of the Michaelis-Menten equation using SigmaPlot 11.0. Alternatively, substrate inhibition was fitted according to the equation described in Chapter 2.
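To make the unit definition concrete, the following sketch converts a linear fructose time course into a specific activity in U/mg. It is an illustration under stated assumptions, not the analysis used in this work: the time course values are hypothetical, the function names are invented for this example, and the initial rate is taken as the ordinary least-squares slope through the six samples.

```python
def initial_rate(times_min, fructose_uM):
    """Least-squares slope (µM fructose per min) of a linear time course."""
    n = len(times_min)
    if n != len(fructose_uM) or n < 2:
        raise ValueError("need at least two paired time/concentration points")
    mean_t = sum(times_min) / n
    mean_c = sum(fructose_uM) / n
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_min, fructose_uM))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def specific_activity(rate_uM_per_min, enzyme_mg_per_L):
    """Convert a rate in µM/min to U/mg.

    µM/min equals µmol/(L·min); dividing by mg/L gives µmol/(min·mg) = U/mg.
    """
    return rate_uM_per_min / enzyme_mg_per_L


# Hypothetical data: six samples over ten minutes, 1 mg/L enzyme
times = [0, 2, 4, 6, 8, 10]
fructose = [0, 20, 40, 60, 80, 100]
rate = initial_rate(times, fructose)
activity = specific_activity(rate, enzyme_mg_per_L=1.0)
```

Repeating this at several nucleotide concentrations gives the rate versus concentration data to which the Michaelis-Menten equation is fitted.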

3.5 Coupled reactions

Coupled reactions were carried out by the group of professor Bernd Nidetsky (Institute of Biotechnology and Biochemical Engineering, Graz University of Technology). C-glucosylation of 5 mM phloretin by 30 µg/mL OsCGT was coupled to UDP-glucose (re)generation from 0.5 mM UDP and 1 M sucrose by 10 µg/mL SuSyAc, SuSyAc LMDKVVA or SuSyGm. Phloretin was dissolved by adding 17.5 mM β-cyclodextrin and reactions were buffered at pH 7.5 with 50 mM HEPES containing 50 mM KCl, 12 mM MgCl2 and 0.13% BSA. Conversions were performed on a scale of 500 µL in 1.5 mL reaction tubes at 50°C and started by adding enzymes to the preheated reaction solutions. To monitor the conversion, aliquots of 25 µL were withdrawn and enzymes were inactivated by mixing with 25 µL water and 50 µL acetonitrile. Precipitated proteins were removed by centrifugation at 13200 rpm for 15 min. The concentrations of phloretin and nothofagin were determined by analyzing 5 µL of the supernatant with ion-pairing reversed-phase HPLC. A Kinetex™ C18 column (5 µm, 100 Å, 50 x 4.6 mm) was used for HPLC analysis at 35°C. Mobile phase A was 20 mM potassium phosphate, pH 5.9, containing 40 mM TBAB, and mobile phase B was acetonitrile. Separation was achieved using the following method at a constant flow rate of 2 mL/min: 10% B (1 min), 10-50% B (4 min), 50-80% B (0.01 min), 80% B (0.99 min), 80-10% B (0.01 min), 10% B (1.49 min). Phloretin and nothofagin were detected at 288 nm.
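The gradient program above can be written down as a list of segments, which makes it easy to derive the total run time and the mobile phase composition at any time point. The sketch below is only an illustration: the names `GRADIENT`, `total_run_time` and `percent_b` are invented here, and linear interpolation within each segment is assumed.

```python
# Each segment: (duration in min, %B at start, %B at end)
GRADIENT = [
    (1.00, 10, 10),
    (4.00, 10, 50),
    (0.01, 50, 80),
    (0.99, 80, 80),
    (0.01, 80, 10),
    (1.49, 10, 10),
]
FLOW_ML_PER_MIN = 2.0


def total_run_time(gradient):
    """Sum of all segment durations in minutes."""
    return sum(duration for duration, _, _ in gradient)


def percent_b(gradient, t_min):
    """Percentage of mobile phase B at time t_min (linear within segments)."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end in gradient:
        end = start + duration
        if t_min <= end:
            fraction = (t_min - start) / duration
            return b_start + fraction * (b_end - b_start)
        start = end
    raise ValueError("time lies beyond the end of the program")


run_time = total_run_time(GRADIENT)              # about 7.5 min per injection
solvent_per_run_mL = run_time * FLOW_ML_PER_MIN  # about 15 mL per injection
```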

3.6 Homology modeling

To model the structure of SuSyAc, the I-TASSER server for protein structure prediction from the Zhang lab386,387 was used with the crystal structure of SuSyAt1 (PDB ID 3S27, chain A) as template. With a confidence score (C-score) of 2, the homology model can be considered of high quality. The C-score is calculated based on the significance of threading template alignments and the convergence parameters of the structure assembly simulations. It is typically in the range of [-5,2], where a high value signifies a model of high quality386,387. To evaluate the interactions of SuSyAc with the nucleotide substrate, the homology model was superposed in PyMOL388 (super command) with the crystal structure of SuSyAt1, which includes UDP.


4 Results and discussion

4.1 Nucleotide preference and its relation to the QN motif

Plant SuSys are known to prefer UDP, although they are also able to use other nucleotides such as ADP, CDP, GDP and (d)TDP68,119,120,290–292,328,378,389–391. In contrast, most of the bacterial SuSys prefer ADP43,95,325. However, nucleotide preference has not yet been clearly defined and sometimes depends on the kinetic parameter under consideration. To provide a better view on this matter, a summary of all kinetic parameters (the Michaelis-Menten constant Km, the maximal velocity Vmax and the overall catalytic efficiency kcat/Km) of SuSys with data available for both UDP and ADP is provided in the supplementary materials (Table S5, Table S6 and Table S7). All plant SuSys display higher maximal activities with UDP in the breakdown direction of sucrose but only 64% have a higher affinity for UDP compared to ADP. Almost all of the bacterial SuSys, on the other hand, show a higher affinity for ADP, but do not necessarily have a higher maximal activity with this substrate. It has to be noted that differences between e.g. Km values for ADP and UDP are sometimes very small, indicating that one should be careful when stating that an enzyme has a certain ‘preference’. Nonetheless, one of the clearest examples of ADP preference is provided by the bacterial SuSyAc. Indeed, this enzyme has a Km value for UDP (7.8 mM) that is 45 times higher compared to ADP (0.17 mM) and it also shows a slightly higher maximal activity with the latter (Chapter 2).

Currently, crystal structures of SuSy enzymes from two different organisms - the bacterium Nitrosomonas europaea (SuSyNe) and the plant Arabidopsis thaliana (SuSyAt1, isoform 1) - have been solved48,95. However, the structure of SuSyNe is in an open form as it was crystallized without substrates in contrast to SuSyAt1 where UDP and fructose (PDB 3S27) or UDP and glucosyl intermediates (PDB 3S28) are trapped within a closed structure. Upon closing, conformational changes occur, resulting in stronger interactions with the nucleotide95. Consequently, to unravel the determinants underlying the difference in nucleotide specificity between prokaryotic and eukaryotic SuSys, residues surrounding the nucleobase ring of UDP were first determined using the crystal structure of SuSyAt1 and subsequently subjected to mutagenesis in SuSyAc.

In Figure 25, all positions surrounding the uracil ring of UDP are listed, together with the distribution of amino acids in plant and bacterial SuSy sequences. Six residues (positions 282, 283, 565, 566, 567 and 638 in SuSyAc) are identical between SuSyAt1 and SuSyAc. Positions 596 and 635 in SuSyAc do differ from those in SuSyAt1, but in 53% of the other bacterial SuSys the former position is occupied by the same residue as in SuSyAt1 (Val), and the amino acid from SuSyAc (Ala) occurring at the other position can be found in 54% of the plant SuSys. Consequently, these eight positions were not included in the mutagenesis strategy.


Figure 25 Amino acid distribution of 413 plant (upper part) and 85 bacterial (lower part) SuSys at positions around the nucleotide substrate including those constituting the QN motif (highlighted in blue). Residues within 4 Å of the uracil moiety of UDP (trapped within the crystal structure of SuSyAt1) are marked with an asterisk. The amino acid sequences (and residue numbering schemes) of SuSyAt1 (P49040) and SuSyAc (A0A059ZV61) were chosen as plant and bacterial representatives, respectively. Blue: basic residues, red: acidic residues; green: polar uncharged residues; orange: hydrophobic and aromatic residues; grey: special cases.


The crystal structure of SuSyAt1 revealed that only the main chain of Gln-648 (Q) and the side chain of Asn-654 (N), residues which are highly conserved in plant SuSys, make hydrogen bonds with the uracil moiety of UDP (Figure 26A). These two amino acids flank a motif of seven residues in total, hereinafter referred to as the ‘QN motif’, and are situated in the catalytic GT-BC domain of the SuSy enzyme (Figure S7). In SuSyAc, the last residue of the motif (Ala-642) is not able to form a hydrogen bond with UDP because of its hydrophobic side chain which could explain the low affinity for this nucleotide (Figure 26B). Recently, Wu and coworkers suggested that the residues in SuSyNe corresponding to Gln-648 and Asn-654 in SuSyAt1 could be responsible for the preference towards the bulkier ADP substrate by creating a larger binding site95.

Figure 26 Structural representation of the QN motif of SuSyAt1 (A) and SuSyAc (B) using a crystal structure (PDB ID 3S27) and a homology model, respectively. Possible hydrogen bonds are represented by dashed yellow lines. N, O and P atoms are colored blue, red and orange, respectively. C atoms of UDP are colored green while C atoms of the residues within the QN domain are colored yellow or orange (first and last residue of the motif).

Interestingly, the distribution of amino acids in the QN motif also differs significantly between plants and bacteria (Figure 25). In plants, five out of seven residues (648, 650, 651, 653, 654), including those involved in hydrogen bonding, are highly conserved while in bacteria the residues are highly variable (except for Leu-637 and Lys-639). Furthermore, the most prevalent amino acids observed in plant sequences rarely occur in bacterial ones. Taken together, these observations strongly indicate a role of the QN motif in nucleotide preference.

4.2 Mutational analysis of the QN motif

To determine which residues have an effect on nucleotide binding, several amino acids in the QN motif of SuSyAc (LLDKTVA) were replaced by those occurring in SuSyAt1, which can be regarded as a representative sequence for plant SuSys. In total, eight variants were constructed: three single, two double, one triple, one quadruple and one sextuple mutant. Two of these variants, QLDRTRN and QLDKTVN, contain plant residues that are highly conserved and include the two residues making hydrogen bonds with UDP. LLDRTVA has a highly conserved residue of plants that does not participate in hydrogen bonding. LMDKVVA, LLDKVVA and LMDKTVA contain mutations which are less conserved in plants, and variants LMDRVVA and QMDRVRN (QN6*, complete SuSyAt1 QN motif) have a combination of conserved and non-conserved residues (Figure 25). Positions in the QN motif that are mutated are underlined.

SDS-PAGE analysis revealed that all mutants were expressed in similar amounts compared to the WT (Figure S8). After His-tag purification, the kinetic parameters for both UDP and ADP were determined for each variant and results can be found in Figure 27 and Table S8.

Figure 27 Kinetic parameters of SuSyAc and variants with UDP and ADP in the presence of 1 M sucrose at 60°C (100 mM MOPS pH 7.0). Km values are reported in mM (A), Vmax values in U/mg (B).

Compared to the wild-type (WT) enzyme, all variants had a significantly higher affinity for UDP in the presence of 1 M sucrose and half of them also showed a slightly higher activity. The Km values ranged from 0.13 to 1.42 mM, which is comparable to the values reported for plant enzymes. Introduction of the complete SuSyAt1 QN sequence (QMDRVRN) in SuSyAc reduced the Km value to 0.37 mM, which is nearly identical to that of SuSyAt1 determined by Baroja-Fernandez and coworkers392. Double mutant LMDKVVA (L637M + T640V) exhibited the highest (55-fold) improvement in Km for UDP. Although the affinity for ADP also increased for all variants, except for QMDRVRN, the improvement is much smaller compared to UDP. Only half of the variants displayed a higher Vmax with UDP than with ADP. It has to be noted that for some variants, the affinity for UDP clearly depended on the concentration of the co-substrate sucrose. The Km value for UDP of the QLDKTVN double mutant, for example, appeared to be 25 times higher with 200 mM sucrose compared to 1 M sucrose (Table S8). Conversely, the effect of the concentration of sucrose on the affinity for UDP of the WT enzyme and the sextuple mutant QMDRVRN was insignificant.

Another interesting difference between the variants is the effect of high UDP concentrations on their activity. SuSyAc WT, QLDKTVN, QLDRTRN, LLDRTVA and LLDKVVA showed no significant inhibition of activity below 20 mM UDP. In contrast, QMDRVRN, LMDRVVA, LMDKVVA and LMDKTVA displayed clear substrate inhibition (Figure 28). These enzymes all have one mutation in common, L637M, providing direct evidence that this residue is responsible for the observed inhibition profiles.

Figure 28 The effect of varying UDP concentration on the activity of SuSyAc WT and SuSyAc LMDRVVA.
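The shape of such an inhibition profile can be illustrated with the commonly used substrate inhibition model v = Vmax·[S] / (Km + [S] + [S]²/Ki). Whether this is exactly the equation of Chapter 2 is an assumption made here, since that equation is not reproduced in this chapter. The sketch uses the Km (0.13 mM) and Ki (about 36 mM) quoted for SuSyAc LMDKVVA in the conclusions of this chapter and works with rates relative to Vmax; the function names are hypothetical.

```python
import math


def relative_rate(s_mM, km_mM, ki_mM=None):
    """v/Vmax for Michaelis-Menten kinetics.

    If ki_mM is given, the substrate inhibition term [S]^2/Ki is added to
    the denominator (assumed model, see the text above).
    """
    denominator = km_mM + s_mM
    if ki_mM is not None:
        denominator += s_mM ** 2 / ki_mM
    return s_mM / denominator


def optimal_substrate(km_mM, ki_mM):
    """Substrate concentration giving the maximal rate: sqrt(Km * Ki)."""
    return math.sqrt(km_mM * ki_mM)


KM_UDP_MM = 0.13  # SuSyAc LMDKVVA, as quoted in the conclusions
KI_UDP_MM = 36.0  # approximate value, as quoted in the conclusions

s_opt = optimal_substrate(KM_UDP_MM, KI_UDP_MM)
rate_at_optimum = relative_rate(s_opt, KM_UDP_MM, KI_UDP_MM)
rate_at_20_mM = relative_rate(20.0, KM_UDP_MM, KI_UDP_MM)
```

Under this model the rate passes through a maximum at roughly 2 mM UDP and declines at higher concentrations, whereas without the inhibition term it would keep approaching Vmax.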

It is quite surprising that positions such as Leu-637 (2nd residue of the motif) and Thr-640 (5th residue of the motif), which are highly variable in plant enzymes, can affect the affinity for UDP so drastically since these residues are not involved in hydrogen bonding and even point away from the nucleotide substrate (Figure 26B). However, it has been suggested that Leu-637 stabilizes the closed conformation of the enzyme95. Furthermore, our findings could possibly explain why different SuSy isoforms have different kinetic parameters as observed, for example, for SUS1 and SUS3 of A. thaliana or SUS1 and SUS2 of P. pyrifolia (Table S5, Table S6 and Table S7). Indeed, within one species, isozymes only differ at the 2nd, 3rd and 5th position of the QN motif (Figure S9). Multiple isoforms are currently only identified in the genome of plants and some cyanobacteria, whereas the other bacteria only express one SuSy enzyme43,120,292. In plants, these isoforms are differentially expressed either spatially, developmentally, and/or in response to abiotic factors108,110,111,393. In addition, several studies have indicated that they contribute differently to cellulose and starch biosynthesis, which requires UDP-glucose and ADP-glucose, respectively, although this has not yet been linked to their kinetic properties or amino acid sequence105,108,118. However, our mutagenesis results, together with the sequence analysis of multiple isoforms within one species, strongly suggest a possible correlation between the sequence of the QN motif and the function of the SuSy isoform in vivo.

To study the effect of the motif on CDP and GDP binding, the best performing variant LMDKVVA, exhibiting the lowest Km for UDP and one of the highest associated maximal velocities, was used as test case (Table S9 and Figure S10). SuSyAc WT has a higher affinity for GDP, which, like ADP, is a purine derivative, but the maximum velocity is higher with CDP. Neither SuSyAc WT nor the variant showed substrate inhibition below 25 mM CDP/GDP but the variant had a two-fold improved affinity for both CDP and GDP. This demonstrates the general impact of the QN motif on nucleotide binding although the exact mechanism remains unclear.

4.3 Coupled reaction between SuSy and a C-glycosyltransferase

Plant and cyanobacterial SuSys have been extensively used in coupled processes together with GTs to create valuable glycosides in a cost-effective way16,69–71,84,285–289,376–383. In such a one-pot reaction, SuSy provides and regenerates the expensive UDP-Glc in situ, which is subsequently used as donor substrate by a GT that attaches the sugar moiety to an acceptor, thereby altering its pharmacokinetic properties such as solubility, stability or bioactivity394. Using this strategy, laborious isolation of nucleotide sugars can be bypassed and only catalytic amounts of the expensive nucleotide have to be supplied. Furthermore, conversion efficiencies are increased as reverse glycosylation and inhibition of the GT by high concentrations of UDP are suppressed84,288,395–397. To create an efficient and cost-effective SuSy/GT coupled process, it is thus of utmost importance that only a low amount of UDP has to be supplied, requiring a SuSy enzyme with high affinity for UDP combined with a high Vmax. To demonstrate this, SuSyAc WT and double mutant LMDKVVA were evaluated in a cascade reaction together with a C-glycosyltransferase from Oryza sativa (OsCGT) (Figure 29).

Figure 29 Schematic representation of the coupled reaction between OsCGT and SuSy for the production of the C-glucoside nothofagin.


Reactions were performed by the research group of professor Bernd Nidetsky (Graz University of Technology/ACIB) as they had successfully used OsCGT before to glycosylate the dihydrochalcone acceptor phloretin resulting in the production of nothofagin16,398. The C-glucoside nothofagin, which is naturally found in rooibos (tea), displays interesting properties such as anti-oxidant (and anti-inflammatory) activity, making it an attractive nutraceutical399.

To overcome the poor solubility of the acceptor phloretin, β-cyclodextrin was used to dissolve this compound398. A ten-fold excess of phloretin (5 mM) over UDP (0.5 mM) was applied to reduce the costs of the nucleotide and to avoid potential GT inhibition by UDP. Furthermore, to prevent the overall conversion from being limited by the GT module, the concentration of OsCGT (30 µg/mL) was three times higher than that of the respective SuSy (10 µg/mL). Results of the coupled reactions can be found in Figure 30.

Figure 30 The production of the C-glucoside nothofagin as a function of time as a result of the coupled reaction between OsCGT and SuSyAc (diamonds, black), double mutant LMDKVVA (circles, dark grey) or the plant SuSyGm (squares, light grey) at 50°C pH 7.5 starting from 0.5 mM UDP, 5 mM phloretin and 1 M sucrose.

Irrespective of using SuSyAc WT or the LMDKVVA variant, a linear increase of nothofagin concentration over time was observed. However, by replacing the WT enzyme with the variant, we were able to increase the nothofagin production rate roughly 9-fold, from 91 to 825 µM/h. Using the variant LMDKVVA, more than 99% of initially applied phloretin was converted to nothofagin within 6.5 h. With SuSyAc WT, only around 0.8 mM (16%) nothofagin was formed within the observed time span of 9.2 h whereas the same conversion was already reached within less than 1 h with the variant. These results clearly demonstrate that the increased affinity of SuSyAc variants for UDP can be translated to improved performance in coupled reactions.
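The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes a strictly linear increase in nothofagin over the whole time course, which is a simplification near complete conversion, and uses only the rates and concentrations stated above; the function name `hours_to_reach` is hypothetical.

```python
RATE_WT_UM_PER_H = 91.0        # SuSyAc WT, from the text
RATE_VARIANT_UM_PER_H = 825.0  # SuSyAc LMDKVVA, from the text
PHLORETIN_MM = 5.0
UDP_MM = 0.5


def hours_to_reach(product_mM, rate_uM_per_h):
    """Time needed to form product_mM at a constant production rate."""
    return product_mM * 1000.0 / rate_uM_per_h


fold_improvement = RATE_VARIANT_UM_PER_H / RATE_WT_UM_PER_H
t_variant_0_8_mM = hours_to_reach(0.8, RATE_VARIANT_UM_PER_H)
t_wt_0_8_mM = hours_to_reach(0.8, RATE_WT_UM_PER_H)
t_variant_full = hours_to_reach(PHLORETIN_MM, RATE_VARIANT_UM_PER_H)

# Each UDP molecule has to be recycled at least this many times
# for all phloretin to be glucosylated
min_udp_turnovers = PHLORETIN_MM / UDP_MM
```

Under these assumptions the variant needs just under 1 h to reach 0.8 mM and about 6 h for full conversion, in line with the observed values, and every UDP molecule must pass through the cycle at least ten times.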

To be able to compare the performance of this improved mutant with that of existing plant enzymes, the coupled reaction was also conducted with SuSy from Glycine max (SuSyGm) under the same conditions (Figure 30)400. SuSyGm was not expected to be very active in this reaction because of its low stability at higher temperatures86. Nonetheless, the production rate of nothofagin in the OsCGT-SuSyGm cascade appeared to be more than three times faster compared to SuSyAc LMDKVVA. The fact that SuSyGm was still highly active at the reaction temperature of 50°C could be explained by the concentration of the stabilizer sucrose and the pH, which were both higher in this reaction compared to previous experiments where the temperature optimum of SuSyGm was determined86,400.

5 Conclusions

In this study, several residues in the QN motif of the bacterial ADP-preferring SuSyAc were exchanged for those occurring in the plant enzyme SuSyAt1. Eight variants were constructed, containing either highly conserved residues of plants, non-conserved residues or a combination of both, and all of them were expressed in similar amounts as SuSyAc. The mutants all had a significantly higher affinity for UDP compared to the WT at high concentrations of sucrose and half of them also had a slightly higher maximal velocity. In addition, the variants also had altered kinetic parameters for ADP, CDP and GDP, demonstrating the general impact of the motif on nucleotide binding. Moreover, these results indicate a possible link between the sequence of the QN motif and the different functions of multiple isoforms in plants. The best variant, SuSyAc LMDKVVA, showed a Km for UDP of 0.13 mM, which is a 55-fold improvement compared to the WT, although substrate inhibition was observed at higher concentrations (Ki ≈ 36 mM). The increased affinity of this mutant for UDP could be translated to an improved performance in coupled reactions with OsCGT as demonstrated by the 9-fold higher production rate of nothofagin compared to the WT. Despite these successful results, the wild-type plant SuSy from soybean (SuSyGm) still performed better than LMDKVVA in the same coupled reaction.


6 Supplementary materials

Figure S7 (Upper part) Schematic representation of the domain organization in SuSyAt1 and position of the QN motif. (Lower part) Visualization of the QN motif in the crystal structure of SuSyAt1 (3S27). CTD: cellular targeting domain, EPBD: ENOD40 peptide-binding domain, GT-BN and GT-BC: catalytic glycosyltransferase N- and C-terminal domains, respectively48.

Figure S8 SDS-PAGE analysis of four QN variants. Lane 1: CCE of SuSyAc WT. Lanes 2-5 and lanes 6-9 contain CCE and purified enzymes, respectively. Lanes 2 and 6: LMDKVVA, lanes 3 and 7: QLDKTVN, lanes 4 and 8: LMDKTVA, lanes 5 and 9: LLDRTVA.


Organism               Accession number (UniProtKB)   Isoform (gene name)   QN motif

Arabidopsis thaliana   P49040                         sus1                  Q M D R V R N
Arabidopsis thaliana   Q00917                         sus2                  Q M N R A R N
Arabidopsis thaliana   Q9M111                         sus3                  Q T N R A R N
Arabidopsis thaliana   Q9LXL5                         sus4                  Q M N R V R N
Arabidopsis thaliana   Q9FX32                         sus5                  Q T D R Y R N
Arabidopsis thaliana   F4K5W8                         sus6                  Q T D R T R N

Zea mays               P49036                         sus1                  Q M N R V R N
Zea mays               P04712                         sus-sh1               Q M N R V R N
Zea mays               A0A096T792                     sus2                  Q T N R A R N

Pisum sativum          O81610                         ness                  Q M N R V R N
Pisum sativum          O24301                         sus2                  Q T N R A R N
Pisum sativum          Q9AVR8                         sus3                  Q M D R I R N

Figure S9 QN motif of different SuSy isoforms found in Arabidopsis thaliana, Zea mays and Pisum sativum. Conserved residues within the QN motif are highlighted in bold.

Figure S10 Kinetics of SuSyAc LMDKVVA in 100 mM MOPS pH 7.0 at 60°C with 1 M Suc.


Table S5 Substrate affinity Km (mM) of characterized SuSys.

Fru Fru Suc Suc UDP ADP UDP- ADP- Condition (UDP- (ADP- (UDP) (ADP) Glc Glc Glc) Glc)

Plant SuSys

Arabidopsis thaliana SUS1 37°C b 120 53 185 0.39 0.17 (SuSyAt1) pH 7 b 120 37°C Arabidopsis thaliana SUS3 48 145 0.25 0.15 pH 7 a 68 30°C f f f Glycine max 31 n.d. 0.005 0.13 3.7 0.012 1.6 pH 7.5 a 119 37°C Hordeum Vulgare 30 210 0.22 0.15 pH 7 a,c 389 37°C Ipomoaea batatas 31 125 0.13 0.44 pH 6 a,c 401 37°C, Oryza Sativa 30 105 0.11 n.d. pH 6 a 390 37°C c f Oryza Sativa 290 400 0.8 3.3 6.9 40 5.3 3.8 pH 6d/8e a 290 25°C Phaseolus aureus 17 29 0.19 0.19 pH 7.5 a 391 25°C Prunus persica d e 62.5 n.d. 0.08 0.05 4.8 0.033 0.67 pH 7 /8.5 a 292 30°C Pyrus pyrifolia (SUS1) 37.6 n.d. 0.07 0.11 18.6 0.05 0.17 pH 7d/8e a 292 30°C Pyrus pyrifolia (SUS2) d e 36 n.d. 0.41 1.94 9.77 0.03 0.78 pH 7 /8 Solanum tuberosum (SUS4)b 37°C 119 35 220 0.22 0.3 pH 7 a 328 25°C Vicia faba 169 n.d. 0.21 0.21 n.d. n.d. n.d. n.d. pH 7 a 291 30°C f f Zea mays 40 0.14 1.25 pH 6.5 Bacterial SuSys

b 43 60°C Acidithiobacillus caldus 94 n.d. 7.8 0.17 pH 7 30°C Anabaena sp. strain PCC 7119 a 106 pH 303 305 1.25 1.15 52 4.2 2.7 1.3 SS2 d e 6.5 /7.5 30°C Anabaena sp. strain PCC 7119 b 293 pH 270 220 0.8 1.0 58 4.4 2.2 0.9 SS2 d e 6.5 /7.5 b 43 60°C Nitrosomonas europaea 321 40 0.69 0.44 pH 7 b 95 37°C Nitrosomonas europaea 120 5.6 0.89 0.04 pH 8 Thermosynechococcus 37°C b,c 325 1.3 0.18 12 5.6 1.7 0.03 elongatus pH 7

a: Purified from natural source; b: Recombinantly produced in E. coli; c: Non-Michaelis-Menten kinetics: Hill equation (S0.5); d: Cleavage direction; e: Synthesis direction; f: Substrate inhibition


Table S6 Vmax (U/mg) of characterized SuSys.

Fru Fru Suc Suc UDP ADP UDP- ADP- Condition (UDP- (ADP- (UDP) (ADP) Glc Glc Glc) Glc) Plant SuSys Arabidopsis thaliana SUS1 37°C b 120 585 290 585 290 (SuSyAt1) pH 7 b 120 37°C Arabidopsis thaliana SUS3 950 470 950 470 pH 7 a 68 30°C Glycine max 13.3 13.3 1.38 14.3 14.3 1.95 pH 7.5 a 119 37°C Hordeum Vulgare 290 190 290 190 pH 7 a 390 37°C Oryza Sativa d e 7.1 15.4 6.3 13.3 pH 6 /8 a 292 30°C Pyrus pyrifolia (SUS1) 3.56 3.59 2.85 3.42 3 1.95 pH 7 a 292 30°C Pyrus pyrifolia (SUS2) 2.4 2.28 0.82 2.86 2.5 1.58 pH 7 Solanum tuberosum (SUS4)b 37°C 119 80 65 80 65 pH7 a 328 25°C Vicia faba 2.03 0.496 pH 7 Bacterial SuSys

b 43 60°C Acidithiobacillus caldus 53.5 63.3 pH 7 b 43 60°C Denitrovibrio acetiphilus 6.3 125 pH 7 b 43 60°C Melioribacter roseus 3 3 pH 7 b 43 60°C Nitrosomonas europaea 96.7 42 103.8 31.9 pH 7 b 95 37°C Nitrosomonas europaea 2.8 4.8 4.3 3.7 pH 8 Thermosynechococcus 37°C b,c 325 2.2 1.28 2.9 1.1 2.9 1.1 elongatus pH 7

a: Purified from natural source; b: Recombinantly produced in E. coli; c: Non-Michaelis-Menten kinetics: Hill equation (S0.5); d: Cleavage direction; e: Synthesis direction


Table S7 Catalytic efficiency kcat/Km (mM-1s-1) of characterized SuSys.

Fru Fru Suc Suc UDP ADP UDP- ADP- Condition (UDP- (ADP- (UDP) (ADP) Glc Glc Glc) Glc) Plant SuSys Arabidopsis thaliana SUS1 37°C b 120 17 2.4 2325 2647 (SuSyAt1) pH 7 b 120 37°C Arabidopsis thaliana SUS3 31 5 5880 4860 pH 7 a 68 30°C Glycine max 0.64 4000 16 5.8 1792 1.8 pH 7.5 a 119 37°C Hordeum Vulgare 15 1.4 1977 1900 pH 7 a 390 37°C Oryza Sativa d e 1.8 0.6 2.0 5.8 pH 6 /8 a 292 30°C Pyrus pyrifolia (SUS1) 0.13 71 35 0.25 84 16 pH 7 a 292 30°C Pyrus pyrifolia (SUS2) 0.09 7.8 0.6 0.41 117 2.8 pH 7 Solanum tuberosum (SUS4)b 37°C 119 3.5 0.45 558 332 pH7 a 328 25°C Vicia faba 15 3.7 pH 7 Bacterial SuSys

b 43 60°C Acidithiobacillus caldus 11 571 pH 7 b 43 60°C Nitrosomonas europaea 0.30 1.0 150 73 pH 7 b 95 37°C Nitrosomonas europaea 0.04 1.3 7.5 143 pH 8 Thermosynechococcus 37°C b,c 325 2.7 11 0.24 0.20 2.7 57 elongatus pH 7

a: Purified from natural source; b: Recombinantly produced in E. coli; c: Non-Michaelis-Menten kinetics: Hill equation (S0.5); d: Cleavage direction; e: Synthesis direction


Table S8 Kinetic parameters of SuSyAc and variants with UDP and ADP in the presence of 1 M sucrose at 60°C (100 mM MOPS pH 7.0). Vmax values are expressed in U/mg and Km/Ki values in mM. If no inhibition was observed below 20 mM UDP, ‘/‘ is used. Mutations are bold and underlined.

Enzyme                [Suc]    #mutations  Km ADP       Ki ADP      Km UDP       Ki UDP      Vmax ADP    Vmax UDP

SuSyAc LLDKTVA (WT)   1 M      0           0.31 ± 0.01  18.9 ± 3.1  7.2 ± 1.5    /           78.7 ± 0.2  48.9 ± 5.1
SuSyAc LLDKTVA (WT)   200 mM   0           0.17 ± 0.05  19.1 ± 1.9  7.8 ± 1.1    /           63.3 ± 2.9  53.5 ± 5.4
SuSyAc QMDRVRN        1 M      6           0.75 ± 0.04  7.8 ± 1.7   0.37 ± 0.12  9.9 ± 1.9   116 ± 6     72.5 ± 7.1
SuSyAc QMDRVRN        200 mM   6                                    0.45 ± 0.12  17.4 ± 8.7              61.5 ± 5.6
SuSyAc LMDRVVA        1 M      3           0.27 ± 0.03  10.3 ± 2.6  0.42 ± 0.01  8.7 ± 2.8   59.9 ± 6.5  76.1 ± 4.5
SuSyAc QLDKTVN        1 M      2           0.17 ± 0.09  19.8 ± 2.2  0.79 ± 0.09  /           95.2 ± 4.9  36.0 ± 0.5
SuSyAc QLDKTVN        200 mM   2                                    20 ± 0.2     /                       51.5 ± 2.4
SuSyAc QLDRTRN        1 M      4           0.23 ± 0.05  7.8 ± 2.0   1.42 ± 0.38  /           107 ± 9     51.2 ± 2.6
SuSyAc LMDKVVA        1 M      2           0.17 ± 0.04  11.5 ± 1.7  0.13 ± 0.03  35.7 ± 3.2  77.9 ± 8.1  90.3 ± 3.0
SuSyAc LLDRTVA        1 M      1           0.20 ± 0.03  16.1 ± 4.5  0.15 ± 0.02  /           29.3 ± 1.5  44.7 ± 0.3
SuSyAc LMDKTVA        1 M      1           0.27 ± 0.06  17.1 ± 0.6  0.20 ± 0.01  12.1 ± 0.9  50.4 ± 7.5  96.8 ± 0.7
SuSyAc LLDKVVA        1 M      1           0.18 ± 0.07  25.2 ± 2.1  0.69 ± 0.02  /           51.5 ± 4.8  33.5 ± 0.1
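The UDP kinetics of the variants with a finite Ki in Table S8 can be illustrated with the classical substrate-inhibition rate law, v = Vmax·[S]/(Km + [S] + [S]²/Ki). The sketch below is illustrative only: it is an assumption that this model was the one fitted to the data, the function names are hypothetical, and only the parameter values (UDP, 1 M sucrose) are taken from Table S8.

```python
import math

def rate_substrate_inhibition(s_mM, vmax, km_mM, ki_mM=None):
    """Rate (same unit as vmax) at substrate concentration s_mM.

    ki_mM=None means no inhibition term (plain Michaelis-Menten).
    """
    denominator = km_mM + s_mM
    if ki_mM is not None:
        denominator += s_mM ** 2 / ki_mM
    return vmax * s_mM / denominator

def optimal_substrate_mM(km_mM, ki_mM):
    """Concentration with the highest rate under this model: sqrt(Km * Ki)."""
    return math.sqrt(km_mM * ki_mM)

# Parameter values for UDP at 1 M sucrose (Table S8); Vmax in U/mg.
WT = {"vmax": 48.9, "km_mM": 7.2}                      # no inhibition below 20 mM
LMDKVVA = {"vmax": 90.3, "km_mM": 0.13, "ki_mM": 35.7}

fold_improvement = WT["km_mM"] / LMDKVVA["km_mM"]      # ratio of Km values
udp_optimum = optimal_substrate_mM(LMDKVVA["km_mM"], LMDKVVA["ki_mM"])

print(f"Km improvement: {fold_improvement:.0f}-fold")
print(f"Model optimum for LMDKVVA: {udp_optimum:.2f} mM UDP")
```

Under these assumptions, the model places the highest rate of LMDKVVA at a few mM UDP, well below the 50 mM used with the wild type.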

Table S9 Kinetic parameters of SuSyAc WT and double mutant LMDKVVA with CDP and GDP in the presence of 1 M sucrose at 60°C (100 mM MOPS pH 7.0).

Nucleotide   Parameter      SuSyAc         LMDKVVA

CDP          Km (mM)        4.53 ± 0.36    1.94 ± 0.26
CDP          Vmax (U/mg)    14.9 ± 0.6     13.6 ± 1.1

GDP          Km (mM)        1.23 ± 0.18    0.5 ± 0.1
GDP          Vmax (U/mg)    9.42 ± 0.11    13.9 ± 0.3


CHAPTER 4: Driving the donor specificity of SuSy towards GalFru, for the efficient one-step production of UDP-galactose


1 Abstract

Glycosylation involves the addition of a sugar moiety to an acceptor compound and can be used as an effective mechanism to improve pharmacokinetic properties such as solubility, stability or bioactivity394. Furthermore, galactosylation of drugs can be used to target them specifically to the liver402. One of the major hurdles of large-scale enzymatic glycosylation processes is the high price of nucleotide-activated sugars, which are required by glycosyltransferases as glycosyl donor. In this respect, Sucrose Synthase (SuSy) is an interesting biocatalyst because of its ability to efficiently produce NDP-glucose sugars starting from the abundant and cheap substrate sucrose (α-D-glucopyranosyl-(1→2)-β-D-fructofuranoside). However, the production of nucleotide-activated sugars other than NDP-Glc currently still relies on the use of additional enzymes (e.g. an epimerase). In this chapter, one-step production of UDP-galactose, using SuSy and GalFru (α-D-galactopyranosyl-(1→2)-β-D-fructofuranoside) as non-natural glycosyl donor, was evaluated and found to be very inefficient. Both a mutagenesis-dependent and a mutagenesis-independent strategy were applied in an attempt to improve the activity of SuSy from Acidithiobacillus caldus (SuSyAc) on GalFru. The latter involved the production of imprinted cross-linked enzyme aggregates (iCLEAs), while hotspots for mutagenesis were identified based on structural considerations, sequence analysis of existing galactosyltransferases (GalTs) and correlated mutation analysis. To screen the mutants as time- and cost-efficiently as possible, a screening protocol without a purification step was successfully developed.


2 Introduction

Galactose (Gal) is an important unit of naturally occurring disaccharides (e.g. lactose), oligosaccharides (e.g. human milk oligosaccharides), (lipo)polysaccharides (e.g. agarose) and galacto-conjugates (e.g. quercetin 3-O-galactoside), which have applications in the food (e.g. prebiotics), feed, cosmetic and pharmaceutical industries403–407. Attachment of a galactose moiety to bio-active compounds (e.g. quercetin) can alter their pharmacokinetic properties such as stability, solubility or bioactivity, but can also be used as a mechanism to target these molecules specifically to the liver394,402. In nature, galactosylation is efficiently performed by galactosyltransferases. Unfortunately, these enzymes require the expensive UDP-Gal (2500 €/g) as donor substrate, which hampers their use for the cost-effective industrial production of galactosylated compounds. UDP-Gal can be produced enzymatically from cheaper substrates such as sucrose or galactose, but this requires the use of at least two enzymes (e.g. SuSy and an epimerase380 or a kinase and uridylyltransferases73,74). Furthermore, the epimerase step is rate-limiting and suffers from an unfavourable equilibrium, while the kinase pathway needs seven enzymes to be cost-efficient13,72,74. In this chapter, the one-step production of UDP-Gal, using free or cross-linked SuSy and GalFru as non-natural glycosyl donor, was evaluated (Figure 31). GalFru can be produced enzymatically from sucrose and galactose by levansucrase and was kindly provided by the University of Würzburg150.


Figure 31 (Upper part) SuSy catalyzes the reversible conversion of sucrose (Suc = GlcFru) and UDP into fructose (Fru) and UDP-Glc. In GalFru, a non-natural sucrose analogue, the glucose moiety is replaced by galactose (Gal). Glucose and galactose are C4 epimers that differ only in the spatial position of the hydroxyl (OH) group at the fourth carbon: axial in galactose and equatorial in glucose. (Lower part) One-step production of UDP-Gal starting from GalFru. SuSy*: SuSyAc WT, engineered variant or iCLEA.


3 Materials and methods

3.1 Materials

ADP was bought from Carbosynth while sucrose and UDP (disodium salt hydrate, 94330) were bought from Sigma. GalFru does not exist in nature, but it can be made starting from sucrose and Gal using levansucrase (EC 2.4.1.162)150. This process has been optimized by researchers of the University of Würzburg and the GalFru substrate was kindly provided by them.

3.2 PCR protocols

Amplification of linear fragments with PrimeSTAR (PS) GXL DNA polymerase

The PCR mix contained 1x PrimeSTAR GXL reaction buffer, 0.025 U/µL PrimeSTAR GXL DNA polymerase (Westburg), 0.2 mM dNTP mix, ~5 ng template plasmid DNA and 0.25 µM forward and reverse primer in a total volume of 50 µL. The amplification program started with an initial denaturation for 10 s at 98°C, followed by 30 cycles of denaturation for 10 s at 98°C, annealing for 15 s at 55°C (or at 60°C if the Tm of the primer was > 55°C) and extension for 60 s/kb at 68°C. Afterwards, template DNA was digested by adding 1 µL of DpnI (Westburg) to the PCR mix and incubating this for at least 1 hour at 37°C. Fragments were then purified using a PCR purification kit and the concentration was measured using a Nanodrop ND-1000 (Thermo Scientific).
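As a minimal sketch, the cycling parameters above can be turned into a small helper that derives the extension time and the net program time for a given fragment. The function names and the 2.5 kb fragment length are hypothetical, and ramp times between steps are ignored.

```python
def extension_seconds(fragment_bp, seconds_per_kb=60):
    """Extension time for one cycle at the stated rate of 60 s/kb."""
    return fragment_bp / 1000 * seconds_per_kb

def annealing_temp_c(primer_tm_c):
    """Annealing rule as read from the protocol: 60°C if Tm > 55°C, else 55°C."""
    return 60 if primer_tm_c > 55 else 55

def cycling_seconds(fragment_bp, cycles=30, initial_denat_s=10,
                    denat_s=10, anneal_s=15, seconds_per_kb=60):
    """Net hold time of the program (ramping between steps not included)."""
    per_cycle = denat_s + anneal_s + extension_seconds(fragment_bp, seconds_per_kb)
    return initial_denat_s + cycles * per_cycle

# Hypothetical 2.5 kb fragment amplified with primers of Tm 58°C.
print(extension_seconds(2500), "s extension per cycle")
print(annealing_temp_c(58), "°C annealing")
print(cycling_seconds(2500) / 60, "min net program time")
```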

Circular polymerase extension cloning (CPEC) with Q5 polymerase

CPEC is a one-step PCR protocol to assemble multiple linear fragments into one plasmid. Only DNA polymerases without strand displacement activity should be used to avoid cloning artifacts408. The reaction mix consisted of 100 ng of linearized vector backbone (largest fragment), linear mutated insert fragment (1:3 molar vector to insert ratio for libraries and 1:1 ratio for site-directed mutants), 1x Q5 buffer, 0.4 mM dNTP mix, 3% DMSO and 0.04 U/µL Q5 high-fidelity polymerase in a total volume of 25 µL. The amplification program started with an initial denaturation (30 s at 98°C), followed by 15 cycles of denaturation for 10 s at 98°C, annealing for 30 s at 50°C and extension for at least 10 s/kb at 72°C. The program ended with a final extension step of 2 min at 72°C. The reaction mix was purified using a PCR purification kit and eluted in 15 µL mQ. This DNA solution was then used to transform electrocompetent E. coli BL21 (DE3) cells.
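The molar vector to insert ratio can be converted into an insert mass with the usual length-proportional relation for double-stranded DNA (mass per mole scales with fragment length). The sketch below assumes this relation; the function name and the fragment lengths (6000 bp vector, 500 bp insert) are hypothetical examples, not values from this work.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector=3.0):
    """Mass of insert (ng) giving the requested molar insert:vector ratio.

    Assumes both fragments are double-stranded DNA, so that the molar amount
    is proportional to mass divided by length.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector

# 100 ng vector as in the protocol; fragment lengths are illustrative only.
library_insert = insert_mass_ng(100, vector_bp=6000, insert_bp=500, insert_per_vector=3.0)
sdm_insert = insert_mass_ng(100, vector_bp=6000, insert_bp=500, insert_per_vector=1.0)

# Polymerase units in the 25 µL reaction at 0.04 U/µL.
q5_units = 0.04 * 25

print(f"library (1:3): {library_insert:.1f} ng insert")
print(f"site-directed (1:1): {sdm_insert:.1f} ng insert")
print(f"Q5 polymerase: {q5_units:.1f} U")
```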

The vector backbones used in the CPEC reaction were obtained by amplification with Q5 polymerase and unmutated primers. For site-directed mutagenesis and some libraries, the insert was amplified using Q5 polymerase, a mutated forward primer and a regular reverse primer. For the construction of other libraries, the quality did not suffice and the insert was created in two steps to avoid preferential binding of some primers to the template. First, the insert was amplified using PS polymerase with two primers without mutation. Next, DpnI was added and the reaction mix was incubated for at least 1 hour at 37°C to remove original methylated plasmid DNA. After PCR purification, this fragment was used in a second PCR amplification round with PS, albeit with a mutated forward primer and a normal reverse primer, resulting in a mutated linear fragment which was purified using a PCR purification kit. This purified fragment was then used as insert for the CPEC reaction. An overview of this protocol can be found in Figure 32.

Figure 32 Construction of site-directed or site-saturation mutants with CPEC. Arrows pointing towards right represent forward (Fw) primers, arrows pointing towards left represent reverse (Rv) primers. A red star represents a site of mutation.

Whole plasmid PCR (WPP) with Pfu polymerase

In the whole plasmid PCR protocol, a mutated unpurified linear DNA fragment is used as megaprimer to amplify the remaining part of the plasmid of interest (Figure 33). The PCR mix contained 1x PfuUltra HF reaction buffer, 2.5 U PfuUltra DNA polymerase AD (Agilent), 0.2 mM dNTP mix, ~10-30 ng template plasmid DNA, 2% DMSO and 2 µL megaprimer fragment in a total volume of 50 µL. The amplification program started with an initial denaturation for 30 s at 95°C, followed by 30 cycles of denaturation for 30 s at 95°C and annealing/extension for at least 60 s/kb at 72°C. The program ended with a final extension step of 6-8 min at 72°C. Afterwards, template DNA was digested by adding 1 µL of DpnI (Westburg) to the PCR mix and incubating this for at least 1 hour at 37°C. The reaction mix was purified using a PCR purification kit and eluted in 15 µL mQ. This DNA solution was then used to transform electrocompetent E. coli BL21 (DE3) cells.

For site-directed mutagenesis and some libraries, the megaprimer was amplified using Q5 or PS polymerase, a mutated forward primer and a non-mutated reverse primer (Figure 33). This unpurified mix was then used for WPP. It is important to note that the use of unpurified megaprimer, a sufficiently long extension time and high template concentrations (around 30 ng/µL) are crucial parameters for successful generation of mutants.

Figure 33 Construction of site-directed or site-saturation mutants with whole plasmid PCR. Arrows pointing towards right represent forward (Fw) primers, arrows pointing towards left represent reverse (Rv) primers. A red star represents a site of mutation.

3.3 Construction of site-directed mutants and enzyme libraries

Site-directed mutants were made using the Sanchis protocol or whole-plasmid protocol (Table S10 and Table S11). It has to be noted that the Sanchis protocol tends to be less efficient compared to WPP (or CPEC).

Site-saturation involves the substitution of predetermined residues by each of the 20 naturally occurring amino acids using degenerate primers and PCR-based methods, resulting in so-called enzyme libraries. However, despite numerous advances in molecular cloning work, the creation of high-quality libraries remains very challenging and depends on different factors such as target DNA sequence, G+C content, randomization scheme, primer design/quality/length, PCR parameters and type of polymerase409,410. Consequently, the construction of the challenging multiple-site libraries (CorLib: H273H/E/G/P + Y402Y/D/G + E472E/H/Y/A/S + F665F/G/S/C/L; Lib4: F665X + G666X, Lib5: S468X + F665X and Lib6: P513X + F665X) was outsourced to C-lecta, a German company and partner of the European SuSy project specialized in this field (Table 10). For Lib 4-6, they used the ‘22c trick’ or variants thereof to reduce the size of the library and the resulting screening effort (Chapter 1)186.
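The effect of the randomization scheme on screening effort can be sketched with a standard oversampling estimate: if all codon combinations are equally likely, a given variant is present among T picked colonies with probability 1 - (1 - 1/V)^T, where V is the number of codon combinations. This estimate is not taken from this work; the equal-probability assumption, the 95% target and the function name are illustrative.

```python
import math

def picks_for_coverage(n_variants, coverage=0.95):
    """Colonies to pick so that any given variant is present with probability `coverage`.

    Assumes all codon combinations are equally represented in the library.
    """
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / n_variants))

# Number of codon combinations per library design.
library_sizes = {
    "single site, NNK (32 codons)": 32,
    "double site, NNK": 32 ** 2,
    "double site, 22 codons per site (22c trick)": 22 ** 2,
    "double site, 20 codons per site": 20 ** 2,
    "CorLib (4 x 3 x 5 x 5 residue combinations)": 4 * 3 * 5 * 5,
}

for name, size in library_sizes.items():
    print(f"{name}: {size} combinations, ~{picks_for_coverage(size)} colonies")
```

Under these assumptions a single NNK site is covered by roughly one 96-well plate, whereas double-site libraries need on the order of a thousand or more colonies.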

Two of the libraries were made using SuSyAc WT (UniProt ID A0A059ZV61) as template while the other two libraries were made starting from SuSyAc LMDKVVA (SuSyAc L637M + T640V). These enzymes are located on a pCXP34 constitutive vector, codon optimized for expression in E. coli and provided with a C-terminal His-tag. To facilitate cloning with their standard protocols they used a modified pCXP34 backbone with two nucleotide substitutions.


Table 10 Overview of libraries created by C-lecta. NDT codes for 12 amino acids (R, N, D, C, G, H, I, L, F, S, Y, V), VMA codes for six amino acids (Q, E, A, K, P, T), VHG codes for nine amino acids (Q, E, A, K, P, T, L, M, V), ATG codes for M and TGG codes for W.

Mutant/Library                                                      Template          Codon set

F665X + G666X (Lib 4)                                               SuSyAc            NDT/VHG/TGG

F665X + S468X (Lib 5)                                               SuSyAc LMDKVVA    NDT/VMA/ATG/TGG

F665X + P513X (Lib 6)                                               SuSyAc LMDKVVA    NDT/VMA/ATG/TGG

H273H/E/G/P + Y402Y/D/G + E472E/H/Y/A/S + F665F/G/S/C/L (CorLib)    SuSyAc
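The amino acid sets listed in the caption of Table 10 can be checked by expanding each degenerate codon with the IUPAC nucleotide codes and translating with the standard genetic code. The sketch below does this in plain Python; the helper names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes used by the codon sets in Table 10 (plus K for NNK).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
         "D": "AGT", "V": "ACG", "H": "ACT", "M": "AC", "K": "GT"}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def expand(degenerate_codon):
    """All concrete codons encoded by a degenerate codon such as 'NDT'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]

def amino_acids(degenerate_codon):
    """Set of one-letter amino acids ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}

for codon in ("NDT", "VMA", "VHG", "ATG", "TGG", "NNK"):
    encoded = "".join(sorted(amino_acids(codon)))
    print(f"{codon}: {len(expand(codon))} codons -> {encoded}")

# The 22c trick (NDT + VHG + TGG): 22 codons covering all 20 amino acids, no stop.
trick_22c = amino_acids("NDT") | amino_acids("VHG") | amino_acids("TGG")
```

Both NDT/VHG/TGG (22 codons) and NDT/VMA/ATG/TGG (20 codons) cover all 20 amino acids without a stop codon, whereas NNK uses 32 codons and still includes the TAG stop codon.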

All single-site libraries (Q482X, P513X, S468X and L667X) on the other hand were made in-house. Plasmids of two different enzymes were used as template to introduce mutations: SuSyAc LMDKVVA and SuSyAc QN6* EGFP.

To introduce diversity at the targeted positions, the commonly used NNK degenerate codon (N: any base, K: T or G) was used in the primers. Although some of the libraries could be created with sufficient quality immediately, others required several months and multiple attempts with different protocols (WPP and CPEC), different polymerases and the optimization of parameters such as template concentration or extension time. Three problems were encountered: no colonies were present after transformation, the library was not complete (some nucleotides were missing at the site of mutation) or multiple peaks were observed in the sequencing chromatogram. Consequently, some general points of attention for obtaining good-quality libraries were identified. First, it is important to plate out a sufficient amount of culture after transformation (~200 µL). Second, the addition of DMSO in the CPEC protocol and the use of unpurified megaprimer, a sufficient extension time and high template (plasmid) concentrations in the whole plasmid PCR protocol are crucial for success. Changing from CPEC to WPP, using alternative polymerases (PrimeSTAR GXL instead of Q5) and/or a two-step protocol to create mutated inserts for CPEC were successful strategies for some libraries. Changing the vector to insert ratio in the CPEC protocol (3:1 vs 1:1) did not help. Furthermore, for some libraries created by the CPEC protocol, several double peaks were observed in the sequencing chromatograms. These were caused by deletion mutants, missing a part of their sequence after the site of mutation. Perhaps the insert formed secondary structures, leading to a wrong overlap with the vector fragment. Raising the annealing temperature in the CPEC protocol from 50°C to 55°C or 60°C did not solve this issue, but changing to WPP did. An overview of the final protocols and primers used to construct each library is presented in Table S10 and Table S11.

The quality of the single-site libraries was checked by nucleotide sequencing and results of the final libraries can be found in Figure 34. Libraries P513X, S468X and Q482X were sequenced using forward primer 30 and library L667X was sequenced using reverse primer 40. Because NNK primers were used, four sequencing peaks should be present at the first and second position (all libraries but L667X) or the second and third position (L667X), corresponding to nucleotides A, C, T and G (N). At the last (all libraries but L667X) or first position (L667X), two peaks should be present, corresponding to T and G (K) or, for L667X, A and C (M). Although all necessary peaks were present for all the libraries, some of the nucleotides were less represented at certain positions (smaller peaks). However, given the difficulty of constructing libraries, they were still considered to be of sufficient quality for further work.
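The expected peak pattern follows directly from the reverse complement of the degenerate codon: a read from a reverse primer shows NNK as MNN. A minimal sketch (helper names are hypothetical):

```python
# Complements of the bases and of the IUPAC codes that occur in NNK/MNN.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N", "K": "M", "M": "K"}
ENCODED_BASES = {"N": "ACGT", "K": "GT", "M": "AC"}

def reverse_complement(seq):
    """Reverse complement of a (degenerate) DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def expected_peaks(degenerate_codon):
    """Bases expected as chromatogram peaks at each of the three codon positions."""
    return [sorted(ENCODED_BASES.get(b, b)) for b in degenerate_codon]

forward_read = expected_peaks("NNK")                      # P513X, S468X, Q482X
reverse_read = expected_peaks(reverse_complement("NNK"))  # L667X

print("forward read:", forward_read)
print("reverse read:", reverse_complement("NNK"), reverse_read)
```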

Figure 34 Nucleotide sequencing results for single-site saturation libraries based on NNK degenerate primers. N: A, T, C or G; K: T or G; red: T; blue: C; black: G; green: A.

3.4 Transformation

For the construction of libraries, 2-4 µL of the DNA was transformed into 20-40 µL electrocompetent E. coli BL21 (DE3) cells in a sterile electroporation cuvette (Westburg, 2 mm), while 2 µL of DNA in 20 µL cells sufficed for site-directed mutagenesis. The applied electric pulse had a capacitance of 25 µF, a resistance of 200 Ω and a field strength of 2.0 kV, ensuring a time constant of around 4.7 ms. Subsequently, 0.5 mL (if the Sanchis protocol was used) or 1 mL (other protocols) of LB medium was added and the cells were grown for 1 hour at 37°C and 200 rpm. For site-directed mutants, 200 µL was plated out on solid LB Petri dishes supplemented with 100 µg/mL ampicillin and grown overnight at 37°C. Colonies were then inoculated in 5 mL LB with 100 µg/mL ampicillin and grown overnight at 37°C and 200 rpm. 1 mL of the overnight culture was transferred to a cryovial containing 1 mL glycerol (70%) and stored at -80°C, and the remaining 3-4 mL was used to extract plasmid, which was sent for sequencing. For libraries, 200 µL was plated out on solid LB and the remaining 800 µL was added to 5 mL LB supplemented with 100 µg/mL ampicillin and grown overnight at 37°C and 200 rpm. Next, 1 mL of the broth was added to 1 mL glycerol (70%) and the remaining 3-4 mL was used to extract plasmid, which was sent for sequencing.
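The pulse settings can be related to the reported time constant with the ideal RC relation τ = R·C. The sketch below assumes that the stated 2.0 kV is the voltage applied across the 2 mm cuvette gap; the function names are hypothetical.

```python
def time_constant_ms(resistance_ohm, capacitance_uF):
    """Ideal RC time constant in milliseconds."""
    return resistance_ohm * capacitance_uF * 1e-6 * 1e3

def field_strength_kv_per_cm(voltage_kv, gap_mm):
    """Nominal field strength across the cuvette gap in kV/cm."""
    return voltage_kv / (gap_mm / 10)

tau = time_constant_ms(resistance_ohm=200, capacitance_uF=25)
field = field_strength_kv_per_cm(voltage_kv=2.0, gap_mm=2.0)

print(f"ideal time constant: {tau:.1f} ms")
print(f"nominal field strength: {field:.0f} kV/cm")
```

The ideal value of 5 ms is close to the time constant of around 4.7 ms reported above.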

3.5 Enzyme production and purification

Enzyme expression and purification was performed as described in Chapter 2 for SuSyAc. Lysis buffer consisted of PBS buffer (50 mM NaH2PO4/Na2HPO4 and 500 mM NaCl pH 7.4) supplemented with 100 µM PMSF and 1 mg/mL lysozyme. Equilibration, wash and elution buffer were composed of 10 mM, 80 mM and 250 mM imidazole in PBS, respectively.


3.6 Enzyme assays

3.6.1 SuSy activity in breakdown direction (BCA assay)

The BCA assay was performed as described in Chapter 2. Enzyme concentrations of purified enzymes ranged from 0.5 – 9 mg/L. After the addition of enzyme to preheated substrate mix, at least six samples were taken during 10-30 minutes.

3.6.2 SuSy activity in synthesis direction (PK/LDH assay)

The activity of SuSy in the synthesis direction was determined by the pyruvate kinase (PK)/lactate dehydrogenase (LDH) assay411. In this assay, the production of nucleoside diphosphate (NDP) is continuously coupled with the production of pyruvate and further reduction to lactate. During this last step, NADH is oxidized to NAD+ which can be followed spectrophotometrically as a decrease in OD at 340 nm. Unless otherwise stated, the standard reaction mixture for the PK/LDH assay contained 50 mM MOPS pH 7.0, 5 mM MgCl2, 0.3 mM phosphoenolpyruvate (PEP), 0.3 mM NADH, 10 mM UDP-Glc, 200 mM Fru, 2 U pyruvate kinase (PK), 2 U lactate dehydrogenase (LDH), 0.2 mg/mL BSA, and enzyme at an appropriate dilution, in a final volume of 200 μl. Alternatively, UDP-glucose was replaced by 10 mM UDP-Gal. Reactions were incubated at 37 °C in a 96-well MTP and oxidation of NADH was followed at 340 nm.
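The conversion of the measured decrease in absorbance into an enzyme activity can be sketched with the Beer-Lambert law, using the molar extinction coefficient of NADH at 340 nm (6220 M⁻¹ cm⁻¹) and the 1:1 stoichiometry between released NDP and oxidized NADH. The optical path length of a 200 µL well and the volume of enzyme added are not given in the text; the values below are placeholders and the function name is hypothetical.

```python
NADH_EXT_COEFF = 6220.0  # M^-1 cm^-1 at 340 nm

def activity_u_per_ml(delta_a340_per_min, path_cm,
                      assay_volume_ul=200.0, enzyme_volume_ul=10.0):
    """Activity of the added enzyme solution in U/mL (1 U = 1 µmol NDP per min).

    delta_a340_per_min: slope of A340 versus time (negative for NADH oxidation).
    path_cm: optical path length of the well (assumption, plate-dependent).
    enzyme_volume_ul: volume of enzyme solution in the well (assumption).
    """
    # Beer-Lambert: concentration change in µM per minute.
    rate_uM_per_min = abs(delta_a340_per_min) / (NADH_EXT_COEFF * path_cm) * 1e6
    # µmol per minute in the well (µM = µmol/L, volume converted to L).
    umol_per_min = rate_uM_per_min * assay_volume_ul * 1e-6
    # Normalize to the volume of enzyme solution added (mL).
    return umol_per_min / (enzyme_volume_ul * 1e-3)

# Illustrative slope of -0.0622 A340/min with a 1 cm path length.
example_activity = activity_u_per_ml(-0.0622, path_cm=1.0)
print(f"{example_activity:.2f} U/mL")
```

Dividing the result by the protein concentration of the enzyme solution (mg/mL) gives the specific activity in U/mg.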

3.7 Screening protocol for site-directed mutants

Mutants (SuSyAc A664G, A664S, A426G + A664S/T) were inoculated from a cryovial in glass culture tubes containing 5 mL LB with 100 µg/mL ampicillin and grown overnight at 37°C and 200 rpm. 1 mL of this culture was centrifuged in an Eppendorf tube for 2 min at 14000 rpm, the supernatant was discarded and the pellet was frozen at -20°C for at least 1 h. Next, the pellet was thawed again, 300 µL lysis buffer was added (1 mM EDTA, 4 mM MgCl2, 50 mM NaCl, 1 mg/mL lysozyme and 0.1 mM PMSF in 100 mM MOPS pH 7.0) and the mixture was incubated for 30 min at 37°C. After centrifugation for 2 min at 14000 rpm, the CCE was transferred to a new Eppendorf tube and used to test the activity on 200 mM sucrose and 5 mM ADP (20 min incubation at 37°C) or 400 mM GalFru and 50 mM UDP (21 hours incubation at 52°C) using the BCA assay.

3.8 Screening protocol for enzyme libraries

An overview of the complete screening protocol can be found in Figure 35.


Figure 35 Overview of screening protocol developed to evaluate the activity of enzyme (variants) on sucrose and GalFru. (1) Mixture of plasmids with different enzyme variants (e.g. created by site-saturation mutagenesis). (2) Mixture of E. coli cells harboring plasmids with different enzyme variants. (3) Culture was manually inoculated into the MTP.

Colonies were picked from a Petri dish with an automated robot (QPix2, Genetix) and inoculated into a sterile Nunc 96-well MTP, containing 175 µL LB medium and 100 µg/mL ampicillin per well, using the picking software. These ‘masterplates’ (MP), which were covered by their original lid, were incubated for 16 hours (overnight) at 37°C and 250 rpm, resulting in an OD600 of about 0.8-1. Ideally, to pick colonies efficiently, the Petri dish should contain around 100-150 colonies, the plate should not be older than 1 or 2 weeks and it should be preheated for about 30 minutes at 37°C before picking. The Petri dish can be obtained after plating out transformed E. coli cells or, alternatively, made using the cryovial stock. In case of the latter, a sufficient amount of cells is scraped from the frozen culture, inoculated in 5 mL LB (containing 100 µg/mL ampicillin) and incubated for about 6 hours at 37°C and 200 rpm. Next, the OD of the culture (10x diluted) is measured in a spectrophotometer at 600 nm. Knowing that 1 OD600 correlates with about 2.5 x 10^8 E. coli cells, it can be calculated how much the culture needs to be diluted to obtain a Petri dish with about 100-150 colonies. If multiple plates are used during picking, it is important that the height of the LB layer is identical (use e.g. an LB volume of 16 mL/Petri dish) as the colony picker is calibrated using only 1 plate.
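The dilution calculation described above can be sketched as follows. It is assumed here that the conversion factor of 2.5 x 10^8 cells refers to 1 mL of culture at OD600 = 1, that 100 µL is plated and that every plated cell forms a colony; none of these details is stated in the text, and the function name is hypothetical.

```python
def dilution_factor(od600_measured, predilution=10, cells_per_ml_per_od=2.5e8,
                    plated_volume_ml=0.1, target_colonies=125):
    """Fold dilution of the culture needed to obtain `target_colonies` per plate.

    od600_measured: OD600 reading of the prediluted sample.
    predilution: dilution applied before the OD measurement (10x in the protocol).
    """
    cells_per_ml = od600_measured * predilution * cells_per_ml_per_od
    target_cells_per_ml = target_colonies / plated_volume_ml
    return cells_per_ml / target_cells_per_ml

# Illustrative reading of 0.2 for the 10x diluted culture (culture OD600 = 2.0).
factor = dilution_factor(0.2)
print(f"dilute the culture about {factor:.0e}-fold before plating")
```

A dilution of this size is most easily reached through serial dilution steps (e.g. successive 100-fold steps).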

Each MP was replicated three times into new sterile 96-well MTPs, called ‘back-up plates’ (BP), using the replicating software. To that end, the replicating head (with 96 pins) dips into the MP (which is preheated for about 30 min at 37°C) for 3 sec and subsequently dips three times for 3 sec into the back-up plates containing 175 μl LB medium and ampicillin (100 µg/mL) per well. To preserve the MP, it was stored at -20°C after adding 125 μL of 70% sterile glycerol to each well. The BPs on the other hand were incubated for about 16 hours (overnight) at 37°C and 250 rpm resulting in an OD600 of about 0.8-1. Next, the BP plates were centrifuged at 4500 rpm for 30 min. Supernatant was discarded and cell pellets were frozen at -20°C for at least one hour until further use.

To extract the intracellular enzymes from the cell pellets, 100 μL of lysis buffer (1 mM EDTA, 4 mM MgCl2, 50 mM NaCl, 1 mg/mL lysozyme and 0.1 mM PMSF in 100 mM MOPS pH 7.0) was added to the cell pellets and the MTP was incubated at 37°C (without shaking) for 30 min. Next, the MTP was centrifuged at 4500 rpm (Rotixa 50RS centrifuge) for 30 min and 40 μL of supernatant (crude cell extract, CCE) was transferred with a multichannel to a new MTP.

To determine the activity on Suc, 3 µL of CCE from each well of a BP was added to the corresponding well of another MTP, together with sucrose and ADP (100 mM MOPS pH 7.0), resulting in a final volume of 30 µL. Final concentrations of sucrose and ADP were 200 mM and 5 mM, respectively. Next, the MTP was sealed with EASYseal and incubated for 20 minutes at 37°C (in a hot chamber). 25 µL of each well was then added to 150 µL BCA in a new MTP, which was incubated for 30 min at 70°C. The resulting OD values were measured at 540 nm with a spectrophotometer.

To determine the activity on GalFru, 3.5 µL of CCE from each well of a BP was added to the corresponding well of a PCR plate, together with UDP and GalFru (100 mM MOPS pH 7.0), resulting in a final volume of 35 µL. The final concentration of GalFru was 400 mM, while the final concentration of UDP was 50 mM in case the library was made using SuSyAc WT as template, or 1 mM if SuSyAc LMDKVVA or QN6* EGFP were used as template constructs. The PCR plate was sealed with an adhesive PCR film and centrifuged at 4500 rpm for 5 min at room temperature. Next, the plate was put for 16 hours (SuSyAc LMDKVVA and QN6* EGFP variants) or 20 hours (SuSyAc variants) in an oven at 52°C together with a tray of water to reduce evaporation. Finally, the plate was centrifuged at 4500 rpm for 5 min at room temperature and 25 µL of each well was added to 150 µL BCA in a new MTP. This MTP was incubated for 30 min at 70°C and the OD was measured at 540 nm with a spectrophotometer.

Potential hits on GalFru (high OD540 values) were sequenced, expressed in 250 mL cultures and purified using His-tag chromatography. Next, activity on 1 M sucrose and 400 mM GalFru using 5 mM ADP as acceptor was assayed at 60°C (100 mM MOPS pH 7.0) using the BCA assay.

3.9 Experiments on fructose consumption by E. coli

E. coli BL21 (DE3) containing an empty pCXP34 vector (with Amp resistance but without a gene coding for a recombinant protein) was inoculated from a frozen cryovial in a solution containing 300 µM Fru. Incubation occurred at 37°C, 45°C or 52°C in a thermoblock. Alternatively, E. coli BL21 (DE3) containing an empty pCXP34 vector was inoculated in 5 mL LB with 100 µg/mL ampicillin and grown overnight at 37°C. 1 mL of this culture was harvested by centrifugation for 2 min at 14000 rpm. The supernatant was discarded and the cell pellet was frozen for at least 1 hour at -20°C. Next, the pellet was thawed again, 300 µL lysis buffer was added (1 mM EDTA, 4 mM MgCl2, 50 mM NaCl, 1 mg/mL lysozyme and 0.1 mM PMSF in 100 mM MOPS pH 7.0) and the mixture was incubated for 30 min at 37°C. After centrifugation for 2 min at 14000 rpm, the CCE was transferred to a new Eppendorf tube. The resulting CCE was diluted 5 times in a solution containing 300 µM fructose or in a solution containing 300 µM fructose and 25 µg/mL chloramphenicol (Cat) and incubated at 37°C.

3.10 Production of iCLEAs

iCLEAs are made in three consecutive steps: imprinting of the enzyme with the target molecule, precipitation (e.g. with tert-butanol) and cross-linking (e.g. with glutaraldehyde). Initially, volume concentrations of 60% and 80% tert-butanol were evaluated with a final SuSyAc concentration of 0.8 mg/mL, without imprinting molecules. Alternatively, 2 mg/mL of SuSyAc was incubated with 250 mM of sucrose (imprinting molecules) for 1 min at room temperature or 30 min at 37°C under agitation, prior to the addition of tert-butanol (final v/v percentage of 60% tert-butanol, 100 mM sucrose and 0.8 mg/mL enzyme). This solution was incubated for 30 min at 4°C and 1000 rpm to allow precipitation to occur. To measure the precipitation efficiency, this mixture was first centrifuged for 2 min at 14000 rpm, after which the residual protein concentration in the supernatant was measured with a nanodrop. Precipitation efficiency or % precipitation is defined as: (1 - (residual concentration in supernatant after centrifugation / concentration before precipitation)) × 100. To measure the residual activity of the precipitate, the supernatant was removed, the pellet was washed two times with 100 mM MOPS pH 7.0 and afterwards preheated substrate mix (200 mM Suc, 5 mM ADP, 100 mM MOPS pH 7.0) was added to the dry pellet. Samples (25 µL) were taken every 10 s without disturbing the pellet and added to 150 µL BCA solution. After incubation at 70°C for 30 min, the absorbance was measured at 540 nm with a spectrophotometer.
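The definition of precipitation efficiency given above can be written as a small helper. This is only a sketch: the function name and the example concentrations are hypothetical and are not measurements from this work.

```python
def precipitation_efficiency(conc_before, conc_supernatant):
    """Return % precipitation from protein concentrations (same unit, e.g. mg/mL).

    Implements (1 - residual concentration in supernatant / concentration
    before precipitation) * 100, as defined in the text.
    """
    if conc_before <= 0:
        raise ValueError("concentration before precipitation must be positive")
    return (1.0 - conc_supernatant / conc_before) * 100.0


# Hypothetical example values, for illustration only:
# 0.8 mg/mL before precipitation, 0.2 mg/mL left in the supernatant.
example = precipitation_efficiency(0.8, 0.2)
print(f"{example:.1f}% precipitated")
```

Both concentrations must refer to the same volume, so any dilution introduced by the precipitant has to be accounted for before calling the function.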


To precipitate SuSyAc using ammonium sulphate (AS, (NH4)2SO4), different volumes of 500 g/L AS were added to a protein solution containing either 20 µg or 40 µg of protein in a total sample volume of 25 µL. The AS% (w/v) of these solutions ranged between 4 and 40%. This mixture was incubated for three hours at 4°C under agitation (1000 rpm) and protein precipitate was harvested by 20 min of centrifugation at 14000 rpm (room temperature). The residual concentration of enzyme in the supernatant was measured with a nanodrop.
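The volume of AS stock needed for a given final percentage follows from a simple dilution calculation. The sketch below assumes that the 25 µL total sample volume includes the added AS stock and that 500 g/L corresponds to 50% (w/v); the function name is hypothetical.

```python
def as_stock_volume_uL(target_percent_wv, total_volume_uL=25.0, stock_g_per_L=500.0):
    """Volume of ammonium sulphate stock (µL) giving the target % (w/v).

    Assumes the stock is diluted to `total_volume_uL`; 1% (w/v) = 10 g/L.
    """
    stock_percent_wv = stock_g_per_L / 10.0  # 500 g/L -> 50% (w/v)
    if not 0 <= target_percent_wv <= stock_percent_wv:
        raise ValueError("target must lie between 0 and the stock concentration")
    return total_volume_uL * target_percent_wv / stock_percent_wv


for percent in (4, 30, 40):
    print(percent, "% (w/v):", as_stock_volume_uL(percent), "µL of stock")
```

Under these assumptions the tested range of 4 to 40% (w/v) corresponds to 2 to 20 µL of stock in a 25 µL sample.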

For the production of the iCLEAs, the enzyme was incubated with 250 mM GalFru or sucrose for 30 min at 4°C and 800 rpm. Next AS was added yielding a solution of 30% (w/v) AS and 100 mM sucrose or GalFru and this mixture was incubated for three hours at 4°C and 1000 rpm. To crosslink the enzymes, 2 µL or 10 µL of 0.5 wt% GA was added to 23 µL of sample corresponding to a GA/protein ratio of 0.54 and 2.9, respectively. This mixture was incubated for 1 hour at 4°C and 1000 rpm. After centrifugation for 10 min at 14000 rpm, the pellet was washed three times with 50 µL MOPS buffer (100 mM, pH 7.0). Activity tests were performed under agitation (800 rpm) at 60°C and initiated by adding 160 µL of prewarmed substrate mix (200 mM sucrose and 5 mM ADP or 400 mM GalFru and 5 mM ADP) to the pellet. Samples were taken during 10 min and added to BCA solution. During this reaction, the pellet remained attached to the Eppendorf tube.

3.11 Visualization of interactions between enzyme and substrate

Interactions between substrate and enzyme were determined using the software program PyMOL388. Crystal structures were extracted from the online PDB database: 3S27 and 3S28 for SuSyAt1 and 4RBN for SuSyNe.

LCN* was created in YASARA starting from lichenan (LCN), a glucose analogue, which is incorporated in PDB file 3S28. More specifically, the hydroxyl group at the fourth carbon of LCN was moved from the equatorial to the axial position. LCN* can thus be considered a galactose analogue.


4 Results and discussion

4.1 Activity of SuSy on UDP-galactose and GalFru

The activity of the purified bacterial enzymes SuSyAc, SuSyDa, SuSyNe and SuSyTe on 10 mM UDP-Gal and 200 mM fructose (synthesis direction) was measured with the PK-LDH assay411 (Figure 36). In addition, kinetic parameters for GalFru in the breakdown reaction were determined for SuSyAc, the most promising enzyme regarding stability, expression yield and activity (Chapter 2).

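[Figure 36: bar chart. Y-axis: Activity (U/mg), scale 0 to 8. X-axis: SuSyAc, SuSyDa, SuSyNe, SuSyTe. Series: UDP-Glc and UDP-Gal. Relative activities with UDP-Gal indicated on the chart: 5.6%, 6.2%, 10% and 14%, respectively.]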

Figure 36 Activity with UDP-Glc or UDP-Gal for SuSyAc, SuSyDa, SuSyNe and SuSyTe at 37°C with 10 mM UDP-sugar and 200 mM fructose in 50 mM MOPS pH 7.0. Relative activities obtained with UDP-Gal compared to UDP-Glc are represented as percentages.

Similar to plant SuSys, the activity on UDP-Gal was much lower compared to UDP-Glc (≤14%) for all bacterial enzymes tested. For SuSyAc, the Km and Vmax for GalFru in the presence of 5 mM ADP were about 700 mM and 0.29 U/mg, respectively. At a concentration of 100 mM donor, the activity on GalFru was only 0.4% compared to sucrose (Table 11). If UDP (20 mM) was used as acceptor, Km and Vmax values were around 1500 mM and 0.11 U/mg, respectively (Figure S11). However, it has to be noted that these values are only indicative as substrate saturation was never achieved because of the low solubility of GalFru. Indeed, at room temperature the solubility is only 180 g/L or ~ 500 mM, which is much lower compared to sucrose (> 600 g/L). Just like SuSyAc, variants LMDKVVA and QN6* (Chapter 3) also displayed very low activity with GalFru as donor substrate (Table 11).
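As a rough consistency check, the indicative kinetic parameters can be inserted into the Michaelis-Menten equation. The sketch below assumes simple Michaelis-Menten behavior with respect to GalFru at a fixed ADP concentration of 5 mM; the function name is hypothetical and the parameters are the approximate values quoted above.

```python
def michaelis_menten(substrate_mM, vmax, km_mM):
    """Initial rate v = Vmax * [S] / (Km + [S]); v has the same unit as Vmax."""
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Indicative values for SuSyAc with GalFru and 5 mM ADP (from the text).
KM_GALFRU_MM = 700.0
VMAX_U_PER_MG = 0.29

for galfru_mM in (100.0, 400.0):
    rate = michaelis_menten(galfru_mM, VMAX_U_PER_MG, KM_GALFRU_MM)
    print(f"{galfru_mM:.0f} mM GalFru: {rate:.3f} U/mg")
```

The predicted rates (about 0.036 and 0.105 U/mg) lie within the error margins of the values measured at 100 mM and 400 mM GalFru in Table 11 (0.04 ± 0.01 and 0.12 ± 0.05 U/mg). Because saturation was never reached, this agreement should be read as a plausibility check and not as a validation of Km and Vmax.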


Table 11 Activity (U/mg) of SuSyAc and variants on sucrose and GalFru. The final concentration of ADP in the reaction mix was 5 mM while 9 mM, 1 mM or 5 mM UDP was used for SuSyAc WT, SuSyAc LMDKVVA and SuSyAc QMDRVRN (QN6*), respectively. Reactions were carried out at 60°C in 100 mM MOPS pH 7.0 and evaluated using the BCA assay. N.d.: not determined, Suc: sucrose.

Substrate (acceptor)  | SuSyAc WT   | SuSyAc LMDKVVA | SuSyAc QN6*
1 M Suc (ADP)         | 63 ± 2      | 54 ± 2         | 62 ± 1
100 mM sucrose (ADP)  | 12 ± 1      | n.d.           | n.d.
400 mM GalFru (ADP)   | 0.12 ± 0.05 | 0.25 ± 0.05    | n.d.
100 mM GalFru (ADP)   | 0.04 ± 0.01 | n.d.           | n.d.
1 M Suc (UDP)         | 33 ± 6      | 86 ± 4         | 51 ± 7
400 mM GalFru (UDP)   | 0.04 ± 0.01 | 0.06 ± 0.01    | 0.06 ± 0.01

Because of the low catalytic efficiency (kcat/Km) of SuSy towards GalFru and UDP, improvement of the kinetic parameters is clearly required to obtain a cost-effective process for the production of UDP-Gal. One way to achieve this is enzyme engineering, which involves the introduction of mutations at target positions and screening of the resulting enzyme variants for improved properties (e.g. activity). In practice, this requires the establishment of a proper screening protocol, selection of target residues, construction of site-directed mutants or (site-saturation) libraries at these positions and screening of these proteins with the newly developed protocol. All of these steps are discussed in the following sections.

4.2 Development of a screening protocol to screen enzyme libraries for improved activity on GalFru

Successful and efficient screening of enzyme libraries (mixtures of enzyme variants) requires a robust screening method without a time-consuming protein purification step. The screening procedure typically consists of three steps: the expression of mutants in microtiter plates (MTPs), the extraction of the enzymes from the expression host and an assay to detect the desired activity. The protocol was optimized for both SuSyAc and SuSyAc LMDKVVA. As a negative control, an E. coli strain containing an empty pCXP34 vector was used.

4.2.1 Enzyme expression and enzyme extraction

Expression of the enzymes was carried out in MTPs and a detailed protocol is provided in the materials and methods section. The recombinant enzymes are expressed intracellularly in E. coli and have to be liberated by disrupting the cell membranes of this organism. The process of disruption is initiated by freeze-thawing the cell pellets412 and is continued by adding lysis buffer containing lysozyme. Commercial lysis buffers exist, such as EasyLyse solution, but these are quite expensive for high-throughput screenings. In this thesis, to screen SuSyAc WT or variants, a slightly adjusted version of the Truelyse buffer, developed by De Groeve413, was used. Tris-HCl was replaced by 100 mM MOPS pH 7.0 as it is known that some SuSy enzymes are inhibited by Tris68,328, and Triton X-100 was omitted because of its limited effect on lysis efficiency and its high viscosity. The final lysis buffer used here thus consisted of 1 mM EDTA, 4 mM MgCl2, 50 mM NaCl, 1 mg/mL lysozyme and 100 µM PMSF (serine protease inhibitor) in 100 mM MOPS pH 7.0. Extraction of the enzymes from the cell pellet with the above-mentioned lysis buffer appeared to be successful, as WT SuSy activity on sucrose (200 mM) and ADP (5 mM) could be clearly detected with the BCA assay after incubation at 37°C for 20 min (OD540≈2). In this reaction, the crude cell extract was diluted ten times.

4.2.2 Enzyme assay

A fast and reliable detection method that allows high-throughput screening is required. If GalFru and a nucleotide such as UDP are supplied to the WT enzyme (or variants thereof), UDP-Gal and fructose are produced. Similar to the WT reaction on Suc, the activity on GalFru can thus be monitored using the colorimetric BCA assay, which detects reducing sugars such as fructose303. Parameters such as the pH, buffer, incubation temperature/time and concentration of substrates were optimized according to the characteristics of the enzyme (Km, kcat, pH/temperature optimum) and practical considerations.

Buffer and pH

The optimal pH of SuSyAc is 5.5 (Chapter 2). Consequently, MES could be a suitable buffer as its buffering capacity ranges between 5.5 and 6.7. However, as chemical hydrolysis of GalFru at this pH was observed, 100 mM MOPS pH 7.0 was used instead. At this pH, SuSyAc displays 80% of its maximal activity and catalytic parameters (Km, kcat) of SuSyAc were previously also determined using this buffer43.

Nucleotide acceptor and concentration of substrates

SuSyAc prefers ADP as substrate in the breakdown direction (Chapter 2). However, initial experiments revealed that crude cell extract from E. coli contains enzymes acting on this substrate, releasing components interfering with the BCA assay (data not shown). In addition, most GTs use UDP-sugars as donor substrate13. Consequently, the reaction with GalFru was optimized using UDP. Unfortunately, UDP disodium salt hydrate from Sigma (512 €/g) had to be used instead of UDP disodium salt from Carbosynth (60 €/g) because the latter causes a high background signal (OD540) with the BCA assay (Figure S12). Due to the high Km of SuSyAc for UDP (7.8 mM), libraries made using this enzyme as template were screened with 50 mM UDP. On the other hand, libraries based on SuSyAc LMDKVVA or SuSyAc QN6*, variants with improved affinity for UDP, were screened with 1 mM UDP. The use of these enzyme variants could thus severely decrease screening costs.

Because of the low affinity of SuSyAc and variants for GalFru (Km > 1 M), the concentration of this substrate should be as high as possible to have sufficient activity. However, the amount of GalFru that can be used in the reaction is limited by its low solubility (500 mM) and the fact that it is supplied by another university involved in the European SuSy project. Therefore, only 400 mM GalFru was used to screen the libraries. It has to be noted that the GalFru substrate also contains trace amounts of residual fructose, resulting in a background signal at 540 nm (BCA assay) of 0.3 up to 0.7 depending on the purity of the delivered batch (between 99.925% and 99.99% GalFru).
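The fructose background can be estimated from the batch purity. The sketch below rests on two assumptions that are not stated in the text: the purity is expressed on a molar basis, and the impurity consists entirely of fructose. The function name is hypothetical.

```python
def residual_fructose_uM(galfru_mM, purity_percent):
    """Estimated fructose concentration (µM) carried over with the GalFru substrate.

    Assumes molar purity and that all of the impurity is fructose.
    """
    impurity_fraction = (100.0 - purity_percent) / 100.0
    return galfru_mM * impurity_fraction * 1000.0  # mM -> µM


for purity in (99.925, 99.99):
    print(f"{purity}% GalFru at 400 mM: ~{residual_fructose_uM(400.0, purity):.0f} µM fructose")
```

Under these assumptions, the least pure batch would introduce roughly 300 µM fructose at 400 mM GalFru, which is the fructose concentration used in the E. coli fructose consumption experiments (section 3.9).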

Incubation time

As the activity on GalFru is 600-fold lower compared to Suc, incubation time was extended from 20 min (sucrose reaction) to at least 16 hours (overnight incubation). For SuSyAc WT, 20 hours of incubation were necessary to obtain a significant difference with the negative control (Figure S13). Incubating SuSyAc LMDKVVA for 20 hours resulted in an OD of about 2. However, as the spectrophotometer can only measure accurately up to 2 Abs units, it would not be possible to detect improved activity of putative hits within libraries. Therefore, 16 hours of incubation already sufficed for SuSyAc LMDKVVA (and libraries based on this template).

Incubation temperature

Initially, the screening protocol to detect GalFru activity was evaluated at 37°C, similar to the activity test with sucrose. The activity of SuSyAc is only 70% at that temperature, but evaporation is also limited and MTPs can easily be placed in a hot chamber. However, experiments with a negative control (empty pCXP34 vector) revealed problems related to this low temperature.

Indeed, after overnight incubation of the CCE with UDP and GalFru at 37°C, the OD540 was lower than the initial OD. The high initial OD (about 0.7) is caused by impurities in the GalFru substrate, mainly fructose. A decreasing OD thus means that the fructose is consumed in some way. This could possibly be explained by the presence of residual E. coli cells in the CCE. This hypothesis was strengthened by experiments revealing that E. coli BL21 (DE3) containing the empty pCXP34 vector, inoculated in a MOPS solution containing only 300 µM fructose, rapidly grew at 37°C and consumed the fructose, leading to a decreased OD signal at 540 nm using the BCA assay. E. coli was still able to grow and consume fructose at 45°C but not at 52°C (at least for 24 hours). Incubating the CCE with 300 µM fructose and the antibiotic chloramphenicol also prevented fructose consumption. Taken together, these experiments strongly suggest the involvement of residual E. coli cells in fructose consumption at lower temperatures.

Consequently, the temperature for screening was set at 52°C, which is only 8°C below the optimal temperature of SuSyAc. At this temperature, E. coli cells are killed while SuSyAc still retains 100% of its activity after 24 hours of incubation (Chapter 5). To reduce evaporation, 96-well PCR plates covered by ThermoScientific adhesive PCR films were used and a heavy weight was placed upon the plate to prevent detachment of the foil. Moreover, the oven at 52°C was saturated with water vapor by placing a bucket of water inside.

Final screening conditions

Screening of the constructed enzyme libraries consisted of three consecutive steps: growing the libraries in MTPs, each well containing a different (variant) enzyme, extracting the enzymes with lysis buffer and evaluating the activity of the resulting CCE on sucrose and on GalFru by a single measurement (one time-point) with the BCA assay. Optimized screening conditions are presented in Table 12. Using this screening protocol, activity on sucrose and GalFru could be clearly distinguished from the negative control (Figure S13). A detailed description of the complete screening protocol can be found in the materials and methods section.

It has to be noted that this protocol is generally applicable to screen for improved activity on non-reducing sugar donors in the breakdown direction, as long as reducing sugars are released from this molecule during catalysis.

Table 12 Optimized screening conditions to evaluate the activity of SuSyAc (variants) on sucrose and GalFru.

Parameter              | Activity on Suc     | Activity on GalFru
Incubation temperature | 37°C (hot chamber)  | 52°C (oven)
Incubation time        | 20 min              | 20 h (WT); 16 h (LMDKVVA or QN6* EGFP)
[Nucleotide]           | 5 mM ADP            | 50 mM UDP (WT); 1 mM UDP (LMDKVVA or QN6* EGFP)
[Sugar]                | 200 mM Suc          | 400 mM GalFru
Dilution CCE           | 10x                 | 10x
Type of plate          | MTP with lid        | PCR plate with adhesive PCR film
Buffer and pH          | 100 mM MOPS pH 7.0  | 100 mM MOPS pH 7.0
Final volume           | 30 µL               | 35 µL
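For plate set-up or automation, the conditions of Table 12 could be captured in a small data structure. The sketch below is illustrative only: the class, field and variable names are hypothetical, and the CCE volume is derived simply as final volume divided by the dilution factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCondition:
    """One column of Table 12 (names are illustrative, not from the thesis)."""
    sugar: str
    sugar_mM: float
    nucleotide: str
    nucleotide_mM: float
    temperature_C: float
    incubation_h: float
    final_volume_uL: float
    cce_dilution: float = 10.0

    def cce_volume_uL(self):
        # Volume of crude cell extract needed for the stated dilution.
        return self.final_volume_uL / self.cce_dilution


SUCROSE = ScreeningCondition("Suc", 200, "ADP", 5, 37, 20 / 60, 30)
GALFRU_WT = ScreeningCondition("GalFru", 400, "UDP", 50, 52, 20, 35)
GALFRU_LMDKVVA = ScreeningCondition("GalFru", 400, "UDP", 1, 52, 16, 35)

for condition in (SUCROSE, GALFRU_WT, GALFRU_LMDKVVA):
    print(condition.sugar, condition.nucleotide_mM, "mM", condition.nucleotide,
          "->", condition.cce_volume_uL(), "µL CCE")
```

For the GalFru conditions this gives 3.5 µL of CCE in 35 µL, which matches the volume used in the screening protocol described in the materials and methods section.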

4.3 Selection of target residues for mutagenesis

4.3.1 Residues around the active site (EX7E motif)

Changing the enzyme's substrate specificity requires remodeling of the active site in which the substrates are positioned. To achieve this, it seems obvious to mutate residues in direct contact with the substrate (first-shell residues). However, residues more distant from the active site (e.g. second-shell residues) can also have a severe impact on activity or selectivity by propagating structural changes to the active site through interactions with first-shell residues414,415.

To inspect the active site of SuSy, two types of crystal structures are available: the one from Nitrosomonas europaea (SuSyNe) and those from the plant Arabidopsis thaliana (SuSyAt1)48,95. However, the structure of SuSyNe (PDB 4RBN) is in an open form as it was crystallized without substrates in contrast to the closed structures of SuSyAt1, which were obtained by crystallizing the enzyme in complex with either UDP and fructose (PDB 3S27 and 3S29) or UDP-Glc (PDB 3S28) (Figure 7).

Modeling of the glucose moiety in the closed structure of SuSyAt1 appeared to be very challenging, as the electron density did not correspond to an intact UDP-Glc or glucose molecule. Instead, the electron density suggested a distorted glucosyl species consistent with an oxocarbenium ion expected in an SNi-like reaction mechanism48. This glucosyl intermediate could be mimicked in situ by two glucosyl analogues: 1,5-anhydrofructose (NHF) and lichenan (LCN), a tautomer of NHF, which are also included in the PDB structure 3S28. All interactions between the enzyme and the glucose moiety of sucrose (or UDP-Glc) can therefore be described using LCN.

As GalFru and sucrose only differ by the spatial arrangement of the OH group at the fourth carbon of the glucose moiety, residues around this group are good candidates for mutagenesis. Table 13 lists all residues within 4 Å of the equatorial C4-OH group of LCN in the crystal structure of SuSyAt1 (PDB 3S28) or within 4 Å of the axial C4-OH group of LCN*. LCN* was created in situ using the software program YASARA by changing the equatorial C4-OH group of LCN to the axial position. LCN* thus represents a galactose analogue.
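The 4 Å selection described above amounts to a distance cutoff around one atom. The sketch below illustrates the principle in plain Python; it is not the PyMOL or YASARA procedure used in this work, and the residue labels and coordinates are placeholders, not values taken from PDB 3S28.

```python
import math


def residues_within(center, atoms, cutoff=4.0):
    """Return sorted residue labels with at least one atom within `cutoff` Å of `center`.

    `atoms` is an iterable of (residue_label, atom_name, (x, y, z)) tuples.
    """
    hits = {residue for residue, _atom, xyz in atoms if math.dist(center, xyz) <= cutoff}
    return sorted(hits)


# Placeholder coordinates in Å (hypothetical, for illustration only).
c4_hydroxyl_oxygen = (0.0, 0.0, 0.0)
atoms = [
    ("RES-A", "N", (2.9, 0.0, 0.0)),   # 2.9 Å from the centre
    ("RES-B", "O", (0.0, 3.0, 2.0)),   # about 3.6 Å from the centre
    ("RES-C", "CB", (4.0, 3.0, 0.0)),  # 5.0 Å from the centre, outside the cutoff
]

print(residues_within(c4_hydroxyl_oxygen, atoms))
```

Running the same selection once around the equatorial C4-OH of LCN and once around the axial C4-OH of LCN* would yield the two residue sets that are combined in Table 13.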

Table 13 Residues within 4 Å of the equatorial C4-OH group of LCN in the crystal structure of SuSyAt1 (PDB 3S28) or within 4 Å of the axial C4-OH group of LCN*. For each of these residues, their interaction with the atoms of LCN is provided in the last column. Residues in the consensus motifs are >90% conserved in SuSy enzymes. X: undefined amino acid, H-bond: hydrogen bond

Residue in SuSyAt1 | Residue in SuSyAc | 3DM nr. | Consensus motif in SuSy | Reference | Interaction with LCN atoms
His-438 (H)        | His-425Ac         | 94      | AHALE                   | LCN, LCN* | H-bond (C6-OH)
Glu-675 (E)        | Glu-663Ac         | 222     | EAFGLTXXE               | LCN       | H-bond (C3-OH)
Ala-676 (A)        | Ala-664Ac         | 223     | EAFGLTXXE               | LCN*      | H-bond (C3-OH)
Phe-677 (F)        | Phe-665Ac         | 224     | EAFGLTXXE               | LCN, LCN* | H-bond (C3-OH)
Gly-678 (G)        | Gly-666Ac         | 225     | EAFGLTXXE               | LCN       | H-bond (C3-OH and C4-OH)
Leu-679 (L)        | Leu-667Ac         | 226     | EAFGLTXXE               | LCN       | H-bond (C4-OH)

Five out of six residues belong to a well-known conserved motif: EX7E. This sequence motif is characteristic for several retaining hexosyltransferases and consists of seven variable amino acids flanked by two highly conserved glutamate residues56. Furthermore, positions 2-6 of the motif are also more than 95% conserved within the SuSy family. Structurally, the first part (EAFGL) consists of a loop positioned close to the glucose moiety while the second part (LTXXE) is the start of an α-helix in the vicinity of UDP (Figure 37). Interestingly, the position of the motif differs markedly between the open structure of SuSyNe and the closed structure of SuSyAt1 when these are superimposed (Figure 37). This suggests that it is involved in the conformational changes induced upon substrate binding.

In the active site of SuSy, substrates interact with each other and with residues of the enzyme through an extensive hydrogen-bonded network, which most likely contributes to substrate specificity. Manipulation of this network could thus possibly lead to improved GalFru mutants. Hydrogen bonds between the amino acids of SuSyAt1 and the hydroxyl groups of the glucose analogue are possible with the main chain of Leu-679, Gly-678, Phe-677 and Ala-676 and the side chains of Glu-675 and His-438 (Figure 37 and Table 13). Gly-678 (Gly-666Ac) and Leu-679 (Leu-667Ac) are of particular interest because of their interaction with the hydroxyl group of LCN at the fourth carbon. Interestingly, no hydrogen bonds are predicted to be formed between the C4-substituents of the galactose analogue and the enzyme or UDP. The absence of these interactions could possibly explain the low activity of SuSy on GalFru. Indeed, the hydrogen bond between Gly-678/Leu-679 and the glucose moiety could be a crucial factor in inducing the conformational changes (including the position of the EX7E motif) necessary for proper catalysis. The hydrogen bond between C4-OH of the glucose moiety and UDP, on the other hand, could be required for correct positioning of both substrates in the active site.

Figure 37 (Left) Interactions of LCN (glucose analogue) with the enzyme and UDP, according to the crystal structure of SuSyAt1 (3S28). Residue numbers are those from SuSyAt1 but corresponding positions in SuSyAc are shown between brackets. Hydrogen bonds are represented as dashed lines. The main chain of His-438 can stabilize the partial positive charge at C1 during catalysis (dotted lines). Red: hydroxyl group whose spatial orientation differs between GalFru and sucrose. (Right) EX7E motif of the closed structure of SuSyAt1 (blue) and the open structure of SuSyNe (orange). Numbering scheme according to SuSyAt1. Hydrogen bonds between C4-OH of LCN and the enzyme/UDP are represented by dashed yellow lines. Eq: equatorial, Ax: axial, Fru: fructose.

In summary, the residues surrounding the glucose/galactose moiety in SuSy present interesting targets for mutagenesis, not only to improve the affinity/activity for GalFru, but also to gain more insight into structure-function relationships. In the next paragraphs, mutagenesis of the EX7E motif residues Leu-667Ac, Gly-666Ac, Phe-665Ac, Ala-664Ac and interacting second-shell residues in SuSyAc (LMDKVVA) is described. These residues were replaced by one or several predetermined amino acids (site-directed mutagenesis) or by all 20 naturally occurring amino acids (site-saturation mutagenesis). His-425Ac and Glu-663Ac were not subjected to mutagenesis as these positions are not only highly conserved in SuSy, but also in other retaining GTs416. This suggests an important role in catalysis but not in substrate specificity. Indeed, the main chain of the histidine residue is thought to stabilize the partial positive charge at the C1 carbon of the glucose moiety and thus plays an important electrostatic role in the active site48. Although the exact role of the glutamate residue is still a matter of debate, it has proven to be critical for the activity of several different GTs and would possibly be involved in binding and stabilizing the glucose moiety of the nucleotide sugar during catalysis57–62.

Leu-667Ac

The main chain of Leu-667Ac (fifth position of the EX7E motif) is involved in hydrogen bonding with the equatorial hydroxyl group of glucose at the fourth carbon (Figure 37). Its side chain, on the other hand, points towards a putative hydrophobic partner: Val-306 (Val-292Ac). This residue is part of another highly conserved motif within the SuSy family, GGQVV306Y, which is positioned next to fructose and UDP. The importance of Leu-667Ac in catalysis could be demonstrated by the creation of an alanine mutant (SuSyAc LMDKVVA L667A), which showed only around 18% activity with sucrose compared to its parent (SuSyAc LMDKVVA) while no significant difference in expression levels was observed (data not shown).

In an attempt to improve the activity for GalFru, an additional site-saturation library of Leu-667Ac was made using an NNK primer. To statistically cover 95% of this single-site library, about 96 (~3 times 32 codons) colonies should be screened (Chapter 1)186. Eventually, 86 clones (≈1 MTP) were screened. E. coli harboring SuSyAc LMDKVVA and E. coli harboring an empty pCXP34 vector were inoculated manually in two wells of the MTP as positive and negative control, respectively. Most of the clones did not display any detectable activity on sucrose, as can be derived from the plateau at OD 0.4 (Figure 38, left). This does not necessarily mean that they are completely inactive, but rather that the activity is too low to be detected within the current experimental setting. This can be caused either by a reduced expression of the variants in the soluble fraction (higher occurrence of misfolded proteins) or by an actual decrease in activity. Nevertheless, these results again emphasize the importance of Leu-667Ac for the SuSy enzyme. Sequencing four of the clones with significant activity on sucrose revealed that one of them was a variant with a structurally similar amino acid (L667V) while three of them did not have a mutation at position 667. The latter is in good agreement with what is theoretically expected. Indeed, as 3 out of 32 codons (~9%) of the NNK primer encode a Leu residue (Table S1), about 8 out of 86 screened colonies are expected to contain a non-mutated enzyme. Differences between the observed number of a particular mutant (or WT) and the theoretical value are caused by the stochastic nature of sampling, although they also strongly depend on the quality of the library.
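The numbers quoted above can be reproduced by enumerating the NNK codons with the standard genetic code. The sketch below assumes an ideal library in which all 32 codons are equally likely, and uses the common approximation T = -V ln(1 - P) for the number of clones T needed to sample a fraction P of V variants; all names are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: N = A/C/G/T at positions 1 and 2, K = G/T at position 3.
NNK_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]
COUNTS = Counter(CODON_TABLE[codon] for codon in NNK_CODONS)


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen for the given expected coverage (equiprobable variants)."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


leu_codons = [c for c in NNK_CODONS if CODON_TABLE[c] == "L"]
expected_parent = 86 * len(leu_codons) / len(NNK_CODONS)

print(len(NNK_CODONS), "codons,", len(set(COUNTS) - {"*"}), "amino acids,",
      COUNTS["*"], "stop codon")
print("Leu codons:", sorted(leu_codons))
print("Clones for 95% coverage:", clones_for_coverage(len(NNK_CODONS)))
print("Expected non-mutated clones among 86:", expected_parent)
```

Real libraries deviate from this ideal because of primer bias and incomplete template removal, so the observed frequencies can differ from these expectations.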


Figure 38 Screening results of library SuSyAc LMDKVVA L667X. Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 1 mM UDP (52°C) was measured using the BCA assay. LM: SuSyAc LMDKVVA sampled from the library (with no additional mutations). LMi: SuSyAc LMDKVVA inoculated manually (positive control).

The library was also evaluated using GalFru as a substrate. Promisingly, variant L667M showed a high activity on GalFru during the initial screening experiment (Figure 38, right), with OD values exceeding those observed for non-mutated enzymes. Consequently, the enzyme was expressed in a 250 mL culture, purified, and its activity was assayed at 60°C using 5 mM ADP and 1 M sucrose or 400 mM GalFru. Unfortunately, it appeared to be a false positive, as the purified enzyme showed a slightly lower activity on GalFru compared to the parent enzyme and half of the activity on sucrose. The cause of the discrepancy between the initial screening results and those obtained after purification remains unclear, but it could possibly be explained by slightly higher expression yields of the L667M mutant or some sort of contamination.

Phe-665Ac, Ile-690Ac, Pro-513Ac and Ser-468Ac

As mentioned before, it seems that no hydrogen bonds can be formed between the WT SuSy and the C4 substituents of a galactose moiety. However, new hydrogen bonds with GalFru could possibly be established by mutating nearby hydrophobic residues into hydrophilic ones. Phe-665Ac and Ala-664Ac, belonging to the EX7E motif, are positioned ideally for this purpose (Figure 39).

Figure 39 Visualization of Ile-690Ac, Phe-665Ac, Ala-664Ac, Pro-513Ac, Ser-468Ac using the crystal structure of SuSyAt1 (PDB 3S28). Residue numbers are those from SuSyAt1 but corresponding positions in SuSyAc are shown between brackets. LCN*: galactose analogue.


Consequently, Phe-665Ac in SuSyAc WT was replaced by the hydrophilic amino acids Asn (N), Thr (T), Ser (S) and His (H) and the smallest residue Gly (G). These variants were purified and tested for their activity on sucrose and GalFru (Table 14).

Table 14 Activity (U/mg) of purified Phe-665Ac mutants with 5 mM ADP and 200 mM sucrose or 400 mM GalFru at 60°C (100 mM MOPS pH 7.0). /: No activity or activity too low to measure accurately within 30 min.

Substrate | SuSyAc WT   | F665S     | F665N     | F665T     | F665H     | F665G
Suc       | 63 ± 2      | 0.9 ± 0.2 | 2.9 ± 0.3 | 7.3 ± 0.2 | 1.4 ± 0.3 | 11 ± 1
GalFru    | 0.12 ± 0.05 | /         | /         | 0.03      | /         | 0.02

SDS-PAGE analysis of the crude cell extract of F665T, F665N and F665S revealed that the latter two were expressed to a lesser extent in the soluble fraction compared to the WT (Figure S14). Unfortunately, none of the mutants showed improved activity on GalFru. In addition, although the side chain of Phe-665Ac does not point towards the substrates in the active site, all mutants showed severely decreased activities (≤ 17%) on the natural substrate sucrose. These results suggest that interactions between Phe-665Ac and nearby residues are important for the overall structure and activity of the enzyme. Using the crystal structure of SuSyAt1, several potential second-shell residues were identified: Pro-513Ac, Ile-690Ac and Ser-468Ac (Figure 39). Proline is a special type of amino acid as its side chain is attached to the alpha amino group. It can interact favorably with aromatic side chains due to the hydrophobic effect and a CH-π interaction417,418. Although Ser-468Ac has a hydrophilic side chain, its beta carbon can possibly also participate in a CH-π interaction with Phe-665Ac419.

To scrutinize the importance of the hydrophobic Ile at position 690, it was mutated into the polar residues Asn and Gln in SuSyAc and SuSyAc F665S (Figure 40). In addition, these particular substitutions could give rise to a new hydrogen bond network, possibly altering substrate specificity.

Figure 40 Visualization of Ile-690Ac, Tyr-741Ac, Gly-666Ac and Phe-665Ac using the crystal structure of SuSyAt1 (PDB 3S28). Residue numbers are those from SuSyAt1 but corresponding positions in SuSyAc are shown between brackets. Mutated residues (F665S, I690N in part A and I690Q in part B) are shown in purple. LCN*: galactose analogue.


Compared to SuSyAc WT, the expression in the soluble fraction was similar for the two Ile-690Ac single mutants but severely decreased for the F665S + Ile-690Ac double mutants. Despite the structural resemblance between Ile and Asn/Gln, variants I690N and I690Q only retained 42% and 66% activity on sucrose compared to the WT. These results could be explained by the disturbance of the hydrophobic environment and of the interaction between Phe-665Ac and Ile-690Ac. This is yet another indication that small changes around the active site, even at second-shell positions, can have a profound effect on catalysis. Similar to single mutant F665S, double mutants F665S + I690N and F665S + I690Q displayed only 2% of the activity on sucrose compared to the WT. The activity of all four mutants on GalFru was lower than that of the WT or even too low to determine accurately. The proposed hydrogen bond network was thus not sufficient to properly position the EX7E motif for efficient catalysis.

The two other potential interaction partners of Phe-665Ac were subjected to site-saturation mutagenesis in an attempt to adjust the position of the EX7E motif and hence substrate specificity: SuSyAc LMDKVVA P513X and S468X. In total, 91 colonies of P513X and 94 of S468X were screened for their activity on sucrose and GalFru.

Figure 41 Screening results of library SuSyAc LMDKVVA P513X. Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 1 mM UDP (52°C) was measured using the BCA assay. LM: SuSyAc LMDKVVA sampled from the library. LMi: SuSyAc LMDKVVA inoculated manually (positive control).

75% of the colonies from library P513X showed no detectable activity on sucrose as represented by the plateau at OD 0.4 (Figure 41, left). In many cases, this will probably be the result of steric hindrance between the newly introduced (larger) side chain and other nearby residues.

Conclusions about the importance of the interaction between Pro-513Ac and Phe-665Ac can therefore not be drawn from these results alone.

The enzymes that produced the most fructose (highest OD value at 540 nm) overnight when GalFru was supplied were the two positive controls: SuSyAc LMDKVVA manually inoculated in the MTP (Figure 41, right, LMi). Only three other colonies from the library seemed to show significant activity (OD > 0.5). Sequencing revealed a P513Y mutant and two enzymes without mutation at position P513. Expression and purification of P513Y revealed lower expression levels (soluble fraction) and lower activity on sucrose and GalFru (10%) compared to the parent enzyme.

In the case of library LMDKVVA S468X, about half of the enzymes still showed activity on sucrose (Figure 42, left). Some of them were sequenced and found to carry either no mutation or a valine or alanine residue at position 468. Sequencing of the most promising candidates for GalFru conversion revealed two LMDKVVA enzymes without additional mutation and one S468A mutant (Figure 42, right). The latter was purified and found to have similar activity on sucrose and GalFru compared to SuSyAc LMDKVVA. This indicates that position 468 is not crucial for activity.

Figure 42 Screening results of library SuSyAc LMDKVVA S468X. Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 1 mM UDP (52°C) was measured using the BCA assay. LM: SuSyAc LMDKVVA sampled from the library. LMi: SuSyAc LMDKVVA inoculated manually (positive control).

Because of their potential interaction, two additional libraries were made saturating both Phe-665Ac and Pro-513Ac/Ser-468Ac simultaneously: SuSyAc LMDKVVA F665X + S468X (C-lecta lib 5) and SuSyAc LMDKVVA F665X + P513X (C-lecta lib 6). As these libraries each consist of 400 (20x20 amino acids) possible variants, the number of clones that needs to be screened to cover a substantial part of the library is much larger than for the single-site saturation libraries described before. To evaluate libraries F665X + P513X and F665X + S468X, 453 and 450 colonies were screened, respectively. An example of the screening results of one MTP of each library can be seen in Figure S15 and Figure S16.
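The relation between library size and screening effort can be illustrated with a minimal sketch. It assumes, as a simplification that is not stated in the text, that all 400 amino acid combinations are equally likely to be picked; real saturation libraries are biased by codon degeneracy, so the numbers are only indicative. The function names are hypothetical.

```python
import math


def expected_coverage(n_variants: int, n_screened: int) -> float:
    """Expected fraction of distinct variants sampled at least once.

    Assumes every variant is equally likely (a simplification).
    """
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_screened


def clones_for_coverage(n_variants: int, target: float) -> int:
    """Smallest number of clones giving at least the target expected coverage."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_variants))


# Colony numbers taken from the text; library size is 20 x 20 = 400.
for name, screened in (("F665X + P513X", 453), ("F665X + S468X", 450)):
    print(name, round(expected_coverage(400, screened), 2))

# Clones needed for 95% expected coverage under the same assumption.
print(clones_for_coverage(400, 0.95))
```

Under this idealized model, the roughly 450 colonies screened per double-site library correspond to an expected coverage of about two thirds of the possible variants.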

Only 2% of the screened colonies of F665X + P513X and 3% of F665X + S468X showed detectable activity on sucrose. These included P513A, F665C, P513G + F665Y, F665A, P513A + F665Y, P513H + F665T, S468A, S468G, S468C, F665C, F665L, F665A, F665C + S468A and F665L + S468G. Considering the activity on GalFru, only variants P513T + F665E and F665I + S468Y seemed to be potentially interesting during initial screening experiments. These enzymes were therefore expressed and purified, but the soluble expression appeared to be very low, activity on sucrose was less than 2% compared to SuSyAc LMDKVVA and activity on GalFru could not be detected.


Ala-664Ac

Similar to Phe-665Ac, substitution of Ala-664Ac by hydrophilic residues could result in new hydrogen bonds with the axial C4-OH group of the galactose unit. However, larger side chains could possibly also clash with an alanine at position 426, which was therefore mutated into a glycine (Figure 43, left). Furthermore, a glycine at position 664 was introduced to evaluate whether the resulting flexibility in the EX7E loop would result in better activity on the non-natural substrate GalFru. In summary, four mutants were made (SuSyAc A664G, A664S, A426G + A664S and A426G + A664T) and their CCE was tested on sucrose and GalFru.

[Figure 43, right panel: bar chart of OD at 540 nm (axis 0 to 3) for Suc + ADP and GalFru + UDP. Samples: SuSyAc A426G + A664T, SuSyAc A426G + A664S, SuSyAc A664G, SuSyAc A664S, empty pCXP34h and SuSyAc.]

Figure 43 (Left) Visualization of Ala-426Ac, Ala-664Ac and Gln-482Ac using the crystal structure of SuSyAt1 (PDB 3S28). Residue numbers are those from SuSyAt1 but corresponding positions in SuSyAc are shown between brackets. LCN*: galactose analogue. (Right) Activity of CCE on 200 mM sucrose and 5 mM ADP or 400 mM GalFru and 50 mM UDP.

SuSyAc A664S and both double mutants (A426G + A664S and A426G + A664T) were expressed in similar yields compared to the WT enzyme. Apparently, the expected clash with Ala-426 did not occur, as the activity of A664S on sucrose was comparable with that of SuSyAc WT (Figure 43, right). Unfortunately, no improved activity was observed with GalFru, which was confirmed with the purified A664S enzyme. Both double mutants showed severely decreased activity on both substrates, despite the small difference between alanine and glycine. Ala-426Ac clearly plays a crucial role in the activity of the enzyme, which is reflected by its high conservation (97%) in other SuSy enzymes.

Furthermore, this residue is adjacent to His-425Ac (His-438 in SuSyAt1), which is 100% conserved in SuSy enzymes, hydrogen bonds with C6-OH of the glucose moiety and is believed to play an important electrostatic role in the active site48. Changing the alanine into a glycine could perhaps have resulted in the mispositioning of this His residue leading to decreased activity. In contrast to A664S, expression levels of A664G were lower and no activity could be detected on sucrose (or GalFru), indicating that rigidity around the substrate at this position is important for catalysis and folding.

In addition to the hydrogen bond with C3-OH of the glucose moiety, the main chain of Ala-664Ac also interacts with the side chain of Gln-482Ac (Figure 43). The latter could thus possibly influence the position of the EX7E motif.


A site-saturation library of this position was made in SuSyAc LMDKVVA and 89 colonies were screened (Figure 44). A substantial number of colonies seemed to have some detectable activity left on sucrose, albeit reduced compared to the parent enzyme, suggesting that the interaction between Ala-664Ac and Gln-482Ac contributes to the activity (or folding) of SuSy. As there were no really promising GalFru mutants, no enzymes from this library were purified.

Figure 44 Screening results of library SuSyAc LMDKVVA Q482X. Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 1 mM UDP (52°C) was measured using the BCA assay. LM: SuSyAc LMDKVVA sampled from the library.

Gly-666Ac

Gly-666Ac (4th position of the EX7E motif) is of interest because of its hydrogen bond with the equatorial C4-OH of the glucose moiety, which could be important to position the EX7E loop in the right conformation for catalysis (Figure 37). As it is located next to the important residue Phe-665Ac, simultaneous saturation of these two positions was performed using SuSyAc as template: SuSyAc F665X + G666X. In total, 917 colonies (10 MTPs) were screened and the activity of the CCE was tested on sucrose (using 5 mM ADP) and GalFru (using 50 mM UDP). Only the output of ‘master plate’ (MP) 9 is presented in Figure 45, although results of the nine other MPs were similar.

Figure 45 Screening results of library SuSyAc F665X + G666X (MP14). Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 50 mM UDP (52°C) was measured using the BCA assay. WT: SuSyAc sampled from the library.


Only 5% of the library still showed considerable activity on sucrose. Half of these colonies were sequenced, revealing mainly WT enzymes (78%) and only three different mutants: F665C, F665Y and G666C. In search of an improved GalFru mutant, the most promising colonies of each MP were sequenced, 38 in total. Almost half of them did not have any mutation (WT enzymes). On the other hand, five mutant enzymes were particularly interesting because they were identified more than once: F665C + G666D, F665E + G666C, F665C, G666F and F665Y. These variants were expressed in 250 mL cultures, His-tag purified and characterized (Table 15).

Table 15 Activity (U/mg) of purified enzymes belonging to library F665X + G666X with 5 mM ADP and 1 M sucrose or 400 mM GalFru at 60°C (100 mM MOPS pH 7.0). /: No activity or activity too low to measure accurately within 30 min.

Substrate   SuSyAc WT     F665C + G666D   F665E + G666C   F665C    G666F   F665Y
Suc         63 ± 2        /               0.2             50 ± 4   /       70 ± 5
GalFru      0.12 ± 0.05   /               /               0.03     /       0.05

In contrast to the site-directed mutants at position 665 described earlier, F665C and F665Y displayed similar activities on sucrose as the WT enzyme. This could be expected for F665Y as tyrosine structurally resembles phenylalanine but the significant difference in activity on sucrose between F665C (~50 U/mg) and F665S (~1 U/mg) is surprising. However, similar results were published recently by Huang and coworkers. They mutated several residues in the EX7E motif of rice SuSy including Phe-680, which corresponds to Phe-665Ac of SuSyAc, and found that F680Y retained partial enzymatic activity while F680S was inactive420. Unfortunately, none of the purified mutants showed improved activity on GalFru.

4.3.2 Correlated mutations

In the context of the European SuSy project, Bioprodict created a 3DM structure-based alignment of UDP-glycosyltransferases belonging to different GT families such as GT1, to which many natural product plant GTs belong (e.g. flavonoid 3-O-glycosyltransferase VvGT1)242, and GT4, to which SuSy belongs. To identify correlated positions, which are promising targets to alter the specificity (Chapter 1), the Cornet tool was used. Analysis of the superfamily revealed that 3DM position 218 is the most highly correlated position of the correlation network. Consequently, 3DM automatically created a subset with the most frequently occurring residue at that position, a proline in this case, and recalculated the correlation network of this specific subset. The subset, called P218, contains a diverse set of enzymes such as glucosyltransferases (e.g. SuSys, sucrose-phosphate synthases, glycogen synthases, starch synthases), glycogen phosphorylases, N-acetylglucosaminyltransferases and galactosyltransferases. Next, the system automatically searched for online literature related to each position of the network. Results from these publications were then used to associate the residues with keywords like specificity or enantioselectivity. An enrichment factor, which is a measure of the number of publications related to the keyword, is calculated for each keyword. The correlation network of P218 contains 260 positions. Interestingly, this network is highly enriched for specificity if the top-correlated positions are considered, as represented by a plot of the enrichment vs correlated mutation cutoff (Figure 46). If the correlated mutation cutoff increases, the network under consideration becomes smaller (fewer positions), but the overall correlation between these positions is higher. To reduce screening efforts, we only focused on the four most highly correlated positions of the network: 9, 74, 224 and 122, which corresponded to a correlation cutoff of 0.91 and an enrichment factor for specificity of about 40 (Figure 46). This score is based on articles available for positions 9 and 74. Aspartate at position 13 in a mannosyltransferase from Corynebacterium glutamicum, corresponding to position 9 in 3DM, was mutated to several other residues. However, all variants displayed severely reduced activity (≤ 2% compared to WT), even the one with the most conservative substitution (D13N). It was postulated that this position plays a key role in acceptor specificity and regioselectivity of mannosyl transfer421. Mutation of isoleucine at position 112 (3DM position 74) in OleD led to an increase in activity for a non-natural acceptor. Furthermore, a triple mutant having the I112K mutation also showed activity on alternative donor sugars, such as UDP-Gal, which were not accepted by the WT enzyme249. In trehalose synthase from Pyrococcus horikoshii, Asp-134 (3DM position 74) was replaced with Glu, resulting in reduced activity for several acceptors62. Clearly, residues at 3DM positions 9 and 74 play an important role in specificity, and by extension probably also the other residues of the network as they are correlated, making them interesting targets to drive the specificity of SuSy towards GalFru.

In SuSyAt1, positions 9 and 74 are situated around the fructose binding site, while residues 224 and 122 reside in the vicinity of the Glc substrate (Figure 46). Note that position 224 (Phe-665Ac), being part of the EX7E motif, was already discussed in section 4.3.1. As site-saturation mutagenesis of these four correlated positions simultaneously would still result in a massive library, only those amino acids that occur with a frequency of more than 8% in the complete UDP-glycosyltransferase superfamily were included (Table 16 and Figure S17). The set of amino acids tested at each position thus represents a selection of those found in existing enzymes, which should increase the chance of finding active and stable enzymes422. Furthermore, the WT amino acid of SuSyAc was also included, resulting in a library of 300 (4x3x5x5) possible variants.

Table 16 Correlated positions targeted for mutagenesis and randomization scheme based on the occurrence of amino acids within the complete 3DM UDP-glycosyltransferase superfamily.

Position (3DM)   Position in SuSyAc   Mutations (incl. WT)
9                H273                 H,E,G,P
74               Y402                 Y,D,G
122              E472                 E,H,Y,A,S
224              F665                 F,G,S,C,L
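The size of the library defined by Table 16 can be checked by enumerating all combinations. The sketch below is only an illustration of the randomization scheme; the dictionary layout and function name are hypothetical and are not part of the library design workflow described here.

```python
from itertools import product

# Randomization scheme from Table 16: WT residue and position -> allowed residues.
SCHEME = {
    "H273": "HEGP",
    "Y402": "YDG",
    "E472": "EHYAS",
    "F665": "FGSCL",
}


def enumerate_variants(scheme):
    """Return the names of all variants encoded by the scheme (WT included)."""
    variants = []
    for combo in product(*scheme.values()):
        # Keep only positions where the residue differs from the WT residue.
        muts = [f"{pos}{aa}" for pos, aa in zip(scheme, combo) if aa != pos[0]]
        variants.append(" + ".join(muts) if muts else "WT")
    return variants


variants = enumerate_variants(SCHEME)
print(len(variants))  # 4 x 3 x 5 x 5 combinations
```

The enumeration contains the WT sequence once, together with every single, double, triple and quadruple mutant allowed by the scheme.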


Figure 46 Outline of the decision-making process to engineer GalFru specificity using correlated mutations. (A) Correlation network of the UDP-glycosyltransferase superfamily built by 3DM with a 0.73 correlation score. (B) Enrichment as a function of correlated mutation cutoff within the P218 subset. If a correlation cutoff of 0.91 is applied, the correlation network consists of four positions and the enrichment factor is about 40. (C) Top 4 correlated positions in the correlation network of subset P218. (D) Visualization of the top 4 correlated positions of subset P218 using the crystal structure of SuSyAt1 (PDB 3S28). Residue numbers are those from SuSyAt1 but corresponding positions in SuSyAc and 3DM (bold) are shown between brackets.


To evaluate this library (CorLib: H273H/E/G/P + Y402Y/D/G + E472E/H/Y/A/S + F665F/G/S/C/L), a total of 809 colonies (10 MTPs) were screened for their activity on sucrose and GalFru. An example is presented in Figure 47; results for the other plates were very similar. No activity on sucrose could be detected for any of the colonies, underscoring the functional importance of the correlated positions. Moreover, compared to the (manually inoculated) WT, none of the colonies showed increased fructose production using GalFru, making it highly unlikely that improved mutants would be found unless their expression is severely impaired.

Figure 47 Screening results of CorLib (based on correlated mutations) (MP6). Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 50 mM UDP (52°C) was measured using the BCA assay. WTi: SuSyAc manually inoculated (positive control).

Nonetheless, eleven of the ‘best’ colonies were sequenced and two variants were identified twice: triple mutant H273G + Y402D + F665G and single mutant E472H. Together with triple mutant Y402D + E472S + F665C and quadruple mutant H273P + Y402D + E472A + F665L, these four enzymes were purified and characterized (Table 17).

Table 17 Activity (U/mg) of purified CorLib mutants with 5 mM ADP and 1 M sucrose or 400 mM GalFru at 60°C (100 mM MOPS pH 7.0).

Enzyme                           Suc      GalFru
SuSyAc WT                        63 ± 2   0.12 ± 0.05
E472H                            0.12     /
H273G + Y402D + F665G            0.33     /
Y402D + E472S + F665C            /        /
H273P + Y402D + E472A + F665L    /        /

As expected, none of the enzymes displayed detectable activity on GalFru and only two of them showed activity on Suc, albeit severely reduced compared to the parent. Furthermore, expression was severely reduced for all of the variants except for E472H (data not shown).


4.3.3 Conserved residues of GT4 galactosyltransferases

Sequence information of homologous enzymes having the desired specificity can guide rational mutagenesis. In this case, galactosyltransferases are of particular interest. GalTs are found both in prokaryotic and eukaryotic cells although they share little structural similarity423. They are classified into two GH and 20 GT families according to the CAZy database424. The GT4 family, to which SuSy belongs, also contains several bacterial GalTs (Table S12). Some of them are unequivocally identified as GalTs while others are only annotated as such based on sequence homology246,425–431. Unfortunately, no crystal structure is available for this type of GalTs and sequence alignment of the only 20 available GT4 GalTs revealed low overall sequence similarity.

Similar to other hexosyltransferases, GalTs from GT4 possess the conserved EX7E motif. Interestingly, the fourth position of the motif (3DM position 225) is occupied by a conserved proline in all but one GalT, while a glycine is present in GlcTs (Table 18).

Table 18 Subset of retaining glycosyltransferases of the GT4 family that exhibit a correlation between their donor sugar specificity and the amino acid within the EX7E motif that corresponds to position 310 of SiaDW-135 and SiaDY (adapted from246, supplemented with additional data).

Species           Gene name    Sugar donor   EX7E motif and flanking residues
E. coli           WaaB         UDP-Gal       T S S F - E G F P M T L L E A L S W G
K. pneumoniae     WbbO         UDP-Gal       P S V Y S E G V P R I L L E A S S V G
L. helveticus     EpsG         UDP-Gal       P S L W - E G L S V S A I E A Q A S G
N. gonorrhoeae    PgtA         UDP-Gal       P S Y Y R E G V P R S T Q E A M A V G
N. meningitidis   SiaDW-135    UDP-Gal       T S E S - E G F P310 Y I F M E G M V Y D
P. mirabilis      WamB         UDP-Gal       T S T H - E G L P M V L I E A Q S Y G
S. enterica       RfaB         UDP-Gal       T S A F - E G F P M T L L E A M S Y G
S. dysenteriae    WbbP         UDP-Gal       P S Y Y R E G V P R S T Q E A M A M G
A. laidlawii      alDGS        UDP-Glc       P S Y E - E T E G I V V L E G L A S K
B. subtilis       TagE         UDP-Glc       S T S H F E G F G L S N M E A L S N G
E. coli           WaaG         UDP-Glc       P A Y Q - E A A G I V L L E A I A A G
L. helveticus     EpsH         UDP-Glc       P S L F - E G F G N A L I E A Q A N G
N. meningitidis   SiaDY        UDP-Glc       T S Q S - E G F G310 Y I F L E G M V Y D
T. elongatus      SuSyTe       UDP-Glc       P A L F - E A F G677 L T I L E A M I S G
A. caldus         SuSyAc       UDP-Glc       P A L F - E A F G666 L T V I E A M S S G
A. thaliana       SuSyAt1      UDP-Glc       P A L Y - E A F G678 L T V V E A M T C G
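The correlation summarized in Table 18 can be reproduced with a short sketch that locates the EX7E motif in each sequence fragment and tallies the residue at its fourth position per donor sugar. Only a subset of the table is included, residue numbers are omitted from the sequences, and the data structure and function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Subset of Table 18: gene name -> (sugar donor, aligned fragment, "-" = gap).
FRAGMENTS = {
    "WaaB": ("UDP-Gal", "TSSF-EGFPMTLLEALSWG"),
    "WbbO": ("UDP-Gal", "PSVYSEGVPRILLEASSVG"),
    "EpsG": ("UDP-Gal", "PSLW-EGLSVSAIEAQASG"),
    "SiaDW-135": ("UDP-Gal", "TSES-EGFPYIFMEGMVYD"),
    "TagE": ("UDP-Glc", "STSHFEGFGLSNMEALSNG"),
    "WaaG": ("UDP-Glc", "PAYQ-EAAGIVLLEAIAAG"),
    "SiaDY": ("UDP-Glc", "TSQS-EGFGYIFLEGMVYD"),
    "SuSyAc": ("UDP-Glc", "PALF-EAFGLTVIEAMSSG"),
}


def fourth_position(fragment: str) -> str:
    """Return the residue at the 4th position of the first EX7E motif."""
    match = re.search(r"E.{7}E", fragment.replace("-", ""))
    if match is None:
        raise ValueError("no EX7E motif found")
    return match.group()[3]


tally = defaultdict(Counter)
for gene, (donor, fragment) in FRAGMENTS.items():
    tally[donor][fourth_position(fragment)] += 1

for donor, counts in tally.items():
    print(donor, dict(counts))
```

In this subset, the GalTs carry a proline at the fourth motif position with EpsG as the exception, whereas all GlcTs carry a glycine.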

Using this sequence information, Claus and coworkers were able to switch the donor specificity of two capsule polymerases (SiaD) of Neisseria meningitidis. Mutating Gly-310 of the glucosyltransferase SiaDY into a proline improved the activity on UDP-Gal threefold while the activity on UDP-Glc was reduced to only 3% of the WT. Similarly, introducing a glycine at position 310 in galactosyltransferase SiaDW-135 led to a 21-fold increase in activity on UDP-Glc while retaining only 13% of the activity on UDP-Gal compared to the parent. In SuSy, the same glycine residue is involved in hydrogen bonding with C4-OH of glucose (see section 4.3.1). In an attempt to obtain a switch towards GalFru, this residue was replaced by proline in SuSyAc (G666P), but subsequent enzyme assays revealed no activity on GalFru and only 0.02% activity on sucrose compared to the WT (Table 19). As this strategy failed, the most frequently occurring EX7E motifs up to the fourth residue (EGFP: A664G + G666P, EGVP: A664G + F665V + G666P and EGLP: A664G + F665L + G666P) were introduced in SuSyAc QMDRVRN, a mutant with improved affinity for UDP. Unfortunately, these purified variants again showed no detectable activity on GalFru and severely reduced or even no activity on sucrose (Table 19).

Table 19 Activity (U/mg) of purified variants containing conserved residues from GalTs with 400 mM GalFru or 1 M sucrose and 5 mM ADP, 5 mM UDP (SuSyAc QN6* and variants) or 9 mM UDP (SuSyAc WT and variants) at 60°C (100 mM MOPS pH 7.0). Suc: sucrose.

Enzyme                     Suc (ADP)   GalFru (ADP)   Suc (UDP)   GalFru (UDP)
SuSyAc                     63          0.1            33          0.04
G666P                      0.01        /              /           /
G666P + P513N              0.42        /              0.31        /
G666P + A660S              /           /              n.d.        n.d.
G666P + P513N + A660S      /           /              n.d.        n.d.
SuSyAc QMDRVRN (QN6*)      n.d.        n.d.           39          0.06
QN6* EGFP                  0.2         /              /           /
QN6* EGLP                  0.02        /              0.02        /
QN6* EGVP                  0.12        /              0.02        /
QN6* EGFP + A660S          /           /              n.d.        n.d.
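The relative activities quoted in the text follow directly from the values in Table 19. As a small worked example, using the activities on sucrose with ADP and a hypothetical helper function:

```python
def relative_activity(variant: float, reference: float) -> float:
    """Activity of a variant as a percentage of the reference activity."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * variant / reference


WT_SUC_ADP = 63.0  # SuSyAc on sucrose with ADP (Table 19)
SUC_ADP = {"G666P": 0.01, "G666P + P513N": 0.42}  # values from Table 19

for name, activity in SUC_ADP.items():
    print(name, round(relative_activity(activity, WT_SUC_ADP), 2), "% of WT")
```

For G666P this gives the roughly 0.02% residual activity on sucrose mentioned above.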

The discrepancy with the results obtained by Claus and coworkers could perhaps be explained by the necessity for additional permissive mutations. Such mutations do not alter the specificity itself but are required to tolerate function-switching mutations. They are not necessarily located near the active site but rather stabilize local structural elements432–434. As permissive mutations are most likely correlated to the function-switching residues, a 3DM subset of 9 GlcTs (including SuSy and SPS) and 18 GalTs was made to identify those residues. Interestingly, two positions are highly correlated with 3DM position 225 (Gly-666Ac in SuSyAc): 219 and 132 (Figure 48). Position 132 (Pro-513 in SuSyAc) is 100% conserved in SuSy enzymes while 90% of the GalTs have an Asn instead. Moreover, this position was already discussed before because of its potential interaction with Phe-665Ac (see section 4.3.1). Position 219, Ala-660Ac in SuSyAc, is occupied by an alanine residue in 99% of the SuSy enzymes and a serine residue in 95% of the GalTs. Interestingly, 24 out of 28 subfamilies of the complete UDP-glycosyltransferase superfamily of 3DM containing the EX7(E) motif possess a serine at the corresponding position as consensus residue, while an alanine is most frequent in the subfamilies of SuSy (99% A and 0.5% S), Sucrose-Phosphate Synthase (79% A, 17% S, 1% T and 2% P) and LPS synthase (84% A, 6% S, 10% T), and a proline in the Trehalose Phosphate Synthase subfamily (6% A, 23% S and 71% P). These findings indicate that a Ser at that particular position (3DM number 219) probably was the ancestral state of the predecessor of modern-day GT enzymes containing the EX7(E) motif. It has been shown before that the evolution of new functions tends to drift towards the progenitor before a switch to the desired specificity is achieved435, making this position an interesting engineering target. Furthermore, GlcT SiaDY also has a Ser at position 219, which could explain the successful switch to Gal specificity by mutation of only position 225, a strategy that did not work with SuSyAc. The amino acid at position 219 is located on the same loop as the EX7E motif, albeit a little further away from the substrates (Figure 48).

Figure 48 Visualization of Pro-513Ac and Ala-660Ac, which are correlated with Gly-666Ac, using the crystal structure of SuSyAt1 (PDB 3S28). Residue numbers are those from SuSyAt1 but corresponding positions in SuSyAc and 3DM (bold) are shown between brackets. LCN*: galactose analogue.

To evaluate the importance of the two correlated positions, four site-directed mutants and one site-saturation library were made: SuSyAc G666P + P513N, SuSyAc G666P + A660S, SuSyAc G666P + P513N + A660S, SuSyAc QN6* EGFP + A660S and SuSyAc QN6* EGFP P513X (Table 19). Expression of G666P + P513N + A660S was reduced while SuSyAc QN6* EGFP + A660S and SuSyAc G666P + A660S were expressed in similar yields compared to the WT. The SuSyAc G666P + P513N variant had severely reduced activity on sucrose (0.4 U/mg) compared to the WT (63 U/mg), but it was higher than the G666P (0.01 U/mg) single mutant. Unfortunately, no activity on GalFru could be detected. Furthermore, none of the three enzyme variants with the A660S mutation had detectable activity on sucrose or GalFru.

To test the SuSyAc QN6* EGFP P513X library, 92 colonies were screened on GalFru and Suc. Similar to the parent, none of the colonies showed detectable activity on sucrose or GalFru. Consequently, no variants were sequenced or purified.


4.4 Production of iCLEAs

As all structure-based strategies failed, a mutagenesis-independent strategy involving imprinted cross-linked enzyme aggregates (iCLEAs) of SuSyAc was evaluated. For more information about molecular imprinting and iCLEAs, the reader is referred to the literature review (Chapter 1). The production of iCLEAs consists of four consecutive steps: imprinting of the enzyme with molecules to change the shape of the active site, precipitation by physical aggregation, chemical covalent crosslinking of the resulting aggregate and a wash step (Figure 12).

Imprinting has been successfully used before to introduce new specificities (e.g. the acceptance of D-galactose by glucose oxidase) or to alter substrate selectivity and enantioselectivity196–200. The molecules used as imprinters can be the target substrate (which would be GalFru in our case), wild-type substrates (e.g. to change the enantioselectivity) or other competitive inhibitors/substrate analogues. In many cases, several parameters of the production process, such as the type of precipitant (e.g. solvent, ammonium sulphate (AS), polyethylene glycol), the crosslinking time and the ratio of crosslinker to protein, need to be optimized in order to achieve maximum activity208.

To precipitate our target enzyme, the first step in CLEA production, tert-butanol was added to the enzyme solution436. At a tert-butanol concentration of 80% (v/v), only 2% of SuSyAc was left in the supernatant after centrifugation, meaning that 98% of the enzyme successfully precipitated. However, this clearly visible pellet could no longer be redissolved and the residual activity of this protein aggregate compared to the free enzyme was only about 2%. This indicates that the enzyme is probably denatured irreversibly. Precipitation normally consists of physical aggregation of proteins based on non-covalent bonding interactions without disturbing the tertiary structure and hence activity206,437. This type of aggregation is reversible and the enzyme should dissolve again when aqueous buffer is added, which was not the case with SuSyAc. Lowering the volume percent to 60% tert-butanol still led to an insoluble pellet, even when the enzyme was incubated with 100 mM sucrose (for 30 min at 30°C and 750 rpm, imprinting step) before adding tert-butanol. Sugar-assisted precipitation has been suggested as a means to protect the enzyme and enhance the activity of the dissolved protein aggregate438. Unfortunately, the activity of the (insoluble) enzyme aggregate of SuSyAc that was pretreated with sucrose also did not exceed 2% (compared to the soluble enzyme). Clearly, tert-butanol was not a good choice as precipitating agent for SuSyAc.

Consequently, ammonium sulphate (AS, (NH4)2SO4) based precipitation was evaluated (Table 20). Different concentrations of AS were tested by adding a volume of 500 g/L AS to a protein solution containing either 20 µg or 40 µg of protein in a total sample volume of 25 µL. This mixture was incubated for three hours at 4°C under agitation (1000 rpm) and protein precipitate was harvested by 20 min of centrifugation at 14000 rpm (and room temperature). The residual concentration of enzyme in the supernatant (2 µL) was measured with a NanoDrop.


Table 20 Evaluation of precipitation using ammonium sulphate. A sample volume of 25 µL was used. Different concentrations of salt and soluble enzyme were tested. % precipitation was calculated as follows: (1 - (residual concentration in supernatant after centrifugation / concentration before precipitation)) * 100.

% (NH4)2SO4 (w/v)   Amount of SuSyAc (µg)   % precipitation   Imprinting molecule
4                   20                      38                /
8                   20                      67                /
12.5                20                      82                /
25                  20                      95                /
25                  40                      96                /
40                  20                      95                /
30                  20                      95                /
30                  20                      93                250 mM Suc
30                  20                      96                250 mM GalFru
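The calculation behind the % precipitation column can be written as a short function. This is a minimal sketch: the function name is hypothetical and the residual concentration in the example is an illustrative value, not a measurement from this work. The starting concentration corresponds to 20 µg of protein in 25 µL.

```python
def percent_precipitation(conc_before: float, conc_supernatant: float) -> float:
    """Percentage of protein removed from solution by precipitation.

    Both concentrations must be expressed in the same unit.
    """
    if conc_before <= 0:
        raise ValueError("concentration before precipitation must be positive")
    return (1.0 - conc_supernatant / conc_before) * 100.0


conc_before = 20 / 25  # µg/µL, from 20 µg protein in 25 µL sample
conc_supernatant = 0.04  # µg/µL, illustrative (hypothetical) residual value
print(round(percent_precipitation(conc_before, conc_supernatant), 1))
```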

Apparently, a concentration of AS between 25 and 40% (w/v) is perfectly suited to precipitate most of the protein (95%). It has to be noted that the precipitate (after centrifugation) starting from 20 µg of protein consists of a very thin white layer at the bottom of the Eppendorf tube and is sometimes difficult to see. The use of 40 µg of protein resulted in a clearly visible pellet. In contrast to the tert-butanol treatment, this enzyme precipitate was still able to dissolve if MOPS buffer was added and the residual activity of this redissolved enzyme was about 90% compared to the soluble enzyme (25% ammonium sulphate treatment). The prior incubation (20 min at 30°C and 800 rpm) of the enzyme with 250 mM of the imprinting molecules sucrose and GalFru (without AS) did not have any significant effect on precipitation (final concentration of Suc/GalFru after the addition of AS: 100 mM).

For the next step in the iCLEA production, cross-linking the enzymes in the precipitate, 2 µL or 10 µL of 0.5 wt% GA was added to 23 µL of AS treated sample corresponding to a GA/protein ratio of 0.54 and 2.9 respectively. This mixture was incubated for 1 hour at 4°C and 1000 rpm. It has to be noted that no reduction step was performed after glutaraldehyde treatment as it is still a matter of debate if this step is really necessary and it could even be harmful to the enzyme439–441. In case of a GA/protein ratio of 0.54, the crosslinking was not efficient enough and the protein redissolved again. A GA/protein ratio of 2.9 on the other hand led to a tight insoluble pellet, which was used for further activity tests on GalFru and sucrose. Apparently, crosslinking of the enzyme reduced its capacity to convert its substrates considerably. Indeed, the activity with sucrose was only about 6% compared to the soluble enzyme. Furthermore, imprinting of the enzyme with either sucrose or GalFru, did not improve the activity of the CLEA towards our target substrate GalFru (Figure 49). Although the activity of the (i)CLEA could be optimized further by testing alternative GA/protein ratios and incubation times, this is rather pointless as the imprinting strategy clearly did not have the desired effect.


[Figure 49: bar chart of activity (U/mg, axis 0 to 20) on Suc and on GalFru for soluble enzyme, redissolved precipitate, CLEA, iCLEA (250 mM Suc) and iCLEA (250 mM GalFru).]

Figure 49 Activity (U/mg) of soluble SuSyAc, the redissolved precipitate after ammonium sulphate treatment, the glutaraldehyde treated CLEAs and iCLEAs.

5 Conclusions

SuSy catalyzes the reversible conversion of sucrose and NDP into fructose and NDP-glucose. Sucrose consists of a glucose unit covalently bound to fructose, while the latter is linked to a galactose moiety in the non-natural substrate GalFru. Although glucose only differs from galactose in the spatial position of the hydroxyl group at the fourth carbon, SuSyAc is not able to efficiently convert GalFru into UDP-galactose, a valuable nucleotide sugar. Km values of SuSyAc for GalFru exceed 700 mM and the activity is below 0.3 U/mg. A possible explanation could be the absence of specific hydrogen bonds between GalFru and UDP or between GalFru and Gly-666Ac/Leu-667Ac. These interactions could be crucial to position the substrates correctly and to induce the structural rearrangements necessary to reshape the enzyme into its active, closed conformation.

In this contribution, a time- and cost-effective screening protocol to find SuSy mutants with improved activity on GalFru was successfully developed. SuSyAc LMDKVVA, a variant with improved affinity for UDP, is preferred as template for mutagenesis, as the subsequent screening of mutants requires a much smaller amount of the expensive nucleotide. Based on the crystal structure, first-shell residues (Ala-664Ac, Phe-665Ac, Gly-666Ac and Leu-667Ac), all part of the EX7E motif and positioned around the glucose moiety, and second-shell residues (Ala-426Ac, Ser-468Ac, Pro-513Ac, Gln-482Ac, Ala-660Ac and Ile-690Ac) were identified and subjected to mutagenesis in an attempt to increase the activity for GalFru. In total, five single-site saturation libraries, three double-site saturation libraries and 22 site-directed mutants were made but variants with improved activity on GalFru were not found. In addition, alternative strategies, involving the introduction of conserved GT4 GalT residues, the simultaneous mutation of correlated positions related to specificity (His-273Ac, Tyr-402Ac, Glu-472Ac and Phe-665Ac) and the production of iCLEAs, were unsuccessful. Nonetheless, the mutagenesis experiments increased our knowledge about structure-function relationships in SuSy and revealed some crucial residues for catalysis. For example, most of the mutations at position 665Ac, including smaller amino acids, severely reduced the activity on sucrose even though the side chain of Phe-665Ac points away from the active site. Other first-shell residues, such as Ala-664Ac, Gly-666Ac, Leu-667Ac and Ala-426Ac, appeared to be crucial as well and even small changes in second-shell residues (e.g. Ile-690Ac or Gln-482Ac) could reduce the activity. These results highlight the importance of the EX7E motif and its interaction partners for catalysis. In addition, several of the active-site mutations resulted in decreased expression levels in the soluble fraction, indicating their importance for proper folding.


6 Supplementary materials

Figure S11 Michaelis-Menten profiles of SuSyAc for GalFru in combination with 5 mM ADP or 20 mM UDP in 100 mM MOPS pH 7.0 at 60°C. Activity was determined with the BCA assay.

[Plot: OD at 540 nm versus UDP concentration (0 to 30 mM) for UDP from Carbosynth and Sigma.]

Figure S12 Effect of the concentration of UDP, bought from different companies, on OD measured at 540 nm using the BCA assay.

[Bar charts: OD at 540 nm for SuSyAc WT, SuSyAc LMDKVVA and Empty pCXP34h; left panel: Sucrose, right panel: GalFru.]

Figure S13 Activity on sucrose and GalFru using optimized screening conditions. (Left) Optical density at 540 nm (BCA assay) after incubation of CCE of SuSyAc WT, SuSyAc LMDKVVA or Empty pCXP34 (negative control) with 200 mM sucrose and 5 mM ADP at 37°C for 20 min. (Right) Optical density at 540 nm (BCA assay) after incubation at 52°C with 400 mM GalFru and 50 mM UDP for 20 hours (CCE of SuSyAc WT and Empty pCXP34) or 1 mM UDP for 16 hours (CCE SuSyAc LMDKVVA). GalFru contains some impurities (glucose, fructose and/or sucrose) explaining the high OD observed for Empty pCXP34. As the University of Würzburg delivered different batches with slightly different purity (ranging from 99,9 up to 99,99%), absolute OD values can differ from those presented in this figure.


Figure S14 SDS-PAGE analysis of the crude cell extract of SuSyAc WT and Phe-665Ac mutants. Variants F665N and F665S are clearly less expressed compared to the WT (~92 kDa).

Figure S15 Screening results of library SuSyAc LMDKVVA F665X + S468X (MP1). Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 1 mM UDP (52°C) was measured using the BCA assay. LMi: SuSyAc LMDKVVA inoculated manually (positive control).

Figure S16 Screening results of library SuSyAc LMDKVVA F665X + P513X (MP5). Enzyme activity with 200 mM sucrose and 5 mM ADP (37°C) or 400 mM GalFru and 1 mM UDP (52°C) was measured using the BCA assay. LMi: SuSyAc LMDKVVA inoculated manually (positive control).

Figure S17 Amino acid distribution (occurrence in percentage) at 3DM position 9, 74, 122 and 224 based on all sequences within the 3DM UDP-glycosyltransferase superfamily (130255 aligned sequences). Dashed lines represent the arbitrarily chosen occurrence cutoff of 8%. - : gap.


Table S10 List of primers used to construct site-directed mutants and site-saturation libraries.

Nr. | Name | Sequence 5’ → 3’
13 | pCXP34_BB_Rv | CTTTGTTTCCTCCGAATTCGAGGTC
14 | pCXP34_BB_Fw | CTGCAGGTCGACCATATGGG
30 | SuSyAc_seq3_Fw | TCGGAACGTCTGGGTGTTAC
40 | pcxp34_Rv_seq4_goed | CTTCTCTCATCCGCCAAAAC
58 | SuSyAc_G666P_Fw | CTGTTTGAAGCTTTCCCGCTGACCGTCATTGAA
64 | oMEMO351_RV_5'rrnB | AAAGGGAATAAGGGCGACAC
88 | SuSyAc_F665S_Fw | GCACTGTTTGAAGCTTCTGGCCTGACCGTCATTG
89 | SuSyAc_F665T_Fw | GCACTGTTTGAAGCTACCGGCCTGACCGTCATTG
90 | SuSyAc_F665N_Fw | GCACTGTTTGAAGCTAACGGCCTGACCGTCATTG
91 | SuSyAc_F665H_Fw | GCACTGTTTGAAGCTCATGGCCTGACCGTCATTG
93 | SuSyAc_F665G_Fw | GCACTGTTTGAAGCTGGCGGCCTGACCGTCATTG
95 | SuSyAc_EGLP_Fw | CCGGCACTGTTTGAAGGCCTGCCGCTGACCGTCATTG
96 | SuSyAc_EGFP_Fw | CCGGCACTGTTTGAAGGCTTCCCGCTGACCGTCATTG
97 | SuSyAc_EGVP_Fw | CCGGCACTGTTTGAAGGCGTGCCGCTGACCGTCATTG
98 | SuSyAc_P513N_Fw | AATTCAACATTGTGTCGAACGGCGCGGATCCG
109 | SuSyAc_A664S_Rv | GACGGTCAGGCCGAAAGATTCAAACAGTGCCG
110 | SuSyAc_A664T_Rv | GACGGTCAGGCCGAAGGTTTCAAACAGTGCCG
111 | SuSyAc_A664G_Rv | GACGGTCAGGCCGAAGCCTTCAAACAGTGCCG
112 | SuSyAc_A426G_Fw | TGCAATATCGCACATGGCCTGGAAAAAAGCAAATATC
145 | pcxp34_Fw_seq4_goed | GTTTTGGCGGATGAGAGAAG
150 | SuSyAc_S468NNK_Fw | CGGCCGACATTATCGTGACCNNKACGTATCAGGAAATC
152 | SuSyAc_Q482NNK_Fw | CGCCGGTAATGATCGTGAAATTGGCNNKTATGAAGGTCACCAAG
153 | SuSyAc_Q482_Rv | GCCAATTTCACGATCATTAC
154 | SuSyAc_P513NNK_Fw | AAAATTCAACATTGTGTCGNNKGGCGCGGATCCG
155 | SuSyAc_P513_Rv | CGACACAATGTTGAATTTTGAG
156 | SuSyAc_L667NNK_Fw | GCACTGTTTGAAGCTTTCGGCNNKACCGTCATTGAAGCAATGAG
157 | SuSyAc_L667_Rv | GCCGAAAGCTTCAAACAGTGC
164 | SuSyAc_P513_after_NNK_Fw | GGCGCGGATCCGCGCTTTTATTTC
177 | SuSyAc_L667A_Fw | GCACTGTTTGAAGCTTTCGGCCTGACCGTCATTGAAGC
182 | SuSyAc_I690N_Fw | TTCGGCGGTCCGCTGGAAAACATCGAAGATGGCGTGAGTG
183 | SuSyAc_I690Q_Fw | TTCGGCGGTCCGCTGGAACAGATCGAAGATGGCGTGAGTG
184 | SuSyAc_P659S_EG_Fw | GGTGTTTTTGTCCAGTCTGCACTGTTTGAAGG
188 | SuSyAc_A660S_Fw | GCGGTGTTTTTGTCCAGCCGTCTCTGTTTGAAG
189 | SuSyAc_A660S_Rv | CTTCAAACAGAGACGGCTGGACAAAAACACCGC
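The library primers in Table S10 carry the degenerate codon NNK (N = A, C, G or T; K = G or T). As an illustrative aside, the following minimal sketch enumerates the codons covered by NNK and translates them with the standard genetic code. The variable and function names are chosen here for illustration only and are not part of the original protocol.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
TRANSLATION = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), TRANSLATION)
}


def nnk_codons():
    """Return all codons matching NNK (N = any base, K = G or T)."""
    return [a + b + k for a in "ACGT" for b in "ACGT" for k in "GT"]


codons = nnk_codons()
encoded = {CODON_TABLE[c] for c in codons}
amino_acids = sorted(encoded - {"*"})          # residues reachable with NNK
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

print(len(codons), "codons,", len(amino_acids), "amino acids, stop codons:", stop_codons)
```

The enumeration shows why NNK is a common choice for site-saturation: it reaches every amino acid with fewer codons than NNN while retaining only a single stop codon.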


Table S11 Overview of site-directed mutants and libraries and the protocol used to construct them.

Mutant/Library | Template | Protocol | Fw primer | Rv primer

Site-directed mutants

I690N | SuSyAc | Mutated insert (Q5) | 182 | 40
 | | WPP | |
I690N | SuSyAc F665S | Sanchis | 182 | 40
I690Q | SuSyAc | Mutated insert (Q5) | 183 | 40
 | | WPP | |
I690Q | SuSyAc F665S | Mutated insert (Q5) | 183 | 40
 | | WPP | |
F665N | SuSyAc | Sanchis | 90 | 64
F665T | SuSyAc | Sanchis | 89 | 64
F665S | SuSyAc | Sanchis | 88 | 64
F665H | SuSyAc | Sanchis | 91 | 64
F665G | SuSyAc | Sanchis | 93 | 64
G666P | SuSyAc | Sanchis | 58 | 64
L667A | SuSyAc LMDKVVA | Mutated insert (Q5) | 177 | 40
P513N | SuSyAc G666P | Sanchis | 98 | 64
EGLP: A664G + F665L + G666P | SuSyAc QN6* | Sanchis | 95 | 64
EGVP: A664G + F665V + G666P | SuSyAc QN6* | Sanchis | 97 | 64
EGFP: A664G + G666P | SuSyAc QN6* | Sanchis | 96 | 64
P659S | SuSyAc QN6* EGFP | Mutated insert (Q5) | 184 | 40
 | | WPP | |
A426G + A664T | SuSyAc | Sanchis | 112 | 110
A426G + A664S | SuSyAc | Sanchis | 112 | 109
A664G | SuSyAc | Sanchis | 30 | 111
A664S | SuSyAc | Sanchis | 30 | 109

Libraries

P513X | SuSyAc LMDKVVA | Insert (PS) | 164 | 40
 | | Mutated insert (PS) | 154 | 40
 | | Vector (Q5) | 145 | 155
 | | CPEC | |
P513X | SuSyAc QN6* EGFP | Mutated insert (Q5) | 154 | 40
 | | Vector (Q5) | 145 | 155
 | | CPEC | |
S468X | SuSyAc LMDKVVA | Mutated insert (PS) | 150 | 40
 | | WPP | |
Q482X | SuSyAc LMDKVVA | Mutated insert (Q5) | 152 | 40
 | | Vector (Q5) | 145 | 153
 | | CPEC | |
L667X | SuSyAc LMDKVVA | Mutated insert (Q5) | 156 | 40
 | | Vector (Q5) | 145 | 157
 | | CPEC | |

Table S12 Galactosyltransferases belonging to the GT4 family.

Organism | Protein name
Aeromonas salmonicida | LPS α-1,6-galactosyltransferase (WasC)
Bacteroides fragilis | UDP-Gal α-galactosyltransferase (WcfP)
coli442 | Galactosyltransferase
Escherichia coli425 | UDP-D-galactose:(glucosyl)lipopolysaccharide-1,6-D-galactosyltransferase (WaaB)
Escherichia coli | LPS/UndPP-GlcNAc α-1,3-galactosyltransferase (WbdH)
Escherichia coli str. K-12 | LPS α-1,6-galactosyltransferase (WaaB; RfaB)
Klebsiella pneumonia443 | N,N'-diacetylbacillosaminyl-diphospho-undecaprenol alpha-1,3-N-acetylgalactosaminyltransferase activity (WbbO)
helveticus444 | α-1,6-galactosyltransferase (EpsG; Eps3)
Neisseria gonorrhoeae427 | pilin α-galactosyltransferase (PgtA)
Neisseria meningitidis246 | bifunctional alternating UDP-Gal: α-1,4-galactosyltransferase / CMP-Neu5Ac: α-2,6- (SiaD-W-135)
Proteus mirabilis428 | galactosyltransferase (WamB)
enterica429 | LPS 1,6-galactosyltransferase (RfaB)
dysenteriae430 | LPS/UndPP-GlcNAc α-1,3-galactosyltransferase (WbbP; RfpB)
mitis | galactosyltransferase (WefM)
Streptococcus oralis431 | α-galactosyltransferase (WefI)
Streptococcus pneumoniae445 | UDP-Gal: cellobiuronic acid-PPL α-galactosyltransferase (Cap8H; WciS)
Streptococcus suis | galactosyltransferase (Cps1/2G)
Streptococcus thermophilus426 | EPS α-1,6-galactosyltransferase (EpsF) (possible fragment)
Yersinia enterocolitica | UDP-Gal:Und-P-P-OC2-α-1,4-galactosyltransferase (WbcN) (possible frameshift)
Yersinia similis | α-galactosyltransferase (WbhW)


CHAPTER 5: Engineering the stability of SuSyAc


1 Abstract

Sufficient thermostability is a key requirement for enzymes catalyzing carbohydrate conversions in industry. Indeed, these reactions are preferably performed at 60°C, mainly to avoid microbial contamination. In this chapter, the stability of SuSyAc was evaluated and three main strategies were applied to improve this property: prediction of stabilizing mutations by specialized computer software (Rosetta and FoldX), introduction of consensus residues and stabilization of the most flexible domain within the protein. Three out of 24 predicted mutations increased the thermodynamic stability in vitro, but only to a small extent (≤1.7°C). In addition, none of the mutants for which kinetic stability was assessed showed significant improvements compared to the WT.

2 Introduction

High thermostability is a key requirement for enzymes catalyzing carbohydrate conversions in industry. Indeed, these reactions are preferably performed at 60°C, mainly to avoid microbial contamination. In addition, high temperatures can be beneficial to the overall process by reducing the viscosity, improving transfer rates and increasing solubility of substrates8,446. Thermostable enzymes can also be purified more easily as most of the contaminating proteins can simply be removed by a heat-treatment447. Furthermore, stable enzyme templates are preferred in protein engineering experiments involving the alteration of substrate specificity as mutations leading to altered protein function generally tend to be destabilizing448–452.

Unfortunately, many natural enzymes unfold at high temperatures, resulting in aggregation and loss of activity453. Increasing the stability of proteins can be achieved by addition of chemical additives, immobilization, random mutagenesis or rational strategies such as the B-fit method, consensus design, ancestral protein construction or chimeric designs180,220,224,228,230,454. Furthermore, stabilizing mutations can be predicted in silico by computer algorithms such as FoldX and Rosetta. These programs take into account all interactions that contribute to stability such as electrostatic interactions, Van der Waals interactions and hydrogen bonds and calculate the difference in free energy of folding (∆∆G)235,236. For more information, the reader is referred to the literature review. Although all techniques have proven to be successful, they all have their limitations and no gold standard exists so far.

An example of a carbohydrate active enzyme with industrial interest is Sucrose Synthase (SuSy), which catalyzes the reversible conversion of sucrose and a nucleoside diphosphate (NDP) into fructose and NDP-glucose84. Plant SuSys have temperature optima ranging between 40 and 55°C but stability is often severely compromised at higher temperatures84,91,322–324. SuSy from Solanum tuberosum, for example, displays maximal activity at 56°C but its stability at this temperature is very low91. Similarly, the cyanobacterial SuSy from T. elongatus has an optimum temperature of 70°C but its stability is strongly impaired above 55°C (e.g. only 30% residual activity after 10 min of incubation at 60°C)325. In this work, the thermodynamic and kinetic stability of the proteobacterial SuSyAc was scrutinized. Furthermore, potential stabilizing mutations were predicted by visual inspection of the homology model of SuSyAc, computational methods (Rosetta and FoldX) and the consensus approach and subsequently evaluated in vitro.

3 Materials and methods

3.1 In silico prediction of stabilizing mutations: FoldX and Rosetta

To predict stabilizing mutations in silico, FoldX 3.0-beta6 and Rosetta 3.5 were run on the high performance computing (HPC) cluster of Ghent University. All necessary information on how to connect to the HPC can be found on the HPC wiki (http://hpc.ugent.be/userwiki/index.php/User:VscConnect). Files were transferred from the computer of the user to the HPC (and vice versa) with WinSCP. In-house (Rosetta) or online (FoldX) available manuals were used to guide the process.

Because FoldX requires much less computational time compared to Rosetta, this algorithm was used to screen the whole enzyme for stabilizing mutations. First, an I-tasser homology model of SuSyAc was subjected to the repair function of FoldX with default parameters in order to avoid false results by distorted residues. At every position in the enzyme, all 20 naturally occurring amino acids were introduced (full saturation) and the corresponding ∆∆G was calculated with the build model function of FoldX using default parameters and five repetitions. As mutating residues in the catalytic site would be detrimental for the activity of the enzyme, all residues within 8 Å of the substrates were excluded from the analysis and considered fixed during energy calculations.
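The set-up of such a saturation scan can be sketched as follows. This is a minimal illustration, not the FoldX procedure itself: it assumes that a position is excluded as soon as any of its atoms lies within the cutoff distance of any substrate atom, and it lists only the 19 non-wild-type substitutions per position. The function names, the toy sequence and the coordinates are hypothetical.

```python
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positions_near_substrate(residue_atoms, substrate_atoms, cutoff=8.0):
    """Positions with at least one atom within `cutoff` (in Å) of a substrate atom."""
    near = set()
    for position, atoms in residue_atoms.items():
        if any(dist(a, s) <= cutoff for a in atoms for s in substrate_atoms):
            near.add(position)
    return near


def saturation_mutants(sequence, excluded):
    """All single substitutions (e.g. 'D2A') at 1-based positions not in `excluded`."""
    mutants = []
    for position, wild_type in enumerate(sequence, start=1):
        if position in excluded:
            continue
        mutants.extend(
            f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type
        )
    return mutants


# Toy input: three residues with one atom each and a single substrate atom
sequence = "MDK"
residue_atoms = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)], 3: [(0.0, 30.0, 0.0)]}
substrate_atoms = [(3.0, 4.0, 0.0)]

excluded = positions_near_substrate(residue_atoms, substrate_atoms)
mutants = saturation_mutants(sequence, excluded)
print(sorted(excluded), len(mutants))
```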

Next, the 105 top scoring point mutations of FoldX (∆∆G < -2 kcal/mol) were recalculated using the Rosetta algorithm with default parameters and 35 iterations. In addition, position 488 that appeared to be a crucial residue according to FoldX was fully saturated.
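The selection step can be illustrated with a short sketch that keeps all mutations below a ∆∆G cutoff and counts how often each position occurs in the resulting subset. The input dictionary contains a handful of FoldX values from Table 22 purely as an example; the function names are chosen for illustration and the actual ranking was done on the full FoldX output.

```python
from collections import Counter


def shortlist(ddg_by_mutation, cutoff=-2.0):
    """Return (mutation, ddG) pairs with ddG below the cutoff, most negative first."""
    hits = [(m, ddg) for m, ddg in ddg_by_mutation.items() if ddg < cutoff]
    return sorted(hits, key=lambda item: item[1])


def position_counts(mutations):
    """Count occurrences of each sequence position, e.g. 'D488T' -> 488."""
    return Counter(int(m[1:-1]) for m in mutations)


# A few FoldX ddG values (kcal/mol) taken from Table 22
foldx = {
    "D488T": -3.21,
    "D488C": -2.81,
    "D488K": 2.40,
    "S674L": -2.89,
    "A380T": 0.4,
    "E119M": -2.12,
}

hits = shortlist(foldx)
counts = position_counts(m for m, _ in hits)
print(hits)
print(counts.most_common(1))
```

Positions that occur repeatedly in the shortlist, such as 488 in this small example, are natural candidates for full saturation in the slower second algorithm.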

3.2 Construction of site-directed mutants

Mutations were introduced with the Sanchis or CPEC method as described before with primers listed in Table 21.

Table 21 List of primers used to construct the stability mutants. Fw: forward, Rv: Reverse.

Nr. | Name | Sequence 5’ → 3’
40 | pCXP34_Rv_seq4_goed | CTTCTCTCATCCGCCAAAAC
64 | oMEMO351_RV_5'rrnB T2 | AAAGGGAATAAGGGCGACAC
75 | SuSyAc_Fw_D488T | TATGAAGGTCACCAAACCTACACCCTGCCGGGC
76 | SuSyAc_Fw_D488C | TATGAAGGTCACCAATGCTACACCCTGCCGGGCCTG
77 | SuSyAc_Fw_D488K | ATGAAGGTCACCAAAAATACACCCTGCCGGGCCTG
78 | SuSyAc_Fw_D488W | ATGAAGGTCACCAATGGTACACCCTGCCGGGCCTG
79 | SuSyAc_Fw_S674L | GTCATTGAAGCAATGCTGTCCGGTCTGCCGGTG
80 | SuSyAc_Fw_S522Y | CCGCGCTTTTATTTCTATTACGCCCGTACGGAAG
81 | SuSyAc_Fw_F128M | GCTGACGGTTGATATGCGTGACTTCCGCCCGGTTAG
82 | SuSyAc_Fw_E119M | GGCCTGGGTGCAATGGGTGAAGCTGTGCTGAC
83 | SuSyAc_Fw_A380M | GCTGGAACGTTATATGCAGGATCTGGAACGC
84 | SuSyAc_Fw_N190L | GCTGTCGAACGGTCTGACCGATTTTGACAGCC
115 | SuSyAc_Fw_S522P | CCGCGCTTTTATTTCCCGTACGCCCGTACGGAAG
116 | SuSyAc_Fw_R167L | GATCTGGCGGCCGGTCTGTCCCAGATTCTGG
117 | SuSyAc_Fw_L36H | TGGCTGTACACCGATCATCAGCGTGCATGCGCTG
118 | SuSyAc_Rv_N703D | GGTTGCTTCGTGGTCATCCGGATCGATATGAAAAC
119 | SuSyAc_Fw_L36K | TGGCTGTACACCGATAAACAGCGTGCATGCGCTG
120 | SuSyAc_Fw_L36S | TGGCTGTACACCGATTCTCAGCGTGCATGCGCTG
121 | SuSyAc_Fw_L36Q | TGGCTGTACACCGATCAGCAGCGTGCATGCGCTG
122 | SuSyAc_Fw_L36N | TGGCTGTACACCGATAACCAGCGTGCATGCGCTG
123 | SuSyAc_Fw_R167F | GATCTGGCGGCCGGTTTTTCCCAGATTCTGG
124 | SuSyAc_Fw_L433S | GGAAAAAAGCAAATATTCTTACTCTGATCTGCATTG
125 | SuSyAc_Fw_L433Q | GGAAAAAAGCAAATATCAGTACTCTGATCTGCATTG
126 | SuSyAc_Fw_L433N | GGAAAAAAGCAAATATAACTACTCTGATCTGCATTG
130 | SuSyAc_Rv_L36S | CAGCGCATGCACGCTGAGAATCGGTGTACAGCCA
131 | SuSyAc_Rv_L36Q | CAGCGCATGCACGCTGCTGATCGGTGTACAGCCA
132 | SuSyAc_Rv_L36N | CAGCGCATGCACGCTGGTTATCGGTGTACAGCCA
133 | SuSyAc_Rv_L433N | CAATGCAGATCAGAGTAGTTATATTTGCTTTTTTCC
140 | SuSyAc_Rv_L36H | CAGCGCATGCACGCTGATGATCGGTGTACAGCC
141 | SuSyAc_Rv_L36K | CAGCGCATGCACGCTGTTTATCGGTGTACAGCC
142 | SuSyAc_Fw_N703D | GTTTTCATATCGATCCGGATGACCACGAAGCAACC
145 | pcxp34_Fw_seq4_goed | GTTTTGGCGGATGAGAGAAG
146 | SuSyAc_Rv_L36 | ATCGGTGTACAGCCAACTATC
148 | SuSyAc_Fw_S674T | GTCATTGAAGCAATGACCTCCGGTCTGCCGGTG
149 | SuSyAc_Fw_A380T | TGGCTGGAACGTTATACCCAGGATCTGGAACGCGAAG
158 | SuSyAc_Rv_S674 | CATTGCTTCAATGACGGTCAG
159 | SuSyAc_Rv_A380T | GCGTTCCAGATCCTGGGTATAACGTTCC
170 | SuSyAc_Rv_L433S | CAATGCAGATCAGAGTAAGAATATTTGCTTTTTTCC

3.3 Enzyme production and purification

Enzyme production and purification were performed according to the protocol described in Chapter 2 for SuSyAc.

3.4 Determination of kinetic and thermodynamic stability

Activity was assayed in the cleavage direction with the BCA assay as described before. Kinetic stability of purified SuSyAc was evaluated in the absence and presence of sucrose. In the first case, 0.17 mg/mL of the enzyme was incubated at 55°C or 60°C in a thermoblock and the residual activity was determined with 200 mM sucrose and 5 mM ADP (100 mM MOPS pH 7.0) at different time points with the BCA assay. In the second experimental set-up, 0.02 mg/mL of the enzyme was incubated with 400 mM sucrose at 45°C, 52°C and 60°C in a thermoblock and the residual activity was determined at different time points with sucrose and ADP in final concentrations of 360 mM and 5 mM (100 mM MOPS pH 7), respectively. The temperature of a blank solution in the thermoblock was monitored using a thermometer. Alternatively, 0.17 mg/mL of SuSyAc was incubated for 1 hour in a PCR device at 60°C. Residual activity of purified SuSyAc mutants (200 mM Suc, 5 mM ADP, 60°C) was evaluated after incubation of 0.17 mg/mL enzyme without sucrose for 1 hour at 60°C (thermoblock).

Thermodynamic stability was determined by differential scanning fluorimetry (DSF) using a real-time PCR detection system (CFX Connect, Bio-Rad). SYPRO Orange dye (Invitrogen) was diluted 400 times in 100 mM MOPS pH 7.0 and 5 µL of this dilution was added to 20 µL of 1 mg/mL enzyme solution in a PCR plate. Next, the temperature inside the PCR device was gradually increased at a rate of 1°C per minute from 20°C to 95°C. The fluorescence intensity (excitation/emission: 450 to 490 nm/560 to 580 nm) was measured at every 1°C increase. Melting temperatures (Tm) of the enzymes were determined by CFX Manager Program (Bio-Rad) as the minimal value of the negative first derivative of the melting curve. The latter represents the relative fluorescence intensity units (RFU) as a function of the temperature.
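The principle behind this Tm determination can be sketched in a few lines. The example below is not the algorithm of the CFX Manager software, whose smoothing and interpolation steps are not described here; it simply approximates dRFU/dT with central differences and returns the temperature at which -dRFU/dT is lowest. The function name and the synthetic sigmoidal curve are assumptions made for the illustration.

```python
from math import exp


def melting_temperature(temps, rfu):
    """Temperature at which the negative first derivative -dRFU/dT is minimal."""
    if len(temps) != len(rfu) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_temp, best_value = None, None
    for i in range(1, len(temps) - 1):
        # Central difference approximates dRFU/dT at temps[i]
        slope = (rfu[i + 1] - rfu[i - 1]) / (temps[i + 1] - temps[i - 1])
        negative_derivative = -slope
        if best_value is None or negative_derivative < best_value:
            best_temp, best_value = temps[i], negative_derivative
    return best_temp


# Synthetic melting curve: one reading per °C from 20 to 95°C,
# sigmoidal unfolding transition centred at 65°C (made-up data)
temps = list(range(20, 96))
rfu = [1000.0 / (1.0 + exp(-(t - 65) / 2.0)) for t in temps]

tm = melting_temperature(temps, rfu)
print(tm)
```

With one reading per degree, the resolution of this simple estimate is limited to 1°C; instrument software typically refines the position of the minimum further.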

3.5 Statistical analysis

Sample standard deviations were determined using the STDEV.S function of Excel. At least three replicates were performed in each case. The statistical significance of the difference between parameters (e.g. Tm) was determined in R using the Wilcoxon rank sum test. The null hypothesis (parameters are not statistically different) was rejected if p<0.05.
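For readers who want to reproduce this type of analysis outside Excel and R, the sketch below shows the same two calculations in plain Python. It is an illustration under stated assumptions: the sample standard deviation uses `statistics.stdev` (n - 1 in the denominator, as in STDEV.S), and the rank sum test is an exact two-sided test obtained by enumerating all rank assignments, which may differ from the R output when ties are present. The Tm values are hypothetical.

```python
from itertools import combinations
from statistics import stdev


def midranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Exact two-sided p-value of the Wilcoxon rank sum test by enumeration."""
    ranks = midranks(list(x) + list(y))
    n1 = len(x)
    observed = sum(ranks[:n1])
    expected = n1 * (len(ranks) + 1) / 2
    deviation = abs(observed - expected)
    sums = [sum(c) for c in combinations(ranks, n1)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-9)
    return extreme / len(sums)


# Hypothetical Tm replicates (°C) for a wild type and a mutant
wt = [64.5, 64.8, 65.0, 65.3, 65.4]
mutant = [66.4, 66.5, 66.7, 67.0]

print("SD WT:", round(stdev(wt), 2))
p_value = rank_sum_p(wt, mutant)
print("p =", round(p_value, 4), "significant" if p_value < 0.05 else "not significant")
```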

4 Results and discussion

4.1 Stability of wild-type SuSyAc

As outlined in the literature review, two types of stability can be evaluated: the thermodynamic and kinetic stability, characterized by the melting temperature (Tm) and the half-life of inactivation (t50), respectively212. Kinetic stability of SuSyAc was examined by determining the residual activity of the enzyme after incubation in a thermoblock for a defined period of time at 55 and 60°C in the absence of sucrose and at 45°C, 52°C and 60°C in the presence of sucrose (Figure 50).

Figure 50 Thermostability of SuSyAc in the absence (left) and presence (right) of 400 mM sucrose (Suc).

In the absence of sucrose, the enzyme retained its full activity for at least 30 minutes at 55°C.

The t50 of SuSyAc at 60°C was 27 min. If the enzyme was incubated together with its substrate sucrose, the t50 at 60°C was prolonged to 4 hours. This is not surprising, as it is known that sucrose can act as a stabilizing agent264,326. After two days of incubation at 52°C and 45°C with 400 mM Suc, SuSyAc displayed a residual activity of 63% and 92%, respectively. Alternatively, SuSyAc was also incubated in a prewarmed PCR device. In contrast to the residual activity observed using the thermoblock at 60°C (24.3 ± 8.9 %), the remaining activity after incubation of 1 hour at 60°C in the PCR machine was still 85.7 ± 9.5 %. Clearly, the temperature of the actual solution differs markedly between the two experimental set-ups and small differences at high temperatures probably have a profound effect on the stability of the enzyme. It has to be noted that the temperature of a reference solution (MOPS buffer) in the thermoblock was monitored because it differed significantly from the indicated temperature on the display (up to 5°C) of the device, especially at higher temperatures. This is however not possible in a PCR machine, although evaporation is more efficiently prevented here by the heated lid making it more appropriate for long-term incubations. These findings should always be kept in mind when designing experiments for assaying thermostability and when comparing results from different studies.
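A half-life such as the t50 values reported above can be estimated from a residual-activity time course in several ways. The sketch below shows one simple option, linear interpolation between the two sampling points that bracket 50% residual activity. It is an assumption that this resembles the way the reported values were derived, since the fitting procedure is not specified here, and the time course in the example is invented for illustration only.

```python
def half_life(times, residual_activity):
    """Time at which residual activity (%) first drops below 50%.

    Uses linear interpolation between the two bracketing time points and
    returns None if the activity stays at or above 50% throughout.
    """
    points = list(zip(times, residual_activity))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 >= 50.0 > a1:
            return t0 + (a0 - 50.0) * (t1 - t0) / (a0 - a1)
    return None


# Invented time course: minutes of incubation and % residual activity
times = [0, 15, 30, 60]
activity = [100.0, 80.0, 40.0, 20.0]

t50 = half_life(times, activity)
print(t50)
```

Because the estimate depends only on the two points around the 50% crossing, it is sensitive to the scatter in individual activity measurements.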

To determine the thermodynamic stability of SuSyAc, purified protein was subjected to differential scanning fluorimetry (DSF) using SYPRO Orange as a fluorescent dye. SYPRO Orange is quenched in an aqueous environment but becomes fluorescent when it binds to the hydrophobic parts of a protein, which become exposed during unfolding. A qPCR device was used to measure the fluorescence as a function of temperature, and for SuSyAc this resulted in a characteristic melting curve (Figure 51, left).

Figure 51 Melting curves and melting peaks of SuSyAc WT for six replicates. The melting temperature Tm of SuSyAc was 65°C on average. For some replicates, additional minima were observed around 42°C, 48°C and 82°C. RFU: relative fluorescence units.

The Tm (melt peak) of the WT was 65.0°C with a standard deviation of 0.4°C. In accordance with the kinetic data, the melting temperature of SuSyAc was increased to 66°C and 70°C in the presence of 160 and 800 mM sucrose, respectively. In addition to the large peak at 65°C, smaller minima around 42°C, 48°C and 82°C were observed for some replicates (Figure 51). Multiple peaks in a thermogram can be attributed to subsequent steps in the unfolding process of multimeric proteins: the dissociation of the oligomeric subunits of the protein, followed by unfolding of the monomers213,214. SuSyAc is indeed a tetramer330, but this would not explain why some replicates did not display the minima below 50°C. Alternatively, the smaller peaks could possibly originate from contaminating endogenous proteins in the sample. Depending on the purification batch, these proteins could be present in lower or higher amounts resulting in the absence or presence of the smaller peaks. Nevertheless, the method is much less laborious compared to kinetic measurements and the melting temperature derived from the most prominent peak could be determined with high reproducibility for different batches of purified SuSyAc.

4.2 Increasing the stability of flexible regions

It has been suggested that protein regions with increased flexibility are more prone to unfolding and present good targets to increase the thermostability221,455–458. To locate the most flexible regions of SuSy, the residues of the crystal structure of SuSyAt1 (PDB id: 3S28) were colored according to their B-factors (Figure 52). B-values or B-factors are derived from X-ray diffraction data and reflect the fluctuations of atoms with respect to their equilibrium positions. High B-factors are therefore indicative of flexible residues.
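The average B-factor per residue, as used for the coloring, can be computed directly from the ATOM records of a PDB file. The sketch below assumes the standard fixed-column PDB format (chain identifier in column 22, residue number in columns 23-26, B-factor in columns 61-66) and averages over all atoms of a residue; the helper `atom_line` and the B-factor values only generate hypothetical records for the demonstration and are not taken from 3S28.

```python
from collections import defaultdict


def mean_b_factors(pdb_lines):
    """Average B-factor over all atoms of each residue, keyed by (chain, number)."""
    values = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]
            residue_number = int(line[22:26])
            b_factor = float(line[60:66])
            values[(chain, residue_number)].append(b_factor)
    return {key: sum(v) / len(v) for key, v in values.items()}


def atom_line(serial, name, residue, chain, number, b_factor):
    """Build a minimal ATOM record with dummy coordinates (demonstration only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{number:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}"
    )


# Hypothetical records: two atoms of residue 1 and one atom of residue 2
records = [
    atom_line(1, "N", "MET", "A", 1, 20.0),
    atom_line(2, "CA", "MET", "A", 1, 30.0),
    atom_line(3, "N", "ASP", "A", 2, 55.5),
]

averages = mean_b_factors(records)
for (chain, number), value in sorted(averages.items()):
    print(chain, number, round(value, 2))
```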

Figure 52 Cartoon representation of SuSyAt1 (PDB 3S28), colored based on the average B-value of the residue. Blue: low B-factor, red: high B-factor, green: average B-factor.

Clearly, the most flexible regions of SuSy are the N-terminal CTD domain (orange/red), part of the EPBD and beginning of the C-terminal domain (green) (Figure 52). Interestingly, these domains are also involved in subunit interactions between the different monomers. The catalytic domains, on the other hand, are quite rigid (blue) as could be expected.

Based on visual inspection of the homology model of SuSyAc, four candidate residues were identified which could possibly stabilize the N-terminal domain upon mutation. Substitution of Leu-433 and Leu-36 into a hydrophilic residue could result in a stabilizing hydrogen bond with Thr-34 and Asn-703, respectively (Figure 53). Alternatively, a salt bridge could be introduced between these residues. Either way, this could lead to a strong interaction between the catalytic GT-B domains and the flexible N-terminal CTD domain. In total, 8 mutants were made: L433S/N/Q, L36S/N/Q, L36K + N703D and L36H + N703D.

Figure 53 Cartoon representation of SuSyAc (homology model) with positions that could stabilize the flexible N-terminal domain (red).

In contrast to the other mutants, expression of L36K/H + N703D and L36Q/N was severely reduced compared to the WT. Melting temperatures were determined for all these mutants (Figure S18 and Figure S19). In addition, kinetic stability was determined for L433S/N/Q and L36S, which displayed similar activities compared to the WT (Figure 54). Only L433Q had a slightly higher Tm (66°C), while five mutations had a negative effect on thermodynamic stability. Furthermore, none of the variants displayed a higher kinetic stability compared to the WT. Overall, these results indicate that the flexibility of the N-terminal domain is not the crucial factor during unfolding, or alternatively, that the introduced hydrogen bonds/salt bridges are not formed as predicted based on the homology model or are not sufficient to stabilize the domain.

[Bar chart: % residual activity at 60°C (left axis) and Tm in °C (right axis) for SuSyAc, L433S, L433N, L433Q, L36S, L36Q, L36N, L36K + N703D and L36H + N703D.]

Figure 54 Thermodynamic and kinetic stability of Leu-433, Leu-36 and Asn-703 mutants.


4.3 Predicting stabilizing mutations in silico

To predict stabilizing mutations in silico for SuSyAc, FoldX236 and Rosetta235 were used. These programs both calculate the difference in free energy of folding (∆∆G) between the WT enzyme and the mutant, albeit with different scoring functions. FoldX, which has a much shorter computational time compared to Rosetta, was used to screen the whole enzyme. As no crystal structure of SuSyAc is available, a homology model was used as input structure. At every position in the enzyme, all 20 naturally occurring amino acids were introduced and the corresponding ∆∆G was calculated. As mutating residues in the catalytic site would be detrimental for the activity of the enzyme, all residues within 8 Å from the substrate were excluded from the analysis. The result of the FoldX calculations was an array of all mutations with ∆∆G values ranging from -4.303 to 80.431 kcal/mol. 3207 mutations yielded a negative ∆∆G and were thus predicted to be more stable. Since small differences in the scoring function of an algorithm can greatly influence the in silico prediction of stabilizing mutations459, Rosetta was used as a second opinion. Because of the large computational time required by this program, only those mutants with a ∆∆G lower than -2 kcal/mol according to FoldX, 105 in total, were re-evaluated. In addition, all 20 possible mutants at position 488 were re-calculated as this residue was clearly overrepresented in the subset of 105 FoldX mutants and thus highly likely to be important for stability. Asp-488 is situated in the GT-BN domain of SuSyAc, albeit far away from the ligands, and interacts with the N-terminal domain through a hydrogen bond with Tyr-88 (Figure 55).

Figure 55 Visual representation of Asp-488 and its interaction with Tyr-88 in the homology model of SuSyAc. The hydrogen bond is represented by a dotted yellow line. The catalytic GT-B domains are colored orange, the N-terminal domain red and the EPBD domain white.

The ∆∆G values calculated by Rosetta ranged from -30.26 to 72.632 kcal/mol. Interestingly, saturation of position 488 in Rosetta yielded two high-scoring mutations, D488K (-27.9 kcal/mol) and D488W (-23.6 kcal/mol), which were predicted to be destabilizing (positive ∆∆G) according to FoldX. Using the homology model of SuSyAc, high-scoring mutations were investigated for their ability to form hydrogen bonds, salt bridges and hydrophobic interactions. Residues that were highly conserved in SuSy enzymes or mutations that resulted in loss of hydrogen bonds (e.g. D488V) were discarded. Finally, eight positions were selected for mutagenesis (Table 22). Seven of the proposed mutations (R167L, S522Y, D488K, D488W, N190L, A380M and E119M) belonged to the top 8 stabilizing residues according to Rosetta and the other four (D488T, D488C, S674L and F128M) were selected from the top 20 residues according to FoldX. Except for Asp-488, all positions are located at the outer side of the enzyme, although the side chains of Asn-190, Ala-380, Glu-119 and Phe-128 point towards the inside (Figure S20). The main energy terms predicted to contribute to the stabilization of the mutants were polar or hydrophobic solvation energy, main chain entropy, internal energy of sidechain rotamers and/or Van der Waals interactions.

Table 22 Difference in free energy of folding (∆∆G in kcal/mol) between selected mutants and SuSyAc WT, calculated by FoldX and/or Rosetta. More negative values indicate more stable proteins. N.d.: not determined. C: consensus residue.

Mutation | FoldX | Rosetta
D488T | -3.21 | -1.66
D488C | -2.81 | 2.30
D488K | 2.40 | -27.9
D488W | 1.13 | -23.5
S674L | -2.89 | 1.80
S674T (C) | -0.50 | n.d.
S522Y | -2.69 | -30.2
S522P (C) | -2.47 | n.d.
F128M | -2.64 | -2.38
E119M | -2.12 | -14.95
A380M | -2.10 | -18.54
A380T (C) | 0.4 | n.d.
N190L | -2.94 | -22.04
R167L | -2.35 | -22.6
R167F | 3.13 | n.d.

In addition to the eleven mutants predicted to be stabilizing according to FoldX or Rosetta, four other mutants were tested in vitro: S674T, A380T, S522P and R167F. The latter was predicted to be more stable due to a potential hydrophobic stacking interaction with Phe-193. The others were chosen according to the consensus method, which states that improved stability can be achieved by introducing the most frequently occurring residue of a sequence alignment of homologous enzymes228.

To calculate the amino acid distributions at the positions selected from the FoldX/Rosetta analysis, an alignment of 498 SuSy homologues was used (Figure S21). Interestingly, the substitutions proposed by FoldX and Rosetta rarely occur in existing SuSy sequences (Leu-674: 1%, Met-380: 0%, Tyr-522: 0%, Leu-190: 12%, Met-128: 2%, Thr-488: 1%, Cys-488: 7%, Lys-488: 1%, Trp-488: 0%). Three positions in SuSyAc were selected to introduce consensus residues: 674, 380 and 522. In other SuSys, a threonine is the most frequently occurring residue at the first two positions while proline is most prevalent at position 522. In addition, Pro is the most rigid residue and has previously been introduced in other proteins to enhance thermostability456,457,460.
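The calculation of such amino acid distributions and of the consensus residue at a given alignment column can be sketched as follows. The toy alignment and the function names are hypothetical and serve only to illustrate the principle; the actual analysis was based on the alignment of 498 SuSy homologues.

```python
from collections import Counter


def column_distribution(aligned_sequences, column):
    """Percentage of each symbol (including '-' for gaps) in one alignment column."""
    counts = Counter(seq[column] for seq in aligned_sequences)
    total = sum(counts.values())
    return {symbol: 100.0 * n / total for symbol, n in counts.items()}


def consensus_residue(aligned_sequences, column):
    """Most frequent non-gap residue in the column, or None if only gaps occur."""
    counts = Counter(
        seq[column] for seq in aligned_sequences if seq[column] != "-"
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy alignment of four sequences (0-based column indices)
alignment = ["MTSA", "MTPA", "MSPA", "M-PA"]

distribution = column_distribution(alignment, 2)
print(distribution)
print(consensus_residue(alignment, 2), consensus_residue(alignment, 1))
```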

Remarkably, despite the low occurrence of proposed mutations in other SuSy enzymes, all 15 mutants were successfully expressed and most of them also showed similar activities compared to the WT. To evaluate the in silico predicted stabilizing mutations, thermodynamic and/or kinetic stability of the purified enzymes was determined (Figure 56 and Figure S22-Figure S25).

[Bar chart: % residual activity at 60°C (left axis) and Tm in °C (right axis) for the mutants listed in the caption.]

Figure 56 Thermodynamic and kinetic stability of Asp-488, Phe-128, Ser-522, Ser-674, Glu-119, Ala- 380, Asn-190 and Arg-167 mutants. Melting temperatures (Tm) reported, are those found in the same range of the most prominent melting peak of the WT. C: consensus residue.

Mutants D488K and D488W, which were in the top 3 according to Rosetta but predicted to be destabilizing according to FoldX, only had 50% activity compared to the WT and completely lost their activity within 15 minutes at 60°C. For D488K, the melting temperature was also severely decreased (between 42°C and 48°C, Figure S22). Mutants F128M and E119M no longer showed any activity after 30 min and 45 min of incubation, respectively. These two enzymes, however, still displayed melting temperatures between 64 and 65°C, indicating that both types of thermostability were not related in this case. None of the mutants displayed a higher kinetic stability, and only S522Y (66°C) and A380M (66.7°C) had statistically significantly higher melting temperatures compared to the WT (65°C). Interestingly, S522Y was predicted by Rosetta to be the most stabilizing mutation. However, combining the two best mutations regarding thermodynamic stability (A380M + S522Y) yielded a variant with a lower Tm (63.3°C) compared to the WT.

5 Conclusions

The half-life of inactivation at 60°C and the melting temperature of SuSyAc were found to be 27 min and 65°C, respectively. Sucrose, one of the substrates of the enzyme, acts as a stabilizer: it prolonged the t50 at 60°C to 4 hours and increased the Tm by 1-5°C, depending on the concentration of the disaccharide. It has to be noted that kinetic stability measurements often suffered from high variability. In addition, large differences in stability were observed depending on the device used for incubation (PCR device vs thermoblock). Tm values determined by DSF, on the other hand, were much easier to reproduce, even with different batches of enzyme. However, it remains unclear why multiple melting peaks were present in some cases.

Three main strategies were applied to improve the stability of SuSyAc: prediction of stabilizing mutations by FoldX and Rosetta, introduction of consensus residues and stabilization of the most flexible domain within the protein. Remarkably, all four mutations at position 488, which were ranked very high according to FoldX or Rosetta, markedly decreased the thermodynamic and/or kinetic stability of the enzyme. Introduction of consensus residues appeared to have a neutral or negative effect on the melting temperature. None of the mutants for which kinetic stability was assessed showed significant improvements compared to the WT. Mutation L433Q, predicted to stabilize the flexible N-terminal domain, and mutations S522Y and A380M, predicted to be stabilizing according to FoldX and Rosetta, increased the thermodynamic stability, but only to a small extent (≤1.7°C). In addition, the double mutant A380M + S522Y even had a lower melting temperature than the WT. In summary, most of the mutations predicted to be stabilizing according to the consensus approach, computer algorithms or visual inspection of the homology model of SuSyAc did not result in more stable enzyme variants in in vitro experiments.
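To make the reported half-lives easier to interpret, the sketch below relates a half-life to residual activity, and recovers a half-life from a time course. It assumes simple first-order inactivation, which the text does not state and which the observed variability may not fully support. The function names are hypothetical; only the two t50 values (27 min, and 4 hours with sucrose) are taken from the text.

```python
import math


def residual_activity(t_min, t_half_min):
    """Residual activity (% of initial) after t_min, assuming first-order decay."""
    k = math.log(2) / t_half_min          # inactivation rate constant (1/min)
    return 100.0 * math.exp(-k * t_min)


def half_life_from_points(times_min, residual_pct):
    """Estimate t50 from a least-squares fit of ln(activity) versus time."""
    ys = [math.log(a / 100.0) for a in residual_pct]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    slope = sxy / sxx                     # equals -k for first-order decay
    return math.log(2) / -slope


WT_T50_MIN = 27.0         # from the text: t50 of SuSyAc at 60°C
SUCROSE_T50_MIN = 240.0   # from the text: t50 in the presence of sucrose (4 hours)

# Synthetic time course generated from the model itself, for illustration only.
times = [0.0, 10.0, 20.0, 40.0]
activities = [residual_activity(t, WT_T50_MIN) for t in times]
fitted_t50 = half_life_from_points(times, activities)

after_one_half_life = residual_activity(WT_T50_MIN, WT_T50_MIN)
with_sucrose = residual_activity(WT_T50_MIN, SUCROSE_T50_MIN)
print(fitted_t50, after_one_half_life, with_sucrose)
```

With real, noisy data the log-linear fit is sensitive to low residual activities, which is one possible contributor to the variability mentioned above.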


6 Supplementary materials

Figure S18 Melt curves of L433S/N/Q and L36S, determined by DSF. Panels: SuSyAc L433S, L433N, L433Q and L36S, each with a melt curve and a melt peak plotted against temperature (°C).

Figure S19 Melt curves of L36Q/N and L36H/K + N703D, determined by DSF. Panels: SuSyAc L36Q, L36N, L36K + N703D and L36H + N703D, each with a melt curve and a melt peak plotted against temperature (°C).


Figure S20 Visualization of the target positions for stability engineering (as determined by FoldX and/or Rosetta) in the homology model of SuSyAc.

Figure S21 Amino acid distributions (occurrence in percentage) of selected FoldX/Rosetta positions based on a sequence alignment of 498 putative SuSy sequences. Panels: F128, E119, R167, N190, A380, D488, S522 and S674; x-axis: R H K D E S T N Q C U G P A V I L M F Y W -; y-axis: % occurrence. Numbering is based on the sequence of SuSyAc. It has to be noted that the amino acid distribution at position 119 could be misleading as this residue is located in a region with many gaps.

Figure S22 Melt curves of D488T/C/K and F128M, determined by DSF. Panels: SuSyAc D488K, D488T, D488C and F128M, each with a melt curve and a melt peak plotted against temperature (°C).

Figure S23 Melt curves of S522P/Y and A380T/M, determined by DSF. Panels: SuSyAc S522Y, S522P, A380M and A380T, each with a melt curve and a melt peak plotted against temperature (°C).

Figure S24 Melt curves of R167F/L and S674T/L, determined by DSF. Panels: SuSyAc R167L, R167F, S674L and S674T, each with a melt curve and a melt peak plotted against temperature (°C).

Figure S25 Melt curves of E119M, N190L and A380M + S522Y, determined by DSF.

CHAPTER 6: Introducing phosphorylase activity into a glycosyltransferase


1 Abstract

The majority of glycosylation reactions in nature are performed by GTs, which use nucleotide-activated sugars as glycosyl donors to transfer a sugar residue to an acceptor molecule. Natural acceptors of these enzymes are sugars or non-saccharide molecules such as lipids, secondary metabolites (e.g. flavonoids, antibiotics) or proteins, and the resulting oligosaccharides or glycoconjugates have several applications in the food, feed, cosmetic and pharmaceutical industries11,403–407,461. However, large-scale application of GTs is currently still hampered by the high cost of their nucleotide-activated sugar donors. One way to circumvent this issue is to engineer the donor specificity of GTs towards smaller and cheaper glycosyl-phosphates. Successful conversion of a GT into a glycoside phosphorylase (GP) has, however, not yet been described in literature. Here, as a test case, Trehalose glycosylTransferring synthase from Fervidobacterium pennivorans (TreTFp) was subjected to site-directed mutagenesis to make it accept glucose 1-phosphate (Glc1P) instead of UDP-glucose. Target residues were chosen based on sequence comparison with retaining Trehalose Phosphorylases (TreP), which belong to the same CAZY family (GT4) as TreT but use Glc1P as donor substrate. In addition, seven chimeric enzymes consisting of (parts of) the GT-BN domain of TreTFp and (parts of) the GT-BC domain of TreP from Grifola frondosa were constructed.


2 Introduction

Trehalose (Tre) is a non-reducing disaccharide in which two α-D-glucose molecules are joined together by an α,α-(1-1)-glycosidic bond. It can withstand high temperatures (up to 120°C for 90 min) and remains stable in a wide pH range (between 3.5 and 10)462. Because of the preserving and stabilizing properties of trehalose, it has several biotechnological applications in research and in the pharmaceutical, cosmetic and food industries. Indeed, it can be used to preserve enzymes, vaccines, cells, tissues and organs in either dried or frozen form463–465. Furthermore, it is used as an additive in beverages, baking products, frozen, dried or processed food, chocolate and candy, but it is also naturally present in a variety of food products (e.g. mushrooms, honey, bread, beer)466,467. Due to its capacity to trap and reduce bad odors, trehalose is also used in the cosmetic industry as an additive in creams and deodorants462.

Trehalose biosynthesis and/or degrading capacity is present in most species from all three domains of the biological world. In these organisms, trehalose fulfills an important role as energy source468, as signaling molecule, as structural component in cell walls of some specific bacteria469 and/or as a compatible solute, which can protect the organism against different types of stress such as extreme temperatures, dehydration, osmotic and/or oxidative stress469–471. During the course of evolution, six trehalose metabolizing pathways involving different enzymes were optimized and these are summarized in Table S13472. The most common and best studied route for trehalose synthesis is the Trehalose 6-Phosphate Synthase (TPS)/Phosphatase (TPP) pathway472. It involves the production of trehalose 6-phosphate by TPS, starting from NDP-Glc and glucose 6-phosphate, and the subsequent cleavage of the phosphate group by TPP. Trehalose can also be formed by the combined action of Maltooligosyl-trehalose synthase (TreY) and maltooligosyl-Trehalose trehalohydrolase (TreZ) starting from maltooligosaccharides. Trehalose glycosylTransferring synthase (TreT, EC 2.4.1.245) catalyzes the one-step formation of trehalose using NDP-Glc as donor and α-D-glucose as acceptor. In addition, some TreTs are also able to degrade trehalose with NDP473–475. TreT should not be confused with Trehalose Synthase (TreS, EC 5.4.99.16), which is a maltose α-D-glucosyltransferase that catalyzes the reversible conversion of maltose into trehalose. Reversible phosphorolysis of trehalose by Trehalose Phosphorylase can occur either with retention (TreP, EC 2.4.1.231) or inversion (EC 2.4.1.64) of configuration, yielding α-D-glucose 1-phosphate (Glc1P) or β-D-glucose 1-phosphate, respectively. Finally, trehalose can also be hydrolyzed by trehalases (TreH).

Interestingly, the retaining TreP and TreT belong to the same glycosyltransferase family, GT4, which is rather unique. Indeed, all other known disaccharide phosphorylases, including the inverting Trehalose Phosphorylase, belong to a glycoside hydrolase family (GH13, GH65, GH94 or GH112) according to the CAZY classification476. Furthermore, polysaccharide phosphorylases such as glycogen and starch phosphorylase belong to GT35 while their synthase counterparts belong to the GT3 or GT5 families. The classification of TreT and retaining TreP in the same CAZY family does not only imply that they have a similar sequence, it also suggests a similar mechanism and structure and evolution from a common ancestor. Their shared ancestry presents an ideal opportunity to examine the possibility of turning TreT into a TreP by mutagenesis, using sequence information of the latter. Not only would this increase our knowledge about structure-function relationships in this family tremendously, it could also be the first step towards a general strategy to turn a glycosyltransferase into a glycoside phosphorylase. This would allow us to use the wide range of available GTs for the cost-effective glycosylation of more interesting targets (e.g. resveratrol), as glycosyl-phosphates are much cheaper than nucleotide sugars.

3 Materials and methods

3.1 Alignments and phylogenetic analyses

The sequences of Trehalose synthase from Pyrococcus horikoshii (TreTPh, accession number O58762) and Trehalose Phosphorylase from Grifola frondosa (TrePGf, accession number O75003) were used as templates in the BLAST tool of the UniProtKB477 site to extract 250 putative TreT and 250 putative TreP homologues, respectively. These were used to build a multiple sequence alignment using Clustal Omega297. Several evolutionary analyses were then conducted in MEGA6 or MEGA7172. The amino acid substitution model that fitted the data best, the LG model in this particular case, was determined using a maximum likelihood approach. The phylogenetic tree was inferred by using the Maximum Likelihood method based on the LG substitution matrix478. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then the topology with superior log likelihood value was selected. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories). 500 bootstrap replicates were used to evaluate trees with smaller subsets of sequences, while only 150 replicates were used with the total set of 500 putative TrePs and TreTs because of the large computational time.

3.2 Cloning of TreTFp from an inducible expression system (pET21a) into a constitutive one (pCXP34)

The TreTFp pET21a construct was kindly provided by Renfei Zhao. It consisted of a pET21a vector harboring a codon-optimized gene for E. coli expression, coding for the Trehalose synthase from Fervidobacterium pennivorans (TreTFp, accession WP_014451106 or UniProtKb ID H9UAU7), including a linker region (Leu-Glu) and a 6x His-tag at the C-terminal end.

The TreTFp gene was amplified from the available pET21a vector using primers 113 and 114 (Table 23), which consisted of a part complementary to the TreTFp gene and an overhang complementary to the pCXP34 vector. The pCXP34 backbone was amplified with primers 13 and 14 (Table 23). Amplification was performed with Q5 polymerase using the Q5 PCR protocol as previously described. Afterwards, the resulting mix was supplemented with DpnI and incubated at 37°C for at least 1 hour to remove the original methylated plasmid DNA. The linear fragments were purified using a PCR purification kit (Analytik Jena) and joined together using the CPEC protocol as described before. The resulting plasmid was transformed into electrocompetent E. coli BL21 (DE3) cells.

Table 23 List of primers used to construct TreTFp pCXP34. Fw: forward, Rv: Reverse.

Nr. | Name | Sequence 5’ → 3’
13 | pCXP34_BB_Rv | CTTTGTTTCCTCCGAATTCGAGGTC
14 | pCXP34_BB_Fw | CTGCAGGTCGACCATATGGG
113 | TretFp_Fw_pCXP34 | CGAATTCGGAGGAAACAAAGATGACGGTGGAACTGCCGGTC
114 | TretFp_Rv_pCXP34 | CCCATATGGTCGACCTGCAGTCAGTGGTGGTGGTGGTGGTGTTTCAC
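The overhang design described for primers 113 and 114 can be checked against the backbone primers 13 and 14 of Table 23. The Python sketch below is an illustrative check only, not part of the original protocol. The helper names `reverse_complement` and `overhang_length` are hypothetical; the sequences are copied from Table 23. For each insert primer, it reports how many bases at the 5' end match the 3' end of the reverse complement of the corresponding backbone primer.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def overhang_length(insert_primer, backbone_primer):
    """Longest 5' stretch of insert_primer that matches the 3' end of the
    reverse complement of backbone_primer (0 if there is no match)."""
    target = reverse_complement(backbone_primer)
    insert_primer = insert_primer.upper()
    longest = min(len(insert_primer), len(target))
    for k in range(longest, 0, -1):
        if insert_primer[:k] == target[-k:]:
            return k
    return 0


# Primer sequences as listed in Table 23 (5' -> 3').
PRIMERS = {
    13: "CTTTGTTTCCTCCGAATTCGAGGTC",
    14: "CTGCAGGTCGACCATATGGG",
    113: "CGAATTCGGAGGAAACAAAGATGACGGTGGAACTGCCGGTC",
    114: "CCCATATGGTCGACCTGCAGTCAGTGGTGGTGGTGGTGGTGTTTCAC",
}

fw_overlap = overhang_length(PRIMERS[113], PRIMERS[13])
rv_overlap = overhang_length(PRIMERS[114], PRIMERS[14])
print(fw_overlap, rv_overlap)
```

Both insert primers share a 20-nucleotide overlap with the amplified backbone ends, which is the kind of homologous region that an overlap-based assembly such as CPEC relies on.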

3.3 Construction of site-directed mutants and chimeric enzymes

Site-directed mutants of TreTFp and TreTFp/TrePGf chimeric enzymes were made using the CPEC or Sanchis protocol as described before, with the primers and template plasmids listed in Table 24 and Table 25. The TrePGf pET21a plasmid was kindly provided by Renfei Zhao.

Table 24 List of primers used to construct TreTFp site-directed mutants and chimeric enzymes. Fw: forward, Rv: Reverse.

Nr. | Name | Sequence 5’ → 3’
40 | pCXP34_Rv_seq4_goed | CTTCTCTCATCCGCCAAAAC
134 | TreTFp_G319E + L320V + V321K_Fw | ACGCGCGAAGGTTTCGAAGTGAAAATCTCCGAAATGATG
135 | TreTFp_G319E_Fw | ACGCGCGAAGGTTTCGAACTGGTGATCTCCGAAATG
136 | TreTFp_L320V_Fw | CGCGAAGGTTTCGGCGTGGTGATCTCCGAAATGATG
137 | TreTFp_T321K_Fw | GAAGGTTTCGGCCTGAAAATCTCCGAAATGATG
138 | TreTFp_E50L_Fw | GGCGGTGGCGTTGCACTGCTGCTGATGACGATTG
145 | pcxp34_Fw_seq4_goed | GTTTTGGCGGATGAGAGAAG
169 | TreTFp_E50L_Rv3 | GCGGGACAATCGTCATCAGCAGCAGTGCAACGCCACCG
190 | TreTFp_PIPSPK_Fw | GTTGAATTCCCGGCGAGCACCGATTGGCTGGATGGCCTGAATCGTGAAATTAG
191 | TreTFp_PIPSPK_Rv | CTAATTTCACGATTCAGGCCATCCAGCCAATCGGTGCTCGCCGGGAATTCAAC
192 | TreTFp_Chima_Fw | GTGAACTGCATGTGCAGGGACACCACCACCACCACCACTGACTGCAG
193 | TreTFp_Chimb_Fw | GGATGTATCTGGCAGTTATGCACCACCACCACCACCACTGACTGCAG
194 | TreTFp_Chim1_Rv | ATCGATGTGGCAGCGCCAAATG
195 | TreTFp_Chim2_Rv | ATCGATGCTCGGCGGGAATTC
196 | TreTFp_Chim3_Rv | TGCAACCACGGTGATAATCG
197 | TrePGf_Chim1_Fw | CATTTGGCGCTGCCACATCGATATTCGTAGCGATCTGGTTC
198 | TrePGf_Chima_Rv | TCCCTGCACATGCAGTTCAC
199 | TrePGf_Chimb_Rv | CATAACTGCCAGATACATCC
200 | TrePGf_Chim2_Fw | AATTCCCGCCGAGCATCGATGGTCTGAGCAAACATCTG
201 | TrePGf_Chim3_Fw | CGATTATCACCGTGGTTGCACGTTTTGATCCGAGCAAAGG
202 | TreTFp_Chimc_Fw | AATATGCCCGTACCCATGTTCGCAAAACCTATCTGTCAAC
203 | TrePGf_Chimc_Rv | AACATGGGTACGGGCATATTC

Table 25 Overview of methods, primers and template plasmids used to construct TreTFp site-directed mutants and chimeric enzymes.

Mutant | Protocol | Primers | Template plasmid
m1: E50L | Sanchis | 138 and 40 | TreTFp pCXP34
m2: G319E | Sanchis | 135 and 40 | TreTFp pCXP34
m4: T321K | Sanchis | 137 and 40 | TreTFp pCXP34
m5: L320V | Sanchis | 136 and 40 | TreTFp pCXP34
m6: G319E + L320V + V321K | Sanchis | 134 and 40 | TreTFp pCXP34
m7: G319E + L320V + V321K + E50L | CPEC | 138 and 40 (insert); 145 and 169 (vector) | m5
m10: P192A + I194T + P196W + S198D + P199G + K200L | CPEC | 190 and 40 (insert); 145 and 191 (vector) | TreTFp pCXP34
m11: G319E + L320V + V321K + E50L + P192A + I194T + P196W + S198D + P199G + K200L | CPEC | 190 and 40 (insert); 145 and 191 (vector) | m7
m12: G319E + L320V + V321K + P192A + I194T + P196W + S198D + P199G + K200L | CPEC | 190 and 40 (insert); 145 and 191 (vector) | m6
Chim 1a | CPEC | 192 and 194 (vector); 197 and 198 (insert) | TreTFp pCXP34; TrePGf pET21a
Chim 1b | CPEC | 193 and 194 (vector); 197 and 199 (insert) | TreTFp pCXP34; TrePGf pET21a
Chim 2a | CPEC | 192 and 195 (vector); 200 and 198 (insert) | TreTFp pCXP34; TrePGf pET21a
Chim 2b | CPEC | 193 and 195 (vector); 200 and 199 (insert) | TreTFp pCXP34; TrePGf pET21a
Chim 2c | CPEC | 202 and 195 (vector); 200 and 203 (insert) | TreTFp pCXP34; TrePGf pET21a
Chim 3a | CPEC | 192 and 196 (vector); 201 and 198 (insert) | TreTFp pCXP34; TrePGf pET21a
Chim 3b | CPEC | 193 and 196 (vector); 201 and 199 (insert) | TreTFp pCXP34; TrePGf pET21a

Amino acid sequences of TreTFp, TrePGf and all chimeras can be found in supplementary materials.

3.4 Expression and purification of TreTFp (mutants) from the pET21a and pCXP34 vector

Expression of the TreTFp gene, cloned into the pCXP34 vector, was performed as described in Chapter 2 for the SuSy enzymes. For the expression of the TreTFp gene cloned into the pET21a vector, which contains an inducible T7 promoter, a preculture was first inoculated from a cryovial in 5 mL LB-Miller supplemented with 100 µg/mL ampicillin. This preculture was incubated overnight at 37°C with continuous shaking at 200 rpm. Next, 1% (v/v) of the preculture was inoculated in 250 mL LB-Miller with 100 µg/mL ampicillin. This culture was grown at 37°C and 200 rpm until an OD of about 0.6 was reached (after approximately 2 hours). At this point, 50 µL 1 M isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to induce expression. The culture was allowed to grow further overnight at 16°C and 200 rpm. Afterwards, cells were harvested by centrifugation at 9000 rpm for 17 min in a Thermo Scientific Sorvall RC6+ centrifuge or for 25 min at 4500 rpm in a Rotixa 50 RS swing bucket centrifuge, and cell pellets were stored at -20°C.
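The volumes given above imply a final IPTG concentration and an inoculum volume that the text does not state explicitly. The sketch below works them out with the usual dilution relation (C1·V1 = C2·V2). The function name is hypothetical, and the calculation ignores any volume lost to sampling before induction.

```python
def final_concentration_mM(stock_molar, stock_volume_uL, culture_volume_mL):
    """Final concentration (mM) after adding a stock solution to a culture."""
    stock_volume_mL = stock_volume_uL / 1000.0
    total_volume_mL = culture_volume_mL + stock_volume_mL
    return stock_molar * 1000.0 * stock_volume_mL / total_volume_mL


# Values from the text: 50 µL of 1 M IPTG added to a 250 mL culture.
iptg_mM = final_concentration_mM(stock_molar=1.0,
                                 stock_volume_uL=50.0,
                                 culture_volume_mL=250.0)

# 1% (v/v) inoculum for 250 mL medium.
inoculum_mL = 0.01 * 250.0
print(round(iptg_mM, 3), inoculum_mL)
```

Under these assumptions induction takes place at approximately 0.2 mM IPTG, and the 250 mL culture is started with 2.5 mL of preculture.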

Purification of proteins was performed according to the protocol described in Chapter 2. The concentration of imidazole in the lysis, equilibration, wash and elution buffer was 0 mM, 10 mM, 30 mM and 250 mM in PBS pH 7.4, respectively.

3.5 Enzyme assays

3.5.1 GOD-POD assay

The GOD-POD assay can be used to determine glucose concentrations479,480. In this assay, glucose oxidase (GOD) converts glucose into gluconolactone and hydrogen peroxide using oxygen. The peroxide is subsequently reduced by peroxidase (POD or POX), using ABTS (2,2'-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid)) as electron donor. The oxidized ABTS has a green color, which can be measured with a spectrophotometer around 420 nm. The assay solution contained 69.2 µg/mL POD, 0.25 mg/mL ABTS and 452.594 µg/mL GOD dissolved in 100 mM Tris-HCl pH 7.0. Samples or standard solutions (25 µL) were added to 200 µL assay solution and incubated for 30 minutes at 30°C. Absorbance was measured at 415 nm.
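Since samples and standard solutions are treated identically in this assay, glucose concentrations are typically read from a standard curve. The sketch below shows one way to do this with a plain least-squares line. The standard concentrations and absorbance values are hypothetical and chosen for illustration only, the function names are not from the original work, and a linear response over the standard range is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def glucose_from_absorbance(a415, slope, intercept, dilution=1.0):
    """Glucose concentration (mM) of the original sample from its A415."""
    return (a415 - intercept) / slope * dilution


# Hypothetical glucose standards (mM) and A415 readings, for illustration only.
standards_mM = [0.0, 0.1, 0.2, 0.4]
a415_readings = [0.05, 0.15, 0.25, 0.45]

slope, intercept = fit_line(standards_mM, a415_readings)
sample_mM = glucose_from_absorbance(0.35, slope, intercept)
print(slope, intercept, sample_mM)
```

The `dilution` argument accounts for samples that were diluted before the assay, so that they fall within the range of the standards.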

3.5.2 Glc1P assay

In this assay, Glc1P is first converted to Glc6P by a phosphoglucomutase. Glc6P is subsequently used by Glc6P dehydrogenase for the production of glucono-1,5-lactone 6-phosphate, which is coupled with the reduction of β-NAD+ to NADH. The latter can be monitored at 340 nm. The assay solution contained 2 mM EDTA, 10 mM MgSO4, 2 mM β-NAD+, 10 µM glucose 1,6-bisphosphate (tetracyclohexylammonium salt), 1.2 U/mL phosphoglucomutase (from rabbit muscle, Sigma) and 3.2 U/mL Glc6P dehydrogenase (from L. mesenteroides, Sigma) in 100 mM Tris-HCl pH 7. 100 µL of this assay solution was combined with 55 µL Tris-HCl buffer and 20 µL standard or sample. After 30 minutes of incubation at 37°C, absorbance was read at 340 nm.
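The assay above is quantified against standards. As a rough cross-check, the Beer-Lambert law can relate the absorbance at 340 nm to the amount of NADH formed, as sketched below. The sketch assumes complete conversion with one NADH per Glc1P, and a known optical path length; the path length of 0.5 cm and the absorbance value are hypothetical, and the function names are illustrative. The molar absorption coefficient of NADH at 340 nm (6220 M⁻¹ cm⁻¹) is the commonly used literature value, and the volumes are those given in the text.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, NADH at 340 nm


def nadh_concentration_mM(delta_a340, path_length_cm):
    """NADH concentration in the well (mM) from the blank-corrected A340."""
    return delta_a340 / (NADH_EPSILON_340 * path_length_cm) * 1000.0


def glc1p_in_sample_mM(delta_a340, path_length_cm,
                       sample_uL=20.0, assay_uL=100.0, buffer_uL=55.0):
    """Glc1P in the undiluted sample, assuming one NADH formed per Glc1P."""
    dilution = (assay_uL + buffer_uL + sample_uL) / sample_uL
    return nadh_concentration_mM(delta_a340, path_length_cm) * dilution


# Dilution of the sample in the well: 20 µL in a total of 175 µL.
dilution_factor = (100.0 + 55.0 + 20.0) / 20.0

# Hypothetical reading and path length, for illustration only.
example_mM = glc1p_in_sample_mM(delta_a340=0.311, path_length_cm=0.5)
print(dilution_factor, example_mM)
```

In a microplate the path length depends on the well geometry and fill volume, which is one reason why a standard curve measured under identical conditions is the more robust choice.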

3.5.3 TLC

Ascending chromatography was conducted with precoated silica gel plates (TLC Silica gel 60 F254 from Merck KGaA, EMD Millipore Corporation) in closed glass tanks saturated with the developing solvent, consisting of 85% acetonitrile and 15% H2O. Samples (2 µL) were spotted on the plate 1 cm above the bottom edge. The spots were dried and the plate was developed in the glass chamber at room temperature until the solvent front had migrated up to 1 cm from the upper edge. After the run, the plate was dried with a common hair dryer and developed a second time in the same solvent. The plate was dried again with a hair dryer, soaked in a H2SO4 solution and heated with a hot air gun (Bosch PHG 500-2, level 1, 300 °C) to visualize the separated spots.

3.6 Screening protocol for site-directed mutants of TreTFp

A protocol was developed to screen mutant TreT enzymes for phosphorylase and synthase activity without the time-consuming purification step. First, culture containing the mutant plasmid is inoculated into a falcon tube containing 10 mL LB supplemented with 100 µg/mL ampicillin. Next, cells are harvested by centrifugation for 30 min at 4°C and 4500 rpm using a Rotixa 50 RS swing bucket centrifuge. The supernatant is discarded and pellets are frozen at -20°C for at least one hour. Afterwards, they are thawed, resuspended in 100 µL lysis buffer, transferred to an Eppendorf tube and incubated for 30 min at 37°C and 250 rpm. The lysis buffer consisted of 100 mM MOPS, 1 mM EDTA, 4 mM MgCl2, 50 mM NaCl, 1 mg/mL lysozyme and 100 µM PMSF. The lysate is then centrifuged for 10 min at 14000 rpm. 40 µL of the resulting soluble crude cell extract (CCE) is subjected to a heat treatment of 1 hour at 60°C. This results in a very turbid solution, which needs to be centrifuged for 10 min at 14000 rpm to pellet and remove all denatured proteins. The remaining supernatant still contains the target (mutant) TreTFp enzyme and can be used for activity tests. To test the substrate specificity, the heat-treated CCE was incubated with either 250 mM trehalose and 25 mM phosphate (phosphorylase activity) or 250 mM trehalose and 10 mM ADP (synthase activity). Phosphate was supplemented as potassium phosphate buffer (KH2PO4 + K2HPO4, pH 7). In total, the CCE was diluted four times in both reactions. The release of Glc after 1 hour, 3 hours (in a thermoblock) and/or overnight (in a PCR device with a lid temperature of 70°C to prevent evaporation) at 60°C in 100 mM MOPS pH 7.0 was monitored using TLC and/or GOD-POD.

3.7 Homology modeling

Homology models of TrePGf and chimeric enzymes were made using I-Tasser387 with TreTPh (PDB 2X6RA) as input structure. The homology model of TrePGf had a C-score of -2 while those of the chimeras ranged between 0.03 and 1.49.


4 Results and discussion

4.1 Comparison of TreT and TreP: functional properties, reaction mechanism, phylogenetic analysis and structure

Overview of currently characterized TreTs and TrePs: functional parameters and reaction mechanism

Retaining Trehalose Phosphorylase (TreP, EC 2.4.1.231) catalyzes the reversible conversion of trehalose (α-D-glucopyranosyl-(1→1)-α-D-glucopyranoside) and monoanionic phosphate (H2PO4-) into α-D-glucose 1-phosphate and D-glucose while Trehalose glycosylTransferring synthase (TreT, EC 2.4.1.245) accelerates the formation of trehalose using a nucleotide sugar (NDP-Glc) as donor (Figure 57). In addition, some TreTs are also able to degrade trehalose with NDP473–475.

Figure 57 Reaction scheme of retaining Trehalose Phosphorylase (TreP, EC 2.4.1.231) and Trehalose glycosylTransferring synthase (TreT, EC 2.4.1.245). Some TreTs are unidirectional and can only catalyze the synthesis of trehalose. NDP: nucleoside diphosphate (e.g. ADP), R: nucleobase (e.g. adenine), Glc: glucose.

Similar to SuSy, which also belongs to the GT4 family, reactions catalyzed by TreP and TreT most probably proceed via an SNi-like (‘internal return’) mechanism involving an oxocarbenium ion–like ternary complex transition state48,50,62,481. Nucleophilic attack and leaving group departure thus occur on the same face of the anomeric centre, resulting in retention of configuration. In the ordered bi-bi kinetic mechanism of TreP, phosphate binds before trehalose but both substrates must be present in the active site before glucose is released. Furthermore, substrate binding (mainly phosphate binding in TreP) would induce conformational changes necessary to rearrange the active site into the correct conformation for catalysis482.

Retaining TrePs have so far only been identified in fungi, and six native or recombinant enzymes from Pleurotus ostreatus483, Pleurotus sajor-caju484, Schizophyllum commune481,482,485–487, Agaricus bisporus488, Flammulina velutipes489 and Grifola frondosa (TrePGf)490,491 have been characterized (Table 26). Although the synthesis of trehalose is the thermodynamically favored reaction, the excess of phosphate and trehalose inside fungal cells and the inhibition of trehalose synthesis by phosphate in vitro indicate that phosphorolysis is the preferred direction in vivo481. Interestingly, the TreP from S. commune binds UDP-Glc, which acted as a competitive inhibitor, but could not convert it485.

TreT, on the other hand, has been characterized from the hyperthermophilic bacterium Thermotoga maritima475, the hyperthermophilic archaea Thermococcus litoralis474, Pyrococcus horikoshii (TreTPh)62,473,492 and Thermoproteus tenax493, and the halotolerant bacterium Rubrobacter xylanophilus494 (Table 26). TreT from T. tenax is unidirectional and catalyzes only the synthesis of trehalose, whereas the others are reversible and capable of both the production and degradation of trehalose. Because of the clustering of the tret gene from T. litoralis with a trehalose ABC transporter, together with the observation of tret induction by the presence of trehalose in the growth medium and high Km values for glucose, a preference for trehalose degradation has been suggested for this enzyme in vivo474. However, the high Km values for trehalose and NDP, determined for several other reversible TreTs, indicate that synthesis of trehalose is highly favored, in line with the chemical equilibrium62,473–475,494.

Table 26 Functional parameters of characterized Trehalose Phosphorylases (TrePs) and Trehalose glycosylTransferring synthases (TreTs). N.d.: not determined. /: no activity detected. Values were retrieved from BRENDA, supplemented with additional information from literature. Pi: inorganic phosphate, Tre: trehalose, Glc: glucose.

Parameter | TreP | TreT
Temperature optima (°C) | 30-40 481,483,484,490 | 37-90 474,475,494,495
Stability | Very low 481,483,488 | High 494
pH optima | 5.8-7.5 481,483 | 5.5-10 474,475,494,495
Km Tre (mM) | 53-91 481,483,484,488,489 | 12-82 473,474,494
Km Pi (mM) | 0.8-5 481,483,484,488,489 | / 474,493
Km UDP (mM) | n.d. | 30-71 475,492
Km ADP (mM) | n.d. | 7-40 475,492,494
Km GDP (mM) | n.d. | 24-54 475,492
Km Glc1P (mM) | 1.6-47 481,483,484,488,489 | / 473
Km UDP-Glc (mM) | / 485 | 0.2-2.5 475,492,493
Km ADP-Glc (mM) | n.d. | 0.8-3 474,475,492,494
Km GDP-Glc (mM) | n.d. | 1.7-2 475,492
Km Glc (mM) | 0.63-46 481,483,484,488,489 | 1-6 474,493,494
Oligomeric state | Monomer 481,483; Dimer 490,496; Tetramer 488 | Monomer 475; Dimer 473,474
Divalent cations | Mg2+ independent activity 481 | Mg2+ can enhance the activity 474,493,497
Sequence length (amino acids) | 700-800 | 400-550


Phylogenetic analysis of TreT and TreP sequences

Blasting the TreP sequence of the fungus G. frondosa against the UniProtKB database revealed only sequences from other eukaryotic fungal species as closest homologues (first 250 results). They all belonged to two of the seven major phyla of fungi: Basidiomycota (subdivision Agaricomycotina) and Ascomycota (class saccharomyceta), which are also known as the ‘higher fungi’. It is important to note that not all these sequences were annotated as Trehalose Phosphorylase. Indeed, ‘Glycosyltransferase family 4’, ‘uncharacterized protein’, ‘Trehalose synthase’, ‘Clock-controlled gene-9 protein’, ‘’ and ‘D-inositol-3-phosphate glycosyltransferase’ annotations were also found.

If the TreT sequence of the archaeal species P. horikoshii was used as template for the UniProtKB BLAST tool, homologues were identified belonging to two domains of life (Figure S26): Archaea and Bacteria. Analysis of the 250 closest homologues of TreTPh revealed that the bacterial species belonged to 11 of the 30 known phyla, including Firmicutes, Planctomycetes and Proteobacteria, while the archaeal ones belonged to the Crenarchaeota and Euryarchaeota, the two biggest phyla within the archaeal domain. Again, several annotations other than Trehalose glycosyltransferring synthase, such as ‘Glycosyl family or group 1’, ‘Glycosyl transferase’, ‘Trehalose synthase’, ‘uncharacterized protein’, ‘Lipopolysaccharide N-acetylglucosaminyltransferase’ and ‘Trehalose Phosphorylase’, appeared in the results.

The TreP and TreT sequences were used to build a multiple sequence alignment (MSA) using Clustal Omega297. The MSA was subsequently used in MEGA6 to determine the substitution model that best fits the data. This substitution model specifies the relative rates of different amino acid substitutions. According to the Bayesian information criterion (BIC), the LG model with different additions (G, G+I, G+F or G+I+F) scored best. For more information about these parameters, the reader is referred to the literature review. Next, the substitution model and the MSA (or a subset) were used to construct a maximum likelihood phylogenetic tree with 150 or 500 bootstrap replications; the results can be found in Figure 58.
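As an illustration of how a BIC-based ranking of substitution models works, a minimal sketch is given here. The log-likelihoods, parameter counts and number of sites are made-up placeholders rather than values from this study, the function names are hypothetical, and the sample size is assumed to be the number of alignment sites, which may differ from the definition used by MEGA6.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def rank_models(candidates, n_sites):
    """Return (model name, BIC) pairs sorted from best (lowest BIC) to worst."""
    scored = [(name, bic(lnl, k, n_sites)) for name, (lnl, k) in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical (log-likelihood, number of free parameters) for each model.
candidates = {
    "LG+G": (-1000.0, 10),
    "LG+G+I": (-999.0, 11),
    "LG": (-1050.0, 9),
}

ranking = rank_models(candidates, n_sites=500)
for name, score in ranking:
    print(f"{name}: BIC = {score:.1f}")
```

In this toy example the extra parameter of LG+G+I improves the likelihood too little to offset the penalty term, so LG+G ranks first.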

As stated before, not all sequences were annotated as either TreP or TreT. However, the sequences derived from the BLAST search with either TreT from P. horikoshii or TreP from G. frondosa clustered together in nearly all bootstrap replications, as indicated by the high bootstrap value of 99 (Figure 58, left). The sequences within the selected TreT and TreP subsets are thus highly likely to be true TreTs and TrePs, respectively, even if their annotations state otherwise. Misannotation is not unlikely, as these annotations are often generated automatically and are thus inherently prone to error.


Figure 58 Phylogenetic trees of putative TreT and TreP sequences constructed using MEGA6172. The evolutionary history was inferred by using the Maximum Likelihood method and a discrete Gamma distribution to model evolutionary rate differences among sites (5 categories). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Genus names from organisms which harbor a TreT or TreP that have been characterized are highlighted in bold. (Left) Putative TrePs (250 sequences) belong to the fungal phyla Ascomycota (green) or Basidiomycota (blue) while putative TreTs (250 sequences) are found in archaeal and bacterial species (red). (Right) distribution of a subset of TreTs within the Crenarchaeota (blue), Euryarchaeota (red) or Bacteria (green). 50 TrePs were used as outgroup. Bootstrap values are shown on each node and calculated using 500 bootstrap replications (right) or 150 bootstrap replications (left).


Both TreP and TreT belong to the GT4 family, which indicates descent from a common ancestor. Nevertheless, TrePs have so far only been identified in fungi, which are eukaryotes, while species with putative TreTs belong to the archaeal or bacterial domain. This peculiar distribution of the sequences over the three domains of life raises questions about the evolutionary trajectory of the two types of enzymes. How could this phylogenetic relationship be explained? Perhaps TreP and TreT are derived from a common GT-like ancestor present in the last universal common ancestor (LUCA) (Figure S26). In the fungi, the ancestor would subsequently have evolved into a TreP, while in some Archaea and Bacteria a TreT arose. The absence of TreT or TreP in many species could be explained by massive gene loss498, either due to a change to a habitat in which no compatible solutes such as trehalose are necessary to survive, or because of the inheritance or evolution of other proteins and strategies to cope with stress or to gain energy, making TreP and/or TreT redundant. Another possible explanation is that the enzymes, despite their similar function and sequence similarity, evolved from different ancestral genes. Indeed, although sequence similarity is assumed to be highly indicative of common ancestry, it can also be the result of convergent evolution. The convergence to similar sequences, starting from separate ancestral genes, is then merely a consequence of the restrictions imposed by functional fitness499,500. A third hypothesis, possibly complementary to the former two, is that several horizontal gene transfer (HGT) events occurred during evolution. HGT involves the transfer of DNA between organisms other than via vertical transmission from parent to offspring. It is very common among prokaryotic species, even between very distantly related ones, and has played a key role in their evolution and adaptation to new environments501,502.
Although less frequently, HGT has also occurred from bacteria to eukaryotes or between eukaryotes through phagocytosis (endosymbiotic gene transfer) or other unknown mechanisms502. In addition, several examples of HGT from archaea to bacteria and vice versa and from eukaryotes to bacteria and archaea have been reported503. Consequently, it is very difficult to predict which taxa are the donor species and which ones are the acceptors, especially if the branches within the tree are not statistically well supported or if the distribution within the different domains is similar. Nevertheless, it is known that HGT results in tree topologies that deviate from the characteristic tree of life. In the case of TreT, the latter is definitely true, as archaeal and bacterial TreTs cluster together (Figure 58, right), although not always with high reliability (low bootstrap values). An unambiguous case of HGT is presented by the sequences of the archaeon Methanosaeta and the bacterium Coprothermobacter, which cluster together with maximum statistical support (bootstrap value 100) and are 60% identical. In addition, the genome of T. maritima, which also harbors the TreT gene, contains 24% archaeal-like genes504, indicating massive HGT from archaea to this species. Although this gives a possible clue about the transfer direction of TreT in this particular case, it is not justified to generalize it to the other cases.


Structure

A search of the Protein Data Bank (PDB) revealed no crystal structures for TreP. In contrast, six crystal structures of Trehalose synthase from P. horikoshii (TreTPh) were available: the wild-type TreTPh without ligands (2X6Q) or soaked in trehalose (2X6R), a seleno derivative of TreTPh (2XA1), and the TreTPh mutant E326A in complex with UDP-Glc (2XA2 and 2XA9) or UDP (2XMP). Structurally, TreTPh is a dimer and each monomer consists of two Rossmann-like β/α/β domains characteristic of GT-B glycosyltransferases62. The catalytic center is positioned in the cleft between the N-terminal GT-B domain, which binds Glc, and the C-terminal GT-B domain, which binds NDP-Glc. These two catalytic domains are connected through a linker loop consisting of 15 amino acids (Figure 59).

Figure 59 Visualization of the 3D structure of one monomer of TreTPh (2XMP). It consists of an N-terminal GT-BN domain, which binds glucose (Glc), and a C-terminal GT-BC domain, which binds NDP-Glc. Substrates are represented by dots: UDP from 2XMP (right) and Glc6P from the superposed crystal structure of TPSEc 1GZ5 (left). The linker loop connecting the two GT-B domains is colored blue.

The structure 2X6R showed a significant shift of the acceptor binding domain and of some essential residues in the acceptor binding pocket compared to the other structures, possibly induced by trehalose62. These conformational changes are, however, not as prominent as those observed between the open structure of SuSyNe95 (without substrates) and the closed structure of SuSyAt148 (with UDP/fructose or a hydrolysis product of UDP-Glc) or between two structures of Trehalose 6-Phosphate Synthase from E. coli (TPSEc) with UDP-Glc505 and Glc6P/UDP506. In these two enzymes, the active site becomes shielded from the solvent either partially (TPS) or completely (SuSy) upon closing, which is not the case for 2X6R (Figure S27). Therefore, it is possible that none of the currently available TreT structures represents the actual closed conformation that occurs upon substrate binding, as was also pointed out by the authors themselves62. Interestingly, although no crystal structure is available for TreP, initial velocity studies of product inhibition and inhibition by dead-end inhibitors suggested that phosphate binds first to the enzyme and plays a key role in the induced conformational changes, while trehalose does not bind to the free enzyme.

The sequence length differs markedly between TrePs (700-800 amino acids) and TreTs (400-550 amino acids). Sequence alignment of several TreTs and TrePs (Figure S28) and a 3D superposition of the crystal structure of TreTPh and a homology model of TrePGf (Figure 60) revealed that this could be explained by an N-terminal extension (± 200 residues), larger loops (5-10 additional residues) and a longer C-terminal extension in TreP. However, some studies have shown that (part of) the N-terminal extension of TreP (± 25 amino acids491) becomes cleaved during maturation of the native enzyme491,507 or during storage of the recombinant enzyme485. Remarkably, the larger loops in the GT-B domains of TreP (e.g. RSDLVHVK, Figure S28) seemed to be situated at the surface, while they would have been expected to extend into the active site, making it smaller and thus more appropriate for the smaller acceptor substrate (phosphate instead of NDP). However, it should be noted that this unexpected observation could also be the result of errors in the prediction of the homology model of TrePGf, which was constructed based on the (open?) structure of TreTPh.

Figure 60 (Left) Domain organization in TreT and TreP. (Right) Superposition of TreTPh 2X6R (red) and a homology model of TrePGf (blue) without the N-terminal extension. The additional C-terminal extension of TreP is colored cyan.

A schematic representation of the active site of TreTPh, based on the crystal structure 2X6R62, is presented in Figure 61A. N93ALQG and V304HAREV are parts of α-helices, G52GGVAE and E326GFGLTVTE partly form a loop and an α-helix, V237SRF and V268GVM are parts of a loop and the beginning of a β-sheet, while the other stretches of amino acids that delineate the active site are loops. Interestingly, many of the active site residues of TreTPh are not conserved in other TreT sequences (Figure S29-Figure S32). In addition, those amino acids that are conserved are most often identical to those occurring in TreP sequences at the corresponding positions. Exceptions - and thus interesting targets for mutagenesis - include: Glu-57, Pro-201, Ile-203, Pro-205, Ser-207, Glu-208, Lys-209, Gly-329, Leu-330 and Thr-331 (Figure 61B). The first one belongs to the glucose acceptor site while the latter three are part of the EX7E motif, which is a conserved nucleotide sugar binding motif present in a variety of retaining GTs56,57,59. Residues 201-209 of TreTPh are situated in the linker loop that connects the two catalytic GT-B domains. Lys-209 forms hydrogen bonds with UDP, including its uracil moiety. Glu-208 is possibly involved in a salt bridge with Arg-307, although the former is not at all conserved in other TreTs. Instead, a conserved glycine residue occupies this position in TreP, based on a sequence alignment of 500 putative TrePs and TreTs (Figure 61B and Figure S28). However, based on the 3DM structural alignment of glycosyltransferases, this conserved glycine residue aligns with Pro-205 of TreTPh. Unfortunately, it is hard to predict which of the two alignments is the right one.
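The kind of comparison described here, finding alignment positions that are conserved within each enzyme group but differ between the groups, can be sketched as follows. This is only an illustration under stated assumptions: the alignment rows are made-up toy data, the function names and the 80% conservation threshold are hypothetical, and the actual analysis relied on Clustal Omega and 3DM alignments of 250 sequences per group.

```python
from collections import Counter


def column_consensus(aligned_seqs, col):
    """Return (most common residue, its frequency) for one alignment column."""
    counts = Counter(seq[col] for seq in aligned_seqs)
    residue, count = counts.most_common(1)[0]
    return residue, count / len(aligned_seqs)


def differing_conserved_columns(group_a, group_b, threshold=0.8):
    """List 0-based columns where both groups are conserved but disagree."""
    hits = []
    for col in range(len(group_a[0])):
        res_a, freq_a = column_consensus(group_a, col)
        res_b, freq_b = column_consensus(group_b, col)
        if freq_a >= threshold and freq_b >= threshold and res_a != res_b:
            hits.append((col, res_a, res_b))
    return hits


# Toy, made-up alignment rows of equal length; a gap ('-') would count as a residue.
tret_rows = ["GEL", "GEL", "GEV"]
trep_rows = ["GLL", "GLL", "GLI"]

hits = differing_conserved_columns(tret_rows, trep_rows)
print(hits)
```

In the toy data only the middle column is reported: the first column is identical in both groups, and the last one is not sufficiently conserved within either group.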

Figure 61 (A) Schematic representation of the active site of TreT from P. horikoshii (TreTPh) based on crystal structure 2X6R. The sugar donor UDP-Glc and the acceptor (Glca) are represented as spheres. (B) Amino acid distribution of active site residues that differ between 250 (putative) TreT and 250 (putative) TreP sequences. For the PSIDPLSEK motif, differences were observed between the sequence-based alignment using Clustal Omega and structure-based alignment using 3DM. Amino acid numbering is based on the sequence of TreTPh in the crystal structure 2X6R (differs from the numbering of the UniProtKB TreTPh sequence) and the UniProtKB sequence of TreTFp. Positions that were mutated in TreTFp are highlighted in red (section 4.3).


4.2 Characterization of TreTFp

Recently, TreT from F. pennivorans was cloned into a pET21a inducible vector, expressed in E. coli BL21 (DE3), purified using His-tag chromatography and characterized (Renfei Zhao, unpublished results). The optimal temperature and stability in the cleavage direction were determined using the GOD-POD assay479,480, which measures the release of glucose. Similar to the other characterized TreTs, the enzyme appeared to be highly stable, with a temperature optimum of 80°C (Figure 62, unpublished results from Renfei Zhao). The enzyme was also able to catalyze trehalose formation from several NDP-glucose donors (data not shown).

Figure 62 Temperature optimum (left) and stability (right) of TreT from F. pennivorans (Renfei Zhao, unpublished results). The thermostability was determined by incubating the enzyme for 20 min at different temperatures (30-97.5°C) using a PCR device, cooling the enzyme for 5 min at 4°C and evaluating the residual activity using 100 mM NDP, 500 mM trehalose and 17 µg/mL enzyme at 70°C pH 7.0.

High expression yields could be obtained with the pET21a TreTFp construct. Starting from 250 mL culture, about 6.5 mg of the enzyme could be recovered after His-tag purification (Figure S33). However, expression required induction with IPTG, which makes the protocol more expensive, time-consuming and laborious compared to a constitutive expression system. Therefore, the gene was cloned into the constitutive expression vector pCXP34300 using the CPEC protocol. TreTFp could be successfully expressed from the pCXP34 vector and 2.3 mg of the enzyme was recovered after His-tag purification, starting from 250 mL culture. This yield is lower than that obtained with the pET21a vector, most probably because of the shorter incubation time of the expression culture or because the two promoters differ in strength.
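To compare the two expression systems on a common basis, the recovered amounts can be normalized to culture volume. The short calculation below is only a sketch: the function name is hypothetical and it assumes that the yield scales linearly with culture volume, which was not tested here.

```python
def yield_mg_per_litre(protein_mg, culture_ml):
    """Purified protein recovered per litre of culture (linear extrapolation)."""
    return protein_mg / (culture_ml / 1000.0)


# Amounts reported above: 6.5 mg (pET21a) and 2.3 mg (pCXP34), each from 250 mL.
pet21a = yield_mg_per_litre(6.5, 250)
pcxp34 = yield_mg_per_litre(2.3, 250)
ratio = pcxp34 / pet21a

print(f"pET21a: {pet21a:.1f} mg/L, pCXP34: {pcxp34:.1f} mg/L, ratio: {ratio:.2f}")
```

Under this assumption the yields correspond to roughly 26 mg/L and 9.2 mg/L, so the constitutive system delivers about 35% of the yield of the induced system.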

TreT is known to accept several nucleoside diphosphates in the cleavage direction473–475,493. To test the nucleotide acceptor preference of TreTFp, its activity with 250 mM trehalose and 10/50 mM ADP/UDP at 60°C was evaluated by TLC with acetonitrile:H2O (85:15) as solvent system. Based on the intensity of the glucose spot, the highest activity on trehalose was achieved with UDP (Figure 63, lane 3 and 4). Moreover, activity could be detected within 1 hour with 10 mM ADP (Figure 63, lane 1) but not with 50 mM ADP, indicating substrate inhibition by this nucleotide. Nonetheless, ADP was used in further experiments because it is cheaper than UDP. As a control, TreT was also incubated with trehalose as the only substrate, but hydrolysis was not detected within 1 hour (Figure 63, lane 6).

Figure 63 TLC analysis of the activity of TreTFp (pCXP34) with different concentrations of ADP and UDP after 1 hour at 60°C (100 mM MOPS pH 7.0). In each reaction, 250 mM trehalose (Tre) and 0.06 mg/mL TreTFp were present and the release of glucose from trehalose was evaluated. It has to be noted that NDP-Glc spots are masked by the trehalose spots.

4.3 Mutagenesis of TreTFp

To scrutinize sequence-function relationships in TreT and to obtain a possible switch to phosphorylase activity, amino acids in the active site of Trehalose synthase from F. pennivorans were mutated into the corresponding residues common in TrePs according to a sequence-based alignment of 500 TrePs and TreTs (Figure 61B). In total, nine site-directed variants were made with one, three, four, six, nine or ten mutations in the glucose acceptor site, the EX7E motif and/or the linker loop between GT-BN and GT-BC (Table 27). To avoid the trouble of purifying each new mutant enzyme, a screening protocol using crude cell extract (CCE) was developed (section 4.3.1). In addition, several chimeric enzymes consisting of both TreTFp and TrePGf parts were constructed and evaluated (section 4.3.3).

4.3.1 Development of a screening method for TreTFp site-directed mutants

Both the phosphorylase and synthase activity of the mutants are preferably assessed in the breakdown direction of trehalose, because cheaper substrates and more convenient assays can be used in this way. However, the presence of native E. coli trehalases508 in the CCE, which hydrolyze the trehalose donor substrate, presented a major issue that had to be overcome. A protocol therefore needed to be developed that specifically inactivates these trehalases but leaves TreTFp intact. Fortunately, TreTFp is a thermostable enzyme with an optimal reaction temperature of 80°C. Three different reaction set-ups were evaluated using E. coli with an empty pCXP34 vector as control: a reaction temperature of 60°C, a reaction temperature of 70°C, and a prior heat-treatment of the CCE for 1 hour at 60°C followed by reaction at 60°C. Incubating the reaction at 60°C or 70°C was not sufficient to inactivate the native trehalases, probably because of the stabilizing effect of trehalose. In contrast, prior heat-treatment of the CCE at 60°C did destroy all trehalase activity while maintaining TreTFp activity (a clear glucose spot could be seen on a TLC silica plate after 3 hours at 60°C with 250 mM trehalose and 10 mM ADP). A complete overview of this method and the subsequent activity testing conditions is provided in the materials and methods section.

4.3.2 Evaluation of site-directed TreTFp mutants

The activity of the nine active site mutants was evaluated with 250 mM trehalose and 25 mM phosphate or 10 mM ADP. For the reaction with ADP, clear glucose spots on TLC were observed for E50L, V321K and L320V after 3 hours (Table 27). These results give some clues about the sequence-function relationships in TreT. Glu-50 is positioned in the acceptor pocket but does not seem to interact with other residues (based on the crystal structure of TreTPh 2X6R), which is consistent with the observation that replacement of this residue did not impair activity. Replacement of the small residue Gly-319 by the large glutamate that is conserved in TreP sequences probably caused steric hindrance, resulting in loss of activity. Although the same was expected for V321K, this mutation was not detrimental to activity. The preservation of activity for mutant L320V, on the other hand, could be explained by the structural similarity of the two residues. Unfortunately, none of the mutants showed phosphorolysis activity with trehalose and phosphate after 3 hours or even after overnight incubation.

Table 27 Synthase and phosphorylase activity of TreTFp and site-directed mutants with 250 mM trehalose as donor and 10 mM ADP or 25 mM potassium phosphate (Pi) as acceptor. +: a glucose spot was observed on TLC after 3 hours at 60°C.

Enzyme name | Mutations | ADP | Pi
TreTFp | / | + | -
TreTFp m1 | E50L | + | -
TreTFp m2 | G319E | - | -
TreTFp m4 | V321K | + | -
TreTFp m5 | L320V | + | -
TreTFp m6 | G319E + L320V + V321K | - | -
TreTFp m7 | G319E + L320V + V321K + E50L | - | -
TreTFp m10 | P192A + I194T + P196W + S198D + P199G + K200L | - | -
TreTFp m11 | G319E + L320V + V321K + E50L + P192A + I194T + P196W + S198D + P199G + K200L | - | -
TreTFp m12 | G319E + L320V + V321K + P192A + I194T + P196W + S198D + P199G + K200L | - | -
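The mutation labels in Table 27 follow the usual wild-type residue, position, new residue notation (1-based numbering). A minimal sketch of how such labels can be applied to a sequence in silico, including a check that the stated wild-type residue is actually present, is given here. The function name is hypothetical, and only the first 60 residues of the TreTFp sequence from the supplementary materials are used, which is enough for the E50L example.

```python
import re

# First 60 residues of TreTFp, copied from the supplementary sequence.
TRETFP_N60 = "MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMK"

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence, mutations):
    """Apply labels such as 'E50L' (1-based positions) to a protein sequence."""
    residues = list(sequence)
    for label in mutations:
        match = MUTATION_RE.match(label)
        if match is None:
            raise ValueError(f"cannot parse mutation label: {label}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {label}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


mutant = apply_mutations(TRETFP_N60, ["E50L"])
print(mutant[44:55])
```

The wild-type check guards against numbering mismatches, which matter here because the TreTPh crystal structure and the TreTFp sequence use different residue numbering.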

To verify whether the absence of activity of the other mutants was due to a decrease in soluble expression, denaturation during the heat-treatment, or impaired catalysis, crude cell extracts were visualized on SDS-PAGE before and after heat-treatment (Figure 64). TreTFp and mutants 1-7 are clearly visible on the gel, indicating that their loss of activity is due to impaired catalysis rather than loss of soluble expression. In contrast, the presence of m10-12 could not be confirmed unambiguously by SDS-PAGE after heat-treatment, while there seemed to be expression in the soluble fraction (data not shown). Lack of activity could thus also be attributed to a decrease in stability leading to denaturation during heat-treatment.

Figure 64 SDS-PAGE analysis of CCE from TreTFp WT or site-directed mutants before and after heat treatment (HT, 1 hour at 60°C). WT: TreTFp (47 kDa), NC: negative control = pCXP34 empty vector. The arrows indicate the predicted position of TreTFp and variants.

4.3.3 Construction and evaluation of chimeric TreTFp/TrePGf enzymes

As already pointed out in the literature review, construction of chimeric GT-B enzymes has in many cases led to an altered substrate specificity, typically dictated by the acceptor and nucleotide sugar donor specificity of the parent enzymes supplying the N-terminal and C-terminal GT-B domain, respectively. Turning TreTFp into a phosphorylase could thus perhaps be achieved by exchanging the GT-BC domain of TreTFp, which is involved in binding of the donor substrate NDP-Glc, for that of TreP from G. frondosa (TrePGf), which uses Glc1P. TrePGf is a dimeric trehalose phosphorylase with an optimal temperature of 32.5°C and an optimal pH of 6.5 in the phosphorolysis direction490. TreTFp and TrePGf only share about 25% identity (overall and between the two GT-BC domains) but functional chimeras between distantly related enzymes have been reported before255. Crucial in the construction of active and soluble chimeras is the point of fusion between the two domains, which should conserve the overall tertiary structure as much as possible. A logical fusion point would be the linker sequence connecting the GT-BN and GT-BC domains. However, this region is not well conserved between the two enzymes and also differs in length according to the crystal structure of TreTPh and the homology model of TrePGf, making it hard to find the perfect assembly point. In addition, other authors typically chose assembly points outside this region254,258. Nevertheless, as discussed in section 4.1, D195PL in the linker region of TreTFp can possibly be structurally aligned with D460GL from TrePGf and was chosen as one of the assembly points (Chim 2.). The other two fusion points were chosen at the first three-letter motif upstream (H147ID in TreTFp vs H399IE in TrePGf, Chim 1.) and downstream (R230FD in TreTFp vs R503FD in TrePGf, Chim 3.) of the linker region that was conserved in both TreP and TreT sequences.

An important feature of TreTPh is the final C-terminal helix, which extends over the GT-BN domain (Figure 59). Interestingly, this structural arrangement is also present in TPSEc (PDB 1UQU)505, UGT71G1 (PDB 2ACW)509 and Vvgt1 (PDB 2C1X)242 and is thus possibly universal among GTs. In TreP (homology model) and SuSyAt1 (PDB 3S27), the C-terminal end is even longer and extends further along the side of the GT-BN domain (Figure 60). As this C-terminal extension could possibly disrupt the GT-BN domain in TreTFp (GT-BN)/TrePGf (GT-BC) chimeras, three different types were made: chimeras with the complete C-terminal extension of TrePGf (Chim .a), chimeras with the C-terminal extension of TrePGf that structurally aligns with that of TreTPh (= deletion of the last 43 residues of TrePGf) (Chim .b) and chimeras with the GT-BC domain of TrePGf but the C-terminal end of TreTFp (Chim .c). Eventually, seven chimeras were made: Chim 1a, 1b, 2a, 2b, 2c, 3a and 3b (Figure 65).
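The fusion at a shared motif can be illustrated in silico with a minimal sketch. The helper below is hypothetical and is not the cloning procedure used in this work; it joins the N-terminal parent up to and including the first residue of its motif with the C-terminal parent starting right after the first residue of the aligned motif. The two fragments are short stretches around the D195PL and D460GL motifs, copied from the TreTFp and TrePGf sequences in the supplementary materials.

```python
def fuse_at_motif(n_parent, n_motif, c_parent, c_motif, keep=1):
    """Fuse two sequences at aligned motifs.

    The result is n_parent up to and including the first `keep` residues of
    n_motif, followed by c_parent after the first `keep` residues of c_motif.
    """
    if n_parent.count(n_motif) != 1 or c_parent.count(c_motif) != 1:
        raise ValueError("each motif must occur exactly once in its parent")
    n_cut = n_parent.index(n_motif) + keep
    c_cut = c_parent.index(c_motif) + keep
    return n_parent[:n_cut] + c_parent[c_cut:]


# Short fragments around the fusion point (not the full-length proteins).
TRETFP_LINKER = "DNIAVEFPPSIDPLSPKNRE"
TREPGF_LINKER = "GAATDWLDGLSKHLDAWDSQ"

chimera = fuse_at_motif(TRETFP_LINKER, "DPL", TREPGF_LINKER, "DGL")
print(chimera)
```

The resulting junction (…PPSID followed by GLSKHL…) corresponds to the one found in the Chimera 2 sequences listed in the supplementary materials. Requiring a unique motif occurrence avoids silently fusing at the wrong site when a three-letter motif appears more than once in a full-length sequence.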

Figure 65 Schematic representation and ribbon diagram 3D structures of TreTFp (red), TrePGf without N-terminal domain (blue) and TreTFp/TrePGf chimeric enzymes (homology models). The additional C-terminal extension of TrePGf is colored cyan. The last residue of the TreTFp sequence is indicated above the schematic representation and represents the point of fusion.


Chim 1a, 1b, 2c, 3a and 3b were expressed in 10 mL culture medium and a heat-treatment of 1 hour at 60°C was used to remove trehalase activity in the CCE. No TreT (ADP as acceptor) or TreP (potassium phosphate as acceptor) activity was detected with TLC after 3 hours for any of the chimeras. However, SDS-PAGE analysis revealed that expression in the soluble fraction (CCE) was absent or at least much lower than for the WT (Figure 66A). In addition, after heat-treatment of the CCE at 60°C, the chimeras could not be unambiguously detected on SDS-PAGE (Figure 66A), indicating that they could have been denatured at this temperature. To investigate this, TreTFp and chimeras 1a, 2a, 2b, 2c and 3b were expressed on 250 mL scale and purified using His-tag chromatography (without heat-treatment). Interestingly, only Chim 2c was expressed successfully in the soluble fraction, and the yield of this enzyme after His-tag purification was about half that of the WT enzyme (Figure 66B). Chimera 2c is the only one that has the TreTFp C-terminal extension, highlighting its importance for proper folding, as could be expected from the tertiary structure.

Figure 66 (A) SDS-PAGE analysis of CCE from TreTFp WT and TreTFp/TrePGf chimeras, derived from 10 mL culture, before (2x diluted) and after heat-treatment (HT, 1 hour at 60°C). (B) SDS-PAGE analysis of CCE from TreTFp WT and TreTFp/TrePGf chimeras or His-tag purified enzymes, derived from 250 mL culture. Chim 1a: 55 kDa, Chim 1b: 50 kDa, Chim 2a: 54 kDa, Chim 2b: 49 kDa, Chim 2c: 50 kDa, Chim 3a: 52 kDa, Chim 3b: 48 kDa, TreTFp WT: 47 kDa, NC: negative control = pCXP34 empty vector. The arrows indicate the position of TreTFp or Chim 2c.

To determine the catalytic activity of Chim 2c, the purified enzyme was incubated at 60°C with trehalose and phosphate or trehalose and ADP. Despite the presence of the stabilizing molecule trehalose, the reaction solution became turbid within 1 hour, indicating denaturation of the protein at this temperature. Similar to TrePGf, the chimera is thus much less stable compared to TreTFp. Indeed, TreTFp readily survived a heat-treatment of 1 hour at 60°C in the absence of trehalose (see section 4.3.1), while TrePGf activity was rapidly reduced if the enzyme was preincubated for 30 min at temperatures above 35°C490. Therefore, Chim 2c was incubated overnight at 37°C, room temperature and 4°C with 125 mM or 417 mM trehalose and 25 mM potassium phosphate or 10 mM ADP (Table 28). However, activity on ADP or phosphate could not be detected in any case.


Table 28 Tested conditions for His-tag purified Chim 2c. In each case, the reaction proceeded overnight (~20 hours) and activity was assayed using the Glc1P assay and/or GOD-POD. Tre: trehalose, n.d.: not determined, - : no activity detected, RT: room temperature (~21°C), Pi: sodium or potassium phosphate buffer pH 7.0.

Conditions | ADP | Pi
60°C, 125 mM Tre, 25 mM potassium phosphate or 10 mM ADP, 1.3 mg/mL Chim 2c | - | -
37°C, 125 mM Tre, 25 mM potassium phosphate or 10 mM ADP, 1.3 mg/mL Chim 2c | - | -
RT, 125 mM Tre, 25 mM potassium phosphate or 10 mM ADP, 1.3 mg/mL Chim 2c | - | -
4°C, 125 mM Tre, 25 mM potassium phosphate or 10 mM ADP, 1.3 mg/mL Chim 2c | - | -
RT, 417 mM Tre, 10 mM potassium phosphate, 0.7 mg/mL Chim 2c | n.d. | -
37°C, 417 mM Tre, 10 mM sodium phosphate, 0.7 mg/mL Chim 2c | n.d. | -

5 Conclusion

Trehalose glycosylTransferring synthase (TreT) and Trehalose Phosphorylase (TreP) belong to the same GT4 family according to the sequence-based CAZy classification32. This implies sequence and structural similarity and, possibly, descent from a common ancestor. Furthermore, the active site of TreT and TreP is largely conserved and only a couple of residues delineating this pocket differ markedly between the two enzymes, suggesting a role in specificity. However, the large difference in length between TreT and TreP, caused by an additional N- and C-terminal extension and larger loops in TreP, and the peculiar distribution of the enzymes within the three domains of life suggest a very ancient origin of their common ancestor or convergent evolution from different ancestral genes, combined with several horizontal gene transfer events during evolution. Site-directed mutagenesis of the putative specificity-determining residues in the active site of TreTFp into those occurring in TreP sequences was not sufficient to obtain phosphorylase activity. In addition, seven chimeric enzymes were made containing the GT-BN domain of TreTFp and (parts of) the GT-BC domain of TrePGf. Only the hybrid enzyme with the C-terminal helix of TreTFp, which extends over the GT-BN domain of TreTFp, was expressed successfully in the soluble fraction. However, the enzyme was much less stable than TreTFp and no phosphorylase activity could be detected. The latter unexpectedly suggests that residues from the N-terminal domain (or the C-terminal end of the GT-BC domain) are involved in acceptor specificity (phosphate vs NDP) in the breakdown direction of trehalose. However, this interaction is probably indirect, involving e.g. second-shell residues, as the active site residues of GT-BN are highly similar between TreP and TreT sequences. The sequence-function relationships within the GT4 family can thus be considered highly complex and further investigation will be necessary to establish a general approach for the introduction of phosphorylase activity into glycosyltransferases.


6 Supplementary materials

Amino acid sequences of TreTFp, TrePGf and the chimeric enzymes are presented below. Assembly points are underlined, the C-terminal extension of TrePGf is colored cyan and the sequence of the linker region between the two GT-B domains is bold. The linker region of TreTFp was determined based on visual inspection of the crystal structure of TreTPh and subsequent sequence comparison between the two enzymes to find the corresponding region in TreTFp. The linker region in TrePGf, on the other hand, was determined by comparing the homology model of TrePGf with the crystal structure of TreTPh.

>TreTFp MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLEAPM EFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWRCHIDTSTP NLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDPLSPKNREISEAELKSVVDRYKIDVSRPIITVVARF DPWKDLFSAINVYRKVKERFDVQLAIVSAMAKDDPEGWIFFEDVLRYAGVDKDILFLTDLKGVGHLEVNAIQRLSTL GLHTATREGFGLVISEMMWKEHPVVARPVGGVKIQIDDGVNGYLRQSVEELADAVCDLLENQSRLREFGKNAKE KVRKTYLSTAHVKRYLEIIKSVVK

>TrePGf MAPPHQFQSKPSDVIRRRLSSAVSSKRPNIPGYTSLTPMWAGIAGAVVNNNTQFEVAISIHDSVYNTDFASS VVPYSPNEPEAQAGIIEKHVLETLRKFSTEHMCKFLGAGVTVILLREAPNLCTRLWLDMDIVPIVFNIKPFHTD SITRPNVRHRISSTTGSYVPSGAETPTVYYDPAQLQDPNKLSANVQTRLPIPRTVDEQADSAARKCIMYFGP GNNPRLQIGPRNQVAVDAGGKIHLIDDIDEYRKTVGKGTWNSVIKLADELREKKIKIGFFSSTPQGGGVALM RHAIIRFFTALDVDAAWYVPNPSPSVFRTTKNNHNILQGVADPSLRLTKEAADNFDSWILKNGLRWTAEGGP LAPGGVDIAFIDDPQMPGLIPLIKRIRPDLPIIYRSHIEIRSDLVHVKGSPQEEVWNYLWNNIQHSDLFISHPVN KFVPSDVPLEKLALLGAATDWLDGLSKHLDAWDSQYYMGEFRNLCVKEKMNELGWPAREYIVQIARFDPS KGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIYDQVLQLIHAKYKEYAPDIVVMRCPPSD QLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPLQIEHGKSGYLCEPGDNAAVAQHMLDL YTDEDLYDTMSEYARTHVSDEVGTVGNAAAWMYLAVMYVSRGVKLRPHGAWINDLMRTEMGEPYRPGEP RLPRGELHVQG

>Chimera 1a MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDIRSDLVHVKGSPQEEVWNYLWNNIQHSDLFISHPVNKFVPSDVPLEKLALLGAATDWLDGLSKHLDA WDSQYYMGEFRNLCVKEKMNELGWPAREYIVQIARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLC GHGAVDDPDASIIYDQVLQLIHAKYKEYAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALH AGKPVIACRTGGIPLQIEHGKSGYLCEPGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVSDEVGTVGNAAA WMYLAVMYVSRGVKLRPHGAWINDLMRTEMGEPYRPGEPRLPRGELHVQGHHHHHH

>Chimera 1b MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDIRSDLVHVKGSPQEEVWNYLWNNIQHSDLFISHPVNKFVPSDVPLEKLALLGAATDWLDGLSKHLDA WDSQYYMGEFRNLCVKEKMNELGWPAREYIVQIARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLC GHGAVDDPDASIIYDQVLQLIHAKYKEYAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALH AGKPVIACRTGGIPLQIEHGKSGYLCEPGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVSDEVGTVGNAAA WMYLAVMHHHHHH

>Chimera 1c MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDIRSDLVHVKGSPQEEVWNYLWNNIQHSDLFISHPVNKFVPSDVPLEKLALLGAATDWLDGLSKHLDA WDSQYYMGEFRNLCVKEKMNELGWPAREYIVQIARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLC GHGAVDDPDASIIYDQVLQLIHAKYKEYAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALH AGKPVIACRTGGIPLQIEHGKSGYLCEPGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVRKTYLSTAHVKR YLEIIKSVVKLEHHHHHH


Chapter 6: Introducing phosphorylase activity into a glycosyltransferase

>Chimera 2a MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDTSTPNLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDGLSKHLDAWDSQYYMGEFRN LCVKEKMNELGWPAREYIVQIARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIY DQVLQLIHAKYKEYAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPL QIEHGKSGYLCEPGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVSDEVGTVGNAAAWMYLAVMYVSRGV KLRPHGAWINDLMRTEMGEPYRPGEPRLPRGELHVQGHHHHHH

>Chimera 2b MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDTSTPNLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDGLSKHLDAWDSQYYMGEFRN LCVKEKMNELGWPAREYIVQIARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIY DQVLQLIHAKYKEYAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPL QIEHGKSGYLCEPGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVSDEVGTVGNAAAWMYLAVMHHHHHH

>Chimera 2c MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDTSTPNLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDGLSKHLDAWDSQYYMGEFRN LCVKEKMNELGWPAREYIVQIARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIY DQVLQLIHAKYKEYAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPL QIEHGKSGYLCEPGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVRKTYLSTAHVKRYLEIIKSVVKLEHHHH HH

>Chimera 3a MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDTSTPNLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDPLSPKNREISEAELKSVVDRYKID VSRPIITVVARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIYDQVLQLIHAKYKE YAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPLQIEHGKSGYLCE PGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVSDEVGTVGNAAAWMYLAVMYVSRGVKLRPHGAWINDL MRTEMGEPYRPGEPRLPRGELHVQGHHHHHH

>Chimera 3b MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDTSTPNLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDPLSPKNREISEAELKSVVDRYKID VSRPIITVVARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIYDQVLQLIHAKYKE YAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPLQIEHGKSGYLCE PGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVSDEVGTVGNAAAWMYLAVMHHHHHH

>Chimera 3c MTVELPVKRLADYSVFSKEDVEAIFELGKKLKGLKVVHVNATAYGGGVAELLMTIVPLMKDAGLDASWEVLE APMEFFNVTKKLHNALQGADIEISEEEWELFEKVNEENAKRLNLDADVVIIHDPQPAYIPYFRSGTHTKYIWR CHIDTSTPNLKVWNRLTSKMTKYEKALFHIMDYVRAPFDNIAVEFPPSIDPLSPKNREISEAELKSVVDRYKID VSRPIITVVARFDPSKGIPNVIDSYARFRKLCVDKVMEDDIPQLLLCGHGAVDDPDASIIYDQVLQLIHAKYKE YAPDIVVMRCPPSDQLLNTLMANAKFALQLSTREGFEVKVSEALHAGKPVIACRTGGIPLQIEHGKSGYLCE PGDNAAVAQHMLDLYTDEDLYDTMSEYARTHVRKTYLSTAHVKRYLEIIKSVVKLEHHHHHH
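The way the chimeras above are assembled (an N-terminal part of TreTFp, a C-terminal part of TrePGf and a His-tag) can be sketched in a few lines of Python. This is only an illustration: the function name `assemble_chimera` and its arguments are hypothetical, the two fragments are short excerpts copied from the parent sequences listed above, and the junction motifs (`CHID` and `IRSDL`) were inferred by comparing the listed Chimera 1a sequence with its parents rather than taken from the underlined assembly points.

```python
def assemble_chimera(n_parent, c_parent, n_end, c_start, tag="HHHHHH"):
    """Join n_parent up to and including n_end with c_parent from c_start onward.

    Raises ValueError (from str.index) if a junction motif is not found.
    """
    n_cut = n_parent.index(n_end) + len(n_end)  # end of the N-terminal part
    c_cut = c_parent.index(c_start)             # start of the C-terminal part
    return n_parent[:n_cut] + c_parent[c_cut:] + tag


# Short excerpts of the parent sequences around the inferred junction
tret_fragment = "GTHTKYIWRCHIDTSTPNLKV"   # from TreTFp
trep_fragment = "IYRSHIEIRSDLVHVKGS"      # from TrePGf

junction = assemble_chimera(tret_fragment, trep_fragment, "CHID", "IRSDL", tag="")
tagged = assemble_chimera(tret_fragment, trep_fragment, "CHID", "IRSDL")
```

With full-length parents, the same call would give a candidate sequence that should still be compared against the listed chimera before use.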


Figure S26 Phylogenetic tree of life separated into three different domains: Bacteria, Archaea and Eucarya. Several phyla and orders within the different domains are represented on the branches. LUCA: last universal common ancestor from which every living organism descends. Cr: Crenarchaeota, Eu: Euryarchaeota. Cr and Eu represent the two main phyla of archaea. Adapted from534.

Figure S27 Surface representation (created in PyMol388) of TreTPh, TPSEc, SuSyNe and SuSyAt1 and their respective PDB codes. Substrates are represented by sticks. In SuSyNe, the visualized substrates are those from the superposed structure of SuSyAt1 (3S27). Arrows indicate the solvent-accessible active site.


Figure S28 Sequence alignment of several TreP and TreT sequences. The linker region between the GT-BN and GT-BC domains of TreTFp (based on crystal structure 2X6R) is underlined in grey while residues in the active site of TreTFp are underlined in black. The EX7E motif is represented by a solid line black box and an example of a bigger loop in TreP is represented by a dashed line black box.


Figure S29 panel content: percentage occurrence of the reference residue at each aligned active-site position (less frequent residues are not listed).

TreTPh (position labels 52, 93, 134) versus other TreTs: G (96), G (98), G (99), V (99), A (87), E (95), N (100), A (63), L (94), Q (94), G (86), D (100), P (99), Q (100)

TrePGf (position labels 284, 325, 375) versus other TrePs: G (100), G (100), G (100), V (100), A (98), L (100), N (100), I (94), L (95), Q (100), G (100), D (100), P (100), Q (100)

Figure S29 Part 1 of amino acid distribution of putative TreTs (250 closest homologues of TreTPh according to the UniProtKB blast tool) (upper part) and putative TrePs (250 closest homologues of TrePGf) (lower part) at positions in the active site of TreTPh (based on crystal structure 2X6R). Percentage occurrence of each amino acid is given between brackets. The corresponding positions in TreP were determined based on a sequence alignment of the 500 TreT and TreP sequences. The amino acid sequences and residue numbering schemes of TreTPh and TrePGf were chosen as representatives for GT4 Trehalose synthases (TreT) and Trehalose Phosphorylases (TreP), respectively. Residues in bold are >90% conserved in the TreT or TreP subset. Ph: P. horikoshii, Gf: G. frondosa.
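The percentages in Figures S29 to S32 are per-column residue frequencies of a multiple sequence alignment. A minimal sketch of that calculation is given below, assuming the alignment is available as equal-length strings with `-` as the gap character. The function names and the toy alignment are hypothetical and do not reproduce the 500-sequence alignment used for the figures.

```python
from collections import Counter


def column_frequencies(aligned_seqs, column):
    """Percentage occurrence of each residue in one alignment column (gaps ignored)."""
    residues = [seq[column] for seq in aligned_seqs if seq[column] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.most_common()}


def conserved_residue(aligned_seqs, column, threshold=90.0):
    """Return the residue occurring in more than `threshold` percent of sequences, or None."""
    for aa, pct in column_frequencies(aligned_seqs, column).items():
        if pct > threshold:
            return aa
    return None


# Hypothetical toy alignment (four sequences, six columns)
toy_alignment = ["GGGVAE", "GGGVAE", "GGGVSE", "GGGVAQ"]

first_column = conserved_residue(toy_alignment, 0)
fifth_column = column_frequencies(toy_alignment, 4)
```

The 90% threshold mirrors the criterion used for the residues shown in bold in the figures.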


Figure S30 panel content: percentage occurrence of the reference residue at each aligned active-site position (less frequent residues are not listed).

TreTPh (position labels 155, 181, 201) versus other TreTs: H (100), I (88), D (97), F (66), H (24), L (38), P (100), S (66), I (100), D (96), P (100), L (87), S (86), E (29), K (95)

TrePGf (position labels 399, 433) versus other TrePs: H (100), I (100), E (40), S (98), H (100); the remaining ten positions are marked ? (uncertain alignment)

Figure S30 Part 2 of amino acid distribution of putative TreTs (250 closest homologues of TreTPh according to the UniProtKB blast tool) (upper part) and putative TrePs (250 closest homologues of TrePGf) (lower part) at positions in the active site of TreTPh (based on crystal structure 2X6R). Percentage occurrence of each amino acid is given between brackets. The corresponding positions in TreP were determined based on a sequence alignment of the 500 TreT and TreP sequences. The amino acid sequences and residue numbering schemes of TreTPh and TrePGf were chosen as representatives for GT4 Trehalose synthases (TreT) and Trehalose Phosphorylases (TreP), respectively. Residues in bold are >90% conserved in the TreT or TreP subset. Ph: P. horikoshii, Gf: G. frondosa, ?: uncertainty about correct sequence alignment.


Figure S31 panel content: percentage occurrence of the reference residue at each aligned active-site position (less frequent residues are not listed).

TreTPh (position labels 237, 243, 268, 274) versus other TreTs: V (72), S (85), R (100), F (96), W (63), K (100), G (8), V (46), G (81), V (6), M (46), D (100), D (100), P (100)

TrePGf (position labels 501, 507, 539, 545) versus other TrePs: I (40), A (98), R (100), F (99), S (61), K (100), G (100), C (93), G (100), H (57), G (88), D (100), D (100), P (100)

Figure S31 Part 3 of amino acid distribution of putative TreTs (250 closest homologues of TreTPh according to the UniProtKB blast tool) (upper part) and putative TrePs (250 closest homologues of TrePGf) (lower part) at positions in the active site of TreTPh (based on crystal structure 2X6R). Percentage occurrence of each amino acid is given between brackets. The corresponding positions in TreP were determined based on a sequence alignment of the 500 TreT and TreP sequences. The amino acid sequences and residue numbering schemes of TreTPh and TrePGf were chosen as representatives for GT4 Trehalose synthases (TreT) and Trehalose Phosphorylases (TreP), respectively. Residues in bold are >90% conserved in the TreT or TreP subset. Ph: P. horikoshii, Gf: G. frondosa.


Figure S32 panel content: percentage occurrence of the reference residue at each aligned active-site position (less frequent residues are not listed).

TreTPh (position labels 304, 326) versus other TreTs: V (34), H (11), A (9), R (21), E (59), V (69), E (100), G (100), F (100), G (93), L (100), T (71), V (88), T (41), E (100)

TrePGf versus other TrePs: the first five positions are marked ? (uncertain alignment), followed by L (95), E (100), G (100), F (100), E (100), V (84), K (100), V (100), S (100), E (100)

Figure S32 Part 4 of amino acid distribution of putative TreTs (250 closest homologues of TreTPh according to the UniProtKB blast tool) (upper part) and putative TrePs (250 closest homologues of TrePGf) (lower part) at positions in the active site of TreTPh (based on crystal structure 2X6R). Percentage occurrence of each amino acid is given between brackets. The corresponding positions in TreP were determined based on a sequence alignment of the 500 TreT and TreP sequences. The amino acid sequences and residue numbering schemes of TreTPh and TrePGf were chosen as representatives for GT4 Trehalose synthases (TreT) and Trehalose Phosphorylases (TreP), respectively. Residues in bold are >90% conserved in the TreT or TreP subset. Residues in italic are aligned using the structure-based alignment of the 3DM glycosyltransferase superfamily. Ph: P. horikoshii, Gf: G. frondosa, ?: uncertainty about correct sequence alignment.


Figure S33 SDS-PAGE analysis of TreTFp (47 kDa), expressed from the pET21a inducible vector. Lane 1: His-tag purified TreTFp (equilibration, wash and elution using 10 mM, 30 mM and 250 mM imidazole, respectively), lane 2: crude cell extract of TreTFp.

Table S13 Pathways involved in trehalose metabolism472. Tre: trehalose, Glc: glucose, Pi: inorganic phosphate, Mal: maltose, NDP: nucleoside diphosphate.

Reaction | Involved enzymes | Present in
UDP-Glc + Glc6P → Tre6P → Tre + Pi | 1. Trehalose 6-Phosphate Synthase (TPS, EC 2.4.1.15); 2. Trehalose 6-Phosphate Phosphatase (TPP, EC 3.1.3.12) | Archaea, Bacteria, Fungi, Plants,
Mal ↔ Tre | Trehalose Synthase (TreS, EC 5.4.99.16) | Bacteria
Maltooligosaccharides → Maltooligosyltrehalose → Tre | 1. Maltooligosyl-Trehalose synthase (TreY, EC 5.4.99.15); 2. Maltooligosyl-Trehalose trehalohydrolase (TreZ, EC 3.2.1.141) | Archaea, Bacteria
Glc1P + Glc ↔ Tre + Pi | Trehalose Phosphorylase (TreP, EC 2.4.1.64 (inverting) or EC 2.4.1.231 (retaining)) | , Bacteria, Fungi
NDP-Glc + Glc ↔ Tre + NDP | Trehalose GlycosylTransferring synthase (TreT, EC 2.4.1.245) | Archaea, Bacteria
Tre → Glc + Glc | Trehalase (TreH, EC 3.2.1.28) | Archaea, Bacteria, Eukarya


CHAPTER 7: General discussion and future perspectives


1 Wrongly annotated protein sequences highlight the need for the development of an automated validation system

Recombinant expression and characterization of new SuSy enzymes are of interest from both a fundamental and an industrial point of view. Identification of functional SuSys can teach us something about the metabolic potential and coping mechanisms of the host organism. Also, exploration of nature’s arsenal of SuSys can reveal enzymes with interesting properties (e.g. high stability, expression and/or activity) for use in economically feasible reactions on an industrial scale. To explore available SuSy homologues, the freely accessible UniProtKB477 site can be used. UniProtKB comprises a large protein database of reviewed (Swiss-Prot section) and unreviewed (TrEMBL section) protein sequences. The vast majority of these sequences are translations of coding sequences (CDS) imported from nucleotide sequence databases (EMBL-Bank/GenBank/DDBJ), and the corresponding protein names and EC numbers are provided by the submitters of the DNA sequences. The CDS are generated either by gene prediction software applied to genomic DNA or via the translation of cDNA.

Advanced searching of the UniProtKB database with Sucrose Synthase as ‘protein name’ query and subsequent automated deletion of inappropriate sequences (Chapter 2) resulted in a selection of 176 putative eukaryotic and prokaryotic annotated SuSy sequences. Taxonomic analysis revealed that these sequences belonged to plant hosts, cyanobacteria, proteobacteria and some other underrepresented (Figure 67 and Chapter 2). Two species, the thermophilic Thermosipho melanesiensis510 from the phylum Thermotogae, and the mesophilic Clavibacter michiganensis subsp. michiganensis511 belonging to the Actinobacteria, clearly formed an outgroup compared to the other organisms in the phylogenetic tree (Figure 67). Although the two protein sequences are annotated as Sucrose Synthase (UniProtKB A6LKE9) and putative Sucrose Synthase (UniProtKB A5CN81), respectively, recombinant expression and characterization of these enzymes revealed that they did not have sucrose cleavage activity. Instead, the enzyme from T. melanesiensis displayed Sucrose-Phosphate Synthase activity, a related specificity. These sequences are thus wrongly annotated, highlighting the need for revision of submitted protein annotations. Based on sequence length, the SPS specificity of the enzyme from T. melanesiensis could actually have been predicted. Similar to other SPS monofunctional enzymes512, the protein consists of only 423 amino acids, while SuSy sequences are on average 800 residues long. In addition to the SuSy sequences, several Trehalose Phosphorylases and Trehalose glycosylTransferring synthases also seem to be wrongly annotated based on phylogenetic analysis (Chapter 6). Sequence analysis of the EX7E motif (‘EGFXLXXXE’ consensus in TreT vs ‘EGFEI/VKVSE’ in TreP) possibly presents a way to distinguish between the two specificities (Figure S32). 
Therefore, integration of phylogenetic analysis, sequence length and additional protein signatures (the EX7E motif in particular for GTs) in an automated process of protein annotation validation after submission of DNA sequences could provide a means to improve functional predictions, leading to a reduced occurrence of wrongly annotated sequences in protein databases. It should be noted, however, that misannotation will always remain a problem to some extent, as even mutations at one single position can alter the specificity of an enzyme.
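As a rough illustration of such a validation step, the sketch below combines the two signatures mentioned above: the EX7E consensus (‘EGFXLXXXE’ in TreT versus ‘EGFEI/VKVSE’ in TreP) and the expected sequence length (about 800 residues for SuSy versus 423 for the SPS-like enzyme from T. melanesiensis). The function names, the 25% length tolerance and the decision rules are assumptions made for this example and not an existing validation tool. The two test fragments are short excerpts of the TreTFp and TrePGf sequences listed in the supplementary material of Chapter 6.

```python
import re

# Consensus patterns taken from the text; X is written as "." (any residue)
TREP_MOTIF = re.compile(r"EGFE[IV]KVSE")
TRET_MOTIF = re.compile(r"EGF.L...E")


def classify_by_ex7e(sequence):
    """Assign a TreT-like or TreP-like label from the EX7E motif, if one is found."""
    if TREP_MOTIF.search(sequence):
        return "TreP-like"
    if TRET_MOTIF.search(sequence):
        return "TreT-like"
    return "unassigned"


def annotation_conflicts(annotated_as, sequence):
    """True when the motif-based call contradicts the submitted annotation."""
    call = classify_by_ex7e(sequence)
    return call != "unassigned" and not call.startswith(annotated_as)


def plausible_length(length, expected=800, tolerance=0.25):
    """True when a sequence length lies within an (arbitrary) tolerance of the expected value."""
    return abs(length - expected) <= tolerance * expected


tret_fragment = "TATREGFGLVISEMMWKEHP"   # excerpt of TreTFp
trep_fragment = "QLSTREGFEVKVSEALHAGK"   # excerpt of TrePGf
```

A sequence annotated as TreT that carries the TreP-type motif, or a putative SuSy of only 423 residues, would be flagged for manual inspection rather than automatically re-annotated.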

Figure 67 Phylogenetic analysis of 167 putative SuSy genes conducted in MEGA7168. Only one isoform per species was considered. Green: plant SuSys, black: non-cyanobacterial prokaryotic SuSys, blue: cyanobacterial SuSys, red: wrongly annotated protein sequences from C. michiganensis ssp. michiganensis and T. melanesiensis.


2 Choice of SuSy enzyme depends on the application

From an industrial point of view, the sucrose cleavage activity of SuSy is highly valued since it produces expensive nucleotide sugars, UDP-Glc in particular, starting from the cheap and abundant substrate sucrose84,90,375. To reduce overall costs and process time even further, plant and cyanobacterial SuSys have been extensively used in coupled reactions together with other glycosyltransferases for the production of valuable glycosides with in situ regeneration of the expensive nucleotide (sugar)16,69–71,84,285–289,376–383. In chapter 2, a promising new candidate SuSy for industrial applications was identified: SuSyAc, from the non-phototrophic proteobacterium A. caldus. The enzyme displayed a high temperature optimum (60°C), high thermostability (t50≈4 hours at 60°C in the presence of sucrose), high activity (up to 50 U/mg with UDP) and a good expression yield (at least 2 mg of protein could be recovered after purification starting from a 250 mL culture). Although SuSyAc prefers ADP as nucleotide acceptor, it was better suited for the production of UDP-Glc compared to plant enzymes, which generally prefer UDP. Indeed, despite the low affinity of SuSyAc for UDP, it is highly active at high concentrations of UDP, which are required to reach high end-concentrations of the resulting nucleotide sugar. At 300 mM UDP, SuSyAc was 11 times more active than SuSy from Glycine max (SuSyGm). In addition, SuSyAc can also operate at higher temperatures, which reduces the risks of microbial contamination14,16,68,86. The lower activity of SuSyGm is mainly caused by substrate inhibition of the enzyme at UDP concentrations higher than 50 mM, while SuSyAc only reaches its maximal activity at 150 mM. Clearly, the traditional way of selecting an enzyme based on its overall catalytic efficiency parameter kcat/Km is not relevant for choosing a SuSy enzyme for nucleotide sugar production. 
Instead, the activity at high concentrations of (nucleotide) acceptor is more appropriate and should probably be considered as the most important parameter for production processes in general.
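The argument above can be made concrete with a simple rate law. The sketch below uses the classical substrate-inhibition form of the Michaelis-Menten equation, v = Vmax·[S] / (Km + [S] + [S]²/Ki), and compares two purely hypothetical enzymes: one with a low Km that is inhibited by its substrate and one with a high Km that is not. The parameter values are illustrative assumptions, not fitted constants of SuSyGm or SuSyAc.

```python
def rate(s, vmax, km, ki=None):
    """Michaelis-Menten rate, with optional substrate inhibition (constant ki)."""
    inhibition = s * s / ki if ki is not None else 0.0
    return vmax * s / (km + s + inhibition)


# Hypothetical parameters (concentrations in mM, rates in arbitrary units)
low_km_inhibited = {"vmax": 10.0, "km": 0.1, "ki": 50.0}
high_km_uninhibited = {"vmax": 10.0, "km": 7.0, "ki": None}

# Vmax/Km is used here as a stand-in for the catalytic efficiency kcat/Km
efficiency_a = low_km_inhibited["vmax"] / low_km_inhibited["km"]
efficiency_b = high_km_uninhibited["vmax"] / high_km_uninhibited["km"]

rate_a_low = rate(0.5, **low_km_inhibited)
rate_b_low = rate(0.5, **high_km_uninhibited)
rate_a_high = rate(300.0, **low_km_inhibited)
rate_b_high = rate(300.0, **high_km_uninhibited)
```

In this toy comparison the enzyme with the far higher Vmax/Km is the faster one at 0.5 mM substrate but the slower one at 300 mM, which is the situation described for nucleotide sugar production.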

For coupled reactions, a SuSy enzyme with high activity at low concentrations of UDP is preferred, meaning that it should have a low Km (combined with a high Vmax)400. Indeed, low concentrations of UDP are typically used during these reactions to avoid substrate inhibition of GTs by UDP and to reduce costs. As SuSyAc clearly did not have the appropriate kinetic parameters for coupled reactions, it was engineered to improve its affinity for UDP. This was successfully achieved by introducing plant residues in the nucleotide acceptor binding pocket. Double mutant SuSyAc LMDKVVA displayed a 55-fold higher affinity for UDP (Km: 0.13 mM) compared to the WT (Km: 7.2 mM), without negative effects on the activity and expression yield. This variant significantly outperformed the WT enzyme at low concentrations of UDP (0.5 mM) as demonstrated by the 9-fold increased production rate of the C-glucoside nothofagin in the coupled reaction with OsCGT513. Despite these successful results, production of nothofagin was still fastest when the wild-type plant SuSyGm was used under the same conditions400. Recombinant SuSyGm also displays a Km of about 0.13 mM for UDP (unpublished results), but the rate of nothofagin production in the coupled reaction with OsCGT was three times higher with this enzyme compared to SuSyAc LMDKVVA at 50°C400.
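The effect of the lower Km at 0.5 mM UDP can be estimated with the Michaelis-Menten equation. The worked example below assumes simple Michaelis-Menten behaviour with respect to UDP and an unchanged Vmax for the mutant (in line with the statement that activity was not negatively affected); it is a back-of-the-envelope estimate, not a fit to the coupled-reaction data.

```latex
\frac{v}{V_{\max}} = \frac{[\mathrm{UDP}]}{K_m + [\mathrm{UDP}]}
\qquad
\text{WT: } \frac{0.5}{7.2 + 0.5} \approx 0.065
\qquad
\text{LMDKVVA: } \frac{0.5}{0.13 + 0.5} \approx 0.79
\qquad
\frac{0.79}{0.065} \approx 12
```

Under these assumptions the mutant would be roughly 12 times faster than the WT at 0.5 mM UDP, the same order of magnitude as the 9-fold increase observed in the coupled reaction. The ratio of the two Km values, 7.2/0.13 ≈ 55, corresponds to the reported 55-fold higher affinity.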

Finally, it has to be noted that for reactions involving the production of nucleotide sugars, more acidic conditions are preferred because of the favorable thermodynamic equilibrium (Keq), the higher stability of sugar nucleotides and the higher activity of SuSy enzymes at low pH86. Instead, coupled reactions should be performed around neutral pH, as a compromise between the low and high pH optimum of SuSy and the coupled GTs, respectively84.

In summary, the type of application determines the required catalytic properties and thus the choice of the SuSy enzyme (Figure 68). For the production of UDP-Glc, the bacterial SuSyAc is recommended because of its high activity at elevated temperatures, at low pH and at high concentrations of UDP. For coupled glycosylation reactions on the other hand, SuSys with high activities at low concentrations of UDP (~low Km and high Vmax) should be chosen, such as the plant SuSy from G. max (or SuSyAc LMDKVVA).

Figure 68 Overview of the required characteristics of the SuSy enzyme, depending on the target product.

3 Structure-function relationships within the GT4 family

The determination of several crystal structures from retaining GT-B enzymes (e.g. GT5 glycogen synthase514, GT4 Sucrose-Phosphate Synthase512, GT4 Trehalose synthase62 and GT20 Trehalose 6-Phosphate Synthase506) has revealed that these enzymes exist in open and closed conformations. According to the induced fit model of Koshland515, structural changes of the catalytic domains occur upon binding of the substrates leading to a closed conformation suited for catalysis. These rearrangements reshape the active site, resulting in stronger interactions with the substrates and proper positioning of critical residues95. In SuSy, interactions between the different substrates and between the substrate and the enzyme residues occur predominantly through an extensive hydrogen-bond network. Characterization studies of several SuSys imply that this enzyme must have been selected and optimized throughout evolution to specifically use sucrose while retaining promiscuous activity towards the nucleotide acceptor. SuSyAc for example, displays high activity for both ADP and UDP (and to a lesser extent also CDP and GDP) and improving its affinity for UDP was easily achieved by introducing residues from plants in the nucleotide acceptor pocket of this bacterial enzyme. In contrast, the enzyme is not able to efficiently convert substrates such as GalFru (Chapter 4), which differs from sucrose only in the spatial orientation of the hydroxyl group at the C4 position of the glucose moiety, or XylFru (data not shown), which lacks the CH2OH group at the C5 position. This strengthens the hypothesis that the hydrogen bonds between the enzyme and the hydroxyl groups of the donor substrate and/or the hydrogen bonds between the two substrates, are crucial for specificity and induction of the structural changes necessary for proper catalysis. Mutagenesis of several second-shell residues and active site residues positioned around the sugar moieties, such as those belonging to the EX7E motif, typically reduced expression yields and/or sucrose cleavage activity, indicating their role in folding and/or specificity. However, none of these mutants showed improved activity on the non-natural substrate GalFru (Chapter 4). Until now, it remains unclear why some authors were successful in changing the donor substrate specificity of a GT4 enzyme from UDP-Glc to UDP-Gal246, although the higher initial activity on the target substrate could have been crucial.

Similar to SuSy, TreT can also readily accept several nucleoside diphosphates (Chapter 6). However, activity with inorganic phosphate, which is basically a shortened version of its natural acceptor, was not observed within the tested timeframe. Again, the structural changes necessary for catalysis will probably not occur due to the absence of certain hydrogen bond interactions. Replacement of the active site residues in Trehalose synthase by those occurring in Trehalose Phosphorylase, an enzyme which naturally accepts phosphate, generally abolished activity or stability and could not drive the specificity towards phosphate. In addition, soluble expression was only successful for a chimeric TreTFp/TrePGf enzyme containing the C-terminal helix of TreTFp, which extends over the N-terminal GT-BN domain of TreTFp. The inability of this chimera to produce Glc1P within a reasonable time suggests that residues from the N-terminal domain (or C-terminal end of the GT-BC domain) are involved in acceptor specificity in the breakdown direction of trehalose or that the enzyme is not stable enough. Although the former would be rather unexpected, similar conclusions were drawn based on experiments with other chimeric GT enzymes251,259. Clearly, continuing efforts will be necessary to unravel the complex structure-function relationships within the GT4 family and to alter the specificity towards other substrates. As rational engineering techniques failed so far, it could be an option to fall back on the more classical approach of protein engineering: directed evolution. To this end, random mutagenesis (e.g. via error prone PCR) could be used to introduce mutations in e.g. TreT WT or TreT Chimera 2c (the only chimera which was expressed in the soluble fraction). However, as the number of colonies that need to be screened in directed evolution experiments is enormous, use of the screening protocol developed in Chapter 6 would be too labor-intensive and time-consuming. A high-throughput method based on in vivo selection on minimal media containing trehalose as the only carbon source could present a valuable alternative here. This would, however, require metabolic engineering of the expression host E. coli because the native trehalose metabolism of this organism interferes with the search for the new phosphorylase activity (Figure 69).

Figure 69 Trehalose metabolism of E. coli and required knock-outs to create a strain suitable for the identification of TreT mutants with phosphorylase activity. Figure was created based on EcoCyc and information found in literature516–523.

Indeed, trehalose can be hydrolyzed by a periplasmic trehalase (TreA), or can enter the cell as trehalose 6-phosphate (Tre6P) through a PTS transporter. Tre6P can directly be converted by a hydrolase (TreC) to Glc and Glc6P or the phosphate can be cleaved off by a phosphatase (OtsB), yielding trehalose. The latter can subsequently be hydrolyzed by a cytoplasmic trehalase (TreF). In summary, to disable the growth of E. coli on trehalose minimal medium (trehalose negative strain), by blocking the pathways towards intermediates of the energy-producing glycolysis, at least three genes should be knocked out: TreA, TreF and TreC. If this triple knock-out mutant would be transformed with a TreT containing enzyme, growth should in principle be restored again as this enzyme will lead to the production of UDP-Glc and glucose from intracellular accumulated trehalose. Next, if we want to selectively search for TreT mutants with TreP activity, metabolism of glucose and UDP-Glc would have to be shut down. This would require several additional knock-outs of cytoplasmic enzymes: glucokinase, which converts glucose into Glc6P and GalU, which converts UDP-Glc into Glc1P. Furthermore, accumulation of monosaccharides or derivatives can lead to their leakage into the periplasmic space522,523. Consequently, the PTS systems responsible for the import of glucose as Glc6P should also be knocked down517. Finally, UshA should be inactivated as this periplasmic enzyme hydrolyzes UDP-Glc to Glc1P, which can enter the cell again524. In theory, if all the aforementioned genes are knocked out successfully in E. coli, only transformed colonies harboring a TreT mutant with sufficient TreP activity would be able to grow on trehalose minimal medium, providing a valuable high-throughput selection screen appropriate for directed evolution experiments.
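The logic of this selection strain can be checked with a small reachability model, sketched below under strong simplifications. Each reaction is reduced to "substrates present, therefore products present", cofactors such as UDP and phosphate are assumed to be available, passive leakage into the periplasm is modelled as a pseudo-enzyme called `leak`, and growth is equated with the formation of Glc6P or Glc1P. The reaction list follows the description above. The identifiers (`Glk` for glucokinase, `PTS_Tre`, `PTS_Glc`, `TreP` for an acquired phosphorylase activity) and the function name are hypothetical labels chosen for this example.

```python
# (enzyme, substrates, products); compartments are encoded in the metabolite names
REACTIONS = [
    ("TreA",    {"Tre_peri"},     {"Glc_peri"}),
    ("PTS_Tre", {"Tre_peri"},     {"Tre6P"}),
    ("TreC",    {"Tre6P"},        {"Glc", "Glc6P"}),
    ("OtsB",    {"Tre6P"},        {"Tre"}),
    ("TreF",    {"Tre"},          {"Glc"}),
    ("Glk",     {"Glc"},          {"Glc6P"}),
    ("GalU",    {"UDP-Glc"},      {"Glc1P"}),
    ("PTS_Glc", {"Glc_peri"},     {"Glc6P"}),
    ("UshA",    {"UDP-Glc_peri"}, {"Glc1P"}),
    ("leak",    {"Glc"},          {"Glc_peri"}),
    ("leak",    {"UDP-Glc"},      {"UDP-Glc_peri"}),
    ("TreT",    {"Tre"},          {"UDP-Glc", "Glc"}),   # plasmid-encoded
    ("TreP",    {"Tre"},          {"Glc1P", "Glc"}),     # desired new activity
]
NATIVE = {"TreA", "PTS_Tre", "TreC", "OtsB", "TreF",
          "Glk", "GalU", "PTS_Glc", "UshA", "leak"}
GLYCOLYSIS_ENTRY = {"Glc6P", "Glc1P"}


def can_grow_on_trehalose(knockouts=(), plasmid=()):
    """True if Glc6P or Glc1P can be formed from periplasmic trehalose."""
    active = (NATIVE - set(knockouts)) | set(plasmid)
    pool = {"Tre_peri"}
    changed = True
    while changed:  # repeat until no reaction adds a new metabolite
        changed = False
        for enzyme, substrates, products in REACTIONS:
            if enzyme in active and substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return bool(pool & GLYCOLYSIS_ENTRY)


TRIPLE = {"TreA", "TreF", "TreC"}
FULL = TRIPLE | {"Glk", "GalU", "PTS_Glc", "UshA"}
```

In this simplified model the triple knock-out no longer grows, introducing TreT restores growth, the full set of knock-outs abolishes it again, and only an acquired TreP activity rescues the fully engineered strain.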

A similar approach for engineering SuSy towards GalFru specificity could be explored. Here, a sucrose negative strain able to import sucrose but unable to metabolize it (e.g. E. coli W ΔcscAR525 or S. cerevisiae pVAN192526) would have to be used and tested for its ability to import GalFru.

4 Correlated positions as hotspots for mutagenesis

A well-known set of correlated positions are specificity-determining residues (SDR). These are not necessarily located in the vicinity of each other but they tend to be in functional sites conferring specificity, typically (but not exclusively) the active site192. These larger groups of residues are specifically co-conserved within a particular protein subfamily, but vary between subfamilies with different enzyme specificities191,193,194. Consequently, SDR are potential targets to alter the activity or change the specificity192. Nonetheless, successful examples using correlation networks as hotspots for mutagenesis are currently still scarce. In addition, it remains unclear how a correlation network should be optimally used to engineer enzymes. Questions such as: ‘how many and which residues of the correlation network should be included in the mutagenesis strategy’, ‘should they always be mutated simultaneously’ and ‘which amino acids should be included’ remain unanswered191. In this work, only the top four correlated positions of subset P218 were targeted and only those amino acids that occur more than 8% in the complete UDP-glycosyltransferase superfamily were included. This approach did not lead to improved mutants on GalFru. However, there are still plenty of other possible ways to use the correlation network. Indeed, additional correlated positions and/or amino acids could be considered in the mutagenesis strategy. On the other hand, correlated positions most closely positioned to the substrate of interest (glucose moiety in this case) could be selected rather than the top-correlated positions. Lastly, other subsets could be generated, possibly revealing new correlation networks. In this respect, it could have been interesting to focus on the correlation network of a subset containing only glucosyltransferases and galactosyltransferases.
However, as the number of available GalT sequences is extremely low, it is highly doubtful that the extracted correlation network would be correct or meaningful.
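The consequence of such design choices for the library size can be illustrated with a short calculation. The sketch below applies a frequency cut-off (more than 8%, as used in this work) to each targeted position and multiplies the number of remaining amino acids to obtain the number of variants when all positions are mutated simultaneously. The position labels and frequency values are hypothetical and do not correspond to the actual correlated positions of the subset.

```python
from math import prod


def allowed_residues(frequencies, min_percent=8.0):
    """Amino acids occurring in more than `min_percent` of the sequences at one position."""
    return sorted(aa for aa, pct in frequencies.items() if pct > min_percent)


def library_size(position_frequencies, min_percent=8.0):
    """Number of combinations when all positions are mutated simultaneously."""
    return prod(len(allowed_residues(freqs, min_percent))
                for freqs in position_frequencies.values())


# Hypothetical percentage occurrences at four correlated positions
toy_positions = {
    "P1": {"G": 60, "A": 25, "S": 10, "T": 5},
    "P2": {"D": 90, "E": 10},
    "P3": {"L": 40, "I": 30, "V": 22, "M": 8},
    "P4": {"H": 50, "N": 45, "Q": 5},
}

size_at_8_percent = library_size(toy_positions)
size_at_4_percent = library_size(toy_positions, min_percent=4.0)
```

Lowering the cut-off or adding positions increases the library size multiplicatively, which is why the choice of positions and amino acids directly determines the screening effort.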


5 The quest for stable enzymes

High (thermo)stability is a key requirement for enzymes catalyzing carbohydrate conversions in industry, mainly because these are typically performed at high temperatures (55-60°C) to avoid microbial contamination14. The quest for a stable enzyme typically starts by searching for natural (hyper)thermophilic hosts of the target specificity. Enzymes from (hyper)thermophiles are generally also thermostable and optimally active at high temperatures527. The underlying mechanisms responsible for the high thermostability vary between these organisms and include hydrogen bonds, ion pairs, hydrophobic interactions, disulfide bridges and intersubunit interactions in oligomers527,528.

Most of the non-cyanobacterial prokaryotes harboring a SuSy are mesophilic. One exception is the moderate thermophile M. roseus295, which thrives optimally around 55°C. Remarkably, its SuSy enzyme (SuSyMr) reached maximal activity at a much higher temperature (80°C). However, SuSys from mesophilic organisms (SuSyDa, SuSyAc and SuSyNe) also appeared to have high temperature optima (65°C, 60°C and 75°C, respectively). The most thermostable one was SuSyAc, which retained its full activity for at least 30 min at 55°C (in the absence of substrates). In contrast, SuSy from the cyanobacterium Chroococcidiopsis thermalis PCC 7203 (UniProtKB accession number K9U774) had an optimum temperature of only 50°C and it lost almost half of its activity after incubation for five hours on ice (data not shown). This is rather unexpected as Chroococcidiopsis species are typically able to survive under extreme conditions of drought, temperature, pH, high levels of radiation and salt concentrations529. These results suggest that the search for a stable enzyme should not be restricted to thermophilic hosts. Instead, it should involve the cloning and testing of multiple candidates, including those from less obvious sources. Nowadays, this should not be a bottleneck anymore as the cost of gene synthesis has dropped tremendously over the years530.

If one wants to increase the stability of an enzyme, physical immobilization220 or protein engineering techniques can be applied. Unfortunately, the process of denaturation is still not fully understood and no single method for protein stabilization has been identified that can be applied to all enzymes. In chapter 5, three main strategies were attempted to improve the thermostability of SuSyAc: prediction of stabilizing mutations in silico by FoldX and Rosetta, introduction of consensus residues, and stabilization of the most flexible domain within the protein. Three out of 24 mutations increased the thermodynamic stability, but only to a small extent (≤1.7°C). In addition, none of the mutants displayed a significantly higher kinetic stability, which is a more relevant parameter for the economics of an industrial process. One possible explanation for these unsuccessful results is that protein denaturation in SuSy is initiated by the dissociation of the multimeric complex into its (inactive) monomers, rather than by unfolding of the tertiary structure of the subunits. In this case, stabilization of the oligomeric interfaces would be more appropriate. However, inspection of the crystal structure of SuSyAt1 did not reveal obvious targets for mutagenesis to this end. Alternatively, the inability to predict more stable mutants with programs such as FoldX and Rosetta could possibly be attributed to the use of a homology model, which is even more prone to errors than a crystal structure. Molecular dynamics (MD) simulations531, which model the flexibility of a protein in solution, could provide a means to refine the predicted homology model or to generate an ensemble of input structures (MD snapshots), which offers a more accurate representation of the actual structure531,532. The use of multiple MD-simulated input structures rather than one rigid homology model could possibly reduce the number of false positive predictions of stabilizing mutations. Unfortunately, despite many advances, MD simulations (and computational tools in general) remain computationally demanding and require profound knowledge of bioinformatics. In addition, because of the complex nature of protein denaturation, some authors still claim that directed evolution by means of random mutagenesis presents a more successful strategy for protein stabilization.
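The idea of using several MD snapshots instead of one rigid model can be illustrated with a minimal sketch. It assumes that a predicted ΔΔG value (kcal/mol, negative taken as stabilizing) is already available for every mutation and every input structure, and it keeps only those mutations that are predicted to be stabilizing in most snapshots. The function name, the cutoff, the required fraction, the mutation labels and all numbers are hypothetical and serve illustration only; they are not output of FoldX or Rosetta and not data from this work.

```python
from statistics import mean, stdev


def consensus_stabilizing(ddg_by_mutation, cutoff=-1.0, min_fraction=0.8):
    """Keep mutations predicted as stabilizing in most input structures.

    ddg_by_mutation maps a mutation label to a list of predicted ddG values
    (kcal/mol, negative = stabilizing, an assumed convention), one value per
    snapshot. A mutation is kept when at least min_fraction of its values
    are at or below the cutoff.
    """
    selected = {}
    for mutation, values in ddg_by_mutation.items():
        fraction = sum(v <= cutoff for v in values) / len(values)
        if fraction >= min_fraction:
            # Report mean and spread over the ensemble for the kept mutation.
            selected[mutation] = (mean(values), stdev(values))
    return selected


# Hypothetical predictions for three made-up mutations over five snapshots.
example = {
    "A123V": [-1.4, -1.2, -1.6, -1.1, -1.3],  # consistently stabilizing
    "K45E": [-2.0, 0.5, 0.8, -0.2, 1.1],      # stabilizing in one snapshot only
    "G200P": [0.3, 0.1, -0.4, 0.6, 0.2],      # never below the cutoff
}
hits = consensus_stabilizing(example)
```

In this made-up example, a mutation such as K45E would have been selected if only the first snapshot had been used as input, whereas the ensemble criterion discards it. This is the kind of false positive that multiple input structures could help to avoid.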

Finally, it has to be noted that thermostability parameters are typically assessed in the absence of substrates. However, sucrose is known to act as a stabilizer, as demonstrated by an 8-fold increase in the half-life of SuSyAc at 60°C (Chapter 5). Thus, in industrial settings where SuSy is used to produce UDP-Glc, the enzyme will be stabilized by the presence of its own substrate and possibly also by immobilization, which is frequently used to allow for efficient recovery or continuous operation of the enzyme533. The stability of the SuSy enzymes is thus not expected to be a major concern. Instead, the low stability of coupled GTs and the low stability of UDP-Glc under certain conditions (neutral/alkaline pH, elevated temperature and presence of MgCl2) pose more challenging problems400.
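To make the practical meaning of an 8-fold longer half-life more tangible, the sketch below assumes simple first-order inactivation, in which the residual activity after time t equals exp(-k·t) and the half-life equals ln(2)/k. Both the first-order model and the baseline half-life of 10 min are assumptions chosen for illustration; they are not values reported in this work.

```python
import math


def rate_constant(half_life):
    """First-order inactivation rate constant for a given half-life."""
    return math.log(2) / half_life


def residual_activity(t, half_life):
    """Fraction of the initial activity left after time t (same unit as half_life)."""
    return math.exp(-rate_constant(half_life) * t)


base_half_life = 10.0                      # minutes, hypothetical value
stabilized_half_life = 8 * base_half_life  # 8-fold increase in half-life

# Residual activity after 30 min with and without the stabilizing effect.
after_30_without = residual_activity(30.0, base_half_life)
after_30_with = residual_activity(30.0, stabilized_half_life)
```

Under these assumptions, an enzyme that would retain one eighth of its activity after three half-lives retains roughly three quarters of it over the same period once the half-life is 8-fold longer.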

6 Outlook

In the coming years, Sucrose Synthase will undoubtedly continue to play a key role in unlocking nature’s glycosylation potential by providing nucleotide sugars starting from the cheap bulk substrate sucrose. In this work, expression and characterization of new bacterial SuSys expanded the available arsenal of industrially relevant biocatalysts. Continuing efforts should now focus on the coupled processes with different GT enzymes for the production of glycosides, either in vitro using purified enzymes or in vivo using whole cells (optionally permeabilized to allow substrate transport across the cell wall). Evaluation of different expression hosts (e.g. S. cerevisiae instead of E. coli) and process engineering will be crucial to optimize expression, activity and/or overall yields. In addition, protein engineering could still provide a means to alter substrate specificity or to improve the stability of GTs. However, future successes in rational GT engineering will mainly depend on advances in our knowledge of structure-function relationships within this class of enzymes. This will definitely be accelerated by the continuing increase in available genomic sequences and crystal structures and by improvements in computational tools. The elucidation of the crystal structure of a Trehalose Phosphorylase, for example, could be extremely helpful to guide additional engineering experiments for the introduction of phosphorylase activity into glycosyltransferases. Meanwhile, directed evolution experiments still present a valuable alternative, provided that a suitable high-throughput assay can be developed.


References


1. Richardson, B. From a Fossil-Fuel to a Biobased Economy: The Politics of Industrial Biotechnology. Environ. Plan. C Gov. Policy 30, 282–296 (2012).

2. Belghith, H., Ellouz-Chaabouni, S. & Gargouri, A. Biostoning of denims by Penicillium occitanis (Pol6) cellulases. J. Biotechnol. 89, 257–62 (2001).

3. Miyanaga, M., Tanaka, T., Sakiyama, T. & Nakanishi, K. Synthesis of aspartame precursor with an immobilized thermolysin in mixed organic solvents. Biotechnol. Bioeng. 46, 631–635 (1995).

4. Dersjant-Li, Y., Awati, A., Schulze, H. & Partridge, G. Phytase in non-ruminant nutrition: a critical review on phytase activities in the gastrointestinal tract and influencing factors. J. Sci. Food Agric. 95, 878–96 (2015).

5. Yamada, H. & Kobayashi, M. Nitrile Hydratase and Its Application to Industrial Production of Acrylamide. Biosci. Biotechnol. Biochem. 60, 1391–1400 (1996).

6. Yoo, E.-H. & Lee, S.-Y. Glucose biosensors: an overview of use in clinical practice. Sensors (Basel). 10, 4558–76 (2010).

7. Pèlach, M., Pastor, F., Puig, J., Vilaseca, F. & Mutjé, P. Enzymic deinking of old newspapers with cellulase. Process Biochem. 38, 1063–1067 (2003).

8. Brahmachari, G., Demain, A. & Adrio, J. Biotechnology of Microbial Enzymes: Production, Biocatalysis and Industrial applications. (Elsevier, 2017).

9. Ghaffari-Moghaddam, M., Eslahi, H., Omay, D. & Zakipour-Rahimabadi, E. Industrial applications of enzymes. Rev. J. Chem. 4, 341–361 (2014).

10. Robinson, P. K. Enzymes: principles and biotechnological applications. Essays Biochem. 59, 1–41 (2015).

11. Lairson, L. L., Henrissat, B., Davies, G. J. & Withers, S. G. Glycosyltransferases: structures, functions, and mechanisms. Annu. Rev. Biochem. 77, 521–55 (2008).

12. Aerts, D. et al. Transglucosylation potential of six sucrose phosphorylases toward different classes of acceptors. Carbohydr. Res. 346, 1860–7 (2011).

13. De Bruyn, F., Maertens, J., Beauprez, J., Soetaert, W. & De Mey, M. Biotechnological advances in UDP-sugar based glycosylation of small molecules. Biotechnol. Adv. 33, 288–302 (2015).

14. Desmet, T. et al. Enzymatic glycosylation of small molecules: challenging substrates require tailored catalysts. Chemistry (Easton). 18, 10786–801 (2012).

15. Yang, G. et al. Fluorescence Activated Cell Sorting as a General Ultra-High-Throughput Screening Method for Directed Evolution of Glycosyltransferases. J. Am. Chem. Soc. 132, 10570–10577 (2010).

16. Bungaruang, L., Gutmann, A. & Nidetzky, B. Leloir Glycosyltransferases and Natural Product Glycosylation: Biocatalytic Synthesis of the C-Glucoside Nothofagin, a Major Antioxidant of Redbush Herbal Tea. Adv. Synth. Catal. 355, 2757–2763 (2013).

17. Henrissat, B. & Davies, G. Structural and sequence-based classification of glycoside hydrolases. Curr. Opin. Struct. Biol. 7, 637–644 (1997).

18. Singh, S., Phillips, G. N. & Thorson, J. S. The structural biology of enzymes involved in natural product glycosylation. Nat. Prod. Rep. 29, 1201–37 (2012).

19. Weijers, C. A. G. M., Franssen, M. C. R. & Visser, G. M. Glycosyltransferase-catalyzed synthesis of bioactive oligosaccharides. Biotechnol. Adv. 26, 436–456 (2008).

20. Boehm, G., Fanaro, S., Jelinek, J., Stahl, B. & Marini, A. Prebiotic concept for infant nutrition. Acta Paediatr. 91, 64–7 (2003).

21. Oku, T., Tokunaga, T. & Hosoya, N. Nondigestibility of a new sweetener, ‘Neosugar,’ in the rat. J. Nutr. 114, 1574–81 (1984).

22. Otaka, K. Functional Oligosaccharide and Its New Aspect as Immune Modulation. J. Biol. Macromol. 6, 3–9 (2006).

23. Han, R., Liu, L., Li, J., Du, G. & Chen, J. Functions, applications and production of 2-O-D-glucopyranosyl-L-ascorbic acid. Appl. Microbiol. Biotechnol. 95, 313–20 (2012).

24. Hung, L. M., Chen, J. K., Huang, S. S., Lee, R. S. & Su, M. J. Cardioprotective effect of resveratrol, a natural antioxidant derived from grapes. Cardiovasc. Res. 47, 549–55 (2000).

25. Aggarwal, B. B. et al. Role of resveratrol in prevention and therapy of cancer: preclinical and clinical studies. Anticancer research 24, (2004).

26. Torres, P. et al. Enzymatic Synthesis of α-Glucosides of Resveratrol with Surfactant Activity. Adv. Synth. Catal. 353, 1077–1086 (2011).

27. Boots, A. W., Haenen, G. R. M. M. & Bast, A. Health effects of quercetin: From antioxidant to nutraceutical. Eur. J. Pharmacol. 585, 325–337 (2008).

28. Okamoto, T. Safety of quercetin for clinical application (Review). Int. J. Mol. Med. 16, 275–8 (2005).

29. Garegg, P. J. et al. Studies on Koenigs-Knorr glycosidations. Acta Chem. Scand. Ser. B-Organic Chem. Biochem. 39, 569–577 (1985).

30. De Roode, B. M., Franssen, M. C. R., Van Der Padt, A. & Boom, R. M. Perspectives for the Industrial Enzymatic Production of Glycosides. Biotechnol. Prog. 19, 1391–1402 (2003).

31. Rather, M. et al. β-Glycosidases: An alternative enzyme based method for synthesis of alkyl-glycosides. Sustain. Chem. Process. 1, 7 (2013).

32. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495 (2014).


33. Yang, S.-T. Bioprocessing for value-added products from renewable resources : new technologies and applications. (Elsevier, 2007).

34. Barrett, A. Enzyme Nomenclature. Eur. J. Biochem. 232, 1 (1995).

35. Possner, D. D. D., Claesson, M. & Guy, J. E. Structure of the Glycosyltransferase Ktr4p from Saccharomyces cerevisiae. PLoS One 10, e0136239 (2015).

36. Gutmann, A. & Nidetzky, B. Switching between O- and C-glycosyltransferase through exchange of active-site motifs. Angew. Chem. Int. Ed. Engl. 51, 12879–83 (2012).

37. Gloster, T. M. Advances in understanding glycosyltransferases from a structural perspective. Curr. Opin. Struct. Biol. 28, 131–141 (2014).

38. Lizak, C., Gerber, S., Numao, S., Aebi, M. & Locher, K. P. X-ray structure of a bacterial oligosaccharyltransferase. Nature 474, 350–355 (2011).

39. Zhang, H. et al. The highly conserved domain of unknown function 1792 has a distinct glycosyltransferase fold. Nat. Commun. 5, 147–157 (2014).

40. Charnock, S. J. & Davies, G. J. Structure of the nucleotide-diphospho-sugar transferase, SpsA from Bacillus subtilis, in native and nucleotide-complexed forms. Biochemistry 38, 6380–6385 (1999).

41. Larivière, L., Gueguen-Chaignon, V. & Moréra, S. Crystal structures of the T4 phage β-glucosyltransferase and the D100A mutant in complex with UDP-glucose: Glucose binding and identification of the catalytic base for a direct displacement mechanism. J. Mol. Biol. 330, 1077–1086 (2003).

42. Albesa-Jove, D., Giganti, D., Jackson, M., Alzari, P. M. & Guerin, M. E. Structure-function relationships of membrane-associated GT-B glycosyltransferases. Glycobiology 24, 108–124 (2014).

43. Diricks, M., De Bruyn, F., Van Daele, P., Walmagh, M. & Desmet, T. Identification of sucrose synthase in nonphotosynthetic bacteria and characterization of the recombinant enzymes. Appl. Microbiol. Biotechnol. 99, 8465–74 (2015).

44. Breton, C., Šnajdrová, L., Jeanneau, C., Koča, J. & Imberty, A. Structures and mechanisms of glycosyltransferases. Glycobiology 16, 29R–37R (2005).

45. Ardevol, A., Iglesias-Fernandez, J., Rojas-Cervellera, V. & Rovira, C. The reaction mechanism of retaining glycosyltransferases. Biochem. Soc. Trans. 44, 51–60 (2016).

46. Soya, N., Fang, Y., Palcic, M. M. & Klassen, J. S. Trapping and characterization of covalent intermediates of mutant retaining glycosyltransferases. Glycobiology 21, 547–552 (2011).

47. Gómez, H., Lluch, J. M. & Masgrau, L. Substrate-assisted and nucleophilically assisted catalysis in bovine α1,3-galactosyltransferase. Mechanistic implications for retaining glycosyltransferases. J. Am. Chem. Soc. 135, 7053–63 (2013).


48. Zheng, Y., Anderson, S., Zhang, Y. & Garavito, R. M. The structure of sucrose synthase-1 from Arabidopsis thaliana and its functional implications. J. Biol. Chem. 286, 36108–18 (2011).

49. Bobovská, A., Tvaroška, I. & Kóňa, J. Theoretical study of enzymatic catalysis explains why the trapped covalent intermediate in the E303C mutant of glycosyltransferase GTB was not detected in the wild-type enzyme. Glycobiology 25, 3–7 (2014).

50. Lee, S. S. et al. Mechanistic evidence for a front-side, SNi-type reaction in a retaining glycosyltransferase. Nat. Chem. Biol. 7, 631–8 (2011).

51. Ardèvol, A. & Rovira, C. The Molecular Mechanism of Enzymatic Glycosyl Transfer with Retention of Configuration: Evidence for a Short-Lived Oxocarbenium-Like Species. Angew. Chemie Int. Ed. 50, 10897–10901 (2011).

52. Bobovská, A., Tvaroška, I. & Kóňa, J. A theoretical study on the catalytic mechanism of the retaining α-1,2-mannosyltransferase Kre2p/Mnt1p: the impact of different metal ions on catalysis. Org. Biomol. Chem. 12, 4201 (2014).

53. Gómez, H., Polyak, I., Thiel, W., Lluch, J. M. & Masgrau, L. Retaining Glycosyltransferase Mechanism Studied by QM/MM Methods: Lipopolysaccharyl-α-1,4-galactosyltransferase C Transfers α-Galactose via an Oxocarbenium Ion-like Transition State. J. Am. Chem. Soc. 134, 4743–4752 (2012).

54. Lira-Navarrete, E. et al. Substrate-guided front-face reaction revealed by combined structural snapshots and metadynamics for the polypeptide N-acetylgalactosaminyltransferase 2. Angew. Chem. Int. Ed. Engl. 53, 8206–10 (2014).

55. Williams, G. J., Gantt, R. W. & Thorson, J. S. The impact of enzyme engineering upon natural product glycodiversification. Curr. Opin. Chem. Biol. 12, 556–64 (2008).

56. Kapitonov, D. & Yu, R. K. Conserved domains of glycosyltransferases. Glycobiology 9, 961–78 (1999).

57. Cid, E., Gomis, R. R., Geremia, R. A., Guinovart, J. J. & Ferrer, J. C. Identification of two essential glutamic acid residues in glycogen synthase. J. Biol. Chem. 275, 33614–21 (2000).

58. Kostova, Z. et al. Comparative importance in vivo of conserved glutamate residues in the EX7E motif retaining glycosyltransferase Gpi3p, the UDP-GlcNAc-binding subunit of the first enzyme in glycosylphosphatidylinositol assembly. Eur. J. Biochem. 270, 4507–14 (2003).

59. Absmanner, B., Schmeiser, V., Kämpf, M. & Lehle, L. Biochemical characterization, membrane association and identification of amino acids essential for the function of Alg11 from Saccharomyces cerevisiae, an alpha1,2-mannosyltransferase catalysing two sequential glycosylation steps in the formation of the l. Biochem. J. 426, 205–17 (2010).

60. Greenfield, L. K. et al. Domain Organization of the Polymerizing Mannosyltransferases Involved in Synthesis of the Escherichia coli O8 and O9a Lipopolysaccharide O-antigens. J. Biol. Chem. 287, 38135–38149 (2012).


61. Yep, A., Ballicora, M. A., Sivak, M. N. & Preiss, J. Identification and characterization of a critical region in the glycogen synthase from Escherichia coli. J. Biol. Chem. 279, 8359–67 (2004).

62. Woo, E.-J. et al. Structural insights on the new mechanism of trehalose synthesis by trehalose synthase TreT from Pyrococcus horikoshii. J. Mol. Biol. 404, 247–59 (2010).

63. Hughes, J. & Hughes, M. A. Multiple secondary plant product UDP-glucose glucosyltransferase genes expressed in cassava (Manihot esculenta Crantz) cotyledons. DNA Seq. 5, 41–9 (1994).

64. Hundle, B. S., O’Brien, D. A., Alberti, M., Beyer, P. & Hearst, J. E. Functional expression of zeaxanthin glucosyltransferase from Erwinia herbicola and a proposed uridine diphosphate binding site. Proc. Natl. Acad. Sci. U. S. A. 89, 9321–5 (1992).

65. Campbell, J. A., Davies, G. J., Bulone, V. & Henrissat, B. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J. 929–39 (1997).

66. Palcic, M. M. Glycosyltransferases as biocatalysts. Curr. Opin. Chem. Biol. 15, 226–33 (2011).

67. Ryu, S.-I. & Lee, S.-B. Synthesis of nucleotide sugars and α-galacto-oligosaccharides by recombinant Escherichia coli cells with trehalose substrate. Enzyme Microb. Technol. 53, 359–63 (2013).

68. Morell, M. & Copeland, L. Sucrose synthase of soybean nodules. Plant Physiol. 78, 149–54 (1985).

69. Zervosen, A. & Elling, L. A Novel Three-Enzyme Reaction Cycle for the Synthesis of N-Acetyllactosamine with in Situ Regeneration of Uridine 5′-Diphosphate Glucose and Uridine 5′-Diphosphate Galactose. J. Am. Chem. Soc. 118, 1836–1840 (1996).

70. Chen, X. et al. Transferring a Biosynthetic Cycle into a Productive Escherichia coli Strain: Large-Scale Synthesis of Galactosides. J. Am. Chem. Soc. 123, 8866–8867 (2001).

71. Liu, Z., Lu, Y., Zhang, J., Pardee, K. & Wang, P. G. P1 Trisaccharide (Galα1,4Galβ1,4GlcNAc) Synthesis by Enzyme Glycosylation Reactions Using Recombinant Escherichia coli. Appl. Environ. Microbiol. 69, 2110–2115 (2003).

72. Tsai, T.-I. et al. Effective Sugar Nucleotide Regeneration for the Large-Scale Enzymatic Synthesis of Globo H and SSEA4. J. Am. Chem. Soc. 135, 14831–14839 (2013).

73. Koizumi, S., Endo, T., Tabata, K. & Ozaki, A. Large-scale production of UDP-galactose and globotriose by coupling metabolically engineered bacteria. Nat. Biotechnol. 16, 847–50 (1998).

74. Liu, Z., Zhang, J., Chen, X. & Wang, P. G. Combined biosynthetic pathway for de novo production of UDP-galactose: catalysis with multiple enzymes immobilized on agarose beads. Chembiochem 3, 348–55 (2002).

75. Coutinho, P. M., Deleury, E., Davies, G. J. & Henrissat, B. An Evolving Hierarchical Family Classification for Glycosyltransferases. J. Mol. Biol. 328, 307–317 (2003).


76. Lee, Y. G. et al. Molecular Cloning of Cyanobacterial Pteridine Glycosyltransferases That Catalyze the Transfer of either Glucose or Xylose to Tetrahydrobiopterin. Appl. Environ. Microbiol. 76, 7658–7661 (2010).

77. Yethon, J. A., Vinogradov, E., Perry, M. B. & Whitfield, C. Mutation of the lipopolysaccharide core glycosyltransferase encoded by waaG destabilizes the outer membrane of Escherichia coli by interfering with core phosphorylation. J. Bacteriol. 182, 5620–3 (2000).

78. Muheim, C. et al. Identification of a Fragment-Based Scaffold that Inhibits the Glycosyltransferase WaaG from Escherichia coli. Antibiot. (Basel, Switzerland) 5, (2016).

79. Weitnauer, G. et al. Biosynthesis of the orthosomycin antibiotic avilamycin A: deductions from the molecular analysis of the avi biosynthetic gene cluster of Streptomyces viridochromogenes Tü57 and production of new antibiotics. Chem. Biol. 8, 569–81 (2001).

80. Geigenberger, P. & Stitt, M. Sucrose synthase catalyses a readily reversible reaction in vivo in developing potato tubers and other plant tissues. Planta 189, 329–39 (1993).

81. Déjardin, A., Rochat, C., Maugenest, S. & Boutin, J.-P. Purification, characterization and physiological role of sucrose synthase in the pea seed coat (Pisum sativum L.). Planta 201, 128–137 (1997).

82. Wang, H. L., Lee, P. D., Chen, W. L., Huang, D. J. & Su, J. C. Osmotic stress-induced changes of sucrose metabolism in cultured sweet potato cells. J. Exp. Bot. 51, 1991–9 (2000).

83. Cardini, C. E., Leloir, L. F. & Chiriboga, J. The biosynthesis of sucrose. J. Biol. Chem. 214, 149–55 (1955).

84. Schmölzer, K., Gutmann, A., Diricks, M., Desmet, T. & Nidetzky, B. Sucrose synthase: A unique glycosyltransferase for biocatalytic glycosylation process development. Biotechnol. Adv. 34, 88–111 (2015).

85. Delmer, D. P. The purification and properties of sucrose synthetase from etiolated Phaseolus aureus seedlings. J. Biol. Chem. 247, 3822–8 (1972).

86. Gutmann, A. & Nidetzky, B. Unlocking the Potential of Leloir Glycosyltransferases for Applied Biocatalysis: Efficient Synthesis of Uridine 5′-Diphosphate-Glucose by Sucrose Synthase. Adv. Synth. Catal. 358, 3600–3609 (2016).

87. Deng, C. & Chen, R. R. A pH-sensitive assay for galactosyltransferase. Anal. Biochem. 330, 219–226 (2004).

88. Persson, M. & Palcic, M. M. A high-throughput pH indicator assay for screening glycosyltransferase saturation mutagenesis libraries. Anal. Biochem. 378, 1–7 (2008).

89. Bieniawska, Z. et al. Analysis of the sucrose synthase gene family in Arabidopsis. Plant J. 49, 810–828 (2007).

90. Elling, L., Grothus, M. & Kula, M. R. Investigation of sucrose synthase from rice for the synthesis of various nucleotide sugars and saccharides. Glycobiology 3, 349–55 (1993).

91. Römer, U. et al. Expression, purification and characterization of recombinant sucrose synthase 1 from Solanum tuberosum L. for carbohydrate engineering. J. Biotechnol. 107, 135–149 (2004).

92. Römer, U., Nettelstroth, N., Köckenberger, W. & Elling, L. Characterization of Recombinant Sucrose Synthase 1 from Potato for the Synthesis of Sucrose Analogues. Adv. Synth. Catal. 343, 655–661 (2001).

93. Sauerzapfe, B., Engels, L. & Elling, L. Broadening the biocatalytic properties of recombinant sucrose synthase 1 from potato (Solanum tuberosum L.) by expression in Escherichia coli and Saccharomyces cerevisiae. Enzyme Microb. Technol. 43, 289–296 (2008).

94. Cardini, C. & Recondo, E. Specificity of nucleoside diphosphate sugars in sucrose biosynthesis. Plant Cell Physiol. 3, 313–318 (1962).

95. Wu, R. et al. The Crystal Structure of Nitrosomonas europaea Sucrose Synthase Reveals Critical Conformational Changes and Insights into Sucrose Metabolism in Prokaryotes. J. Bacteriol. 197, 2734–2746 (2015).

96. Rohrig, H., Schmidt, J., Miklashevichs, E., Schell, J. & John, M. Soybean ENOD40 encodes two peptides that bind to sucrose synthase. Proc. Natl. Acad. Sci. U. S. A. 99, 1915–20 (2002).

97. Amor, Y., Haigler, C. H., Johnson, S., Wainscott, M. & Delmer, D. P. A membrane-associated form of sucrose synthase and its potential role in synthesis of cellulose and callose in plants. Proc. Natl. Acad. Sci. U. S. A. 92, 9353–7 (1995).

98. Winter, H., Huber, J. L. & Huber, S. C. Membrane association of sucrose synthase: changes during the graviresponse and possible control by protein phosphorylation. FEBS Lett. 420, 151–155 (1997).

99. Winter, H., Huber, J. L. & Huber, S. C. Identification of sucrose synthase as an actin-binding protein. FEBS Lett. 430, 205–208 (1998).

100. Subbaiah, C. C. et al. Mitochondrial localization and putative signaling function of sucrose synthase in maize. J. Biol. Chem. 281, 15625–35 (2006).

101. Huber, S. C. et al. Phosphorylation of serine-15 of maize leaf sucrose synthase. Occurrence in vivo and possible regulatory significance. Plant Physiol. 112, 793–802 (1996).

102. Duncan, K. A. & Huber, S. C. Sucrose synthase oligomerization and F-actin association are regulated by sucrose concentration and phosphorylation. Plant Cell Physiol. 48, 1612–23 (2007).

103. Su, J. C. & Preiss, J. Purification and properties of sucrose synthase from maize kernels. Plant Physiol. 61, 389–93 (1978).

104. Thummler, F. & Verma, D. P. Nodulin-100 of soybean is the subunit of sucrose synthase regulated by the availability of free heme in nodules. J. Biol. Chem. 262, 14730–6 (1987).

105. Barratt, D. H. et al. Multiple, distinct isoforms of sucrose synthase in pea. Plant Physiol. 127, 655–64 (2001).

106. Porchia, A. C., Curatti, L. & Salerno, G. L. Sucrose metabolism in cyanobacteria: sucrose synthase from Anabaena sp. strain PCC 7119 is remarkably different from the plant enzymes with respect to substrate affinity and amino-terminal sequence. Planta 210, 34–40 (1999).

107. Martin, T., Frommer, W. B., Salanoubat, M. & Willmitzer, L. Expression of an Arabidopsis sucrose synthase gene indicates a role in metabolization of sucrose both during phloem loading and in sink organs. Plant J. 4, 367–77 (1993).

108. Chen, Y.-C. & Chourey, P. S. Spatial and temporal expression of the two sucrose synthase genes in maize: immunohistological evidence. Theor. Appl. Genet. 78, 553–559 (1989).

109. Heinlein, M. & Starlinger, P. Tissue- and cell-specific expression of the two sucrose synthase isoenzymes in developing maize kernels. MGG Mol. Gen. Genet. 215, 441–446 (1989).

110. Maraña, C., Garcia-Olmedo, F. & Carbonero, P. Differential expression of two types of sucrose synthase-encoding genes in wheat in response to anaerobiosis, cold shock and light. Gene 88, 167–172 (1990).

111. Sturm, A., Lienhard, S., Schatt, S. & Hardegger, M. Tissue-specific expression of two genes for sucrose synthase in carrot (Daucus carota L.). Plant Mol. Biol. 39, 349–60 (1999).

112. Chourey, P. S., Latham, M. D. & Still, P. E. Expression of two sucrose synthetase genes in endosperm and seedling cells of maize: evidence of tissue specific polymerization of protomers. MGG Mol. Gen. Genet. 203, 251–255 (1986).

113. Barratt, D. H. et al. Multiple, distinct isoforms of sucrose synthase in pea. Plant Physiol. 127, 655–64 (2001).

114. Duncan, K. A., Hardin, S. C. & Huber, S. C. The three maize sucrose synthase isoforms differ in distribution, localization, and phosphorylation. Plant Cell Physiol. 47, 959–971 (2006).

115. Chourey, P. S., Taliercio, E. W., Carlson, S. J. & Ruan, Y. L. Genetic evidence that the two isozymes of sucrose synthase present in developing maize endosperm are critical, one for cell wall integrity and the other for starch biosynthesis. Mol. Gen. Genet. 259, 88–96 (1998).

116. Streb, S., Egli, B., Eicke, S. & Zeeman, S. C. The debate on the pathway of starch synthesis: a closer look at low-starch mutants lacking plastidial phosphoglucomutase supports the chloroplast-localized pathway. Plant Physiol. 151, 1769–72 (2009).

117. Pozueta-Romero, J., Ardila, F. & Akazawa, T. ADP-Glucose Transport by the Chloroplast Adenylate Translocator Is Linked to Starch Biosynthesis. Plant Physiol. 97, (1991).

118. Zrenner, R., Salanoubat, M., Willmitzer, L. & Sonnewald, U. Evidence of the crucial role of sucrose synthase for sink strength using transgenic potato plants (Solanum tuberosum L.). Plant J. 7, 97–107 (1995).

119. Baroja-Fernández, E. et al. Sucrose synthase catalyzes the de novo production of ADPglucose linked to starch biosynthesis in heterotrophic tissues of plants. Plant Cell Physiol. 44, 500–9 (2003).

120. Baroja-Fernández, E. et al. Sucrose synthase activity in the sus1/sus2/sus3/sus4 Arabidopsis mutant is sufficient to support normal cellulose and starch production. Proc. Natl. Acad. Sci. U. S. A. 109, 321–6 (2012).

121. Munoz, F. J. et al. Sucrose Synthase Controls Both Intracellular ADP Glucose Levels and Transitory Starch Biosynthesis in Source Leaves. Plant Cell Physiol. 46, 1366–1376 (2005).

122. Baroja-Fernández, E. et al. Most of ADP-glucose linked to starch biosynthesis occurs outside the chloroplast in source leaves. Proc. Natl. Acad. Sci. U. S. A. 101, 13080–5 (2004).

123. Pozueta-Romero, J., Frehner, M., Viale, A. M. & Akazawa, T. Direct transport of ADPglucose by an adenylate translocator is linked to starch biosynthesis in amyloplasts. Proc. Natl. Acad. Sci. U. S. A. 88, 5769–73 (1991).

124. Bahaji, A. et al. Arabidopsis thaliana Mutants Lacking ADP-Glucose Pyrophosphorylase Accumulate Starch and Wild-type ADP-Glucose Content: Further Evidence for the Occurrence of Important Sources, other than ADP-Glucose Pyrophosphorylase, of ADP-Glucose Linked to Leaf Starch Biosynthesis. Plant Cell Physiol. 52, 1162–1176 (2011).

125. Caspar, T., Huber, S. C. & Somerville, C. Alterations in Growth, Photosynthesis, and Respiration in a Starchless Mutant of Arabidopsis thaliana (L.) Deficient in Chloroplast Phosphoglucomutase Activity. Plant Physiol. 79, 11–7 (1985).

126. Pfister, B. & Zeeman, S. C. Formation of starch in plant cells. Cell. Mol. Life Sci. 73, 2781–807 (2016).

127. Stitt, M. & Zeeman, S. C. Starch turnover: pathways, regulation and role in growth. Curr. Opin. Plant Biol. 15, 282–292 (2012).

128. Schunemann, D., Borchert, S., Flugge, U. I. & Heldt, H. W. ADP/ATP Translocator from Pea Root Plastids (Comparison with Translocators from Spinach Chloroplasts and Pea Leaf Mitochondria). Plant Physiol. 103, 131–137 (1993).

129. Barratt, D. H. P. et al. Normal growth of Arabidopsis requires cytosolic invertase but not sucrose synthase. Proc. Natl. Acad. Sci. U. S. A. 106, 13124–9 (2009).

130. Zeeman, S. C., Smith, S. M. & Smith, A. M. The diurnal metabolism of leaf starch. Biochem. J. 401, 13–28 (2007).

131. Ehira, S., Kimura, S., Miyazaki, S. & Ohmori, M. Sucrose synthesis in the nitrogen-fixing Cyanobacterium Anabaena sp. strain PCC 7120 is controlled by the two-component response regulator OrrA. Appl. Environ. Microbiol. 80, 5672–9 (2014).

132. Higo, A., Katoh, H., Ohmori, K., Ikeuchi, M. & Ohmori, M. The role of a gene cluster for trehalose metabolism in dehydration tolerance of the filamentous cyanobacterium Anabaena sp. PCC 7120. Microbiology 152, 979–87 (2006).

133. Roby, C., Cortès, S., Gromova, M., Le Bail, J.-L. & Roberts, J. K. M. Sucrose cycling in heterotrophic plant cell metabolism: first step towards an experimental model. Mol. Biol. Rep. 29, 145–9 (2002).

134. Cumino, A. C., Marcozzi, C., Barreiro, R. & Salerno, G. L. Carbon cycling in Anabaena sp. PCC 7120. Sucrose synthesis in the heterocysts and possible role in nitrogen fixation. Plant Physiol. 143, 1385–97 (2007).

135. Nguyen-Quoc, B. & Foyer, C. H. A role for ‘futile cycles’ involving invertase and sucrose synthase in sucrose metabolism of tomato fruit. J. Exp. Bot. 52, 881–9 (2001).

136. Kolman, M. A., Nishi, C. N., Perez-Cenci, M. & Salerno, G. L. Sucrose in cyanobacteria: from a salt-response molecule to play a key role in nitrogen fixation. Life 5, 102–26 (2015).

137. Curatti, L., Giarrocco, L. & Salerno, G. L. Sucrose synthase and RuBisCo expression is similarly regulated by the nitrogen source in the nitrogen-fixing cyanobacterium Anabaena sp. Planta 223, 891–900 (2006).

138. Gordon, A. J. Sucrose Synthase in Legume Nodules Is Essential for Nitrogen Fixation. Plant Physiol. 120, 867–878 (1999).

139. Curatti, L., Flores, E. & Salerno, G. Sucrose is involved in the diazotrophic metabolism of the heterocyst-forming cyanobacterium Anabaena sp. FEBS Lett. 513, 175–8 (2002).

140. Hardin, S. C. et al. Phosphorylation of sucrose synthase at serine 170: occurrence and possible role as a signal for proteolysis. Plant J. 35, 588–603 (2003).

141. Tsai, Z.-C. & Wang, A.-Y. Identification of rice manganese-dependent protein kinases that phosphorylate sucrose synthase at multiple serine residues. Bot. Bull. Acad. Sin. 44, 141–150 (2003).

142. Hardin, S. C., Winter, H. & Huber, S. C. Phosphorylation of the Amino Terminus of Maize Sucrose Synthase in Relation to Membrane Association and Enzyme Activity. Plant Physiol. 134, 1427–1438 (2004).

143. Zhang, X. Q. et al. Soybean nodule sucrose synthase (nodulin-100): further analysis of its phosphorylation using recombinant and authentic root-nodule enzymes. Arch. Biochem. Biophys. 371, 70–82 (1999).

144. Nakai, T. et al. An increase in apparent affinity for sucrose of mung bean sucrose synthase is caused by in vitro phosphorylation or directed mutagenesis of Ser11. Plant Cell Physiol. 39, 1337–41 (1998).

145. Röhrig, H., John, M. & Schmidt, J. Modification of soybean sucrose synthase by S-thiolation with ENOD40 peptide A. Biochem. Biophys. Res. Commun. 325, 864–70 (2004).

146. Kouchi, H., Takane, K., So, R. B., Ladha, J. K. & Reddy, P. M. Rice ENOD40: isolation and expression analysis in rice and transgenic soybean root nodules. Plant J. 18, 121–9 (1999).

147. André, I., Potocki-Véronèse, G., Morel, S., Monsan, P. & Remaud-Siméon, M. Sucrose-utilizing transglucosidases for biocatalysis. Top. Curr. Chem. 294, 25–48 (2010).

148. Daudé, D., Remaud-Siméon, M. & André, I. Sucrose analogs: an attractive (bio)source for glycodiversification. Nat. Prod. Rep. 29, 945–60 (2012).

149. Kralj, S., Buchholz, K., Dijkhuizen, L. & Seibel, J. Fructansucrase enzymes and sucrose analogues: A new approach for the synthesis of unique fructo-oligosaccharides. Biocatal. Biotransformation 26, 32–41 (2009).

150. Seibel, J. et al. Synthesis of sucrose analogues and the mechanism of action of Bacillus subtilis fructosyltransferase (levansucrase). Carbohydr. Res. 341, 2335–2349 (2006).

151. Sinha, A. K. et al. Metabolizable and Non-Metabolizable Sugars Activate Different Signal Transduction Pathways in Tomato. Plant Physiol. 128, 1480–1489 (2002).

152. Ward, J. M., Kühn, C., Tegeder, M. & Frommer, W. B. Sucrose Transport in Higher Plants. Int. Rev. Cytol. 178, 41–71 (1998).

153. Gantt, R. W., Peltier-Pain, P., Singh, S., Zhou, M. & Thorson, J. S. Broadening the scope of glycosyltransferase-catalyzed sugar nucleotide synthesis. Proc. Natl. Acad. Sci. U. S. A. 110, 7648–53 (2013).

154. Michu, E. A short guide to phylogeny reconstruction. Plant, Soil Environ. 53, 442–446 (2007).

155. Fitch, W. M. Toward defining the course of evolution: minimum change for a specified tree topology. Syst. Zool. 20, 406–416 (1971).

156. Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–76 (1981).

157. Zhang, J. & Nei, M. Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods. J. Mol. Evol. 44, 139–146 (1997).

158. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–25 (1987).

159. Hasegawa, M. & Fujiwara, M. Relative efficiencies of the Maximum Likelihood, Neighbor-Joining and Maximum Parsimony methods for estimating protein phylogeny. Mol. Phylogenet. Evol. 2, 1–5 (1993).

160. Tateno, Y., Takezaki, N. & Nei, M. Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site. Mol. Biol. Evol. 11, 261–77 (1994).

161. Huelsenbeck, J. P. The robustness of two phylogenetic methods: four-taxon simulations reveal a slight superiority of maximum likelihood over neighbor joining. Mol. Biol. Evol. 12, 843–9 (1995).

162. Thornton, J. W. Resurrecting ancient genes: experimental analysis of extinct molecules. Nat. Rev. Genet. 5, 366–75 (2004).

163. Dayhoff, M., Schwartz, R. & Orcutt, B. A model of evolutionary change in proteins. Atlas Protein Seq. Struct. 5, 345–352 (1978).

164. Jones, D., Taylor, W. & Thornton, J. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282 (1992).

165. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001).

166. Keane, T. M., Creevey, C. J., Pentony, M. M., Naughton, T. J. & McInerney, J. O. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol. Biol. 6, 29 (2006).

167. Abascal, F., Zardoya, R. & Posada, D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–5 (2005).

168. Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 33, 1870–1874 (2016).

169. Yang, Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10, 1396–401 (1993).

170. Felsenstein, J. Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution (N. Y). 39, 783 (1985).

171. Hall, B. G. Building phylogenetic trees from molecular data with MEGA. Mol. Biol. Evol. 30, 1229–35 (2013).

172. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–9 (2013).

173. Tao, H. & Cornish, V. W. Milestones in directed enzyme evolution. Curr. Opin. Chem. Biol. 6, 858–64 (2002).

174. Böttcher, D. & Bornscheuer, U. T. Protein engineering of microbial enzymes. Curr. Opin. Microbiol. 13, 274–82 (2010).

175. Bornscheuer, U. & Kazlauskas, R. J. Survey of protein engineering strategies. Curr. Protoc. Protein Sci. Chapter 26, Unit26.7 (2011).

176. Dalby, P. A. Strategy and success for the directed evolution of enzymes. Curr. Opin. Struct. Biol. 21, 473–80 (2011).

177. Desmet, T. & Soetaert, W. Biocatalysis and Enzyme Technology (course notes). Ghent University. (2012).

178. Stemmer, W. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994).

179. Neylon, C. Chemical and biochemical strategies for the randomization of protein encoding DNA sequences: library construction methods for directed evolution. Nucleic Acids Res. 32, 1448–59 (2004).

180. Bornscheuer, U. T. & Pohl, M. Improved biocatalysts by directed evolution and rational protein design. Curr. Opin. Chem. Biol. 5, 137–43 (2001).

181. Stepankova, V. et al. Strategies for Stabilization of Enzymes in Organic Solvents. ACS Catal. 3, 2823–2836 (2013).

182. Davids, T., Schmidt, M., Böttcher, D. & Bornscheuer, U. T. Strategies for the discovery and engineering of enzymes for biocatalysis. Curr. Opin. Chem. Biol. 17, 215–220 (2013).

183. Tee, K. L. & Wong, T. S. Polishing the craft of genetic diversity creation in directed evolution. Biotechnol. Adv. 31, 1707–1721 (2013).

184. Reetz, M. T., Bocola, M., Carballeira, J. D., Zha, D. & Vogel, A. Expanding the Range of Substrate Acceptance of Enzymes: Combinatorial Active-Site Saturation Test. Angew. Chemie Int. Ed. 44, 4192–4196 (2005).

185. Patrick, W. M., Firth, A. E. & Blackburn, J. M. User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries. Protein Eng. 16, 451–7 (2003).

186. Kille, S. et al. Reducing codon redundancy and screening effort of combinatorial protein libraries created by saturation mutagenesis. ACS Synth. Biol. 2, 83–92 (2013).

187. Heringa, J. & Taylor, W. Three-dimensional domain duplication, swapping and stealing. Curr. Opin. Struct. Biol. 7, 416–421 (1997).

188. Nixon, A. E., Ostermeier, M. & Benkovic, S. J. Hybrid enzymes: manipulating enzyme design. Trends Biotechnol. 16, 258–64 (1998).

189. Ostermeier, B. M. & Benkovic, S. J. Evolution of protein function by domain swapping. 55, 29–77 (2001).

190. van Beek, H. L., de Gonzalo, G. & Fraaije, M. W. Blending Baeyer-Villiger monooxygenases: using a robust BVMO as a scaffold for creating chimeric enzymes with novel catalytic properties. Chem. Commun. 48, 3288–90 (2012).

191. Franceus, J., Verhaeghe, T. & Desmet, T. Correlated positions in protein evolution and engineering. J. Ind. Microbiol. Biotechnol. 1–9 (2016). doi:10.1007/s10295-016-1811-1

192. Kuipers, R. K. P. et al. Correlated mutation analyses on super-family alignments reveal functionally important residues. Proteins 76, 608–16 (2009).

193. de Juan, D., Pazos, F. & Valencia, A. Emerging methods in protein co-evolution. Nat. Rev. Genet. 14, 249–261 (2013).

194. Sandler, I., Zigdon, N., Levy, E. & Aharoni, A. The functional importance of co-evolving residues in proteins. Cell. Mol. Life Sci. 71, 673–82 (2014).

195. Kuipers, R. K. et al. 3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins 78, 2101–13 (2010).

196. Russell, A. J. & Klibanov, A. M. Inhibitor-induced enzyme activation in organic solvents. J. Biol. Chem. 263, 11624–6 (1988).

197. Staahl, M., Jeppsson-Wistrand, U., Maansson, M. O. & Mosbach, K. Induced stereo- and substrate selectivity of bioimprinted α-chymotrypsin in anhydrous organic media. J. Am. Chem. Soc. 113, 9366–9368 (1991).

198. Peißker, F. & Fischer, L. Crosslinking of imprinted proteases to maintain a tailor-made substrate selectivity in aqueous solutions. Bioorg. Med. Chem. 7, 2231–2237 (1999).

199. Kronenburg, N. A., de Bont, J. A. & Fischer, L. Improvement of enantioselectivity by immobilized imprinting of epoxide hydrolase from Rhodotorula glutinis. J. Mol. Catal. B Enzym. 16, 121–129 (2001).

200. Vaidya, A., Borck, A., Manns, A. & Fischer, L. Altering Glucose Oxidase to Oxidize D-Galactose through Crosslinking of Imprinted Protein. ChemBioChem 5, 132–135 (2004).

201. Wang, S. et al. Enzyme Stability and Activity in Non-Aqueous Reaction Systems: A Mini Review. Catalysts 6, 32 (2016).

202. Cantone, S. et al. Biocatalysis in non-conventional media—ionic liquids, supercritical fluids and the gas phase. Green Chem. 9, 954 (2007).

203. Klibanov, A. M. Improving enzymes by using them in organic solvents. Nature 409, 241–246 (2001).

204. Ju, H., Jang, E., Ryu, B. H. & Kim, T. D. Characterization and preparation of highly stable aggregates of a novel type of hydrolase (BL28) from Bacillus licheniformis. Bioresour. Technol. 128, 81–86 (2013).

205. Sheldon, R. A. Cross-Linked Enzyme Aggregates as Industrial Biocatalysts. Org. Process Res. Dev. 15, 213–223 (2011).

206. Sheldon, R. A. Characteristic features and biotechnological applications of cross-linked enzyme aggregates (CLEAs). Appl. Microbiol. Biotechnol. 92, 467–77 (2011).

207. Yang, X., Zheng, P., Ni, Y. & Sun, Z. Highly efficient biosynthesis of sucrose-6-acetate with cross-linked aggregates of Lipozyme TL 100 L. J. Biotechnol. 161, 27–33 (2012).

208. Cui, J. D. & Jia, S. R. Optimization protocols and improved strategies of cross-linked enzyme aggregates technology: current development and future challenges. Crit. Rev. Biotechnol. 35, 15–28 (2015).

209. Kaulpiboon, J., Pongsawasdi, P. & Zimmermann, W. Molecular imprinting of cyclodextrin glycosyltransferases from Paenibacillus sp. A11 and Bacillus macerans with γ-cyclodextrin. FEBS J. 274, 1001–1010 (2007).

210. Vaidya, A. A., Lele, B. S., Kulkarni, M. G. & Mashelkar, R. A. Creating a macromolecular receptor by affinity imprinting. J. Appl. Polym. Sci. 81, 1075–1083 (2001).

211. Klein, J. U., Whitcombe, M. J., Mulholland, F. & Vulfson, E. N. Template-Mediated Synthesis of a Polymeric Receptor Specific to Amino Acid Sequences. Angew. Chemie Int. Ed. 38, 2057–2060 (1999).

212. Sanchez-Ruiz, J. M. Protein kinetic stability. Biophys. Chem. 148, 1–15 (2010).

213. Brissos, V., Gonçalves, N., Melo, E. P. & Martins, L. O. Improving kinetic or thermodynamic stability of an azoreductase by directed evolution. PLoS One 9, e87209 (2014).

214. Niesen, F. H., Berglund, H. & Vedadi, M. The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nat. Protoc. 2, 2212–21 (2007).

215. Greenfield, N. J. Using circular dichroism collected as a function of temperature to determine the thermodynamics of protein unfolding and binding interactions. Nat. Protoc. 1, 2527–35 (2006).

216. Johnson, C. M. Differential scanning calorimetry as a tool for protein folding and stability. Arch. Biochem. Biophys. 531, 100–109 (2013).

217. Polizzi, K. M., Bommarius, A. S., Broering, J. M. & Chaparro-Riggers, J. F. Stability of biocatalysts. Curr. Opin. Chem. Biol. 11, 220–5 (2007).

218. Verhaeghe, T., Diricks, M., Aerts, D., Soetaert, W. & Desmet, T. Mapping the acceptor site of sucrose phosphorylase from Bifidobacterium adolescentis by alanine scanning. J. Mol. Catal. B Enzym. 96, 81–88 (2013).

219. Kim, Y.-W. et al. Directed evolution of Thermus maltogenic amylase toward enhanced thermal resistance. Appl. Environ. Microbiol. 69, 4866–4874 (2003).

220. Cerdobbel, A., Desmet, T., De Winter, K., Maertens, J. & Soetaert, W. Increasing the thermostability of sucrose phosphorylase by multipoint covalent immobilization. J. Biotechnol. 150, 125–30 (2010).

221. Reetz, M. T., Carballeira, J. D. & Vogel, A. Iterative saturation mutagenesis on the basis of B factors as a strategy for increasing protein thermostability. Angew. Chem. Int. Ed. Engl. 45, 7745–51 (2006).

222. Matthews, B. W., Nicholson, H. & Becktel, W. J. Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc. Natl. Acad. Sci. U. S. A. 84, 6663–7 (1987).

223. Berezovsky, I. N., Chen, W. W., Choi, P. J. & Shakhnovich, E. I. Entropic Stabilization of Proteins and Its Proteomic Consequences. PLoS Comput. Biol. 1, e47 (2005).

224. Eijsink, V. G. H. et al. Rational engineering of enzyme stability. J. Biotechnol. 113, 105–20 (2004).

225. Reed, C. J., Lewis, H., Trejo, E., Winston, V. & Evilia, C. Protein adaptations in archaeal extremophiles. Archaea 2013, 373275 (2013).

226. Pace, C. N. et al. Contribution of hydrogen bonds to protein stability. Protein Sci. 23, 652–61 (2014).

227. Scandurra, R., Consalvi, V., Chiaraluce, R., Politi, L. & Engel, P. C. Protein thermostability in extremophiles. Biochimie 80, 933–941 (1998).

228. Lehmann, M., Pasamontes, L., Lassen, S. F. & Wyss, M. The consensus concept for thermostability engineering of proteins. Biochim. Biophys. Acta 1543, 408–415 (2000).

229. Bosshart, A., Panke, S. & Bechtold, M. Systematic optimization of interface interactions increases the thermostability of a multimeric enzyme. Angew. Chem. Int. Ed. Engl. 52, 9673–6 (2013).

230. Wijma, H. J., Floor, R. J. & Janssen, D. B. Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr. Opin. Struct. Biol. 23, 588–94 (2013).

231. Bommarius, A. S. & Paye, M. F. Stabilizing biocatalysts. Chem. Soc. Rev. 42, 6534–65 (2013).

232. Damborsky, J. & Brezovsky, J. Computational tools for designing and engineering biocatalysts. Curr. Opin. Chem. Biol. 13, 26–34 (2009).

233. Gumulya, Y. & Reetz, M. T. Enhancing the thermal robustness of an enzyme by directed evolution: least favorable starting points and inferior mutants can map superior evolutionary pathways. Chembiochem 12, 2502–10 (2011).

234. Koudelakova, T. et al. Engineering enzyme stability and resistance to an organic cosolvent by modification of residues in the access tunnel. Angew. Chem. Int. Ed. Engl. 52, 1959–63 (2013).

235. Richter, F., Leaver-Fay, A., Khare, S. D., Bjelic, S. & Baker, D. De novo enzyme design using Rosetta3. PLoS One 6, e19230 (2011).

236. Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382-8 (2005).

237. Aerts, D. et al. Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input. Biotechnol. Bioeng. 110, 2563–72 (2013).

238. Porebski, B. T. & Buckle, A. M. Consensus protein design. Protein Eng. Des. Sel. 29, 245–51 (2016).

239. Redondo, R. A. F., de Vladar, H. P., Włodarski, T. & Bollback, J. P. Evolutionary interplay between structure, energy and epistasis in the coat protein of the ϕX174 phage family. J. R. Soc. Interface 14, (2017).

240. Ramakrishnan, B. & Qasba, P. K. Structure-based Design of β1,4-Galactosyltransferase I (β4Gal-T1) with Equally Efficient N-Acetylgalactosaminyltransferase Activity: point mutation broadens donor specificity. J. Biol. Chem. 277, 20833–20839 (2002).

241. Ramakrishnan, B. & Qasba, P. K. Role of a Single Amino Acid in the Evolution of Glycans of Invertebrates and Vertebrates. J. Mol. Biol. 365, 570–576 (2007).

242. Offen, W. et al. Structure of a flavonoid glucosyltransferase reveals the basis for plant natural product modification. EMBO J. 25, 1396–405 (2006).

243. Noguchi, A. et al. Local differentiation of sugar donor specificity of flavonoid glycosyltransferase in Lamiales. Plant Cell 21, 1556–72 (2009).

244. Osmani, S. A., Bak, S., Imberty, A., Olsen, C. E. & Møller, B. L. Catalytic key amino acids and UDP-sugar donor specificity of a plant glucuronosyltransferase, UGT94B1: molecular modeling substantiated by site-specific mutagenesis and biochemical analyses. Plant Physiol. 148, 1295–308 (2008).

245. Kubo, A., Arai, Y., Nagashima, S. & Yoshikawa, T. Alteration of sugar donor specificities of plant glycosyltransferases by a single point mutation. Arch. Biochem. Biophys. 429, 198–203 (2004).

246. Claus, H., Stummeyer, K., Batzilla, J., Mühlenhoff, M. & Vogel, U. Amino acid 310 determines the donor substrate specificity of serogroup W-135 and Y capsule polymerases of Neisseria meningitidis. Mol. Microbiol. 71, 960–71 (2009).

247. Williams, G. J., Zhang, C. & Thorson, J. S. Expanding the promiscuity of a natural-product glycosyltransferase by directed evolution. Nat. Chem. Biol. 3, 657–662 (2007).

248. Gantt, R. W., Goff, R. D., Williams, G. J. & Thorson, J. S. Probing the Aglycon Promiscuity of an Engineered Glycosyltransferase. Angew. Chemie Int. Ed. 47, 8889–8892 (2008).

249. Williams, G. J., Goff, R. D., Zhang, C. & Thorson, J. S. Optimizing glycosyltransferase specificity via ‘hot-spot’ saturation mutagenesis presents a catalyst for novobiocin glycorandomization. Chem. Biol. 15, 393–401 (2008).

250. Truman, A. W. et al. Chimeric Glycosyltransferases for the Generation of Hybrid Glycopeptides. Chem. Biol. 16, 676–685 (2009).

251. McArthur, J. B. & Chen, X. Glycosyltransferase engineering for carbohydrate synthesis. Biochem. Soc. Trans. 44, 129–142 (2016).

252. Hutchinson, E. et al. Redesign of Polyene Macrolide Glycosylation: Engineered Biosynthesis of 19-(O)-Perosaminyl-Amphoteronolide B. Chem. Biol. 17, 174–182 (2010).

253. Cartwright, A. M., Lim, E.-K., Kleanthous, C. & Bowles, D. J. A kinetic analysis of regiospecific glucosylation by two glycosyltransferases of Arabidopsis thaliana: domain swapping to introduce new activities. J. Biol. Chem. 283, 15724–31 (2008).

254. Truman, A. W. et al. Chimeric glycosyltransferases for the generation of hybrid glycopeptides. Chem. Biol. 16, 676–85 (2009).

255. Hansen, E. H., Osmani, S. A., Kristensen, C., Møller, B. L. & Hansen, J. Substrate specificities of family 1 UGTs gained by domain swapping. Phytochemistry 70, 473–482 (2009).

256. Park, S.-H. et al. Expanding substrate specificity of GT-B fold glycosyltransferase via domain swapping and high-throughput screening. Biotechnol. Bioeng. 102, 988–994 (2009).

257. Kohara, A., Nakajima, C., Yoshida, S. & Muranaka, T. Characterization and engineering of glycosyltransferases responsible for steroid saponin biosynthesis in Solanaceous plants. Phytochemistry 68, 478–486 (2007).

258. Abuelizz, H. A. & Mahmud, T. Distinct Substrate Specificity and Catalytic Activity of the Pseudoglycosyltransferase VldE. Chem. Biol. 22, 724–33 (2015).

259. Kim, H.-L., Kim, A. H., Park, M. B., Lee, S.-W. & Park, Y. S. Altered sugar donor specificity and catalytic activity of pteridine glycosyltransferases by domain swapping or site-directed mutagenesis. BMB Rep. 46, 37–40 (2013).

260. Layne, D. R. & Flore, J. A. End-product Inhibition of Photosynthesis in Prunus cerasus L. in Response to Whole-plant Source-Sink Manipulation. 120, 583–599 (1995).

261. Reid, S. J. & Abratt, V. R. Sucrose utilisation in bacteria: genetic organisation and regulation. Appl. Microbiol. Biotechnol. 67, 312–21 (2005).

262. Winter, H. & Huber, S. C. Regulation of sucrose metabolism in higher plants: localization and regulation of activity of key enzymes. Crit. Rev. Biochem. Mol. Biol. 35, 253–89 (2000).

263. Leslie, S. B., Israeli, E., Lighthart, B., Crowe, J. H. & Crowe, L. M. Trehalose and sucrose protect both membranes and proteins in intact bacteria during drying. Appl. Environ. Microbiol. 61, 3592–7 (1995).

264. Reed, R. Organic solute accumulation in osmotically stressed cyanobacteria. FEMS Microbiol. Lett. 39, 51–56 (1986).

265. Empadinhas, N. & da Costa, M. S. Osmoadaptation mechanisms in prokaryotes: distribution of compatible solutes. Int. Microbiol. 11, 151–61 (2008).

266. Desplats, P., Folco, E. & Salerno, G. L. Sucrose may play an additional role to that of an osmolyte in Synechocystis sp. PCC 6803 salt-shocked cells. Plant Physiol. Biochem. 43, 133–8 (2005).

267. Lunn, J. E. Evolution of sucrose synthesis. Plant Physiol. 128, 1490–500 (2002).

268. Muyzer, G. et al. Complete genome sequence of "Thioalkalivibrio sulfidophilus" HL-EbGr7. Stand. Genomic Sci. 4, 23–35 (2011).

269. Khmelenina, V. et al. Osmoadaptation in halophilic and alkaliphilic methanotrophs. Arch. Microbiol. 172, 321–9 (1999).

270. Medvedkova, K. A., Khmelenina, V. N. & Trotsenko, I. A. Sucrose as a factor of thermal adaptation of the thermophilic methanotroph Methylocaldum szegediense O-12. Mikrobiologiia 76, 567–9

271. Doronina, N. V., Darmaeva, T. D. & Trotsenko, Y. A. Methylophaga alcalica sp. nov., a novel alkaliphilic and moderately halophilic, obligately methylotrophic bacterium from an East Mongolian saline soda lake. Int. J. Syst. Evol. Microbiol. 53, 223–229 (2003).

272. But, S. Y. et al. Sucrose metabolism in halotolerant methanotroph Methylomicrobium alcaliphilum 20Z. Arch. Microbiol. 197, 471–480 (2015).

273. d’Avó, A. F. et al. A Unique Pool of Compatible Solutes on Rhodopirellula baltica, Member of the Deep-Branching Phylum Planctomycetes. PLoS One 8, e68289 (2013).

274. But, S. Y., Khmelenina, V. N., Reshetnikov, A. S. & Trotsenko, Y. A. Bifunctional sucrose phosphate synthase/phosphatase is involved in the sucrose biosynthesis by Methylobacillus flagellatus KT. FEMS Microbiol. Lett. 347, 43–51 (2013).

275. McFadden, G. I. Chloroplast origin and integration. Plant Physiol. 125, 50–53 (2001).

276. Blank, C. E. Phylogenetic distribution of compatible solute synthesis genes support a freshwater origin for cyanobacteria. J. Phycol. 49, 880–95 (2013).

277. Salerno, G. L. & Curatti, L. Origin of sucrose metabolism in higher plants: when, how and why? Trends Plant Sci. 8, 63–9 (2003).

278. Cumino, A. C., Perez-Cenci, M., Giarrocco, L. E. & Salerno, G. L. The proteins involved in sucrose synthesis in the marine cyanobacterium Synechococcus sp. PCC 7002 are encoded by two genes transcribed from a gene cluster. FEBS Lett. 584, 4655–60 (2010).

279. Porchia, A. C. & Salerno, G. L. Sucrose biosynthesis in a prokaryotic organism: Presence of two sucrose-phosphate synthases in Anabaena with remarkable differences compared with the plant enzymes. Proc. Natl. Acad. Sci. U. S. A. 93, 13600–4 (1996).

280. Alberts, B., Johnson, A. & Lewis, J. Molecular Biology of the Cell. 4th edition. (2002).

281. Vargas, W., Cumino, A. & Salerno, G. L. Cyanobacterial alkaline/neutral invertases. Origin of sucrose hydrolysis in the plant cytosol? Planta 216, 951–60 (2003).

282. Haigler, C. H. et al. Carbon partitioning to cellulose synthesis. Plant Mol. Biol. 47, 29–51 (2001).

283. Koch, K. Sucrose metabolism: regulatory mechanisms and pivotal roles in sugar sensing and plant development. Curr. Opin. Plant Biol. 7, 235–46 (2004).

284. Curatti, L., Giarrocco, L. E., Cumino, A. C. & Salerno, G. L. Sucrose synthase is involved in the conversion of sucrose to polysaccharides in filamentous nitrogen-fixing cyanobacteria. Planta 228, 617–25 (2008).

285. Brinkmann, N. et al. Chemo-enzymatic synthesis of the Galili epitope Gal(alpha)(1-->3)Galbeta(1-->4)GlcNAc on a homogeneously soluble PEG polymer by a multi-enzyme system. Bioorg. Med. Chem. Lett. 11, 2503–6 (2001).

286. Masada, S. et al. An efficient chemoenzymatic production of small molecule glucosides with in situ UDP-glucose recycling. FEBS Lett. 581, 2562–6 (2007).

287. Son, M. H. et al. Production of flavonoid o-glucoside using sucrose synthase and flavonoid o-glucosyltransferase fusion protein. J. Microbiol. Biotechnol. 19, 709–12 (2009).

288. Terasaka, K., Mizutani, Y., Nagatsu, A. & Mizukami, H. In situ UDP-glucose regeneration unravels diverse functions of plant secondary product glycosyltransferases. FEBS Lett. 586, 4344–50 (2012).

289. Gutmann, A. et al. Towards the synthesis of glycosylated dihydrochalcone natural products using glycosyltransferase-catalysed cascade reactions. Green Chem. 16, 4417–4425 (2014).

290. Delmer, D. P. The Purification and Properties of Sucrose Synthetase from Etiolated Phaseolus aureus Seedlings. J. Biol. Chem. 247, 3822–3828 (1972).

291. Tsai, C.-Y. Sucrose-UDP glucosyltransferase of Zea mays endosperm. Phytochemistry 13, 885–891 (1974).

292. Tanase, K. & Yamaki, S. Purification and characterization of two sucrose synthase isoforms from Japanese pear fruit. Plant Cell Physiol. 41, 408–14 (2000).

293. Curatti, L., Porchia, A. C., Herrera-Estrella, L. & Salerno, G. L. A prokaryotic sucrose synthase gene (susA) isolated from a filamentous nitrogen-fixing cyanobacterium encodes a protein similar to those of plants. Planta 211, 729–35 (2000).

294. Kolman, M. A., Torres, L. L., Martin, M. L. & Salerno, G. L. Sucrose synthase in unicellular cyanobacteria and its relationship with salt and hypoxic stress. Planta 235, 955–64 (2012).

295. Kadnikov, V. V. et al. Genomic analysis of Melioribacter roseus, facultatively anaerobic organotrophic bacterium representing a novel deep lineage within Bacteriodetes/Chlorobi group. PLoS One 8, e53047 (2013).

296. Chain, P. et al. Complete genome sequence of the ammonia-oxidizing bacterium and obligate chemolithoautotroph Nitrosomonas europaea. J. Bacteriol. 185, 2759–73 (2003).

297. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).

298. Taboada, B., Ciria, R., Martinez-Guerrero, C. E. & Merino, E. ProOpDB: Prokaryotic Operon DataBase. Nucleic Acids Res. 40, D627-31 (2012).

299. Mao, X. et al. DOOR 2.0: presenting operons and their functions through dynamic and integrated views. Nucleic Acids Res. 42, D654-9 (2014).

300. Aerts, D., Verhaeghe, T., De Mey, M., Desmet, T. & Soetaert, W. A constitutive expression system for high-throughput screening. Eng. Life Sci. 11, 10–19 (2011).

301. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. 6, 12–16 (2009).

302. Gasteiger, E. et al. in The Proteomics Protocols Handbook (ed. Walker, J.) 571–607 (Humana Press, 2005).

303. Waffenschmidt, S. & Jaenicke, L. Assay of Reducing Sugars in the Nanomole Range with 2,2′-Bicinchoninate. Anal. Biochem. 165, 337–340 (1987).

304. Chen, P.-J., Wei, T.-C., Chang, Y.-T. & Lin, L.-P. Purification and characterization of carboxymethyl cellulase from Sinorhizobium fredii. Bot. Bull. Acad. Sin. 45, 111–118 (2004).

305. Cerdobbel, A. Engineering the thermostability of sucrose phosphorylase for industrial applications. (2011).

306. Copeland, R. Enzymes. A practical introduction to structure, mechanism and data analysis. (Wiley-VCH, 2000).

307. Jayashree, B., Pradeep, R., Kumar, A. & Gopal, B. Correlation between the Sucrose Synthase Protein Subfamilies, Variations in Structure and Expression in Stress-derived Expressed Sequence Tag Datasets. J. Proteomics Bioinform. 1, 408–423 (2008).

308. Lücker, S., Nowka, B., Rattei, T., Spieck, E. & Daims, H. The Genome of Nitrospina gracilis Illuminates the Metabolism and Evolution of the Major Marine Nitrite Oxidizer. Front. Microbiol. 4, 27 (2013).

309. Campbell, M. A. et al. Nitrosococcus watsonii sp. nov., a new species of marine obligate ammonia-oxidizing bacteria that is not omnipresent in the world’s oceans: calls to validate the names ‘Nitrosococcus halophilus’ and ‘Nitrosomonas mobilis’. FEMS Microbiol. Ecol. 76, 39–48 (2011).

310. Pfennig, N. & Biebl, H. Desulfuromonas acetoxidans gen. nov. and sp. nov., a new anaerobic, sulfur-reducing, acetate-oxidizing bacterium. Arch. Microbiol. 110, 3–12 (1976).

311. Tuovinen, O. H. & Fry, I. J. Bioleaching and mineral biotechnology. Curr. Opin. Biotechnol. 4, 344–355 (1993).

312. Rawlings, D. E. & Johnson, D. B. The microbiology of biomining: development and optimization of mineral-oxidizing microbial consortia. Microbiology 153, 315–324 (2007).

313. Gray, N. F. Environmental impact and remediation of acid mine drainage: a management problem. Environ. Geol. 30, 62–71 (1997).

314. Nordstrom, D. K. Advances in the Hydrogeochemistry and Microbiology of Acid Mine Waters. Int. Geol. Rev. 42, 499–515 (2000).

315. Vannelli, T., Logan, M., Arciero, D. M. & Hooper, A. B. Degradation of halogenated aliphatic compounds by the ammonia-oxidizing bacterium Nitrosomonas europaea. Appl. Environ. Microbiol. 56, 1169–71 (1990).

316. Kunte, H. J. Osmoregulation in Bacteria: Compatible Solute Accumulation and Osmosensing. Environ. Chem. 3, 94–99 (2006).

317. Sorokin, D. Y., Kuenen, J. G. & Muyzer, G. The microbial sulfur cycle at extremely haloalkaline conditions of soda lakes. Front. Microbiol. 2, 44 (2011).

318. Banciu, H. et al. Fatty acid, compatible solute and pigment composition of obligately chemolithoautotrophic alkaliphilic sulfur-oxidizing bacteria from soda lakes. FEMS Microbiol. Lett. 243, 181–187 (2005).

319. Arp, D. J., Chain, P. S. G. & Klotz, M. G. The Impact of Genome Analyses on Our Understanding of Ammonia-Oxidizing Bacteria. Annu. Rev. Microbiol. 61, 503–528 (2007).

320. Norton, J. M. et al. Complete genome sequence of Nitrosospira multiformis, an ammonia-oxidizing bacterium from the soil environment. Appl. Environ. Microbiol. 74, 3559–72 (2008).

321. Stark, J. M. & Firestone, M. K. Mechanisms for soil moisture effects on activity of nitrifying bacteria. Appl. Environ. Microbiol. 61, 218–21 (1995).

322. Elling, L. & Kula, M.-R. Characterization of sucrose synthase from rice grains for the enzymatic synthesis of UDP and TDP glucose. Enzyme Microb. Technol. 17, 929–934 (1995).

323. Sebková, V., Unger, C., Hardegger, M. & Sturm, A. Biochemical, physiological, and molecular characterization of sucrose synthase from Daucus carota. Plant Physiol. 108, 75–83 (1995).

324. Klotz, K. L., Finger, F. L. & Shelver, W. L. Characterization of two sucrose synthase isoforms in sugarbeet root. Plant Physiol. Biochem. 41, 107–115 (2003).

325. Figueroa, C. M. et al. The unique nucleotide specificity of the sucrose synthase from Thermosynechococcus elongatus. FEBS Lett. 587, 165–9 (2013).

326. Lee, J. C. & Timasheff, S. N. The stabilization of proteins by sucrose. J. Biol. Chem. 256, 7193–201 (1981).

327. Huang, D. Y. & Wang, A. Y. Purification and characterization of sucrose synthase isozymes from etiolated rice seedlings. Biochem. Mol. Biol. Int. 46, 107–13 (1998).

328. Ross, H. A. & Davies, H. V. Purification and Characterization of Sucrose Synthase from the Cotyledons of Vicia faba L. Plant Physiol. 100, 1008–13 (1992).

329. Römer, U. et al. Expression, purification and characterization of recombinant sucrose synthase 1 from Solanum tuberosum L. for carbohydrate engineering. J. Biotechnol. 107, 135–49 (2004).

330. Schmölzer, K., Lemmerer, M., Gutmann, A. & Nidetzky, B. Integrated process design for biocatalytic synthesis by a Leloir Glycosyltransferase: UDP-glucose production with sucrose synthase. Biotechnol. Bioeng. 114, 924–928 (2017).

331. Wang, J., Zeng, D., Liu, G., Wang, S. & Yu, S. Truncation of a mannanase from Trichoderma harzianum improves its enzymatic properties and expression efficiency in Trichoderma reesei. J. Ind. Microbiol. Biotechnol. 41, 125–133 (2014).

332. Wen, T.-N., Chen, J.-L., Lee, S.-H., Yang, N.-S. & Shyur, L.-F. A Truncated Fibrobacter succinogenes 1,3-1,4-β-D-Glucanase with Improved Enzymatic Activity and Thermotolerance. Biochemistry 44, 9197–9205 (2005).

333. Yang, H. et al. Integrating terminal truncation and oligopeptide fusion for a novel protein engineering strategy to improve specific activity and catalytic efficiency: alkaline α-amylase as a case study. Appl. Environ. Microbiol. 79, 6429–38 (2013).

334. Kim, Y.-M. et al. Truncation of N- and C-terminal regions of Streptococcus mutans dextranase enhances catalytic activity. Appl. Microbiol. Biotechnol. 91, 329–339 (2011).

335. Dyson, M. R., Shadbolt, S. P., Vincent, K. J., Perera, R. L. & McCafferty, J. Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression. BMC Biotechnol. 4, 32 (2004).

336. Ruotolo, R. et al. Domain Organization of Phytochelatin Synthase: Functional properties of truncated enzyme species identified by limited proteolysis. J. Biol. Chem. 279, 14686–14693 (2004).

337. Hardin, S. C., Duncan, K. A. & Huber, S. C. Determination of structural requirements and probable regulatory effectors for membrane association of maize sucrose synthase 1. Plant Physiol. 141, 1106–19 (2006).

338. Kirstein, J., Molière, N., Dougan, D. A. & Turgay, K. Adapting the machine: adaptor proteins for Hsp100/Clp and AAA+ proteases. Nat. Rev. Microbiol. 7, 589–599 (2009).

339. Martínez-Noël, G. M. A., Cumino, A. C., Kolman, M. de los A. & Salerno, G. L. First evidence of sucrose biosynthesis by single cyanobacterial bimodular proteins. FEBS Lett. 587, 1669–74 (2013).

340. Verhaeghe, T., Aerts, D., Diricks, M., Soetaert, W. & Desmet, T. The quest for a thermostable sucrose phosphorylase reveals sucrose 6′-phosphate phosphorylase as a novel specificity. Appl. Microbiol. Biotechnol. 98, 7027–7037 (2014).

341. Pinna, L. A. & Ruzzene, M. How do protein kinases recognize their substrates? Biochim. Biophys. Acta - Mol. Cell Res. 1314, 191–225 (1996).

342. Pereira, S. F. F., Goss, L. & Dworkin, J. Eukaryote-like serine/threonine kinases and phosphatases in bacteria. Microbiol. Mol. Biol. Rev. 75, 192–212 (2011).

343. Gonzalez, L., Phalip, V. & Zhang, C. C. Characterization of PknC, a Ser/Thr kinase with broad substrate specificity from the cyanobacterium Anabaena sp. strain PCC 7120. Eur. J. Biochem. 268, 1869–75 (2001).

344. Mann, N. H. Protein phosphorylation in cyanobacteria. Microbiology 140, 3207–3215 (1994).

345. Tyagi, N., Anamika, K., Srinivasan, N., Kumar, S. & McGettigan, P. A Framework for Classification of Prokaryotic Protein Kinases. PLoS One 5, e10608 (2010).

346. Pan, Z. et al. dbPSP: a curated database for protein phosphorylation sites in prokaryotes. Database (Oxford). 2015, bav031 (2015).

347. Miller, M. L. et al. NetPhosBac - A predictor for Ser/Thr phosphorylation sites in bacterial proteins. Proteomics 9, 116–125 (2009).

348. Li, Z., Wu, P., Zhao, Y., Liu, Z. & Zhao, W. in Advances in experimental medicine and biology 827, 275–285 (2015).

349. You, X.-Y. et al. Unraveling the Acidithiobacillus caldus complete genome and its central metabolisms for carbon assimilation. J. Genet. Genomics 38, 243–252 (2011).

350. Zhang, X. et al. Gene Turnover Contributes to the Evolutionary Adaptation of Acidithiobacillus caldus: Insights from Comparative Genomics. Front. Microbiol. 7, 1960 (2016).

351. Hallberg, K. B. & Lindstrom, E. B. Characterization of Thiobacillus caldus sp. nov., a moderately thermophilic acidophile. Microbiology 140, 3451–3456 (1994).

352. Hallberg, K. B., González-Toril, E. & Johnson, D. B. Acidithiobacillus ferrivorans, sp. nov.; facultatively anaerobic, psychrotolerant iron-, and sulfur-oxidizing acidophiles isolated from metal mine-impacted environments. Extremophiles 14, 9–19 (2010).

353. Gale, N. L. & Beck, J. V. Evidence for the Calvin cycle and hexose monophosphate pathway in Thiobacillus ferrooxidans. J. Bacteriol. 94, 1052–9 (1967).

354. Drobner, E., Huber, H. & Stetter, K. O. Thiobacillus ferrooxidans, a facultative hydrogen oxidizer. Appl. Environ. Microbiol. 56, 2922–3 (1990).

355. Valdés, J. et al. Acidithiobacillus ferrooxidans metabolism: from genome sequence to industrial applications. BMC Genomics 9, 597 (2008).

356. Myhr, S. & Torsvik, T. Denitrovibrio acetiphilus, a novel genus and species of dissimilatory nitrate-reducing bacterium isolated from an oil reservoir model column. Int. J. Syst. Evol. Microbiol. 50, 1611–1619 (2000).

357. Kiss, H. et al. Complete genome sequence of Denitrovibrio acetiphilus type strain (N2460T). Stand. Genomic Sci. 2, 270–279 (2010).

358. Dörries, M., Wöhlbrand, L., Kube, M., Reinhardt, R. & Rabus, R. Genome and catabolic subproteomes of the marine, nutritionally versatile, sulfate-reducing bacterium Desulfococcus multivorans DSM 2059. BMC Genomics 17, 918 (2016).

359. Sorokin, D. Y. et al. Sulfidogenesis under extremely haloalkaline conditions by Desulfonatronospira thiodismutans gen. nov., sp. nov., and Desulfonatronospira delicata sp. nov. - a novel lineage of Deltaproteobacteria from hypersaline soda lakes. Microbiology 154, 1444–1453 (2008).

360. Rauschenbach, I., Narasingarao, P. & Haggblom, M. M. Desulfurispirillum indicum sp. nov., a selenate- and selenite-respiring bacterium isolated from an estuarine canal. Int. J. Syst. Evol. Microbiol. 61, 654–658 (2011).

361. Melton, E. D. et al. Complete genome sequence of Desulfurivibrio alkaliphilus strain AHT2(T), a haloalkaliphilic sulfidogen from Egyptian hypersaline alkaline lakes. Stand. Genomic Sci. 11, 67 (2016).

362. Sorokin, D. Y., Tourova, T. P., Mußmann, M. & Muyzer, G. Dethiobacter alkaliphilus gen. nov. sp. nov., and Desulfurivibrio alkaliphilus gen. nov. sp. nov.: two novel representatives of reductive sulfur cycle from soda lakes. Extremophiles 12, 431–439 (2008).

363. Hoeft, S. E., Kulp, T. R., Han, S., Lanoil, B. & Oremland, R. S. Coupled arsenotrophy in a hot spring photosynthetic biofilm at Mono Lake, California. Appl. Environ. Microbiol. 76, 4633–9 (2010).

364. Kulp, T. R. et al. Arsenic(III) Fuels Anoxygenic Photosynthesis in Hot Spring Biofilms from Mono Lake, California. Science 321, 967–970 (2008).

365. Hoeft McCann, S. et al. Arsenite as an Electron Donor for Anoxygenic Photosynthesis: Description of Three Strains of Ectothiorhodospira from Mono Lake, California and Big Soda Lake, Nevada. Life 7, 1 (2016).

366. Munn, C. Marine microbiology. (Garland Science, 2011).

367. Bhaskar, K. V. & Charyulu, P. B. B. N. Effect of environmental factors on nitrifying bacteria isolated from the rhizosphere of Setaria italica (L.) Beauv. African J. Biotechnol. 4, 1145–1146 (2005).

368. Hommes, N. G., Sayavedra-Soto, L. A. & Arp, D. J. Chemolithoorganotrophic growth of Nitrosomonas europaea on fructose. J. Bacteriol. 185, 6809–14 (2003).

369. Wood, N. J. & Sörensen, J. Osmotic stimulation of microcolony development by Nitrosomonas europaea. FEMS Microbiol. Ecol. 27, 175–183 (1998).

370. Watson, S. W. & Waterbury, J. B. Characteristics of two marine nitrite oxidizing bacteria, Nitrospina gracilis nov. gen. nov. sp. and Nitrococcus mobilis nov. gen. nov. sp. Arch. Mikrobiol. 77, 203–230 (1971).

371. Sorokin, D. Y., Muntyan, M. S., Panteleeva, A. N. & Muyzer, G. Thioalkalivibrio sulfidiphilus sp. nov., a haloalkaliphilic, sulfur-oxidizing gammaproteobacterium from alkaline habitats. Int. J. Syst. Evol. Microbiol. 62, 1884–1889 (2012).

372. Bryantseva, I. et al. Thiorhodospira sibirica gen. nov., sp. nov., a new alkaliphilic purple sulfur bacterium from a Siberian soda lake. Int. J. Syst. Bacteriol. 49, 697–703 (1999).

373. Janssen, P. H., Liesack, W. & Schink, B. Geovibrio thiophilus sp. nov., a novel sulfur-reducing bacterium belonging to the phylum Deferribacteres. Int. J. Syst. Evol. Microbiol. 52, 1341–1347 (2002).

374. Diricks, M. et al. Sequence determinants of nucleotide binding in Sucrose Synthase: improving the affinity of a bacterial Sucrose Synthase for UDP by introducing plant residues. Protein Eng. Des. Sel. (2016). doi:10.1093/protein/gzw048

375. Zervosen, A., Römer, U. & Elling, L. Application of recombinant sucrose synthase for the large scale synthesis of ADP-glucose. J. Mol. Catal. B Enzym. 5, 25–28 (1998).

376. Hokke, C. H., Zervosen, A., Elling, L., Joziasse, D. H. & van den Eijnden, D. H. One-pot enzymatic synthesis of the Galα1→3Galβ1→4GlcNAc sequence with in situ UDP-Gal regeneration. Glycoconj. J. 13, 687–692 (1996).

377. Wang, Y. et al. Efficient enzymatic production of rebaudioside A from stevioside. Biosci. Biotechnol. Biochem. 80, 67–73 (2015).

378. Rupprath, C., Kopp, M., Hirtz, D., Müller, R. & Elling, L. An enzyme module system for in situ regeneration of deoxythymidine 5′-diphosphate (dTDP)-activated deoxy sugars. Adv. Synth. Catal. 349, 1489–1496 (2007).

379. Michlmayr, H. et al. Biochemical Characterization of a Recombinant UDP-glucosyltransferase from Rice and Enzymatic Production of Deoxynivalenol-3-O-β-D-glucoside. Toxins (Basel). 7, 2685–2700 (2015).

380. Brinkmann, N. et al. Chemo-Enzymatic synthesis of the galili epitope Galα(1→3)Galβ(1→4)GlcNAc on a homogeneously soluble PEG polymer by a multi-Enzyme system. Bioorg. Med. Chem. Lett. 11, 2503–2506 (2001).

381. Shao, J., Hayashi, T. & Wang, P. G. Enhanced Production of α-Galactosyl Epitopes by Metabolically Engineered Pichia pastoris. Appl. Environ. Microbiol. 69, 5238–5242 (2003).

382. Engels, L., Henze, M., Hummel, W. & Elling, L. Enzyme Module Systems for the Synthesis of Uridine 5′-Diphospho-α-D-glucuronic Acid and Non-Sulfated Human Natural Killer Cell-1 (HNK-1) Epitope. Adv. Synth. Catal. 357, 1751–1762 (2015).

383. Lepak, A., Gutmann, A., Kulmer, S. T. & Nidetzky, B. Creating a Water-Soluble Resveratrol-Based Antioxidant by Site-Selective Enzymatic Glucosylation. ChemBioChem 16, 1870–1874 (2015).

384. Sanchis, J. et al. Improved PCR method for the creation of saturation mutagenesis libraries in directed evolution: application to difficult-to-amplify templates. Appl. Microbiol. Biotechnol. 81, 387–97 (2008).

385. Brazier-Hicks, M. et al. The C-glycosylation of flavonoids in cereals. J. Biol. Chem. 284, 17926–34 (2009).

386. Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9, 40 (2008).

387. Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–38 (2010).

388. The PyMOL Molecular Graphics System, Version 1.3 Schrödinger, LLC.

389. Murata, T. Sucrose Synthetase of Sweet Potato Roots. Agric. Biol. Chem. 35, 1441–1448 (1971).

390. Nomura, T. & Akazawa, T. Enzymic mechanism of starch synthesis in ripening rice grains VII. Purification and enzymic properties of sucrose synthetase. Arch. Biochem. Biophys. 156, 644–652 (1973).

391. Moriguchi, T. & Yamaki, S. Purification and Characterization of Sucrose Synthase from Peach (Prunus persica) Fruit. Plant Cell Physiol. 29, 1361–1366 (1988).

392. Baroja-Fernandez, E. et al. Sucrose synthase activity in the sus1/sus2/sus3/sus4 Arabidopsis mutant is sufficient to support normal cellulose and starch production. Proc. Natl. Acad. Sci. 109, 321–326 (2012).

393. Zeng, Y., Wu, Y., Avigne, W. T. & Koch, K. E. Differential Regulation of Sugar-Sensitive Sucrose Synthases by Hypoxia and Anoxia Indicate Complementary Transcriptional and Posttranscriptional Responses. Plant Physiol. 116, 1573–1583 (1998).

394. Kren, V. & Martínková, L. Glycosides in medicine: ‘The role of glycosidic residue in biological activity’. Curr. Med. Chem. 8, 1303–28 (2001).

395. Ichikawa, Y., Wang, R. & Wong, C. H. Regeneration of sugar nucleotide for enzymatic oligosaccharide synthesis. Methods Enzymol. 247, 107–27 (1994).

396. Owens, D. K. & McIntosh, C. A. Identification, recombinant expression, and biochemical characterization of a flavonol 3-O-glucosyltransferase clone from Citrus paradisi. Phytochemistry 70, 1382–91 (2009).

397. Zhang, C. et al. Exploiting the reversibility of natural product glycosyltransferase-catalyzed reactions. Science 313, 1291–4 (2006).

398. Bungaruang, L., Gutmann, A. & Nidetzky, B. β-Cyclodextrin Improves Solubility and Enzymatic C-Glucosylation of the Flavonoid Phloretin. Adv. Synth. Catal. 358, 486–493 (2016).

399. Lee, W. & Bae, J.-S. Anti-inflammatory Effects of Aspalathin and Nothofagin from Rooibos (Aspalathus linearis) In Vitro and In Vivo. Inflammation 38, 1502–1516 (2015).

400. Gutmann, A., Lepak, A., Diricks, M., Desmet, T. & Nidetzky, B. Glycosyltransferase cascades for natural product glycosylation: Use of plant instead of bacterial sucrose synthases improves the UDP-glucose recycling from sucrose and UDP. Biotechnol. J. 1600557 (2017). doi:10.1002/biot.201600557

401. Murata, T. Sucrose Synthetase of Rice Grains and Potato Tubers. Agric. Biol. Chem. 36, 1815–1818 (1972).

402. Davis, B. & Robinson, M. Drug delivery systems based on sugar-macromolecule conjugates. Curr. Opin. Drug Discov. Dev. 5, 279–288 (2002).

403. Watanabe, M. et al. Oral Therapeutic Agents with Highly Clustered Globotriose for Treatment of Shiga Toxigenic Escherichia coli Infections. J. Infect. Dis. 189, 360–368 (2004).

404. Bode, L. Human milk oligosaccharides: every baby needs a sugar mama. Glycobiology 22, 1147–62 (2012).

405. Vogt, T. & Taylor, L. P. Flavonol 3-O-glycosyltransferases associated with petunia pollen produce gametophyte-specific flavonol diglycosides. Plant Physiol. 108, 903–11 (1995).

406. Delattre, C., Fenoradosoa, T. A. & Michaud, P. Galactans: an overview of their most important sourcing and applications as natural polysaccharides. Brazilian Arch. Biol. Technol. 54, 1075–1092 (2011).

407. Torres, D. P. M., Gonçalves, M. do P. F., Teixeira, J. A. & Rodrigues, L. R. Galacto-Oligosaccharides: Production, Properties, Applications, and Significance as Prebiotics. Compr. Rev. Food Sci. Food Saf. 9, 438–454 (2010).

408. Quan, J. & Tian, J. Circular polymerase extension cloning. Methods Mol. Biol. 1116, 103–17 (2014).

409. Acevedo-Rocha, C. G., Reetz, M. T. & Nov, Y. Economical analysis of saturation mutagenesis experiments. Sci. Rep. 5, 10654 (2015).

410. Siloto, R. M. P. & Weselake, R. J. Site saturation mutagenesis: Methods and applications in protein engineering. Biocatal. Agric. Biotechnol. 1, 181–189 (2012).

411. Huang, Y. H., Picha, D. H. & Kilili, A. W. A continuous method for enzymatic assay of sucrose synthase in the synthetic direction. J. Agric. Food Chem. 47, 2746–50 (1999).

412. Johnson, B. H. & Hecht, M. H. Recombinant proteins can be isolated from E. coli cells by repeated cycles of freezing and thawing. Biotechnology. (N. Y). 12, 1357–60 (1994).

413. De Groeve, M. Engineering of cellobiose phosphorylase for glycoside synthesis. (Ghent University, 2009).

414. Morley, K. L. & Kazlauskas, R. J. Improving enzyme properties: when are closer mutations better? Trends Biotechnol. 23, 231–7 (2005).

415. Oue, S., Okamoto, A., Yano, T. & Kagamiyama, H. Redesigning the Substrate Specificity of an Enzyme by Cumulative Effects of the Mutations of Non-active Site Residues. J. Biol. Chem. 274, 2344–2349 (1999).

416. Yep, A., Ballicora, M. A. & Preiss, J. The active site of the Escherichia coli glycogen synthase is similar to the active site of retaining GT-B glycosyltransferases. Biochem. Biophys. Res. Commun. 316, 960–966 (2004).

417. Umezawa, Y. & Nishio, M. CH/pi interactions in the crystal structure of Class I MHC antigens and their complexes with peptides. Bioorganic Med. Chem. 6, 2507–2515 (1998).

418. Zondlo, N. J. Aromatic-Proline interactions: Electronically tunable CH/pi interactions. Acc. Chem. Res. 46, 1039–1049 (2013).

419. Brandl, M., Weiss, M. S., Jabs, A., Suhnel, J. & Hilgenfeld, R. CH-pi interactions in proteins. J. Mol. Biol. 307, 357–377 (2001).

420. Huang, Y.-C., Hsiang, E.-C., Yang, C.-C. & Wang, A.-Y. New insight into the catalytic properties of rice sucrose synthase. Plant Mol. Biol. 90, 127–135 (2016).

421. Batt, S. M. et al. Acceptor substrate discrimination in phosphatidyl-myo-inositol mannoside synthesis: structural and mutational analysis of mannosyltransferase Corynebacterium glutamicum PimB’. J. Biol. Chem. 285, 37741–52 (2010).

422. Jochens, H., Aerts, D. & Bornscheuer, U. T. Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Eng. Des. Sel. 23, 903–9 (2010).

423. Hennet, T. The galactosyltransferase family. Cell. Mol. Life Sci. 59, 1081–95 (2002).

424. Cantarel, B. et al. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. (2009). Available at: http://www.cazy.org/.

425. Pradel, E., Parker, C. T. & Schnaitman, C. A. Structures of the rfaB, rfaI, rfaJ, and rfaS genes of Escherichia coli K-12 and their roles in assembly of the lipopolysaccharide core. J. Bacteriol. 174, 4736–45 (1992).

426. Stingele, F., Neeser, J. R. & Mollet, B. Identification and characterization of the eps (Exopolysaccharide) gene cluster from Streptococcus thermophilus Sfi6. J. Bacteriol. 178, 1680–90 (1996).

427. Banerjee, A. et al. Implications of phase variation of a gene (pgtA) encoding a pilin galactosyl transferase in gonococcal pathogenesis. J. Exp. Med. 196, 147–62 (2002).

428. Aquilini, E. et al. Functional Identification of the Proteus mirabilis Core Lipopolysaccharide Biosynthesis Genes. J. Bacteriol. 192, 4413–4424 (2010).

429. Heinrichs, D. E., Monteiro, M. A., Perry, M. B. & Whitfield, C. The assembly system for the lipopolysaccharide R2 core-type of Escherichia coli is a hybrid of those found in Escherichia coli K-12 and Salmonella enterica. Structure and function of the R2 WaaK and WaaL homologs. J. Biol. Chem. 273, 8849–59 (1998).

430. Göhmann, S., Manning, P. A., Alpert, C.-A., Walker, M. J. & Timmis, K. N. Lipopolysaccharide O-antigen biosynthesis in Shigella dysenteriae serotype 1: analysis of the plasmid-carried rfp determinant. Microb. Pathog. 16, 53–64 (1994).

431. Yoshida, Y. et al. Molecular and antigenic characterization of a Streptococcus oralis coaggregation receptor polysaccharide by carbohydrate engineering in Streptococcus gordonii. J. Biol. Chem. 283, 12654–64 (2008).

432. Harms, M. J. & Thornton, J. W. Analyzing protein structure and function using ancestral gene reconstruction. Curr. Opin. Struct. Biol. 20, 360–6 (2010).

433. Bridgham, J. T., Ortlund, E. A. & Thornton, J. W. An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515–9 (2009).

434. Ortlund, E. A., Bridgham, J. T., Redinbo, M. R. & Thornton, J. W. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317, 1544–8 (2007).

435. Khersonsky, O. & Tawfik, D. S. Enzyme promiscuity: a mechanistic and evolutionary perspective. Annu. Rev. Biochem. 79, 471–505 (2010).

436. De Winter, K., Soetaert, W. & Desmet, T. An Imprinted Cross-Linked Enzyme Aggregate (iCLEA) of Sucrose Phosphorylase: Combining Improved Stability with Altered Specificity. Int. J. Mol. Sci. 13, 11333–11342 (2012).

437. Burgess, R. in Methods in Enzymology 463, 331–342 (2009).

438. Wang, M. et al. Enhancement of activity of cross-linked enzyme aggregates by a sugar-assisted precipitation strategy: Technical development and molecular mechanism. J. Biotechnol. 156, 30–38 (2011).

439. Barbosa, O. et al. Glutaraldehyde in bio-catalysts design: a useful crosslinker and a versatile tool in enzyme immobilization. RSC Adv. 4, 1583 (2014).

440. Hansen, E. H. & Mikkelsen, H. S. Enzyme-Immobilization by the Glutardialdehyde Procedure. An Investigation of the Effects of Reducing the Schiff-Bases Generated, as Based on Studying the Immobilization of Glucose Oxidase to Silanized Controlled Pore Glass. Anal. Lett. 24, 1419–1430 (1991).

441. Mielenz, J., Klasson, T., Adney, W. & McMillan, J. Biotechnology for Fuels and Chemicals: The Twenty-Eighth Symposium. (Humana press, 2007).

442. Korolik, V., Fry, B. N., Alderton, M. R., Van Der Zeijst, B. A. M. & Coloe, P. J. Expression of Campylobacter hyoilei lipo-oligosaccharide (LOS) antigens in Escherichia coli. Microbiology 143, 3481–3489 (1997).

443. Guan, S., Clarke, A. J. & Whitfield, C. Functional analysis of the galactosyltransferases required for biosynthesis of D-galactan I, a component of the lipopolysaccharide O1 antigen of Klebsiella pneumoniae. J. Bacteriol. 183, 3318–27 (2001).

444. Jolly, L., Newell, J., Porcelli, I., Vincent, S. J. F. & Stingele, F. Lactobacillus helveticus glycosyltransferases: from genes to carbohydrate synthesis. Glycobiology 12, 319–27 (2002).

445. Saksouk, N. et al. The capsular polysaccharide biosynthesis of Streptococcus pneumoniae serotype 8: functional identification of the glycosyltransferase WciS (Cap8H). Biochem. J. 389, 63–72 (2005).

446. Bruins, M. E., Janssen, A. E. & Boom, R. M. Thermozymes and their applications: a review of recent literature and patents. Appl. Biochem. Biotechnol. 90, 155–86 (2001).

447. Olichon, A., Schweizer, D., Muyldermans, S. & de Marco, A. Heating as a rapid purification method for recovering correctly-folded thermotolerant VH and VHH domains. BMC Biotechnol. 7, 7 (2007).

448. Tokuriki, N., Stricher, F., Serrano, L. & Tawfik, D. S. How Protein Stability and New Functions Trade Off. PLoS Comput. Biol. 4, e1000002 (2008).

449. Studer, R. A., Christin, P.-A., Williams, M. A. & Orengo, C. A. Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. Proc. Natl. Acad. Sci. U. S. A. 111, 2223–8 (2014).

450. Serrano, L., Day, A. & Fersht, A. Step-wise mutation of barnase to binase. A procedure for engineering increased stability of proteins and an experimental analysis of the evolution of protein stability. J. Mol. Biol. 233, 305–312 (1993).

451. Wang, X., Minasov, G. & Shoichet, B. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J. Mol. Biol. 320, 85–95 (2002).

452. Godoy-Ruiz, R., Perez-Jimenez, R., Ibarra-Molero, B. & Sanchez-Ruiz, J. Relation between protein stability, evolution and structure, as probed by mutations. J. Mol. Biol. 336, 313–318 (2004).

453. Tao, Q. et al. Protecting enzymes against heat inactivation by temperature-sensitive polymer in confined space. Phys. Chem. Chem. Phys. 13, 16265 (2011).

454. Paiva, C. L. A. & Panek, A. D. Biotechnological Applications of the Disaccharide Trehalose. Biotechnol. Annu. Rev. 2, 293–314 (1996).

455. Vihinen, M. Relationship of protein flexibility to thermostability. Protein Eng. 1, 477–80 (1987).

456. Yu, H. & Huang, H. Engineering proteins for thermostability through rigidifying flexible sites. Biotechnol. Adv. 32, 308–315 (2014).

457. Kheirollahi, A., Khajeh, K. & Golestani, A. Rigidifying flexible sites: An approach to improve stability of chondroitinase ABC I. Int. J. Biol. Macromol. 97, 270–278 (2017).

458. Wijma, H. J. et al. Computationally designed libraries for rapid enzyme stabilization. Protein Eng. Des. Sel. 27, 49–58 (2014).

459. Thiltgen, G. & Goldstein, R. A. Assessing predictors of changes in protein stability upon mutation using self-consistency. PLoS One 7, e46084 (2012).

460. Yu, H., Zhao, Y., Guo, C., Gan, Y. & Huang, H. The role of proline substitutions within flexible regions on thermostability of luciferase. Biochim. Biophys. Acta - Proteins Proteomics 1854, 65–72 (2015).

461. Hung, L. M., Chen, J. K., Huang, S. S., Lee, R. S. & Su, M. J. Cardioprotective effect of resveratrol, a natural antioxidant derived from grapes. Cardiovasc. Res. 47, 549–55 (2000).

462. Higashiyama, T. Novel functions and applications of trehalose. Pure Appl. Chem. 74, 1263–1269 (2002).

463. Lee, Y.-A. et al. Cryopreservation in Trehalose Preserves Functional Capacity of Murine Spermatogonial Stem Cells. PLoS One 8, e54889 (2013).

464. Johnson, M. Antibody Shelf Life/How to Store Antibodies. (2014). doi:10.13070/mm.en.2.120

465. Martínez, L. & Videa, M. Preservation Effect of Vitreous non Reducing Carbohydrates on the Enzymatic Activity, Denaturation Temperature and Retention of Native Structure of Lysozyme. J. Mex. Chem. Soc. 55, 185–189 (2011).

466. Richards, A. B. et al. Trehalose: A review of properties, history of use and human tolerance, and results of multiple safety studies. Food Chem. Toxicol. 40, 871–898 (2002).

467. Ohtake, S. & Wang, Y. J. Trehalose: Current Use and Future Applications. J. Pharm. Sci. 100, 2020–2053 (2011).

468. Becker, A., Schlöder, P., Steele, J. E. & Wegener, G. The regulation of trehalose metabolism in insects. Experientia 52, 433–439 (1996).

469. Elbein, A. D. New insights on trehalose: a multifunctional molecule. Glycobiology 13, 17R–27 (2003).

470. Crowe, J. H., Carpenter, J. F. & Crowe, L. M. The role of vitrification in anhydrobiosis. Annu. Rev. Physiol. 60, 73–103 (1998).

471. Müller, J., Aeschbacher, R. A., Wingler, A., Boller, T. & Wiemken, A. Trehalose and trehalase in Arabidopsis. Plant Physiol. 125, 1086–1093 (2001).

472. Iturriaga, G., Suárez, R. & Nova-Franco, B. Trehalose Metabolism: From Osmoprotection to Signaling. Int. J. Mol. Sci. 10, 3793–3810 (2009).

473. Ryu, S.-I., Park, C.-S., Cha, J., Woo, E.-J. & Lee, S.-B. A novel trehalose-synthesizing glycosyltransferase from Pyrococcus horikoshii: Molecular cloning and characterization. Biochem. Biophys. Res. Commun. 329, 429–436 (2005).

474. Qu, Q., Lee, S. J. & Boos, W. TreT, a novel trehalose glycosyltransferring synthase of the hyperthermophilic archaeon Thermococcus litoralis. J. Biol. Chem. 279, 47890–47897 (2004).

475. Ryu, S.-I. et al. Molecular cloning and characterization of trehalose synthase from Thermotoga maritima DSM3109: Syntheses of trehalose disaccharide analogues and NDP-glucoses. Enzyme Microb. Technol. 47, 249–256 (2010).

476. Luley-Goedl, C. & Nidetzky, B. Carbohydrate synthesis by disaccharide phosphorylases: Reactions, catalytic mechanisms and application in the glycosciences. Biotechnol. J. 5, 1324–1338 (2010).

477. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).

478. Le, S. Q. & Gascuel, O. An Improved General Amino Acid Replacement Matrix. Mol. Biol. Evol. 25, 1307–1320 (2008).

479. Kapustka, L. A., Annala, A. E. & Swanson, W. C. The peroxidase-glucose oxidase system: A new method to determine glucose liberated by carbohydrate degrading soil enzymes. Plant Soil 63, 487–490 (1981).

480. Blecher, M. & Glassman, A. B. Determination of glucose in the presence of sucrose using glucose oxidase; effect of pH on absorption spectrum of oxidized o-dianisidine. Anal. Biochem. 3, 343–52 (1962).

481. Eis, C. & Nidetzky, B. Characterization of trehalose phosphorylase from Schizophyllum commune. Biochem. J. 341, 385–93 (1999).

482. Eis, C., Watkins, M., Prohaska, T. & Nidetzky, B. Fungal trehalose phosphorylase: kinetic mechanism, pH-dependence of the reaction and some structural properties of the enzyme from Schizophyllum commune. Biochem. J. 356, 757–67 (2001).

483. Schwarz, A., Goedl, C., Minani, A. & Nidetzky, B. Trehalose phosphorylase from Pleurotus ostreatus: Characterization and stabilization by covalent modification, and application for the synthesis of α,α-trehalose. J. Biotechnol. 129, 140–150 (2007).

484. Han, S.-E. et al. Cloning and characterization of a gene encoding trehalose phosphorylase (TP) from Pleurotus sajor-caju. Protein Expr. Purif. 30, 194–202 (2003).

485. Goedl, C., Griessler, R., Schwarz, A. & Nidetzky, B. Structure-function relationships for Schizophyllum commune trehalose phosphorylase and their implications for the catalytic mechanism of family GT-4 glycosyltransferases. Biochem. J. 397, 491–500 (2006).

486. Eis, C. & Nidetzky, B. Substrate-binding recognition and specificity of trehalose phosphorylase from Schizophyllum commune examined in steady-state kinetic studies with deoxy and deoxyfluoro substrate analogues and inhibitors. Biochem. J. 363, 335–40 (2002).

487. Eis, C., Albert, M., Dax, K. & Nidetzky, B. The stereochemical course of the reaction mechanism of trehalose phosphorylase from Schizophyllum commune. FEBS Lett. 440, 440–443 (1998).

488. Wannet, W. J. et al. Purification and characterization of trehalose phosphorylase from the commercial mushroom Agaricus bisporus. Biochim. Biophys. Acta 1425, 177–88 (1998).

489. Kitamoto, Y., Akashi, H., Tanaka, H. & Mori, N. Glucose-1-phosphate formation by a novel trehalose phosphorylase from Flammulina velutipes. FEMS Microbiol. Lett. 55, 147–150 (1988).

490. Saito, K., Kase, T., Takahashi, E., Takahashi, E. & Horinouchi, S. Purification and Characterization of a Trehalose Synthase from the Basidiomycete Grifola frondosa. Appl. Environ. Microbiol. 64, 4340–4345 (1998).

491. Saito, K. et al. Production of trehalose synthase from a basidiomycete, Grifola frondosa, in Escherichia coli. Appl. Microbiol. Biotechnol. 50, 193–8 (1998).

492. Ryu, S.-I., Kim, J.-E., Kim, E.-J., Chung, S.-K. & Lee, S.-B. Catalytic reversibility of Pyrococcus horikoshii trehalose synthase: Efficient synthesis of several nucleoside diphosphate glucoses with enzyme recycling. Process Biochem. 46, 128–134 (2011).

493. Kouril, T., Zaparty, M., Marrero, J., Brinkmann, H. & Siebers, B. A novel trehalose synthesizing pathway in the hyperthermophilic Crenarchaeon Thermoproteus tenax: the unidirectional TreT pathway. Arch. Microbiol. 190, 355–369 (2008).

494. Nobre, A., Alarico, S., Fernandes, C., Empadinhas, N. & Da Costa, M. S. A unique combination of genetic systems for the synthesis of trehalose in Rubrobacter xylanophilus: Properties of a rare actinobacterial TreT. J. Bacteriol. 190, 7939–7946 (2008).

495. Ryu, S.-I., Park, C.-S., Cha, J., Woo, E.-J. & Lee, S.-B. A novel trehalose-synthesizing glycosyltransferase from Pyrococcus horikoshii: Molecular cloning and characterization. Biochem. Biophys. Res. Commun. 329, 429–436 (2005).

496. Kitamoto, Y., Osaki, N., Tanaka, H., Sasaki, H. & Mori, N. Purification and properties of α-glucose 1-phosphate-forming trehalose phosphorylase from a basidiomycete, Pleurotus ostreatus. Mycoscience 41, 607–613 (2000).

497. Ryu, S.-I. et al. Molecular cloning and characterization of trehalose synthase from Thermotoga maritima DSM3109: Syntheses of trehalose disaccharide analogues and NDP-glucoses. Enzyme Microb. Technol. 47, 249–256 (2010).

498. Albalat, R. & Cañestro, C. Evolution by gene loss. Nat. Rev. Genet. 17, 379–391 (2016).

499. Theobald, D. L. On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence. Biol. Direct 6, 60 (2011).

500. Fitch, W. M. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–113 (1970).

501. Wiedenbeck, J. & Cohan, F. M. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiol. Rev. 35, 957–976 (2011).

502. Richards, T. A., Leonard, G., Soanes, D. M. & Talbot, N. J. Gene transfer into the fungi. Fungal Biol. Rev. 25, 98–110 (2011).

503. Koonin, E. V., Makarova, K. S. & Aravind, L. Horizontal Gene Transfer in Prokaryotes: Quantification and Classification. Annu. Rev. Microbiol. 55, 709–742 (2001).

504. Fraser, C. M. et al. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399, 323–329 (1999).

505. Gibson, R. P., Tarling, C. A., Roberts, S., Withers, S. G. & Davies, G. J. The donor subsite of trehalose-6-phosphate synthase: Binary complexes with UDP-glucose and UDP-2-deoxy-2-fluoro-glucose at 2 Å resolution. J. Biol. Chem. 279, 1950–1955 (2004).

506. Gibson, R. P., Turkenburg, J. P., Charnock, S. J., Lloyd, R. & Davies, G. J. Insights into trehalose synthesis provided by the structure of the retaining glucosyltransferase OtsA. Chem. Biol. 9, 1337–46 (2002).

507. Saito, K., Kase, T., Takahashi, E., Takahashi, E. & Horinouchi, S. Purification and characterization of a trehalose synthase from the basidiomycete Grifola frondosa. Appl. Environ. Microbiol. 64, 4340–5 (1998).

508. Horlacher, R., Uhland, K., Klein, W., Ehrmann, M. & Boos, W. Characterization of a cytoplasmic trehalase of Escherichia coli. J. Bacteriol. 178, 6250–6257 (1996).

509. Shao, H. et al. Crystal Structures of a Multifunctional Triterpene/Flavonoid Glycosyltransferase from Medicago truncatula. Plant Cell 17, 3141–3154 (2005).

510. Antoine, E. et al. Thermosipho melanesiensis sp. nov., a New Thermophilic Anaerobic Bacterium Belonging to the Order Thermotogales, Isolated from Deep-Sea Hydrothermal Vents in the Southwestern Pacific Ocean. Int. J. Syst. Bacteriol. 47, 1118–1123 (1997).

511. Gartemann, K.-H. et al. Clavibacter michiganensis subsp. michiganensis: first steps in the understanding of virulence of a Gram-positive phytopathogenic bacterium. J. Biotechnol. 106, 179–191 (2003).

512. Chua, T. K. et al. The structure of sucrose phosphate synthase from Halothermothrix orenii reveals its mechanism of action and binding mode. Plant Cell 20, 1059–72 (2008).

513. Diricks, M. et al. Structural determinants underlying the nucleotide preference of Sucrose Synthase: improving the affinity of a bacterial Sucrose Synthase for UDP by introducing plant residues. FEBS J. (2016).

514. Buschiazzo, A. et al. Crystal structure of glycogen synthase: homologous enzymes catalyze glycogen synthesis and degradation. EMBO J. 23, 3196–205 (2004).

515. Koshland, D. E. Correlation of structure and function in enzyme action. Science 142, 1533–41 (1963).

516. Keseler, I. M. et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41, D605–D612 (2013).

517. Curtis, S. J. & Epstein, W. Phosphorylation of D-glucose in Escherichia coli mutants defective in glucosephosphotransferase, mannosephosphotransferase, and glucokinase. J. Bacteriol. 122, 1189–99 (1975).

518. Lunin, V. V. et al. Crystal structures of Escherichia coli ATP-dependent glucokinase and its complex with glucose. J. Bacteriol. 186, 6915–27 (2004).

519. Purvis, J. E., Yomano, L. P. & Ingram, L. O. Enhanced Trehalose Production Improves Growth of Escherichia coli under Osmotic Stress. Appl. Environ. Microbiol. 71, 3761–3769 (2005).

520. Steinsiek, S. & Bettenbrock, K. Glucose transport in Escherichia coli mutant strains with defects in sugar transport systems. J. Bacteriol. 194, 5897–908 (2012).

521. Rimmele, M. & Boos, W. Trehalose-6-phosphate hydrolase of Escherichia coli. J. Bacteriol. 176, 5654–64 (1994).

522. Winkler, H. H. Compartmentation in the induction of the hexose-6-phosphate transport system of Escherichia coli. J. Bacteriol. 101, 470–5 (1970).

523. Sun, Y. & Vanderpool, C. K. Regulation and function of Escherichia coli sugar efflux transporter A (SetA) during glucose-phosphate stress. J. Bacteriol. 193, 143–53 (2011).

524. Dietz, G. W. & Heppel, L. A. Studies on the uptake of hexose phosphates. 3. Mechanism of uptake of glucose 1-phosphate in Escherichia coli. J. Biol. Chem. 246, 2891–7 (1971).

525. De Bruyn, F. et al. Development of an in vivo glucosylation platform by coupling production to growth: Production of phenolic glucosides by a glycosyltransferase of Vitis vinifera. Biotechnol. Bioeng. 112, 1594–1603 (2015).

526. Houghton-Larsen, J. et al. Recombinant Production of Steviol Glycosides. (2014).

527. Vieille, C. & Zeikus, G. J. Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol. Mol. Biol. Rev. 65, 1–43 (2001).

528. de Champdoré, M., Staiano, M., Rossi, M. & D’Auria, S. Proteins from extremophiles as stable tools for advanced biotechnological applications of high social interest. J. R. Soc. Interface 4, 183–91 (2007).

529. Magana-Arachchi, D. N. & Wanigatunge, R. P. First report of genus Chroococcidiopsis (cyanobacteria) from Sri Lanka: a potential threat to human health. J. Natl. Sci. Found. Sri Lanka 41, (2013).

530. Carlson, R. The changing economics of DNA synthesis. Nat. Biotechnol. 27, 1091–1094 (2009).

531. Hospital, A., Goñi, J. R., Orozco, M. & Gelpí, J. L. Molecular dynamics simulations: advances and applications. Adv. Appl. Bioinform. Chem. 8, 37–47 (2015).

532. Raval, A., Piana, S., Eastwood, M. P., Dror, R. O. & Shaw, D. E. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins Struct. Funct. Bioinforma. 80, n/a-n/a (2012).

533. Homaei, A. A., Sariri, R., Vianello, F. & Stevanato, R. Enzyme immobilization: an update. J. Chem. Biol. 6, 185–205 (2013).

534. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 87, 4576–9 (1990).

Summary

Glycosylation, the addition of a sugar molecule onto an acceptor substrate, is a promising strategy to improve the activity, solubility, stability, flavor and/or pharmacokinetic behavior of chemicals such as pharmaceuticals, nutraceuticals or cosmetics. An example of a glycosylated molecule with potential as a nutraceutical is nothofagin. This glycoside is also naturally found in rooibos (tea) and displays interesting properties such as anti-oxidant and anti-inflammatory activity.

Glycosides can be extracted directly from natural sources such as plants, but these methods are often highly laborious and low-yielding. Alternatively, glycosylation can be performed either by conventional chemical synthesis or by a biocatalytic reaction with enzymes. Although chemical glycosylation is used intensively in the field of glycochemistry, it suffers from a number of drawbacks, including labor-intensive activation and protection procedures to allow for regioselectivity, multistep synthetic routes with low overall yields, the use of toxic catalysts and solvents, and the production of a large amount of waste. Reactions with GlycosylTransferases (GTs), which are nature’s most efficient enzymes for the production of glycosides, present a valuable and green alternative. Indeed, GTs can transfer a sugar molecule from a nucleotide-activated sugar donor (e.g. uridine diphosphate glucose, UDP-glucose) onto an acceptor compound with high regio- and/or stereoselectivity and high product yields, in an aqueous environment under mild reaction conditions. However, the industrial application of these enzymes in vitro is mainly hampered by their need for nucleotide-activated sugars, which are highly expensive and rarely available in large quantities (UDP-glucose: 150 €/g; UDP-galactose: 2500 €/g).

In this doctoral thesis, two strategies to make in vitro reactions with GTs more cost-efficient were evaluated: the use of Sucrose Synthase (SuSy) as an intermediate enzyme to produce UDP-glucose from the cheap substrate sucrose (300 €/ton), and the engineering of GTs to alter the sugar donor specificity towards cheaper glycosyl-phosphates (e.g. glucose 1-phosphate: 15 €/g). To this end, several new bacterial SuSy enzymes were cloned, expressed, purified and characterized. In addition, one of them was subjected to extensive mutagenesis to improve or change properties such as substrate affinity, substrate specificity and stability. The enzyme Trehalose glycosylTransferring synthase, on the other hand, was used as a test case to scrutinize the possibility of changing the donor specificity of GTs from nucleotide sugars towards glycosyl-phosphates through mutagenesis.
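The price gap that motivates both strategies is easier to appreciate once all figures are expressed per gram. The following is a minimal sketch using only the prices quoted above; it assumes that "ton" means a metric ton, it compares prices per mass rather than per mole, and the names `price_eur_per_g` and `price_ratio` are illustrative and not taken from the thesis.

```python
# Prices quoted in the summary, converted to euro per gram.
# Assumption: "ton" is a metric ton (1,000,000 g).
GRAMS_PER_TON = 1_000_000

price_eur_per_g = {
    "sucrose": 300 / GRAMS_PER_TON,   # 300 EUR/ton
    "glucose 1-phosphate": 15.0,      # 15 EUR/g
    "UDP-glucose": 150.0,             # 150 EUR/g
    "UDP-galactose": 2500.0,          # 2500 EUR/g
}


def price_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more expensive one compound is per gram."""
    return price_eur_per_g[expensive] / price_eur_per_g[cheap]


if __name__ == "__main__":
    # List the compounds from cheapest to most expensive.
    for name, price in sorted(price_eur_per_g.items(), key=lambda kv: kv[1]):
        print(f"{name:>20}: {price:12.4f} EUR/g")
    print("UDP-glucose vs sucrose:",
          round(price_ratio("UDP-glucose", "sucrose")))
    print("UDP-glucose vs glucose 1-phosphate:",
          price_ratio("UDP-glucose", "glucose 1-phosphate"))
```

On a per-gram basis, sucrose is thus several orders of magnitude cheaper than UDP-glucose, whereas glucose 1-phosphate is cheaper by a factor of ten. A per-mole comparison would shift these ratios somewhat, because the compounds differ in molar mass.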

Sucrose or table sugar is a disaccharide consisting of a glucose and a fructose moiety. SuSy catalyzes the reversible conversion of sucrose and a nucleoside diphosphate (NDP, e.g. UDP) into NDP-glucose and fructose. Besides its pivotal role in the sucrose metabolism of photosynthetic organisms, SuSy has proven to be a valuable biocatalyst for practical applications. Indeed, it can for example be used for the production of UDP-glucose as end-product or as intermediate in coupled reactions. During such coupled reactions, UDP-glucose is continuously produced by SuSy and subsequently used by a second GT as sugar donor for the synthesis of glycosides (e.g. the anti-oxidant nothofagin). Since the discovery of Sucrose Synthase specificity in 1955 by Cardini and coworkers, various SuSys from plants and cyanobacteria, which are both phototrophic organisms, have been characterized. However, low activities and/or poor stability of the reported SuSy enzymes have impeded their commercial exploitation so far. To search for alternatives present in nature, a phylogenetic and taxonomic analysis of the available SuSy sequences was performed. This analysis revealed that non-photosynthetic bacteria, belonging to the phyla Proteobacteria, Deferribacteres, Chrysiogenetes, Ignavibacteriae, Nitrospinae and Firmicutes, also harbor putative SuSy homologues. Recombinant expression and characterization of several of these ‘novel’ SuSy enzymes turned out to be valuable from both a fundamental and an industrial point of view. Indeed, it contributed to our understanding of the largely unexplored sucrose metabolism within non-photosynthetic organisms and it led to the identification of a SuSy enzyme with favorable properties for practical applications. This SuSy, originating from the bacterium Acidithiobacillus caldus (SuSyAc), was particularly useful for the production of UDP-glucose as end-product because of its high stability, high activity at high concentrations of UDP and good expression yield. Its potential as an industrial biocatalyst was proven by a whole-cell bioconversion experiment, which was characterized by high yields of UDP-glucose (86% based on UDP), a space-time yield of 10 g/L/h and an excellent total turnover number of 103 g UDP-glucose per g cell dry weight.

At low concentrations of UDP, SuSyAc is much less active. This makes the enzyme unsuitable for coupled reactions where only low amounts of UDP are added to reduce costs and to avoid substrate inhibition of the second GT. However, the affinity of SuSyAc for UDP could be improved significantly by mutagenesis. The best mutant (SuSyAc LMDKVVA) showed an affinity for UDP of 0.13 mM, which is a 55-fold improvement compared to the unmutated enzyme (wild-type). Moreover, replacing the wild-type with SuSyAc LMDKVVA in a coupled reaction with the glycosyltransferase OsCGT resulted in a 9-fold increased production rate of the anti-oxidant nothofagin.
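Why a lower affinity constant matters mainly at low UDP concentrations can be illustrated with a simple rate calculation. The sketch below is not the kinetic model used in the thesis. It assumes that the reported affinity of 0.13 mM is a Michaelis constant (Km), that both enzymes follow Michaelis-Menten kinetics in UDP with the same maximal rate, and that sucrose is saturating. The wild-type Km of about 7.15 mM is derived here from the stated 55-fold improvement, and the function names are illustrative.

```python
def mm_rate(udp_mM: float, km_mM: float, vmax: float = 1.0) -> float:
    """Michaelis-Menten rate for a given UDP concentration (mM)."""
    return vmax * udp_mM / (km_mM + udp_mM)


KM_MUTANT_MM = 0.13            # reported for SuSyAc LMDKVVA
FOLD_IMPROVEMENT = 55          # reported improvement over wild-type
KM_WILDTYPE_MM = KM_MUTANT_MM * FOLD_IMPROVEMENT  # derived: about 7.15 mM


def rate_gain(udp_mM: float) -> float:
    """Mutant/wild-type rate ratio, assuming equal Vmax (an assumption)."""
    return mm_rate(udp_mM, KM_MUTANT_MM) / mm_rate(udp_mM, KM_WILDTYPE_MM)


if __name__ == "__main__":
    for udp in (0.1, 0.5, 1.0, 5.0, 10.0):
        print(f"[UDP] = {udp:5.1f} mM -> rate gain = {rate_gain(udp):5.1f}x")
```

Under these assumptions the gain is largest when UDP is scarce and shrinks towards 1 as UDP approaches saturation. The 9-fold increase measured in the coupled reaction with OsCGT also depends on factors that this sketch ignores, such as differences in maximal rate and the kinetics of the second enzyme.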

Besides UDP-glucose, SuSyAc can also produce other valuable nucleotide sugars. It can, for example, generate UDP-galactose in a one-step reaction if GalFru is used as sugar donor instead of sucrose. However, this reaction is very inefficient despite the high structural resemblance between the two molecules. The absence of specific hydrogen bond interactions between the non-natural substrate GalFru and the enzyme and/or between GalFru and UDP is likely to be the main reason for this observation. Unfortunately, both mutagenesis-dependent and mutagenesis-independent strategies failed to improve the activity of SuSyAc on GalFru. Nonetheless, the research led to the establishment of a general, time- and cost-efficient screening protocol to evaluate the activity of SuSy mutants towards sugar donors other than sucrose.

As an alternative strategy to make GT reactions more cost-efficient, one could try to engineer the specificity of GTs towards cheaper donor substrates (e.g. glucose 1-phosphate instead of UDP-glucose). Here, as a test case, Trehalose glycosylTransferring synthase from Fervidobacterium pennivorans (TreTFp) was subjected to mutagenesis, with the aim of converting it into a Trehalose Phosphorylase. TreTFp belongs to the GT4 family and is able to synthesize trehalose from UDP-glucose and glucose. To guide the mutagenesis, sequence information of existing GT4 Trehalose Phosphorylases was used. These enzymes are naturally capable of converting glucose 1-phosphate and glucose into trehalose. Unfortunately, none of the TreTFp mutants showed phosphorylase activity. In addition, most of them also lost trehalose synthase activity, had lower expression yields in the soluble fraction and/or were less stable.

In summary, this work resulted in the expansion of the arsenal of industrially relevant SuSy enzymes, taking us one step closer to the economically viable production of glucosides. In addition, it provided new insights into the sucrose metabolism of bacteria and into structure-function relationships of GT4 enzymes. However, the unsuccessful engineering experiments with both SuSy and TreT highlight the need for additional studies with this fascinating class of enzymes to unlock new applications.

Samenvatting

Glycosylation, the addition of a sugar group to another molecule (the acceptor), can be used to improve the stability, activity, solubility and/or pharmacokinetic properties of nutraceuticals, therapeutics or cosmetics. An example of such a glycosylated nutraceutical is nothofagin. This glycoside also occurs naturally in rooibos (tea) and displays interesting properties such as anti-oxidant and anti-inflammatory activity.

Glycosides can be extracted directly from natural sources such as plants, but these methods are often very labor-intensive and give low yields. Glycosides can, however, also be produced by conventional chemical synthesis or by a biocatalytic reaction with enzymes. Chemical glycosylation is widely used in the field of glycochemistry but has quite a few drawbacks, such as the need for labor-intensive activation and protection procedures to ensure regioselectivity, the use of toxic catalysts and solvents, and the production of a large amount of waste. Enzymes such as GlycosylTransferases (GTs) can be used as a green alternative. In living organisms, GTs are the most efficient biocatalysts for creating glycosidic bonds. They very selectively transfer a sugar group from an activated sugar donor (e.g. nucleotide sugars such as uridine diphosphate glucose, UDP-glucose) to an acceptor molecule, in an aqueous environment under mild reaction conditions. However, the industrial application of GTs in vitro is mainly hampered by the high cost of these activated sugars (e.g. UDP-glucose: 150 €/g; UDP-galactose: 2500 €/g) and by the fact that they are rarely available in large quantities.

In this doctoral thesis, two strategies to make in vitro reactions with GTs more cost-efficient were evaluated: using Sucrose Synthase (SuSy) as an intermediate enzyme to produce UDP-glucose from the cheap substrate sucrose (300 €/ton), and mutating GTs so that they can use cheaper sugar donors (e.g. glucose 1-phosphate: 15 €/g). To this end, several new bacterial SuSy enzymes were cloned, expressed, purified and characterized. In addition, one of them was subjected to mutagenesis to improve or alter properties such as substrate affinity, substrate specificity and stability. Finally, Trehalose glycosylTransferring synthase was used as a test case to investigate whether the donor specificity of GTs can be changed towards glycosyl-phosphates instead of nucleotide sugars.

Sucrose or table sugar is a disaccharide consisting of a glucose and a fructose unit. SuSy is a special GT that can break down sucrose into NDP-glucose and fructose with the help of a nucleoside diphosphate (NDP, e.g. UDP). This enzyme plays a central role in the sucrose metabolism of photosynthetic organisms. However, it is also in demand for practical applications, including the production of UDP-glucose as end-product or as intermediate in coupled reactions. In such a coupled reaction, the UDP-glucose produced by SuSy is used directly by a second GT as sugar donor for the synthesis of glycosides (e.g. the anti-oxidant nothofagin). Over the past 70 years, various SuSys, all originating from photosynthetic cyanobacteria and plants, have been described and characterized. So far, however, the low activity and/or stability of the reported SuSy enzymes has impeded their commercial exploitation. This research therefore searched for new, alternative SuSys on the basis of a phylogenetic and taxonomic analysis of all available SuSy sequences. This analysis showed that non-photosynthetic bacteria, belonging to the phyla Proteobacteria, Deferribacteres, Chrysiogenetes, Ignavibacteriae, Nitrospinae and Firmicutes, may also possess SuSy homologues. Recombinant expression and characterization of a number of these 'new' bacterial enzymes proved valuable from both a fundamental and an industrial point of view. Indeed, it provided new insights into the largely unknown sucrose metabolism of non-photosynthetic organisms, and it also led to the identification of a SuSy with favorable properties for practical applications. This SuSy enzyme, originating from the bacterium Acidithiobacillus caldus (SuSyAc), is promising for the industrial production of UDP-glucose as end-product because of its high stability, high activity at high concentrations of UDP and good expression yield. The potential of this enzyme was demonstrated in a whole-cell bioconversion experiment in which a large amount of UDP-glucose was obtained in a relatively short time span (10 g/L/h) and with a minimal amount of cells (103 g UDP-glucose per g cell dry weight).

At low concentrations of UDP, however, SuSyAc is much less active. This makes the enzyme unsuitable for coupled reactions in which only small amounts of UDP are added to save costs and to prevent substrate inhibition of the second GT. The UDP affinity of SuSyAc could nevertheless be improved significantly by mutating the enzyme at multiple positions. The best mutant (SuSyAc LMDKVVA) showed an affinity for UDP of 0.13 mM, which is a 55-fold improvement compared to the unmutated enzyme (wild-type). Moreover, replacing the wild-type with SuSyAc LMDKVVA in a coupled reaction with the glycosyltransferase OsCGT resulted in a 9-fold higher production rate of the anti-oxidant nothofagin.

Besides UDP-glucose, SuSyAc can also produce other valuable nucleotide sugars. UDP-galactose, for example, can be generated in a one-step reaction if GalFru is used as sugar donor instead of sucrose. Unfortunately, this reaction is very inefficient despite the high structural similarity between GalFru and sucrose. The absence of specific hydrogen bonds between the non-natural substrate GalFru and the SuSyAc enzyme and/or between GalFru and UDP could be a possible explanation. To improve the activity of SuSyAc on GalFru, targeted modification of the enzyme was again attempted, albeit without success. Nevertheless, this work led to the development of a general, cost- and time-efficient protocol to evaluate the activity of SuSy mutants on sugar donors other than sucrose.

As an alternative strategy to reduce the price of reactions with GTs, one could also mutate GTs so that they can use cheaper sugar donors (for example glucose 1-phosphate instead of UDP-glucose). In this work, the Trehalose glycosylTransferring synthase from Fervidobacterium pennivorans (TreTFp) was therefore subjected to mutagenesis as a case study, with the aim of converting this enzyme into a Trehalose Phosphorylase. TreTFp belongs to the GT4 family and can synthesize trehalose from NDP-glucose and glucose. To guide the mutagenesis, sequence information of existing GT4 Trehalose Phosphorylases was used. These enzymes can naturally convert glucose 1-phosphate and glucose into trehalose. Unfortunately, none of the TreTFp mutants had phosphorylase activity. Moreover, most of them no longer had trehalose synthase activity, were expressed less well and/or had a reduced stability.

In summary, this dissertation has led to an expansion of the arsenal of industrially relevant SuSy enzymes. This brings us another step closer to the economically viable, enzymatic production of glucosides. In addition, the research provided new insights into the sucrose metabolism of bacteria and into the structure-function relationships of GT4 enzymes. However, the unsuccessful engineering experiments with SuSy and TreT emphasize the need for further studies with this fascinating class of enzymes, in order to further expand their field of application.

Curriculum vitae

PERSONAL DETAILS

Name Margo Diricks

Professional address Centre for Synthetic Biology Faculty of Bioscience engineering Coupure Links 653, 9000 Ghent Belgium

Personal address Martelaarslaan 269, 9000 Ghent Belgium

Nationality Belgian

Date of birth 22nd of December 1990

Telephone +32496/99.29.56

E-mail [email protected] [email protected]

EDUCATION

2013 – 2017 Doctor of Applied Biological Sciences

Ghent University, Faculty of Bioscience Engineering, Centre for Synthetic Biology, Unit for Biocatalysis and Enzyme Engineering

(http://www.biocatalysis.ugent.be)

Doctoral scholarship: Special Research Fund (BOF)

Date of public defense: September 2017

2011 – 2013 Master of Science in Bioscience Engineering: Cell and Gene Biotechnology

Obtained with the greatest distinction at Ghent University

2008 – 2011 Bachelor of Science in Bioscience Engineering: Cell and Gene Biotechnology

Obtained with great distinction at Ghent University

CONFERENCES

12th Carbohydrate Bioengineering Meeting (23/04/2017-26/04/2017) in Vienna (Austria) Poster presentation: ‘Identification and engineering of bacterial Sucrose Synthases’

5th International Conference on Novel Enzymes (11/10/2016-14/10/2016) in Groningen (Netherlands) Poster presentation: ‘Improving the affinity of a bacterial Sucrose Synthase for UDP by introducing plant residues’

12th Biotrans (26/07/2015-30/07/2015) in Vienna (Austria) Poster presentation: ‘Identification of Sucrose Synthase in non-photosynthetic bacteria and characterization of the recombinant enzymes’

25th Joint Glycobiology Meeting (14/09/2014-16/09/2014) in Ghent (Belgium) Poster presentation: ‘Ancestral protein reconstruction as protein engineering strategy to improve glycosylation reactions’

COURSES

Specialist courses:

Structurele bio-informatica en analyse (2016)

N2N Multidisciplinary Seminar Series on Bioinformatics (2015)

Ghent Biobased Economy Summer School (2013)

3DM Course (2013)

Transferable skills:

Communication skills

Advanced Academic English: Conference Skills - Academic Posters (2013)

Research & valorization

Getting started with High Performance Computing. Part 1. Unix command line (2016)

Getting started with High Performance Computing. Part 2. HPC basics (2016)

Introduction Day for new PhD's (2013)

Leadership & efficiency

Leadership Foundation Course (2016)

PEER-REVIEWED PUBLICATIONS

Gutmann A., Lepak A., Diricks M., Desmet T., Nidetzky B. (2017). Glycosyltransferase cascades for natural product glycosylation: Use of plant instead of bacterial sucrose synthases improves the UDP-glucose recycling from sucrose and UDP. Biotechnology journal. DOI: 10.1002/biot.201600557

Diricks M., Gutmann A., Debacker S., Dewitte G., Nidetzky B., Desmet T. (2016). Sequence determinants of nucleotide binding in Sucrose Synthase: improving the affinity of a bacterial Sucrose Synthase for UDP by introducing plant residues. Protein engineering, Design and Selection. DOI: 10.1093/protein/gzw048

Dewitte G., Walmagh M., Diricks M., Lepak A., Gutmann A., Nidetzky B., Desmet T. (2016). Screening of recombinant Glycosyltransferases reveals the broad acceptor specificity of stevia UGT-76G1. Journal of biotechnology. DOI: 10.1016/j.jbiotec.2016.06.034

Schmölzer K., Gutmann A., Diricks M., Desmet T., Nidetzky B. (2016). Sucrose synthase: A unique glycosyltransferase for biocatalytic glycosylation process development. Biotechnology advances. DOI: 10.1016/j.biotechadv.2015.11.003

Diricks M., De Bruyn F., Van Daele P., Walmagh M., Desmet T. (2015). Identification of Sucrose Synthase in nonphotosynthetic bacteria and characterization of the recombinant enzymes. Applied Microbiology and Biotechnology. DOI: 10.1007/s00253-015-6548-7

Verhaeghe T., Aerts D., Diricks M., Soetaert W., Desmet T. (2014). The quest for a thermostable sucrose phosphorylase reveals sucrose 6'-phosphate phosphorylase as a novel specificity. Applied Microbiology and Biotechnology. DOI: 10.1007/s00253-014-5621-y

Verhaeghe T., Diricks M., Aerts D., Soetaert W., Desmet, T. (2013). Mapping the acceptor site of sucrose phosphorylase from Bifidobacterium adolescentis by alanine scanning. Journal of molecular catalysis B-enzymatic. DOI: 10.1016/j.molcatb.2013.06.014

PATENTS

Diricks M., Dewitte G., Desmet T. Mutant sucrose synthases and their uses. EP 16172107.1

STUDENT GUIDANCE

Practical courses:

Biocatalysis – Bioethanol (2015)

Biocatalysis – Pymol (2014)

Biocatalysis – (2013)

MSc thesis:

Simon Debacker (2015-2016). Engineering of biocatalysts for the cost-effective production of glucosylated compounds.

Jeff Moens (2015-2016). Engineering the substrate specificity and stability of sucrose synthase from Acidithiobacillus caldus.

Paul Van Daele (2014-2015). Characterization and engineering of new prokaryotic sucrose synthases.
