
Research Collection

Doctoral Thesis

Enzyme engineering for intensified processes for the production of rare monosaccharides

Author(s): Bosshart, Andreas

Publication Date: 2014

Permanent Link: https://doi.org/10.3929/ethz-a-010252699

Rights / License: In Copyright - Non-Commercial Use Permitted


DISS. ETH NO 22128

Enzyme engineering for intensified processes for the production of rare monosaccharides

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH (Dr. sc. ETH Zurich)

presented by

Andreas Bosshart
Master of Science UZH in Biochemistry, University of Zurich

born on 29.11.1982
citizen of Fischingen (TG), Switzerland

accepted on the recommendation of

Prof. Dr. Sven Panke (ETH Zurich, Switzerland), examiner
Prof. Dr. Andreas Plückthun (University of Zurich, Switzerland), co-examiner
Prof. Dr. Sai Reddy (ETH Zurich, Switzerland), co-examiner

2014

ABSTRACT

With the current trend in the chemical industry towards more sustainable processes that produce less waste and are less energy-intensive, enzyme-catalyzed reactions are attracting increasing interest owing to their striking selectivity and specificity and their ability to operate under mild conditions (neutral pH, aqueous solution, moderate temperatures). However, biocatalysts considered for industrial application need to provide high reaction rates (i.e., a high specific activity) and operational stability in order to be economically competitive. To overcome limitations in such properties, enzyme engineering procedures such as directed evolution or rational design have been successfully applied to a wide range of enzymes. This thesis describes the development of a D-tagatose epimerase from Pseudomonas cichorii (PcDTE) into an industrially suitable biocatalyst by directed evolution. D-Tagatose epimerase is arguably the central enzyme in the synthesis of a wide range of rare monosaccharides, carbohydrates that are not available in large amounts from natural resources but have recently attracted great interest as low-calorie sweeteners, chiral building blocks, or active pharmaceutical ingredients. They can be synthesized via simple isomerization or epimerization reactions from readily available bulk sugars such as D-glucose, D-fructose or D-galactose, but these reactions suffer from an unfavorable position of the thermodynamic equilibrium, which limits the yield and makes them economically unfeasible. Integrating the enzyme-catalyzed reaction and the separation of product and reactant, e.g. by continuous chromatography, into one continuous process offers a highly attractive way to overcome this limitation. Such an integrated process, however, places high demands on the biocatalyst in terms of stability, selectivity and specific activity if the process is to be economic.
Thermostability is one of the most frequent limitations in the application of biocatalysts for the synthesis of fine and bulk chemicals, especially at the elevated temperatures that are often required in sugar-processing plants to avoid microbial contamination, to reduce the viscosity and to increase the reaction rate. Systematic screening of the subunit-subunit interface of dimeric PcDTE in a medium-throughput in vitro assay format revealed several mutations that increased the thermostability of PcDTE. Combination of all 9 beneficial sites by iterative saturation mutagenesis (ISM) resulted in variant PcDTE Var8, which showed an increase in temperature stability of 21.4°C over the wild type (WT). This variant showed no significant loss in D-fructose conversion over 4 days in a long-term experiment at 50°C, while conversion catalyzed by the WT decreased by 40% in the same time. Next, this variant was taken as the basis for improving the specific activity of the enzyme for the reactions D-fructose/D-psicose and L-sorbose/L-tagatose. Saturation mutagenesis of 20 individual residues in the first sphere and 28 residues in the second sphere around the active site revealed 8 mutations that increased activity towards D-fructose in variant IDF8 and 6 mutations that did so for L-sorbose in variant ILS6. IDF8 exhibited an 8.6-fold increase in specific activity for D-fructose and ILS6 showed a 13.5-fold higher catalytic rate for L-sorbose. The crystal structures of PcDTE Var8, IDF8 and ILS6 in the presence of the respective substrates indicated that modifications of the entrance tunnel afforded the increased catalytic rate of IDF8, whereas the increase for ILS6 derived rather from differences in the hydrogen-bonding network to substrate and product. The operational performance of a further developed variant of IDF8, IDF10-3, was determined in an enzyme-membrane reactor at different temperatures in the presence of the respective substrate and confirmed that the final variants exhibited up to 40-fold higher total turnover numbers than WT PcDTE. Finally, to reduce the screening effort for the directed evolution of PcDTE, a growth-based selection system was developed that was expected to significantly facilitate the discovery of variants with improved specific activity. To this end, a de novo metabolic pathway from D-allose via D-psicose and D-fructose to fructose 6-phosphate was established by introducing the enzymes L-rhamnose isomerase, D-tagatose epimerase and fructokinase. Additionally, five genes or complete operons of the E. coli host that were found to interfere with the proposed metabolic pathway were deleted. Growth of the assembled selection system on D-allose as sole carbon source demonstrated the feasibility of the approach in principle, but consistent failures of the selections suggest that either a pathway intermediate had a toxic effect on the selection host or that one of the pathway enzymes had too low a catalytic activity to satisfy the flux requirement for growth on D-allose.

ZUSAMMENFASSUNG

The chemical industry is increasingly focusing on new processes that produce less waste, consume less energy and require fewer hazardous or toxic solvents. Enzyme-catalyzed reactions can make a decisive contribution here, both because of their remarkable specificity and selectivity and because of their ability to catalyze reactions in aqueous solution, at near-neutral pH and at moderate temperatures. To meet both process-engineering and economic demands, however, a biocatalyst must fulfil certain requirements, such as very good specific activity and selectivity as well as sufficient process stability. Protein engineering makes it possible to adapt practically all of these parameters to the needs of the process. This thesis describes the development of a D-tagatose epimerase from the bacterium Pseudomonas cichorii (PcDTE), which was converted by directed evolution into a biocatalyst that can meet process-related requirements. D-Tagatose epimerase is the central enzyme in the synthesis of rare sugars, which have recently attracted great interest owing to their potential role as low-calorie sweeteners, as chiral building blocks for the synthesis of pharmaceuticals, or directly as pharmaceutically active substances. These rare sugars occur in nature only in traces but can be produced by simple enzymatic isomerization reactions from inexpensively available sugars such as D-glucose, D-fructose or D-galactose. The main problem in the synthesis of these compounds by enzymatic catalysis is the thermodynamic equilibrium of the iso- or epimerization reactions, which causes the reactions to remain incomplete and thus permits only a relatively low yield.

Integrating the enzymatic reaction with a directly following separation of the product from starting materials and intermediates, and recycling the starting material back into the reaction, can however effectively overcome this obstacle. On the other hand, such an integration places high demands on the biocatalyst with respect to selectivity, specific activity and, above all, thermostability, which is important indirectly (as a proxy for process stability in general) and in this case also directly (see below). Insufficient process stability and/or thermostability is one of the most frequent limitations in the application of biocatalysts for the synthesis of fine and bulk chemicals. Elevated temperatures are often required, especially in the sugar industry, to prevent contamination by microorganisms, to reduce the viscosity of the sugar solution, or to increase the reaction rate. The wild-type enzyme PcDTE showed insufficient stability under such conditions. A systematic screening of the contact surfaces between the two subunits of the dimeric enzyme PcDTE uncovered several mutations that increased the thermostability of the enzyme, presumably by delaying the separation of the subunits as the step that initiates enzyme inactivation. The combination of optimal mutations at all nine identified positions by iterative saturation mutagenesis resulted in a final variant named PcDTE Var8, which reached a T50^20 value (the temperature at which only 50% of the initial activity can be measured after an incubation of 20 min) of 87°C, corresponding to an increase of 21.4°C compared with the wild-type enzyme. The subsequent evaluation of operational stability under production conditions in an enzyme-membrane reactor showed that Var8 lost hardly any activity even after 4 days at 50°C, whereas the wild type lost 40% of its activity in this time. This variant was then chosen as the starting point for improving the specific activity towards the substrates D-fructose and L-sorbose. Both the positions of the 20 amino acids that directly surround the active site of the enzyme and the 28 positions occupied by the amino acids of the second shell were randomized by saturation mutagenesis, and the resulting enzyme variants were screened for increased activity. In total, 8 mutations were found that increased the catalytic activity towards D-fructose 8.6-fold (variant IDF8), and 6 mutations were found that increased the activity towards L-sorbose 13.5-fold (variant ILS6). The enzyme variants Var8, IDF8 and ILS6 were crystallized in the presence of their respective substrates and their crystal structures were determined. Based on these structures, it was hypothesized that the increase in the catalytic rate of IDF8 for D-fructose results from a modification of the substrate entrance tunnel, whereas for ILS6 the increase in activity is caused by a change in the hydrogen-bonding network that substantially influences the positioning of the substrate L-sorbose in the active site.

Next, two beneficial mutations that had been found in a parallel screening of a library based on random mutagenesis were introduced into IDF8. Together with the reversion of a strongly destabilizing mutation and one further beneficial mutation, this yielded the final variant IDF10-3 for the efficient conversion of D-fructose to D-psicose. The variants IDF10-3 and ILS6 were examined in the presence of the respective substrate for their behavior under operational conditions in an enzyme-membrane reactor at different temperatures. These improved variants achieved up to 40-fold higher total turnover numbers (TTN) than the original wild type. To reduce the effort required for screening large libraries, a growth-based in vivo selection system was developed that was intended to allow the evaluation of a very large number of variants in a short time. For this purpose, a de novo metabolic pathway was implemented in an Escherichia coli host, leading from D-allose via D-psicose and D-fructose to fructose 6-phosphate, which is then fed into glycolysis. To this end, the genes of L-rhamnose isomerase, D-tagatose epimerase and fructokinase were expressed recombinantly. In addition, 5 genes and operons that interfered with the planned metabolic pathway were knocked out. Growth of the assembled selection system on D-allose as the sole carbon source confirmed the basic feasibility of this approach, but repeated attempts to select improved mutations were not successful. This result was attributed to either one of the intermediates of the new metabolic pathway exerting a toxic effect on the host, or one of the newly introduced enzymes having too low a specific activity to allow a sufficiently large metabolic flux through the pathway.


TABLE OF CONTENTS

Abstract
Zusammenfassung
Table of contents
1. Introduction
1.1. General introduction – process intensification for enzymatic reactions
1.2. Synthesis of rare monosaccharides by iso- and epimerases
1.3. Increasing the yield for enzyme reactions with unfavorable equilibrium position
1.4. Overcoming enzymatic limitations by directed evolution
1.5. D-Tagatose epimerase is the central enzyme for rare sugar production
1.6. Scope of this thesis
2. Screening and selection methods for the directed evolution of industrially relevant enzymes
2.1. Abstract
2.2. Introduction
2.3. Analysis method and library size
2.4. Diversity generation – random or focused mutagenesis
2.5. Screening or selecting for improved biocatalysts
2.6. Selection and screening
2.7. In vivo selection
2.8. In vivo screening
2.9. In vitro selection
2.10. In vitro screening
2.11. Future directions in the directed evolution of biocatalysts
2.12. Conclusion
3. Systematic optimization of interface interactions increases the thermostability of a multimeric enzyme
3.1. Abstract
3.2. Introduction and Results
3.3. Materials and Methods
3.4. Acknowledgment
3.5. Supporting Material
4. Directed divergent evolution of a thermostable D-tagatose epimerase towards improved activity for two different hexose substrates
4.1. Abstract
4.2. Introduction
4.3. Results
4.4. Discussion
4.5. Material and Methods
4.6. Acknowledgment
4.7. Supporting Material
5. Highly efficient production of rare sugars D-psicose and L-tagatose by two engineered D-tagatose epimerases
5.1. Abstract
5.2. Introduction
5.3. Results
5.4. Discussion
5.5. Material and Methods
5.6. Acknowledgment
5.7. Supporting Material
6. Development of a de novo orthogonal D-allose catabolic pathway in Escherichia coli and its application as an in vivo selection system
6.1. Abstract
6.2. Introduction
6.3. Results
6.4. Discussion
6.5. Material and Methods
6.6. Supporting Material
7. Conclusion and Outlook
8. References
9. Acknowledgements
10. Curriculum vitae

CHAPTER 1: INTRODUCTION

1.1. General introduction – process intensification for enzymatic reactions

The chemical industry is currently undergoing rapid changes, not only concerning the shift of its raw materials from mainly petro-based to more sustainable bio-based compounds, but also concerning the economy of its production methods and the sustainability of the chemical processes themselves [1]. One of the most important strategies to improve process performance in this context is process intensification, defined as “any chemical engineering development that leads to a substantially smaller, cleaner, and more energy-efficient technology” [2]. Process intensification can be achieved by miniaturization and parallelization, implementation of multifunctional reactors or, most importantly with respect to the present thesis, by integrating reaction and separation in one single operation [3]. Such a strategy can be instrumental in overcoming frequently encountered constraints in enzyme-catalyzed reactions, such as product inhibition, rapid degradation of products under production conditions, or an unfavorable position of the thermodynamic equilibrium. In this context, continuous process operation, in contrast to the batch reactions that are more prevalent in fine chemistry, is becoming more and more important for process intensification, as it often exhibits higher productivity and can reduce the number of process steps involved and thus the amount of waste produced [4]. In organic chemistry, the coupling of several reaction steps in one vessel usually requires extensive fine-tuning of the reaction conditions, if it is possible at all, as these conditions typically differ substantially for each step in terms of solvent composition, temperature and pH. This stands in marked contrast to biocatalysis, i.e. the utilization of enzymes to manufacture chemicals.
Enzymes have evolved to catalyze reactions with astonishing chemo-, regio- and enantioselectivity [5] under very similar conditions in water and at ambient temperature and pH (or at least the possible windows for temperature and pH are much smaller than for chemical reactions). This considerably facilitates the design of multi-step reactions in one vessel. Further, the high specificity and selectivity of enzymatic reactions allow the implementation of multi-enzyme cascades with often only negligible formation of side-products, thus maximizing the use of potentially expensive starting materials.

1.2. Synthesis of rare monosaccharides by iso- and epimerases

To demonstrate the potential of process intensification for biocatalytic reactions in the field of fine chemical synthesis, the synthesis of uncommon hexose sugars by means of sugar isomerases and epimerases was chosen as a highly interesting proof-of-principle study. Of all 24 possible hexose sugars (Figure 1.1), only 4 are readily available, namely D-glucose and D-fructose and that can be obtained from staple foods such as sugar beet, sugar cane or corn starch, whereas D-galactose is obtained from the disaccharide lactose from milk. Among the 12 hexoses of the L-configuration, only L-sorbose is available in considerable amounts, from the enzymatic oxidation of D-sorbitol with Gluconobacter sp. cells, an intermediate step in the synthesis of vitamin C via the Reichstein process [6]. The 20 remaining monosaccharide hexoses, which are hardly or not at all available from natural sources, are termed “rare sugars” [7]. The scarcity of these rare monosaccharides is in contrast to their potential usefulness in different fields of application: several rare monosaccharides have been found to exhibit antitumor activity (e.g. D-allose, D-talose, L-glucose), others can be used as low-calorie sweeteners (e.g. D-psicose, D-tagatose), and most of them can be used as valuable chiral building blocks for the synthesis of active pharmaceutical ingredients, such as L-galactose, which can serve as starting material for the synthesis of L-nucleoside-based antiviral drugs [8]. In addition to their limited availability from natural sources, the synthesis of rare monosaccharides by traditional organic chemical means often suffers from elaborate protecting-group chemistry and low yields [9]. Together, these obstacles motivated the development of alternative biocatalytic routes for production. To this end, a variety of different sugar isomerases and epimerases have been described in the literature that catalyze the interconversion of different monosaccharides and that can be exploited for the biotechnological production of these compounds [8]. Alternatively, Izumori and coworkers have suggested the production of rare sugars by means of different polyol dehydrogenases that first reduce a certain hexose to the respective polyol, which is then oxidized by a second polyol dehydrogenase to form a hexose with a different configuration [10].
These enzymes, however, require NAD(H) as cofactor, which complicates their utilization from an engineering point of view compared to the simpler iso- and epimerization reactions. The 24 possible hexose sugars can be depicted in the so-called sugar cube such that the 8 ketohexoses are located in the center of the cube and are surrounded by the 16 aldohexoses (Figure 1.1). Each of the monosaccharides can conceptually be interconverted into the adjacent neighbor by a single isomerization (interconversion of aldose and ketose) or epimerization (inversion of a single hydroxyl functionality) step, with the X-axis implying a change in the configuration of C3, the Y-axis a change in the configuration of C5 and the Z-axis a change in the configuration of C4. In practice, enzymes are not available for all of the theoretically possible reactions. Nonetheless, a variety of different sugar isomerases and epimerases have been discovered by now that allow a set of 11 rare sugars to be obtained from three readily available bulk sugars (D-fructose, D-galactose, L-sorbose) by combining a maximum of three iso- and epimerases. However, before rare sugar production via enzymatic reaction can be considered an economically viable option, there are two main obstacles that have to be overcome, which will be discussed below: (I) the limitation on yield of reactions with unfavorable thermodynamic equilibrium and (II) the insufficient process parameters (thermostability, specific activity) of the biocatalyst under operational conditions.
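The counting behind the sugar cube (16 aldohexoses plus 8 ketohexoses, and the pairing of ketohexoses by epimerization at C3) can be checked with a short enumeration. The following Python sketch is illustrative only: the Fischer-projection assignments ("R" for a hydroxyl group drawn on the right, "L" for the left) are standard textbook stereochemistry rather than data from this thesis, and all function names are hypothetical.

```python
from itertools import product

# Side of the OH group at C3, C4 and C5 in the Fischer projection
# ("R" = right, "L" = left). Assumption: standard textbook assignments.
KETOHEXOSES = {
    ("L", "R", "R"): "D-fructose",
    ("R", "R", "R"): "D-psicose",
    ("R", "L", "R"): "D-sorbose",
    ("L", "L", "R"): "D-tagatose",
    ("R", "L", "L"): "L-fructose",
    ("L", "L", "L"): "L-psicose",
    ("L", "R", "L"): "L-sorbose",
    ("R", "R", "L"): "L-tagatose",
}


def flip(side):
    """Invert the configuration of one stereocenter."""
    return "L" if side == "R" else "R"


def c3_epimer(config):
    """Return the configuration obtained by epimerization at C3 only."""
    c3, c4, c5 = config
    return (flip(c3), c4, c5)


def count_stereoisomers(n_centers):
    """Number of stereoisomers for n independent stereocenters (2**n)."""
    return len(list(product("RL", repeat=n_centers)))


def c3_epimer_pairs():
    """All ketohexose pairs that differ only in the configuration at C3."""
    pairs = set()
    for config, name in KETOHEXOSES.items():
        partner = KETOHEXOSES[c3_epimer(config)]
        pairs.add(tuple(sorted((name, partner))))
    return sorted(pairs)


if __name__ == "__main__":
    # Aldohexoses have 4 stereocenters (C2-C5), ketohexoses 3 (C3-C5).
    print(count_stereoisomers(4), count_stereoisomers(3))
    for pair in c3_epimer_pairs():
        print(" <-> ".join(pair))
```

The four pairs printed correspond to the C3 epimerizations along one axis of the cube, including the two reactions targeted in this thesis (D-fructose/D-psicose and L-sorbose/L-tagatose).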

a) [Sugar cube showing all 24 hexoses, with abundant and rare monosaccharides marked; the inset labels the three axes C3, C4 and C5.]

Enzyme                         Organism
D-Tagatose epimerase           A. tumefaciens [11], P. cichorii [12], ...
L-Rhamnose isomerase           T. maritima [13], B. halodurans [14], P. stutzeri [15], ...
L-Arabinose isomerase          G. stearothermophilus [16], T. neapolitana [17], E. coli [18], ...
L- isomerase                   P. stutzeri [15], Actinobacter sp. DL-28 [19]
Ribose-5-phosphate isomerase   C. thermocellum [20]
D-Xylose isomerase             T. neapolitana [21], S. murinus [22], ...
UDP-Galactose-4-epimerase      E. coli [23]
D-Arabinose isomerase          B. pallidus [24]

b)

Abbreviation  Name            Abbreviation  Name           Abbreviation  Name
D/L-All       D/L-Allose      D/L-Glu       D/L-Glucose    D/L-Psi       D/L-Psicose
D/L-Alt       D/L-Altrose     D/L-Gul       D/L-Gulose     D/L-Sor       D/L-Sorbose
D/L-Fru       D/L-Fructose    D/L-Ido       D/L-Idose      D/L-Tag       D/L-Tagatose
D/L-Gal       D/L-Galactose   D/L-Man       D/L-Mannose    D/L-Tal       D/L-Talose

c) [Fischer projections: D-glucose and D-fructose]

d) [Fischer projections: D-fructose and D-psicose]

e) [Fischer projections: D-glucose, D-allose, D-galactose, D-fructose, D-psicose, L-sorbose, L-tagatose]

Figure 1.1| a) Representation of interconversion routes for all 24 hexose monosaccharides in the sugar cube. The ketohexoses fructose, psicose, tagatose and sorbose (both in the L- and the D-configuration) are located in the center of the cube (yellow box) and can be epimerized at the C3 position by D-tagatose epimerase. The inset in the upper left corner indicates the change of configuration of the respective C-atom of a ketohexose when going from sugar to sugar along the X-, Y- or Z-axis. The 16 aldohexoses at the outside of the cube can be interconverted by a variety of isomerases from various organisms, some of which are listed above. b) Abbreviations of all 24 hexoses that are depicted in a). c) A typical isomerization reaction (interconversion of aldo- and ketohexose), catalyzed for example by D-xylose isomerase, with the ketone and aldehyde groups highlighted in red. d) An epimerization reaction catalyzed by D-tagatose epimerase, with the carbon (C3) at which the hydroxyl configuration is changed highlighted in green. e) Fischer projections of the keto- and aldohexoses that are of most importance in this thesis.

1.3. Increasing the yield for enzyme reactions with unfavorable equilibrium position

The first limitation concerns an inherent physical property of all enzymatic isomerization or epimerization reactions, namely their limited yield as defined by the unfavorable position of the thermodynamic equilibrium. This leads to the termination of the conversion at a yield of around 50% and leaves a mix of two chemically similar compounds, which makes these reactions unattractive for industrial-scale applications [25]. An arguably elegant way to overcome this limitation is the continuous removal of the product from the reaction mixture, leading to the continuation of the reaction and thus to the efficient utilization of the substrate. Such an in situ product removal (ISPR) approach for rare sugar production can best be implemented by coupling the reactor to a continuous chromatography unit that separates the product from the reactant and recycles the reactant back into the reactor module, allowing in principle a yield of 100% (Figure 1.2a) [26]. Continuous chromatography in the form of simulated moving bed (SMB) chromatography has been proven to be an industrially viable process option for the separation of enantiomers and diastereomers, i.e. compounds with highly similar physicochemical properties that are difficult to separate by other means [27, 28]. SMB is based on the simulation of a counter-current between the mobile and the stationary phase by continuously switching the inlet ports (eluent and feed) and outlet ports (raffinate and extract) in the direction of the mobile phase flow (Figure 1.2b, lower part).


Figure 1.2| Concept of integrating reaction and separation. a) Schematic representation of the coupling of reaction and separation. b) Setup of an integrated process for an equilibrium-limited reaction involving an isomerization or epimerization step and a subsequent 2-compound separation. Abbreviations: EMR, enzyme-membrane reactor; NF, nanofiltration; SMB, simulated moving bed chromatography. (Figures: courtesy of Nina Wagner)

The more strongly retained compound can thus be continuously collected at the extract port while the more weakly retained compound can be collected at the raffinate port. There have been only a few examples in the literature of processes that couple biotransformation and separation by continuous chromatography, and most of them are employed in the separation of glucose and fructose [25]. Many of these processes rely on the simple addition of the enzyme throughout the SMB unit, either in immobilized form to the solid phase or in soluble form to the mobile phase. This concept of a “reactive SMB” (SMBR) has the disadvantage of equally catalyzing the reaction in all zones of the separation process, thus promoting the backward reaction in those zones where product is enriched and hence reducing the productivity of the system [25]. In contrast, a configuration where the reactor and the separation unit are detached and connected only by a loop (the so-called “coupled SMB and reactor” concept or cSMB-R) allows each unit to be operated under optimal conditions. Such a setup can theoretically achieve 100% yield and has been calculated to be the best method of coupling reaction and separation if high purity (> 95%) of the product is required [29]. Figure 1.2b depicts a schematic representation of such an integrated process, consisting of an enzyme-membrane reactor (EMR) that is connected to an SMB unit. The loop is closed by a nanofiltration device that concentrates the diluted starting material before it re-enters the enzyme-membrane reactor in order to account for the inherent dilution of the raffinate stream.
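The effect of recycling on the yield of an equilibrium-limited reaction can be illustrated with a simple mass balance. The following Python sketch is a minimal illustration under stated assumptions, not a model of the actual process: it assumes a single reversible reaction that reaches equilibrium in the reactor, complete recovery of the product, and a hypothetical parameter `recycle_fraction` for the share of unreacted substrate that is returned to the reactor. The equilibrium constant of 1 is chosen only to reproduce the yield of around 50% mentioned above.

```python
def equilibrium_conversion(k_eq):
    """Single-pass conversion at equilibrium for A <-> B with K_eq = [B]/[A]."""
    return k_eq / (1.0 + k_eq)


def overall_yield(single_pass_conversion, recycle_fraction):
    """Product formed per unit of fresh substrate at steady state.

    Mass balance around the loop: the reactor inlet N is fresh feed F plus
    recycled substrate, N = F + r * (1 - X) * N, hence
    yield = X * N / F = X / (1 - r * (1 - X)).
    """
    x = single_pass_conversion
    return x / (1.0 - recycle_fraction * (1.0 - x))


if __name__ == "__main__":
    x_eq = equilibrium_conversion(1.0)  # illustrative K_eq, gives X = 0.5
    for r in (0.0, 0.5, 0.9, 1.0):
        print(f"recycle fraction {r:.1f}: overall yield {overall_yield(x_eq, r):.3f}")
```

Without recycling the yield equals the single-pass conversion, and it approaches 100% as the recycle fraction approaches 1, which is the idealized limit referred to in the text.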

1.4. Overcoming enzymatic limitations by directed evolution

The second limitation that has to be overcome for efficient rare sugar production concerns the biocatalyst itself. The reaction rates of the wild-type isomerases and epimerases are frequently low for the desired substrates (see also chapter 4 or [23]), which is not surprising considering that these rare hexoses can be assumed to be merely promiscuous substrates that are only accepted by the enzymes due to their high similarity to the natural substrate. Additionally, although the integration of reaction and separation for the production of rare hexoses appears highly attractive, the fact that both units are operated with essentially the same liquid stream in terms of physicochemical properties (Figure 1.2b) means that this integration inherently limits the range of conditions that can be applied to each of the units: the conditions applied to one unit must respect the requirements of the other, and vice versa. It was shown previously that such a tight interconnection can lead to trade-offs between enzyme activity and stability in one unit and separation productivity and stationary phase stability in the other unit due to different solvent requirements [30]. Given the powerful methods of enzyme engineering to change enzyme preferences with respect to reaction conditions [31], adapting the biocatalyst to the requirements of the process seems an attractive route for process optimization.

For the present case of rare sugar production, solvent incompatibility is not an issue, as sugar separation can be performed efficiently on ion-exchange resins with water as solvent [32]. On the other hand, industrial sugar processes, exemplified by the industrial production of high-fructose corn syrup (HFCS, a mixture of glucose and fructose obtained from corn starch by hydrolysis, isomerization of the resulting glucose, and, in some cases, further enrichment of the fructose by SMB), are performed at very high substrate concentrations (> 3 M), requiring elevated temperatures (55°C - 60°C) to reduce liquid stream viscosity, increase the reaction rate, shift the thermodynamic equilibrium towards the product (fructose in the case of HFCS) and prevent microbial growth [33]. Therefore, thermostability of a sugar-processing biocatalyst is crucial for the successful integration of reaction and separation. Additionally, a high specific activity for the substrate in question is essential to reduce the residence time in the reactor (which correlates with the required size of the reactor itself), not to mention the amount of required (often costly) biocatalyst.
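The link between specific activity and residence time can be made concrete with a deliberately simplified reactor balance. The following Python sketch assumes an ideally mixed continuous reactor at steady state and reversible first-order kinetics, which is an illustrative simplification and not the kinetic model used in this thesis; the rate constants are hypothetical, and a more active enzyme is assumed to scale the forward and reverse rate constants by the same factor, since a catalyst does not shift the equilibrium.

```python
def residence_time(conversion, k_forward, k_reverse):
    """Residence time needed for a target conversion X of A <-> B.

    Steady-state balance for an ideally mixed reactor:
    X / tau = k_f * (1 - X) - k_r * X.
    """
    net_rate = k_forward * (1.0 - conversion) - k_reverse * conversion
    if net_rate <= 0:
        raise ValueError("target conversion is at or beyond equilibrium")
    return conversion / net_rate


if __name__ == "__main__":
    target = 0.4                 # below the equilibrium conversion of 0.5
    k_f = k_r = 1.0              # hypothetical rate constants, arbitrary units
    factor = 8.6                 # fold increase reported for IDF8 on D-fructose

    tau_reference = residence_time(target, k_f, k_r)
    tau_improved = residence_time(target, factor * k_f, factor * k_r)
    print(f"reference: {tau_reference:.3f}, improved: {tau_improved:.3f}")
    print(f"ratio: {tau_reference / tau_improved:.1f}")
```

Under these assumptions the required residence time, and with it the reactor volume at a given flow rate, falls in direct proportion to the gain in activity.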

1.5. D-Tagatose epimerase is the central enzyme for rare sugar production

In the present thesis the development of a D-tagatose epimerase enzyme towards an industrially suitable biocatalyst for the production of D-psicose and L-tagatose is described. D-Tagatose epimerase arguably represents the central enzyme in rare sugar production, catalyzing the interconversion of all eight ketohexoses (seeTable 1.1). In the literature, it is often also referred to as D-psicose epimerase because representatives of this enzyme often show strong substrate preference towards D-psicose [11]. For the sake of clarity however, the enzyme will be named D-tagatose epimerase (DTE) throughout this thesis, regardless of the substrate preference. A variety of DTE’s from different host organisms have been described in literature so far, but they all featured limited thermostability (Table 1.1). Thermostability is however central to the development of a biocatalyst in two aspects. First, thermostability is essential for the performance of the enzyme in the integrated process that is operated at high temperatures due to the reasons discussed above. Second, it has been described repeatedly that thermostable enzymes are a preferable starting point for directed evolution as they can tolerate more destabilizing but functionally beneficial mutations before the threshold is reached where correct folding is not possible anymore [34-36]. Thus, a thermostable enzyme is more robust in terms of operational performance in the process as well as in terms of its buffering capacity for destabilizing mutations. At the time when this thesis was started (May 2009), only two functional DTE homologs were described, namely from Agrobacterium tumefaciens [11] and Pseudomonas cichorii [12].

Preliminary experiments with both enzymes revealed that DTE from P. cichorii (PcDTE) was the more thermostable of the two, and it was thus chosen as the starting point for enzyme engineering.

Table 1.1| D-Tagatose 3-epimerases from different organisms

| Organism | Optimal pH | Size [aa] a) | Metal ion preference | Half-life | Substrate specificity b) | Temp. for enzyme kinetics | kcat [s^-1] c) | Km [mM] c) | Ref. |
|---|---|---|---|---|---|---|---|---|---|
| C. bolteae | 7 | 291 | Co2+, Mn2+ | 156 min (55°C) | D-Psicose | 55°C | 59 | 59.8 | [37] |
| Desmospora sp. | 7.5 | 290 | Co2+, Mn2+ | 120 min (50°C) | D-Psicose | 60°C | 1060 | 549 | [38] |
| C. scindens | 7.5 | 289 | Mn2+ | 108 min (50°C) | D-Psicose | 60°C | 350 | 40.1 | [39] |
| Ruminococcus sp. | 8.0 | 291 | Mn2+ | 96 min (60°C) | D-Psicose | 60°C | 59.4 | 216 | [40] |
| C. cellulolyticum | 8.0 | 293 | Co2+ | 408 min (60°C) | D-Psicose | 55°C | 55.9 | 53.5 | [41] |
| Clostridium sp. | 8.0 | 292 | Co2+ | 15 min (60°C) | D-Psicose | 65°C | 273 | 279 | [42] |
| R. sphaeroides | 9.0 | 295 | Mn2+ | 180 min (45°C) | D-Fructose | N.R. | N.R. | N.R. | [43] |
| A. tumefaciens | 8.0 | 289 | Mn2+ | 4 min (60°C) | D-Psicose | 50°C | 34.5 | 24 | [11] |
| P. cichorii | 7.0 – 9.0 | 290 | Mn2+ | N.R. / N.D. | D-Tagatose | 30°C | 23.5 | 40 | [12], this thesis |

a) aa: amino acids
b) Substrate with highest kcat/Km
c) Both values for D-fructose as substrate
N.R.: not reported; N.D.: not determined
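Footnote b) of Table 1.1 ranks substrates by the catalytic efficiency kcat/Km. As a small worked illustration, the D-fructose values listed in the table can be converted into kcat/Km with the sketch below. It is plain arithmetic on the table entries; the names KINETICS, catalytic_efficiency and ranked_by_efficiency are chosen here for illustration only. Because the kinetic parameters were measured at different temperatures (30°C to 65°C), the resulting ranking should be read as a rough orientation and not as a like-for-like comparison.

```python
# kcat [s^-1] and Km [mM] for D-fructose, copied from Table 1.1.
# R. sphaeroides is omitted because no kinetic values were reported.
KINETICS = {
    "C. bolteae": (59.0, 59.8),
    "Desmospora sp.": (1060.0, 549.0),
    "C. scindens": (350.0, 40.1),
    "Ruminococcus sp.": (59.4, 216.0),
    "C. cellulolyticum": (55.9, 53.5),
    "Clostridium sp.": (273.0, 279.0),
    "A. tumefaciens": (34.5, 24.0),
    "P. cichorii": (23.5, 40.0),
}


def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/Km in s^-1 mM^-1."""
    return kcat_per_s / km_mM


def ranked_by_efficiency(kinetics):
    """Return (organism, kcat/Km) pairs sorted by kcat/Km, highest first."""
    rows = [(name, catalytic_efficiency(kcat, km))
            for name, (kcat, km) in kinetics.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, efficiency in ranked_by_efficiency(KINETICS):
        print(f"{name:18s} {efficiency:6.2f} s^-1 mM^-1")
```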

1.6. Scope of this thesis

In chapter 2, an overview of the current progress in directed evolution of biocatalysts is presented, with a special focus on screening and selection methods for the development of biocatalysts that are employed in industrial settings. The advantages and disadvantages of the different methods are discussed and a prognosis for the future development of the field is given. Although PcDTE showed higher thermostability than its homolog from A. tumefaciens, its stability was still considered a bottleneck for the process as well as for the directed evolution of PcDTE towards higher specific activity for different ketohexose substrates. Chapter 3 therefore describes the directed evolution of a PcDTE variant with greatly increased thermostability by improving the subunit-subunit interaction of the dimeric PcDTE. This thermostable variant then served as template for the divergent directed evolution of specific activities towards two different ketohexoses, D-fructose and L-sorbose. The directed evolution of these two enzyme variants is described in chapter 4, together with a thorough analysis of the catalytic parameters and the crystal structures of the different variants. Based on these structures, the basis of the observed improvements in specific activity is discussed. Chapter 5 is dedicated to the characterization of the operational parameters of the evolved variants for their respective substrates, and the performance parameters of the final variants are compared to those of their precursor variant and the wild-type. Finally, the development of an in vivo selection system for directed evolution of PcDTE and L-rhamnose isomerase (LRI) is presented in chapter 6, before the results are summarized in chapter 7.


CHAPTER 2: SCREENING AND SELECTION METHODS FOR THE DIRECTED EVOLUTION OF INDUSTRIALLY RELEVANT ENZYMES

Andreas Bosshart and Sven Panke

2.1. Abstract

Production of fine and bulk chemicals by means of enzymes has gained pace in the last years, which is, amongst other reasons, due to the successful tailoring of natural enzymes to industrial production conditions by means of directed evolution. Arguably, the most important bottleneck in the development of an industrial biocatalyst by directed evolution is still the screening of the often vast number of variants that need to be generated during the directed evolution process. In this review we discuss the development of screening and selection assays that have been reported recently, with an explicit focus on assays for the development of industrially relevant enzymes. We discuss different assays separated according to the localization of the assayed reaction (in vivo and in vitro assays) on the one hand, and according to the analysis method of the library (screening and selection assays) on the other hand. Further, a simplified decision tree is presented that enables the choice of a suitable screening or selection strategy based on the enzymatic reaction that is under investigation. Finally we conclude by providing our view of three developments in directed evolution that we expect to become crucial in the future for the directed evolution of industrial biocatalysts.

2.2. Introduction

Biocatalysis, the use of enzymes as catalysts in synthetic chemical reactions, has become an important tool in the green manufacturing of pharmaceuticals, fine chemicals (defined as the manufacturing of drug intermediates (such as para-hydroxyphenylglycine for second generation beta-lactam antibiotics) and large scale drugs (such as penicillin G)) and bulk chemicals (such as acrylic acid), as well as in the field of food processing and fuel production [44-46]. This progress can be attributed to a variety of factors, such as improvements in gene synthesis, the availability of a wealth of DNA sequences from uncultivable organisms in metagenome databases, improvements in biocatalyst immobilization and new bioinformatics tools. The most important advance, however, is the rapid development in the field of directed evolution of industrial biocatalysts that has taken place in recent years and is recapitulated in the following. Bornscheuer et al. suggested that the development of biocatalysts proceeded in three waves [44]. Accordingly, the first wave comprised the utilization of whole cells or isolated wild-type enzymes for the production of chemicals. In the second wave, enzymes were engineered on a rational basis to tailor them for their specific applications. Finally, in the third wave directed evolution started to provide the basis for the extensive modifications of biocatalysts that are necessary to generate an economically viable industrial biocatalyst. Directed evolution of biocatalysts is a version of Darwinian evolution that takes place in vitro. Diversity is introduced into a gene sequence either randomly or restricted to selected sites, and this library of variants is subsequently screened (many variants are actively analyzed and the best is then chosen according to a previously defined performance criterion, either for application or as the basis for another round of diversification) or selected (the analysis is replaced by coupling function to the survival of a cell that is expressing the diversified gene, so that only a few functional protein variants from surviving cells actually have to be analyzed) for variants with improved characteristics. This cycle of mutagenesis and selection/screening is repeated until a sufficient improvement of the biocatalyst is reached (Figure 2.1). Arguably the main challenge of using enzymes for the synthesis of (fine) chemicals is to adapt them rapidly to the very different environments that exist in industrial settings compared to their natural environment inside or, in selected cases such as secreted enzymes, outside the cell. Industrial enzymes often have to cope with elevated temperatures, high substrate (and product) concentrations, organic solvents, and/or high or low pH [47].

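[Figure 2.1: cycle diagram with the steps "Diversity generation" (point-mutation library, recombination library) and "Screening / Selection"; the best variant(s) of each round re-enter the cycle until the final variant is obtained.]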

Figure 2.1| Principle of directed evolution, consisting of a diversity generation step and the screening or selection of the resultant variants. Point mutations are single mutations that are introduced into a parent sequence either randomly (e.g. by epPCR) or at selected positions (e.g. saturation mutagenesis), whereas a recombination library consists of novel combinations of homologous sequences (e.g. by DNA shuffling).
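The cycle of diversity generation and screening shown in Figure 2.1 can be illustrated with a deliberately simplified simulation. The sketch below is a toy model and not a description of any experimental protocol: the "fitness" is a hypothetical score (the number of positions that match an arbitrary optimum sequence), each library member carries exactly one random point mutation, and the function names are chosen here for illustration only.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids


def mutate(parent, rng):
    """Return a copy of parent carrying one random point mutation."""
    pos = rng.randrange(len(parent))
    new_residue = rng.choice([aa for aa in ALPHABET if aa != parent[pos]])
    return parent[:pos] + new_residue + parent[pos + 1:]


def toy_fitness(variant, optimum):
    """Hypothetical performance criterion: positions matching an arbitrary optimum."""
    return sum(a == b for a, b in zip(variant, optimum))


def directed_evolution(parent, optimum, rounds, library_size, seed=0):
    """Run repeated rounds of diversification and screening; return (variant, history)."""
    rng = random.Random(seed)
    history = [toy_fitness(parent, optimum)]
    for _ in range(rounds):
        # Diversity generation: a point-mutation library around the current parent.
        library = [mutate(parent, rng) for _ in range(library_size)]
        # Screening: every library member is evaluated, the best one is picked.
        best = max(library, key=lambda v: toy_fitness(v, optimum))
        # The best variant becomes the next parent only if it is an improvement.
        if toy_fitness(best, optimum) > toy_fitness(parent, optimum):
            parent = best
        history.append(toy_fitness(parent, optimum))
    return parent, history


if __name__ == "__main__":
    final, history = directed_evolution("AAAAAAAA", "MKTAYVGA", rounds=5, library_size=50)
    print(final, history)
```

Because only the single best variant of each round is carried forward, the score in this toy model can rise by at most one matching position per round, which mirrors the stepwise accumulation of point mutations over several rounds.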

All these challenges can be addressed by directed evolution, and numerous examples have been reported of the improvement of single enzymatic traits such as specific activity [48], thermostability [49] and stability against organic solvents [50]. However, directed evolution of an industrial biocatalyst is in most instances a multi-objective optimization, requiring the improvement of several enzymatic traits at once in order to develop an economically viable biocatalyst [47], and the screening or selection method ideally has to take these requirements into account in order to direct the biocatalyst evolution by applying the appropriate selection pressure. Where this is not possible, the resulting hits after each round of directed evolution towards one property have to be carefully monitored for their performance in the other properties, as, according to Arnold’s “First Law of Directed Evolution”, “You get what you screen for” [51], and focusing the screening or selection on one property can rapidly lead to loss of performance in another. On the other hand, it has been shown that a tiered approach, in which a first round is done under conditions that are distant from the ultimate reaction conditions and subsequent rounds impose conditions that are increasingly similar to them, can also result in an excellent biocatalyst. This was demonstrated for the evolution of a halohydrin dehalogenase by a team at Codexis [52]. They used an agar plate-based high-throughput screening step as a first tier to remove inactive variants, followed by a medium-throughput screening step (second tier) that analyzed the hits from the first tier quantitatively. The screening conditions of the second tier already resembled the final industrial conditions more closely, whereas in a third tier up to five hits from the second tier were tested under real preparative-scale reaction conditions. Such a tiered approach hence allows the utilization of high-throughput screening or selection methods that are not suitable as exclusive assays because they exhibit only a two-state phenotype (dead/alive or highly active/weakly active) or do not necessarily reflect the final operational conditions [47]. In this work, the different technologies of library analysis are roughly divided along two different lines. The first distinction is whether the enzyme under investigation is analyzed inside a cell (in vivo) or in a cell-free environment (in vitro). Secondly, a distinction between “screening” and “selection” is made (Figure 2.2).

[Figure 2.2: map with the axes in vivo / in vitro and screening / selection. Methods shown: in vitro compartmentalization (IVC), reporter-based assays, agar plate-based assays, microtiter plate-based assays, cell-in-droplet assays, cell as microreactor, instrument-based assays (HPLC, GC, MS, ...), cell-surface display, chemical complementation, genetic complementation, phage display, ribosome display.]

Figure 2.2| Map of different screening and selection methods for directed evolution of biocatalysts, located along the two axes in vivo/in vitro and screening/selection.

This review will focus on the development of screening and selection assays that are of relevance for the directed evolution of industrially relevant biocatalysts (summarized in Table 2.2). Some screening or selection methods that have not yet been utilized for the development of an industrial biocatalyst but that are expected to become relevant for this purpose in the future are covered as well. For a broader discussion of directed evolution in general, also covering proteins that are not immediately relevant for industrial biocatalysis, the reader is referred to other recent reviews on the topic [53-56].

2.3. Analysis method and library size

In most instances, development of a biocatalyst starts with the definition of a reaction that needs to be catalyzed and for which one or several enzymes are already known that catalyze the reaction. If no enzyme is available to start with, de novo enzyme design can be envisaged. However, this is still not a straightforward procedure and the success rate is often low [57]. Alternatively, the substrate binding pocket of an enzyme that already contains the essential catalytic machinery but does not accept the desired substrate (e.g. for steric reasons) can be engineered (using computational tools and/or substrate walking) to accept the new substrate [58]. This engineered variant, exhibiting poor but measurable activity on the substrate in question, can then be used as template for directed evolution cycles to improve the specific activity. In a similar manner, the broad specificity of certain enzymes that catalyze a desired reaction with low turnover numbers can be harnessed as a starting point for directed evolution [59]. The reaction constrains the potential screening or selection assays that can be used and in this way already indicates the best library generation strategy. If, for example, the desired reaction generates a product that can complement an auxotrophy in a screening strain (in most instances Escherichia coli, but Thermus strains, Bacillus subtilis, or yeast have also been used), an in vivo selection system suggests itself, at least as the first step in a tiered approach, which then allows the screening of large libraries such as those generated by epPCR. Large libraries are generally advantageous for covering a greater area of the protein sequence space or for discovering mutations at positions that cannot be inferred from the protein structure or the catalytic mechanism [60]. It might also be necessary to resort to random mutagenesis methods (which inevitably create large libraries) if no structural or phylogenetic information is available for the enzyme.
In contrast, if the reaction product can only be detected by HPLC or GC instruments, the library size that can be screened is restricted due to the limited throughput of these methods, suggesting a “smart” library that has a high rate of hits per screened variants, but where variation is limited. Libraries are considered as smart if they are targeted to specific sites of an enzyme based on structural, evolutionary or mechanistic knowledge [53]. Additionally, the final conditions of the reaction in the industrial setting have to be considered, often requiring a tiered approach with a secondary medium-throughput, instrument-based screening step that allows varying e.g. temperature, organic solvent concentration or substrate concentration [47]. Figure 2.3 depicts a decision tree that directs the choice of screening/selection assays and diversity generation. For biotransformation reactions that permit several screening or selection strategies a tiered approach can be implemented, according to Figure 2.4.
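The logic of a tiered approach can be summarized in a few lines of code. The following is a minimal sketch under stated assumptions: the three assay functions are hypothetical placeholders supplied by the caller (a two-state readout, a quantitative score, and a score under process-like conditions), and the cap of five finalists only mirrors the halohydrin dehalogenase example from the text; none of this represents a published protocol.

```python
def tiered_screen(library, is_active, quantitative_score, process_score,
                  max_finalists=5):
    """Three-tier funnel: binary filter, quantitative ranking, process-like test."""
    # Tier 1: cheap two-state readout (e.g. an agar plate assay) removes inactive variants.
    active = [variant for variant in library if is_active(variant)]
    # Tier 2: a quantitative, medium-throughput assay ranks the remaining hits.
    ranked = sorted(active, key=quantitative_score, reverse=True)
    # Tier 3: only a handful of hits are tested under process-like conditions.
    finalists = ranked[:max_finalists]
    return sorted(finalists, key=process_score, reverse=True)


if __name__ == "__main__":
    # Toy library: variants are integers and the assays are arbitrary functions.
    best = tiered_screen(
        range(100),
        is_active=lambda v: v % 2 == 0,
        quantitative_score=lambda v: v,
        process_score=lambda v: -abs(v - 90),
    )
    print(best)
```

The toy example also shows why the hits of an early tier need to be re-evaluated later: the variant ranked highest in the second tier is not the one that performs best under the (hypothetical) process conditions of the third tier.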

[Figure 2.3: decision tree (yes/no branches, arranged along a throughput axis). Decision nodes, in the order shown: substrate can enter the cell; product is toxic; toxic substrate can be neutralized by enzyme; product can serve as metabolic intermediate in a cell; an auxotrophic strain for the product exists / can be easily engineered; substrate can be coupled to a metabolic intermediate that is cleaved off by the enzyme; substrate is fluorogenic; fluorogenic substrate analog exists; product can serve as inducer for in vivo reporter-based assay; a transcriptional regulator exists for the product; a riboswitch exists for the product; an aptamer exists that binds the product / it can easily be generated; substrate or substrate analog exists that is fluorogenic; reaction produces colored or pH-active product; reaction is NAD(P)-dependent or product is a substrate for an enzyme-coupled reaction. Outcomes: in vivo growth-based selection; in vivo FACS-based screening; in vivo reporter-based assay; in vitro compartmentalization; cell surface display; screening on agar plates; in vitro screening in microtiter plates; in vitro screening by instrument.]

Figure 2.3| Simplified decision tree, directing the choice of a suitable screening or selection assay based on the catalyzed reaction. Abbreviations: FACS, fluorescence activated cell sorting; HPLC, high performance liquid chromatography; GC, gas chromatography; MS, mass spectrometry; NMR, nuclear magnetic resonance.

2.4. Diversity generation – random or focused mutagenesis

The introduction of diversity into a protein sequence has made great progress in the last decades, and a protein engineer nowadays has a broad range of different methodologies at hand for this task. For a very detailed discussion of library generation for directed evolution the reader is referred to a review from Schwaneberg and coworkers [61]. The diversification method that is suitable for a certain protein largely depends on the screening or selection step that is available later on (see above). One of the most often used random mutagenesis methods is error-prone PCR (epPCR), which introduces errors either by amplifying a gene using a low-fidelity DNA polymerase, by the addition of Mn2+ ions or by using unbalanced dNTP ratios [62]. Similar results can be obtained by in vivo introduction of variability, either by employing a mutator strain that lacks certain DNA repair enzymes or a strain that contains a highly inaccurate DNA polymerase I [63]. All these random mutagenesis methods can theoretically generate huge libraries by targeting the complete protein sequence, but they lead to very low frequencies of beneficial mutations (10^-3) and > 30 % of deleterious mutations [64]. DNA shuffling can then be used to combine beneficial mutations and concomitantly purge deleterious mutations that may have been co-selected with the beneficial mutations due to genetic linkage. It is based on in vitro homologous recombination of highly similar DNA sequences, which may be derived from a pool of mutants of a single gene (single gene shuffling) [65], but may also involve homologous sequences from different organisms (family shuffling) [66]. It is done either by fragmentation of the homologous sequences by DNaseI digestion and subsequent reassembly as pioneered by Stemmer [65] or by the staggered extension process (StEP) as developed by Arnold and coworkers [67].
Shuffling can also be used as a “stand-alone” method for the introduction of diversity, as it generates a low frequency of point mutations. Incorporation of synthetic oligonucleotides (containing degenerate codons for certain amino acid positions) into the reassembly reaction can finally extend the scope of shuffling towards site-directed library design [68, 69]. Saturation mutagenesis (also termed cassette mutagenesis) represents an example of a targeted mutagenesis approach. One or several selected residues are randomized using oligonucleotides containing degenerate codons. The randomized sites are introduced into the protein sequence encoded on a plasmid using the QuikChange protocol from Stratagene [70] or modifications thereof [71]. If all 20 possible amino acids should be encoded at a certain site, they can be introduced via an oligonucleotide containing a fully degenerate codon (NNN, N = any of the four nucleotides A, C, G, or T; 64 possible codons encoding all 20 proteinogenic amino acids (aa)) or the partially degenerate codon NNK (K = G/T; 32 codons that still encode all 20 aa). More restricted codons give access to only a subset of the amino acids, e.g. NDT (D = A/G/T; 12 codons encoding 12 aa) or RNG (R = G/A; 8 codons encoding 8 aa). Reetz and coworkers showed repeatedly that a restricted amino acid alphabet (encoded by NDT or RNG codons) can result in great improvements in terms of enantioselectivity [72-74], although only a subset of the 20 proteinogenic amino acids was allowed at the selected positions. Such a restricted alphabet can reduce the size of the screening problem significantly, especially if multiple positions are randomized simultaneously and nearly full coverage is sought (Table 2.1) [75, 76]. Reetz and coworkers established a very useful tool termed CASTER [75] that helps to assess the library size that needs to be screened for 95 % coverage, dependent on the number of sites that are addressed simultaneously and the degeneracy that is chosen for each of these sites. If, for example, an ultrahigh-throughput screening method is available that can process a library size of roughly 10^8 variants, 4 sites could be randomized simultaneously using NNN degenerate codons, 5 sites using NNK, 7 sites using NDT and even 8 sites using RNG degeneracy. If, on the other hand, 3 sites need to be targeted simultaneously, NNN degeneracy would require a high-throughput screen (8 x 10^5 variants) whereas a medium-throughput screening method would suffice for NDT and RNG degeneracy (< 10^4 variants) (see Table 2.1).
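The codon and amino acid counts quoted for the different degeneracies can be checked by enumerating the codons explicitly. The sketch below assumes the standard genetic code and the IUPAC ambiguity symbols used in the text (N, K, D, R); the helper names expand and summarize are chosen here for illustration. One detail not mentioned above becomes visible in the enumeration: NNK still contains one stop codon (TAG), whereas NDT and RNG contain none.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity symbols that appear in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "D": "AGT", "R": "AG",
}


def expand(degenerate_codon):
    """List all concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate_codon))]


def summarize(degenerate_codon):
    """Return (number of codons, number of distinct amino acids, number of stop codons)."""
    translated = [CODON_TABLE[c] for c in expand(degenerate_codon)]
    amino_acids = set(translated) - {"*"}
    return len(translated), len(amino_acids), translated.count("*")


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NDT", "RNG"):
        n_codons, n_aa, n_stop = summarize(scheme)
        print(f"{scheme}: {n_codons} codons, {n_aa} amino acids, {n_stop} stop codon(s)")
```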

Table 2.1| Number of colonies to screen for 95 % coverage (~3-fold oversampling) as a function of different codon degeneracies (adapted from [76])

| No. of targeted sites | NNN (64 codons, 20 aa) | NNK (32 codons, 20 aa) | NDT (12 codons, 12 aa) | RNG (8 codons, 8 aa) |
|---|---|---|---|---|
| 1 | 192 | 96 | 36 | 24 |
| 2 | 1.2 x 10^4 | 3068 | 431 | 192 |
| 3 | 7.9 x 10^5 | 9.8 x 10^4 | 5177 | 1534 |
| 4 | 5.0 x 10^7 | 3.1 x 10^6 | 6.2 x 10^4 | 1.2 x 10^4 |
| 5 | 3.2 x 10^9 | 1.0 x 10^8 | 7.5 x 10^5 | 9.8 x 10^4 |
| 6 | 2.1 x 10^11 | 3.2 x 10^9 | 9.0 x 10^6 | 7.9 x 10^5 |
| 7 | 1.3 x 10^13 | 1.0 x 10^11 | 1.1 x 10^8 | 6.3 x 10^6 |
| 8 | 8.4 x 10^14 | 3.3 x 10^12 | 1.3 x 10^9 | 5.0 x 10^7 |
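The entries of Table 2.1 are consistent with a simple sampling model, which is sketched below as an assumption and not as the exact algorithm of the CASTER tool. If a library contains V equally likely codon combinations (V = codons per site raised to the number of sites) and T transformants are picked, the expected fraction of combinations that has been sampled at least once is approximately 1 - exp(-T/V). Solving for T gives T = -V ln(1 - coverage), i.e. about 3 V for 95 % coverage.

```python
import math

CODONS_PER_SITE = {"NNN": 64, "NNK": 32, "NDT": 12, "RNG": 8}


def transformants_for_coverage(codons_per_site, n_sites, coverage=0.95):
    """Transformants needed so that the expected library coverage reaches `coverage`.

    Assumes all codon combinations occur with equal probability.
    """
    n_variants = codons_per_site ** n_sites
    return -n_variants * math.log(1.0 - coverage)


if __name__ == "__main__":
    for n_sites in range(1, 9):
        row = [transformants_for_coverage(c, n_sites) for c in CODONS_PER_SITE.values()]
        print(n_sites, "  ".join(f"{value:10.3g}" for value in row))
```

Rounding the results to the nearest integer reproduces the small entries of the table (for example 192, 3068, 431 and 1534). The calculation refers to coverage at the codon level; it ignores any bias in the oligonucleotide synthesis or in the mutagenesis reaction, which would increase the required oversampling.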

Saturation mutagenesis has been shown to be more efficient than random mutagenesis approaches in terms of hits per screened variant [77-79], but it relies on structural information on the protein under investigation, which is not always available. However, with the advent of high-throughput protein crystallization [80, 81] as well as more powerful three-dimensional structure prediction algorithms [82], this limitation is becoming less and less important. The maximum library size that can be obtained by any method is mainly restricted by the transformation efficiency of the host that is used for screening or selection, with the obvious exception of in vitro selection methods. In most cases, either E. coli or S. cerevisiae is used as host for the expression of the enzyme mutants, with a maximum library size of 10^9 transformants for E. coli that can be obtained with reasonable effort [55, 83], whereas S. cerevisiae is slightly less efficient [55].

2.5. Screening or selecting for improved biocatalysts

The identification of an improved biocatalyst from a mutant library is arguably the most difficult and clearly the most laborious part of directed evolution [55, 84, 85]. The decision on the strategy for evaluation of the library is largely based on the properties of the substrate and/or the product of the biotransformation (see Figure 2.3). Products of enzymatic reactions that can complement an auxotrophy of a host strain might be suitable for in vivo selection, whereas for other biotransformation reactions (e.g. those catalyzed by lipases) it might be possible to use a fluorogenic substrate whose conversion can be easily detected via fluorescence-activated cell sorting (FACS) or in microtiter plates using a plate reader (see [86] and references cited therein). An overview of the different assays is given in Table 2.2, listing advantages and disadvantages of the different strategies.

[Figure 2.4: funnel diagram. Assays from the wide to the narrow end of the funnel: in vitro selection; in vivo selection; FACS-based screening; microfluidics (in vivo and in vitro); agar plate (in vivo); microtiter plate (in vivo and in vitro); HPLC, GC (in vitro). Library types indicated alongside: random mutagenesis and “smart” libraries.]

Figure 2.4| The screening funnel, with the library size that can be processed by the respective assay. Smart libraries are especially useful for assays with limited throughput, but can be used as well for (ultra)high-throughput assays. Abbreviations: FACS: fluorescence activated cell sorting; HPLC: high performance liquid chromatography; GC: gas chromatography.

Table 2.2| Comparison of different screening and selection technologies

| Strategy | Library size | Advantage | Disadvantage | Significance for industrial biocatalysts | Examples and references |
|---|---|---|---|---|---|
| Genetic/chemical complementation | ~10^9 | Can process very large libraries; simple to operate | Restricted to certain reactions; higher risk of false positives; low dynamic range | medium | Δ9-16:0-acyl carrier protein (ACP) desaturase [87], chorismate mutase [88], lipase A [89] |
| Agar plate screen | ~10^5 | Simple to operate | Low dynamic range; difficult to quantify improvement; limited throughput | medium - high | Monoamine oxidase [90], glycosynthase [91], cephalosporinase [92], halohydrin dehalogenase [52] |
| Cells as microreactors | ~10^8 | Can process large libraries; little reagent needed due to femtoliter-scale droplets; relatively easy to operate (compared to IVC or cell-in-droplet) | Only fluorescence detection possible to date; substrate has to enter the cell, but product must stay inside the cell | medium | Glycosyltransferase [93], esterase [94] |
| Cell-in-droplet | ~10^8 | Can process large libraries; little reagent needed due to femtoliter-scale droplets | Only fluorescence detection possible to date; substrate must enter the cell; product must not leave the droplet; technically challenging | medium | Thiolactonase [95], β-glucosidase [96], glucose oxidase [97] |
| Cell surface display | ~10^8 | Can process large libraries; substrate does not need to enter the cell | Product must attach to cell surface; enzyme needs to be displayed at cell surface; only fluorescence detection possible to date | medium | Horseradish peroxidase [78], esterase [98] |
| Display technologies | ~10^12 | Can process very large libraries | Selection based on binding, not on catalytic turnover events; transition-state analog or suicide inhibitor necessary as bait; technically challenging | low | Lipase A [99], amylase [100] |
| In vitro compartmentalization | ~10^10 | Can process very large libraries; no cloning step necessary; free access of substrate to enzyme | Only fluorescence detection possible to date; technically very challenging; product must stay in droplet | medium (high if detection can go beyond fluorescence) | Phosphotriesterase [101], sulfatase [102] |
| Enzyme-coupled reactions in microtiter plate | ~10^4 | Simple to operate; large dynamic range; free access of substrate to enzyme | Low throughput; not all reactions can be coupled | high | Cytochrome P450 [103], terpene synthase [104] |
| Instrument-based assays | ~10^4 | Excellent dynamic range; virtually all possible products can be detected; all possible environmental parameters can be chosen freely | Low throughput | very high | γ-Humulene synthase [105], halohydrin dehalogenase [52], amine transaminase [58], D-tagatose epimerase [106] |

2.6. Selection and screening

In the present text, the distinction between “selection” and “screening” will be made according to the suggestion by Hilvert and coworkers [107], who propose that screening involves the active evaluation of all members of a library of mutants for a desired trait, making this kind of screening a very laborious endeavor (Figure 2.5). The screening effort can already be reduced by several orders of magnitude if facilitated screening is possible. In this case, a beneficial enzyme variant causes a certain “phenotype” of the host organism, and only these variants are then considered for detailed analysis. In contrast, selection directly links the desired enzyme property to host survival such that only “fit” variants survive whereas all others disappear from the diversity pool.

Figure 2.5| Strategies for finding improved variants in vivo. a) Random screening: each clone has to be evaluated individually by a screening method (e.g. microtiter plate based assay). b) Facilitated screening: improved variants can be identified by their phenotype (red color in this example). c) Selection: only cells that carry an improved variant can survive under the given conditions.

2.7. In vivo selection

In vivo selection is based on the ability of an enzyme variant to confer a growth advantage to the host cell and can easily handle very large libraries of up to 10^9 variants [56]. Accordingly, the main advantage of selection is its ability to address library sizes that are not, or only with great difficulty, accessible by other in vivo means. Two different selection principles have been described in the literature, namely the complementation of an auxotrophy and the neutralization of lethal conditions by the enzyme under investigation. The selection of enzyme variants in vivo is therefore limited to enzymes that catalyze a reaction which fulfills one of these two criteria. In the following paragraph some pioneering in vivo selection approaches for the directed evolution of enzymes are presented, with a focus on industrially relevant reactions.

2.7.1. Selection by genetic complementation

Genetic complementation that eliminates an auxotrophy has historically been a method used by geneticists for the elucidation of biosynthetic pathways [108]. To be of use in directed evolution, an essential gene of a host organism has to be knocked out so that an auxotrophy for a certain nutrient is created that can then be complemented by a functional variant from an enzyme library. Hilvert and coworkers used genetic complementation extensively for the study of chorismate mutase, an enzyme that catalyzes the first step in the synthesis of the aromatic amino acids tyrosine and phenylalanine [109]. They constructed an E. coli host that is deficient in both bifunctional E. coli chorismate mutases pheA and tyrA (chorismate mutase/prephenate dehydratase and chorismate mutase/prephenate dehydrogenase, respectively) and supplied monofunctional variants of either enzyme from other organisms on a helper plasmid [107, 109]. In the absence of tyrosine and phenylalanine, cells can grow only when a functional version of chorismate mutase is introduced (Figure 2.6). Using this selection system they were able to study the catalytic mechanism of chorismate mutase [110], to improve the catalytic activity of an engineered monomeric chorismate mutase [111], and to evolve a highly active trimeric chorismate mutase from an engineered hexameric variant with low activity [88]. They established valuable tools for the stringent in vivo control of the evolutionary pressure on the library variants, e.g. by reduction of the intracellular enzyme level by fusing a degradation tag to the C- [88] or N-terminus [112] of the chorismate mutase. Although chorismate mutase is not an important enzyme in terms of industrial application, the tools that were developed are expected to become useful for the directed evolution of biocatalysts with greater applied relevance.

[Figure 2.6: reaction scheme. Chorismate mutase converts (-)-chorismate into prephenate, which is further converted to L-phenylalanine (via PheC) and to L-tyrosine (via TyrA); the monofunctional enzymes are encoded on the helper plasmid pKIMP-UAUC in E. coli KA12.]

Figure 2.6| In vivo selection system for functional chorismate mutase variants. An engineered E. coli strain (KA12) has been created that lacks the bifunctional enzymes chorismate mutase/prephenate dehydrogenase (tyrA) and chorismate mutase/prephenate dehydratase (pheA). Monofunctional variants of either enzyme (tyrA from Erwinia herbicola and pheC from Pseudomonas aeruginosa) are introduced on the helper plasmid pKIMP-UAUC. In the absence of tyrosine and phenylalanine in minimal medium, only E. coli KA12 cells encoding a functional chorismate mutase can grow.

A selection system for such an industrially relevant enzyme was described by Quax and coworkers [89], who engineered the enantioselectivity of lipase A (LipA) from B. subtilis for the production of enantiopure 1,2-O-isopropylidene-sn-glycerol (IPG), a precursor in the synthesis of β-adrenoceptor antagonists. They engineered an E. coli strain for aspartate auxotrophy, transformed it with a lipA library generated by cassette mutagenesis and plated the resulting cells on selective minimal medium containing an aspartate ester of (S)-(+)-1,2-O-isopropylidene-sn-glycerol ((S)-(+)-IPG aspartate) and a phosphonate ester of the undesired enantiomer. The aspartate auxotrophy of the host strain could then only be complemented by those lipA variants that were selective for the (S)-(+)-ester, which was hydrolyzed in the periplasm, whereas less enantioselective variants were covalently inactivated by the competing phosphonate substrate. After three rounds of selection a variant was obtained whose enantioselectivity towards the (S)-(+)-enantiomer had improved from an enantiomeric excess (ee) of -29.6 % to an ee of +73.1 %. Another directed evolution experiment by means of genetic complementation with relevance for industrial biotechnology was presented by Quax and coworkers, who engineered a glutaryl acylase from Pseudomonas SY-77 for the conversion of adipyl-7-aminodesacetoxycephalosporanic acid (adipyl-7-ADCA) to 7-ADCA [113]. This compound is an important intermediate for the synthesis of semisynthetic cephalosporin antibiotics. A glutaryl acylase library generated by epPCR was selected in a leucine-auxotrophic E. coli host with adipyl-leucine as the sole leucine source. Only glutaryl acylases that could deacylate the adipyl-leucine substrate were able to complement the leucine auxotrophy of the host strain and support growth on minimal medium. A glutaryl acylase variant with a modest 50 % increase in kcat for the real substrate adipyl-7-ADCA and a 6-fold decrease in Km could be isolated. An example that uses a pyruvate-auxotrophic E. coli strain for the directed evolution of 2-keto-3-deoxy-6-phosphogluconate (KDPG) aldolases from E. coli or T. maritima has been published recently [114]. This E. coli selection strain has a knockout of the pyruvate kinase genes pykA and pykF, leaving it unable to grow on D-ribose as sole carbon source unless pyruvate is added to complement the auxotrophy. Next, a randomized T. maritima KDPG aldolase library was generated by epPCR and used to transform the auxotrophic E. coli strain, and cells that could complement the pyruvate auxotrophy by cleaving the unnatural substrate 2-keto-4-hydroxyoctonoate (KHO) into an aldehyde moiety and pyruvate were selected. This way, variants with up to 25-fold improved catalytic efficiency (kcat/Km) compared to wild-type could be isolated. One of the most successful examples of directed evolution by selecting biocatalysts in vivo has been reported by Whittle and Shanklin, who improved the specific activity of a Δ9-16:0-acyl carrier protein (ACP) desaturase towards a 16-carbon substrate by 35-fold by combinatorial saturation mutagenesis of six amino acids [87]. They employed an E. coli host that was auxotrophic for unsaturated fatty acids and transformed it with > 10^6 library variants of a castor desaturase, allowing growth only of E. coli cells that carried a desaturase variant that could accommodate 16-carbon substrates and hence complement the auxotrophy. Not only enzyme activity, substrate specificity or enantioselectivity are amenable to in vivo selection, but also thermostability has been reported to be selectable by an in vivo approach.

Schwab and Sterner have described selection of the engineered monomeric anthranilate phosphoribosyl transferase in the thermophilic host T. thermophilus, which is deficient for this enzyme [115]. Enzyme variants from an epPCR library that could complement the growth deficiency of the thermophilic host at 79°C were isolated, resulting in a final variant that showed an increase in the apparent melting temperature TM(app) of 13°C, from 70°C to 83°C.
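
The enantiomeric excess and kinetic figures quoted in the examples above can be made concrete with a little arithmetic. The following is a minimal Python sketch, assuming the usual definition ee = (major − minor)/(major + minor) × 100 and taking catalytic efficiency as kcat/Km. The function names are illustrative and are not taken from the cited studies.

```python
def enantiomer_fraction(ee_percent: float) -> float:
    """Fraction of the enantiomer counted as positive, given a signed ee in %."""
    return (1.0 + ee_percent / 100.0) / 2.0


def efficiency_fold_change(kcat_fold: float, km_fold: float) -> float:
    """Fold change in kcat/Km from the fold changes in kcat and in Km."""
    return kcat_fold / km_fold


# LipA example: ee changes from -29.6 % to +73.1 % for the (S)-(+)-enantiomer.
before = enantiomer_fraction(-29.6)   # about 0.35 of the product is (S)
after = enantiomer_fraction(73.1)     # about 0.87 of the product is (S)

# Glutaryl acylase example: 50 % higher kcat (1.5-fold), 6-fold lower Km (1/6).
fold = efficiency_fold_change(1.5, 1.0 / 6.0)

print(f"(S) fraction before: {before:.3f}, after: {after:.3f}")
print(f"kcat/Km fold change: {fold:.1f}")
```

Under these assumptions the acylase variant would have a roughly 9-fold higher kcat/Km than its parent, although the text reports only the individual changes in kcat and Km.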

2.7.2. Significance

In vivo selection methods have been shown to be useful for improving or investigating a variety of different enzymatic traits. They are a good method for the screening of very large libraries and as such could be employed as a first screening step in a tiered screening approach. However, they are confined to enzymatic reactions that can alleviate an auxotrophy of a host cell or provide resistance against an otherwise lethal compound, restricting their application to a narrow set of reactions. It has also been reported that selection for growth results in more false positives compared to screening [54] which might be due to the complexity of the cell metabolism, the redundancy of certain metabolic cascades or the substrate promiscuity of many E. coli enzymes [116]. In summary, the limitations are severe enough that up to now and to the best of our knowledge, no industrially used biocatalyst has been evolved using an in vivo selection assay.

2.8. In vivo screening

In contrast to selection, the screening of enzyme variants in vivo extends the scope of amenable reactions greatly. In principle all enzymatic reactions that produce a colored or fluorescent product are easily accessible to in vivo screening, as well as reactions that produce a product that can act as a transcriptional regulator and turn on the expression of a fluorescent reporter gene (e.g. GFP, RFP) (Figure 2.7). With the advent of fluorescence-activated cell sorting (FACS) this approach has become increasingly important as it can handle library sizes that are in the range of in vivo selection [117]. For a detailed and comprehensive overview on recent examples of reporter-based assays and screening by FACS the reader is referred to recent reviews [54, 93]. In the present text the main focus will be on methods that have proven valuable for the directed evolution of industrially relevant biocatalysts. Please note that the categorization of the different in vivo screening and selection methods is not absolute. If for example a product triggers the expression of a gene that resolves an auxotrophy, the method is referred to as ‘in vivo selection’, whereas the triggering of a fluorescent protein by the same product would be categorized as ‘reporter-based assay’ and thus qualify as ‘facilitated screening’.

Figure 2.7| Different in vivo screening strategies for the directed evolution of biocatalysts. a) Cells as microreactors: a substrate can enter the cell and is transformed to a fluorescent product by the investigated enzyme. b) Cell surface display: enzymes are displayed on the surface of cells, and the substrate is transformed to a (fluorescent) product that attaches to the cell (covalently or non-covalently). c) Reporter-based assay: a substrate can enter the cell and is transformed to a product that can induce the expression of a reporter gene. d) Cells-in-droplets: cells are encapsulated into small emulsion droplets together with the substrate, which enters the cell and is transformed to a fluorescent product. This product can leave the cell by diffusion but is captured within the droplet. Legend symbols: enzyme under investigation, library, substrate, (fluorescent) product, oil, fluorescent protein.

2.8.1. Agar plate-based screening

Facilitated screening on agar plates is one of the most straightforward assays for screening a variety of different biocatalysts that can produce a colored or fluorescent product or release a (by-)product that changes the pH and can be detected via a pH indicator in the agar. Colonies from agar plates can also be transferred to a membrane filter, where they can be lysed and checked for enzymatic activity using enzyme-coupled reactions that generate a colored product while leaving a living replica of the strain behind [118].

Alexeeva et al. have described the directed evolution of a type II monoamine oxidase from Aspergillus niger (MAO-N) for the oxidation of the L-enantiomer of α-methylbenzylamine (L-AMBA) [90]. Amine oxidases release hydrogen peroxide as a byproduct, which can be readily used as a substrate for a peroxidase together with 3,3’-diaminobenzidine (DAB) to color colonies with active enzyme variants dark pink. They could improve both the enantioselectivity (ca. 100:1 L- over D-AMBA, compared to 17:1 for the wt) and the catalytic activity (47-fold higher kcat) of the enzyme for the substrate L-AMBA.

A different example of an agar-based screening approach has been described by Withers and coworkers, who evolved a glycosynthase towards higher activity [91]. Glycosynthases are derived from glycosidases and are engineered to have considerably reduced hydrolytic activity and correspondingly higher glycosyl transfer activity towards sugar acceptors, making them promising biocatalysts for the synthesis of complex oligosaccharides [119]. They employed an “on-plate” coupled enzymatic assay to evolve an endo-cellulase into a glycosynthase in which a fluorescent umbelliferone is released from the acceptor sugar if an activated donor sugar is covalently attached. Screening an epPCR library of a previously engineered glycosynthase from Agrobacterium sp. in the first round and a saturation mutagenesis library at identified hot-spots in the second round, they were able to increase the catalytic efficiency (kcat/Km) up to 27-fold.

A different in vivo screening (or selection) system has been developed by Cornish and coworkers. It is based on a three-hybrid complementation assay that can be used to assay enzyme-catalyzed bond cleavage or bond formation reactions by linking the catalytic activity to the transcription of an essential gene (in vivo selection) or a reporter gene (e.g. gfp or lacZ) (in vivo screening) [108]. In a proof-of-principle experiment they could show that a wild-type cephalosporinase (cleaving methotrexate-cephem-dexamethasone (Mtx-Cephem-Dex) at the β-lactam bond) could be separated from inactive variants using a lacZ screen (Figure 2.8) [92].

Figure 2.8| Principle of chemical complementation. Cephalosporinase-mediated cleavage of methotrexate-cephem-dexamethasone (Mtx-Cephem-Dex) results in the dissociation of a LexA DNA-binding domain-dihydrofolate reductase (LexA-DHFR) fusion protein and a B42 activation domain-glucocorticoid receptor (B42-GR) fusion protein, leading to the shut-down of lacZ transcription. The β-lactam ring is highlighted in green.

An agar plate-based method that contributed to a real industrial biocatalyst was developed by Codexis for the first-tier screen of a halohydrin dehalogenase (HHDH) [52]. An HHDH library was used to transform E. coli, and colonies were transferred from an LB-agar plate into a 384-well master plate containing liquid medium. Cells were grown and then replicated onto a nylon membrane, on which they were grown again on LB-agar and HHDH production was induced. These colonies were then transferred onto a low-melting agar plate containing the substrate ethyl (S)-4-chloro-3-hydroxybutyrate (ECHB) and the product ethyl (R)-4-cyano-3-hydroxybutyrate (HN), together with the pH indicator bromocresol purple. Conversion of ECHB to the epoxide by HHDH resulted in the release of HCl, which can be detected by a color change of the pH indicator in the soft agar.

2.8.2. Cells as microreactors and cells in droplets

If a fluorogenic substrate is available that can enter the cell, if it is transformed to the fluorescent product by the enzyme under investigation and if this product is retained in the cell, cells can be utilized as ultimate microreactors with femtoliter volumes. The fluorescently labeled cells can be analyzed by FACS and the most fluorescent cells (corresponding to the most active enzyme variants) can be isolated and analyzed further.

A glycosyltransferase was engineered by Withers and coworkers using an ultrahigh-throughput FACS-based screening method [93]. The methodology relies on fluorescently labeled acceptor sugars that are imported into the E. coli cell via a sugar-transport protein (permease). The cells express variants of a β-1,3-galactosyltransferase (CgtB) from Campylobacter jejuni that can modify the acceptor sugar, which leads to a fluorescent product (a fluorescently labeled oligosaccharide) that is entrapped in the cell, as it is too polar to pass the cytoplasmic membrane and is no longer recognized by the sugar permease. Thus, unreacted acceptor molecules can be washed from the cells after incubation, leaving only those cells fluorescent that host an active CgtB variant. By screening 2 × 10⁷ variants of a randomly mutagenized library they were able to identify a mutant showing broader substrate tolerance as well as 300-fold higher specific activity, emphasizing the superior performance of FACS-based screening compared to microtiter plate-based screening for glycosyltransferases.

Bornscheuer and coworkers reported an approach for the screening of enantioselectivity, a trait that is inherently difficult to screen for in vivo [94]. They designed a “carrot and stick” approach for screening an esterase library. As a “carrot” substrate, (R)-3-phenylbutyric acid covalently linked to glycerol was used, which supports growth once hydrolysis releases glycerol. As a “stick” substrate, (S)-3-phenylbutyric acid coupled to 2,3-dibromopropanol was provided, which releases toxic 2,3-dibromopropanol upon cleavage. To differentiate between the two activities, cells were stained after incubation to distinguish live from dead cells and sorted by FACS, thus selecting only those E. coli cells that harbored esterase variants with an improved enantioselectivity.

If the fluorescent product readily diffuses out of the cell after formation or if it cannot be fixed onto the cell surface, single cells can be encapsulated in a water-in-oil-in-water emulsion which retains a hydrophilic product formed by the enzymatic reaction in the cell (see Figure 2.7d). Tawfik and coworkers have pioneered this approach by evolving a serum paraoxonase (PON1) for thiolactonase activity [95]. They first encapsulated single E. coli cells expressing a serum PON1 library into water-in-oil emulsion droplets with a volume of less than 10 femtoliter together with the substrate γ-thiobutyrolactone (γTBL). The fluorogenic thiol-detecting dye N-(4-(7-diethylamino-4-methylcoumarin-3-yl)phenyl)maleimide (CPM) was added through the oil phase and the primary water-in-oil emulsion was emulsified again to obtain a water-in-oil-in-water emulsion. Upon hydrolysis of γTBL by PON1 variants, the free thiol reacts with the thiol-detecting reagent CPM to form a fluorescent product. By screening > 10⁷ variants using FACS, a PON1 variant with more than 100-fold improvement in thiolactonase activity could be isolated.

In a very recent example, Fischer and coworkers have reported the directed evolution of a glucose oxidase (GOx) towards higher activity in an approach that incorporated both cell surface display and cell-in-droplet technology [97]. Yeast cells expressing GOx variants from a site-directed mutagenesis library (consisting of 10⁵ variants) were encapsulated in a water-in-oil emulsion together with β-glucosidase, horseradish peroxidase (HRP) and tyramide-fluorescein. The substrate is then delivered through the oil phase in the form of β-D-octyl-glucoside, which accumulates at the oil-water interface due to its detergent-like structure. The glucose is then released inside the emulsion droplet by β-glucosidase and is oxidized by the GOx displayed at the yeast surface. In the course of the reaction, hydrogen peroxide is formed, which is used by HRP to catalyze the formation of phenolic radicals of tyramide-fluorescein and of tyrosines of yeast surface proteins, leading to the staining of the yeast cells. After removal of the oil phase the cells were analyzed by FACS, finally resulting in the isolation of a variant with 5.8-fold improved activity at pH 7.4 compared to the starting variant.

2.8.3. Cell surface display

Cell surface display has emerged as another highly versatile and powerful selection method for directed evolution of enzymes. Wittrup and coworkers have developed a yeast surface display screening method to enhance the enantioselectivity of horseradish peroxidase (HRP) [78]. Active variants of an HRP library displayed on the yeast surface can activate a fluorescently labeled L/D-tyrosinol substrate, which can then react spontaneously with tyrosine residues in the vicinity of the catalyzing HRP on the yeast surface (e.g. as part of membrane proteins). Such fluorescently labeled yeast cells can then be sorted by FACS and those with the highest fluorescence, presumably hosting the most active HRP variants, are isolated. Lipovšek et al. screened two HRP libraries of 2 × 10⁶ variants each, one derived from a random epPCR approach and the second containing complete randomization of five positions near the active site [78]. The epPCR library did not yield a variant with improved enantioselectivity for either D- or L-tyrosinol, whereas the saturation mutagenesis library resulted in improved variants with up to 8-fold improved enantioselectivity for L- over D-tyrosinol.

In a similar example, Kolmar and coworkers used E. coli cells to display variants of a Pseudomonas aeruginosa esterase (EstA) on the cell surface and screened for enzyme variants with improved enantioselectivity [98]. Esterases have gained increasing importance in industrial biotechnology for the synthesis of pharmaceuticals, food or cosmetics [120]. The screening of EstA libraries proceeded in two steps. In a first round, the random mutagenesis library of EstA (4 × 10⁷ clones) was screened for full-length and wild-type expression level, using an antibody that binds to EstA and was itself labeled by a fluorescently labeled secondary antibody. In a second round the surviving variants (10⁸ cells) were screened for the hydrolysis of differently labeled tyramide esters of 2-methyldecanoic acid (2-MDA). (R)-2-MDA was labeled with 2,4-dinitrophenol (DNP) and (S)-2-MDA was labeled with biotin. The substrates were hydrolyzed by EstA and the tyramide derivative was attached covalently to the cell surface by a peroxidase-mediated radical reaction. The two different enantiomers were then detected via fluorescently labeled streptavidin (detecting biotin tyramide) or an anti-DNP antibody (detecting DNP tyramide). They isolated several mutants with inverted enantioselectivity compared to wt, thus hydrolyzing preferentially the R enantiomer. Saturation mutagenesis of a “hot-spot” residue and screening resulted in a final mutant that showed even further increased enantioselectivity, up to an apparent enantioselectivity (Eapp) of 16.3 for the S enantiomer.
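
To put the library sizes quoted above into perspective, the theoretical diversity of a saturation mutagenesis library can be compared with the number of clones screened. The following is a minimal Python sketch for the HRP library with five fully randomized positions and 2 × 10⁶ variants. It assumes equally likely variants and independent sampling, so that the expected coverage is 1 − exp(−N/V). The NNK codon scheme is an assumption for illustration, as the text does not state how the positions were randomized.

```python
import math


def library_size(positions: int, variants_per_position: int) -> int:
    """Number of distinct sequences when each position is fully randomized."""
    return variants_per_position ** positions


def expected_coverage(clones_screened: float, size: int) -> float:
    """Expected fraction of distinct variants sampled at least once.

    Assumes every variant is equally likely and clones are drawn
    independently (Poisson approximation): 1 - exp(-N / V).
    """
    return 1.0 - math.exp(-clones_screened / size)


clones = 2e6  # library size reported for the HRP saturation library

# Five fully randomized positions, counted at the amino acid level (20 options)
protein_space = library_size(5, 20)   # 3,200,000 sequences
# The same positions counted at the codon level for NNK codons (32 options);
# the codon scheme is an assumption and is not stated in the text.
codon_space = library_size(5, 32)     # 33,554,432 sequences

print(f"protein-level coverage: {expected_coverage(clones, protein_space):.2f}")
print(f"NNK codon-level coverage: {expected_coverage(clones, codon_space):.2f}")
```

Under these simplifying assumptions, 2 × 10⁶ clones sample only part of the theoretical sequence space of five randomized positions, which is worth keeping in mind when interpreting which variants were recovered.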

2.8.4. Reporter-based in vivo screening assays

In reporter-based assays, the product of an in vivo biotransformation is not observed directly (e.g. through its fluorescence) but rather triggers the expression of a genetically encoded reporter protein that allows discrimination of the different enzyme variants (Figure 2.7c). To date, only a few industrially relevant enzymes have been evolved via reporter-based assays; for a more comprehensive overview that also covers other proteins, the reader is referred to a recent review [54]. An example of this strategy has recently been described by Michener and Smolke, who employed a theophylline-responsive ribozyme incorporated in the 3′-UTR of a gfp gene, coupling the concentration of intracellular theophylline, obtained from caffeine via a caffeine demethylase, to a fluorescent read-out [121]. Upon binding of theophylline, the riboswitch can no longer cleave off the poly-A tail of the mRNA. This prevents the fast degradation of the mRNA and thus leads to increased expression of the upstream gfp gene. In several rounds, they screened a caffeine demethylase library generated by epPCR and DNA shuffling using this reporter system in S. cerevisiae, which resulted in a final variant that had a 33-fold higher enzyme activity (determined in vivo) as well as a 22-fold higher product selectivity (the ratio of the desired product theophylline to the undesired side product paraxanthine) than the starting variant. The authors suggest that this system is highly versatile and would be adaptable to a variety of different small molecules by aptamer selection strategies that are readily available (e.g. aptamer selection by SELEX [122]).

2.8.5. Significance

Screening on agar plates is generally easy to perform with minimal requirements for instrumentation and reagents. However, its major disadvantage is the limited dynamic range, often allowing only a distinction between active and inactive variants without accurate quantification of the catalytic rate. Nevertheless, an agar plate-based method was used by Codexis as a first-tier screen for a halohydrin dehalogenase [52], emphasizing the utility of this approach for directed evolution of industrial biocatalysts.

The high-throughput screening of enzyme variants by FACS, either in cells used as microreactors, in emulsion droplets or in the form of enzymes displayed on the cell surface, requires either a fluorogenic substrate or an indirect connection to a fluorogenic signal, restricting the applicability of these assays to a narrow set of reactions. In some instances it might be possible to use a fluorogenic analogue of the actual substrate, at the risk, however, of improving the enzyme towards this surrogate instead of the ‘real’ substrate and thus creating artifacts [45].

On the other hand, reporter-based assays using a riboswitch to trigger the expression of a (fluorescent) reporter protein might be a promising tool for the directed evolution of proteins due to their modularity and the possibility to generate aptamers against small molecules from scratch. However, riboswitches have not yet found wide application in this area, making it difficult to assess their potential reliably. Although the selection of RNA aptamers against many different targets in vitro has been reported [122], there are still few examples where such a novel functional aptamer could be used in vivo to develop a functional riboswitch [123]. Therefore, it remains to be seen whether this approach will be applicable for the directed evolution of industrial enzymes.

2.9. In vitro selection

Similar to in vivo screening and selection systems, in vitro selection also relies on the physical linkage between the genotype (the gene encoding the variant) and the phenotype (the reaction product). In the former case, this linkage is provided by the cell membrane, whereas in the latter case a phage particle (for phage display) or a ternary complex consisting of the mRNA, the ribosome and the displayed enzyme (for ribosome display) needs to be coupled to the reaction product (Figure 2.9). Display systems by design select for binding events rather than for catalytic turnover. Therefore, in those cases where catalysis was the target, most approaches used transition state analogs or suicide inhibitors as an indirect means to select for catalytic activity. The big advantages of these assays are that they can handle very large libraries, in the order of 10⁷ to 10¹¹ for phage display [124] and even in the order of 10¹² for ribosome display [125], that the enzyme has unhindered access to the substrate, and that the reaction conditions (metal ions, pH) can be chosen relatively freely. However, the linkage between the enzyme and the reaction product is one of the main challenges with display systems. Comprehensive overviews of the topic have been published elsewhere [126, 127]; here the focus is on enzymes that are relevant for industrial application.

Figure 2.9| Principle of phage display (a) and ribosome display (b) for the directed evolution of biocatalysts. The phage or the ternary complex (mRNA-ribosome-enzyme) displaying the enzyme is adsorbed to an immobilized transition-state analog (TSA) or a suicide substrate. Active enzymes are able to bind, whereas inactive enzymes are eluted. Panel labels: ribosome, mRNA, nascent protein, phage, TSA/suicide substrate.

Quax and coworkers have reported the directed evolution of a B. subtilis lipase A for inverted and improved enantioselectivity towards the pharmaceutically relevant 1,2-O-isopropylidene-sn-glycerol (IPG) by selecting phage particles displaying LipA variants against a lipase suicide inhibitor consisting of a phosphonate moiety, enantiopure (S)-IPG or (R)-IPG and a p-nitrophenyl group, coupled to a glass bead [99, 128]. In a two-step selection process, they first removed LipA variants that exhibited enantioselectivity towards the unwanted enantiomer of IPG coupled to the suicide inhibitor. In the second step, the remaining phages that did not bind the unwanted IPG derivative were panned against the second suicide inhibitor coupled to the desired enantiomer of IPG. After four rounds, this resulted in a variant with inverted enantioselectivity for the desired IPG enantiomer, albeit to a modest degree (wt: enantiomeric excess (ee) = -33.3; variant N18I: ee = +35.3).

The same group used phage display to improve β-amylase variants towards starch binding at industrially preferred low pH conditions [100]. Phages that displayed site-saturation libraries targeting the starch-binding domain of β-amylase were panned against cross-linked starch at decreasing pH, and a variant with a more than 2-fold increase in the starch hydrolysis ratio at pH 4.5/7.5 compared to wt was isolated.

Ribosome display has been used in few cases for the directed evolution of enzymes of limited industrial relevance. In a proof-of-principle study, Plückthun and coworkers could show that ribosome display was able to enrich active β-lactamase over an inactive mutant > 100-fold [129].

2.9.1. Significance

Although display technologies can handle a vast library size that other technologies cannot cope with, they suffer from several disadvantages. First, the enzyme of choice needs to be displayed at the outside of the phage particle as a fusion protein, restricting its use mostly to periplasmic proteins, as the fusion protein needs to translocate across the inner membrane of E. coli to the periplasm of the cell, where the phage coat is assembled [130]. Second, display technologies are per se designed to select for binding rather than for catalytic (multi-turnover) events. By selecting enzymes that bind to transition-state analogs or suicide inhibitors, the catalytic intricacies of an efficient enzyme might not be captured by this technology, and the selection thus might not lead to an improved biocatalyst. In conclusion, display technologies might be a suitable strategy to find initial catalytic activity in a very large library of variants or to improve the stability of a certain biocatalyst that can be displayed [131], but they are expected to be less useful for improving the catalytic activity of an industrial enzyme for its real substrate.

2.10. In vitro screening

Reactions whose product is neither fluorescent nor colored, is toxic to the host organism, cannot be taken up by the cell or is not soluble in water need to be screened in an in vitro format. A great variety of in vitro screening assays have been developed in the last decades, ranging from low-throughput assays (e.g. assays measured by HPLC, GC, NMR) to medium-throughput assays (e.g. enzyme-coupled assays in 96-well plates) to high- and ultrahigh-throughput screening methods (e.g. in vitro compartmentalization (IVC) assays). In vitro screening assays offer the highest flexibility and dynamic range of all screening and selection assays, allowing the screening of nearly all conceivable reactions in any buffer or solvent that is necessary. These features make screening in vitro the most suitable method for directed evolution of industrial biocatalysts [44, 46, 84]. In vitro screening reactions mostly rely on microtiter plates for the cultivation of the enzyme variants (mostly in 96-well plates in a host cell such as E. coli or yeast) and also for the reaction and detection in case of enzyme-coupled reactions. On the other hand, IVC offers the possibility to reduce the reactor size from the microliter range of microtiter plates to the picoliter or femtoliter range [132], with a concomitant reduction in reactant consumption and lab space required.
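
The scale of this volume reduction can be illustrated with a short calculation. The following is a minimal Python sketch that assumes a working volume of 200 µL per well of a 96-well plate (a hypothetical, typical value not given in the text) and a droplet volume of 10 fL, the upper bound quoted for emulsion droplets in section 2.8.2.

```python
MICROLITER = 1e-6   # liters
FEMTOLITER = 1e-15  # liters

# Assumed working volume of one well of a 96-well plate (not from the text)
well_volume = 200 * MICROLITER
# Droplet size quoted as an upper bound for the cells-in-droplets example
droplet_volume = 10 * FEMTOLITER

droplets_per_well = well_volume / droplet_volume
print(f"One well corresponds to about {droplets_per_well:.0e} droplets")
```

Under these assumptions the volume of a single well would be enough for more droplets than the largest libraries discussed in this chapter contain variants, which illustrates the potential savings in reagents.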

Figure 2.10| Different in vitro screening methods. a) Encapsulation of single cells in an emulsion droplet together with a fluorescent substrate and a lysis solution that disrupts the cell wall and releases the enzyme into the droplet. b) In vitro compartmentalization, where a linear DNA fragment is encapsulated with a transcription/translation mix and a fluorescent substrate. c) In enzyme-coupled reactions, the product of the first reaction serves as substrate for a second reaction that is coupled to the formation of a colored or fluorescent product (often NAD(P)H is used as it is a common co-factor for many different dehydrogenases). For a legend of the different symbols please refer to Figure 2.7.

2.10.1. Screening with in vitro compartmentalization (IVC) and cell lysate in emulsion droplets

In vitro compartmentalization is similar to the cell-in-droplet approach described above (Figure 2.7d) except that the transcription and translation for the production of the protein is done completely in vitro. A linear DNA fragment encoding the biocatalyst library is emulsified together with a transcription/translation mixture, resulting in water-in-oil emulsion droplets with femtoliter volumes (see Figure 2.10b). The genes are transcribed and translated into multiple copies of the enzyme, which can then convert the added substrate to a product. In order to be detectable by FACS analysis, the product needs to exhibit fluorescence. Although the compartmentalization approach (completely in vitro or emulsifying whole cells (Figure 2.10a)) is a promising technology, there are up to now only few applications for the directed evolution of industrially relevant enzymes. A variety of reviews and methodological articles have been published on this topic, to which we refer for more detailed information and specific examples [132-135]. A recent example of cells lysed in compartmentalized emulsion droplets has been presented by Hollfelder and coworkers [102]. They emulsified single E. coli cells expressing variants of a promiscuous sulfatase library in a microfluidic device, together with fluorogenic substrates (bis(methylphosphonyl)fluorescein or bis(sulfate)fluorescein) and a lysis solution to disrupt the cells inside the emulsion droplets. Using a microfluidic setup to analyze and sort 3 × 10⁷ droplets per round, they were able to enrich variants with improved activity over three rounds and finally identified an arylsulfatase variant exhibiting 6-fold higher kcat/Km compared to the wt for the substrate bis(methylphosphonyl)fluorescein.

Such microfluidic-based assays, as well as the more ‘traditional’ IVC approach utilizing FACS analysis of the single droplets, require a fluorescent product, which is the main limitation of these technologies. Additionally, these are rather complex procedures, requiring a high level of skill and experience from the experimenter compared to simpler assays such as microtiter plate-based assays.

2.10.2. Screening in vitro with enzyme-coupled reactions

Many industrially relevant reactions do not generate a product with any easily detectable change in the UV/VIS or fluorescence spectrum because the product lacks a chromophore. However, many primary reaction products can serve as substrates for a secondary, NAD(P)-dependent enzyme, thus giving an easily detectable signal at 340 nm due to the difference in absorbance between NAD(P)+ and NAD(P)H [136]. Other secondary enzymes are available which, for example, couple the generation of hydrogen peroxide to the formation of a colored signal molecule.

An interesting example of such a coupled enzyme assay has been reported by Lauchli et al., who screened a terpene synthase for enhanced thermostability as well as optimized growth and reaction conditions [104]. They synthesized a modified screening substrate that contained a vinyl methyl ether functionality, releasing methanol upon cyclization. Methanol can then be converted by an alcohol dehydrogenase in the presence of molecular oxygen to formaldehyde, which reacts spontaneously with Purpald to form a purple product. Screening an epPCR library with this assay allowed the isolation of a terpene synthase with an increase in thermostability of 12°C.

A variety of directed evolution experiments have been undertaken with cytochrome P450 monooxygenase from Bacillus megaterium (P450 BM3) [137]. P450 BM3 consists of a natural fusion between the monooxygenase and the electron-delivering NADPH-dependent diflavin reductase. Therefore, the monooxygenase activity can be indirectly determined by following the decay of the NADPH level. Using this principle, Kille et al., for example, demonstrated the directed evolution of P450 BM3 variants that show excellent regio- and diastereoselectivity on testosterone as substrate [103].

The throughput of microtiter plate-based assays can be increased either by employing laboratory automation systems or by increasing the number of wells per microtiter plate from 96 to 384 or even 1536. An example of an even higher number of wells per plate was reported by Lafferty and Dycaico from Diversa Corp. (now Verenium), who developed a microtiter plate containing approximately 10⁵ to 10⁶ wells of 190 nL volume each [138].
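
The read-out of NAD(P)H-coupled assays at 340 nm can be converted into a reaction rate with the Beer-Lambert law. The following is a minimal Python sketch that uses the commonly cited molar absorption coefficient of NAD(P)H at 340 nm (6220 M⁻¹ cm⁻¹). The absorbance slope, assay volume and enzyme amount are hypothetical values chosen for illustration, and the path length in a microtiter plate well generally differs from 1 cm and would have to be supplied.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, absorption coefficient of NAD(P)H at 340 nm


def rate_from_absorbance(delta_a_per_min: float, path_cm: float = 1.0) -> float:
    """Convert an absorbance slope at 340 nm into a rate in µM per minute."""
    molar_per_min = delta_a_per_min / (EPSILON_NADH_340 * path_cm)
    return molar_per_min * 1e6


def specific_activity(rate_um_per_min: float, assay_volume_ml: float,
                      enzyme_mg: float) -> float:
    """Specific activity in µmol per minute per mg of enzyme (U/mg)."""
    umol_per_min = rate_um_per_min * assay_volume_ml / 1000.0  # µM * L = µmol
    return umol_per_min / enzyme_mg


# Hypothetical reading: absorbance changes by 0.0622 per minute in a 1 cm cuvette
rate = rate_from_absorbance(0.0622)  # 10 µM per minute
activity = specific_activity(rate, assay_volume_ml=1.0, enzyme_mg=0.001)
print(f"{rate:.1f} µM/min, {activity:.1f} U/mg")
```

The same conversion applies whether NAD(P)H is formed or consumed; only the sign of the absorbance slope changes.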

2.10.3. Screening in vitro using instrument-based assays

For products that can neither be detected by colorimetric or fluorescent means nor serve as a compound that complements an auxotrophy in an in vivo selection system, the more laborious and time-consuming detection by means of an instrumental assay is a highly sensitive and versatile option. In fact, screening enzyme libraries by means of HPLC, GC or capillary chromatography has become one of the cornerstones of directed evolution [46], and many examples of directed evolution of industrially relevant biocatalysts have been reported using these methods. The trend towards smaller and “smarter” libraries observed over the last decade is in no small part due to the limitations imposed by these versatile yet often cumbersome methods, which have a throughput of < 1’000 samples per day and therefore do not allow screening of very large libraries. In an impressive example, Keasling and coworkers evolved γ-humulene synthase into seven different variants with very different product profiles [105]. They used saturation mutagenesis at sites that were identified as “plasticity residues” based on a homology model of the enzyme, together with a simple computational algorithm that predicted the effect of different mutations on the product profile, and screened the libraries using GC-FID and GC-MS. Screening fewer than 2’500 variants in total, they obtained variants that showed a complete change in product distribution. Reetz and coworkers developed a capillary electrophoresis assay that can perform > 7’000 determinations of enantiomeric excess per day, thus outperforming standard HPLC- or GC-based assays by one to two orders of magnitude [139]. A research group at the company Codexis described the directed evolution of a halohydrin dehalogenase (HHDH) that is employed in the synthesis of atorvastatin, a cholesterol-lowering drug belonging to the statin family of so-called HMG-CoA reductase inhibitors [52, 140].
By several rounds of DNA shuffling they overcame the low specific activity, strong product inhibition and low stability of the starting enzyme, resulting in a final variant with 4’000-fold higher volumetric productivity under production conditions compared to the wild-type enzyme. They employed an agar-based assay for the first-tier screen (see above) and a GC-based method for the second-tier screen. Codexis, together with the company Merck, reported another example of in vitro screening for the directed evolution of an industrial biocatalyst that can be considered a milestone in the field [58]. A combination of computational modeling, substrate walking and site-directed mutations was used to generate a transaminase with marginal activity towards a substrate that was not accepted by the wild-type enzyme. On this basis, eleven rounds of directed evolution, using the ProSAR algorithm to separate beneficial from detrimental mutations and screening 36’480 variants by HPLC, resulted in a final variant that converted 200 g L⁻¹ prositagliptin ketone to the active ingredient sitagliptin with 92% yield and virtually absolute enantiopurity (ee of 99.95%) (see Figure 2.11). Notably, these results were achieved under production conditions (45°C, 50% DMSO, 6 g L⁻¹ biocatalyst loading, 0.5 – 1 g L⁻¹ pyridoxal 5'-phosphate (PLP)). Immobilization improved the performance of the biocatalyst even further, allowing its long-term operation in neat organic solvent, which greatly simplifies the downstream processing [141].

Figure 2.11| Reaction scheme for the synthesis of sitagliptin from prositagliptin ketone, using an engineered transaminase with 27 mutations compared to the starting variant. The reaction conditions and the enzyme loading are given: 200 g/L prositagliptin ketone, 6 g/L transaminase, 45°C, 50% DMSO; the product (R)-sitagliptin is obtained with 99.95% e.e.

In a recent example, we have shown that the thermostability of a D-tagatose epimerase (DTE) can be increased by systematic optimization of the dimeric interface interactions [106]. D-Tagatose epimerase is an important enzyme for the production of rare monosaccharides, compounds that have a broad field of applications, from low-calorie sweeteners to building blocks for pharmaceuticals [8]. Fewer than 4’000 clones from saturation mutagenesis libraries of dimeric interface residues were screened using an HPLC-based method. A variant with strongly increased thermostability, carrying 8 beneficial interface mutations, was finally identified.

2.10.4. Significance

In vitro screening, especially when instrument-based assays are employed, is without doubt the most versatile and precise method to screen biocatalyst variants. The advantage of most of these approaches is the free choice of reaction conditions (organic solvents, pH, temperature, metal ions) and the free access of the substrate to the enzyme. Using instruments for determining the reaction progress, virtually all possible reactions can be followed quantitatively. However, this comes at the price of rather low throughput, limiting the library size to ~10⁴ variants (Figure 2.4). Nevertheless, it has been shown in many cases that this need not be a crucial disadvantage and that highly optimized industrial biocatalysts can be generated using such assays, as shown most impressively by the example of the transaminase for the synthesis of sitagliptin [58].

2.11. Future directions in the directed evolution of biocatalysts

The field of enzyme engineering by directed evolution has made tremendous progress in the two decades since the seminal works of Frances Arnold [142] and Willem Stemmer [65]. However, several bottlenecks remain that need to be overcome before the full potential of enzymes in industrial biocatalysis can be harnessed. The question is no longer whether a distinct enzymatic trait (e.g. enantioselectivity, thermostability, specific activity) can be improved, as in most cases and for most screening or selection methods a certain improvement of the enzyme properties can be obtained. Given the efforts that have to be invested into assay development, the question is rather how to obtain the largest improvement with the least experimental effort. A decade ago, the emphasis of directed evolution of enzymes was on the generation of huge libraries and on screening or selection assays that can cope with the resulting immense number of variants. The underlying assumption was that bigger libraries cover larger parts of the enzyme sequence space and therefore offer a larger chance of discovering beneficial mutations [107, 143]. More recently, the question of screening efficacy has come to the forefront, especially for industrially relevant biocatalysts that catalyze reactions which normally neither generate a fluorescent signal nor complement an auxotrophy and hence rely on assays with smaller throughput. We therefore argue that three main foci have emerged for the screening of industrial biocatalysts and will gain increased attention in the future.

(I) Smarter libraries. The group of Manfred Reetz pioneered the concept of Iterative Saturation Mutagenesis (ISM) [75], which consists of repeated cycles of saturation mutagenesis at sites that were shown to lead to improvements in previous efforts. This approach is termed semi-rational (in contrast to random mutagenesis by e.g. epPCR or DNA shuffling), as it restricts the diversity that is examined to carefully pre-selected amino acids. The choice of sites is based on additional information about the enzyme, for example the crystal structure, homology models, sequence alignments, or available literature. Semi-rational, smart libraries have since gathered pace and have been applied to a variety of different biocatalysts [46, 144]. Several direct comparisons between screening semi-rational libraries and screening random libraries showed that smart libraries yield bigger improvements with less screening effort [77, 79]. Another example of a restricted search space in directed protein evolution is SCHEMA recombination, which identifies optimal crossover points for the shuffling of enzyme homologs based on their three-dimensional structure [145]. This method is based on a computational algorithm and thus also serves as an example for the second focus.

(II) Computational tools. Several de novo enzymes have been designed, catalyzing a Kemp elimination [48], a retro-aldol reaction [146] or a Diels-Alder reaction [147]. These tools allow the generation of enzyme catalysts for reactions not catalyzed by natural enzymes; however, they result in catalysts with catalytic rates that are orders of magnitude below those of natural enzymes. On the other hand, it has been shown that their specific activity can be readily improved by directed evolution, reaching catalytic parameters comparable to those of natural counterparts [148]. A highly successful example of a computational tool used in directed evolution of biocatalysts is the ProSAR algorithm, a machine learning algorithm that infers the effects of mutations in a combinatorial biocatalyst library on a property of interest (e.g. catalytic activity, enantioselectivity, thermostability) [149, 150]. For an overview of other computational tools, the reader is referred to a recent review by Damborsky and Brezovsky [151].

(III) In vitro compartmentalization and microfluidics. The advent of smart libraries does not make the screening of random mutagenesis libraries obsolete; rather, these two approaches should be viewed as complementing each other. Screening such random mutagenesis libraries using high-throughput methods is an essential step to discover new mutations at potential mutagenesis ‘hotspots’ that can be targeted in further rounds by focused libraries [144]. The combination of microfluidics and in vitro compartmentalization (IVC) has enabled the formation of highly uniform mono-disperse droplets [152] and offers the possibility to carry out precise assays in a miniaturized format, reducing the amount of reagent to a minimum [102, 134]. With the progress in microfluidics and the applied analytical techniques [153], these technologies are expected to become even more versatile and applicable to reactions that do not necessarily produce a fluorescent product.

2.12. Conclusion

Directed evolution of biocatalysts has become an increasingly efficient tool to tailor enzymes for the specific demands in industrial biotechnology. The development of highly efficient biocatalysts, as demonstrated in the case of sitagliptin manufacturing [58], has shown that target functions for industrial enzymes are multidimensional, i.e. not only focusing on improved specific activity but also on stability under process conditions, enantio- and regioselectivity, expression level, and others [47]. The lessons learned in the past, together with more powerful computational algorithms and novel screening technologies, are expected to unveil the full potential of biocatalysis for the manufacturing of pharmaceuticals and of fine and bulk chemicals.


CHAPTER 3: SYSTEMATIC OPTIMIZATION OF INTERFACE INTERACTIONS INCREASES THE THERMOSTABILITY OF A MULTIMERIC ENZYME

Andreas Bosshart, Sven Panke and Matthias Bechtold This work was published in Angewandte Chemie International Edition 52, p. 9673-9676 (2013)

3.1. Abstract

Insufficient operational stability of biocatalysts is still one of the main limitations for their economic utilization in the chemical industry. Limited stability of multimeric proteins is often associated with subunit dissociation, followed by irreversible denaturation. We applied a novel procedure on the basis of crystal structure analysis, sequence alignment and single-site saturation mutagenesis to systematically identify residues in the intersubunit interface of a dimeric D-tagatose epimerase (DTE) that are suitable for increasing thermostability. DTE is a crucial biocatalyst for the production of rare monosaccharides, compounds that are important precursors for the synthesis of pharmaceuticals or can be used directly as low-calorie sweeteners. Iterative saturation mutagenesis on 9 hot-spots identified in an initial screening yielded a very stable enzyme with up to 18-fold improved catalytic performance in terms of total turnover number. A long-term experiment under production conditions (50°C, 1 M D-fructose) showed nearly no decay in conversion over 4 days, whereas the WT lost 40% of activity during the same time.

3.2. Introduction and Results

With their exceptional selectivity, biocatalysts play an increasing role in diverse chemical fields including food processing, production of fine, specialty and bulk chemicals, and fuel production [44]. A frequently remaining fundamental limitation is operational stability [154], in particular at higher reaction temperatures, which are often preferred due to an increase in reaction rate or reactant solubility, a decrease in medium viscosity or risk of microbial contamination, or a more favorable position of the reaction equilibrium [33]. However, the structural integrity of an enzyme, especially one of mesophilic origin, is often impaired under these high-temperature operational regimes. Inactivation typically starts with a loss of integrity of the quaternary (for multimeric enzymes) or tertiary structure (for monomeric enzymes) and is followed by an irreversible denaturation step [155, 156]. The situation can be improved by immobilization [157], additives [158], or enzyme engineering [159-163], preferably employing semi-rational approaches to avoid the often tedious development of high-throughput assay formats. Such approaches include increasing the similarity to a consensus amino acid sequence derived from sequence alignments [164, 165] or are based on available crystal structures, which can direct the variation of presumably very flexible amino acids identified by their high atomic displacement parameters (B-factors) [159]. In the case of multimeric enzymes, the placement of inter-subunit ionic interactions or disulfide bridges was shown to prevent subunit dissociation and increase biocatalyst stability [166]. However, successful placement of such strong links to prevent multimer dissociation [167] as well as proper disulfide formation in model expression hosts such as E. coli is non-trivial [168]. Consequently, generic strategies to systematically improve the thermostability of multimeric enzymes are still lacking.

We reasoned that a systematic variation of the non-conserved residues of a protein-protein interface should rapidly reveal those positions in the amino acid sequence of an enzyme whose mutation can contribute to strengthening the inter-subunit interface and thus counter disintegration. To test this hypothesis, we selected the homodimeric D-tagatose 3-epimerase of P. cichorii (PcDTE) [12]. PcDTE catalyzes the reversible C3-epimerization of all four ketohexose epimer pairs [8] and is hence of strategic importance for the preparative production of rare hexoses, which on an industrial scale takes place at elevated temperatures to reduce viscosity and microbial contamination. However, wild-type PcDTE degrades rapidly at elevated temperatures (see below). To exclude that disintegration of the tertiary structure of one of the monomers was actually the rate-determining step in thermal inactivation, we varied the 10 amino acids with the highest B-factors extracted from the analysis of 3 PcDTE crystal structures (PDB ID 2QUN, 2QUM, 2QUL) as described by Reetz et al. [75]. Of these, only substitution of K122 (K122V) showed a slight impact on thermostability (conversion of D-fructose to D-psicose; T₅₀²⁰ = 67.2°C, +1.2°C compared to WT), with T₅₀²⁰ being the temperature at which 50% of the activity is lost after 20 min of incubation (see supplementary material for details). Interestingly, K122 is part of the dimer interface, whereas 6 of the other residues are not, suggesting that disintegration of the quaternary structure is the more likely rate-limiting step in thermal inactivation.

As a first attempt towards stabilization of the dimeric interface of PcDTE, we introduced intersubunit disulfide bonds at position F157 or at positions W160 and W262. These sites were selected by the program “MODIP” [169] as the only potentially suitable locations at the interface, albeit already with a low quality ranking. Correspondingly, the subsequent mutagenesis and expression in E. coli did not lead to enzyme variants with increased stability (see supplementary materials, Figure S1). Instead, when we explored structure-guided systematic strengthening of interactions between the two subunits throughout the entire dimer interface, variants with improved thermostability could be readily isolated. For this, we first conducted a thorough analysis of one PcDTE crystal structure (PDB ID 2QUN) using the software PDBePISA [170] to identify the residues involved in interface formation. As a homodimer, PcDTE has a virtually symmetric interaction pattern, suggesting a maximum of 44 sites for engineering (Figure S3 & Table S3). Three of the 44 residues make only negligible contributions to the buried surface area of the interface and were discarded. Ten highly conserved residues in the interface (Figure 3.1), likely to be crucial for function or structural integrity and thus unlikely to yield mutants with improved stability, were identified from a sequence alignment with 28 other DTE sequences (listed in the Uniprot database as DTEs, Figure S2) and discarded as well.

Figure 3.1| Localization of strictly conserved amino acid residues and those affording more thermostable PcDTE-variants. a) PcDTE dimer with chain A in surface representation and chain B (dark grey) shown in cartoon representation; the C-terminal 6xHis-tag is marked. b) Chain A/B with all 10 strictly conserved interface residues shown as sticks (Asn155, Arg156, Glu158, Glu192, Glu193, Arg217, Gly223, Phe248, Trp262, Arg263). c) Chain A/B (view rotated by 90°) with the 9 interface sites that afforded an improved mutant during the initial stability screening highlighted as spheres (S116N/H, K122V, F157Y, D164E, T194N, A215N/Q, K251A/T, G260C, M265L). The coloring of each residue corresponds to its degree of conservation (from variable to conserved) in 10% increments.

Each of the 31 remaining sites was randomized separately in a first round of variation by site-saturation mutagenesis with NNK degeneracy primers, allowing sampling of all possible amino acids with a reduced set of codons (see supplementary materials for details). Nine of the 31 libraries produced at least one improved variant (i.e. affording at least 120% conversion after heat treatment when compared to the heat-treated WT), the best mutants being S116N, K122V, F157Y, D164E, T194N, A215N, K251A, G260C and M265L (Figures 3.1c and 3.2a).

Figure 3.2| a) Thermostability, expressed as T₅₀²⁰ value (in °C), of all variants involved in this study: PcDTE wild-type (red bar), hits obtained in the first variation round (black bars), variants 2-8 obtained by ISM (blue bars), and variant 8C obtained by combination of the 8 mutations from the first round (green bar). Mutation D164E was excluded in combinations as no improved variant could be identified during ISM. b) Residual activity (Ar, in %) curves of PcDTE WT, variants 1 to 8, and variant 8C as a function of temperature (T, in °C), fitted to a second-order sigmoidal function.

The T₅₀²⁰ of these variants ranged from 67°C (D164E, WT+1.4°C) up to 78.1°C (F157Y, WT+12.5°C). The sites of the 9 stabilizing mutations were located mostly around the border of the interface, except for mutation F157Y, which is positioned right in the center of the dimeric interface and produced the biggest increase in thermostability. Computational modeling suggested that the new Y157 residue can form a hydrogen bond with the strictly conserved residue N155 of the other subunit, thus contributing to a stabilized dimer interface (Figure 3.3).

Figure 3.3| PyMol model of the best single improvement in PcDTE thermostability (F157Y), based on PDB 2QUN. Chain A is colored in red, chain B is colored in dark grey. The hydroxyl group of Y157 of chain A enables hydrogen bond formation to the highly conserved residue N155. Distances of 3.0 Å and 3.1 Å are indicated in the figure for the Tyr157/Asn155 pairs.

Another inter-chain hydrogen bond is probably established between the side-chain nitrogen of the new N194 and the carbonyl oxygen of E222. Mutation S116N might allow the formation of intra-chain hydrogen bonds with the carbonyl oxygen of residue F157 and similarly, mutation A215Q is expected to lead to a hydrogen bond-network with E215 and R224. The stabilizing effect of the other mutations was more difficult to rationalize: we speculate that mutations K122V and K251T reduce the flexibility of the side-chains and that mutation G260C remedies packing defects within the protein dimer-interface [171].

In order to accumulate beneficial mutations and also capture non-additive effects [172], we proceeded by iterative saturation mutagenesis (ISM) [75], which goes through iterative cycles of saturation mutagenesis, targeting one previously identified beneficial site after another and using the best variant of each cycle as template for the next round. ISM was started with the most improved mutant, F157Y (re-named to Var1, Figure 3.2), as template. G260, having resulted in the second best improvement in the single-site investigation, was randomized and screened as described above, except that the heat-treatment was performed at 80°C for 20 min. The resulting Var2 had the same mutation as in the initial screening (G260C) and a T₅₀²⁰ of 80.0°C. Next, position A215 was randomized in Var2 but yielded no improved variant, even though randomizing this position had resulted in a significantly more stable variant in the initial screening. Therefore, the probing of this site was re-visited at the last stage of this iterative procedure. Randomization of Var2 at position 251 yielded K251T in Var3 and a T₅₀²⁰ of 82.6°C. In the next iteration stages, positions 194 (Var4, T194N, T₅₀²⁰ = 84.7°C) and 116 (Var5, S116H, T₅₀²⁰ = 85.5°C) were addressed with increasing selection pressure in the form of increased temperature during the heat-treatment step (Figure 3.2). Randomization of site 265 in Var5 resulted in Var6 (M265L), which showed no significant increase in T₅₀²⁰ but a considerable increase in enzyme activity of heat-treated lysate. Randomization of position 122 of Var6 led to K122V (Var7, T₅₀²⁰ = 85.6°C). Randomizing D164 of Var7 did not lead to further improvement. Finally, site 215, previously postponed, was again randomized in Var7 and led to a more stable Var8 (A215Q, T₅₀²⁰ = 87°C), representing a very respectable increase in T₅₀²⁰ of 21.4°C over WT. Var8 contained only 3 mutations that had not already been found in the first round of variation (K251T, S116H, A215Q, Figure 3.2a). This high conservation of mutations prompted us to also change the amino acids at these three sites to those found in the first round, leading to Var8C (as Var8 but with K251A, S116N, and A215N). Surprisingly, this variant exhibited an even higher T₅₀²⁰ (88.5°C vs. 87°C), albeit at a lower specific activity.

Figure 3.4| Enzyme kinetic and stability characteristics of PcDTE variants at different temperatures. a) Half-life time (t1/2, in h; filled symbols) and catalytic activity (A, kcat,obs, in s-1; empty symbols) of PcDTE WT (circle), Var8 (triangle) and Var8C (square) and b) total turnover number TTN for PcDTE WT (circle), Var8 (triangle) and Var8C (square) are plotted against temperature (T, in °C). c) Long-term stability experiment under D-psicose (P, in mM) production conditions (EMR at 50°C, 1 M D-fructose feed) for PcDTE WT (circle) and PcDTE Var8 (triangle), plotted against time (t, in h). The same initial amount of enzyme was applied in both runs (0.18 mg/mL).

In order to demonstrate that the achieved thermal stabilization actually translates into superior performance in a process-like setting, we calculated total turnover numbers (TTN) of WT protein, Var8 and Var8C at different temperatures (40-70°C). TTN equals kcat,obs / kinact,obs if deactivation is controlled by unfolding at elevated temperatures (i.e. unfolding is a first-order process). Under that condition, TTN is defined as the product of catalytic proficiency, expressed as the apparent catalytic constant kcat,obs (from Michaelis-Menten plots), and operational stability, expressed as the inverse of the apparent first-order deactivation constant kinact,obs (obtained from the enzyme half-life time t1/2 in an isothermal continuous enzyme membrane reactor (EMR)) [155]. As expected, the WT PcDTE half-life time decreased rapidly with increasing temperature, while half-life times for Var8 and Var8C remained above 5 h even at 60°C (Figure 3.4a), resulting in a TTN for Var8 that is between more than 2 times (at 50°C) and 18 times (at 70°C) higher than the TTN for WT PcDTE (Figure 3.4b). The performance improvement was confirmed by the nearly constant conversion during a 4-day EMR operation for PcDTE Var8 (50°C, 1 M D-fructose feed, 180 µg/mL enzyme) compared to a 40% decrease observed for the WT (Figure 3.4c). Interestingly, both enzymes exhibited considerably longer half-life times (WT: 24-fold; Var8: 80-fold, Table S5) under production conditions than under the initial analytical conditions (substrate and enzyme concentrations approx. 20-fold lower). We attribute this to the stabilizing effect of higher enzyme and/or higher substrate concentration in the reactor. We conclude that a straightforward semi-rational surface engineering strategy in multimeric enzymes, based only on a crystal structure and a limited number of homologous sequences, can lead to drastically improved thermostability with a limited screening effort (<4000 clones were screened in total for the interface engineering). The improved thermostability translated directly into superior operational stability under production-like conditions. 
In fact, variants Var8 and Var8C obtained here have acquired characteristics typically associated with enzymes from a thermophilic organism, i.e. high thermostability and high activity at elevated temperatures. We argue that the approach adopted here is a potentially generic method for increasing the stability of multimeric biocatalysts.
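The TTN relation used above (TTN = kcat,obs / kinact,obs, with first-order deactivation so that kinact,obs = ln 2 / t1/2) can be illustrated with a short calculation. This is a minimal sketch with hypothetical function names and made-up input values; it is not the evaluation script used for Figure 3.4.

```python
import math


def inactivation_constant(half_life_h):
    """Apparent first-order deactivation constant k_inact,obs in 1/s,
    derived from the enzyme half-life in hours: k = ln(2) / t_half."""
    if half_life_h <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / (half_life_h * 3600.0)


def total_turnover_number(kcat_obs_per_s, half_life_h):
    """TTN = k_cat,obs / k_inact,obs, valid only under the assumption
    that deactivation follows first-order kinetics."""
    return kcat_obs_per_s / inactivation_constant(half_life_h)


# Illustrative values only (not measured data from this study):
# k_cat,obs = 50 1/s and a half-life of 10 h.
print(f"{total_turnover_number(50.0, 10.0):.3g}")
```

Because TTN scales linearly with both kcat,obs and t1/2 in this model, a variant whose half-life doubles at unchanged activity also doubles its TTN.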

3.3. Materials and Methods

If not stated otherwise all chemicals were purchased from Sigma Aldrich (Buchs, Switzerland). D-psicose was purchased from Carbosynth (Berkshire, UK), restriction enzymes were obtained from New England Biolabs (Ipswich, MA, USA) and oligonucleotides from Microsynth (Balgach, Switzerland).

3.3.1. Cloning of PcDTE variants

Molecular cloning was performed according to standard procedures [173]. PCRs were performed using Phusion High-Fidelity Polymerase (NEB). Plasmid pKTS-C6H (all plasmids are summarized in Table S1) was generated by PCR with template pKTS and primers pKTS_tagHindIII_for (all primers are summarized in Table S2) and pKTS_tagHindIII_rev. The PCR product was digested with restriction enzymes XhoI and HindIII. Oligonucleotides 6xHis-tag_for and 6xHis-tag_rev were annealed and ligated to the digested PCR product to result in plasmid pKTS-C6H. The gene of Pseudomonas cichorii D-tagatose 3-epimerase (PcDTE) was inferred from the amino acid sequence described earlier [12] by reverse translation, codon optimized for expression in E. coli, and then obtained from Geneart (Regensburg, Germany). It was inserted into the expression vector pKTS-C6H via restriction sites NdeI and XhoI, resulting in plasmid pKTS-PcDTE-C6H, the final PcDTE construct with a C-terminal 6xHis-tag. This plasmid served as template for the mutant library generation. Mutant genes were sequenced using primer AB123.

Table S3.1| Plasmids used and constructed in this study

| Plasmid | Relevant genotype | Reference |
|---|---|---|
| pMA-PcDTE | Cloning vector carrying full-length codon optimized PcDTE sequence | Geneart, this study |
| pKTS | Ptet-PT7 fusion promoter, bla, ori pMB1 | Neuenschwander et al. [88] |
| pKTS-C6H | pKTS-derivative with C-terminal 6xHis tag | This study |
| pKTS-PcDTE-C6H | pKTS-C6H derivative for expression of PcDTE with C-terminal 6xHis tag | This study |

Table S3.2| Oligonucleotides used for cloning and sequencing, restriction sites are underlined

| Primer name | Sequence (5’-3’) |
|---|---|
| pKTS_tagHindIII_for | 5’-ACTGAGCTAAGCTTTCTAGTCAGCTGATCCGGCTGC-3’ |
| pKTS_tagHindIII_rev | 5’-ATAGTTTTCATCGTTCGCCG-3’ |
| 6xHis-tag_for | 5’-TCGAGCACCACCACCACCACCACTA-3’ |
| 6xHis-tag_rev | 5’-AGCTTAGTGGTGGTGGTGGTGGTGC-3’ |
| AB123 | 5’-ACCACTCCCTATCAGTGATA-3’ |

3.3.2. PcDTE library generation

Saturation mutagenesis libraries were generated by site-saturation mutagenesis according to the QuikChange protocol (Stratagene) using the primers given in Table S3. NNK-degenerate codons (32 codons, all amino acids encoded) [75] were used to randomize each site individually. Each QuikChange reaction contained in a final volume of 50 µL: 5x Phusion HF buffer (10 µL), dNTPs (10 mM each, 1 µL), DMSO (1.5 µL), forward and reverse primer (10 µM, 5 µL each), template plasmid (50-80 ng/µL, 1 µL), and Phusion polymerase (0.5 U). A typical thermal cycler program consisted of an initial denaturation step for 1 min at 98°C, 18 cycles of denaturation at 98°C for 10 s, annealing at 50°C for 20 s and extension at 72°C for 2.5 min, and a final extension step at 72°C for 5 min. The product was purified by spin-column purification, digested with 10 U of DpnI for at least 2 h at 37°C to remove the template, and then used to transform chemo-competent E. coli Top10 cells (Invitrogen).
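The statement that NNK degeneracy yields 32 codons encoding all amino acids can be checked by enumeration. The sketch below builds the standard genetic code table and filters for codons whose third base is G or T (IUPAC code K); the variable and function names are chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return all codons matching NNK: N = any base, K = G or T."""
    return [codon for codon in CODON_TABLE if codon[2] in "GT"]


codons = nnk_codons()
encoded = {CODON_TABLE[c] for c in codons}
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

print(len(codons))           # number of NNK codons
print(len(encoded - {"*"}))  # number of distinct amino acids encoded
print(stop_codons)           # stop codons remaining in the NNK set
```

The enumeration gives 32 codons covering all 20 amino acids, with TAG as the only remaining stop codon, so a small fraction of clones in each library is expected to carry a truncated protein.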

3.3.3. Expression of saturation-mutagenesis libraries of PcDTE

From each single-site saturation mutagenesis library 93 colonies (3-fold oversampling, 95% coverage) were picked, inoculated into 96-well plates containing 1 mL LB-medium per well supplemented with 100 µg/mL ampicillin and incubated overnight at 37°C with shaking. Three wells per plate were inoculated with WT PcDTE as control. An aliquot of 200 µL of each preculture was transferred into a glycerol plate (40 µL glycerol per well, 100%) and stored at -80°C. A second aliquot of the preculture (20 µL) was used to inoculate the 96-well expression plate containing 1 mL ZYM-505 medium [174] supplemented with 100 µg/mL ampicillin. Cells were grown for 4 h at 37°C with shaking before anhydrotetracycline (final concentration 200 ng/mL) was added, the temperature was reduced to 28°C and cultures were grown for another 14 h. Cells were harvested by centrifugation (4000 g, 10 min) and immediately frozen at -20°C.
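The quoted figures (93 colonies, 3-fold oversampling, 95% coverage) are consistent with a simple sampling estimate. The sketch below assumes that all 32 NNK codons occur with equal frequency in the library and that colonies are picked independently; under these assumptions the probability that a given codon is found at least once among N clones is 1 - (1 - 1/32)^N. Real libraries show codon bias, so this is an idealized estimate with a hypothetical function name.

```python
def codon_coverage(n_clones, n_codons=32):
    """Probability that one particular codon appears at least once among
    n_clones picks, assuming equally frequent codons and independent
    sampling. This also equals the expected fraction of codons observed."""
    if n_clones < 0 or n_codons <= 0:
        raise ValueError("n_clones must be >= 0 and n_codons > 0")
    return 1.0 - (1.0 - 1.0 / n_codons) ** n_clones


n_picked = 93
print(f"oversampling: {n_picked / 32:.1f}-fold")
print(f"coverage: {codon_coverage(n_picked):.1%}")
```

With 93 clones the estimate is just under 95%, which matches the coverage stated for one 96-well plate with three wells reserved for the WT control.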

3.3.4. Thermostability screen

Cell pellets in 96-well plates were resuspended in 100 µL lysis buffer (50 mM Tris pH 8.0, 0.2 mg/mL lysozyme) and shaken at room temperature for 20 min. The content was then again frozen at -80°C for 20 min and thawed at room temperature, and DNase solution was added (50 µL; 50 mM Tris pH 8.0, 5 µg/mL DNase, 3 mM MnCl2) to reduce viscosity. An aliquot of 100 µL of this crude lysate was transferred to 96-well PCR plates and heated to 70°C (if not noted otherwise) for 20 min in a Mastercycler gradient PCR thermocycler (Eppendorf, Germany). Cell debris and precipitated proteins were removed by centrifugation (3220 rcf, 10 min) and the supernatant was used to determine residual PcDTE activity using D-fructose as substrate. An aliquot of 40 µL of cleared heat-treated lysate was added to 260 µL substrate solution (15 mM D-fructose, 50 mM Tris pH 8.0) and incubated at 30°C for 60 min with shaking. The reaction was stopped by injecting 20 µL of reaction solution into 1 mL of ddH2O at 95°C. Conversion of D-fructose to D-psicose was quantified by HPLC using an LC ICS-3000 system (Dionex, Olten, Switzerland) equipped with a CarboPac PA1 column (250 mm x 4 mm I.D.) preceded by a CarboPac PA1 guard column (50 mm x 4 mm I.D.) (both Dionex, Olten, Switzerland). Samples were eluted isocratically with 25 mM NaOH at a flow rate of 2.5 mL/min and detected by triple pulsed amperometry using an EC detector with a gold electrode (all Dionex, Sunnyvale, USA). The conversion obtained with variant enzymes was then compared to the conversions achieved with the 3 instances of the parent protein, and improved variants were retained.

3.3.5. Verification of hits

Variants exhibiting more than 120% activity compared to heat-treated WT PcDTE were analyzed in detail. These mutants were regrown in triplicate in 96-well plates and lysed (as described above), diluted with lysis buffer to 300 µL, and 200 µL of each sample was divided between 2 PCR plates. One plate was subjected to a heat step at 70°C for 20 min while the other was kept on ice. The relative residual activity of each hit was then determined by dividing the activity of the heat-treated sample by the activity of the untreated sample. This relative residual activity was compared to that of the equally treated reference protein (WT protein in the first screening round, and the parent variant during ISM). Mutants that reproducibly showed more than 120% of the relative residual activity of the reference protein in this assay were considered hits.
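The hit criterion described above can be written down compactly. The following is a minimal sketch with hypothetical triplicate values (arbitrary units); the function names and numbers are illustrative only and do not come from the screening data.

```python
from statistics import mean

def relative_residual_activity(heated, untreated):
    """Mean activity after the heat step divided by mean activity of the ice control."""
    return mean(heated) / mean(untreated)

def is_hit(variant_rra, reference_rra, threshold=1.20):
    """True if the variant exceeds `threshold` times the relative residual
    activity of the equally treated reference protein."""
    return variant_rra > threshold * reference_rra

# Hypothetical triplicate conversions, for illustration only.
reference = relative_residual_activity([10.0, 11.0, 9.0], [40.0, 41.0, 39.0])
variant = relative_residual_activity([18.0, 17.0, 19.0], [38.0, 40.0, 42.0])
print(is_hit(variant, reference))
```

Normalizing to the untreated sample separates a genuine stability gain from a mere increase in expression level or specific activity.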

3.3.6. Thermal denaturation curves

Strains carrying PcDTE variants with improved thermostability were used to inoculate 5 mL of LB medium supplemented with 100 µg/mL ampicillin and grown overnight. Shake flasks containing 20 mL of ZYM-505 medium supplemented with 100 µg/mL ampicillin were inoculated with 200 µL of overnight culture and grown at 37°C for 4 h with shaking (220 rpm). Anhydrotetracycline was then added (200 ng/mL final concentration), the incubation temperature was reduced to 28°C, and incubation was continued for 14 h. Cells were harvested by centrifugation (3220 rcf, 10 min), frozen at -20°C for at least 20 min, resuspended in 2 mL lysis buffer and incubated for 20 min at room temperature. The suspension was frozen at -80°C for 20 min and thawed, and MnCl2 (final concentration 1 mM) and DNase (2 µL of a 5 mg/mL stock) were added to reduce viscosity. Cell debris was removed by centrifugation (3220 rcf, 10 min) and aliquots of 100 µL supernatant were distributed into 8 thin-walled PCR tubes. One tube was kept on ice; the other 7 tubes were subjected for 20 min to a temperature gradient adjusted to the stability of the variants (from 40-80°C in the first round to 78-90°C for Var8/Var8C) using a PeqStar 2X Gradient Thermocycler (Sarisbury Green, UK). Residual activity was plotted against temperature and data points were fitted to a second-order sigmoidal function using SigmaPlot 12.2 (Systat Software Inc., CA, USA) in order to obtain T50 values (the temperature at which 50% of the initial activity is retained after 20 min).
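To illustrate how a T50 value is read from such a denaturation curve, the sketch below fits a simple two-parameter logistic by linear regression on logit-transformed residual activities. This is a deliberately simplified stand-in and not the second-order sigmoidal fit performed in SigmaPlot; the model form, the function name and the synthetic data are assumptions made for illustration.

```python
import math

def t50_from_logit_fit(temps, residual):
    """Estimate T50 assuming A(T) = 1 / (1 + exp((T - T50) / b)).

    logit(A) = ln(A / (1 - A)) is linear in T under this model, so an ordinary
    least-squares line is fitted and T50 is the temperature where it crosses zero.
    Points with A <= 0 or A >= 1 cannot be transformed and are skipped.
    """
    pts = [(t, math.log(a / (1.0 - a))) for t, a in zip(temps, residual) if 0.0 < a < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < activity < 1")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -intercept / slope

# Synthetic 7-point gradient generated from the model itself (T50 = 65, b = 2).
temps = [55.0, 60.0, 63.0, 65.0, 67.0, 70.0, 75.0]
residual = [1.0 / (1.0 + math.exp((t - 65.0) / 2.0)) for t in temps]
print(round(t50_from_logit_fit(temps, residual), 2))
```

With real data, residual activities should first be normalized to the sample kept on ice, and a non-linear fit is preferable because the logit transform amplifies noise near 0% and 100%.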

3.3.7. B-factor screening

The analysis for potentially variable amino acid residues was performed on 3 PcDTE crystal structures (Protein Data Bank ID codes 2QUL, 2QUM, and 2QUN) and averaged B-factors for each residue were calculated [75]. The ten residues with the highest B-factors were K80, P118, L119, D120, M121, K122, D123, R125, R137 and N267 (residues which are part of the dimer interface are underlined). Met1 was excluded from further consideration despite its high B-factor as it cannot be easily modified without interfering with gene expression. Libraries to randomize the amino acid residues were generated in E. coli for each of the positions by site-saturation mutagenesis with NNK degeneracy primers (see above). Per site, 93 strains with variants, corresponding to approximately 95% library coverage [76], and 3 strains containing wild-type PcDTE were screened by exposure to 70°C for 20 min and subsequent activity determination at 30°C with a medium-throughput HPLC-based assay (see above). Only substitution of K122 (K122V) showed a slight impact on thermostability (T50 = 67.2°C, +1.2°C compared to WT).
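A per-residue B-factor average of the kind used here can be computed directly from PDB ATOM records. The sketch below is one possible way to do it, assuming standard fixed-column PDB formatting; `atom_line` is a hypothetical helper that only builds toy records for the demonstration, and the B-factor values are invented.

```python
from collections import defaultdict

def mean_b_factor_per_residue(pdb_lines):
    """Average atomic B-factors per residue from PDB ATOM records.

    Fixed columns: residue name 18-20, chain 22, residue number 23-26,
    temperature factor 61-66 (1-based, as in the PDB format description).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())
        sums[key][0] += float(line[60:66])
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def atom_line(serial, name, res, chain, resseq, b):
    """Hypothetical helper: build a minimal ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")

# Toy records with invented B-factors, for illustration only.
lines = [
    atom_line(1, "N", "LYS", "A", 80, 40.0),
    atom_line(2, "CA", "LYS", "A", 80, 60.0),
    atom_line(3, "N", "PRO", "A", 118, 30.0),
]
averages = mean_b_factor_per_residue(lines)
ranked = sorted(averages, key=averages.get, reverse=True)
print(ranked[0], averages[ranked[0]])
```

To average over several structures, the per-residue means of each file would be computed separately and then averaged, so that structures with more modeled atoms do not dominate.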

3.3.8. Construction of PcDTE variants with potential disulfide bridges

PcDTE variants engineered for potential disulfide bonds were generated by site-directed mutagenesis according to the QuikChange protocol (Stratagene) using the primers given in Table S4. The first variant, with one designed disulfide bond at position 157, required only one mutation (F157C) because this position lies directly on the symmetry axis of the dimeric interface. The second variant, with two designed disulfide bonds between position 160 of one subunit and position 262 of the other, required the introduction of two mutations (W160C and W262C). The PCR protocol was the same as described above for the generation of PcDTE libraries. Variants were expressed in BL21(DE3) cells that were grown in 50 mL ZYM-5052 autoinduction medium supplemented with 100 µg/mL ampicillin for 18 h at 28°C and 220 rpm in shake flasks. Thermal denaturation curves were determined as described above for the variants of the interface screening.

3.3.9. Expression and purification of improved PcDTE variants

To facilitate further characterization, PcDTE variants and WT protein were expressed using the T7 expression system. Plasmids encoding PcDTE WT or improved variant proteins were isolated from E. coli Top10 cells and used to transform chemically competent E. coli BL21 (DE3) cells. The expression plasmids carried a hybrid pTet-pT7 promoter [88], so that PcDTE and its variants could be easily expressed in these cells using the ZYM-5052 autoinduction medium [174]. PcDTE variants were produced at 300 mL scale in a 1 L shake flask at 28°C and 220 rpm. For this, 3 mL of overnight culture were transferred into 300 mL ZYM-5052 medium supplemented with 100 µg/mL ampicillin. The culture was grown at 28°C for 16 h with shaking (200 rpm). Cells were harvested by centrifugation (3220 rcf, 10 min) and stored at -20°C until further processing. Cells were resuspended in 5 mL lysis buffer (50 mM Tris pH 8.0, 30 mM imidazole, 100 mM NaCl, 0.2 mg/mL lysozyme), incubated at room temperature for 20 min and frozen at -80°C for another 20 min. The cell suspension was thawed at room temperature, MnCl2 was added to a final concentration of 1 mM, and DNase (5 µL of 5 mg/mL) was added to reduce viscosity. Cell debris was removed by centrifugation at 20,000 g for 20 min before the cleared lysate was applied to 2 mL Ni-Sepharose 6 Fast Flow (GE Healthcare) in a gravity-flow column. The column was washed extensively with lysis buffer before protein was eluted with elution buffer (50 mM Tris pH 8.0, 200 mM imidazole, 100 mM NaCl). Main fractions containing the PcDTE variants were pooled and dialyzed against 20 mM Tris pH 8.0, 1 mM EDTA, then twice against 20 mM Tris pH 8.0. Dialyzed proteins were then supplemented with 1 mM MnCl2 and stored at 4°C. Enzyme purity was assessed by SDS-PAGE and found to be >95%. Enzyme concentration was determined spectrophotometrically (molecular weight: 33.7 kDa, extinction coefficient ε = 46.4 × 10³ M⁻¹ cm⁻¹).
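The spectrophotometric concentration determination follows the Beer-Lambert law. The sketch below uses the molecular weight and extinction coefficient given above; the measurement wavelength (typically 280 nm for protein), the 1 cm path length and the example absorbance are assumptions, as they are not stated in the text.

```python
def protein_conc_mg_per_ml(absorbance, epsilon=46.4e3, mw=33.7e3, path_cm=1.0):
    """Beer-Lambert law: c [mol/L] = A / (epsilon * l).

    Multiplying by the molecular weight [g/mol] gives g/L, which equals mg/mL.
    epsilon in M^-1 cm^-1, mw in g/mol, path_cm in cm.
    """
    molar = absorbance / (epsilon * path_cm)
    return molar * mw

# Hypothetical absorbance reading, for illustration only.
print(round(protein_conc_mg_per_ml(0.464), 3))
```

For samples outside the linear range of the photometer, the reading would be taken on a dilution and multiplied by the dilution factor.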

3.3.10. Enzyme kinetic measurement

Enzyme kinetic constants of purified PcDTE WT, Var8 and Var8C were determined by recording initial conversion rates of D-fructose to D-psicose in 50 mM Tris pH 8.0 at 4 different substrate concentrations (6 mM, 30 mM, 150 mM, 750 mM D-fructose) and at 5 different temperatures (30°C to 70°C). The reaction was stopped by adding 20 µL of the reaction mix to 145 µL of 0.1 M HCl, followed by the addition of 135 µL of 0.1 M NaOH after 5 min. Conversion of D-fructose to D-psicose was determined by HPLC (see above). Kinetic parameters Km and kcat were obtained by fitting initial velocities for each temperature to the Michaelis-Menten kinetic model using SigmaPlot 12.2 (Systat Software Inc., CA, USA).
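For reference, the Michaelis-Menten model fitted at each temperature has the standard form below. The symbols $v_0$ (initial velocity), $[E]_0$ (total enzyme concentration) and $[S]$ (D-fructose concentration) are introduced here for illustration and are not defined in the original text.

$$v_0 = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]}$$

With substrate concentrations spanning 6 mM to 750 mM, the data cover both the near-linear regime ($[S] \ll K_m$, where $v_0 \approx (k_{cat}/K_m)\,[E]_0\,[S]$) and the approach to saturation ($[S] \gg K_m$, where $v_0 \approx k_{cat}\,[E]_0$), provided that $K_m$ lies within this range.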

3.3.11. Determination of specificity and selectivity of PcDTE WT and PcDTE Var8

Specific activity of PcDTE WT and Var8 was determined for D-fructose, D-psicose, D-sorbose, D-tagatose, L-fructose and L-sorbose. Initial conversion rates for each substrate were determined in triplicate in 50 mM Tris pH 8.0 at a substrate concentration of 40 mM, at 30°C, and with final enzyme concentrations between 0.05 and 0.2 mg/mL purified PcDTE WT or Var8. The reaction was stopped by adding 10 µL of the reaction mix to 148 µL of 0.1 M HCl, followed by the addition of 142 µL of 0.1 M NaOH after 5 min. Conversion of the substrates to the respective products D-psicose (from D-fructose), D-fructose (from D-psicose), D-tagatose (from D-sorbose), D-sorbose (from D-tagatose), L-psicose (from L-fructose) and L-tagatose (from L-sorbose) was detected by HPLC (see above). No side-reaction products could be detected by HPLC for any of the above epimerization reactions.

3.3.12. PcDTE sequence alignment and Var8 structural model

Sequences listed in the UniProt database as D-tagatose 3-epimerases were aligned using the multiple sequence alignment tool ClustalW [175]. A 3D structural model of PcDTE Var8 was generated using the SWISS-MODEL online software [176] on the basis of 2QUL, the crystal structure of PcDTE solved at 1.79 Å resolution. ConSurf [177] was used to calculate the degree of conservation for PcDTE shown in Figure 3.1 based on a sequence alignment of 29 DTE sequences (Figure S2).

3.3.13. Half-life time determination in enzyme-membrane reactor

Enzyme half-life times were determined in a jacketed enzyme-membrane reactor (EMR) (Jülich Fine Chemicals, Jülich, Germany) equipped with an AMICON regenerated cellulose membrane (MWCO 10 kDa) obtained from Millipore (Bedford, MA, USA). The temperature in the EMR was controlled by an external thermostat, and the reactor was operated in a styrofoam container to minimize heat exchange with the environment. The EMR was stirred at 700 rpm using a magnetic stirrer and fed with an aqueous solution containing 50 mM D-fructose and 50 mM Tris pH 8.0 at a flow rate of 0.25 mL/min delivered by an HPLC pump, resulting in a residence time of 47.2 min. Enzyme amounts of 100 µg for WT protein, 105 µg (70 µg at 70°C) for Var8 or 92 µg (61 µg at 70°C) for Var8C were injected. Fractions were collected periodically and conversion of D-fructose to D-psicose was determined by HPLC. The apparent rate of deactivation was calculated from conversion profiles obtained from isothermal EMR operation at 40, 50, 60 and 70°C. The operation period was adjusted for each temperature in order to observe a significant decrease in conversion. From the obtained profiles, kinact,obs was estimated by non-linear regression based on the Levenberg-Marquardt algorithm using equation 1:

$$\frac{\mathrm{d}c_e}{\mathrm{d}t} = -k_{inact,obs}\, c_e \qquad (1)$$

The inactivation rate constant kinact,obs could then be used together with the catalytic rate constant kcat,obs to calculate the TTN [155] (Eq. 2):

$$\mathrm{TTN} = \frac{k_{cat,obs}}{k_{inact,obs}} \qquad (2)$$
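Integrating the first-order deactivation law of Eq. 1 gives an exponential decay of active enzyme, from which the half-life follows as ln(2)/kinact,obs. The sketch below computes the half-life and the TTN of Eq. 2 for hypothetical rate constants; the values are invented, and the explicit unit conversion reflects the assumption that kcat is reported per second and kinact per hour.

```python
import math

def half_life(k_inact):
    """Half-life of first-order decay, t1/2 = ln(2) / k_inact.
    The result has the reciprocal unit of k_inact."""
    return math.log(2.0) / k_inact

def total_turnover_number(k_cat_per_s, k_inact_per_h):
    """TTN = kcat / kinact, after converting kcat from s^-1 to h^-1
    so that both constants share the same time unit."""
    return k_cat_per_s * 3600.0 / k_inact_per_h

# Hypothetical constants, for illustration only (not values from this work).
k_inact = 0.05  # h^-1
k_cat = 10.0    # s^-1
print(round(half_life(k_inact), 2), total_turnover_number(k_cat, k_inact))
```

The TTN is dimensionless only if both rate constants are expressed in the same time unit, which is the most common source of error in this calculation.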

3.3.14. Long-term EMR stability experiment

The same EMR configuration was used as for the determination of half-life times (see above). The temperature was set to 50°C. D-fructose of a concentration of 1 M in 10 mM sodium phosphate buffer pH 7.0 was pumped with a flow-rate of 0.2 mL/min by an HPLC pump resulting in an EMR residence time of 59 min. For both PcDTE WT and PcDTE Var8, 2.16 mg of purified enzyme were supplied and fractions were collected periodically to determine the conversion by HPLC.
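The two residence times quoted for the EMR are consistent with a single working volume, which can be checked from the relation residence time = volume / flow rate. In the sketch below, the working volume of about 11.8 mL is derived from the stated figures rather than given in the text, and the function names are illustrative.

```python
def reactor_volume_ml(residence_min, flow_ml_per_min):
    """Working volume implied by a residence time and a volumetric flow rate."""
    return residence_min * flow_ml_per_min

def residence_time_min(volume_ml, flow_ml_per_min):
    """Mean residence time of a continuously fed, well-mixed reactor."""
    return volume_ml / flow_ml_per_min

# Half-life experiments: 47.2 min at 0.25 mL/min.
volume = reactor_volume_ml(47.2, 0.25)
# Long-term experiment: same reactor at 0.2 mL/min.
tau_long_term = residence_time_min(volume, 0.2)
print(round(volume, 1), round(tau_long_term, 1))
```

Under this assumption, the 2.16 mg of enzyme supplied in the long-term experiment corresponds to roughly 0.18 mg/mL in the reactor.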

3.4. Acknowledgment

A.B. is indebted to the Swiss National Science Foundation for funding (grant 200021-121918).

3.5. Supporting Material

[Figure S3.3: plot of residual activity [%] (0-100%) against temperature [°C, 20 min incubation] (30-80°C) for PcDTE wt, PcDTE F157C and PcDTE W160C/W262C.]

Figure S3.3| Thermostability of PcDTE after engineering of potential disulfide-bridges. Residual activity curves of PcDTE WT, PcDTE F157C and PcDTE W160C/W262C after 20 min of incubation at the indicated temperature. For PcDTE W160C/W262C a complete loss of enzymatic activity was observed even in the non-heat-treated sample.

[Figure S3.1: multiple sequence alignment (alignment columns 1-320), not reproducible in text form. Aligned sequences (UniProt accession|label): O50580|PcDTE, Q7CVR7|AtDTE, Q98GF0|RlDTE, Q92W75|RmDTE, C3KRA2|R-DTE, Q7UUK7|RbDTE, A6LEB8|PdDTE, C1KKR1|RsDTE, Q0FST9|PbDTE, D3AJP4|ChDTE, A3HYP0|A-DTE, D0THK9|B-DTE, A6DJL3|LaDTE, A6CEA9|PmDTE, D4MXK4|--DTE, C7XBR5|P-DTE, A6DIY8|LaDTE, F2AYT9|RbDTE, D7W1P8|CgDTE, F5J829|A-DTE, F0YVV0|C-DTE, Q98GE7|RHITE, Q3IW04|RlDTE, D4MXC6|--DTE, B9NVC4|RbDTE, F5SL39|D-DTE, G0LA17|ZgDTE, F7XHE8|SmDTE, A3ZS35|BmDTE. Annotation tracks below each alignment block: "Excluded from screening" and "Stability Hits".]

Figure S3.1| Sequence alignment of 29 D-tagatose epimerase homologs. Blue boxes indicate residues that contribute to the dimer interface, interface residues that were excluded from screening are marked by red boxes, positions that yielded a mutant with improved thermostability are indicated by green boxes.

Table S3.4| List of all residues of PcDTE constituting the dimeric interface (based on the prediction of PDBePISA for PDB 2QUN). From a total of 44 residues, 3 (colored white) were excluded from screening due to negligible buried surface area (BSA) and 10 (colored orange) were excluded due to high conservation and/or contact to highly conserved interface residues, based on a sequence alignment and the crystal structure of PcDTE. The remaining 31 residues (green) were screened as described in the methods part.

Sat. Mut. Site   Comment
Ser116           included in screen
Pro117           included in screen
Pro118           position also detected by B-factor analysis, included in screen
Leu119           position also detected by B-factor analysis, included in screen
Met121           position also detected by B-factor analysis, included in screen
Lys122           position also detected by B-factor analysis, included in screen
Lys124           included in screen
Asn155           conserved; directed toward the interior of the monomer; no direct contact to other subunit; excluded from screen
Arg156           conserved; contacts Cα carbonyl oxygen of Val259; in contact with Trp262; excluded from screen
Phe157           included in screen
Glu158           strictly conserved; points toward active site; no direct contact to other subunit; excluded from screen
Trp160           included in screen
Asn163           included in screen
Asp164           included in screen
Glu167           included in screen
Phe187           no direct contact to any other conserved interface residue of the other subunit; included in screen
Met189           only 0.12 Å² buried surface area; excluded from screen
Asn190           contacts the carbonyl oxygen of Asn190 of the other subunit; contacts Arg224 of the other subunit; included in screen
Ile191           included in screen
Glu192           conserved; contacts Arg263 of the other subunit; excluded from screen
Glu193           strictly conserved; contacts Arg224 via the Cα carbonyl oxygen; no direct contact to any side chain of the other subunit; excluded from screen
Thr194           included in screen
Ser195           included in screen
Phe196           included in screen
Ala215           included in screen
Asn216           included in screen
Arg217           conserved; points toward the active site of its subunit; excluded from screen
Leu218           included in screen
Glu222           included in screen
Gly223           highly conserved; no contact to any other interface residue of the other subunit; excluded from screen
Arg224           included in screen
Pro226           only 0.5 Å² buried surface area; excluded from screen
Glu229           only 0.12 Å² buried surface area; excluded from screen
Phe248           highly conserved; no direct contact to any other conserved interface residue of the other subunit; points toward the active site; excluded from screen
Lys251           included in screen
Gly252           included in screen
Arg257           included in screen
Ala258           included in screen
Val259           included in screen
Gly260           included in screen
Val261           included in screen
Trp262           conserved; contacts Arg156; excluded from screen
Arg263           conserved; contacts Glu192 of other subunit via one salt bridge and 3 hydrogen bonds; excluded from screen
Met265           included in screen

Screened                                                        31
Not screened due to low BSA                                      3
Not screened due to conservation/contact to conserved residue   10
Total                                                           44


[Contact map: interface residues of 2QUN chain A plotted against interface residues of 2QUN chain B, with the number of contacts per residue. Legend: H, hydrogen bonds only; HS, hydrogen bonds and salt bridges; green border, contact between two positions with improved variants; asterisk (e.g. ASN155*), position excluded from screening; highlighted residue (e.g. ASP164), position with improvement in thermostability.]

Figure S3.2| Contact map of PcDTE homodimeric interface residues. Residues from PcDTE chain A that contact residues from chain B are marked by red squares. 4 such contacts are present between residues at positions that exhibited variants with improved thermostability (green border). This graph was prepared based on the PcDTE crystal structure (PDB ID 2QUN) and with the help of the CMA (Contact Map Analysis) online software (http://ligin.weizmann.ac.il/cma) [178].

Table S3.5| Oligonucleotides used for site-directed saturation mutagenesis of pKTS-PcDTE-C6H

Name            Sequence (5'-3')
Lib_S116_f      CATGGCCTCAGNNKCCACCGCTGG
Lib_S116_r      CCAGCGGTGGMNNCTGAGGCCATG
Lib_P117_f      GGCCTCAGTCTNNKCCGCTGGATATG
Lib_P117_r      CATATCCAGCGGMNNAGACTGAGGCC
Lib_K124_f      GATATGAAAGATNNKCGTCCGTATG
Lib_K124_r      CATACGGACGMNNATCTTTCATATC
Lib_F157_f      GTGGTGAATCGTNNKGAACAGTGGCTG
Lib_F157_r      CAGCCACTGTTCMNNACGATTCACCAC
Lib_W160_f      GTTTTGAACAGNNKCTGTGCAATGATG
Lib_W160_r      CATCATTGCACAGMNNCTGTTCAAAAC
Lib_N163_f      CAGTGGCTGTGCNNKGATGCAAAAGAAG
Lib_N163_r      CTTCTTTTGCATCMNNGCACAGCCACTG
Lib_D164_f      GGCTGTGCAATNNKGCAAAAGAAGC
Lib_D164_r      GCTTCTTTTGCMNNATTGCACAGCC
Lib_E167_f      CAATGATGCAAAANNKGCAATCGCATTTG
Lib_E167_r      CAAATGCGATTGCMNNTTTTGCATCATTG
Lib_F187_f      CAGCTGGATACCNNKCACATGAATATTG
Lib_F187_r      CAATATTCATGTGMNNGGTATCCAGCTG
Lib_N190_f      CCTTTCACATGNNKATTGAAGAAAC
Lib_N190_r      GTTTCTTCAATMNNCATGTGAAAGG
Lib_I191_f      CTTTCACATGAATNNKGAAGAAACCAG
Lib_I191_r      CTGGTTTCTTCMNNATTCATGTGAAAG
Lib_T194_f      GAATATTGAAGAANNKAGCTTTCGCG
Lib_T194_r      CGCGAAAGCTMNNTTCTTCAATATTC
Lib_S195_f      GAAGAAACCNNKTTTCGCGATGC
Lib_S195_r      GCATCGCGAAAMNNGGTTTCTTC
Lib_F196_f      GAAGAAACCAGCNNKCGCGATGCAATTC
Lib_F196_r      GAATTGCATCGCGMNNGCTGGTTTCTTC
Lib_A215_f      CATCTGGGTGAANNKAATCGTCTGCC
Lib_A215_r      GGCAGACGATTMNNTTCACCCAGATG
Lib_N216_f      CTGGGTGAAGCANNKCGTCTGCCTCC
Lib_N216_r      GGAGGCAGACGMNNTGCTTCACCCAG
Lib_L218_f      GAAGCAAATCGTNNKCCTCCTGGTG
Lib_L218_r      CACCAGGAGGMNNACGATTTGCTTC
Lib_E222_f      CTGCCTCCTGGTNNKGGTCGTCTGCC
Lib_E222_r      GGCAGACGACCMNNACCAGGAGGCAG
Lib_R224_f      CTGGTGAAGGTNNKCTGCCGTGGGATG
Lib_R224_r      CATCCCACGGCAGMNNACCTTCACCAG
Lib_K251_f      CGTTTATGCGTNNKGGTGGTAGCG
Lib_K251_r      CGCTACCACCMNNACGCATAAACG
Lib_G252_f      GTTTATGCGTAAANNKGGTAGCGTTAG
Lib_G252_r      CTAACGCTACCMNNTTTACGCATAAAC
Lib_R257_f      GTAGCGTTAGCNNKGCAGTTGGTG
Lib_R257_r      CACCAACTGCMNNGCTAACGCTAC
Lib_A258_f      GCGTTAGCCGTNNKGTTGGTGTTTG
Lib_A258_r      CAAACACCAACMNNACGGCTAACGC
Lib_V259_f      GTTAGCCGTGCANNKGGTGTTTGGCG
Lib_V259_r      CGCCAAACACCMNNTGCACGGCTAAC
Lib_G260_f      GCCGTGCAGTTNNKGTTTGGCGTG
Lib_G260_r      CACGCCAAACMNNAACTGCACGGC
Lib_V261_f      GTGCAGTTGGTNNKTGGCGTGATATG
Lib_V261_r      CATATCACGCCAMNNACCAACTGCAC
Lib_M265_f      GTTTGGCGTGATNNKAGCAATGGTGC
Lib_M265_r      GCACCATTGCTMNNATCACGCCAAAC
PcDTE-F157C_f   GGTGAATCGTTGTGAACAGTGGC
PcDTE-F157C_r   GCCACTGTTCACAACGATTCACC
PcDTE-W160C_f   GTTTTGAACAGTGCCTGTGCAATGATG
PcDTE-W160C_r   CATCATTGCACAGGCACTGTTCAAAAC
PcDTE-W262C_f   GTTGGTGTTTGCCGTGATATGAGC
PcDTE-W262C_r   GCTCATATCACGGCAAACACCAAC
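The degenerate positions in the library primers can be made concrete with a short sketch. Assuming the standard IUPAC nucleotide codes (N = A/C/G/T, K = G/T, M = A/C) and the standard genetic code, it enumerates the codons encoded by NNK and checks that a reverse primer is the reverse complement of its forward partner. The helper names are illustrative and not part of the original methods.

```python
from itertools import product

# IUPAC nucleotide codes needed here (assumption: standard IUPAC meaning)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "M": "AC"}
# Complement of each code; K (G/T) pairs with M (C/A), N pairs with N
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N", "K": "M", "M": "K"}

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Reverse complement of a primer that may contain N, K or M."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expand(degenerate_codon):
    """List all concrete codons encoded by a degenerate codon such as NNK."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


nnk_codons = expand("NNK")
encoded = [CODON_TABLE[c] for c in nnk_codons]
amino_acids = sorted(set(encoded) - {"*"})

print(len(nnk_codons), "codons,", len(amino_acids), "amino acids,",
      encoded.count("*"), "stop codon(s)")
# Forward primer Lib_S116_f from the table; the result should equal Lib_S116_r
print(reverse_complement("CATGGCCTCAGNNKCCACCGCTGG"))
```

NNK keeps all 20 amino acids accessible while excluding two of the three stop codons, which is the usual motivation for this degeneracy in saturation mutagenesis.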

Table S5| Observed half-life times t1/2 and kinact,obs values

a) Analytic conditions ([D-fructose]: 50 mM)

                   PcDTE WT                   PcDTE Var8                 PcDTE Var8C
Temperature [°C]   t1/2 (h)  kinact,obs (h⁻¹)   t1/2 (h)  kinact,obs (h⁻¹)   t1/2 (h)  kinact,obs (h⁻¹)
40                 24.1      0.029              23.0      0.030              33.2      0.021
50                 5.2       0.132              15.8      0.044              13.7      0.051
60                 0.6       1.153              5.1       0.135              6.6       0.105
70                 0.1       12.327             1.0       0.722              0.4       1.581

b) Production conditions ([D-fructose]: 1 M)

                   PcDTE WT                   PcDTE Var8
Temperature [°C]   t1/2 (h)  kinact,obs (h⁻¹)   t1/2 (h)  kinact,obs (h⁻¹)
50                 127.0     0.0055             1267.2    0.0005
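The two quantities reported in Table S5 are linked if inactivation is assumed to follow first-order kinetics, in which case t1/2 = ln 2 / kinact,obs. The sketch below recomputes the half-lives of PcDTE WT under analytic conditions from the tabulated rate constants under that assumption; the function name is illustrative.

```python
import math


def half_life(k_inact_per_h):
    """Half-life (h) of a first-order decay with rate constant k (1/h)."""
    return math.log(2) / k_inact_per_h


# kinact,obs values (1/h) for PcDTE WT under analytic conditions, from the table
k_wt = {40: 0.029, 50: 0.132, 60: 1.153, 70: 12.327}

for temperature, k in k_wt.items():
    print(f"{temperature} °C: t1/2 = {half_life(k):.1f} h")
```

The recomputed values agree with the tabulated half-lives to within the rounding of the rate constants.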

Table S3.6| Specific activities of PcDTE WT and Var8 for 6 different ketohexoses (40 mM substrate concentration, 50 mM Tris, pH 8.0 and 30 °C) and relative activities of Var8 compared to WT for each substrate.

Substrate    Rate PcDTE WT        Rate PcDTE Var8      Relative activity of Var8
             [µmol min⁻¹ mg⁻¹]    [µmol min⁻¹ mg⁻¹]    compared to PcDTE WT
D-fructose   8.1 ± 0.2            2.7 ± 0.04           33.2% ± 0.5%
D-psicose    34.2 ± 0.8           8.6 ± 0.2            25.1% ± 0.5%
D-sorbose    12.0 ± 0.2           6.7 ± 0.07           55.9% ± 0.6%
D-tagatose   29.53 ± 0.5          10.0 ± 0.16          33.7% ± 0.5%
L-fructose   1.8 ± 0.02           0.9 ± 0.004          48.7% ± 0.2%
L-sorbose    0.33 ± 0.001         0.09 ± 0.002         28.6% ± 0.5%


CHAPTER 4: DIRECTED DIVERGENT EVOLUTION OF A THERMOSTABLE D-TAGATOSE EPIMERASE TOWARDS IMPROVED ACTIVITY FOR TWO DIFFERENT HEXOSE SUBSTRATES

Andreas Bosshart1, Chee Seng Hee2, Matthias Bechtold1, Tilman Schirmer2 and Sven Panke1

1 Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich Mattenstrasse 26, 4058 Basel, Switzerland 2 Structural Biology and Biophysics, Biozentrum, University of Basel, CH-4056 Basel, Switzerland

4.1. Abstract

Functional promiscuity of enzymes can often be harnessed as a starting point for the directed evolution of novel biocatalysts that catalyze desired reactions with high specific activity and selectivity. Here we describe the divergent morphing of a thermostable, promiscuous D-tagatose epimerase (DTE) into two efficient epimerization catalysts, one for the conversion of D-fructose to D-psicose and one for the conversion of L-sorbose to L-tagatose. Iterative single-site randomization and screening of 48 residues in the first and second shell around the active site cavity revealed 8 mutations that improved specific activity towards D-fructose 8.6-fold and 6 mutations that improved activity towards L-sorbose 13.5-fold. X-ray crystallography of the most improved variants as enzyme-substrate/product complexes revealed that the mechanism of activity improvement differs significantly between the two evolutionary trajectories. Variant IDF8, with improved activity for D-fructose, exhibited a concentration of mutations around the entrance of the active site cavity, creating a charged patch that presumably facilitates the entry of the polar substrate. In contrast, variant ILS6, with improved activity for L-sorbose, revealed differences in the coordination of the substrate/product molecule by hydrogen bonding. By targeting only residues in the vicinity of the active site, a high ratio of hits per screened clone was obtained, reducing the screening workload to a minimum while at the same time yielding final variants that are readily usable as efficient catalysts for the production of the rare hexoses D-psicose and L-tagatose, respectively.

4.2. Introduction

In the last decades, directed evolution of proteins has become a powerful tool to tailor enzymes for their application in industrial biocatalysis [44, 47]. Directed evolution, which relies on iterative cycles of mutagenesis and screening of the resulting libraries for variants that exhibit the desired traits [179], has been shown to improve almost any property that is of importance for an industrially useful biocatalyst, including thermostability [106, 180], enantioselectivity [77] or catalytic rate [58]. There is a growing body of evidence that enzymes from thermophiles are more suitable starting points for directed evolution than enzymes from mesophiles [34, 36, 59, 181]. The major reason for this is that most mutations, and mutations altering protein function in particular, are destabilizing, which increases the chance that a mutation results in a protein that is not folded (correctly) and thus no longer functional [59]. Thermophilic organisms, on the other hand, have to encode enzymes that are functional at high temperatures and that therefore exhibit a larger free energy difference (ΔGu) between the folded (native) and the unfolded state than mesophilic enzymes [181, 182]. This free energy of folding can be considered a 'stability reservoir' that can be consumed by accumulating mutations that encode new protein functions but are destabilizing in terms of free energy of folding. The larger reservoir of thermostable enzymes therefore allows exploring a larger fraction of protein sequence space, which is often referred to as 'evolvability' [183], before the limit of folding stability is reached [36].

The enzyme D-tagatose epimerase (DTE) catalyzes the interconversion of ketohexoses into their respective C3-epimers [184]. It constitutes the central enzyme for biocatalytic access to rare monosaccharides, which have recently attracted great interest as low-calorie sweeteners, chiral building blocks or active pharmaceutical ingredients [8]. By combining DTE with at most two additional isomerases, the whole set of 24 hexoses can be generated in a short cascade reaction from only 4 starting materials that are cheaply available (D-glucose, D-fructose, D-galactose, L-sorbose) [7]. D-Tagatose epimerases from various organisms are known to date [37, 39, 43, 185]; however, none of them exhibits catalytic rates on the respective substrates that would make them attractive for application in an industrial context. On the other hand, no active DTE homolog of thermophilic origin is known that could serve as an optimal starting point for improving catalytic efficiency. In this case the reasons presented above clearly suggest proceeding in two steps. First, a mesostable enzyme that catalyzes the desired reaction should be (thermo-)stabilized, and in a second step this thermostable enzyme variant should be evolved towards the desired (novel) functions. In line with this reasoning we recently described the thermostabilization of a dimeric DTE from the mesophile Pseudomonas cichorii (PcDTE) by systematically optimizing the dimeric interface interactions, generating a variant termed PcDTE Var8 [106]. This enzyme shows promiscuous enzymatic activity for C3 epimerization of all 4 ketohexose pairs and was therefore an interesting starting point for further catalytic rate optimization. We decided to divergently evolve this thermostable Var8 for improved production of two rare ketohexoses, specifically D-psicose from D-fructose and L-tagatose from L-sorbose. Both products are rare hexoses that are of considerable interest as low-calorie sweeteners or chiral building blocks [186]. PcDTE Var8 shows mediocre catalytic activity towards the epimerization of D-fructose to D-psicose (kcat: 12.1 s⁻¹ at 30 °C) and only poor activity towards the epimerization of L-sorbose to L-tagatose (kcat: 0.24 s⁻¹ at 30 °C). Regarding a suitable engineering strategy for these objectives, it has been shown previously that a limited number of amino acid substitutions in the vicinity of the active site are sufficient to change the substrate specificity and catalytic activity of a promiscuous enzyme [105, 187], potentially reducing the effort required to obtain improved biocatalysts substantially. Accordingly, we reasoned that a directed evolution strategy based on iterative saturation mutagenesis (ISM) [75] of residues that surround the active site of PcDTE Var8 would be a promising method to improve catalytic efficiency towards D-fructose or L-sorbose. Besides the obvious benefit of superior biocatalysts for the production of rare sugars, the central position of DTE in sugar interconversions made us investigate the development of catalytic parameters for other hexose substrates as well.

In this work we describe the divergent evolution of thermostable PcDTE Var8 towards two different ketohexose substrates by targeting residues that are in the first and second sphere around the active site. We describe the complete divergent trajectory of the two different variants that have improved specificity for D-fructose (IDF variants) and L-sorbose (ILS variants), respectively, by characterizing all intermediates from both evolutionary trajectories. By solving the crystal structures of the two final variants in the presence of the substrate as well as that of the thermostable parent PcDTE Var8, we were able to rationalize the molecular basis of the changes in catalytic specificity.

4.3. Results

4.3.1. Establishment of a high-throughput screening protocol

The screening effort that comes with large libraries is arguably still the main limiting factor in directed evolution. The strategy to consider only the 45 residues that are in proximity to the active site already reduces the workload significantly but still requires the screening of several thousand clones. Therefore the establishment of a microtiter plate-based screening procedure was necessary, as the HPLC-based assay described earlier [106] seemed infeasible. A galactitol dehydrogenase (RsGD) from Rhodobacter sphaeroides was previously described [188] for the NAD⁺-dependent oxidation of galactitol to L-tagatose. L-Tagatose, the C3 epimerization product of L-sorbose, can hence be detected using RsGD in the reverse direction by following the oxidation of NADH at 340 nm. Indeed, RsGD reliably detected L-tagatose in the presence of L-sorbose (Supplementary Figure S4.2), a prerequisite for the screening assay. A dehydrogenase that specifically reduces D-psicose in the presence of D-fructose was less readily available. Klebsiella pneumoniae strain 3321 was reported to contain a ribitol dehydrogenase that can specifically reduce D-psicose without reducing D-fructose [189]. The gene sequence coding for this protein was not specified, but the genomes of several K. pneumoniae strains have been sequenced to date. Based on sequence homology data, a ribitol 2-dehydrogenase (KpRD) was identified in the genome of K. pneumoniae MGH 48, codon optimized for expression in E. coli, chemically synthesized, and expressed in E. coli (see for amino acid sequence). The purified KpRD indeed showed high specificity for the reduction of D-psicose under concomitant oxidation of NADH and very low activity towards the reduction of D-fructose, making it a suitable enzyme for the determination of D-psicose in our screening assay (Supplementary Figure S4.2). As a result, we could establish a screening system for the reliable detection of D-psicose in the presence of D-fructose by KpRD and of L-tagatose in the presence of L-sorbose by RsGD. These UV-VIS assays allowed the rapid screening of over 6,000 clones from IDF libraries (Improved for D-fructose) and over 5,000 clones from ILS libraries (Improved for L-sorbose), a number that would be highly laborious to meet by HPLC-based screening.
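The readout of both coupled assays is the decrease in absorbance at 340 nm as NADH is oxidized. The following is a minimal sketch of how such a signal could be converted into the amount of ketohexose reduced. It assumes the commonly used molar extinction coefficient of NADH at 340 nm (6220 M⁻¹ cm⁻¹), a 1:1 stoichiometry between NADH consumed and sugar reduced, and a hypothetical optical path length for a microtiter plate well; the actual assay parameters are not stated in this section.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, commonly used literature value


def nadh_consumed_mM(delta_a340, path_length_cm=0.5):
    """Convert a decrease in A340 into consumed NADH (mM) via Beer-Lambert.

    delta_a340 is the absorbance drop given as a positive number.
    path_length_cm is a hypothetical value for a partially filled well.
    """
    if delta_a340 < 0:
        raise ValueError("give the absorbance decrease as a positive number")
    molar = delta_a340 / (EPSILON_NADH_340 * path_length_cm)
    return molar * 1000.0  # M -> mM


# Example: absorbance drop of 0.311 at the assumed 0.5 cm path length
print(round(nadh_consumed_mM(0.311), 3))
```

Under the 1:1 assumption, the NADH consumed equals the D-psicose (KpRD assay) or L-tagatose (RsGD assay) reduced in the well.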

4.3.2. Selecting sites for saturation mutagenesis of PcDTE Var8

Based on the available crystal structure of PcDTE co-crystallized with D-fructose (PDB ID: 2QUN [185]) we selected the 22 amino acid residues that are located within 5 Å of the C3 atom of D-fructose and categorized them as 1st-sphere residues, whereas the 28 amino acid residues located between 5 Å and 10 Å were categorized as 2nd-sphere residues. From the residues belonging to the 1st sphere we excluded the 5 residues that either make up the catalytic duo (E152 and E246), bind the metal ion Mn(II) (D185), or stabilize the cis-enediol transition state (H188 and R217) [185] (Figure 4.1 and Supplementary Figure S4.3). To reduce the screening effort further we applied a "short-cut" version of the established ISM method [75]. Instead of first screening each of the remaining 17 sites separately and then re-diversifying each beneficial site once the first mutation has been fixed, we fixed the first beneficial mutation that we encountered in the first round and then immediately used this variant to diversify the next site (Figure S4.3).
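The distance-based site selection can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the text does not specify which residue atom defines the distance, so the sketch takes the closest atom of each residue, and the coordinates and residue names in the example are made up.

```python
import math


def classify_residues(residue_atoms, reference, excluded=(),
                      first_cutoff=5.0, second_cutoff=10.0):
    """Sort residues into 1st and 2nd sphere by distance (Å) to a reference atom.

    residue_atoms maps a residue name to a list of (x, y, z) atom coordinates.
    The distance of a residue is that of its closest atom (an assumption).
    Residues listed in `excluded` (e.g. catalytic residues) are skipped.
    """
    spheres = {"first": [], "second": []}
    for name, atoms in residue_atoms.items():
        if name in excluded:
            continue
        distance = min(math.dist(atom, reference) for atom in atoms)
        if distance <= first_cutoff:
            spheres["first"].append(name)
        elif distance <= second_cutoff:
            spheres["second"].append(name)
    return spheres


# Hypothetical toy coordinates; the reference plays the role of C3 of D-fructose
c3 = (0.0, 0.0, 0.0)
toy_residues = {
    "RES_A": [(3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],  # closest atom at 3 Å
    "RES_B": [(0.0, 7.0, 0.0)],                   # 7 Å
    "RES_C": [(12.0, 0.0, 0.0)],                  # 12 Å, outside both spheres
    "RES_D": [(0.0, 0.0, 2.0)],                   # 2 Å, but excluded below
}
result = classify_residues(toy_residues, c3, excluded={"RES_D"})
print(result)
```

With real data, the coordinates would come from the crystal structure file, and the exclusion set would hold the catalytic and metal-binding residues named above.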


Figure 4.1| Directed evolution of PcDTE towards substrates D-fructose or L-sorbose. a) Residues that were targeted during divergent directed evolution are depicted in the PcDTE wildtype monomer (PDB ID 2QUN) with bound D-fructose (green spheres): residues that are located within 5 Å of the C3 of D-fructose (1st-sphere residues) are colored cyan, and residues that are located between 5 Å and 10 Å (2nd sphere) are colored orange. Catalytic residues (E152 and E246) as well as highly conserved residues (D185, H188 and R217) that were excluded from screening are shown as red sticks. b) Variant PcDTE IDF8 (PDB ID 4PFH) evolved towards D-fructose with mutation S37N in the 1st sphere and the additional mutations in the 2nd sphere (color coding as in a). c) Variant PcDTE ILS6 (PDB ID 4PGL) evolved towards L-sorbose with mutation Q183H in the 1st sphere and additional mutations in the 2nd sphere (color coding as in a).

4.3.3. Evolving DTE towards improved D-fructose epimerization

Upon screening the first shell of residues, only one mutation (S37N) was found that improved conversion of D-fructose to D-psicose, by 1.4-fold compared to Var8 as determined in heat-treated lysate. This mutation constituted the first variant, termed IDF1 (Figure 4.2). When screening the second-shell residues of variant IDF1, mutation H209V was obtained, giving variant IDF2 with a 1.6-fold higher activity compared to IDF1. In the next rounds, mutation G39E (IDF3) was found that increased activity 2.7-fold, and in round 4 the silent mutation T9T (IDF4, codon ACC to ACT) that increased protein expression 1.3-fold. In the following rounds the mutations A258D (IDF5, 1.6-fold improvement), T109N (IDF6, 1.4-fold improvement), L212I (IDF7, 1.3-fold improvement), and S256G (IDF8, 1.4-fold improvement, final variant) were fixed. The total improvement from PcDTE Var8 to IDF8 as determined from heat-treated lysate was 14.4-fold, composed of the 8.6-fold catalytic improvement (Figure 4.3) and a (calculated) 1.7-fold improvement in protein yield. Figure 4.1b shows the localization of the beneficial mutations of IDF8 around the active site.

[Lineage diagram, rounds 1 to 8, starting from PcDTE Var8. IDF lineage: IDF1 (S37N), IDF2 (H209V), IDF3 (G39E), IDF4 (T9T), IDF5 (A258D), IDF6 (T109N), IDF7 (L212I), IDF8 (S256G). ILS lineage: ILS1 (Q183H), ILS2 (V153A), ILS3 (T9S), ILS4 (G39S), ILS5 (T109N), ILS6 (M245I).]

Figure 4.2| Lineage of IDF variants (grey boxes) and ILS variants (green boxes) starting from thermostabilized variant PcDTE Var8. Only residues that exhibited improved activity towards D-fructose (IDF) or L-sorbose (ILS) are depicted (see Figure S4.3 for the complete screening process). Amino acid residues in the first row have been arranged such that residues containing mutations in IDF variants are aligned to the left side and those for ILS variants to the right side, respectively.

Next, all IDF variants were overexpressed and purified, and kcat and Km values were determined for each variant with the substrates D-fructose and D-psicose. For the starting template and the final variant IDF8, these values were also determined for the substrates D-tagatose, D-sorbose, L-sorbose and L-fructose (Figure 4.3a, Figure 4.4a and Table S4.2). We observed a steady increase in kcat from Var8 (4.9 s⁻¹) to IDF8 (42.3 s⁻¹) for the substrate D-fructose and a concomitant increase in kcat for substrate D-psicose from Var8 (2.8 s⁻¹) to IDF8 (57.1 s⁻¹) (Figure 4.3). However, the net catalytic efficiency (kcat/Km) remained nearly unchanged as the increase in kcat was accompanied by a concurrent increase in Km for both D-fructose and D-psicose. For substrates D-tagatose and D-sorbose, which are supposed to be the original substrate pair of PcDTE [12], no net decrease in kcat was observed for the IDF evolution trajectory (Figure 4.4a); however, the catalytic efficiency for these two epimers was significantly reduced when going from Var8 to IDF8 (Figure 4.4c). For substrates L-sorbose and L-fructose the net catalytic rate was increased moderately whereas catalytic efficiency remained largely unchanged (Figure 4.4a and c).
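What an increased kcat at unchanged kcat/Km means in practice can be illustrated with the Michaelis-Menten rate law, v/[E] = kcat·[S]/(Km + [S]). The sketch below uses the kcat values for D-fructose quoted in the text; the Km values are hypothetical placeholders (the measured ones are in Table S4.2) chosen so that kcat/Km is identical for both variants.

```python
def turnover_rate(kcat, km, substrate):
    """Michaelis-Menten rate per enzyme molecule (s^-1); km and substrate in mM."""
    return kcat * substrate / (km + substrate)


kcat_var8, kcat_idf8 = 4.9, 42.3            # s^-1, D-fructose, from the text
km_var8 = 20.0                              # mM, hypothetical placeholder
km_idf8 = km_var8 * kcat_idf8 / kcat_var8   # mM, chosen so kcat/Km is unchanged


def rate_ratio(substrate):
    """Rate of IDF8 relative to Var8 at a given substrate concentration (mM)."""
    return (turnover_rate(kcat_idf8, km_idf8, substrate)
            / turnover_rate(kcat_var8, km_var8, substrate))


for s in (1.0, 50.0, 1000.0):
    print(f"[S] = {s:g} mM: IDF8/Var8 rate ratio = {rate_ratio(s):.2f}")
```

Under these assumptions the two variants are nearly indistinguishable at low substrate concentration, while the advantage of the evolved variant approaches the kcat ratio as the substrate concentration rises towards saturation.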

[Plots: kcat [s⁻¹] and Km [mM] for a) Var8, IDF1, IDF2, IDF3, IDF5, IDF6, IDF7 and IDF8 and b) Var8, ILS1, ILS2, ILS3, ILS4, ILS5 and ILS6.]

Figure 4.3| Development of enzyme kinetic parameters kcat and Km for each improved variant IDF and ILS, determined at 25°C. a) kcat (grey square) and Km values (grey circle) for Var8 and IDF variants with D- fructose as substrate. b) kcat (green square) and Km (green circle) values for Var8 and ILS variants with L- sorbose as substrate

4.3.4. Screening libraries for improved L-sorbose epimerization conversion and enzyme kinetic parameters of improved variants

Interestingly, screening the first shell for improved L-sorbose to L-tagatose conversion also yielded only a single mutation (Q183H), which constituted variant ILS1 (Figure 4.2). This mutation, however, increased conversion of L-sorbose to L-tagatose 8.2-fold as judged from the heat-treated lysate (not corrected for protein expression). We started to screen the second-shell residues for improved catalytic activity based on variant ILS1 and found mutation V153A (ILS2), which increased catalytic activity 1.4-fold. Further rounds of directed evolution resulted in fixation of mutation T9S (ILS3), which brought an improvement of 1.2-fold, mutation G39S (ILS4), which improved activity 1.3-fold, and mutation T109N (ILS5), which likewise resulted in 1.3-fold higher activity compared to ILS4. The final mutation M245I made up variant ILS6 and improved activity by another 1.3-fold. In total, a 30-fold activity improvement compared to Var8 was obtained, as determined from heat-treated cell lysate. Figure 4.1c shows the accumulated mutations around the active site and their localization in the molecule. Enzyme kinetic parameters were determined from purified protein for each variant in the evolutionary trajectory for substrate L-sorbose, and for ILS1 and ILS6 for substrates L-fructose, D-fructose, D-psicose, D-tagatose and D-sorbose. The most dramatic improvement in kinetic parameters was found with mutation Q183H (ILS1) and resulted in a 6.9-fold higher kcat value, from a kcat of 0.24 s⁻¹ (Var8) to a kcat of 1.61 s⁻¹ (ILS1) (Figure 4.3b and Supplementary Table S4.2).

Mutation G39S, forming ILS4, resulted in no improvement in terms of kcat, which indicates that this mutation was found due to an increased expression level of soluble protein. Finally, ILS6 had a 13.5-fold improvement in terms of kcat compared to the starting variant Var8 and hence, given the 30-fold improvement determined in lysate, a roughly two-fold higher level of soluble protein expression (see above). Interestingly, catalytic improvement for ILS6 was not accompanied by a loss in thermostability compared to starting variant Var8 (ΔT₅₀²⁰ = 0.2 °C), in contrast to PcDTE IDF8, which lost considerable thermostability (ΔT₅₀²⁰ = -11.8 °C; Supplementary Figure S4.4).

Figure 4.4| Development of kcat and catalytic efficiencies (kcat/Km) for the first and the final variant discovered during the screening relative to the values of kcat and kcat/Km of parent Var8 for the same substrate. The substrate for which the variants were evolved is marked with a hash (#). a) Relative kcat for IDF1 (black bar) and IDF8 (dark grey bar) for 6 different substrates. b) Relative kcat for ILS1 (light grey bar) and ILS6 (white bar). c) Relative catalytic efficiencies for IDF1 (black bar) and IDF8 (dark grey bar). d) Relative catalytic efficiencies for ILS1 (light grey bar) and ILS6 (white bar). Details regarding the kinetic parameters are listed in Supplementary Table S4.2.

The mutation leading to ILS1 effectively reverses the preference of the enzyme for the C4-hydroxy group of the hexose substrate from the S- to the R-configuration and that of the C5-hydroxy group from R to S. This single mutation changes the catalytic efficiency between substrates D- (3R,4S,5R) and L-sorbose (3S,4R,5S) by 56-fold (Figure 4.4d). As a result, in the final variant ILS6, net catalytic rates as well as catalytic efficiency for the native substrates D-sorbose (3R,4S,5R) and D-tagatose (3S,4S,5R) were both reduced, whereas the kcat for D-fructose (3S,4R,5R), D-psicose (3R,4R,5R) and L-fructose (3R,4S,5S) was unchanged but catalytic efficiency for D-psicose and L-fructose was moderately increased (Figure 4.4b and d).
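The stereochemical relationships between these substrates can be checked with a small sketch that encodes the C3, C4 and C5 descriptors exactly as listed in the text. It assumes that inverting one stereocenter changes only the descriptor of that center, which holds for the pairs given above, and that an enantiomer has all descriptors inverted. The function names are illustrative.

```python
# CIP descriptors at C3, C4 and C5 as listed in the text
CONFIG = {
    "D-fructose": ("S", "R", "R"),
    "D-psicose":  ("R", "R", "R"),
    "D-sorbose":  ("R", "S", "R"),
    "D-tagatose": ("S", "S", "R"),
    "L-fructose": ("R", "S", "S"),
    "L-sorbose":  ("S", "R", "S"),
}
FLIP = {"R": "S", "S": "R"}


def c3_epimer(config):
    """Invert the configuration at C3 only (the reaction catalyzed by DTE)."""
    c3, c4, c5 = config
    return (FLIP[c3], c4, c5)


def enantiomer(config):
    """Invert every stereocenter."""
    return tuple(FLIP[c] for c in config)


def lookup(config):
    """Name of the listed sugar with this configuration, or None if not listed."""
    return next((name for name, c in CONFIG.items() if c == config), None)


print(lookup(c3_epimer(CONFIG["D-fructose"])))   # C3-epimer of D-fructose
print(lookup(c3_epimer(CONFIG["D-sorbose"])))    # C3-epimer of D-sorbose
print(lookup(enantiomer(CONFIG["D-sorbose"])))   # mirror image of D-sorbose
# The C3-epimer of L-sorbose matches the mirror image of D-tagatose (L-tagatose)
print(c3_epimer(CONFIG["L-sorbose"]) == enantiomer(CONFIG["D-tagatose"]))
```

Comparing the tuples for D-sorbose and L-sorbose makes the reversal at C4 and C5 described above explicit.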

4.3.5. Structural basis of the change in substrate specificity and catalytic characteristic

To investigate the structural changes in PcDTE during the divergent directed evolution towards two different ketohexoses, we determined X-ray structures of PcDTE Var8 in the substrate-free form, of PcDTE IDF8 in the D-fructose-bound form and of PcDTE ILS6 in the L-sorbose-bound form. The structures were solved at resolutions of 1.6 Å (Var8), 2.1 Å (ILS6) and 2.0 Å (IDF8) (Table 4.3), giving precise insight into the detailed geometry of the active site and the interactions between PcDTE variants and their substrates. Structural alignment of the Cα backbone of the wildtype (WT) and the 3 variants (Var8, IDF8, and ILS6) showed very little difference in tertiary structure between the variants (Figure 4.5). Except for the 6xHis tag, which is resolved in all three structures of Var8, IDF8, and ILS6 and exhibits the expected high structural flexibility, the backbone superimposes nearly perfectly, which is also reflected in the low root-mean-square deviation (rmsd) values determined from pairwise structural alignments (Table S4.3).

Figure 4.5| Cα-backbone superposition of wildtype PcDTE and the three variants, shown in ribbon representation. The backbone of PcDTE WT is colored in orange, of Var8 in blue, of IDF8 in grey, and of ILS6 in green. Substrates for WT (D-fructose), IDF8 (D-fructose) and ILS6 (L-sorbose) are shown as sticks in the color of the respective variant. For Var8, IDF8 and ILS6 the C-terminal 6xHis-tag is resolved in the crystal structure and indicated in the figure.

4.3.5.1. Structural features of PcDTE Var8

The structure of Var8 at 1.8 Å resolution shows that the active site geometry is virtually identical to the substrate-bound or substrate-free form of PcDTE WT (Figure S4.5). Var8 was crystallized without substrate but the crystal was cryoprotected with glycerol before freezing. Closer inspection of the Var8 active site revealed that a glycerol molecule is coordinated by a manganese ion and active site residues, resembling the coordination of C1-C3 of the C6 sugars D-fructose or D-tagatose as shown previously (Figure S4.5). The Var8 structure confirmed the previous speculations on the stabilizing effect of F157Y, T194N and A215Q at the dimer interface [106]. These mutations form an extensive hydrogen bonding network at the dimer interface (see Table S4.7). Unexpectedly, the S116H mutation was coordinated by a metal ion and water-mediated isologous hydrogen bonds with W262 from the other subunit (Figure S4.6). The stabilizing effects of the other mutations are difficult to deduce from the Var8 structure. It is likely that these mutations reduce the dimer interface entropy by replacing a flexible residue with a less flexible one (G260C) or by substituting larger side chains with smaller ones (K122V, K251T, M265L).


Figure 4.6| Detailed representation of the active site residues with bound substrates. a) Structure of PcDTE IDF8 with bound linear D-fructose (cyan sticks, occupancy 0.7) and D-psicose (green sticks, occupancy 0.3) with the 2Fo-Fc map contoured at 0.7 σ. The catalytic dyad E152 and E246 and conserved residues D185, H188 and R217 are depicted as sticks, the catalytic Mn2+ ion is shown as a sphere. b) Active site of PcDTE ILS6 with bound L-sorbose (yellow sticks, occupancy 0.7) and L-tagatose (light red sticks, occupancy 0.3); the 2Fo-Fc map is contoured at 1.2 σ.

4.3.5.2. Structural features of PcDTE IDF8

Crystals of variant IDF8 were soaked in D-fructose prior to freezing in order to determine the structure in the substrate-bound state. The active site of IDF8 shows additional electron density that could be accounted for by both substrate and product of IDF8 (Figure 4.6a). The electron density is rather weak at substrate positions C4 to C6, which we suppose can be ascribed to the weak coordination of these positions by the enzyme's active site. This phenomenon has also been observed in other substrate-bound PcDTE structures [185, 190]. Substrate D-fructose and product D-psicose were fitted into the electron density of IDF8 with occupancies of 0.7 and 0.3, respectively, according to the thermodynamic equilibrium of the epimerization reaction. The most prominent changes in the active site geometry of IDF8 arose from mutations S37N, G39E, and A258D, which constrict the access tunnel to the active site (Figure 4.7a and Figure 4.9a). These mutations also lead to a change in the conformations of residues K70 and R257, orienting them more towards the entrance tunnel and thus further narrowing the entry. All three mutations in the entrance tunnel increase the polarity at the substrate entry site by exchanging a polar but uncharged residue (S37) or a nonpolar residue (G39 and A258) for a positively charged (S37N) or negatively charged residue (G39E and A258D). It is noteworthy that the 6xHis tag (residues H293 to H298) of a subunit from a neighboring asymmetric unit protrudes into the cleft next to the active site of chain A (Figure S4.7), forming potential hydrogen bonds to H116 and R257. These interactions are considered nonbiological crystallization artifacts; therefore only chain B was used for closer analysis of the active site entrance.

Figure 4.7| Mutations and differently oriented side chains of variants at the entrance of the active site channel in comparison to the WT (orange surfaces and sticks). a) Mutations of IDF8 (PDB: 4PFH, chain B) are depicted as grey sticks and dots with potential non-covalent interactions indicated by red dashed lines, the position of the substrate D-fructose in the WT is shown as orange sticks and in IDF8 it is shown as green sticks. Residues R257 and K70 show two different conformations in chain B of IDF8 crystal structure, whereas they have only one conformation in WT. b) The only mutation of ILS6 (PDB: 4PGL), G39S, that is located at the entrance of ILS6 is shown as green sticks and dots, and the position of the substrate L-sorbose is shown as green sticks for ILS6 and in orange sticks for WT.

Although it is tempting to speculate that these mutations change the affinity of the enzyme towards the substrate, preferably in its linear form, this cannot be corroborated by the enzyme kinetic data from the different variants. S37N (IDF1) and A258D (IDF5) in fact exhibit increased values for Km, whereas G39E (IDF3) shows an unchanged Km for D-fructose (Figure 4.3). Hence, we suppose that the constriction of the entrance tunnel might shield the active site from bulk solvent and in this way lead to an increase in catalytic efficiency. It has been proposed previously that restricting the access to the active site can enhance the catalytic activity by increasing the stability of the activated complex [148, 191]. The mutation S37N that constitutes IDF1 not only constricts the substrate channel but also forms a hydrogen bond via its δ-amide group to the O6 of the bound substrate D-fructose and is thus thought to improve the correct positioning of the substrate in the active site (Figure 4.6). In contrast, in the WT structure D-fructose is contacted at its O6 atom indirectly by S37 via an ordered water molecule. Mutation G39E in IDF3 then further extends this hydrogen bond network by interacting with N37 via an ordered water molecule and via its backbone amide (Figure 4.6a).

Figure 4.8| Schematic representation of potential non-covalent interactions between D-fructose and amino acid positions E35, H/V209, H211 and T242 as well as Mn2+. a) An extensive hydrogen bond network is formed in PcDTE WT between these residues, reaching from the substrate out to the surface residue T242. b) Mutation H209V in PcDTE IDF8 prevents a hydrogen bond contact between E35 and H209, thus interrupting the hydrogen bond network. Numbers indicate the distance between hydrogen-bond or salt bridge forming atoms in Å.

Mutation H209V, which was fixed in IDF2, is thought to break a hydrogen bond network extending from the manganese ion to the surface of the enzyme (Figure 4.8). This leads to a slight repositioning of residue H211, the manganese ion, and the ordered water that contacts O3 and O5 of D-fructose. Interestingly, mutation T109N, which was first identified in IDF6 and slightly increases kcat (Table S4.2), also increased the epimerization rate towards L-sorbose (see variant ILS5). A possible explanation for this phenomenon is that N109 can form a hydrogen bond via its δ-amide group to the carbonyl group of the N150 backbone, leading to a disruption of the β-sheet that potentially influences the nearby catalytic residue E152. Mutation L212I is directly adjacent to H211, which coordinates the manganese ion and is slightly displaced compared to the WT. Finally, the effect of mutation S256G is difficult to rationalize; it is possible that glycine instead of serine allows a greater degree of flexibility at position 256, located in an α-helix, leading to better binding of D-fructose (lower Km).


Figure 4.9| Substrate channels of a) PcDTE WT (PDB: 2QUN) and b) chain B of IDF8 (PDB: 4PFH). Residues at position 37, 39 and 113 are shown in stick-dot representation; the substrate channel is shown as mesh with coordinates obtained from the CAVER 3.0 plugin for PyMOL. a) The substrate channel for the WT is colored in green. b) The substrate channel for IDF8 is colored in blue, the potential hydrogen bond between O6 of D-fructose and N37 is depicted in red.

4.3.5.3. Structural features of PcDTE ILS6

Crystals of ILS6 were soaked in L-sorbose before flash freezing in order to obtain a structure in the substrate-bound state. The active sites of all four molecules in the asymmetric unit contain additional electron density that could be unambiguously fitted with L-sorbose. As in the case of IDF8, both substrate and product of ILS6 were fitted into the electron density with occupancies of 0.7 and 0.3 for L-sorbose and L-tagatose, respectively (Figure 4.6b).

Mutation Q183H, first introduced in ILS1, brings the highest increase in kcat for the epimerization of L-sorbose to L-tagatose (6.9-fold higher kcat). In the WT, Q183 contacts O5 of D-fructose by hydrogen bonds via an ordered water molecule (Figure 4.10). As the configuration of L-sorbose at the C5 position is 5S, whereas it is 5R in the case of D-fructose or D-tagatose, a hydrogen bond to Q183 would likely lead to a non-productive positioning of L-sorbose in the active site. The molecular basis of mutation V153A (ILS2) is harder to rationalize, as a hydrophobic valine is simply exchanged for a smaller hydrophobic alanine. The mechanism by which mutation T9S (ILS3) improves catalytic efficiency also remains obscure, whereas G39S (ILS4) allows formation of a hydrogen bond network between O6 of L-sorbose and S39 via S37 (Figure 4.10). This does not affect kcat but reduces Km to some extent, presumably through tighter binding of the substrate. Mutation T109N is also found in IDF6, and we therefore suppose a similar underlying effect. Finally, the beneficial effect of mutation M245I (ILS6) on activity is also difficult to explain in mechanistic terms.


Figure 4.10| Schematic representation of potential non-covalent interactions in the active site of (a) PcDTE WT and (b) PcDTE ILS6. Mutation Q183H in ILS6 interrupts the potential hydrogen bond network over ordered water to the O5 of the substrate. Numbers indicate distances (in Å) between atoms forming potential non-covalent interactions (hydrogen bonds or salt-bridges)

4.3.6. pH-rate profiles for PcDTE WT, Var8, IDF8 and ILS6

To elucidate whether a change in the pKa of catalytically relevant residues might have contributed to the increased catalytic activity acquired during directed evolution of IDF8 or ILS6, we recorded pH-rate profiles for the four PcDTE variants. The epimerization of ketohexoses at the C3 position by D-tagatose epimerase is suggested to be catalyzed by two glutamate residues (E152 and E246) that are both in their deprotonated state and can act as catalytic bases. Mechanistically, one glutamate has been suggested to abstract a proton from C3, forming a cis-enediolate intermediate, followed by re-protonation of C3 by the other glutamate residue [185]. Other ionizable residues contribute as well to the binding and positioning of the substrate, suggesting that pH might significantly affect the catalytic rate. To determine whether any of the accumulated mutations influenced the pH-dependent activity, we recorded reaction rates as a function of pH for PcDTE WT, Var8, ILS6 and IDF8 in a pH range of 5 to 9 (Figure 4.11).

Figure 4.11| The pH-rate profiles of PcDTE WT (squares), thermostabilized Var8 (diamonds), and catalytically improved variants IDF8 (triangles) and ILS6 (circles), plotted as relative activity (normalized) against pH. Activities at the respective pH were determined with D-fructose (WT, Var8, IDF8) or L-sorbose (ILS6) as substrate and normalized to the maximum activity of the respective variant. The pH values were set with acetate buffer (pH 5 – pH 6), phosphate buffer (pH 6 – pH 7) and Tris buffer (pH 7 – pH 9). Error bars indicate standard deviations from the mean of triplicate measurements (Var8, ILS6, IDF8) and duplicate measurements (WT). Solid lines indicate the best fit of the experimental values to a pH-rate equation (see methods for details).

By fitting a standard Henderson-Hasselbalch equation (Eq. 1, see methods for details) to the normalized activities, an apparent pKa (pKa,app) of 6.1 +/- 0.05 could be determined for PcDTE WT, whereas the pKa,app of Var8 was significantly lower (4.5 +/- 0.02) and ILS6 showed a pKa,app of 5.25 +/- 0.06. Surprisingly, the pH-rate curve of PcDTE IDF8 could not be described adequately by a simple Henderson-Hasselbalch equation, suggesting cooperativity between several different ionizable residues within the active site of IDF8. Therefore a modified Hill equation that accommodates cooperativity was employed for fitting [192] (Eq. 2). To determine which residues could elicit this effect, we analyzed the pKa values of all ionizable groups of WT and IDF8 by means of the web-based prediction software H++ [193]. Predictions were made with explicit incorporation of the manganese ion in the active site, as it significantly influences the pKa values of the adjacent residues. We found three residues that are close to the active site and showed a significant difference in pKa between WT and IDF8, namely E35, E39 and H209 (Table S4.4). Hence, two different mechanisms are likely to be responsible for the cooperative pH-rate profile of IDF8. First, the mutation H209V disrupts a hydrogen-bond network between T242, H209, H211 and E35 (Figure 4.8) and thus changes the predicted pKa of E35 significantly (pKa < 0 for WT, pKa = 5.6 for IDF8). Protonation of E35 in IDF8 would supposedly lead to a severe disturbance of the non-covalent interactions with H211 and thus with the catalytic Mn2+ ion. Second, the protonation of E39 (pKa = 5.9 for IDF8) could lead to a loss of the extensive hydrogen bond network at the entrance of the substrate channel, leading to impaired entry of the substrate.
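The difference between the two fitting models can be illustrated with a short sketch. Eq. 1 and Eq. 2 are defined in the methods and are not reproduced in this section, so the functional forms below are assumptions: a single-ionization curve in which only the deprotonated enzyme is active, and the same curve with a Hill exponent that steepens the transition. The function name, the midpoint of 5.8 and the Hill coefficient of 2 in the last line are hypothetical placeholders and not fitted values.

```python
def relative_activity(ph: float, pka: float, hill: float = 1.0) -> float:
    """Normalized activity of an enzyme that is active only when deprotonated.

    hill = 1 gives a single-ionization (Henderson-Hasselbalch type) curve;
    hill > 1 steepens the transition, mimicking cooperative (de)protonation.
    """
    return 1.0 / (1.0 + 10.0 ** (hill * (pka - ph)))


# Apparent pKa values reported in the text for the non-cooperative variants.
PKA_APP = {"WT": 6.1, "Var8": 4.5, "ILS6": 5.25}

for name, pka in PKA_APP.items():
    curve = [round(relative_activity(ph, pka), 2) for ph in range(5, 10)]
    print(name, curve)

# Hypothetical cooperative case (placeholder midpoint and Hill coefficient).
print("cooperative", [round(relative_activity(ph, 5.8, hill=2.0), 2) for ph in range(5, 10)])
```

In this form the activity is exactly half-maximal at pH = pKa,app for any Hill coefficient; the coefficient changes only how sharply activity is lost below that pH.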

4.4. Discussion

Screening of the thermostable Var8 for improved catalytic activity towards two different substrates revealed a distinct evolutionary trajectory for each branch. In both cases we were able to find only one single beneficial mutation that is directly adjacent to the active site (so-called first-sphere residues): mutation S37N in IDF1 and Q183H in ILS1. IDF1 had only a moderate increase in kcat for D-fructose, whereas ILS1 showed a much more pronounced increase in catalytic activity as well as catalytic efficiency for L-sorbose. Based on the catalytic efficiencies of the ILS1 mutant for 6 different substrates (Figure 4.4), we suppose that mutation Q183H ‘switches’ the substrate preference from hexoses with a 4S,5R configuration to those with a 4R,5S configuration. Building on this initial improvement, catalytic activity was then further improved by 5 additional mutations, leading finally to ILS6 with an improvement in kcat of over 13-fold compared to Var8 for L-sorbose. These results underline the importance of second-shell residues for catalytic activity. Tokuriki et al. showed, for example, that mutations randomly introduced by error-prone PCR (epPCR) into a phosphotriesterase to convert it into an arylesterase accumulated stepwise in a radial fashion from the center out to the periphery, with only 4 of the total of 18 mutations directly adjacent to the substrate binding site [194]. We observed a similar behavior for the evolution of the ILS variants insofar as an initial big improvement in catalytic rate was followed by less pronounced improvements from more distant mutations in the second sphere.

The two branches differed in two respects. First, in contrast to the evolution of the ILS variants of PcDTE, where a strong initial improvement was followed by minor incremental improvements, the IDF variants showed a smooth stepwise increase in kcat over the whole trajectory (Figure 4.3). Second, the increase in catalytic activity (kcat) for the IDF variants was accompanied by a loss of affinity (increase in Km) for D-fructose during the evolutionary trajectory, whereas the ILS branch showed no significant change in affinity for L-sorbose despite the increase in catalytic activity. It is difficult to explain this finding, as the substrate concentration used for screening was the same for both branches (200 mM). Due to the increase in Km for the IDF variants, no net increase in catalytic efficiency (kcat/Km) could be observed during the directed evolution that finally resulted in variant IDF8. This does however not impair the use of IDF8 as a promising catalyst for the production of D-psicose, as the substrate concentration that is applied in industrial settings, and in particular in sugar-processing plants, is usually high (> 1 M). Therefore the impact of Km is often considered to be of subordinate interest in industrial biocatalysis [195].

As expected, the overall structural changes of IDF8 as well as of ILS6 relative to the WT are very small. Upon detailed inspection of the structures, the most prominent change for IDF8 is the accumulation of negatively charged residues at the entrance of the substrate channel (Figure 4.7). It has been shown previously that residues lining the substrate channel or its mouth have a high probability of returning catalytic improvement upon mutation [148, 196]. The two mutations that encode negatively charged residues (G39E and A258D) lead to a conformational change of the two positively charged residues K70 and R257 at the entrance of the substrate tunnel. Together, this highly charged patch at the mouth of the entrance might facilitate the entry of the polar sugar substrate, although these mutations are also supposed to constrict the entry tunnel of IDF8 significantly compared to the WT enzyme.

Finally, we conclude that our strategy of improving the catalytic activity of enzymes by screening residues around the active site, going radially from the center to the periphery, is an efficient method for the directed evolution of enzymes where screening is the limiting step, which is still the case for most industrially relevant biocatalysts [44]. In this specific case, it delivered improvements in kcat by a factor of 8.6 for the conversion of D-fructose (IDF8) and of 13.5 for L-sorbose (ILS6).
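The argument that Km is of subordinate interest at industrial substrate concentrations follows directly from the Michaelis-Menten rate law, in which the fraction of the maximal rate reached is S / (Km + S). The sketch below evaluates this fraction at a low and at a high substrate concentration. The two Km values are hypothetical placeholders chosen only to represent a threefold loss of affinity; they are not the measured constants of any variant.

```python
def saturation(substrate_mM: float, km_mM: float) -> float:
    """Fraction of Vmax reached under Michaelis-Menten kinetics: S / (Km + S)."""
    return substrate_mM / (km_mM + substrate_mM)


# Hypothetical Km values (mM) standing in for a parent and an evolved variant.
KM_PARENT, KM_EVOLVED = 20.0, 60.0

for s in (10.0, 1000.0):  # dilute assay condition vs. > 1 M process condition
    print(
        f"S = {s:6.0f} mM: parent {saturation(s, KM_PARENT):.2f}, "
        f"evolved {saturation(s, KM_EVOLVED):.2f} of Vmax"
    )
```

Under these assumed values the threefold higher Km more than halves the rate at 10 mM substrate but costs less than 4 % of the maximal rate at 1 M, so an increase in kcat translates almost fully into a higher process rate.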

4.5. Material and Methods

If not stated otherwise, all chemicals were purchased from Sigma Aldrich (Buchs, Switzerland). NADH was purchased from GERBU (Heidelberg, Germany); D-fructose, D-sorbose and L-fructose were purchased from Carbosynth (Berkshire, UK); D-tagatose and L-sorbose were obtained from Sigma Aldrich (Buchs, Switzerland). D-Psicose was produced in-house by epimerization of D-fructose to D-psicose with D-tagatose epimerase and separation of D-psicose from D-fructose by continuous chromatography [197]. Restriction enzymes and polymerases were obtained from New England Biolabs (Ipswich, MA, USA) and oligonucleotides from Microsynth (Balgach, Switzerland).

4.5.1. Molecular Biology

General molecular biology was performed according to standard protocols [173]. All PCRs were performed using Phusion High-Fidelity Polymerase (NEB). Primers used for cloning are listed in Table 4.1. All general cloning work was done in E. coli Top10 cells (Invitrogen).

4.5.2. Cloning of KpRD from K. pneumoniae and RsGD from R. sphaeroides

The genetic sequence of ribitol dehydrogenase (KpRD) from Klebsiella pneumoniae was retrieved from the NCBI database (GenBank Nr. ESM54230) based on a report of Takeshita et al. [189]. The gene was codon-optimized for expression in E. coli and synthesized by Geneart (Regensburg, Germany). Plasmid pAB139 was constructed as follows (see Table 4.2 for details): the origin of replication pBR322 was amplified from pRK793 using primers pBR322_for-AscI and pBR322_rev-FseI and used to exchange the origin of replication of pSEVA131 via the unique restriction sites AscI and FseI, resulting in plasmid pAB73. The tetR-Ptet-PT7 cassette was amplified from plasmid pKTS using primers Ptet-SEVA_for and Ptet-SEVA_rev and inserted into pAB73 via restriction sites SpeI and HindIII, resulting in construct pAB92. The bla resistance gene of pAB92 was exchanged for the kamR resistance gene from plasmid pSEVA231 using the unique restriction sites SwaI and FseI, giving vector pAB228. The kpRD gene was amplified from the vector pMA-T-KpRD using primers KpRD_N6H_f (introducing a 6xHis tag) and KpRD_s_rev and inserted into pAB92 using restriction enzymes HindIII and EcoRI, resulting in construct pAB138. The galactitol dehydrogenase (RsGD) [198] was isolated from the genomic DNA of R. sphaeroides (DSM No. 8371). The gene was amplified directly from whole R. sphaeroides cells using primers RsGDH_NheI_for and RsGDH_s_rev and inserted via restriction sites NheI and EcoRI into pAB139, giving construct pAB140.

4.5.3. Expression and purification of KpRD and RsGD

E. coli BL21(DE3) cells were transformed with plasmid pAB139 or pAB140. Cells containing the respective plasmid were pre-cultured in 5 mL of Luria Bertani broth supplemented with 50 µg mL-1 kanamycin; 200 µL of this preculture was then used to inoculate 50 mL of M9 medium containing 0.4 % D-glucose and 50 µg mL-1 kanamycin, which after incubation served to inoculate a 1 L fed-batch reactor. Protein production was induced with 0.2 mM IPTG when an OD600 of 50 was reached, and protein was synthesized for 6 h at 37°C. Cells were harvested by centrifugation (20 min at 6,000 rcf) and stored at -80°C until further use. To obtain purified RsGD, 10 g of wet cell pellet from the RsGD cultivation was resuspended in 15 mL of RsGD-lysis buffer (50 mM Tris, pH 6.8; 100 mM NaCl; 20 mM imidazole; 0.2 mg mL-1 lysozyme) and incubated for 20 min at room temperature before the cell suspension was frozen at -80°C for 20 min. The cell suspension was then thawed at room temperature, MnCl2 (to a final concentration of 1 mM) and a spatula tip of DNase were added, and the suspension was sonicated for 10 min in an ultrasonication water bath. Cell debris was removed by centrifugation (20 min at 48’384 rcf and 4°C) and the cleared lysate was loaded on 2 mL of Ni-Sepharose 6 Fast Flow (GE Healthcare) in a gravity-flow column. The column was extensively washed with lysis buffer before the protein was eluted with elution buffer (50 mM Tris, pH 6.8; 200 mM imidazole; 100 mM NaCl). The main fractions containing RsGD were analyzed by SDS-PAGE; pure fractions (> 95 %) were pooled and dialyzed twice against 1 L of buffer A (20 mM Tris, pH 6.8; 1 mM MnCl2; 10% sucrose) and then once against 200 mL of buffer B (20 mM Tris, pH 6.8; 1 mM MnCl2; 30% sucrose). The protein solution was sterile-filtered through a 0.2 µm filter and stored at 4°C. KpRD was purified in the same way as RsGD, except for the following modifications: the wet pellet was resuspended in 15 mL of KpRD-lysis buffer (50 mM Tris, pH 8.0; 100 mM NaCl; 20 mM imidazole; 0.2 mg mL-1 lysozyme) and the IMAC-purified KpRD was dialyzed twice against 2 L of buffer C (20 mM Tris, pH 8.0) before being aliquoted and stored at -80°C. Protein concentrations were determined spectrophotometrically at 280 nm for both RsGD (MW = 27.5 kDa, ε = 21.1 x 10^3 M-1 cm-1) and KpRD (MW = 27.4 kDa, ε = 35.1 x 10^3 M-1 cm-1).
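The spectrophotometric concentration determination amounts to the Beer-Lambert law, c = A280 / (ε · l), followed by multiplication with the molecular weight. A minimal sketch using the extinction coefficients and molecular weights stated above is given below; the 1 cm path length, the dilution factor and the example absorbance of 1.0 are assumptions for illustration.

```python
# (extinction coefficient in M^-1 cm^-1, molecular weight in g/mol), as stated in the text.
PROTEINS = {
    "RsGD": (21.1e3, 27.5e3),
    "KpRD": (35.1e3, 27.4e3),
}


def concentration_mg_per_ml(
    a280: float, protein: str, path_cm: float = 1.0, dilution: float = 1.0
) -> float:
    """Protein concentration from A280 via Beer-Lambert (path length and dilution are assumed inputs)."""
    epsilon, mw = PROTEINS[protein]
    molar = a280 * dilution / (epsilon * path_cm)  # mol/L
    return molar * mw  # g/L, which equals mg/mL


for name in PROTEINS:
    print(name, round(concentration_mg_per_ml(1.0, name), 3), "mg/mL at A280 = 1.0")
```

With these constants an absorbance of 1.0 in a 1 cm cuvette corresponds to roughly 1.30 mg mL-1 of RsGD and 0.78 mg mL-1 of KpRD.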

4.5.4. D-Psicose quantification assay using KpRD

For the qualitative determination of the conversion of D-fructose to D-psicose, a screening assay was developed based on the reduction of D-psicose by the enzyme KpRD and the coenzyme NADH. The screening assay was performed in two steps. First, epimerization of D-fructose to D-psicose by a PcDTE enzyme variant was allowed to proceed for a defined time period (see below for details). Second, NADH and KpRD were added, which converted D-psicose to allitol. The concomitant oxidation of NADH can be followed spectrophotometrically at 340 nm. For simplicity, the PcDTE reaction was not stopped before KpRD was added; this assay therefore allows only a qualitative comparison of PcDTE activity, which is however sufficient for screening purposes. As D-fructose is also to a small extent a substrate for KpRD, a calibration curve for different starting concentrations of D-psicose in the presence of D-fructose was recorded by following the rate of NADH consumption at 340 nm in a Perkin Elmer Wallac 1420 Victor plate reader (Perkin Elmer, MA, USA). Six different calibration samples were prepared in 50 mM Tris buffer (pH 8.0) with the following concentrations (in mM) of D-fructose: 100; 99; 98; 95; 90; 80. The difference to a total hexose concentration of 100 mM was made up by D-psicose. Next, an aliquot of 200 µL of each calibration sample was supplied with 1 mM NADH (Gerbu Biotechnik GmbH, Heidelberg, Germany) and 25 µg of KpRD in a 96-well microplate (Greiner Bio-One, Germany), and the rates of NADH consumption were recorded at 30°C. This rate was a linear function of the D-psicose concentration at least in the range of 0 to 10 % (Figure S4.2a), and this part was then used as the calibration curve to determine the amount of D-psicose in the screening assay.
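The use of the linear part of the calibration curve can be sketched as an ordinary least-squares line that is then inverted to convert a measured NADH consumption rate into a D-psicose concentration. The D-fructose concentrations are those of the six calibration samples; the rates are hypothetical placeholder values in arbitrary units (including a non-zero background from D-fructose), since the measured data are shown only in Figure S4.2a.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


fructose_mM = [100, 99, 98, 95, 90, 80]       # calibration samples from the text
psicose_mM = [100 - f for f in fructose_mM]    # remainder to 100 mM total hexose
rates = [2.0, 3.5, 5.0, 9.5, 17.0, 27.0]       # hypothetical rates, arbitrary units

# Use only the range reported as linear (0 to 10 % D-psicose).
linear = [(p, r) for p, r in zip(psicose_mM, rates) if p <= 10]
slope, intercept = fit_line(*zip(*linear))


def psicose_from_rate(rate: float) -> float:
    """Invert the calibration line to obtain the D-psicose concentration in mM."""
    return (rate - intercept) / slope


print(round(psicose_from_rate(11.0), 2), "mM D-psicose")
```

The intercept absorbs the background activity of KpRD on D-fructose, which is why the calibration is recorded in the presence of D-fructose and not with D-psicose alone.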

4.5.5. L-Tagatose quantification assay using RsGD

Detection of the epimerization of L-sorbose to L-tagatose was done similarly as outlined above for the detection of D-psicose. L-Tagatose formed from L-sorbose was detected using RsGD, which preferentially reduces L-tagatose to galactitol with concomitant oxidation of NADH. A calibration curve for L-tagatose in the presence of L-sorbose was generated in a similar fashion as for D-psicose and KpRD. In short, 200 µL of L-tagatose calibration sample in 50 mM Tris buffer (pH 8.0) was supplied with 1 mM NADH, 1 mM MgCl2 and 4.8 µg of RsGD in a 96-well microplate, and NADH decline rates were recorded at 30°C. The linear part of the slope was used as the calibration curve (see Figure S4.2b).

4.5.6. Cloning of PcDTE Var8 and library generation

The gene of thermostabilized D-tagatose epimerase from P. cichorii (PcDTE Var8)[106] was amplified using primers DTEci_HindIII_f and DTEci-ss_EcoRI_r and inserted into pAB92, giving plasmid pAB174 that served as template for mutant library generation. Saturation mutagenesis libraries on pAB174 were generated as described previously [106]. Primers (listed in Table S4.5) with NNK-degenerated codons [76] were used to randomize selected sites according to the QuikChange protocol (Stratagene) using Phusion High-Fidelity Polymerase (NEB). The product was digested directly in the polymerase buffer with 10 U of DpnI for at least 2 h at 37°C in order to remove the template before 5 µL were used to transform 70 µL of chemo-competent E. coli Top10 cells.
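The composition of an NNK-randomized position can be enumerated explicitly. The sketch below builds the standard genetic code table, lists all codons matching NNK (N = A, C, G or T; K = G or T) and counts how often each amino acid is encoded. The variable and function names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def nnk_codons():
    """All codons matching the degenerate pattern NNK."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


codons = nnk_codons()
counts = Counter(CODON_TABLE[c] for c in codons)
amino_acids = sorted(aa for aa in counts if aa != "*")

print(len(codons), "codons,", len(amino_acids), "amino acids,", counts["*"], "stop codon")
print({aa: n for aa, n in counts.items() if n > 1})
```

NNK thus covers all 20 amino acids with 32 codons and retains only the TAG stop codon, at the cost of an uneven representation (three codons each for Leu, Ser and Arg, one for several others).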


Table 4.1| Primers used in this work with restriction sites underlined
Nr  Name              Sequence
1   KpRD_N6H_for      5’-ATCATAAGCTTATGCACCACCACCACCACCACGCTAGCAAACATAGCGTGAGCAGC-3’
2   KpRD_s_rev        5’-ATCATGAATTCTTACAGATCAACGCTATTCGG-3’
3   pBR322_for-AscI   5’-ATATATGGCGCGCCCCCGCCGCATCCATACCGC-3’
4   pBR322_rev-FseI   5’-ATATATGGCCGGCCCCGTAGAAAAGATCAAAGG-3’
5   Ptet-SEVA_for     5’-ATCATACTAGTGCTTAAGACCCACTTTC-3’
6   Ptet-SEVA_rev     5’-ATCATAAGCTTATATCTCCTTCTTAAAG-3’
7   RsGDH_NheI_for    5’-ATCATGCTAGCGACTACAGGACGGTTTTTCG-3’
8   RsGDH_s_rev       5’-ATCATGAATTCTCACCAGACCGTGTAACCGC-3’
9   DTEci_HindIII_f   5’-ATCATAAGCTTATGAATAAAGTGGGCATG-3’
10  DTEci-ss_EcoRI_r  5’-ATCATGAATTCTTATGCCAGTTTATCACGAAC-3’

Table 4.2| Plasmids used in this work
Nr  Name                 Description                                                                                 Reference
1   pMA-T-KpRD           Cloning vector carrying full-length codon optimized KpRD sequence                           Geneart, this work
2   pRK793               pMal-C2 vector backbone, MBP-TEV fusion, bla resistance gene, ori pBR322                    Kapust et al. [199]
3   pSEVA131             SEVA vector backbone, bla resistance gene, MCS, ori pBBR1                                   Silva-Rocha et al. [200]
4   pAB73                SEVA vector backbone, bla resistance gene, MCS, ori pBR322                                  this work
5   pKTS                 Ptet-PT7 fusion promoter, bla resistance gene, ori pMB1                                     Neuenschwander et al. [88]
6   pAB92                SEVA vector backbone, bla resistance gene, Ptet-PT7 fusion promoter, MCS, ori pBR322        this work
7   pSEVA231             SEVA vector backbone, kamR resistance gene, MCS, ori pBBR1                                  Silva-Rocha et al. [200]
8   pAB228               pAB73 with kamR resistance gene                                                             this work
9   pAB139               pAB228 with codon-optimized KpRD-gene from K. pneumoniae containing an N-terminal 6His-tag  this work
10  pAB140               pAB139 with wild-type RsGD-gene from R. sphaeroides, N-terminal 6His-tag                    this work
11  pKTS-PcDTE-Var8-C6H  Thermostable D-tagatose epimerase (PcDTE) in pKTS-C6H backbone, C-terminal 6His tag         Bosshart et al. [106]
12  pAB174               pAB92 with thermostable PcDTE Var8, C-terminal 6His tag                                     this work

4.5.7. Expression and screening of saturation-mutagenesis libraries of PcDTE Var8

Clones from saturation mutagenesis libraries were expressed as described previously [106]. Briefly, 93 single clones from each saturation mutagenesis library were picked from an LB-agar plate, inoculated into 0.5 mL of LB supplemented with 100 µg mL-1 ampicillin in a 96-deep-well plate and grown overnight at 37°C. Three wells per plate were inoculated with PcDTE Var8 (or the corresponding parent sequence in later rounds) as control. An aliquot of 20 µL of these precultures was used to inoculate the expression cultures in 96-deep-well plates containing 1 mL of ZYM-505 medium [174] supplemented with 100 µg mL-1 ampicillin, and 160 µL of the precultures was supplemented with 20 % glycerol and stored at -80°C. Expression culture plates were incubated for 4 h at 37°C before 200 ng mL-1 of anhydrotetracycline (aTc) was added to induce protein expression. The temperature was reduced to 30°C and protein expression took place for 16 to 20 h. Cells were harvested by centrifugation (4,000 rcf, 10 min) and the plates were stored at -20°C until further use. Cells were lysed by addition of 150 µL lysis buffer (50 mM Tris (pH 8.0), 0.2 mg mL-1 lysozyme) per well and incubation for 20 min at 30°C with shaking (1,500 rpm). The lysate-containing plates were frozen for 20 min at -80°C and thawed again at 30°C with shaking, and MnCl2 was added to a final concentration of 1 mM together with some crystals of DNaseI to reduce viscosity. An aliquot of 100 µL of lysate was transferred to a 96-well PCR plate (Vaudaux-Eppendorf), heat-treated for 10 min at 70°C and immediately cooled on ice before cell debris and precipitated proteins were removed by centrifugation (4,000 rcf, 10 min, 4°C). The activity assay was performed in 96-well flat-bottom microplates (Greiner Bio-One) by addition of 20 µL of heat-treated lysate to 100 µL of 240 mM D-fructose (IDF evolution) or L-sorbose (ILS evolution) in 50 mM Tris (pH 8.0). In later rounds, heat-treated cell lysate was diluted up to 7.5-fold in 50 mM Tris (pH 8.0) before addition to the substrate to account for the increasing activity. The assay plate was incubated for 1 h at 37°C (reduced in later rounds first to 20, then to 10 min for IDF variants) before 120 µL of developing solution was added (50 mM Tris (pH 8.0), 1 mM NADH, 40 µg mL-1 KpRD (IDF) or 80 µg mL-1 RsGD (ILS)) and the reaction progress was monitored at 340 nm and 30°C in a Perkin Elmer Wallac 1420 Victor plate reader (Perkin Elmer). From the slope of NADH oxidation, the concentration of D-psicose or L-tagatose was determined using the KpRD or RsGD calibration curve (see above).

To verify the hits from this initial qualitative screening, the most active clones (> 120% activity of parent) were regrown in triplicate in 96-well plates, lysed and heat-treated as described above, and the epimerization reaction was started as described above. After 1 h of incubation (20 or 10 min in later rounds), 20 µL of the reaction was stopped in 145 µL of 0.1 M HCl, followed by the addition of 135 µL of 0.1 M NaOH after 5 min. Conversion of D-fructose to D-psicose (for IDF) or L-sorbose to L-tagatose (for ILS) was determined by HPLC using an LC ICS-3000 system (Dionex, Olten, Switzerland) equipped with a CarboPac PA1 column (250 mm x 4 mm I.D.) preceded by a CarboPac PA1 guard column (50 mm x 4 mm I.D.) (both Dionex, Olten, Switzerland). Samples were eluted isocratically with 30 mM NaOH at a flow rate of 2.0 mL min-1 and detected by triple pulsed amperometry using an EC detector with a gold electrode (all Dionex, Olten, Switzerland). The best verified hit from each round was then used for detailed characterization as described below.
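As a rough check on the sampling depth of the screen described in this section (93 clones picked per NNK saturation library), the expected coverage of the 32 possible NNK codons can be estimated. The sketch assumes that all codons are equally frequent in the library and that every picked clone is an independent draw without parental carry-over; both are idealizations, and the function names are illustrative.

```python
import math

N_CODONS = 32  # number of distinct codons encoded by NNK


def codon_coverage(n_clones: int, n_codons: int = N_CODONS) -> float:
    """Probability that one specific codon occurs at least once among n_clones."""
    return 1.0 - (1.0 - 1.0 / n_codons) ** n_clones


def clones_needed(target: float, n_codons: int = N_CODONS) -> int:
    """Smallest number of clones giving at least the target per-codon coverage."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / n_codons))


print(f"coverage with 93 clones: {codon_coverage(93):.3f}")
print("clones for 95 % coverage:", clones_needed(0.95))
```

Under these assumptions, one 96-well plate with three control wells samples each codon with a probability of about 95 %, which is consistent with screening a single plate per randomized position.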

4.5.8. Expression and purification of improved IDF and ILS variants

Variants IDF1 – IDF8 and ILS1 – ILS6 were expressed using the T7 expression system, utilizing the Ptet-PT7 fusion promoter (see Table 4.2). Plasmids encoding these variants were isolated from E. coli Top10 cells and used for transformation of chemo-competent E. coli BL21 (DE3) cells. All variants were expressed in ZYM-5052 autoinduction medium [174]. A volume of 1 mL of an overnight culture of the respective variant was used to inoculate 250 mL of ZYM-5052 medium supplemented with 100 µg mL-1 ampicillin in a 1 L Erlenmeyer flask and incubated for 16 h at 30°C and 220 rpm. Cells were harvested by centrifugation (6,238 rcf, 20 min, 4°C) and stored as a pellet at -20°C until further use. Cells were resuspended in 10 mL of lysis buffer (50 mM Tris (pH 8.0), 0.2 mg mL-1 lysozyme) and incubated for 30 min at room temperature. The cell suspension was then lysed by one freeze/thaw cycle (20 min at -80°C, thawing at room temperature) before MnCl2 was added to a final concentration of 1 mM and DNase (5 µL of a 5 mg mL-1 solution) was added to reduce viscosity. The cell lysate was heat-treated for 10 min at 70°C in a water bath before cell debris and denatured host proteins were removed by centrifugation (20 min at 48’384 rcf). The cleared lysate was applied to 2 mL of Ni-Sepharose 6 Fast Flow (GE Healthcare) in a gravity-flow column. The column was extensively washed with wash buffer (50 mM Tris (pH 8.0), 100 mM NaCl, 30 mM imidazole) before the protein was eluted with elution buffer (50 mM Tris (pH 8.0), 100 mM NaCl, 200 mM imidazole). Main fractions containing the PcDTE variants were pooled and dialyzed against buffer D (10 mM Tris (pH 8.0), 1 mM MnCl2) and then twice against 10 mM Tris (pH 8.0). Dialyzed proteins were then aliquoted and stored at -80°C. All variants showed >95% protein purity as judged by SDS-PAGE. Enzyme concentration was determined spectrophotometrically (see Table S4.6 for extinction coefficients and molecular weights of all variants).

4.5.9. Thermal denaturation curves

Plasmids encoding improved variants were used to transform E. coli BL21 (DE3) cells (see above) and single colonies were directly used to inoculate 25 mL of ZYM-5052 autoinduction medium supplemented with 100 µg mL-1 ampicillin. Cells were grown at 30°C for 16 h at 220 rpm and harvested by centrifugation (4,000 rcf, 10 min). Cell pellets were stored at -20°C until further use. Cells were resuspended in 5 mL of lysis buffer and lysed by a freeze/thaw cycle as described above. Cell debris was then removed by centrifugation (20,238 rcf, 5 min) and aliquots of 100 µL cleared lysate were distributed into 8 thin-walled PCR tubes. One tube was kept on ice, while 7 tubes were subjected to a temperature gradient (by default 70°C – 90°C; IDF1 70°C – 100°C) for 20 min using a PeqStar 2X Gradient Thermocycler (PeqLab, Sarisbury Green, UK). Samples were immediately transferred onto ice and precipitated protein was removed by centrifugation (20,238 rcf, 2 min, 4°C). Residual activity of each sample was then determined as described before [106]. Residual activity was plotted against temperature and data points were fitted to a second-order sigmoidal function using SigmaPlot 12.2 (Systat Software Inc., CA, USA) in order to obtain T50^20 values (the temperature at which 50% of the initial activity is retained after 20 min incubation).
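The curve-fitting step can be illustrated with a minimal Python sketch. It is not the SigmaPlot procedure used in this work: it assumes a simple two-parameter Boltzmann sigmoid (fully active at low temperature, inactive at high temperature) and fits it by a refined grid search so that only the standard library is needed. The function names and the synthetic data are hypothetical.

```python
import math

def sigmoid(temp, t50, slope):
    """Assumed model: fraction of initial activity left after heating at `temp`."""
    x = (temp - t50) / slope
    if x > 700:                      # avoid overflow in math.exp for very steep curves
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def fit_t50(temps, residual):
    """Least-squares fit of (t50, slope) by a repeatedly narrowed grid search."""
    def sse(t50, slope):
        return sum((sigmoid(t, t50, slope) - r) ** 2 for t, r in zip(temps, residual))

    t_lo, t_hi = min(temps), max(temps)
    s_lo, s_hi = 0.1, 10.0
    t50, slope = t_lo, s_lo
    for _ in range(6):               # each pass shrinks the search window fourfold
        grid = [(t_lo + i * (t_hi - t_lo) / 40, s_lo + j * (s_hi - s_lo) / 40)
                for i in range(41) for j in range(41)]
        t50, slope = min(grid, key=lambda p: sse(*p))
        t_span, s_span = (t_hi - t_lo) / 8, (s_hi - s_lo) / 8
        t_lo, t_hi = t50 - t_span, t50 + t_span
        s_lo, s_hi = max(0.05, slope - s_span), slope + s_span
    return t50, slope

if __name__ == "__main__":
    # Synthetic example: 7 gradient temperatures, activities normalized to the sample kept on ice.
    temps = [70.0, 73.0, 77.0, 80.0, 83.0, 87.0, 90.0]
    activities = [sigmoid(t, 82.0, 1.5) for t in temps]
    print(fit_t50(temps, activities))
```

Normalizing each heated sample to the sample kept on ice keeps the upper plateau fixed at 1, which is why only two parameters are fitted in this sketch.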

4.5.10. Enzyme kinetic measurements

Enzyme kinetic constants kcat and Km were determined from progress curves with 6 different ketohexose substrates (D-fructose, D-psicose, D-tagatose, D-sorbose, L-sorbose, L-fructose) epimerizing to the respective stereoisomer. In detail, 200 µL of a solution containing 50 mM sodium-phosphate buffer (pH 7.0) and different substrate concentrations (4.4 mM, 22 mM, 110 mM, 550 mM, 1.1 M, 2.2 M) was mixed with 20 µL of a solution containing different amounts of purified enzyme (76 µg to 184 µg) in a U-shaped 96-well plate (NUNC) on a BioShake iQ shaker (Q.Instruments, Jena, Germany) at 25°C and 1,200 rpm. Reactions were stopped by adding 20 µL of the reaction mix to 145 µL of 0.1 M HCl, followed by the addition of 135 µL of 0.1 M NaOH after 5 min. Conversion of the substrate to the respective epimer was determined by HPLC (see above). Kinetic parameters Km and kcat were obtained by fitting initial velocities to the Michaelis-Menten kinetic model using SigmaPlot 12.2 (Systat Software Inc., CA, USA).
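The Michaelis-Menten fit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SigmaPlot routine used here: for a fixed Km the best Vmax follows from linear least squares, so only Km is searched (golden-section search on log10 Km). The function name is hypothetical, and the example data are synthetic values generated from the IDF8/D-fructose parameters listed in Table S4.2.

```python
import math

def fit_michaelis_menten(substrate_mM, rates):
    """Return (vmax, km) minimizing the squared error of v = vmax * S / (km + S)."""
    def best_vmax(km):
        # For fixed km the model is linear in vmax, so the optimum is closed-form.
        x = [s / (km + s) for s in substrate_mM]
        return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)

    def sse(km):
        vmax = best_vmax(km)
        return sum((v - vmax * s / (km + s)) ** 2 for s, v in zip(substrate_mM, rates))

    # Golden-section search over log10(km), spanning well beyond the tested concentrations.
    lo = math.log10(min(substrate_mM) / 100.0)
    hi = math.log10(max(substrate_mM) * 100.0)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if sse(10 ** a) < sse(10 ** b):
            hi = b
        else:
            lo = a
    km = 10 ** ((lo + hi) / 2.0)
    return best_vmax(km), km

if __name__ == "__main__":
    conc = [4.4, 22.0, 110.0, 550.0, 1100.0, 2200.0]      # mM, as used in the assay
    v0 = [42.3 * s / (473.0 + s) for s in conc]            # synthetic rates per enzyme, s-1
    print(fit_michaelis_menten(conc, v0))
```

If the initial velocities are expressed per mole of enzyme, the fitted Vmax corresponds directly to kcat.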

4.5.11. Protein production and purification for crystallization

The three variants PcDTE Var8, PcDTE IDF8 and PcDTE ILS6 were expressed and purified in a first step as described above. In short, the variants were expressed in 1 L of ZYM-5052 autoinduction medium at 30°C and 120 rpm for 16 h. The cells were harvested by centrifugation and the pellet was frozen at -20°C until use. Cells were lysed as described above in 30 mL of lysis buffer (50 mM Tris (pH 8.0), 100 mM NaCl, 30 mM imidazole, 0.2 mg mL-1 lysozyme) and subjected to one freeze-thaw cycle (see above). The lysate was heat-treated for 10 min at 70°C and then centrifuged (20 min at 48,384 rcf). The clear supernatant of each variant was loaded on 2 x 2 mL Ni-Sepharose 6 Fast Flow (GE Healthcare), the loaded resin was extensively washed with lysis buffer, and protein was finally eluted using elution buffer (50 mM Tris (pH 8.0), 100 mM NaCl, 200 mM imidazole). The main fractions containing the pure protein were pooled and concentrated to 5 mL volume using an Amicon Ultra-4 (10 kDa cut-off, Millipore) centrifugal filter device. Concentrated protein was loaded on a Superdex 200 size-exclusion chromatography (SEC) column (GE Healthcare) that had been equilibrated with a solution containing 20 mM Tris (pH 8.0) and 100 mM NaCl. All three variants eluted as one single peak at retention times corresponding to a homodimeric quaternary structure (67.6 kDa) as determined from a calibration curve from proteins of a SEC calibration kit (Sigma Aldrich). Main fractions were analyzed by SDS-PAGE and found to be virtually pure. They were pooled and the proteins were concentrated using an Amicon Ultra-4 (10 kDa cut-off, Millipore) centrifugal filter device before dialysis once against 10 mM Tris (pH 8.0), 1 mM MnCl2 and then extensively against 10 mM Tris (pH 8.0). Protein concentrations were determined spectrophotometrically at 280 nm using extinction coefficients calculated using the online tool ProtParam [201] (see Table S4.6) and were 17.4 mg mL-1 (Var8), 17.0 mg mL-1 (IDF8), or 15.9 mg mL-1 (ILS6). The protein solutions were aliquoted, flash-frozen on dry ice and stored at -80°C until further use.
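The conversion from absorbance at 280 nm to a mass concentration follows the Beer-Lambert law and can be sketched as below. The extinction coefficient and molecular weight are those listed for PcDTE Var8 in Table S4.6; the absorbance reading, dilution factor and path length in the example are hypothetical, as is the function name.

```python
def protein_conc_mg_per_ml(a280, ext_coeff_per_M_cm, mol_weight_da,
                           path_cm=1.0, dilution=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), then mol L-1 -> g L-1 (= mg mL-1)."""
    molar = a280 * dilution / (ext_coeff_per_M_cm * path_cm)   # mol L-1
    return molar * mol_weight_da                                # g L-1, i.e. mg mL-1

if __name__ == "__main__":
    # Hypothetical reading: A280 = 0.5 measured on a 50-fold diluted sample of Var8.
    print(protein_conc_mg_per_ml(0.5, 46410, 33788, dilution=50))
```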

4.5.12. Crystallization of PcDTE Var8, IDF8 and ILS6

Crystals were obtained by the sitting-drop vapor diffusion method (IDF8 and ILS6) in Intelli-Plates 96 (Hampton Research, CA, USA) or MRC2 crystallization plates (Molecular Dimensions, Suffolk, UK), or by the hanging-drop vapor diffusion method (Var8) in ComboPlates on siliconized cover slides (Jena Bioscience, Jena, Germany). All crystallization trials were performed at 20°C. Initial crystallization hits were obtained using the commercial screens JBScreen Classic 5 (Jena Bioscience, Jena, Germany) for PcDTE Var8, NeXtal PEGs-Suite (Qiagen, VA, USA) for PcDTE IDF8 and PEGRx HT (Hampton Research, CA, USA) for PcDTE ILS6. Crystallization conditions were optimized by varying the precipitant concentration and the pH around the initial hit conditions. The optimized crystallization conditions were as follows: 0.1 M Tris-HCl (pH 7.7), 18% (w/v) PEG 8K, 0.2 M Li2SO4 for PcDTE Var8; 0.1 M Na-HEPES (pH 7.5), 25% (w/v) PEG 6K for PcDTE IDF8; 0.1 M MES (pH 5.7), 10% (w/v) PEG 4K for PcDTE ILS6. Crystals of PcDTE Var8 were cryoprotected by dipping into mother liquor complemented with 20% (v/v) glycerol and flash-frozen in liquid nitrogen. Crystals of PcDTE IDF8 were soaked for 10 min in mother liquor complemented with 0.8 M D-fructose and crystals of PcDTE ILS6 were soaked for 3 min in mother liquor complemented with 0.6 M L-sorbose. Soaked crystals were then flash-frozen without addition of any further cryoprotectant.

4.5.13. pH-dependent activity profile

Initial catalytic rates were determined at pH 5.0 – 9.0 at a final substrate concentration of 90 mM, with D-fructose as substrate for PcDTE WT, Var8 and IDF8 and L-sorbose as substrate for PcDTE ILS6. The pH was set with 80 mM acetate buffer (pH 5.0 – 6.0), 80 mM phosphate buffer (pH 6.0 – 7.0) or 80 mM Tris buffer (pH 7.0 – 9.0). Enzyme stocks were stored in 10 mM phosphate buffer (pH 7.0) supplemented with 1 mM MnCl2. The reactions were performed at 25°C in a 96-well plate as described above for the enzyme kinetic measurements. Reactions were stopped at 4 different time-points by adding 20 µL of the reaction mix to 145 µL of 0.1 M HCl. An aliquot of 135 µL of 0.1 M NaOH was added after 5 min and conversion to the respective product was determined by HPLC as described above. pH rate data were fitted for variants PcDTE WT, Var8 and ILS6 using Equation 2 [202].

$$V = \frac{V_{max}}{1 + 10^{\,pK_{a,app} - pH}}$$

Equation 2

with V as the pH-dependent reaction rate, Vmax as the maximum reaction rate and pKa,app as the apparent pKa value of the acidic groups. For PcDTE IDF8 a modified Hill equation [192] was used for fitting:

$$V = \frac{V_{max}}{1 + 10^{\,n\,(pK_{a,app} - pH)}}$$

Equation 3

with V as the pH-dependent reaction rate, n as the Hill coefficient, Vmax as the maximum reaction rate and pKa,app as the apparent pKa value of the acidic groups.
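The behaviour of the two pH-rate models can be explored with a short Python sketch. It assumes that the equations take the common single-ionization form V = Vmax / (1 + 10^(pKa,app - pH)) and its Hill-type generalization with exponent n; this form is an assumption made for illustration. The pKa,app and n values in the example are those reported in Table S4.1; the function names are hypothetical.

```python
def rate_single_pka(ph, vmax, pka_app):
    """Assumed single-ionization model: activity rises as the acidic group deprotonates."""
    return vmax / (1.0 + 10.0 ** (pka_app - ph))

def rate_hill(ph, vmax, pka_app, n):
    """Assumed Hill-type variant: n > 1 steepens the transition around pKa,app."""
    return vmax / (1.0 + 10.0 ** (n * (pka_app - ph)))

if __name__ == "__main__":
    for ph in (5.0, 6.0, 7.0, 8.0, 9.0):
        wt = rate_single_pka(ph, 1.0, 6.05)          # PcDTE WT, pKa,app from Table S4.1
        idf8 = rate_hill(ph, 1.0, 6.70, 4.2)         # PcDTE IDF8, pKa,app and n from Table S4.1
        print(f"pH {ph:.1f}: WT {wt:.3f} Vmax, IDF8 {idf8:.3f} Vmax")
```

In both models the rate equals Vmax/2 at pH = pKa,app, so the Hill coefficient changes only the steepness of the profile, not its midpoint.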

4.5.14. Crystallographic methods

The X-ray diffraction datasets for all crystals were collected at the PXI or PXIII beamline of the Swiss Light Source, Villigen, Switzerland. The numbers of collected diffraction images were 720 images for PcDTE Var8 (Δφ = 0.25°, t = 0.125 s), 1440 images for IDF8 (Δφ = 0.25°, t = 0.1 s) and 1440 images for ILS6 (Δφ = 0.25°, t = 0.125 s). Diffraction datasets were processed with iMOSFLM [203] and scaled using Aimless [204] from the CCP4 suite. A set of 5% of the reflections (Rfree set) was set aside for cross-validation [205]. The molecular replacement solution for all three PcDTE structures was obtained using the PcDTE wildtype structure (PDB ID 2QUL) as the search model in the program MOLREP [206]. Refinement of structures was performed using REFMAC5 [207] and model building was carried out using COOT [208]. Model validation was carried out with PROCHECK [209].

Table 4.3| Data collection and structure refinement statistics.

Crystal structure            PcDTE Var8                 PcDTE IDF8                 PcDTE ILS6
                             (PDB: 4Q7I)                (PDB: 4PFH)                (PDB: 4PGL)
Data collection
Space group                  C 1 2 1                    P 1 21 1                   P 1 21 1
Cell axes, a, b, c (Å)       110.47, 47.50, 124.73      57.46, 86.74, 61.82        102.8, 47.44, 126.4
Angles, α, β, γ (°)          90, 103.68, 90             90, 90.1, 90               90, 102.5, 90
Resolution (Å)               53.73 – 1.80 (1.85 – 1.80)*  56.74 – 1.9 (1.94 – 1.9)  36.85 – 2.1 (2.15 – 2.1)
Unique reflections           58427 (3461)               46594 (2988)               69460 (4447)
Multiplicity                 3.1 (2.9)                  5.4 (5.5)                  6.2 (5.8)
Rmerge (%)                   5 (33.6)                   11 (53.5)                  9.9 (42.5)
I/σ(I)                       11.25 (3.37)               9.6 (4.6)                  12.1 (3.7)
Completeness (%)             99.5 (99.6)                99.2 (71.0)                99.1 (98.4)
Refinement
Rwork / Rfree (%)            17.6 / 20.7                14.4 / 17.7                15.2 / 18.1
RMSD
  Bond length (Å)            0.013                      0.013                      0.011
  Bond angles (°)            1.485                      1.492                      1.398
No. of atoms
  Protein                    4698                       4814                       9384
  Ligand                     29                         48                         156
  Metals                     4                          3                          5
  Water                      469                        420                        381
Average B-factor (Å2)
  Protein (main chain)       6.2                        6.6                        13.3
  Protein (side chain)       8.2                        9.7                        17.5
  Metals                     23.1                       15.7                       22.3
  Water                      35.9                       26.93                      25.4
  Other ligands              36.0                       26.6                       38.9
Ramachandran statistics (%)
  Favored regions            98.3                       97.8                       97.8
  Allowed regions            1.6                        2.2                        2.2
* The highest resolution shell is shown in parentheses.

4.5.15. Structural analysis

The substrate entry tunnel was analyzed by the PyMOL plugin Caver 3.0 [151], potential hydrogen bonds and salt bridges were predicted by PyMOL or the PDBePISA webserver (http://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver). All figures were prepared with PyMOL.

4.6. Acknowledgment

The authors want to thank the beamline staff of the Swiss Light Source (SLS) for their support during data collection. A.B. is indebted to the Swiss National Science Foundation for funding (grant 200021-121918).

4.7. Supporting Material

KpRD (ribitol dehydrogenase from K. pneumoniae): MHHHHHHASKHSVSSMNTSLSGKVAAITGAASGIGLECARTLLGAGAKVVLIDREGEKLNKLVAELGENAF ALQVDLMQADQVDNLLQGILQLTGRLDIFHANAGAYIGGPVAEGDPDVWDRVLHLNINAAFRCVRSVLPH LIAQKSGDIIFTSSIAGVVPVIWEPVYTASKFAVQAFVHTTRRQVAQYGVRVGAVLPGPVVTALLDDWPKAK MDEALANGSLMQPIEVAESVLFMVTRSKNVTVRDIVILPNSVDL

RsGD (galactitol dehydrogenase from R. sphaeroides): MHHHHHHASDYRTVFRLDGACAAVTGAGSGIGLEICRAFAASGARLILIDREAAALDRAAQELGAAVAARIV ADVTDAEAMTAAAAEAEAVAPVSILVNSAGIARLHDALETDDATWRQVMAVNVDGMFWASRAFGRAM VARGAGAIVNLGSMSGTIVNRPQFASSYMASKGAVHQLTRALAAEWAGRGVRVNALAPGYVATEMTLKM RERPELFETWLDMTPMGRCGEPSEIAAAALFLASPAASYVTGAILAVDGGYTVW

PcDTE Var8: MNKVGMFYTYWSTEWMVDFPATAKRIAGLGFDLMEISLGEFHNLSDAKKRELKAVADDLGLTVMCCIGLK SEYDFASPDKSVRDAGTEYVKRLLDDCHLLGAPVFAGLTFCAWPQHPPLDMVDKRPYVDRAIESVRRVIKVA EDYGIIYALEVVNRYEQWLCNDAKEAIAFADAVDSPACKVQLDTFHMNIEENSFRDAILACKGKMGHFHLG EQNRLPPGEGRLPWDEIFGALKEIGYDGTIVMEPFMRTGGSVSRAVCVWRDLSNGATDEEMDERARRSLQ FVRDKLALEHHHHHH

PcDTE IDF8: (T9T), S37N, G39E, T109N, H209V, L212I, S256G, A258D MNKVGMFYTYWSTEWMVDFPATAKRIAGLGFDLMEINLEEFHNLSDAKKRELKAVADDLGLTVMCCIGLK SEYDFASPDKSVRDAGTEYVKRLLDDCHLLGAPVFAGLNFCAWPQHPPLDMVDKRPYVDRAIESVRRVIKV AEDYGIIYALEVVNRYEQWLCNDAKEAIAFADAVDSPACKVQLDTFHMNIEENSFRDAILACKGKMGVFHI GEQNRLPPGEGRLPWDEIFGALKEIGYDGTIVMEPFMRTGGSVGRDVCVWRDLSNGATDEEMDERARRSL QFVRDKLALEHHHHHH

PcDTE ILS6: T9S, G39S, T109N, V153A, Q183H, M245I MNKVGMFYSYWSTEWMVDFPATAKRIAGLGFDLMEISLSEFHNLSDAKKRELKAVADDLGLTVMCCIGLK SEYDFASPDKSVRDAGTEYVKRLLDDCHLLGAPVFAGLNFCAWPQHPPLDMVDKRPYVDRAIESVRRVIKV AEDYGIIYALEAVNRYEQWLCNDAKEAIAFADAVDSPACKVHLDTFHMNIEENSFRDAILACKGKMGHFHL GEQNRLPPGEGRLPWDEIFGALKEIGYDGTIVIEPFMRTGGSVSRAVCVWRDLSNGATDEEMDERARRSLQ FVRDKLALEHHHHHH

Figure S4.1| Amino acid sequences of KpRD, RsGD and PcDTE variants Var8, IDF8 and ILS6. Mutations in PcDTE variants are colored in red.
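The mutation lists given with the sequences above can be cross-checked by a position-wise comparison against the parent sequence. The Python sketch below illustrates this on the first 45 residues only (copied from the Var8, IDF8 and ILS6 sequences above); the function name is hypothetical, and silent mutations such as T9T cannot be detected at the protein level.

```python
def list_substitutions(parent, variant):
    """Return substitutions such as 'G39E' (1-based numbering) for equal-length sequences."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have the same length (no indels handled)")
    return [f"{p}{i}{v}"
            for i, (p, v) in enumerate(zip(parent, variant), start=1)
            if p != v]

# N-terminal 45 residues only; the full sequences are given in Figure S4.1.
VAR8_N45 = "MNKVGMFYTYWSTEWMVDFPATAKRIAGLGFDLMEISLGEFHNLS"
IDF8_N45 = "MNKVGMFYTYWSTEWMVDFPATAKRIAGLGFDLMEINLEEFHNLS"
ILS6_N45 = "MNKVGMFYSYWSTEWMVDFPATAKRIAGLGFDLMEISLSEFHNLS"

if __name__ == "__main__":
    print("IDF8:", list_substitutions(VAR8_N45, IDF8_N45))
    print("ILS6:", list_substitutions(VAR8_N45, ILS6_N45))
```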

[Figure S4.2, panels a) and b). Reaction schemes: a) D-Fru to D-Psi (PcDTE), D-Psi to allitol (KpRD, NADH to NAD+); b) L-Sor to L-Tag (PcDTE), L-Tag to galactitol (RsGD, NADH to NAD+). Plots: ΔAbs340nm/Δtime [A.U. h-1] versus a) % D-psicose and b) % L-tagatose (100 mM total sugar conc.). Linear fits shown in the plots: y = 23.485x + 38.019 (R² = 0.9992) and y = 3.3151x + 38.462 (R² = 1).]

Figure S4.2| a) Calibration curve for determination of D-psicose concentration in the presence of D-fructose using KpRD. D-Psicose and D-fructose (total concentration of 100 mM) were mixed in different ratios (0% - 20% of D-psicose/100% - 80% D-fructose) and the slope of NADH oxidation was recorded for each sample and plotted against the amount of D-psicose in the sample. b) Calibration curve for determination of L-tagatose concentration in the presence of L-sorbose using RsGD. As for a), L-tagatose and L-sorbose (100 mM total) were mixed in different ratios (0% - 30% of L-tagatose/100% - 70% L-sorbose) and slopes of NADH decline were plotted against L-tagatose concentration.
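Reading a sample off such a linear calibration curve amounts to inverting y = m x + b. The Python sketch below uses the two regression lines shown in the plots of Figure S4.2; their assignment to panel a) (D-psicose) and panel b) (L-tagatose) is inferred here from the axis ranges and should be treated as an assumption, as should the function name.

```python
# (slope, intercept) of the linear fits; panel assignment is an assumption (see above).
PSICOSE_FIT = (23.485, 38.019)    # presumed panel a), KpRD assay
TAGATOSE_FIT = (3.3151, 38.462)   # presumed panel b), RsGD assay

def percent_product(nadh_slope, fit):
    """Invert y = m*x + b to get % product at 100 mM total sugar from an NADH oxidation slope."""
    m, b = fit
    return (nadh_slope - b) / m

if __name__ == "__main__":
    # Hypothetical measured slope of 273 A.U. h-1 in the KpRD assay.
    print(round(percent_product(273.0, PSICOSE_FIT), 2), "% D-psicose")
```

Values outside the calibrated range (0% - 20% D-psicose, 0% - 30% L-tagatose) would be extrapolations and should not be reported.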

a) IDF lineage. Construct (screening round): Var8 (R1), IDF1 (R2), IDF2 (R3), IDF3 (R4), IDF4 (R5), IDF5 (R6), IDF6 (R7), IDF7 (R8), IDF8 (R9).

Sphere  Residue  Verified hit(s)  Best hit taken as parent  Targeted x-times
1st     Phe7                                                2
1st     Trp15                                               2
1st     Glu35                                               2
1st     Ser37    S37N             IDF1                      1
1st     Cys66                                               2
1st     Ile67                                               2
1st     Gly68                                               2
1st     Gly107                                              2
1st     Leu108                                              2
1st     Trp113                                              2
1st     Glu152                                              -
1st     Val154                                              2
1st     Asn155                                              1
1st     Glu158                                              1
1st     Gln183                                              2
1st     Asp185                                              1
1st     His188                                              -
1st     His211                                              1
1st     Arg217                                              1
1st     Glu246                                              -
1st     Phe248                                              2
1st     Val259                                              2
2nd     Tyr8                                                1
2nd     Thr9     T9T, T9T         IDF4                      2
2nd     Ile36                                               1
2nd     Leu38                                               1
2nd     Gly39    G39E             IDF3                      1
2nd     Glu40                                               1
2nd     Cys65                                               1
2nd     Leu69                                               1
2nd     Lys70                                               1
2nd     Ala106                                              1
2nd     Thr109   T109N            IDF6                      2
2nd     Pro114                                              1
2nd     Leu151                                              1
2nd     Val153                                              1
2nd     Leu184                                              1
2nd     Thr186                                              2
2nd     Phe187                                              1
2nd     His209   H209V            IDF2                      1
2nd     Phe210                                              1
2nd     Leu212   L212I            IDF7                      1
2nd     Val244   V244A                                      2
2nd     Met245                                              1
2nd     Pro247                                              1
2nd     Ser254                                              2
2nd     Val255                                              2
2nd     Ser256   S256G            IDF8                      2
2nd     Arg257                                              2
2nd     Ala258   A258D            IDF5                      1

b) ILS lineage. Construct (screening round): Var8 (R1), ILS1 (R2), ILS2 (R3), ILS3 (R4), ILS4 (R5), ILS5 (R6), ILS6 (R7).

Sphere  Residue  Verified hit(s)  Best hit taken as parent  Targeted x-times
1st     Phe7                                                1
1st     Trp15                                               1
1st     Glu35                                               2
1st     Ser37                                               1
1st     Cys66                                               1
1st     Ile67                                               1
1st     Gly68                                               1
1st     Gly107                                              1
1st     Leu108                                              1
1st     Trp113                                              1
1st     Glu152                                              -
1st     Val154                                              1
1st     Asn155                                              1
1st     Glu158                                              1
1st     Gln183   Q183H            ILS1                      1
1st     Asp185                                              -
1st     His188                                              -
1st     His211                                              1
1st     Arg217                                              -
1st     Glu246                                              -
1st     Phe248                                              1
1st     Val259                                              2
2nd     Tyr8                                                1
2nd     Thr9     T9T, T9S         ILS3                      2
2nd     Ile36                                               1
2nd     Leu38                                               1
2nd     Gly39    G39S, G39S       ILS4                      3
2nd     Glu40                                               1
2nd     Cys65    C65T, C65T                                 5
2nd     Leu69                                               1
2nd     Lys70                                               1
2nd     Ala106                                              1
2nd     Thr109   T109N            ILS5                      1
2nd     Pro114                                              1
2nd     Leu151                                              1
2nd     Val153   V153A            ILS2                      1
2nd     Leu184                                              1
2nd     Thr186                                              1
2nd     Phe187                                              1
2nd     His209                                              1
2nd     Phe210                                              1
2nd     Leu212                                              1
2nd     Val244                                              1
2nd     Met245   M245I            ILS6                      2
2nd     Pro247                                              1
2nd     Ser254                                              1
2nd     Val255                                              1
2nd     Ser256                                              1
2nd     Arg257                                              1
2nd     Ala258                                              2

Figure S4.3| Complete lineage of a) IDF variants and b) ILS variants. Red squares indicate that no hit was found after screening of 93 variants of an NNK library at the respective position. Green squares indicate verified hits with improved activity in the heat-treated lysate. Please note that no distinction between mutants with increased expression level or real catalytic improvement could be made at this point. The best hit of a round was then taken as parent for the next round of screening (orange square). Residues that are underlined were excluded from screening (see text for details). Residues from the first sphere (all residues that have at least 1 atom within 5 Å from C3 of D-fructose in PDB 2QUN) were screened first, then the residues of the second sphere were addressed.

Table S4.1| Apparent pKa,app values for PcDTE WT, Var8, ILS6 and IDF8 from fitting initial velocities at different pH to a pH-rate equation (see Figure 4.11)

Enzyme variant   apparent pKa,app     Hill coefficient
PcDTE WT         6.05 +/- 0.05        N.D. a)
PcDTE Var8       4.49 +/- 0.02        N.D. a)
PcDTE ILS6       5.25 +/- 0.06        N.D. a)
PcDTE IDF8       6.70 +/- 0.03 b)     4.2 +/- 0.5 b)
a) N.D. indicates not determined; the pH-rate profile was fitted using Equation 2
b) pH-rate profile was fitted with Equation 3

Table S4.2| Enzyme kinetic parameters for IDF and ILS variants a)

         D-fructose     D-psicose      D-tagatose     D-sorbose      L-fructose     L-sorbose
Enzyme   kcat   Km      kcat   Km      kcat   Km      kcat   Km      kcat   Km      kcat   Km
         s-1    mM      s-1    mM      s-1    mM      s-1    mM      s-1    mM      s-1    mM
Var8     4.9    45.3    2.8    21.3    9.9    48.3    5.7    24.5    0.7    30.7    0.2    54.7
IDF1     11.0   115.1   6.2    22.5    4.8    51.7    2.3    16.6    0.8    28.1    0.3    55.2
IDF2     18.9   285.5   13.7   66.4    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.
IDF3     22.1   272.2   15.2   67.2    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.
IDF5     32.4   458     27.1   128     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.
IDF6     34.0   457     28.0   158     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.
IDF7     41.6   563     32.9   179     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.
IDF8     42.3   473     57.1   275     11.3   240     4.5    92.4    1.8    91.0    1.0    129
ILS1     3.7    19.2    4.9    9.0     8.4    81.2    2.1    12.4    0.6    12.7    1.6    51.4
ILS2     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    1.9    53.9
ILS3     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    2.1    66.4
ILS4     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    2.1    53.6
ILS5     N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    N.D.   N.D.    3.0    52.2
ILS6     4.1    32.9    4.2    11.2    4.1    44.3    1.3    10.6    0.4    8.7     3.2    63.0
a) Catalytic parameters were determined at 25°C (see methods for details)
N.D.: not determined
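To compare variants it can help to look at both the turnover number and the ratio kcat/Km. The Python sketch below does this for the parent and final variants on their target substrates, using the values from Table S4.2; the dictionary layout and function names are hypothetical and serve only as an illustration.

```python
# (kcat in s-1, Km in mM) taken from Table S4.2, measured at 25 °C.
KINETICS = {
    ("Var8", "D-fructose"): (4.9, 45.3),
    ("IDF8", "D-fructose"): (42.3, 473.0),
    ("Var8", "L-sorbose"): (0.2, 54.7),
    ("ILS6", "L-sorbose"): (3.2, 63.0),
}

def efficiency(kcat, km):
    """Catalytic efficiency kcat/Km in s-1 mM-1."""
    return kcat / km

def fold_change(parent_value, variant_value):
    """Ratio of the variant's value to the parent's value."""
    return variant_value / parent_value

if __name__ == "__main__":
    for variant, substrate in (("IDF8", "D-fructose"), ("ILS6", "L-sorbose")):
        k_par, km_par = KINETICS[("Var8", substrate)]
        k_var, km_var = KINETICS[(variant, substrate)]
        print(variant, substrate,
              "kcat fold:", round(fold_change(k_par, k_var), 1),
              "kcat/Km fold:", round(fold_change(efficiency(k_par, km_par),
                                                 efficiency(k_var, km_var)), 2))
```

Because Km of IDF8 for D-fructose increased along with kcat, the gain in kcat is not mirrored in kcat/Km; at the high substrate concentrations used in the process, kcat is the more relevant figure.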

Table S4.3| Results of pairwise structural alignment of all combinations of PDB structures by the DaliLite web-server (http://www.ebi.ac.uk/Tools/structure/dalilite/) [210]. The higher RMSD values for pairs with IDF8 presumably originate from the difference in structure of the C-terminal 6xHis tag (see Figure 4.5).

PDB 1        PDB 2   RMSD [Å]   Z-score
2QUN (WT)    Var8    0.3        51.9
2QUN (WT)    IDF8    0.4        50.7
2QUN (WT)    ILS6    0.4        51.4
Var8         IDF8    0.8        51.1
Var8         ILS6    0.4        52.7
IDF8         ILS6    0.9        51.2


Figure S4.4| Thermostability, expressed as T50^20 values, of all improved IDF and ILS variants: PcDTE Var8 (blue bar), which was the starting point of the activity screening, IDF variants (grey bars) and ILS variants (green bars). PcDTE WT (red bar) is given as reference for comparison. The final variant ILS6 retains the thermostability of PcDTE Var8 completely, whereas IDF8 has a significantly reduced thermostability compared to Var8 (ΔT50^20 of -11.8°C).

Figure S4.5| Superposition of PcDTE WT (colored orange, PDB ID: 2QUN) and PcDTE Var8 (colored blue, PDB ID: 4Q7I) with catalytic dyad E152-E246 and conserved residues D185, H188 and R217 shown as sticks. D-Fructose from the WT structure and glycerol from the Var8 structure are colored green and yellow, respectively. All depicted hydrogen bonds (dashed lines) are between 2.3 Å and 3.6 Å.


Figure S4.6| Interaction between the two subunits of PcDTE Var8 with chain A colored in dark red and chain B colored in dark yellow. The 2Fo-Fc electron density map is contoured at 1.7σ and dashed lines indicate hydrogen bonds with distances between 2.7 Å and 3.4 Å. The two H116 residues, one from each subunit, coordinate one Mn2+ ion; the two Mn2+ ions that are depicted have occupancies of 0.5 each. Water molecules are shown as red spheres.

Figure S4.7| Crystal contact between chain A (light blue surface) and chain B (cyan surface) of PcDTE IDF8 (PDB: 4PFH) and 6His-tag of neighboring asymmetric unit (dark red sticks). Active site residues are shown in yellow sticks and the entries of the active sites are indicated by black arrows. Potential hydrogen bonds are indicated as dashed lines with distances shown in angstrom (Å). The Mn2+ ion that is depicted is coordinated by the H116 residues from both subunits of the dimer, together with two histidines from the neighboring asymmetric unit.

Table S4.4| pKa values between 4 and 9 of PcDTE WT and IDF8 predicted by H++, a web-based computational prediction software of protonation states of macromolecules [193]. The Mn2+ ion was explicitly included in the prediction in the second and fourth data column. The three residues that show the most significant change in predicted pKa and are close to the active site are highlighted in bold.

            PcDTE IDF8   PcDTE IDF8   PcDTE WT    PcDTE WT
            w/o Mn2+     w/ Mn2+      w/o Mn2+    w/ Mn2+
Residue     pK(1/2)      pK(1/2)      pK(1/2)     pK(1/2)    Comment
NTMET-1     7.9          7.9          7.9         7.8        N-terminal amino group
GLU-35      0.0          5.6          0.0         0.0
GLU-39      5.9          5.9          -           -          Mutation G39E in IDF8
HID-42      8.1          8.1          7.8         7.5        far from active site
ASP-58      4.4          4.4          4.4         4.5
GLU-72      4.1          4.1          3.4         3.7
HIS-98      7.1          7.1          7.5         7.5        far from active site
HIS-116     6.7          6.8          -           -
ASP-144     3.8          3.8          4.0         3.9
ASP-164     3.6          3.6          2.8         4.1
HIS-188     0.0          0.0          4.8         0.0
GLU-193     5.2          5.7          0.7         5.2        far from active site
HIS-209     -            -            7.1         8.7        Mutation H209V in IDF8
GLU-229     4.3          4.3          3.1         3.9
GLU-246     6.4          0.0          4.9         0.0
GLU-272     4.9          4.9          4.1         4.1
GLU-273     4.4          4.4          4.6         4.7
ASP-287     3.7          3.8          5.0         5.0        far from active site

Table S4.5| Oligonucleotides for site-directed saturation mutagenesis

Nr  Name        Sequence
1   LibF7_f     5'-GTGGGCATGNNKTATACCTATTGGAGCACCG-3'
2   LibF7_r     5'-CAATAGGTATAMNNCATGCCCACTTTATTC-3'
3   LibW15_f    5'-GCACCGAANNKATGGTGGATTTTCCTGCAAC-3'
4   LibW15_r    5'-TCCACCATMNNTTCGGTGCTCCAATAGG-3'
5   LibE35_f    5'-TCTGATGNNKATTAGCCTGGGCGAATTTC-3'
6   LibE35_r    5'-GGCTAATMNNCATCAGATCAAAACCCAGAC-3'
7   LibS37_f    5'-GGAAATTNNKCTGGGCGAATTTCATAATC-3'
8   LibS37_r    5'-CGCCCAGMNNAATTTCCATCAGATCAAAACCC-3'
9   LibC66_f    5'-TATGTGTNNKATTGGTCTGAAAAGTGAATATG-3'
10  LibC66_r    5'-GACCAATMNNACACATAACGGTCAGACCC-3'
11  LibI67_f    5'-GTGTTGCNNKGGTCTGAAAAGTGAATATG-3'
12  LibI67_r    5'-TCAGACCMNNGCAACACATAACGGTCAG-3'
13  LibG68_f    5'-GTTGCATTNNKCTGAAAAGTGAATATG-3'
14  LibG68_r    5'-CTTTTCAGNNKAATGCAACACATAACGGTC-3'
15  LibG107_f   5'-TTTTGCCNNKCTGACCTTTTGTGCATGG-3'
16  LibG107_r   5'-AGGTCAGMNNGGCAAAAACCGGTGCACC-3'
17  LibL108_f   5'-GGTTTTTGCCGGTNNKACCTTTTGTGCATG-3'
18  LibL108_r   5'-CATGCACAAAAGGTMNNACCGGCAAAAACC-3'
19  LibW113_f   5'-TTTGTGCANNKCCTCAGCATCCACCGCTGG-3'
20  LibW113_r   5'-TGCTGAGGMNNTGCACAAAAGGTCAGACCGG-3'
21  LibV154_f   5'-TGGAAGTGNNKAATCGTTATGAACAGTGGCTG-3'
22  LibV154_r   5'-TAACGATTMNNCACTTCCAGGGCATAAATAATG-3'
23  LibN155_f   5'-GTGGTGNNKCGTTATGAACAGTGGCTG-3'
24  LibN155_r   5'-ATAACGMNNCACCACTTCCAGGGC-3'
25  LibE158_f   5'-CGTTATNNKCAGTGGCTGTGCAATGATG-3'
26  LibE158_r   5'-CCACTGMNNATAACGATTCACCACTTCC-3'
27  LibQ183_f   5'-CATGTAAAGTTNNKCTGGATACC-3'
28  LibQ183_r   5'-GGTATCCAGMNNAACTTTACATG-3'
29  LibH211_f   5'-CATTTTNNKCTGGGTGAACAGAATCGTC-3'
30  LibH211_r   5'-ACCCAGMNNAAAATGACCCATTTTACC-3'
31  LibF248_f   5'-AACCGNNKATGCGTACTGGTGGTAGC-3'
32  LibF248_r   5'-ACGCATMNNCGGTTCCATCACAATGG-3'
33  LibV258_f   5'-GTTAGCCGTGCANNKTGTGTTTGGC-3'
34  LibV258_r   5'-GCCAAACACAMNNTGCACGGCTAAC-3'
35  LibY8_f     5'-GTGGGCATGTTTNNKACCTATTGGAGC-3'
36  LibY8_r     5'-GCTCCAATAGGTMNNAAACATGCCCAC-3'
37  LibT9_f     5'-GCATGTTTTATNNKTATTGGAGCACC-3'
38  LibT9_r     5'-GGTGCTCCAATAMNNATAAAACATGC-3'
39  LibI36_f    5'-GATCTGATGGAANNKAATCTGGGCG-3'
40  LibI36_r    5'-CGCCCAGATTMNNTTCCATCAGATC-3'
41  LibL38_f    5'-CTGATGGAAATTAATNNKGGCGAATTTCATAATC-3'
42  LibL38_r    5'-GATTATGAAATTCGCCMNNATTAATTTCCATCAG-3'
43  LibG39_f    5'-GGAAATTAATCTGNNKGAATTTCATAATC-3'
44  LibG39_r    5'-GATTATGAAATTCMNNCAGATTAATTTCC-3'
45  LibE40_f    5'-GAAATTAATCTGGGCNNKTTTCATAATCTGTC-3'
46  LibE40_r    5'-GACAGATTATGAAAMNNGCCCAGATTAATTTC-3'
47  LibC65_f    5'-CTGACCGTTATGNNKTGCATTGGTCTG-3'
48  LibC65_r    5'-CAGACCAATGCAMNNCATAACGGTCAG-3'
49  LibL69_f    5'-ATTGGTNNKAAAAGTGAATATGATTTTGCC-3'
50  LibL69_r    5'-ACTTTTMNNACCAATGCAACACATAACGG-3'
51  LibK70_f    5'-GCATTGGTCTGNNKAGTGAATATG-3'
52  LibK70_r    5'-CATATTCACTMNNCAGACCAATGC-3'
53  LibA106_f   5'-GCACCGGTTTTTNNKGGTCTGACCTTTTG-3'
54  LibA106_r   5'-CAAAAGGTCAGACCMNNAAAAACCGGTGC-3'
55  LibT109_f   5'-GTTTTTGCCGGTCTGNNKTTTTGTGCATGG-3'
56  LibT109_r   5'-CCATGCACAAAAMNNCAGACCGGCAAAAAC-3'
57  LibP114_f   5'-CTTTTGTGCATGGNNKCAGCATCCACCG-3'
58  LibP114_r   5'-CGGTGGATGCTGMNNCCATGCACAAAAG-3'
59  LibL151_f   5'-CATTATTTATGCCNNKGAAGTGGTGAATC-3'
60  LibL151_r   5'-GATTCACCACTTCMNNGGCATAAATAATG-3'
61  LibV153_f   5'-GCCCTGGAANNKGTGAATCGTTATG-3'
62  LibV153_r   5'-CATAACGATTCACMNNTTCCAGGGC-3'
63  LibL184_f   5'-GTAAAGTTCAGNNKGATACCTTTCAC-3'
64  LibL184_r   5'-GTGAAAGGTATCMNNCTGAACTTTAC-3'
65  LibT186_f   5'-GTTCAGCTGGATNNKTTTCACATGAATATTG-3'
66  LibT186_r   5'-CAATATTCATGTGAAAMNNATCCAGCTGAAC-3'
67  LibF187_f   5'-CATCTGGATACCNNKCACATGAATATTG-3'
68  LibF187_r   5'-CAATATTCATGTGMNNGGTATCCAGATG-3'
69  LibH209_f   5'-GGTAAAATGGGTNNKTTTCATCTGGGTG-3'
70  LibH209_r   5'-CACCCAGATGAAAMNNACCCATTTTACC-3'
71  LibF210_f   5'-GTAAAATGGGTGTGNNKCATCTGGGTGAAC-3'
72  LibF210_r   5'-GTTCACCCAGATGMNNCACACCCATTTTAC-3'
73  LibL212_f   5'-GGTGTGTTTCATNNKGGTGAACAGAATC-3'
74  LibL212_r   5'-GATTCTGTTCACCMNNATGAAACACACC-3'
75  LibV244_f   5'-GATGGCACCATTNNKATGGAACCGTTTATG-3'
76  LibV244_r   5'-CATAAACGGTTCCATMNNAATGGTGCCATC-3'
77  LibM245_f   5'-GGCACCATTGTGNNKGAACCGTTTATG-3'
78  LibM245_r   5'-CATAAACGGTTCMNNCACAATGGTGCC-3'
79  LibP247_f   5'-CATTGTGATGGAANNKTTTATGCGTACTG-3'
80  LibP247_r   5'-CAGTACGCATAAAMNNTTCCATCACAATG-3'
81  LibS254_f   5'-CGTACTGGTGGTNNKGTTAGCCGTGATG-3'
82  LibS254_r   5'-CATCACGGCTAACMNNACCACCAGTACG-3'
83  LibV255_f   5'-CTGGTGGTAGCNNKAGCCGTGATG-3'
84  LibV255_r   5'-CATCACGGCTMNNGCTACCACCAG-3'
85  LibS256_f   5'-GTGGTAGCGTTNNKCGTGATGTTTG-3'
86  LibS256_r   5'-CAAACATCACGMNNAACGCTACCAC-3'
87  LibR257_f   5'-GGTAGCGTTAGCNNKGATGTTTGTG-3'
88  LibR257_r   5'-CACAAACATCMNNGCTAACGCTACC-3'
89  LibA258_f   5'-GCGTTAGCCGTNNKGTTGGTGTTTG-3'
90  LibA258_r   5'-CAAACACCAACMNNACGGCTAACGC-3'
Degenerate bases: N = A/T/G/C; K = G/T; M = A/C
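What an NNK codon encodes can be enumerated with a few lines of Python. The sketch below expands the degenerate bases defined above against the standard genetic code, shows that the reverse primers carry the reverse complement MNN, and estimates how likely a given codon is to be sampled at least once among 93 screened clones. The last figure assumes that all 32 codons are equally frequent and sampled independently, which is an idealization; all names in the code are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered T, C, A, G at each position ('*' = stop).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
DEGENERATE = {"N": "ACGT", "K": "GT", "M": "AC"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N", "K": "M", "M": "K"}

def expand(codon):
    """List all concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(DEGENERATE.get(b, b) for b in codon))]

def reverse_complement(seq):
    """Reverse complement, including the degenerate bases N, K and M."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def chance_codon_sampled(n_clones, n_codons=32):
    """Probability that one particular codon occurs at least once (idealized sampling)."""
    return 1.0 - (1.0 - 1.0 / n_codons) ** n_clones

if __name__ == "__main__":
    codons = expand("NNK")
    encoded = {CODON_TABLE[c] for c in codons}
    print(len(codons), "codons,", len(encoded - {"*"}), "amino acids,",
          sum(CODON_TABLE[c] == "*" for c in codons), "stop codon(s)")
    print("reverse primer codon:", reverse_complement("NNK"))
    print("chance a given codon is among 93 clones:", round(chance_codon_sampled(93), 3))
```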

Table S4.6| Extinction coefficients and molecular weight of PcDTE variants discussed in this work

Enzyme name   Extinction coefficient [M-1 cm-1]   Molecular weight [Da]
PcDTE Var8    46410                               33788
PcDTE IDF1    46410                               33815
PcDTE IDF2    46410                               33777
PcDTE IDF3    46410                               33850
PcDTE IDF5    46410                               33894
PcDTE IDF6    46410                               33906
PcDTE IDF7    46410                               33906
PcDTE IDF8    46410                               33876
PcDTE ILS1    46410                               33797
PcDTE ILS2    46410                               33769
PcDTE ILS3    46410                               33755
PcDTE ILS4    46410                               33786
PcDTE ILS5    46410                               33798
PcDTE ILS6    46410                               33780

Table S4.7| Non-covalent interactions between subunits at the dimeric interface of PcDTE WT (PDB ID: 2QUN) and PcDTE Var8 (PDB ID: 4Q7I). Interactions involving residues that were mutated in Var8 compared to WT are highlighted in yellow. This table was prepared based upon prediction of non-covalent interactions by the PDBePISA webserver (http://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver).

PcDTE WT (2QUN)                                 PcDTE Var8 (4Q7I)
Hydrogen bonds                                  Hydrogen bonds
#   Chain A       Distance [Å]  Chain B         #   Chain A       Distance [Å]  Chain B
1   PRO117[O]     3.1           ARG257[NH1]     1   LYS124[NZ]    2.83          TRP262[O]
2   PRO118[O]     2.66          ARG257[NH2]     2   ASN155[ND2]   3.19          TYR157[OH]
3   MET121[O]     3.73          ARG257[NH2]     3   ARG156[NH2]   3.31          ASN216[O]
4   LYS124[NZ]    3.11          TRP262[O]       4   ARG156[NH1]   3.85          VAL259[O]
5   ARG156[NH2]   3.9           ASN216[O]       5   ARG156[O]     2.85          TRP262[NE1]
6   ARG156[NH1]   3.22          VAL259[O]       6   TYR157[OH]    3.14          ASN155[ND2]
7   ARG156[O]     2.96          TRP262[NE1]     7   TYR157[OH]    2.44          GLU158[OE2]
8   ASP164[OD1]   3.08          ARG263[NE]      8   GLU158[OE2]   2.51          TYR157[OH]
9   MET189[O]     3.3           ARG224[NH2]     9   ASP164[OD2]   3.34          ARG263[NE]
10  ASN190[ND2]   2.96          ASN190[O]       10  MET189[O]     3.1           ARG224[NH2]
11  ASN190[O]     3.18          ASN190[ND2]     11  ASN190[ND2]   3.28          ASN190[O]
12  ASN190[O]     2.88          ARG224[NH2]     12  ASN190[O]     3.21          ASN190[ND2]
13  ASN190[OD1]   3.09          ARG224[NH1]     13  ASN190[OD1]   3.24          ARG224[NH1]
14  GLU192[O]     3.14          ASN216[ND2]     14  ASN190[O]     2.92          ARG224[NH2]
15  GLU192[O]     3.38          ARG263[NH2]     15  GLU192[O]     3.35          ASN216[ND2]
16  GLU192[OE1]   3.16          ARG263[NH1]     16  GLU192[OE1]   3.22          ARG263[NH1]
17  GLU192[OE1]   2.97          ARG263[NH2]     17  GLU192[OE1]   2.96          ARG263[NH2]
18  GLU193[O]     2.73          ARG224[NH2]     18  GLU192[O]     3.56          ARG263[NH2]
19  THR194[OG1]   2.9           ASN216[ND2]     19  GLU193[O]     3.49          GLN215[NE2]
20  ASN216[O]     3.59          ARG156[NH2]     20  GLU193[O]     2.8           ARG224[NH2]
21  ASN216[ND2]   3.81          GLU192[O]       21  ASN194[OD1]   3.82          GLN215[NE2]
22  ASN216[ND2]   3.23          THR194[OG1]     22  ASN194[OD1]   3.08          ASN216[ND2]
23  ARG224[NH1]   3.11          ASN190[OD1]     23  GLN215[NE2]   3.81          GLU193[O]
24  ARG224[NH2]   2.83          ASN190[O]       24  GLN215[NE2]   3.86          ASN194[OD1]
25  ARG224[NH2]   2.81          GLU193[O]       25  ASN216[O]     3.38          ARG156[NH2]
26  ARG257[NH1]   2.45          PRO117[O]       26  ASN216[ND2]   3.36          ASN194[OD1]
27  VAL259[O]     3.37          ARG156[NH1]     27  ARG224[NH2]   3.13          MET189[O]
28  TRP262[NE1]   2.88          ARG156[O]       28  ARG224[NH1]   3.32          ASN190[OD1]
29  ARG263[NE]    3.28          ASP164[OD1]     29  ARG224[NH2]   2.86          ASN190[O]
30  ARG263[NH2]   3.52          GLU192[O]       30  ARG224[NH2]   2.69          GLU193[O]
31  ARG263[NH2]   2.95          GLU192[OE1]     31  VAL259[O]     3.87          ARG156[NH1]
32  ARG263[NH1]   3.25          GLU192[OE1]     32  TRP262[NE1]   2.73          ARG156[O]
                                                33  ARG263[NH1]   3.74          ASN163[OD1]
                                                34  ARG263[NH2]   2.73          GLU192[OE1]

Salt bridges                                    Salt bridges
#   Chain A       Distance [Å]  Chain B         #   Chain A       Distance [Å]  Chain B
1   ASP164[OD1]   3.08          ARG263[NE]      1   ASP164[OD2]   3.34          ARG263[NE]
2   ASP164[OD1]   3.98          ARG263[NH2]     2   GLU192[OE1]   3.22          ARG263[NH1]
3   GLU192[OE1]   3.16          ARG263[NH1]     3   GLU192[OE1]   2.96          ARG263[NH2]
4   GLU192[OE1]   2.97          ARG263[NH2]     4   ARG263[NH1]   3.43          GLU192[OE1]
5   ARG263[NE]    3.28          ASP164[OD1]     5   ARG263[NH2]   2.73          GLU192[OE1]
6   ARG263[NH2]   2.95          GLU192[OE1]
7   ARG263[NH1]   3.25          GLU192[OE1]

CHAPTER 5: HIGHLY EFFICIENT PRODUCTION OF RARE SUGARS D-PSICOSE AND L-TAGATOSE BY TWO ENGINEERED D-TAGATOSE EPIMERASES

Andreas Bosshart, Nina Wagner, Lei Lei, Matthias Bechtold and Sven Panke

5.1. Abstract

Rare sugars are monosaccharides that do not occur in nature in large amounts. However, many of them demonstrate high potential as low-calorie sweeteners, chiral building blocks or active pharmaceutical ingredients. Their production by enzymatic means from broadly abundant epimers is an attractive alternative to synthesis by traditional organic chemical means, but often suffers from low space-time yields and high enzyme costs due to rapid enzyme degradation. Here we describe the detailed characterization of two variants of D-tagatose epimerase under operational conditions that were engineered for high stability and high catalytic activity towards the epimerization of D-fructose to D-psicose and L-sorbose to L-tagatose, respectively. A variant optimized for the production of D-psicose showed a very high total turnover number (TTN) of up to 10^8 catalytic events over a catalyst’s lifetime, determined under operational conditions at high temperatures in an enzyme-membrane reactor (EMR). Maximum space-time yields as high as 10.6 kg L-1 d-1 were obtained with a small laboratory-scale EMR, indicating excellent performance. A variant optimized for the production of L-tagatose was less stable in the same setting, but still showed a very good TTN of 5.8 x 10^5 and space-time yields of up to 478 g L-1 d-1. Together, these results confirm that large-scale enzymatic access to rare sugars is feasible.

5.2. Introduction

Enzymes hold great promise as catalysts in such different fields as the manufacture of fine chemicals, for food processing, biofuel production or for the production of bulk chemicals [44]. However, when it comes to industrial application, biocatalytic processes are often limited by low space-time yield (STY) and poor catalytic efficiency of the enzyme under the chosen process conditions [47, 211]. In contrast, an ideal biocatalyst should be stable at process conditions, have high selectivity for and high specific activity on the desired substrate. A high specific activity of the biocatalyst in question reduces the cost contribution of the biocatalyst to the process and facilitates process development by affording high STY in a batch reaction [47] or requiring only a small continuously operated reactor [212]. We previously reported a process that integrated a thermodynamically limited biocatalytic step and a separation via continuous chromatography in the form of a simulated moving bed (SMB) for the efficient production of the rare sugar D-psicose from the bulk hexose D-fructose by epimerization at the C3 position [197]. This system was chosen as a proof-of-principle study to demonstrate the general feasibility of the efficient integrated production of rare hexoses, compounds that have recently attracted great interest as low-calorie sweetener or as precursor for the synthesis of pharmaceuticals [8, 186, 213]. This integrated process concept relies on the interconversion of different aldo- and ketohexoses by stable and efficient isomerases and epimerases [7]. Of central importance in this scheme is D-tagatose epimerase (DTE), a

104 promiscuous enzyme that catalyzes the interconversion of D-fructose and D-psicose, but also the epimerization of all 8 possible ketohexoses at the C3 position [8]. By combining DTE with maximally two additional isomerases in a cascade reaction, the whole set of 24 hexoses is available from only 4 starting materials that are cheaply available (D-glucose, D-fructose, D- galactose, L-sorbose). Enzymatic sugar production depends on two properties of the enzyme: (i) High thermostability in order to enable operation at elevated temperatures, applied to reduce viscosity and the risk of microbial contamination in the system; and (ii) a high catalytic efficiency in order to minimize the amount of enzyme loaded into the enzyme reactor for a required productivity [214]. To this end, we previously drastically increased the thermostability of homodimeric D- tagatose epimerase from Pseudomonas cichorii (PcDTE) by systematically optimizing its dimeric

20 interface, resulting in variant PcDTE Var8, displaying a T50 (the temperature at which it loses 50% of its activity after 20 min incubation) of 87°C, an increase of 21.4°C compared to its wildtype (WT) parent [106] and allowing continuous production of D-psicose at elevated temperatures for several days without significant decrease in conversion [106]. However, PcDTE Var8 showed only mediocre catalytic activity towards the interconversion of D-fructose to D-

-1 psicose (kcat: 12.1 s at 30°C) and poor catalytic activity towards the epimerization of L-sorbose to -1 L-tagatose (kcat: 0.24 s at 30°C). Based on this thermostable Var8 we aimed at improving the catalytic efficiency of PcDTE Var8 towards these two readily available bulk substrates by directed divergent evolution (Chapter 3 of this thesis) and identified two new variants, PcDTE IDF8, improved for the production of D-psicose from D-fructose, and PcDTE ILS6, improved for formation of L-tagatose from L-sorbose. The latter substrate is cheaply available as an intermediate from vitamin C production via the Reichstein synthesis [215]. Here, we characterize the performance of a further optimized variant of PcDTE, IDF10-3, for D- psicose production, and of PcDTE ISL6 for L-tagatose production in an operational setting that would closely resemble the conditions found in an industrial biocatalytic environment: high temperature, high substrate concentration and shear forces. We investigated the two variants in an enzyme-membrane reactor (EMR) with continuous substrate flow through the reactor and determined their performance at different temperatures in terms of operational stability, total turnover number (TTN) and productivity. TTN, the total number of catalytic turnovers (reported in mole conversions per mole catalyst, for immobilized enzymes also gram product per gram immobilized enzyme) during a catalyst’s lifetime [155], is a particular useful key figure as it merges the catalytic activity and the stability of a biocatalyst into one single number, allowing the comparison of the performance of different variants at different temperatures. It has been suggested that a TTN in the order of 104 to 105 can be already considered as a good value for an industrial biocatalyst [216]. 
However, there have been reports on an immobilized glucose isomerase (Sweetzyme T) that obtained TTN in the order of 10⁷ g g⁻¹ [154] or on (free) leucine dehydrogenase that reached a TTN of nearly 5 × 10⁷ mol mol⁻¹ [217]. In the present study we show that the most improved DTE variants indeed fulfill these requirements, exhibiting TTNs in the order of 10⁸, which represents an improvement of more than 40-fold compared to PcDTE WT. Further, the specific productivities as well as the STYs obtained with the different variants are discussed and compared to the values obtained with similar biocatalysts.

5.3. Results

5.3.1. epPCR and screening of catalytically improved variants of PcDTE Var8

We have previously described a semi-rational approach to improve the activity of thermostabilized PcDTE Var8 by targeting residues that are located around the active site and the substrate channel of the enzyme (Chapter 4 of this thesis). We were able to improve the activity of Var8 towards D-fructose 8.6-fold and towards L-sorbose 13.2-fold, introducing 7 and 6 amino acid changes, respectively. Having thus explored the potential for improvement in the immediate vicinity of the active site, we next explored the remainder of PcDTE Var8. Using an efficient microtiter-plate assay that was available for variant evaluation (Chapter 4 of this thesis), we employed error-prone PCR (epPCR) as a random mutagenesis approach to target amino acid residues beyond the proximity to the active site. We created an epPCR library of PcDTE Var8 with an average error rate of 3.8 (+/- 2) nucleotides per gene that upon screening exhibited on average 25% inactive mutants (defined as less than 10% of WT activity in heat-treated cell-free extract). We screened a total of 6,603 clones using the previously mentioned enzymatic assay for determination of the D-psicose level and identified 11 variants that showed significantly increased (1.2- to 1.6-fold) conversion of D-fructose to D-psicose in heat-treated cleared lysate. Most of these variants contained more than one amino acid change as well as a variety of silent mutations. In summary, a total of 13 unique amino acid changes were discovered in these 11 different mutants (Table S5.1) as well as 14 silent mutations. Silent mutations were excluded from further analysis as they were deemed to solely affect expression level. Five of the amino acid mutations were located in the dimeric interface that had already been optimized for improved thermostability. They were hence expected to increase activity at the expense of stability and were therefore excluded.
We separately randomized each of the remaining eight sites on template PcDTE Var8 by QuikChange mutagenesis using NNK-degenerated primers to assess also the effect of amino acid exchanges that could not be introduced by epPCR (e.g. two-base substitutions) and to exploit potential epistatic effects between two or more mutations. Diversifying 4 of the 8 remaining sites led to an increase in activity of 1.2- to 1.8-fold for D-fructose to D-psicose conversion (Table S5.2), specifically A45G, A172I, M207L and E214T. Diversifying the 4 other sites did not result in a beneficial single-site mutation, probably because they exhibited the beneficial effect only in the context of the other mutation (epistatic effect), or because the previously observed beneficial effect was not due to them but due to a concomitantly occurring silent mutation (expression level effect).

Next, we combined the beneficial mutations of these 4 sites by iterative saturation mutagenesis (ISM), a method that goes through iterative cycles of saturation mutagenesis, targeting one beneficial site after another and using the best variant of each cycle as template for the next round [75]. We started with the best variant PcDTE Gen1 (Var8 + E214T) that showed a 1.8-fold improvement compared to Var8 for D-fructose to D-psicose conversion. Randomization of site M207 did not lead to any mutant with an improvement in enzymatic activity; however, randomization and screening of site S45 did generate PcDTE Gen2 (PcDTE Gen1 + S45T) that showed a 1.3-fold increase in activity compared to PcDTE Gen1. Randomization of site A172 on PcDTE Gen2 yielded variant PcDTE Gen3 (PcDTE Gen2 + A172C), showing a 1.5-fold higher activity than PcDTE Gen2. Interestingly, all these mutations were located at considerable distance from the active site, making it difficult to rationalize their effect on the catalytic activity of PcDTE Var8.

PcDTE Gen3 was purified by heat-precipitation of host proteins and immobilized metal-affinity chromatography (IMAC), and its kinetic parameters were determined for comparison with the parent PcDTE Var8. A kcat of 16.6 s⁻¹ and a Km of 246 mM were determined with D-fructose as substrate at 25°C (Table S5.3). This is a 3.4-fold improvement in kcat compared to Var8 and a 5.4-fold increase in Km, which thus results in a net decrease in catalytic efficiency (kcat/Km = 67.6 M⁻¹ s⁻¹ for Gen3 vs. kcat/Km = 108 M⁻¹ s⁻¹ for Var8). Given the fact that preparative sugar conversions generally occur at high substrate concentrations (> 1 M), our focus was on the improvement of kcat.

On a more general level, we screened, starting from the same enzyme variant Var8, nearly 30% more clones for the epPCR method than for the rational site-directed approach (7,812 clones for epPCR libraries plus subsequent ISM (this chapter) vs. 6,045 clones by SSM for the generation of PcDTE IDF8 (Chapter 4)), but the level of improvement was lower (3.4-fold vs. 8.6-fold increase in kcat, see Table S5.3). This result highlights that site-saturation mutagenesis of residues in proximity to the active site is significantly more efficient in identifying beneficial mutations compared to a random mutagenesis approach, at least for increasing catalytic activity (see also Supporting Material for detailed discussion).
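The trade-off between kcat and Km described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: the Gen3 values are those quoted in the text, the Var8 values are back-calculated from the stated fold changes (so they are estimates, not entries from Table S5.3), and the rate comparison assumes simple irreversible Michaelis-Menten kinetics, which ignores the reverse reaction.

```python
# Catalytic efficiency (kcat/Km) and rate at high substrate concentration.
# Gen3 values are taken from the text; Var8 values are back-calculated from
# the reported fold changes and are therefore only estimates.

def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/Km in M^-1 s^-1 (Km is supplied in mM)."""
    return kcat_per_s / (km_mM / 1000.0)

def michaelis_menten_rate(kcat_per_s, km_mM, substrate_mM):
    """Turnover rate per enzyme (s^-1), assuming irreversible Michaelis-Menten kinetics."""
    return kcat_per_s * substrate_mM / (km_mM + substrate_mM)

gen3_kcat, gen3_km = 16.6, 246.0   # s^-1, mM (reported for Gen3 at 25 °C)
var8_kcat = gen3_kcat / 3.4        # estimate: kcat improved 3.4-fold in Gen3
var8_km = gen3_km / 5.4            # estimate: Km increased 5.4-fold in Gen3

gen3_eff = catalytic_efficiency(gen3_kcat, gen3_km)
var8_eff = catalytic_efficiency(var8_kcat, var8_km)

# At a preparative substrate concentration (2 M) both enzymes are close to
# saturation, so the rate is dominated by kcat rather than by kcat/Km.
gen3_rate = michaelis_menten_rate(gen3_kcat, gen3_km, 2000.0)
var8_rate = michaelis_menten_rate(var8_kcat, var8_km, 2000.0)

print(f"kcat/Km Gen3: {gen3_eff:.1f} M^-1 s^-1, Var8 (estimated): {var8_eff:.1f} M^-1 s^-1")
print(f"rate at 2 M: Gen3 {gen3_rate:.1f} s^-1, Var8 (estimated) {var8_rate:.1f} s^-1")
```

Under these assumptions the variant with the lower kcat/Km is still the faster catalyst at molar substrate concentrations, which is the reasoning behind focusing on kcat.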

5.3.2. Integration of beneficial mutations located by epPCR into PcDTE IDF8

The variant PcDTE IDF8 that had emerged from the semi-rational screening of sites in the vicinity of the active site already showed an 8.6-fold higher kcat than Var8 for the epimerization of D-fructose to D-psicose [106]. We reasoned that the introduction of mutations at sites discovered by epPCR might further improve the catalytic rate of PcDTE IDF8 as these sites are all located at a distance to the active site and might thus contribute in an additive manner. We randomized position E214 on IDF8 but could not find any improved variant, which is surprising as this site showed a significant increase in activity in the PcDTE Var8 context. However, randomization of site M207 yielded variant IDF9 (IDF8 + M207V) with a 1.2-fold higher conversion of D-fructose compared to the parent IDF8, determined from heat-treated cleared lysate. Randomization and screening of site A172 did not yield any improvement either, but randomizing S45 resulted in variant IDF10 (IDF9 + S45A) that had a 1.8-fold higher conversion rate compared to the parent variant. In general, the combination of semi-rationally and randomly obtained beneficial mutations suggested that the overall gain from the random step was limited. Therefore, instead of pursuing the same random strategy for the conversion of L-sorbose to L-tagatose, we continued directly with the variant ILS6 obtained earlier with a semi-rational approach (Chapter 4).

5.3.3. Thermal stability of IDF and ILS variants

Thermostability, which often goes hand in hand with operational stability, is a very important trait for industrial biocatalysts, as higher reaction temperatures increase the reaction rate, decrease the medium viscosity or prevent microbial growth, the two latter points being of special importance for sugar-processing enzymes [33]. Therefore, we investigated how the introduction of additional mutations into Var8 had influenced thermal stability in the IDF and ILS variants by incubation of cleared cell lysate containing the respective variant at various temperatures, which allowed us to determine T₅₀²⁰ (Figure 5.1). Whereas the ILS variants did not lose considerable stability compared to the thermostable starting point Var8, in particular IDF2 (IDF1 + H209V) and IDF5 (IDF3 + A258D) showed a severe reduction in thermostability (ΔT₅₀²⁰ = -9.3°C and -5.1°C, respectively). We therefore back-mutated Val to His at site V209 in IDF10 and concomitantly introduced mutation V244A, a mutation that had been previously found in the same round as V209H but showed a less pronounced increase in activity compared to H209V and had therefore been excluded (Chapter 4 of this thesis). These two mutations generated IDF10-3, which recovered a ΔT₅₀²⁰ = +6.1°C compared to IDF10. Back-mutating V209H and D258A into IDF10 regained even more thermostability (T₅₀²⁰ = 84.1°C, ΔT₅₀²⁰ = +9.9°C compared to IDF10), making up IDF10-5 (see Figure 5.1).
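As an illustration of how a T₅₀²⁰ value can be read from residual-activity data, the following minimal sketch locates the 50% crossing by linear interpolation. This is a deliberately simpler stand-in for the sigmoidal fit used in this work (Section 5.5.5), and the temperature/activity data are hypothetical; only the two T₅₀²⁰ values used for the ΔT₅₀²⁰ check are taken from the text.

```python
# T50 estimate by linear interpolation between the two temperatures that
# bracket 50% residual activity. NOTE: this replaces the sigmoidal fit used
# in the thesis with a simpler method, and the data below are hypothetical.

def t50_by_interpolation(temps_c, residual_activity):
    """Return the temperature at which residual activity crosses 0.5.

    temps_c must be in ascending order; residual_activity is the activity
    after heat treatment divided by that of the sample kept on ice.
    """
    points = list(zip(temps_c, residual_activity))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 > a_hi:
            # Linear interpolation between the bracketing data points
            return t_lo + (a_lo - 0.5) * (t_hi - t_lo) / (a_lo - a_hi)
    raise ValueError("residual activity never crosses 50%")

# Hypothetical denaturation curve (20 min incubation at each temperature)
temps = [70, 74, 78, 82, 86, 90]
activity = [1.00, 0.98, 0.90, 0.60, 0.20, 0.02]
t50 = t50_by_interpolation(temps, activity)

# Delta-T50 of IDF10-3 relative to IDF10, using the values reported in the text
delta_t50 = round(80.3 - 74.2, 1)

print(f"interpolated T50: {t50:.1f} °C, reported delta T50 (IDF10-3 vs IDF10): {delta_t50:+.1f} °C")
```

Interpolation only uses the two points nearest the transition, so a full sigmoidal fit remains preferable when the curve is noisy.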

[Figure 5.1: bar chart of T₅₀²⁰ [°C] (y-axis, 64 to 92°C) versus screening progress [-] (x-axis) for PcDTE WT, PcDTE Var8, ILS1 to ILS6, IDF1 to IDF10, IDF10-3 and IDF10-5. Each bar is labeled with the parent variant and the mutation(s) added.]

Figure 5.1| Thermostability, expressed as T₅₀²⁰ values, of all improved IDF and ILS variants. Values for T₅₀²⁰ of PcDTE WT (red bar) and Var8 (blue bar) were taken from Bosshart et al. [106] and for ILS1 – ILS6 (green bars) and IDF1 – IDF8 (black bars) from chapter 4 of this thesis. IDF9 and IDF10 that incorporate SSM mutations discovered by epPCR are marked in grey. Introduction of (back)-mutations (V209H and V244A for IDF10-3 and V209H and D258A for IDF10-5, both empty bars) into IDF10 could regain significant thermostability with only minor impact on catalytic activity in case of IDF10-3.

5.3.4. Detailed Characterization

Enzyme variants IDF8, IDF9, IDF10, IDF10-3, IDF10-5 as well as ILS6 and Gen3 were purified in two steps, first by heat-precipitation of host proteins from crude cell lysate and subsequently by IMAC. Enzyme kinetic parameters were determined with purified proteins for D-fructose (IDF8, IDF9, IDF10, IDF10-3, IDF10-5, Gen3) or L-sorbose (ILS6) (Table S5.3) at 25°C.

Variant IDF10-3 showed the highest catalytic activity (kcat = 44.8 s⁻¹) of all IDF variants as well as good thermostability (T₅₀²⁰ = 80.3°C), whereas IDF10 showed reduced activity (kcat = 37.9 s⁻¹) as well as mediocre thermostability (T₅₀²⁰ = 74.2°C). In contrast, IDF10-5 shows the highest thermostability of all improved variants (T₅₀²⁰ = 84.1°C) but a severe reduction in catalytic activity (kcat = 22.6 s⁻¹). We therefore concluded that IDF10-3 shows the most promising compromise between high activity and high thermostability, representing the most promising enzyme candidate for production of the rare sugar D-psicose. For the production of L-tagatose we selected variant ILS6, as this enzyme showed the highest catalytic activity of all ILS variants (kcat = 3.2 s⁻¹) as well as a thermostability (T₅₀²⁰ = 87.2°C) which is nearly unchanged compared to PcDTE Var8 (T₅₀²⁰ = 87.0°C).

5.3.5. Determination of operational performance of IDF10-3 and ILS6

We determined the catalytic activity of IDF10-3 and ILS6 at different temperatures (30°C to 70°C for IDF10-3, 30°C to 60°C for ILS6) and plotted the resulting kcat as an Arrhenius plot (Figure 5.2).

Whereas Var8 shows an increase in activation energy (as determined from the slope of ln(kcat) vs 1/T) compared to PcDTE WT (59.7 kJ mol⁻¹ vs. 42.6 kJ mol⁻¹), IDF10-3 exhibits a lower activation energy (43.6 kJ mol⁻¹) which is comparable to PcDTE WT and corroborates the higher catalytic activity of IDF10-3 (see Table S5.4). For epimerization of L-sorbose, ILS6 showed a decrease in activation energy compared to Var8 (56.3 kJ mol⁻¹ vs. 85.2 kJ mol⁻¹), confirming its catalytic superiority for substrate L-sorbose.

[Figure 5.2: two Arrhenius plots, panels a) and b), each showing ln(kcat) (y-axis) versus 1/Temp [10⁻³, 1/K] (x-axis).]

Figure 5.2| Arrhenius plots for a) PcDTE WT (square), PcDTE Var8 (triangle) and PcDTE IDF10-3 (circle) with D-fructose as substrate and for b) PcDTE Var8 (triangle) and PcDTE ILS6 (diamond) with L-sorbose as substrate. Lines indicate an exponential fit to the respective data points. Values for activation energies for these reactions are given in Table S5.4.
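The way an activation energy follows from the slope of ln(kcat) versus 1/T can be sketched as below. This is an illustrative calculation only: the data points are synthetic, generated from the reported Var8 values (kcat = 12.1 s⁻¹ at 30°C, Ea = 59.7 kJ mol⁻¹ with D-fructose), and are not the measured data behind Figure 5.2. The helper names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J mol^-1 K^-1

def arrhenius_kcat(kcat_ref, t_ref_c, ea_kj_mol, t_c):
    """Extrapolate kcat from a reference temperature using the Arrhenius equation."""
    t_ref, t = t_ref_c + 273.15, t_c + 273.15
    return kcat_ref * math.exp(-ea_kj_mol * 1000.0 / R * (1.0 / t - 1.0 / t_ref))

def activation_energy_kj(temps_c, kcats):
    """Ea (kJ mol^-1) from the least-squares slope of ln(kcat) versus 1/T."""
    x = [1.0 / (t + 273.15) for t in temps_c]
    y = [math.log(k) for k in kcats]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return -slope * R / 1000.0  # slope of the Arrhenius plot equals -Ea/R

# Synthetic data set built from the values reported for Var8 with D-fructose
temps = [30, 40, 50, 60, 70]
kcats = [arrhenius_kcat(12.1, 30, 59.7, t) for t in temps]

ea = activation_energy_kj(temps, kcats)
print(f"recovered activation energy: {ea:.1f} kJ/mol")
```

With real measurements the points scatter around the line, and deviations at the highest temperatures can indicate the onset of thermal inactivation.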

Half-life times for IDF10-3 and ILS6 were determined in an enzyme-membrane reactor (EMR) whose temperature was maintained at 50°C, 60°C, or 70°C and which was continuously supplied with fresh substrate in buffered solution (2 M D-fructose in 10 mM PO4 buffer, pH 7.0 for IDF10-3 and Var8; 1 M L-sorbose in 10 mM PO4 buffer, pH 7.0 + 0.1 mM MnCl2 (see below) for ILS6 and Var8). Conversion was determined from samples that were periodically collected from the outlet stream and measured by HPLC (see Figure 5.3a). Flow rates as well as enzyme concentrations of the enzyme variants in the reactor were chosen such that the conversion did not reach the thermodynamic equilibrium for the respective epimerization reaction at any time.

[Figure 5.3: a) flow scheme of the setup (substrate reservoir, pump, injection valve, temperature-controlled EMR, fraction collector); b) total D-psicose per g enzyme [g/g] versus time [h] for IDF10-3 and Var8 at 50°C, 60°C and 70°C; c) total L-tagatose per g enzyme [g/g] versus time [h] for ILS6 and Var8 at 50°C, 60°C and 70°C.]

Figure 5.3| Determination of process performance of engineered DTE variants a) Setup for determination of operational parameters. b) Cumulated amount of D-psicose obtained during the course of EMR experiments for IDF10-3 at 50°C (grey circle), 60°C (grey square) and 70°C (grey triangle); Var8 at 50°C (white circle), 60°C (white square) and 70°C (white triangle). EMR runs were performed with an 8.6 mL reactor loaded with 2 mg Var8 (50°C), 1.3 mg Var8 (60°C, 70°C), 2 mg IDF10-3 (50°C), or 1.4 mg IDF10-3 (60°C, 70°C). EMR reactor experiments with IDF10-3 were done at 2 M D-fructose feed with a flow rate of 1 mL min⁻¹; those for Var8 with 1 M D-fructose at a flow rate of 0.2 mL min⁻¹ (see Figure S5.1 for details). c) ILS6 at 50°C (grey circle), 60°C (grey square) and 70°C (grey triangle); Var8 at 50°C (white circle), 60°C (white square) and 70°C (white triangle). Enzyme loading for each experiment was 2 mg (Var8 at all temperatures, ILS6 at 50°C) or 1.3 mg (ILS6 at 60°C and 70°C). All runs were performed with a flow rate of 0.15 mL min⁻¹ of 1 M L-sorbose.

We calculated the cumulated amount of D-psicose produced from D-fructose by Var8 and IDF10-3 (Figure 5.3b) as well as the cumulated amount of L-tagatose produced from L-sorbose by Var8 and ILS6 (Figure 5.3c) in each of the respective EMR experiments, representing a convenient measure for comparison of productivities of the respective biocatalysts. The EMR was loaded with similar amounts of purified enzyme in each experiment (1.3 mg – 2 mg; differences were due to the fact that the reaction mix was not allowed to reach equilibrium, which required smaller amounts of enzyme at higher temperatures) and both substrate feeds were buffered at pH 7.0. Figure 5.3b clearly indicates that IDF10-3 has a much improved productivity at all temperatures compared to Var8, between 7.3-fold and 8.2-fold higher than Var8 if considering the first 24 hours of the experiment (Table 5.1). For both enzyme variants the increase in temperature from 50°C to 60°C yielded a big increase in productivity, whereas a further increase to 70°C did not enhance the productivity significantly for any variant. It is also worth noting that the slope of the cumulated D-psicose curve hardly changes during the experiment, indicating that both enzyme variants are fairly stable at temperatures up to 70°C.

Figure 5.3c depicts the same experimental setup for production of L-tagatose by Var8 or ILS6, supplied with 1 M L-sorbose at a constant flow rate of 0.15 mL min⁻¹ and loaded with 1.3 mg or 2 mg enzyme. In contrast to the experiments with D-fructose, a significant curvature of the cumulative L-tagatose production over time can be observed, indicating significant decay of enzymatic activity (see also Figure S5.2). Nevertheless, ILS6 shows 2.1-fold to 2.9-fold higher productivity during the first 24 h of experiment compared to Var8, with the highest productivity in the first 24 h at 70°C (see also Table 5.2).

Table 5.1| Parameters for enzyme inactivation, catalytic activity, total turnover number (TTN) and productivities for Var8 and IDF10-3 with substrate D-fructose

D-Fructose                             Var8                                  IDF10-3
Feed concentration                     1 M D-fructose                        2 M D-fructose
Temperature                            50°C        60°C        70°C          50°C        60°C        70°C
Enzyme loading                         2 mg        1.3 mg      1.3 mg        2 mg        1.4 mg      1.4 mg
Reactor volume                         8.6 mL      8.6 mL      8.6 mL        8.6 mL      8.6 mL      8.6 mL
Residence time                         43 min      43 min      43 min        8.6 min     8.6 min     8.6 min
TTN [-]                                1.2 × 10⁸   3.5 × 10⁷   1.1 × 10⁷     6.3 × 10⁷   1.3 × 10⁸   4.5 × 10⁷
Specific productivity a) [g g⁻¹ h⁻¹]   197         334         365           1431        2726        2690
Space-time yield b) [g L⁻¹ h⁻¹]        46          51          55            332         443         437

a) gram product per gram biocatalyst per hour; average over first 24 h of EMR run
b) gram product per liter reactor volume per hour; average over first 24 h of EMR run
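The reactor figures in Table 5.1 can be cross-checked with a few lines of arithmetic. The sketch below assumes that residence time is reactor volume divided by flow rate and that space-time yield equals specific productivity multiplied by enzyme loading and divided by reactor volume; the second relation is inferred from the footnote definitions rather than stated explicitly in the text, and it reproduces the Table 5.1 entries within rounding.

```python
# Cross-check of residence time and space-time yield (STY) for the D-fructose
# EMR runs. The STY relation is an inference from the table footnotes.

def residence_time_min(reactor_volume_ml, flow_ml_per_min):
    """Mean residence time in minutes for a continuously fed reactor."""
    return reactor_volume_ml / flow_ml_per_min

def space_time_yield(specific_productivity, enzyme_mg, reactor_volume_ml):
    """STY in g L^-1 h^-1 from specific productivity in g g^-1 h^-1."""
    enzyme_g = enzyme_mg / 1000.0
    volume_l = reactor_volume_ml / 1000.0
    return specific_productivity * enzyme_g / volume_l

REACTOR_ML = 8.6

tau_var8 = residence_time_min(REACTOR_ML, 0.2)     # Var8 runs, 0.2 mL/min
tau_idf10_3 = residence_time_min(REACTOR_ML, 1.0)  # IDF10-3 runs, 1 mL/min

sty_var8_50 = space_time_yield(197, 2.0, REACTOR_ML)       # table entry: 46
sty_idf10_3_60 = space_time_yield(2726, 1.4, REACTOR_ML)   # table entry: 443
sty_kg_per_l_day = sty_idf10_3_60 * 24 / 1000.0

print(f"residence times: {tau_var8:.1f} min (Var8), {tau_idf10_3:.1f} min (IDF10-3)")
print(f"STY Var8 50 °C: {sty_var8_50:.0f} g/L/h; IDF10-3 60 °C: {sty_idf10_3_60:.0f} g/L/h")
print(f"IDF10-3 60 °C: {sty_kg_per_l_day:.1f} kg/L/d")
```

The same relation does not reproduce the L-sorbose entries of Table 5.2, presumably because the strong activity decay in those runs affects the two 24 h averages differently.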

Table 5.2| Parameters for enzyme inactivation, catalytic activity, total turnover number (TTN) and productivities for Var8 and ILS6 with substrate L-sorbose

L-Sorbose                              Var8                                  ILS6
Feed concentration                     1 M L-sorbose                         1 M L-sorbose
Temperature                            50°C        60°C        70°C          50°C        60°C        70°C
Enzyme loading                         2 mg        2 mg        2 mg          2 mg        1.3 mg      1.3 mg
Reactor volume                         8.6 mL      8.6 mL      8.6 mL        8.6 mL      8.6 mL      8.6 mL
Residence time                         57.3 min    57.3 min    57.3 min      57.3 min    57.3 min    57.3 min
TTN [-]                                1.4 × 10⁵   3.0 × 10⁵   2.9 × 10⁵     2.9 × 10⁵   4.9 × 10⁵   5.8 × 10⁵
Specific productivity a) [g g⁻¹ h⁻¹]   21          46          70            60          123         151
Space-time yield b) [g L⁻¹ h⁻¹]        2.3         5.1         10.3          7.3         13.7        19.9

a) gram product per gram biocatalyst per hour; average over first 24 h of EMR run
b) gram product per liter reactor volume per hour; average over first 24 h of EMR run

5.3.5.1. Total turnover numbers and productivity for IDF10-3 and Var8 with D-fructose as substrate

To compare the whole-lifetime productivity of each variant at each respective temperature, we calculated total turnover numbers (TTN), a dimensionless number that indicates the total number of catalytic turnovers during a catalyst’s lifetime [155]. The conversion profiles for PcDTE Var8 and IDF10-3 were fitted to a simple Lumry-Eyring unfolding model as described previously [218]. We obtained activation energies of enzyme inactivation (ΔGinact), activation energies of the catalytic reaction (ΔGcat), and the enthalpy changes for the transition between folded and unfolded form of the enzyme (ΔHeq). These parameters served to calculate kcat,obs and kinact,obs for each variant at each temperature (Table S5.5). TTNs are then calculated by dividing the apparent catalytic rate, kcat,obs, by the apparent inactivation rate, kinact,obs [155]. TTNs for Var8 and IDF10-3 at 50°C to 70°C are listed in Table 5.1 and indicate that IDF10-3 is superior at higher temperatures (4.1-fold and 3.6-fold higher TTN at 70°C and 60°C compared to Var8, respectively) whereas Var8 exhibits a 2-fold higher TTN at 50°C. These findings suggest that IDF10-3 is indeed a superior biocatalyst for the production of D-psicose compared to Var8 if the reactor is operated at temperatures at or above 60°C. Compared to the original wild-type PcDTE the TTN for IDF10-3 is increased over 40-fold at high operating temperatures (≥ 60°C), combining the 10-fold increase in TTN from thermostabilization [106] and the 4-fold increase from activity improvement. As a result, the STY of IDF10-3 is 7.3- to 8.8-fold higher compared to Var8, similarly to the specific productivity that is 7.3- to 8.2-fold higher for IDF10-3 compared to Var8 (Table 5.1), illustrating the effect of higher catalytic rates on the operational performance of biocatalysts.
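The ratio kcat,obs / kinact,obs used above follows from integrating the turnover rate of an enzyme whose activity decays by first-order kinetics. The sketch below checks this numerically; the rate constants are hypothetical placeholders chosen only to give a TTN of the order of 10⁸, not values from Table S5.5.

```python
import math

# TTN for first-order deactivation: integrating kcat_obs * exp(-kinact_obs * t)
# over the catalyst lifetime gives kcat_obs / kinact_obs.
# The rate constants below are hypothetical, not values from Table S5.5.

def ttn_first_order(kcat_obs, kinact_obs):
    """Analytical TTN for an enzyme that deactivates with first-order kinetics."""
    return kcat_obs / kinact_obs

def ttn_numeric(kcat_obs, kinact_obs, t_end, steps=100_000):
    """Trapezoidal integration of the decaying turnover rate from 0 to t_end."""
    dt = t_end / steps
    total = 0.0
    previous = kcat_obs  # turnover rate at t = 0
    for i in range(1, steps + 1):
        current = kcat_obs * math.exp(-kinact_obs * i * dt)
        total += 0.5 * (previous + current) * dt
        previous = current
    return total

kcat_obs = 100.0    # s^-1 (hypothetical)
kinact_obs = 1e-6   # s^-1 (hypothetical)

analytic = ttn_first_order(kcat_obs, kinact_obs)
# Integrate over 20 inactivation time constants, i.e. essentially the full lifetime
numeric = ttn_numeric(kcat_obs, kinact_obs, t_end=20.0 / kinact_obs)

print(f"analytic TTN: {analytic:.3e}, numeric TTN: {numeric:.3e}")
```

The agreement holds only for first-order decay; a catalyst that deactivates by other kinetics needs a different treatment.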

5.3.5.2. Total turnover numbers and productivities for ILS6 and Var8 with L-sorbose

To confirm that the evolved variant ILS6 has a superior performance for the production of L-tagatose compared to Var8, we performed long-term EMR experiments with Var8 and ILS6 at 50°C – 70°C and L-sorbose as substrate. Surprisingly, the decay for the reaction catalyzed by ILS6 was much quicker than would be expected from the high thermostability that was determined in batch reaction (Figure 5.1) and also Var8 showed a significantly faster decay rate with L-sorbose than with D-fructose (Figure S5.2). The Var8 enzyme concentrations for EMR experiments with both substrates were comparable (0.25 mg mL⁻¹ – 0.16 mg mL⁻¹ for D-fructose; 0.23 mg mL⁻¹ for L-sorbose) and both substrates were provided in 10 mM PO4 buffer, pH 7.0, except that for L-sorbose the buffer was additionally supplemented with 0.1 mM MnCl2. This ion was added to exclude the possibility of enzyme inactivation due to loss of the catalytically important ion from the enzyme’s active site, and we previously confirmed that 0.1 mM MnCl2 had no influence on performance of Var8 with D-fructose as substrate (data not shown).

Attempts to fit Var8 and ILS6 deactivation curves by the Lumry-Eyring model used before to fit D-fructose EMR runs resulted in very poor fits. Already visual inspection revealed that none of the curves derived from L-sorbose as a substrate showed the exponential behavior which is typical for a 1st-order decay rate [155, 219]. Instead, the curves could be fitted best by a linear fit, indicating apparent zero-order decay. The determination of total turnover number (TTN) by means of dividing kcat,obs by kinact,obs is explicitly limited to enzymes that deactivate via a 1st-order kinetic law [155]. Therefore we simply calculated the amount of product that has been generated from a certain amount of enzyme until it is inactivated completely. By fitting a linear curve to the portion of the deactivation curve that is below 10% conversion (equivalent to product concentrations below 100 mM L-tagatose), so that the impact of the reverse reaction should not be significant, we calculated the total production of L-tagatose from L-sorbose at each of the three temperatures investigated. The total amount of L-tagatose produced was divided by the amount of enzyme applied, yielding the total turnover number (TTN) for Var8 and ILS6 for L-sorbose.

The development of TTN as well as of the specific productivity and the STY of the different reactions are listed in Table 5.2 for Var8 and ILS6 at 50°C – 70°C. Surprisingly, both enzyme variants show increasing TTNs with increasing temperatures, in contrast to our results for the epimerization of D-fructose with Var8 and IDF10-3 as well as to previous literature reports [154, 155]. ILS6 shows a 1.6- to 2-fold higher TTN than Var8 for all temperatures, resulting mainly from a higher catalytic activity (see Figure 5.2b) that is, however, opposed by a lower operational stability. STY for ILS6 was 1.9- to 3.2-fold higher compared to Var8, whereas specific productivity was 1.6- to 2-fold higher than for Var8. Therefore it can be concluded that ILS6 is a superior biocatalyst compared to Var8 for the production of L-tagatose, although due to a significant reduction in operational stability, the difference is less than one would expect based on the much increased activity.
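The zero-order procedure described above (linear fit of the conversion decay, extrapolation to complete inactivation, division by the amount of enzyme) can be sketched as follows. The conversion data, the enzyme molar mass and the function names are hypothetical placeholders for illustration; only the flow rate, feed concentration and enzyme loading correspond to the L-sorbose runs described in the text.

```python
# Total product per enzyme for apparent zero-order (linear) activity decay.
# Conversion data and enzyme molar mass are hypothetical placeholders.

def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

def total_product_mol(times_h, conversions, flow_ml_min, feed_molar):
    """Integrate product formation until the fitted conversion reaches zero."""
    slope, intercept = linear_fit(times_h, conversions)
    if slope >= 0:
        raise ValueError("conversion does not decay over time")
    lifetime_h = -intercept / slope                            # fitted conversion = 0
    substrate_mol_per_h = flow_ml_min * 60.0 / 1000.0 * feed_molar
    # Area of the triangle under the fitted line: 0.5 * initial conversion * lifetime
    return substrate_mol_per_h * 0.5 * intercept * lifetime_h

# Hypothetical conversion profile, all points below 10% conversion
times_h = [0, 10, 20, 30, 40]
conversions = [0.08, 0.07, 0.06, 0.05, 0.04]

product_mol = total_product_mol(times_h, conversions, flow_ml_min=0.15, feed_molar=1.0)

HEXOSE_G_PER_MOL = 180.16      # molar mass of L-tagatose (C6H12O6)
ENZYME_G_PER_MOL = 33_000.0    # ASSUMED placeholder for the enzyme molar mass
enzyme_g = 0.002               # 2 mg enzyme loading

product_g_per_g = product_mol * HEXOSE_G_PER_MOL / enzyme_g
ttn = product_mol / (enzyme_g / ENZYME_G_PER_MOL)

print(f"total product: {product_mol:.4f} mol, {product_g_per_g:.0f} g/g, TTN ~ {ttn:.2e}")
```

The extrapolation assumes that the linear decay continues unchanged until the enzyme is fully inactivated, which the measured portion of the curve cannot confirm.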

5.4. Discussion

5.4.1. Operational parameters

TTNs are a good basis to compare the productivities of different biocatalyst variants as they reflect the total number of catalytic turnovers during a catalyst’s lifetime and thus incorporate both catalytic efficiency and operational stability [155]. We could show that IDF10-3 is a superior biocatalyst for the production of D-psicose if compared with PcDTE WT or Var8 (> 40-fold increase in TTN at high operating temperatures compared to WT). In contrast, ILS6 exhibited TTNs that were much smaller than expected from its 13.5-fold higher catalytic activity (chapter 4 of this thesis). However, also Var8 with substrate L-sorbose showed much lower operational stability than with D-fructose as substrate at the same temperature. Therefore, ILS6 still resulted as an improved biocatalyst. Substrates have been shown before to exhibit significantly stabilizing effects on the thermostability of their enzymes [220]; thus we conclude that L-sorbose might stabilize Var8 (and supposedly also ILS6) to a lesser extent than D-fructose.

Nevertheless, we could obtain TTN for ILS6 that were in the order of 10⁵, which can be considered as compatible with the demands on industrial biocatalysts, requiring TTN in the order of 10⁴ to 10⁵ [216, 217]. IDF10-3 on the other hand showed TTNs up to 10⁸, which is higher than most numbers that have been described in literature for free biocatalysts [155, 217]. It is important to note that TTN should not be considered as the sole criterion to decide on the operating temperature of a process, as this would inevitably lead to the lowest possible operating temperatures, based on the observation that TTN increases with decreasing temperature [154, 155]. Low temperatures on the other hand lead to low enzymatic activity and reaction rates, which results in very long residence times and accordingly in low volumetric productivity [154]. Therefore we calculated the specific productivity and the STY of all variants under the different conditions. We found that IDF10-3 did compare very well with Var8 also in this respect by translating the complete catalytic improvement into enhanced productivity at all temperatures applied. Comparison of IDF10-3 with other industrial biocatalysts that are operated in enzyme-membrane reactors, such as leucine dehydrogenase [221], Neu5Ac-aldolase [222] or various others [223], revealed that the IDF10-3 maximum STY (443 g L⁻¹ h⁻¹ at 60°C, equivalent to 10.6 kg L⁻¹ d⁻¹) is higher than any other reported STY. It is noteworthy that these results were obtained in a small laboratory EMR (8.6 mL volume) and with conditions that were not optimized regarding high STY; therefore we propose that even higher STY could be obtained if parameters such as substrate conversion and reactor residence time were included in the optimization. In contrast, ILS6, although exhibiting more than 20-fold lower STY than IDF10-3, allows STYs for the production of L-tagatose that are comparable to the above mentioned industrial examples.
In conclusion, two PcDTE variants were presented that are much improved in terms of TTN as well as specific productivity for the synthesis of the rare sugars D-psicose and L-tagatose from the bulk sugars D-fructose and L-sorbose, respectively. To the best of our knowledge, these variants represent the most productive D-tagatose epimerases that are currently available for industrial-scale production of D-psicose and L-tagatose.

5.5. Material and Methods

If not stated otherwise, all chemicals were purchased from Sigma Aldrich (Buchs, Switzerland); NADH was purchased from GERBU (Heidelberg, Germany). D-Fructose, D-sorbose and L-fructose were purchased from Carbosynth (Berkshire, UK), and D-tagatose and L-sorbose were obtained from Sigma Aldrich (Buchs, Switzerland). D-Psicose was produced in-house by epimerization of D-fructose to D-psicose with DTE and separation of D-psicose from D-fructose by SMB-chromatography as described previously [197]. Restriction enzymes were obtained from New England Biolabs (Ipswich, MA, USA) and oligonucleotides from Microsynth (Balgach, Switzerland).

5.5.1. Strains and plasmids

E. coli Top10 cells (Invitrogen) were used for general molecular biology work and E. coli BL21(DE3) cells (Invitrogen) were used for protein expression. General molecular biology was performed according to standard protocols [173]. All primers used for cloning are listed in Table 5.3, and plasmids used or generated in this work are listed in Table 5.4.

Table 5.3| Primers used in this work

Nr   Name             Sequence
1    pKTS_seq_for     5'-ACCACTCCCTATCAGTGATA-3'
2    New_pSEVA_rev    5'-TACTCAGGAGAGCGTTCACC-3'
3    LibM207_IDF8_f   5'-GCAAAGGTAAANNKGGTGTGTTTC-3'
4    LibM207_IDF8_r   5'-GAAACACACCMNNTTTACCTTTGC-3'
5    LibS45_f         5'-ATAATCTGNNKGATGCCAAAAAACGTGAACTG-3'
6    LibS45_r         5'-GTTTTTTGGCATCMNNCAGATTATGAAATTC-3'
7    IDF10_V209H_f    5'-GTAAAGTTGGTCATTTTCATATTGGTG-3'
8    IDF10_V209H_r    5'-CACCAATATGAAAATGACCAACTTTAC-3'
9    IDF10_V244A_f    5'-GATGGCACCATTGCGATGGAACCGTTTATG-3'
10   IDF10_V244A_r    5'-CATAAACGGTTCCATCGCAATGGTGCCATC-3'

Degenerated codons: N = A/T/G/C; K = G/T; M = A/C

Table 5.4| Plasmids used in this work

Nr   Name     Description                                                      Reference
1    pAB174   pSEVA backbone with pBR322 origin of replication, bla            [200], Chapter 4
              resistance gene, Ptet-PT7 fusion promoter, thermostable
              PcDTE-Var8, C-terminal 6His tag
2    pAB92    SEVA vector backbone, bla resistance gene, Ptet-PT7 fusion       [200], Chapter 4
              promoter, MCS, ori pBR322
3    pAB202   Derivative of pAB174, encoding mutant PcDTE IDF8                 Chapter 4
4    pAB216   Derivative of pAB202, encoding mutant PcDTE IDF9                 This chapter
5    pAB218   Derivative of pAB216, encoding mutant PcDTE IDF10                This chapter
6    pAB222   Derivative of pAB218, encoding mutant PcDTE IDF10-3              This chapter
7    pAB217   Derivative of pAB174, encoding PcDTE Gen3 containing             This chapter
              mutations E214T/S45T/A172C compared to Var8
8    pAB203   Derivative of pAB174, encoding mutant PcDTE ILS6                 Chapter 4

5.5.2. epPCR library construction

Diversity was introduced into the gene of thermostable PcDTE Var8 using epPCR. A typical epPCR reaction of 50 µL contained 5 µL of 10x Taq DNA polymerase buffer (GenScript), 5 µL of 25 mM MgCl2, 2.5 µL of 2 mM MnCl2, 5 µL of 10x epPCR-dNTP mix (2 mM dGTP and dATP, 10 mM dCTP and dTTP), 2 ng of template plasmid pAB174, 5 µL of 10 µM stock solutions of each primer pKTS_seq_for and New_pSEVA_rev, and 2.5 U of Taq Polymerase (GenScript). The thermal cycler program consisted of an initial denaturation step at 95°C, 25 cycles of 20 s at 95°C, 20 s at 50°C, and 2 min at 72°C before the final extension step for 5 min at 72°C. The PCR product was directly purified by a DNA purification kit (Zymo Research) and integrated into the plasmid pAB92 using restriction sites HindIII and XhoI by standard cloning techniques [173]. The library was transformed into electrocompetent E. coli BL21(DE3) cells and plated on LB agar plates containing 100 µg/mL ampicillin. Mutation rate was assessed by sequencing 5 individual clones from the library and found to be 3.8 (+/- 2) errors per gene.

5.5.3. Screening of epPCR library

Screening for increased activity of single clones of the epPCR library of PcDTE Var8 was performed as described before in chapter 4 of this thesis. In short, individual clones were pre-cultured in 500 µL of LB supplemented with 100 µg mL⁻¹ of ampicillin at 37°C overnight before 20 µL of pre-culture was used to inoculate 1 mL of ZYM-5052 autoinduction medium [174] in 96-well deep-well plates. Cells were grown for 16 h at 30°C before they were harvested by centrifugation and lysed by treatment with lysozyme and a freeze-thaw cycle. Next, the crude lysate was heat-treated at 70°C for 10 min, cell debris and precipitated protein were removed by centrifugation, and the supernatant was used for the activity assay. For the activity assay, 20 µL of heat-treated lysate was added to 100 µL of 240 mM D-fructose in 96-well flat-bottom microplates (Greiner Bio-One) and incubated for 90 min at 30°C to allow conversion of D-fructose to D-psicose before 120 µL of developing solution was added (50 mM Tris-Cl (pH 8.0), 1 mM NADH, 40 µg mL⁻¹ KpRD) (Chapter 4 of this thesis). Reduction of D-psicose to allitol was monitored by the concomitant oxidation of NADH at 340 nm wavelength in a Perkin Elmer Wallac 1420 Victor platereader (Perkin Elmer). The resulting change in absorption from NADH oxidation allowed the determination of the concentration of D-psicose using the RbtD calibration curve recorded with known amounts of D-psicose in presence of D-fructose.

The most active clones (> 120% activity of parent from heat-treated lysate) were regrown in triplicate in 96-well plates, lysed and heat-treated as described above, and the epimerization reaction was started as described above. After 1 h of incubation, 20 µL of the reaction was stopped by adding it to 145 µL of 0.1 M HCl, which was followed by the addition of 135 µL of 0.1 M NaOH after 5 min. Conversion of D-fructose to D-psicose (for IDF) or L-sorbose to L-tagatose (for ILS) was determined by HPLC using an LC ICS-3000 system (Dionex, Olten, Switzerland) equipped with a CarboPac PA1 column (250 mm x 4 mm I.D.) preceded by a CarboPac PA1 guard column (50 mm x 4 mm I.D.) (both Dionex, Olten, Switzerland). Samples were eluted isocratically with 30 mM NaOH at a flow rate of 2.0 mL min-1 and detected by triple pulsed amperometry using an EC detector with a gold electrode (all Dionex, Olten, Switzerland). Amino acid positions that showed an improvement in catalytic activity were then targeted by single site-saturation mutagenesis.

5.5.4. Site-saturation mutagenesis and screening

Sites identified by epPCR to increase the catalytic rate towards D-fructose were targeted by QuikChange mutagenesis (Stratagene) using Phusion High-Fidelity Polymerase (NEB) and primers with NNK-degenerate codons as described before [106]. A set of 93 clones per library was then screened as described above.
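To put the library size of 93 clones into perspective, the following sketch estimates how likely it is that any one particular NNK codon is sampled at least once. It assumes that all 32 NNK codons occur with equal probability and that clones are independent, which is an idealization and not something stated in the source; the function name is illustrative.

```python
# NNK codon: N = A/C/G/T at positions 1 and 2, K = G/T at position 3.
NNK_CODONS = 4 * 4 * 2  # 32 possible codons


def prob_codon_sampled(n_clones, n_codons=NNK_CODONS):
    """Probability that one given codon appears at least once among n_clones,
    assuming equally likely codons and independent clones."""
    return 1.0 - (1.0 - 1.0 / n_codons) ** n_clones


coverage_93 = prob_codon_sampled(93)
print(f"P(a specific NNK codon is present among 93 clones) = {coverage_93:.3f}")
```

Under these assumptions roughly three library equivalents are screened per position, so a given codon is expected to be missed only occasionally.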

5.5.5. Thermal denaturation curves

Plasmids encoding improved variants were transformed into E. coli BL21(DE3) cells (see above) and single colonies were directly used to inoculate 25 mL of ZYM-5052 autoinduction medium supplemented with 100 µg mL-1 ampicillin. Cells were grown at 30°C for 16 h at 220 rpm and harvested by centrifugation (4,000 g, 10 min). Cell pellets were stored at -20°C until further use. Cells were resuspended in 5 mL of lysis buffer (50 mM Tris-Cl, pH 8.0, 0.1 mg mL-1 lysozyme) and lysed by a freeze/thaw cycle as described above. Cell debris was then removed by centrifugation (20,238 g, 5 min) and aliquots of 100 µL of cleared lysate were distributed into 8 thin-walled PCR tubes. One tube was kept on ice, 7 tubes were incubated at 7 different temperatures (by default 70°C – 90°C; IDF1 70°C – 100°C) using a PeqStar 2X Gradient Thermocycler (PeqLab, Sarisbury Green, UK) for 20 min. Residual activity was plotted against temperature and data points were fitted to a second-order sigmoidal function using SigmaPlot 12.2 (Systat Software Inc., CA, USA) in order to obtain T50 values (the temperature at which 50% of the initial activity is retained after 20 min).
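As a rough cross-check of a fitted T50 value, the temperature at which residual activity crosses 50% can be estimated by linear interpolation between the two bracketing data points. This is a deliberately simpler stand-in for the sigmoidal fit in SigmaPlot, not the procedure used in this work, and the data points below are hypothetical and serve only to illustrate the calculation.

```python
def t50_by_interpolation(temps_c, residual_activity):
    """Estimate the temperature at which residual activity crosses 0.5
    by linear interpolation between the two bracketing data points."""
    points = list(zip(temps_c, residual_activity))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            return t_lo + (a_lo - 0.5) * (t_hi - t_lo) / (a_lo - a_hi)
    raise ValueError("residual activity never crosses 50%")


# Hypothetical residual activities (fraction of the sample kept on ice),
# NOT measured values from this work.
temps = [70.0, 73.3, 76.7, 80.0, 83.3, 86.7, 90.0]
activity = [0.98, 0.95, 0.85, 0.60, 0.30, 0.10, 0.02]

t50 = t50_by_interpolation(temps, activity)
print(f"Interpolated T50 = {t50:.1f} °C")
```

A sigmoidal fit uses all data points and is less sensitive to noise in the two bracketing measurements, which is why it is preferable for reported values.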

5.5.6. Purification of PcDTE variants

Expression and purification of PcDTE variants were done as described before (Chapter 4 of this thesis). All variants were purified in a first step by heat-precipitation (10 min at 70°C) and in a second step over Ni-Sepharose columns (GE Healthcare) utilizing the C-terminal 6xHis tag. Fractions containing the desired protein were pooled and dialyzed once against 2 L of 50 mM Tris-Cl buffer, pH 7.0 supplemented with 1 mM MnCl2 and in a second step against 2 L of 10 mM Tris-Cl buffer, pH 7.0, before aliquots were flash-frozen in dry-ice and stored at -80°C.

5.5.7. Enzyme kinetic measurement

Enzyme kinetic measurements were done as described elsewhere (Chapter 4 in this thesis). Initial conversion rates were recorded for enzyme ILS6 at 4 different temperatures (30°C, 40°C, 50°C, 60°C) and 6 different L-sorbose concentrations (4 mM, 20 mM, 100 mM, 500 mM, 1 M, 2 M) in 50 mM Na-phosphate buffer, pH 7.0. For IDF10-3, initial conversion rates were determined at 5 different temperatures (30°C – 70°C) and 6 different D-fructose concentrations (4 mM to 2 M) in 50 mM Na-phosphate buffer, pH 7.0. Aliquots of 20 µL of the reactions were stopped at regular intervals in 145 µL 0.1 M HCl before 135 µL 0.1 M NaOH was added after 5 min. Samples were diluted to 0.015 g L-1 total sugar concentration and conversion was determined by HPLC (see above). Fitting initial velocities to a reversible Michaelis-Menten kinetic model using SigmaPlot 12.2 (Systat Software Inc, USA) yielded catalytic parameters kcat and Km.

5.5.8. EMR experiments

Enzyme-membrane reactor experiments were performed in a jacketed reactor from Jülich Fine Chemicals (Jülich, Germany) using an AMICON regenerated cellulose membrane with a MWCO of 10 kDa (Millipore, USA). The reactor temperature was controlled by an external water-bath thermostat and the reactor was operated in a styrofoam container to prevent heat exchange with the environment. The EMR was stirred at 600 rpm by a magnetic stirrer and substrate was continuously supplied by an HPLC pump at the specified flowrates. For experiments with IDF10-3, 2 M D-fructose in 10 mM Na-phosphate buffer, pH 7.0, supplemented with 0.1 mM MnCl2 was pumped at a flowrate of 1 mL min-1, resulting in a residence time of 8.6 min (reactor volume of 8.6 mL). An aliquot of 2 mg (50°C), 1.4 mg (60°C), or 1.4 mg (70°C) purified IDF10-3 was injected using the sampling device of an HPLC (Agilent 1200, Agilent, USA), and samples at the reactor outlet were analyzed by HPLC, using an Aminex Fast Carbohydrate Column (9.0 µm, 100 mm x 7.8 mm, lead form) from Bio-Rad (Bio-Rad, USA) with a flowrate of 2.0 mL min-1 at 85°C. Water was used as eluent and samples were detected by a refractive index detector (RID). For Var8, 1 M D-fructose in 10 mM Na-phosphate buffer, pH 7.0 was pumped at a flowrate of 0.2 mL min-1, resulting in a residence time of 43 min, and 2 mg (50°C), 1.3 mg (60°C), or 1.3 mg (70°C) of purified Var8 was injected. Samples at the reactor outlet were taken and analyzed as for IDF10-3. All EMR runs with L-sorbose were done with 1 M L-sorbose, 10 mM Na-phosphate buffer, pH 7.0, 0.1 mM MnCl2 at a flowrate of 0.15 mL min-1, which resulted in a residence time of 57.3 min. An aliquot of 2 mg Var8 was injected for all temperatures (50°C – 70°C) whereas 2 mg (50°C), 1.3 mg (60°C), or 1.3 mg (70°C) were injected for ILS6. Samples for all EMR experiments with L-sorbose as substrate were diluted to 0.015 g L-1 total concentration and analyzed by Dionex HPLC (see above).
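The residence times quoted for the three flowrates follow directly from the reactor volume divided by the volumetric flowrate. The brief sketch below reproduces them; the function name is illustrative.

```python
REACTOR_VOLUME_ML = 8.6  # reactor volume stated in the text


def residence_time_min(flow_ml_per_min, volume_ml=REACTOR_VOLUME_ML):
    """Mean residence time of a continuously fed reactor: tau = V / Q."""
    return volume_ml / flow_ml_per_min


runs = {
    "IDF10-3, D-fructose": 1.0,   # mL min-1
    "Var8, D-fructose": 0.2,      # mL min-1
    "L-sorbose runs": 0.15,       # mL min-1
}
for label, flow in runs.items():
    print(f"{label}: {residence_time_min(flow):.1f} min")
```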

5.5.9. Fitting of EMR profiles

EMR reaction curves from IDF10-3 and Var8 with substrate D-fructose were fitted to a simple Lumry-Eyring enzyme deactivation model using a non-linear regression procedure based on a Levenberg-Marquardt method as described by Bechtold and Panke [218]. For Var8 all 3 curves (50°C, 60°C, and 70°C) were fitted together whereas for IDF10-3 each curve was fitted separately. Initial parameter estimates were ΔGcat = 62 kJ mol-1, ΔGinact = 115 kJ mol-1, Tm = 340 K, and ΔHeq = 100 kJ mol-1; parameters resulting from the first fitting run were used as improved initial estimates for the second run, providing the final values. From the values of ΔGcat, values for kcat,calc were calculated based on equation 1:

$$k_{cat,calc} = \frac{k_b T}{h} \exp\left(-\frac{\Delta G_{cat}}{R T}\right) \qquad (1)$$

where kcat,calc (s-1) is the theoretically calculated catalytic rate, kb (J K-1) is Boltzmann's constant, T (K) is the absolute temperature, h (J s) is Planck's constant, ΔGcat (kJ mol-1) is the activation energy for catalysis and R (kJ mol-1 K-1) is the gas constant. Parameter kcat,calc was then corrected for the amount of total protein that is reversibly unfolded with equation 2:

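$$k_{cat} = \frac{k_{cat,calc}}{1 + \exp\left[\frac{\Delta H_{eq}}{R}\left(\frac{1}{T_m} - \frac{1}{T}\right)\right]} \qquad (2)$$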

where ΔHeq (kJ mol-1) is the enthalpy change between folded and unfolded enzyme and Tm (K) is the melting temperature. Similarly, from the values of ΔGinact, Tm and ΔHeq the deactivation rate kinact,calc was calculated by equation 3 and corrected for the amount of protein that is reversibly unfolded using equation 4, yielding kinact.

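$$k_{inact,calc} = \frac{k_b T}{h} \exp\left(-\frac{\Delta G_{inact}}{R T}\right) \qquad (3)$$

$$k_{inact} = \frac{k_{inact,calc}}{1 + \exp\left[\frac{\Delta H_{eq}}{R}\left(\frac{1}{T} - \frac{1}{T_m}\right)\right]} \qquad (4)$$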
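The rate calculation described in this section can be illustrated numerically. The sketch below assumes the textbook Eyring expression and a two-state folding equilibrium in which only folded enzyme is catalytically active and irreversible inactivation proceeds from the unfolded state; these functional forms are assumptions of the sketch. The ΔG values are those listed for Var8 in Table S5.5, whereas Tm and ΔHeq are the initial estimates quoted above, not fitted values, so the corrected rates are illustrative only.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J K-1
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618e-3    # gas constant, kJ mol-1 K-1


def eyring_rate(delta_g_kj_mol, temp_k):
    """Rate constant (s-1) from an activation free energy via the Eyring equation."""
    return (K_B * temp_k / H) * math.exp(-delta_g_kj_mol / (R * temp_k))


def unfolding_constant(delta_h_eq_kj_mol, t_m_k, temp_k):
    """K = [unfolded]/[folded] for a two-state equilibrium (assumed form)."""
    return math.exp(delta_h_eq_kj_mol / R * (1.0 / t_m_k - 1.0 / temp_k))


T = 323.15  # 50 °C in K
k_cat_calc = eyring_rate(69.6, T)      # activation free energy of catalysis, Var8
k_inact_calc = eyring_rate(113.6, T)   # activation free energy of inactivation, Var8

# Initial estimates from the text (Tm = 340 K, dHeq = 100 kJ mol-1), NOT fitted values.
K = unfolding_constant(100.0, 340.0, T)
k_cat = k_cat_calc / (1.0 + K)          # scaled by the folded fraction
k_inact = k_inact_calc * K / (1.0 + K)  # scaled by the unfolded fraction

print(f"k_cat,calc = {k_cat_calc:.1f} s-1, k_inact,calc = {k_inact_calc:.2e} s-1")
print(f"K = {K:.3f}, k_cat = {k_cat:.1f} s-1, k_inact = {k_inact:.2e} s-1")
```

Because K rises steeply with temperature, the correction lowers the apparent catalytic rate and raises the apparent inactivation rate as the temperature approaches Tm.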

Total turnover numbers (TTNs) were calculated using equation 5 as described by Rogers and Bommarius [155]:

$$TTN = \frac{k_{cat}}{k_{inact}} \qquad (5)$$

EMR profiles with substrate L-sorbose were fitted with a straight line through the portion of the curve that is below 10% conversion (i.e. 100 mM L-tagatose for 1 M feed concentration). Based on this linear fit the total production of L-tagatose was calculated (i.e. the area under the resulting triangle from the injection until all enzyme is inactivated) and divided by the amount of enzyme that was loaded initially. The resulting moles of L-tagatose produced per mole of enzyme injected are then equivalent to the TTN.
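The triangle-area estimate of the TTN can be written out as follows. This is a minimal sketch: the intercept, slope and enzyme molar mass used in the example call are hypothetical placeholders (the source does not list them here), and only the flowrate and enzyme loading are taken from the L-sorbose runs described above.

```python
def ttn_from_linear_decay(c0_mM, slope_mM_per_h, flow_mL_per_min, enzyme_mg, enzyme_kda):
    """TTN from a linearly decaying outlet product concentration.

    c0_mM           intercept of the linear fit at the time of injection
    slope_mM_per_h  (negative) slope of the linear fit
    enzyme_kda      molar mass of the enzyme in kDa (hypothetical input here)
    """
    if slope_mM_per_h >= 0:
        raise ValueError("slope must be negative for a decaying profile")
    t_end_h = -c0_mM / slope_mM_per_h               # time at which the line reaches zero
    area_mM_h = 0.5 * c0_mM * t_end_h               # area of the triangle
    flow_L_per_h = flow_mL_per_min * 60.0 / 1000.0
    product_mmol = area_mM_h * flow_L_per_h         # (mmol L-1 h) * (L h-1) = mmol
    enzyme_mmol = enzyme_mg / (enzyme_kda * 1000.0) # mg / (g mol-1) = mmol
    return product_mmol / enzyme_mmol


# Hypothetical fit parameters and molar mass; flowrate and loading as in the L-sorbose runs.
ttn_example = ttn_from_linear_decay(
    c0_mM=80.0, slope_mM_per_h=-2.0, flow_mL_per_min=0.15, enzyme_mg=2.0, enzyme_kda=33.0
)
print(f"TTN = {ttn_example:.3g}")
```

Restricting the fit to low conversion keeps the reverse reaction from distorting the apparent decay, at the cost of extrapolating the line to zero activity.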

5.6. Acknowledgment

A.B. and N.W. are indebted to the Swiss National Science Foundation for funding (grant 200021-121918).

5.7. Supporting Material

5.7.1. Cost-benefit ratio of random vs. targeted libraries

Most directed evolution experiments that aim at improving certain enzymatic traits (e.g. substrate specificity, enzymatic activity, thermostability) are still hampered by the throughput of their screens, limiting the number of variants that can be screened to a maximum of around 10^4 clones. Therefore the trend towards smaller but ‘smarter’ libraries has taken on greater significance in recent years [44]. In the present study we aimed at a side-by-side comparison between the (screening-) cost-benefit ratios of a random mutagenesis approach (i.e. epPCR) versus a targeted mutagenesis approach (i.e. saturation mutagenesis of residues in the vicinity of the active site) for the improvement of catalytic activity. It has been reported previously that mutations that are introduced randomly by means of epPCR and improve enzyme activity, substrate selectivity, or enantioselectivity are biased towards the proximity of the active site [224, 225]. This is interesting insofar as mutating the entire protein generates a greater number of mutations that are far away from the active site, simply because there are more amino acids that are far from the active site than close ones [225]. In the case of PcDTE, 48 residues (16% out of a total of 293 residues [12]) are located within 10 Å of the center of the active site (Chapter 4 of this thesis). Thus 84% of all residues were not targeted by saturation mutagenesis. These 84% of residues are, however, very likely to be mutated by epPCR simply because they represent the majority of residues in the protein. Of the 13 mutations that were initially discovered by screening the epPCR library, 5 were found to be located in the dimeric interface that had previously been shown to be of central importance for thermostability [106]. That nearly 40% of all beneficial mutations from epPCR were located in the dimeric interface consisting of 44 residues (i.e. 15% of all PcDTE residues) lets us conclude that the ‘relaxation’ of the tight interface interaction is an easy evolutionary trajectory for the enzyme to gain activity, albeit at the expense of thermostability. It is worth noting that all of the 4 confirmed single beneficial mutations derived from the saturation mutagenesis of positions located by epPCR were very distant from the active site, with distances between the residue's Cα and the center of the active site ranging from 13.3 Å (E214T) to 24.7 Å (A45G). It is likely that the number of clones that were screened for the epPCR library (~7500 variants) was too small to find the (rare) mutations that are close to the active site. Therefore we conclude that site-saturation mutagenesis of residues close to the active site is significantly more successful in finding beneficial (i.e. more active) mutants than a random mutagenesis approach, particularly if the screening assay allows only for screening a limited number of clones. Additionally, site-saturation mutagenesis has the advantage of excluding sites that are already known to have adverse effects on the biocatalyst, as was the case in our work, where residues involved in the dimeric interface that had been engineered for improved thermostability could be excluded from screening. On the other hand, epPCR mutagenesis is able to unveil mutations that are more distant from the active site and thus hard to predict. Therefore, epPCR and site-saturation mutagenesis can be used as complementary approaches, as shown here, to target as much of the protein sequence space as possible while at the same time reducing the screening workload to a reasonable amount.
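The over-representation of interface positions among the epPCR hits can be made explicit with a small calculation using the numbers given above (5 of 13 mutations, 44 of 293 residues). The binomial tail is added purely as an illustration: it assumes that each beneficial mutation falls on any residue independently and with equal probability, which is an assumption of this sketch and not an analysis performed in this work.

```python
from math import comb

TOTAL_RESIDUES = 293
INTERFACE_RESIDUES = 44
HITS_TOTAL = 13
HITS_IN_INTERFACE = 5

interface_fraction = INTERFACE_RESIDUES / TOTAL_RESIDUES  # share of residues in the interface
hit_fraction = HITS_IN_INTERFACE / HITS_TOTAL             # share of hits in the interface
enrichment = hit_fraction / interface_fraction


def binomial_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p_chance = binomial_tail(HITS_IN_INTERFACE, HITS_TOTAL, interface_fraction)

print(f"interface: {interface_fraction:.1%} of residues, {hit_fraction:.1%} of hits")
print(f"enrichment: {enrichment:.2f}-fold, P(>= 5 of 13 by chance) = {p_chance:.3f}")
```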

Table S5.1| Variants of PcDTE Var8 from epPCR library screening with > 1.2-fold WT activity

| Variant | Sense mutation(s) | Silent mutation(s) | Improvement over WT (-fold) |
|---|---|---|---|
| M_EP1 | D173E, M207L | E276E (GAA→GAG) | 1.38 |
| M_EP2 | F187Y | G107 (GGT→GGC); K204 (AAA→AAG); A290 (GCA→GCT) | 1.28 |
| M_EP3 | Q215P | A131 (GCA→GCT) | 1.57 |
| M_EP4 | M207L | I200 (ATT→ATC) | 1.63 |
| M_EP5 | N190S, V261A | G252 (GGT→GGC) | 1.2 |
| M_EP6 | E214G | G30 (GGT→GGA); A131 (GCA→GCG); D176 (GAT→GAC) | 1.57 |
| M_EP7 | S45T, V261A | R280 (CGT→CGA) | 1.45 |
| M_EP8 | A172V | Y73 (TAT→TAC); F284 (TTT→TTC) | 1.38 |
| M_EP9 | Q215P | G107 (GGT→GGC) | 1.37 |
| M_EP10 | W11R | - | 1.38 |
| M_EP11 | N216S | - | 1.49 |

Positions F187, N190, Q215, V261 and N216 were excluded from further screening as they are located in the dimeric interface of the protein and are essential for thermostability [106].

Table S5.2| Single beneficial mutations from sites discovered in epPCR mutagenesis

| Mutation from error-prone PCR | Relative activity to Var8 | Mutation from QuikChange PCR | Relative activity to Var8 |
|---|---|---|---|
| S45T | 1.45 | S45G | 1.48 |
| A172V | 1.38 | A172I | 1.20 |
| M207L | 1.63 | M207L | 1.70 |
| E214G | 1.34 | E214T | 1.81 |

Table S5.3| Enzyme kinetic parameters for variants discussed in this work, determined at 25°C

| Enzyme variant | Substrate | kcat [s-1] | Km [mM] | kcat/Km [M-1 s-1] |
|---|---|---|---|---|
| PcDTE WT a) | D-fructose | 23.5 | 40.0 | 588 |
| PcDTE Var8 b) | D-fructose | 4.9 | 45.3 | 108 |
| PcDTE IDF8 b) | D-fructose | 42.3 | 473 | 89.4 |
| PcDTE IDF9 | D-fructose | 31.0 | 475 | 65.2 |
| PcDTE IDF10 | D-fructose | 37.9 | 530 | 71.6 |
| PcDTE IDF10-3 | D-fructose | 44.8 | 482 | 92.9 |
| PcDTE IDF10-5 | D-fructose | 22.6 | 171 | 132 |
| PcDTE Gen3 | D-fructose | 16.6 | 246 | 67.6 |
| PcDTE Var8 b) | L-sorbose | 0.24 | 54.7 | 4.4 |
| PcDTE ILS6 b) | L-sorbose | 3.2 | 63.0 | 50.8 |

a) from ref. [106], determined at 30°C
b) from chapter 4 of this thesis, determined at 25°C
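The catalytic efficiencies in Table S5.3 are the ratio of kcat to Km after converting Km from mM to M. The short sketch below recomputes a few entries from the tabulated kcat and Km values; the function name is illustrative, and small deviations from the tabulated ratios are expected from rounding of the inputs.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/Km in M-1 s-1 from kcat in s-1 and Km in mM."""
    return kcat_per_s / (km_mM / 1000.0)


# (kcat [s-1], Km [mM]) pairs taken from Table S5.3.
entries = {
    "PcDTE WT, D-fructose": (23.5, 40.0),
    "PcDTE Var8, D-fructose": (4.9, 45.3),
    "PcDTE IDF10-3, D-fructose": (44.8, 482.0),
    "PcDTE Var8, L-sorbose": (0.24, 54.7),
    "PcDTE ILS6, L-sorbose": (3.2, 63.0),
}
for name, (kcat, km) in entries.items():
    print(f"{name}: kcat/Km = {catalytic_efficiency(kcat, km):.1f} M-1 s-1")
```

The comparison shows that the gain in kcat of the improved D-fructose variants is accompanied by a roughly ten-fold higher Km, so kcat/Km alone understates their performance at the molar substrate concentrations used in the reactor.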

Table S5.4| Activation energies determined from Arrhenius plots (Figure 5.2)

| Enzyme variant | Substrate | Activation energy [kJ mol-1] |
|---|---|---|
| PcDTE WT | D-fructose | 42.6 |
| PcDTE Var8 | D-fructose | 59.7 |
| PcDTE IDF10-3 | D-fructose | 43.6 |
| PcDTE Var8 | L-sorbose | 85.2 |
| PcDTE ILS6 | L-sorbose | 56.3 |
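To give a feeling for what the activation energies in Table S5.4 imply, the sketch below computes the factor by which a rate constant would increase between two temperatures according to the Arrhenius equation. It assumes a temperature-independent activation energy and pre-exponential factor and ignores enzyme unfolding, so it illustrates the temperature dependence only; the choice of 30°C and 60°C is arbitrary.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol-1 K-1


def arrhenius_rate_ratio(ea_kj_mol, t1_celsius, t2_celsius):
    """Ratio k(T2)/k(T1) for a constant activation energy (Arrhenius equation)."""
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    return math.exp(ea_kj_mol / R * (1.0 / t1 - 1.0 / t2))


# Activation energies for D-fructose taken from Table S5.4.
ratio_var8 = arrhenius_rate_ratio(59.7, 30.0, 60.0)
ratio_idf10_3 = arrhenius_rate_ratio(43.6, 30.0, 60.0)

print(f"Var8:    k(60°C)/k(30°C) = {ratio_var8:.1f}")
print(f"IDF10-3: k(60°C)/k(30°C) = {ratio_idf10_3:.1f}")
```

A lower activation energy therefore means a weaker acceleration with temperature, which matters when variants are compared at a single assay temperature.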

[Figure S5.1: two panels, a) and b), each plotting D-psicose [mM] against time [h] for 50°C, 60°C and 70°C.]

Figure S5.1| Enzyme membrane-reactor experiments at 50°C (circle), 60°C (square) and 70°C (triangle). a) PcDTE Var8 with 0.2 mL min-1 1 M D-fructose (10 mM PO4, pH 7.0) and 0.25 mg mL-1 (50°C) or 0.16 mg mL-1 (60°C, 70°C) enzyme. b) PcDTE IDF10-3 with 1 mL min-1 2 M D-fructose (10 mM PO4, pH 7.0 and 0.1 mM MnCl2), 0.24 mg mL-1 (50°C) or 0.16 mg mL-1 (60°C, 70°C).

[Figure S5.2: two panels, a) and b), each plotting L-tagatose [mM] against time [h] for 50°C, 60°C and 70°C.]

Figure S5.2| Enzyme membrane-reactor experiments at 50°C (circle), 60°C (square) and 70°C (triangle) with 1 M L-sorbose as substrate. a) PcDTE Var8 was run at 50°C (open circle), 60°C (open square) and 70°C (open triangle), each with 0.23 mg mL-1 initial protein concentration in the EMR. b) PcDTE ILS6 was run at 50°C (open circle) with 0.23 mg mL-1 initial protein concentration, and at 60°C (open square) and 70°C (open triangle), both with 0.15 mg mL-1 initial protein concentration. Lines indicate linear fits to the linear part of the curve. Only product concentrations below 100 mM L-tagatose (10% conversion) were fitted to avoid a significant impact of the reverse reaction on the observable decay rate.

Table S5.5| Thermodynamic parameters obtained from fitting isothermal EMR experiments for enzyme variants Var8 and IDF10-3 with D-fructose

| Parameter | Var8, 50°C | Var8, 60°C | Var8, 70°C | IDF10-3, 50°C | IDF10-3, 60°C | IDF10-3, 70°C |
|---|---|---|---|---|---|---|
| Feed concentration | 1 M D-fructose | 1 M D-fructose | 1 M D-fructose | 2 M D-fructose | 2 M D-fructose | 2 M D-fructose |
| Enzyme loading | 2 mg | 1.3 mg | 1.3 mg | 2 mg | 1.4 mg | 1.4 mg |
| ΔGact [kJ mol-1] | 69.6 | 69.6 | 69.6 | 64.8 | 64.7 | 64.0 |
| ΔGinact [kJ mol-1] | 113.6 | 113.6 | 113.6 | 107.1 | 112.9 | 116.9 |
| kinact,app [s-1] | 2.78 x 10^-7 | 1.96 x 10^-6 | 1.15 x 10^-5 | 3.65 x 10^-6 | 4.0 x 10^-6 | 6.85 x 10^-6 |
| kcat,app [s-1] | 34.2 | 69.3 | 125.1 | 227.7 | 505.0 | 511.0 |


CHAPTER 6: DEVELOPMENT OF A DE NOVO ORTHOGONAL D-ALLOSE CATABOLIC PATHWAY IN ESCHERICHIA COLI AND ITS APPLICATION AS AN IN VIVO SELECTION SYSTEM

Andreas Bosshart, Matthias Bechtold and Sven Panke

6.1. Abstract

In vivo selection systems are powerful tools for selecting enzyme variants from very large libraries of up to 10^9 variants based on their growth phenotype on selective media. Such selection systems often make use of an auxotrophy in a selection host that can be complemented by a variant exhibiting the missing metabolic reaction. Selection systems therefore heavily depend on the availability of such an auxotrophic strain, limiting their application to only a few reactions. Here we present an alternative approach by constructing a de novo metabolic pathway in an Escherichia coli host that introduces a ‘designed auxotrophy’ that can be complemented by the enzyme under investigation, in this case a D-tagatose epimerase activity. The selection system consists of a pathway to convert the rare hexose D-allose via D-psicose and D-fructose to fructose-6-phosphate, which enters glycolysis. The pathway consists of L-rhamnose isomerase, D-tagatose epimerase and fructokinase. By deleting three kinases that could promiscuously phosphorylate D-allose, and additional genes that were thought to interfere with the de novo pathway, we constructed an E. coli host that is no longer able to metabolize D-allose via its endogenous catabolic pathway. By introducing the genes for the additional activities into this host strain, growth on D-allose via the de novo catabolic pathway could be observed, but was not efficient enough to be used for successful selection experiments. This work demonstrates that the construction of a novel pathway in a host like E. coli can be hampered by many interfering promiscuous enzymatic reactions that need to be removed to construct an orthogonal pathway.

6.2. Introduction

Synthesis of pharmaceuticals or fine chemicals by enzymatic means has gained more and more importance in the last decades due to enzymes’ unmatched (enantio-)selectivity and substrate specificity [46]. On the other hand, the industrial use of enzymes is often hampered by product inhibition, narrow substrate scope, poor stability under operating conditions or low catalytic rates [46, 120]. It has been repeatedly shown that all of these shortcomings can be efficiently addressed and eventually overcome by protein engineering using directed evolution, an in vitro version of Darwinian evolution that consists of introducing variability into a protein and subsequently selecting or screening for variants with improved characteristics [44, 226]. There is a broad consensus that this latter part of the directed evolution process is still the bottleneck in directed evolution [46, 94, 144, 227]. Diversity can be easily introduced into the target gene by methods like error-prone PCR (epPCR), shuffling of DNA fragments [228], or site-saturation mutagenesis [229], resulting in libraries that easily reach diversities of >10^7 members. Such huge library sizes are not amenable to screening by conventional 96-well plate based assays that still represent the backbone for directed evolution of enzymes [104, 227]. To overcome these constraints, several measures are conceivable. Most importantly, the library size can be reduced, creating so-called ‘smart’ libraries [44], or the screening step can be replaced by an (in vivo) selection step, where only suitable catalysts can provide the selection host strain with a certain trait that is necessary for survival (e.g. antibiotic resistance, synthesis of a necessary nutrient). This latter strategy closely resembles Darwinian evolution insofar as ‘unfit’ mutants, which often represent the prevailing fraction of the total mutant population, are completely removed from the pool of all mutations [107]. Such a selection approach can easily handle library sizes of 10^6 to over 10^9 variants as it is effectively limited only by the transformation efficiency that can be reached for a certain host [88]. We are interested in the two enzymes D-tagatose epimerase (DTE) and L-rhamnose isomerase (LRI), which catalyze the epimerization and isomerization of hexose monosaccharides, respectively. Both enzymes play a major role in the synthesis of rare monosaccharides, compounds that have recently attracted great interest as low-calorie sweeteners, prebiotics, or chiral building blocks for pharmaceuticals [8]. DTE from Pseudomonas cichorii (PcDTE) catalyzes the epimerization at the C3 position between all four possible pairs of ketohexoses, with the highest activity for the epimerization between D-tagatose and D-sorbose, whereas the epimerization of the other ketohexose pairs is catalyzed with moderate to low efficiency [12] (chapter 4 of this thesis). LRI from Thermotoga neapolitana (TnLRI) on the other hand is an efficient L-rhamnose isomerase but shows only poor activity towards the isomerization of D-psicose to D-allose as well as in the reverse reaction.
Previous efforts in our group have focused on the establishment of an integrated process that couples the thermodynamically limited biotransformation of D-fructose to D-psicose by PcDTE and continuous separation via simulated moving bed chromatography (SMB), thus allowing 100% yield in theory [197]. By combining PcDTE with TnLRI it would become possible to produce the highly interesting rare hexose D-allose directly from the inexpensive bulk hexose D-fructose. However, previous experiments have indicated that TnLRI in particular has a very low specific activity, which would lead to a very small feed stream into the separation, making an integrated process economically unfeasible. We propose to address this problem by the development of an in vivo selection system for the screening of improved PcDTE or TnLRI variants. We suggest that a de novo metabolic pathway, consisting of enzymes TnLRI, PcDTE, and a fructokinase from Zymomonas mobilis (ZmFRK) could be introduced into an E. coli host strain that can take up D-allose but no longer metabolizes it via its natural D-allose catabolic pathway [230, 231] (Figure 6.1). The enzymatic performance of either TnLRI or PcDTE could then be coupled to the growth rate of the host strain, enabling the screening of very large libraries. The construction of this selection system would require the deletion of the native D-allose catabolic pathway that exists in E. coli and allows it to grow on D-allose as sole carbon source [230, 231] as well as the deletion of any other (promiscuous) enzyme activity of the E. coli host that could interfere with the de novo pathway by diverting pathway intermediates into non-productive pathways or by leading to the production of toxic compounds. Next, it requires the coordinated expression of the three enzymes that form the orthogonal novel catabolic D-allose pathway such that the enzyme under investigation represents the bottleneck of the pathway. Mutants of the respective enzyme that are introduced into the selection host as part of a library and exhibit beneficial enzymatic properties would thus allow for a higher flux through the pathway and hence a faster growth rate of the host. In this work, we demonstrate the development of a highly engineered selection system for the screening of the rare sugar-producing enzymes PcDTE or TnLRI. In the course of establishing the de novo pathway, we discovered two additional promiscuous D-allose kinases that allowed E. coli to grow on D-allose despite deletion of the alsK gene coding for allose kinase, exemplifying the difficulty of orthogonalizing a de novo metabolic pathway in vivo, especially in the case of monosaccharides. Further, we could demonstrate that the de novo orthogonal D-allose pathway works in principle, but that it suffers from a very low flux through the pathway that is likely the result of the very low catalytic efficiency of the L-rhamnose isomerase for the conversion of D-allose to D-psicose. Finally, different options to remedy this problem are discussed.
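The principle of growth-coupled selection, in which a variant that supports a higher pathway flux outgrows the rest of the library, can be illustrated with a simple exponential-growth calculation. The sketch below is purely illustrative: it assumes unrestricted exponential growth of two subpopulations without competition effects, and the growth rates, starting fraction and duration are hypothetical values, not measurements from this work.

```python
import math


def variant_fraction(initial_fraction, mu_variant, mu_parent, hours):
    """Fraction of the culture made up by the variant after exponential growth.

    Uses the odds form f/(1-f) = f0/(1-f0) * exp((mu_variant - mu_parent) * t),
    which avoids computing very large absolute cell numbers.
    """
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= math.exp((mu_variant - mu_parent) * hours)
    return odds / (1.0 + odds)


# Hypothetical scenario: one improved clone among 10^6, growth-rate advantage 0.05 h-1.
fraction_after_10_days = variant_fraction(1e-6, 0.15, 0.10, 240.0)
print(f"Variant fraction after 240 h: {fraction_after_10_days:.2f}")
```

The calculation makes clear why the absolute growth rates supported by the pathway matter: when the difference between variants is small, enrichment of a rare improved clone requires long cultivation times.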


Figure 6.1| Schematic representation of the final E. coli selection system (SelSysA5.3) for screening of D-tagatose epimerase (DTE) variants. The native D-allose degradation pathway is blocked by knockouts of the genes for allose kinase (alsK) and allose epimerase (alsE), as well as those for the promiscuous manno(fructo)kinase (mak) and a part of the phosphotransferase system (PTS, ptsI) that can promiscuously import and phosphorylate D-allose. Additionally, the rhamnose (rhaBADM) and the arabinose (araBAD) operons are knocked out to remove possibly interfering kinases. A de novo D-allose catabolic pathway is established by introduction of the genes dte from P. cichorii on the screening plasmid (SP), and frk from Z. mobilis and rhaA from T. neapolitana on the helper plasmid (HP). Together, these enzymes should allow E. coli to grow on D-allose as sole C-source and the flux through the pathway should correlate with the growth rate of the cell. Hence it should be possible to select for enzyme variants of either TnLRI or PcDTE that have a higher catalytic efficiency, allowing a higher flux through the pathway and therefore a faster growth rate of the cell.

6.3. Results

6.3.1. Construction of a selection system for catalytically improved D-tagatose epimerases

6.3.1.1. Deletion of E. coli genes that could interfere with the de novo pathway

We hypothesized that it should be possible to establish a novel D-allose catabolic pathway in E. coli by introducing the genes rhaA from T. neapolitana, coding for an L-rhamnose isomerase (TnLRI) that can convert D-allose to D-psicose (this work), dte from P. cichorii, coding for a D-tagatose epimerase (PcDTE) that can convert D-psicose to D-fructose [12], and the gene for a fructokinase that can phosphorylate D-fructose to fructose 6-phosphate, which can be further channeled into glycolysis [232] (Figure 6.1). A prerequisite would be abolishing the wild-type route for D-allose utilization: E. coli MG1655 cells are able to metabolize D-allose via a specific catabolic pathway encoded by the genes in the alsRBACEK operon (see Figure S6.2) [230, 231]. This native catabolic pathway allows E. coli MG1655 to grow on D-allose as sole carbon source with a growth rate of 0.35 h-1 at 37°C (Figure S6.1). Genes alsBAC code for an ABC transporter that actively imports D-allose with high affinity (Kd of 0.33 µM) [230] by simultaneous hydrolysis of ATP. Allose is then phosphorylated to allose 6-phosphate (A6P) by the kinase AlsK [233], the latter compound is isomerized to psicose 6-phosphate (P6P) by the gene product of alsI, which is encoded outside the operon, and psicose 6-phosphate is epimerized to fructose 6-phosphate (F6P) by AlsE (Figure S6.2). It should be noted that allose 6-phosphate is considered to be toxic if overproduced [231], thus requiring an efficient mechanism to quickly remove this compound from metabolism. Given the structure of the operon, the pathway can be interrupted by removing alsK from the operon, leaving intact the genes alsBAC coding for the transporter as well as its promoter, the regulator alsR and alsE. This should then allow the modified E. coli cell to grow on D-allose as sole carbon source via the novel catabolic pathway, which essentially would rely on intracellular isomerization and epimerization of allose before phosphorylation, instead of after phosphorylation as is the case in wild-type E. coli. The resulting strain could be employed as a selection system, screening for functional and even improved variants of either TnLRI or PcDTE, provided that the respective enzyme represents the rate-limiting step of the reaction cascade. In a first step we aimed at removing all genes that could allow E. coli to interfere with or bypass the de novo catabolic pathway. Figure 6.1 indicates that any kinase that could phosphorylate either D-allose directly or the intermediate D-psicose would allow the host to circumvent the novel pathway by redirecting the flux through parts of the native D-allose catabolic pathway. Kim et al. have shown that AlsE is essential for the utilization of D-allose by the native catabolic pathway, whereas AlsK and AlsI (the allose 6-phosphate isomerase, also known as RpiB) are not [230]. While AlsK was shown previously to be indeed an allose kinase, Poulsen et al. found that it is not up-regulated together with the other enzymes in the allose operon and that a ΔalsK strain had the same allose kinase activity as an alsK+ strain [231]. These findings suggest that at least one additional kinase in E. coli can (possibly promiscuously) phosphorylate D-allose after its import into the cells by AlsBAC. Miller et al. reported that 4 different promiscuous kinases can revert the auxotrophy of a glucokinase-deficient E. coli (glk-, Δ(ptsH, ptsI, crr)) to grow again on D-glucose as sole C-source, namely those encoded by the genes alsK, nagK (N-acetyl-β-D-glucosamine kinase), nanK (N-acetylmannosamine kinase) and mak (manno(fructo)kinase). We suspected mak of having the highest substrate promiscuity as it phosphorylates aldo- and ketohexoses like D-mannose, D-fructose, D-glucose, 2-deoxy-D-glucose and glucosamine [234] and thus might also be responsible for the phosphorylation of D-allose. We cloned mak from E. coli (EcMAK) with an additional 6His-tag at its C-terminal end into a vector, overexpressed and purified the protein by immobilized metal affinity chromatography (IMAC) and analyzed its substrate spectrum. We found that EcMAK indeed has a high substrate promiscuity, phosphorylating 5 of the tested hexoses (D-fructose, D-glucose, D-allose, D-psicose and D-sorbitol), with its specific activity declining in the order D-fructose > D-glucose > D-psicose > D-allose > D-sorbitol as judged by visual inspection of a thin-layer chromatogram (TLC) (Figure 6.2a). We reasoned that this promiscuous kinase is a possible bypass for D-allose utilization if the allose kinase AlsK is not present, thus we eliminated mak from the E. coli MG1655 genome. To prevent interference with the de novo D-allose catabolic pathway, we also removed the arabinose (araBAD, encoding L-ribulokinase, L-arabinose isomerase and L-ribulose 5-phosphate epimerase) and rhamnose (rhaBADM, encoding L-rhamnulose kinase, L-rhamnose isomerase, rhamnulose-1-phosphate aldolase and L-rhamnose mutarotase) catabolic pathways from the E. coli MG1655 genome, resulting in E. coli strain SelSysA4.1 (Δmak, ΔalsK, ΔaraBAD, ΔrhaBADM).

[Figure 6.2: thin-layer chromatograms, panel a) EcMAK and panel b) ZmFRK; lanes are labeled with the tested substrates, and the positions of non-phosphorylated hexoses (Hex) and hexose 6-phosphates (Hex-6P) are indicated.]

Figure 6.2| Substrate spectrum of two different fructokinases. a) Conversions of 8 different substrates by manno(fructo)kinase from E. coli (EcMAK) after 4.5 - 6 h at 30°C, separated on thin layer chromatography (TLC) silica plates and developed with butanol-ethanol-water (5:3:2 (v/v/v)). b) Conversion of 8 different substrates by fructokinase from Z. mobilis (ZmFRK) after 4.5 - 6 h at 30°C. The first two lanes (G6P and D-Glu) are used as markers for phosphorylated and non-phosphorylated hexoses, respectively. G6P: D-glucose 6-phosphate, D-Glu: D-glucose, D-Fru: D-fructose, D-All: D-allose, D-Psi: D-psicose, L-Sor: L-sorbose, L-Gal: L-galactose, L-Tag: L-tagatose, D-Sor: D-sorbitol.

6.3.1.2. Debugging of the selection system

Growth experiments with SelSysA4.1 surprisingly revealed that despite the knockouts of mak and alsK the strain could still grow on M9-agar plates with D-allose as sole carbon source after only 2 days at 30°C (Figure 6.3a). To check whether the pathway that enabled growth on D-allose is mediated by a promiscuous D-allose or D-psicose kinase that leads to the formation of D-allose 6-phosphate and/or D-psicose 6-phosphate, respectively, we deleted alsE encoding a psicose-6-phosphate 3-epimerase that could convert allose 6-phosphate into fructose 6-phosphate. It has previously been described that deletion of alsE or alsI leads to toxic accumulation of phosphorylated pathway intermediates upon growth on D-allose [235]. Thus, we reasoned that the deletion of alsE in the SelSysA4.1 background (resulting in SelSysA4.2) would lead to growth inhibition when grown on glycerol and D-allose compared to glycerol alone. SelSysA4.2 indeed showed no growth on glycerol in presence of D-allose, whereas this phenotype could be readily reversed by re-introduction of alsE on plasmid p186-alsE (results not shown). Based on these findings we concluded that another promiscuous D-allose or D-psicose kinase is present in E. coli that is able to bypass the designated catabolic pathway. Systematic screening of the scientific literature did not reveal any indication of another cytosolic hexose kinase capable of phosphorylating D-allose. Deletion of 5 different kinases from SelSysA4.1 (galK: galactokinase, xylB: xylulokinase, rbsK: ribokinase, glk: glucokinase or fucK: L-fuculokinase), which are known to accept substrates that are similar in structure to D-allose, also did not reveal any observable effect of these kinases on growth on D-allose (Figure 6.4). We concluded that these cytosolic kinases are most probably not those responsible for the promiscuous D-allose phosphorylation, although we cannot exclude that one of them is replaced by a newly induced kinase, or that promiscuous allose phosphorylation is distributed over several of the mentioned enzymes in such a way that the deletion of the gene of any one particular enzyme goes without observable effect.

[Figure 6.3, panels a) to c): photographs of M9-agar plates; sectors labelled SelSysA4.1, SelSysA4.1 ΔptsG, SelSysA4.1 ΔmanZ and SelSysA4.1 ΔptsI.]

Figure 6.3| Growth test of SelSysA4.1 and its variants carrying deletions of genes from the phosphotransferase system on M9-agar plates supplemented with 0.1 % (w/v) D-allose. a) SelSysA4.1 as such as well as with deletion of gene ptsG or manZ after 2 days at 30°C. A clear reduction in growth can be seen for SelSysA4.1 ΔmanZ::kan, whereas knockout ΔptsG::kan has only a minor effect on growth on D-allose. b) The same plate as in (a) after 4 d at 30°C. c) Deletion of ptsI (ΔptsI::kan), coding for the first enzyme in the PTS phosphorylation cascade [236], completely abolishes growth on 0.1 % (w/v) D-allose after 2 days at 37°C (top half of (c)).

Next, we hypothesized that D-allose might be phosphorylated simultaneously with its import, as is the case for glucose when taken up by the phosphotransferase system (PTS) [237]. García-Alles et al. have indeed shown that E. coli is able to take up D-allose via the two permeases PtsG and ManZ of the phosphotransferase system [238]. Strains based on SelSysA4.1 and carrying additional knockouts of ptsG or manZ showed a significant reduction in growth on D-allose after 2 days if they were ΔmanZ but not when they were ΔptsG (Figure 6.3a). This suggested that ManZ could substitute fully for PtsG, while PtsG could substitute only to a lesser extent for ManZ. In contrast, a knockout of the ptsI gene for the first enzyme of the PTS system (which is required for the phosphorylation cascade that provides phosphate to PtsG and ManZ) resulted in a strain that lacked growth on D-allose completely (Figure 6.3c). Therefore, we introduced a ptsI knockout in the selection strain. Furthermore, this selection strain also lacked alsE as a safeguard against still undetected kinases that would act upon D-psicose once it is produced during the selection (see Figure 6.1). This resulted in SelSysA5.3 (Δmak::FRT, ΔrhaBADM::FRT, ΔaraBAD::cam, ΔalsEK::FRT, ΔptsI::FRT). This strain no longer grew on M9-agar plates with 0.2 % D-allose as sole C-source even after 9 d at 30°C (Figure 6.6, segment a), but it grew readily when complemented with the genes alsEK on a plasmid under control of the inducible Ptet promoter (Figure 6.6, segment b), thus re-constituting the original allose consumption pathway via allose 6-phosphate. We therefore considered SelSysA5.3 to be an appropriate host for the incorporation of the de novo D-allose catabolic pathway.

[Figure 6.4, panels a) and b): photographs of M9-agar plates with sectors labelled Δglk, ΔgalK, ΔxylB, ΔrbsK and ΔfucK; panel c): Fischer projections of D-allose, D-galactose, D-glucose, D-xylose, D-ribose and L-fucose.]

Figure 6.4| Growth test of SelSysA4.1 (Δmak, ΔalsK, ΔaraBAD, ΔrhaBADM) on D-allose with additional knockout of 5 different kinases. a) Growth of SelSysA4.1 plus Δglk::kan, ΔgalK::kan, ΔxylB::kan or ΔrbsK::kan on M9-agar with 0.1 % D-allose at 30°C for 2 days. b) Growth of SelSysA4.1 plus ΔfucK::kan on M9-agar with 0.1 % D-allose at 30°C for 3 days. c) Fischer projection of the relevant hexoses and pentoses. galK: gene for galactokinase, xylB: gene for xylulokinase, rbsK: gene for ribokinase, glk: gene for glucokinase, fucK: gene for L-fuculokinase.

6.3.2. Growth test of the assembled selection system

6.3.2.1. Evaluation of different enzymes for the construction of the de novo pathway

The manno(fructo)kinase encoded by mak (EcMAK) has been shown above to exhibit high promiscuous kinase activity, phosphorylating not only D-fructose but also D-psicose and D-allose, which would interfere with the conceived de novo pathway and thus prohibits the use of EcMAK as the required fructose kinase (Figure 6.2). Therefore, we tested an alternative fructokinase from Z. mobilis (ZmFRK) that was described to have high specificity for D-fructose [239]. The gene was cloned from the Z. mobilis genome with attachment of a C-terminal 6xHis tag, overexpressed in E. coli, and the protein was purified to homogeneity by IMAC. Different hexoses were tested as substrates as before for EcMAK (Figure 6.2a), and ZmFRK indeed exhibited virtually absolute specificity for D-fructose among the substrates chosen (Figure 6.2b). Enzyme kinetic parameters reported before also indicate that ZmFRK is an appropriate kinase for our purpose, with a high kcat (191 s-1, Table 6.1) and a low Km (0.7 mM) for D-fructose [240]. For the conversion of D-psicose to D-fructose we chose the D-tagatose epimerase from P. cichorii (PcDTE), which has a mediocre catalytic efficiency for this reaction (Table 6.1) but which we had characterized thoroughly and engineered towards high thermostability [106] and high catalytic activity (chapter 4 of this thesis) in previous work.

Table 6.1| Kinetic parameters of enzymes used in this study

Enzyme variant   Substrate    Temp. [°C]   kcat [s-1]   Km [mM]   kcat/Km [M-1 s-1]   Reference
WT TnLRI         D-allose     30           0.83         62.4      13.3                This chapter
R2TnLRI          D-allose     30           0.71         44.2      16.3                This chapter
PcDTE WT         D-psicose    30           32.5         18.9      1720                [106]
PcDTE IDF10      D-psicose    25           25.6         132       194                 Chapter 4
ZmFRK            D-fructose   25           191          0.7       2.7 x 10^5          This chapter
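The catalytic efficiencies in Table 6.1 follow from kcat and Km. The sketch below recomputes kcat/Km from the tabulated values and evaluates the Michaelis-Menten turnover per enzyme molecule at a chosen substrate concentration. The function names and the 1 mM example concentration are illustrative assumptions and are not part of the original analysis.

```python
# Kinetic parameters copied from Table 6.1: (kcat [s-1], Km [mM]).
KINETICS = {
    "WT TnLRI": (0.83, 62.4),
    "R2TnLRI": (0.71, 44.2),
    "PcDTE WT": (32.5, 18.9),
    "PcDTE IDF10": (25.6, 132.0),
    "ZmFRK": (191.0, 0.7),
}


def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/Km in M-1 s-1 (Km is converted from mM to M)."""
    return kcat_per_s / (km_mM * 1e-3)


def turnover_rate(kcat_per_s, km_mM, substrate_mM):
    """Michaelis-Menten turnover per enzyme molecule in s-1."""
    return kcat_per_s * substrate_mM / (km_mM + substrate_mM)


if __name__ == "__main__":
    for name, (kcat, km) in KINETICS.items():
        efficiency = catalytic_efficiency(kcat, km)
        # 1 mM substrate is an assumed example value, not a measured level.
        rate = turnover_rate(kcat, km, substrate_mM=1.0)
        print(f"{name:12s} kcat/Km = {efficiency:10.1f} M-1 s-1, "
              f"turnover at 1 mM = {rate:.4f} s-1")
```

Comparing the first and third entries in this way shows a gap of roughly two orders of magnitude in catalytic efficiency between TnLRI and PcDTE WT.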

The first step in the de novo catabolic pathway, the isomerization of D-allose to D-psicose, should be catalyzed by an L-rhamnose isomerase (LRI). Several LRIs that catalyze the isomerization between D-allose and D-psicose were described in the literature [13, 15, 241, 242], with most of them exhibiting only poor specific activity and high Km values for D-allose. Additionally, enzyme kinetic data were recorded at different and often elevated temperatures (50°C – 65°C), making it difficult to compare the various LRIs in view of the purpose intended here. We decided to employ LRI from T. neapolitana (TnLRI), which was cloned from the T. neapolitana genome with attachment of a C-terminal 6xHis tag, expressed in E. coli, purified, and shown to have a low catalytic activity for D-allose but high thermostability [243], a desirable enzymatic feature for both directed evolution [34] and utilization in an industrial process setup [47].

6.3.2.2. Directed evolution of TnLRI

We evaluated the enzyme kinetic properties of the three enzymes employed in the de novo catabolic pathway. Table 6.1 clearly indicates that the first step in the pathway, catalyzed by TnLRI, is likely to be the rate-limiting step in the artificial pathway due to a high Km and a low kcat. Therefore, we intended to improve this enzyme by directed evolution, using error-prone PCR (epPCR) to generate genetic diversity in a first round and DNA shuffling in a second round to combine beneficial mutations. A total of 1023 variants from an epPCR library with 5.3 (+/- 0.6) errors per gene were screened with D-allose as substrate, and the five best mutants, showing 2.4 – 3.1 times more activity as determined from heat-treated lysate, were selected (Table S6.2). These variants were subjected to DNA shuffling and 744 variants were screened from this library. The best mutant from this 2nd library showed an activity improvement of 2-fold over the best variant from the first round (variant 1-6/D12) and was termed R2TnLRI. It exhibited a calculated total improvement of 6-fold higher activity from heat-treated lysate compared to WT TnLRI and contained 4 mutations (W74G, N223D, R357S, R378*) (Table S6.2). This variant was purified by IMAC and enzyme kinetic parameters were determined (Table 6.1). R2TnLRI had nearly unchanged enzyme kinetic parameters compared with WT TnLRI; thus, the improvement observed during the directed evolution was exclusively due to an increased expression level of soluble enzyme. Nevertheless, a higher soluble expression level is a desirable feature for the selection system, and we therefore used R2TnLRI for incorporation into the selection system.
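As a rough plausibility check on the library, the reported average of 5.3 errors per gene can be turned into an expected distribution of mutation counts per clone. The sketch below assumes that the number of mutations per clone follows a Poisson distribution. This is a common simplification and is not stated in this chapter; the function name is illustrative.

```python
import math

MEAN_ERRORS_PER_GENE = 5.3  # average reported for the epPCR library
LIBRARY_SIZE = 1023         # variants screened in the first round


def poisson_pmf(k, mean):
    """Probability of exactly k mutations under the Poisson assumption."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    p_unmutated = poisson_pmf(0, MEAN_ERRORS_PER_GENE)
    print(f"Fraction of clones without mutation: {p_unmutated:.4f}")
    print(f"Expected unmutated clones among {LIBRARY_SIZE}: "
          f"{LIBRARY_SIZE * p_unmutated:.1f}")
    for k in range(11):
        print(f"{k:2d} mutations: {poisson_pmf(k, MEAN_ERRORS_PER_GENE):.3f}")
```

Under this assumption well below 1 % of the screened clones would carry the unmutated gene, so nearly every screened well probes a distinct variant.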

6.3.2.3. Theoretical growth rate of the selection system

To assess which growth rate we could expect on D-allose as sole carbon source we calculated the theoretically possible growth rate of the selection system outlined in Figure 6.1. The AlsBAC importer that remained on the genome under the control of the D-allose inducible promoter PAll allows a growth rate of at least 0.35 h-1 in E. coli MG1655 (Figure S6.1), thus it could be assumed that this module would not be the rate-limiting factor. We further assumed that the fructokinase ZmFRK would not be a rate-limiting step either, due to its very high kcat and low Km for D-fructose (Table 6.1). We calculated the growth rate based on the catalytic properties of PcDTE or R2TnLRI using parameters for cell volume, protein concentration/content and cells per OD600 from the literature and enzyme kinetic data from previous experiments (see Supplementary materials and Table S6.1 for details of the calculation). Different overexpression scenarios for R2TnLRI or PcDTE as well as different intracellular concentrations of D-allose or D-psicose were assumed and doubling times for each pair of parameters were calculated (Figure 6.5). The calculation clearly indicates that TnLRI is the rate-limiting step, leading to a doubling time of at least 22 hours even when assuming an unrealistically high intracellular D-allose level and a very high overexpression of the R2TnLRI biocatalyst.

a) Doubling time [h] as a function of R2TnLRI level (rows, % of total protein) and cytosolic D-allose concentration (columns, mM)

% TnLRI   0.1      0.2      0.5     1       2       5      10     20
0.1       309000   155000   62400   31500   16100   6860   3780   2230
0.2       154000   77400    31180   15700   8056    3430   1890   1110
0.5       61800    31000    12470   6310    3222    1370   756    448
1         30900    15500    6240    3150    3222    686    378    224
2         15400    7740     3120    1580    806     343    189    112
5         6180     3100     1250    631     322     137    76     45
10        3090     1550     623     315     161     69     38     22

b) Doubling time [h] as a function of PcDTE level (rows, % of total protein) and cytosolic D-psicose concentration (columns, mM)

% PcDTE   0.1    0.2    0.5   1     2     5     10    20
0.1       2860   1440   584   300   157   72    44    29
0.2       1430   719    292   150   79    36    22    15
0.5       572    287    117   60    31    14    8.7   5.9
1         286    144    58    30    16    7.2   4.4   2.9
2         143    72     29    15    7.9   3.6   2.2   1.5
5         57     29     12    6.0   3.1   1.4   0.9   0.6
10        29     14     5.8   3.0   1.6   0.7   0.4   0.3

Figure 6.5| Theoretical doubling times (in hours) of the selection strain system SelSysA5.3[SP1,HP1] as a function of the respective intracellular metabolite level and enzyme concentration (as percent of total protein). a) Doubling time as a function of R2TnLRI level and intracellular D-allose concentration. Doubling times below 100 h are marked in green. b) Doubling time as a function of PcDTE level and intracellular D-psicose level. For details of the calculation see Supplementary Information.
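The pattern of values in Figure 6.5 can be reproduced to within rounding by assuming that the doubling time is inversely proportional to the Michaelis-Menten flux through the limiting enzyme, that is, to (enzyme level) x [S] / (Km + [S]). The sketch below rescales one tabulated doubling time to other conditions under that assumption. It does not reproduce the full calculation from the Supplementary materials: cell volume, protein content and cells per OD600 are lumped into the reference value, and the function names are illustrative.

```python
def saturation(substrate_mM, km_mM):
    """Michaelis-Menten saturation term [S] / (Km + [S])."""
    return substrate_mM / (km_mM + substrate_mM)


def rescale_doubling_time(t_ref_h, ref, new, km_mM):
    """Rescale a reference doubling time to a new condition.

    ref and new are (enzyme level [% of total protein], substrate [mM]).
    Assumes doubling time is inversely proportional to enzyme level
    times the saturation term.
    """
    ref_flux = ref[0] * saturation(ref[1], km_mM)
    new_flux = new[0] * saturation(new[1], km_mM)
    return t_ref_h * ref_flux / new_flux


if __name__ == "__main__":
    KM_R2TNLRI_MM = 44.2  # Table 6.1
    # Reference entry from panel a): 0.1 % R2TnLRI, 0.1 mM D-allose, 309000 h.
    t_best = rescale_doubling_time(
        309000.0, ref=(0.1, 0.1), new=(10.0, 20.0), km_mM=KM_R2TNLRI_MM
    )
    print(f"Doubling time at 10 % R2TnLRI and 20 mM D-allose: {t_best:.1f} h")
```

Rescaling the top-left entry of panel a) to 10 % R2TnLRI and 20 mM D-allose gives a value close to the 22 h stated in the text, which supports the proportionality assumption.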

6.3.2.4. Growth tests on M9-agar plates containing D-allose

The above calculation suggested that overexpression of R2TnLRI was a crucial prerequisite in order to observe growth in the selection system. To obtain the highest possible level of the ‘bottleneck enzyme’ R2TnLRI, we placed its gene under the inducible promoter Ptet on the screening plasmid SP1 and placed the PcDTE gene under the moderately strong constitutive promoter J23116 or the stronger constitutive promoter J23110 on helper plasmids HP1 and HP2, respectively, together with the ZmFRK gene under the weak promoter J23117. This arrangement was expected to deliver the highest level of R2TnLRI. SelSysA5.3 was transformed with SP1 and either HP1 or HP2, as well as with p186-alsEK as positive control. Cells were streaked out on M9-agar plates with 0.2 % D-allose as carbon source and incubated for 9 days at 30°C (Figure 6.6). SelSysA5.3 cells containing p186-alsEK grew readily after 2-3 days, but no other construct allowed growth on D-allose, even after prolonged incubation (19 days). We concluded that either the flux through the de novo pathway is too slow to support growth, that one of the enzymes of the pathway is not expressed correctly, or that an intermediate (phosphorylated or unphosphorylated) of the pathway is accumulating and exhibiting a toxic effect on the cell. On the other hand, aTc-dependent growth of the positive control (SelSysA5.3 + p186-alsEK) clearly indicated that the import of D-allose was working.

[Figure 6.6: three M9-agar plates (0, 10 and 50 ng mL-1 aTc), each divided into sectors A to G.]

A: SelSysA5.3
B: SelSysA5.3 + p186-alsEK
C: SelSysA5.3 + p271-17ZmFRK-16PcDTE
D: SelSysA5.3 + p271-17ZmFRK-10PcDTE
E: SelSysA5.3 + p271-17ZmFRK-16PcDTE + p186-R2TnLRI
F: SelSysA5.3 + p271-17ZmFRK-10PcDTE + p186-R2TnLRI
G: -

Figure 6.6| Growth test on M9-agar plates supplemented with 0.2 % (w/v) D-allose of SelSysA5.3 hosting different combinations of helper plasmids (HP) and/or screening plasmids (SP). Cells were grown for 9 days at 30°C on plates with three different inducer concentrations for expression of R2TnLRI (left: 0 ng mL-1 anhydrotetracycline (aTc), middle: 10 ng mL-1 aTc, right: 50 ng mL-1 aTc). Two colonies on the left plate (0 ng mL-1 aTc) in sector B (SelSysA5.3 + p186-alsEK) presumably carry mutations in the tetracycline promoter sequence that convert the regulatable Ptet to a constitutive promoter.

6.3.2.5. In vitro test of enzyme functionality

In order to rule out that one of the three enzymes of the de novo pathway is not expressed correctly, we tested the functionality of the pathway in an in vitro experiment. SelSysA5.3 cells transformed with p186-alsEK, SP1, HP1, SP1 + HP1 or left without any plasmid were grown in rich LB medium and induced with aTc for 6 h. Cells were lysed, normalized for protein concentration and functionality of the pathway members was tested using an enzymatic assay as outlined in Figure 6.7a. The assay showed that all three pathway enzymes were functionally expressed and necessary for the formation of a functional de novo pathway from D-allose to fructose 6-phosphate (F6P) (Figure 6.7). Interestingly, SelSysA5.3[p186-alsEK] showed only a slight increase in F6P level compared to background (SelSysA5.3), which can be attributed to the absence of AlsI (RpiB) that is under strict control of the D-allose inducible promoter PrpiB [231] (Figure S6.2) and catalyzes the interconversion of allose 6-phosphate (A6P) to psicose 6-phosphate (P6P). As all strains for this experiment were grown in absence of D-allose, AlsI is not produced to a significant amount and A6P conversion to P6P is limited, leading finally to a low F6P level (Figure 6.7a). Based on these results we conclude that all necessary enzymes of the de novo D-allose catabolic pathway were expressed and functional in the selection strain, leaving too low turnover number of R2TnLRI or accumulation of (inhibitory) pathway intermediates as potential explanations for the observed non-growth phenotype (Figure 6.6).

[Figure 6.7, panel a): reaction scheme D-All -> D-Psi -> D-Fru -> Fru-6-P -> Glu-6-P -> 6-PGL, catalyzed in this order by R2TnLRI, PcDTE, ZmFRK (ATP -> ADP), G6PI* and G6PDH* (NADP+ -> NADPH); R2TnLRI, PcDTE and ZmFRK are supplied from HP1 and SP1. Panel b): Abs (340 nm) [A.U.] versus time [min] for SelSysA5.3, SelSysA5.3 + alsEK, SelSysA5.3 + SP1, SelSysA5.3 + HP1, SelSysA5.3 + HP1 + SP1, pos. control and neg. control. Panel c): Δ Abs (340 nm) [h-1] for the same samples.]

Figure 6.7| In vitro functionality test of the de novo D-allose catabolic pathway. a) Schematic representation of the in vitro reaction cascade with R2TnLRI, PcDTE, and ZmFRK expressed from the screening plasmid (SP: R2TnLRI) or helper plasmid (HP1: PcDTE and ZmFRK) and glucose 6-phosphate isomerase (G6PI) and glucose 6-phosphate dehydrogenase (G6PDH) added externally for detection (marked by asterisk). D-All: D-allose, D-Psi: D-psicose, D-Fru: D-fructose, Fru-6-P: D-fructose 6-phosphate, Glu-6-P: D-glucose 6-phosphate, 6-PGL: 6-phosphogluconolactone. b) Progress curves of NADPH production for SelSysA5.3 with different combinations of helper and screening plasmids. Cleared lysate of the respective combination was incubated with D-allose for 20 min at 30°C before ATP, NADP+, G6PI and G6PDH were added and production of NADPH was followed at 340 nm. Data labels: alsEK: genes induced from p186-alsEK; SP1: p186-R2TnLRI; HP1: 271-16PcDTE-17ZmFRK; pos. control: purified TnLRI, PcDTE and ZmFRK added instead of lysate; neg. control: only purified WT TnLRI and ZmFRK were added instead of lysate. c) Slopes of the linear part of the progress curves shown in b); the error bars indicate the standard deviation from mean for three individual progress curves.
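Panel c) reports slopes of the linear part of the progress curves. A minimal sketch of such a step is given below: an ordinary least-squares slope over a chosen time window, converted into an NADPH formation rate with the Beer-Lambert law. The molar absorption coefficient of NADPH at 340 nm (6220 M-1 cm-1) is the commonly used literature value. The 1 cm path length, the window limits and the example readings are assumptions for illustration and are not data from this chapter.

```python
EPSILON_NADPH_340 = 6220.0  # M-1 cm-1, commonly used literature value


def linear_slope(times, values):
    """Ordinary least-squares slope of values versus times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def nadph_rate_uM_per_min(times_min, abs340, t_start, t_end, path_cm=1.0):
    """NADPH formation rate in µM per min within [t_start, t_end]."""
    window = [(t, a) for t, a in zip(times_min, abs340) if t_start <= t <= t_end]
    if len(window) < 2:
        raise ValueError("need at least two readings inside the window")
    slope = linear_slope([t for t, _ in window], [a for _, a in window])
    # Beer-Lambert: concentration change = absorbance change / (epsilon * path)
    return slope / (EPSILON_NADPH_340 * path_cm) * 1e6


if __name__ == "__main__":
    # Hypothetical readings, not measured data.
    times = [0, 5, 10, 15, 20]
    readings = [0.10, 0.20, 0.30, 0.40, 0.50]
    rate = nadph_rate_uM_per_min(times, readings, t_start=0, t_end=20)
    print(f"NADPH formation rate: {rate:.2f} µM per min")
```

Restricting the fit to an explicit window keeps a lag phase or substrate depletion from biasing the slope.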

6.3.2.6. Growth inhibition by expression of R2TnLRI in presence of D-allose

Accumulation of pathway intermediates in un-optimized enzymatic pathways can often lead to growth inhibition of the host strain [244, 245]. To test whether this is the case for our synthetic pathway we performed a growth experiment with a synthetic M9 medium containing either 0.5 % glycerol as carbon source or 0.5 % glycerol and 0.2 % D-allose (Figure 6.8). We observed that addition of D-allose prolonged the lag-phase of the cultures in case of SelSysA5.3 without plasmids (Figure 6.8a) as well as of SelSysA5.3[p186-alsEK] (Figure 6.8b). If SelSysA5.3 was transformed with screening plasmid SP1 (R2TnLRI), an even more pronounced effect of D-allose on growth was observed, suggesting that accumulation of D-psicose formed from D-allose by R2TnLRI has a detrimental effect on cell growth (Figure 6.8c). This might be due to a detrimental effect of D-psicose itself or of its phosphorylated derivative. Addition of the down-stream pathway genes by co-transformation with HP1 did not alleviate this phenotype (Figure 6.8e). In this case, the very poor growth of SelSysA5.3[HP1, SP1] also in absence of D-allose might indicate that the overexpression of 3 genes from multi-copy plasmids poses a significant stress for the host cell.

[Figure 6.8, panels a) to e): cell density (OD600) [A.U.] versus time [h] for a) SelSysA5.3, b) SelSysA5.3 + AlsEK, c) SelSysA5.3 + SP1, d) SelSysA5.3 + HP1 and e) SelSysA5.3 + HP1 + SP1.]

Figure 6.8| Toxicity test of D-allose on growth of SelSysA5.3 harboring different plasmids with D-allose pathway genes. Cells were grown at 30°C on M9-medium supplemented with 0.5 % (v/v) glycerol as sole carbon source (empty squares) or with 0.5 % (v/v) glycerol and 0.2 % (w/v) D-allose (filled squares). All cultures were induced with 50 ng mL-1 anhydrotetracycline (aTc) from the beginning, regardless of the hosted plasmid. Error bars indicate standard deviation from mean of three independent growth experiments. alsEK: plasmid pAB166, SP1: p186-R2TnLRI, HP1: 271-16PcDTE-17ZmFRK.

Based on these results we concluded that accumulation of D-psicose or a phosphorylated species thereof in the host cell might lead to a reduction in growth, which can however not be alleviated by the introduction of the down-stream pathway genes (PcDTE and ZmFRK). On the other hand, the growth-hampering effect rather seems to prolong the lag-phase of the cultures than to abolish growth completely.

[Figure 6.9, panel a): scheme of SelSysA5.3 carrying SP2 (Ptet, IDF10) and HP3 (P10, R2TnLRI; P09, ZmFRK). Panel b): cell density (595 nm) [A.U.] versus time [h] for 0, 3, 10, 25 and 50 ng/ml aTc. Panel c): three-dimensional plot of cell density (595 nm) [A.U.] against time and aTc concentration.]

Figure 6.9| Growth curves for SelSysA5.3 with HP3 (271-10R2TnLRI-09ZmFRK) and SP2 (p186-PcDTE IDF10) in liquid M9 medium supplemented with 0.1 % (w/v) D-allose and 0.02 % (w/v) casamino acids (CAA). Expression of PcDTE IDF10 was induced to different extents by different aTc concentrations (0 ng mL-1 to 50 ng mL-1) from the beginning and optical density (OD595 nm) of the cultures was determined at different intervals. a) Schematic representation of the selection system with helper plasmid HP3 and screening plasmid SP2. b) Growth curves with error bars indicating standard deviation from mean for three independent measurements for 5 different inducer levels for PcDTE IDF10. c) Three-dimensional representation of the results from (b), indicating best growth at intermediate inducer levels (10 ng mL-1 aTc).

6.3.2.7. Growth of the selection system in liquid M9 medium exhibits inducer-dependent growth-rates

As shown above, we were not able to obtain growth of the assembled selection system on 0.2 % D-allose (Figure 6.6). We reasoned that the concomitant overexpression of three enzymes might be too much of a burden for the E. coli metabolism if the entire incoming carbon flux and hence the energy supply of the cell has to rely on a low-efficiency de novo pathway. We thus tested growth of the selection system in liquid M9 medium supplemented with 0.1 % D-allose as well as 0.02 % casamino acids (CAA) to facilitate access to amino acids for protein synthesis. Instead of the R2TnLRI gene we placed PcDTE IDF10 on the selection plasmid, a D-tagatose epimerase variant with around 3-fold improved solubility compared to PcDTE WT (chapter 4 of this thesis). R2TnLRI was placed on the helper plasmid under the strong constitutive promoter J23110 (Figure 6.9a). This modified selection system was used to inoculate the M9-0.1 % D-allose medium supplemented with 0.02 % CAA and different concentrations of inducer (0 ng mL-1 aTc – 50 ng mL-1 aTc). Figure 6.9b/c shows that cells were able to grow depending on the level of PcDTE IDF10 induction, however at a very slow growth rate (minimum doubling time of around 80 hours). The fastest growth rate was obtained with a low level of induction (10 ng mL-1 aTc), whereas higher inducer concentrations led to a significant reduction of the growth rates.

These results suggest that growth on D-allose plus CAA can indeed be made dependent on the level of PcDTE IDF10 induction, thus in principle allowing the selection of faster enzyme variants. However, growth rates are still extremely low, with doubling times in the order of 100 hours, which is far too slow for a practical selection system.
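Doubling times and specific growth rates are related by t_d = ln(2) / µ, which makes the observed growth directly comparable with the growth rate of at least 0.35 h-1 quoted for the native D-allose importer. The sketch below shows the conversion and a two-point estimate of µ from optical densities. The function names and the example OD readings are illustrative assumptions, and the two-point estimate is only meaningful within exponential phase.

```python
import math


def growth_rate_from_doubling_time(doubling_time_h):
    """Specific growth rate in h-1 from a doubling time in hours."""
    return math.log(2) / doubling_time_h


def doubling_time_from_growth_rate(mu_per_h):
    """Doubling time in hours from a specific growth rate in h-1."""
    return math.log(2) / mu_per_h


def growth_rate_from_od(t1_h, od1, t2_h, od2):
    """Two-point estimate of the specific growth rate in h-1."""
    return math.log(od2 / od1) / (t2_h - t1_h)


if __name__ == "__main__":
    mu_selection = growth_rate_from_doubling_time(80.0)  # value from the text
    td_native = doubling_time_from_growth_rate(0.35)      # value from the text
    print(f"Selection system: {mu_selection:.4f} h-1")
    print(f"Native pathway:   doubling time {td_native:.2f} h")
    # Hypothetical OD readings 80 h apart, not measured data.
    print(f"Two-point estimate: {growth_rate_from_od(20.0, 0.05, 100.0, 0.10):.4f} h-1")
```

On these numbers the assembled pathway supports growth roughly 40 times slower than the native D-allose pathway.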

6.4. Discussion

Selection systems based on genetic in vivo complementation of a dysfunctional enzyme or a whole pathway are a promising tool in enzyme engineering, allowing the screening of very large libraries containing more than 10^7 variants with ease [107, 246]. In contrast to screening enzyme libraries using 96-well based assays as described previously (Chapters 3 to 5), a growth-based selection system for the directed evolution of the rare sugar producing enzymes D-tagatose epimerase or L-rhamnose isomerase would allow probing a much bigger part of the protein sequence space, facilitating the discovery of catalytically beneficial enzyme variants from a large library. In a first step towards this goal we were able to remove two redundant hexose kinases from the E. coli genome that could phosphorylate the starting substrate D-allose and would have interfered with the establishment of a de novo D-allose pathway. We identified the gene mak, coding for manno(fructo)kinase that is able to phosphorylate D-allose, as well as ptsG and manZ, which both belong to the PTS system and encode transporter proteins for the import of D-glucose and D-mannose, respectively. Knocking out alsK, mak and ptsI (coding for the starting enzyme of the PTS phosphorylation cascade) resulted in an E. coli strain termed SelSysA5.3 that was unable to grow on D-allose as sole carbon source even after prolonged incubation time. This strain thus constitutes the required ‘chassis’ for the implementation of the selection system. In a second step, we aimed at the introduction of a de novo metabolic pathway into SelSysA5.3 that incorporates the two enzymes we were interested in, namely PcDTE and TnLRI, which with the help of a third enzyme (ZmFRK) should channel D-allose into glycolysis at the level of fructose 6-phosphate.
A theoretical evaluation identified TnLRI as the bottleneck enzyme in the de novo pathway, suffering from both a high Km and a very low kcat, resulting in a catalytic efficiency (kcat/Km) that is 100-fold lower than that of PcDTE. Our attempt to improve TnLRI via directed evolution using epPCR and shuffling and subsequent screening of mutants in a 96-well plate-based assay resulted in variant R2TnLRI, which had a significantly higher expression level but nearly unchanged enzyme kinetic parameters. We hypothesize that the truncation of the C-terminal end of TnLRI, which also removes the 6xHis tag, leads to a better expression level of the protein. The exact mechanism of this is however unclear, as is that of the other identified mutations. Together with PcDTE WT and ZmFRK this enzyme allowed no visible growth on M9-agar plates containing D-allose, but enabled very slow growth in M9 liquid medium supplemented with D-allose and trace amounts of CAA. An in vitro test of the completely assembled selection system confirmed that the pathway is functionally expressed and able to channel D-allose via D-psicose and D-fructose to fructose 6-phosphate. We concluded that the pathway is in principle functional but might suffer from too low flux or from the accumulation of intermediates that exhibit a growth-inhibiting effect. We could show that the overexpression of R2TnLRI in presence of D-allose indeed caused moderate growth inhibition compared to the overexpression of R2TnLRI in absence of D-allose, indicating that the accumulation of D-psicose leads to a moderate inhibitory effect. We hypothesize that D-psicose or a phosphorylated derivative thereof might interfere competitively with an enzyme that is involved in D-ribose catabolism/anabolism, as D-ribose and D-psicose have a very similar configuration in their furanose form, which is the relevant configuration in D-ribose phosphorylation by E. coli ribokinase [247]. Moreover, Sato et al. have reported that D-psicose inhibits the larval growth of Caenorhabditis elegans, presumably by docking to the active site of a ribose 5-phosphate isomerase (Rpi), but that D-ribose can competitively reverse this inhibition [248]. D-Psicose or a derivative thereof might also inhibit RpiA/B of E. coli, thus inhibiting the formation of the ribose moiety that is necessary for the biosynthesis of nucleosides.

To resolve these limitations of the current selection system, several different measures are conceivable. Firstly, a more efficient enzyme for the conversion of D-allose to D-psicose has to be selected from the current literature that reports on enzymes catalyzing this reaction.
Most of these enzymes have been assayed at high but different temperatures, thus their enzyme kinetic parameters are hardly comparable with each other [249] and of little use for the estimation of the catalytic efficiency at 30°C. Therefore we suggest selecting a comprehensive set of already published enzymes acting on D-allose and testing them instead of R2TnLRI in our current selection system. Enzymes that can complement the de novo pathway would then be selected for further examination in more detail and could subsequently be used for directed evolution, increasing the catalytic rate further. Secondly, the genes on the helper plasmid could be expressed from an operon under an inducible promoter instead of from separate constitutive promoters in front of each gene, as is currently implemented. This would facilitate the coordinated expression of the pathway genes and could reduce the burden from the overexpression of three genes. Thirdly, in order to counteract the apparently inhibiting effect of D-psicose on E. coli growth, a variety of compounds could be tested that are intermediates in the formation of the ribose moiety in nucleoside biosynthesis. To investigate the effect of D-psicose on the E. coli metabolism and potentially reverse its effect, a similar approach could be used as has been described for the elucidation of the mode of action of the antibiotic psicofuranine, which reversibly inhibits xanthosine-5'-phosphate aminase (GuaA) and whose effect could be counteracted with guanosine or guanine [250], or for the inhibitory effect of psicose on the larval development of C. elegans, which could be reversed by addition of D-ribose [248].

In summary, we have reported on the development of a sophisticated selection system by deleting a series of genes or complete operons and introducing three enzymes to establish a novel D-allose metabolic pathway. Despite the shortcomings of the current system, we propose that the suggested remedies might allow the generation of a functional and efficient selection system for the directed evolution of LRI or DTE enzymes. During the work on this system we have discovered a previously unknown ambiguous D-allose kinase (EcMAK), exemplifying both the high redundancy of sugar phosphorylation [233, 251] that usually constitutes the first step in E. coli metabolism of these compounds, as well as the difficulty of carrying out in vivo biotransformations of unphosphorylated sugars in this host.

6.5. Material and Methods

All restriction enzymes and Phusion DNA polymerase were purchased from New England Biolabs (NEB, Ipswich, MA, USA), T4 DNA ligase was purchased from Fermentas (now Thermo Fisher Scientific, Wohlen, Switzerland), and oligonucleotide synthesis and sequencing were done by Microsynth (Balgach, Switzerland). If not mentioned otherwise, all chemicals were purchased from Sigma Aldrich (Buchs, Switzerland) except for D-allose and D-psicose, which were purchased from Carbosynth (Berkshire, UK).

6.5.1. General molecular biology

E. coli Top10 cells were used throughout the work for cloning and vector amplification and E. coli BL21(DE3) cells were used for protein expression and screening of TnLRI libraries. Preparative PCR was done using Phusion DNA polymerase (NEB) according to the protocol of the manufacturer. Colony PCR (cPCR) was done using Taq DNA polymerase (Stratagene, La Jolla, USA) according to the manufacturer’s protocol. General molecular biology techniques were performed according to standard protocols [173]. A list of all primers used for molecular biology work is given in Table 6.2, plasmids used and generated in this study are listed in Table 6.3 and all strains that were generated and used are listed in Table 6.4.

6.5.2. Cloning of TnLRI and library generation

The gene sequence of rhaA from T. neapolitana was amplified directly from a suspension of T. neapolitana cells using the following conditions: 10 µL 5x Phusion HF buffer, 0.2 mM of each dNTP, 1 µM of each primer AB254 and AB255, 10 % DMSO, 2 µL T. neapolitana cell suspension and 0.4 U Phusion polymerase. Cycling conditions were 2 min initial denaturation at 98°C, 45 cycles of 10 s at 98°C, 20 s at 60°C (- 0.2 °C per cycle) and 50 s at 72°C, and a final extension step for 5 min at 72°C. The resulting DNA fragment was purified using Zyppy Clean and Concentrator-5 kits, digested with HindIII/XhoI and ligated into an appropriately cut pAB92 plasmid, resulting in construct pAB107.

TnLRI libraries were generated by epPCR using the following conditions: 5 µL 10x Taq polymerase buffer (GenScript, USA), 2.5 mM MgCl2, 0.05 mM MnCl2, 0.2 mM of each dGTP and dATP, 1 mM of each dCTP and dTTP, 10 ng pAB107, 1 µM of each primer AB123 and AB286, 2.5 U of Taq polymerase (GenScript, USA) in a final volume of 50 µL. Thermocycling conditions were as follows: initial denaturation at 95°C for 1 min, 20 cycles of 20 s at 95°C, 20 s at 50°C, 2.5 min at 72°C, final extension for 10 min at 72°C. The amplified DNA fragment was purified using Zyppy Clean and Concentrator-5 kits (Zymo Research, CA, USA), digested with HindIII/XhoI and inserted into an appropriately digested and gel-purified pAB92 plasmid. After overnight ligation at 18°C the ligation mixture was purified and used to transform electrocompetent E. coli Top10 cells. An aliquot of the transformation mixture was plated on LB agar containing 100 µg mL-1 ampicillin and 5 clones of the library were randomly picked and sequenced to verify an adequate mutation rate. The remaining part of the transformation mixture was cultivated overnight at 37°C in 2 x 5 mL LB containing 100 µg mL-1 ampicillin and miniprepped, and the collectively recovered plasmids were used to transform electrocompetent BL21(DE3) E. coli cells. Colonies from this transformation were then picked and screened as described below.

The 5 most active variants from the first round of epPCR mutagenesis and screening were used for DNA shuffling as described before [252]. Specifically, plasmids encoding the 5 TnLRI variants were mixed in equimolar amounts and amplified by PCR as described above. The PCR fragment was purified by spin column purification and 45 µL of purified DNA was mixed with 5 µL of 500 mM Tris (pH 7.4), 10 mM MnCl2. 0.4 U of DNaseI (NEB) was added to the DNA at 15°C and incubated for 6 min before EDTA was added to a final concentration of 20 mM and the sample was placed on ice immediately. DNA fragments of 50 – 300 bp were isolated from an agarose gel (1 %), purified by spin column purification and reassembled using the following PCR protocol: 10 µL of 5 x Phusion HF buffer, 0.2 µM of each dNTP, 4 % DMSO, 5 µL of purified DNaseI digested DNA fragments, 0.6 U Phusion polymerase in a final volume of 50 µL. Cycling conditions consisted of an initial denaturation for 2 min at 98°C, 40 cycles of 30 s at 98°C, 30 s at 40°C (+ 0.3°C per cycle) and 30 s (+ 5 s each cycle) at 72°C. 1 µL of this 1st assembly PCR was then used as template for the 2nd PCR containing 10 µL of 5 x Phusion HF buffer, 0.2 µM of each dNTP, 4 % DMSO, 1 µM of each primer AB123 and AB286, 0.5 U of Phusion polymerase in a total volume of 50 µL. Thermocycling conditions were as follows: initial denaturation for 2 min at 98°C, 25 cycles of 30 s at 98°C, 30 s at 55°C (- 0.2°C per cycle) and 30 s (+ 2 s each cycle) at 72°C, followed by a final extension step for 5 min at 72°C.
This PCR reaction gave a single band on a 1 % agarose gel which was purified by spin column purification, digested with HindIII/XhoI and inserted into a properly digested vector as described above.
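The ramped cycling conditions of the two shuffling PCRs (annealing temperature and extension time changing with every cycle) can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes that the first cycle runs at the stated starting values and that the increment is applied after each completed cycle, which the protocol does not state explicitly.

```python
def ramped_schedule(start_temp_c, temp_step_c, start_ext_s, ext_step_s, cycles):
    """Return (cycle, annealing temperature in °C, extension time in s) per cycle.

    Assumption: cycle 1 uses the starting values; the step is added after
    every completed cycle.
    """
    return [
        (n,
         round(start_temp_c + temp_step_c * (n - 1), 1),
         start_ext_s + ext_step_s * (n - 1))
        for n in range(1, cycles + 1)
    ]

# 1st (assembly) PCR: 40 cycles, 40°C (+0.3°C per cycle), 30 s (+5 s per cycle)
assembly = ramped_schedule(40.0, 0.3, 30, 5, 40)
# 2nd (amplification) PCR: 25 cycles, 55°C (-0.2°C per cycle), 30 s (+2 s per cycle)
amplification = ramped_schedule(55.0, -0.2, 30, 2, 25)

print("assembly, first/last cycle:", assembly[0], assembly[-1])
print("amplification, first/last cycle:", amplification[0], amplification[-1])
```

Under these assumptions the assembly PCR ends at an annealing temperature just below 52°C, which is in the same range as the fixed 50°C annealing step of the epPCR.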

6.5.3. Library screening

Screening of TnLRI library variants was performed with D-allose as substrate, which is converted to D-psicose by TnLRI variants. The amount of D-psicose produced was determined as described previously using an assay based on ribitol dehydrogenase from K. pneumoniae (KpRD) (Bosshart et al., Chapter 2). To calibrate the assay, a calibration curve for different amounts of D-psicose in the presence of D-allose was recorded. Four calibration samples were prepared in 50 mM Tris buffer (pH 8.0) with the following concentrations of D-allose (in mM): 9.8; 9.5; 9; 8. The difference to a total hexose concentration of 10 mM was made up by D-psicose, resulting in 2 %, 5 %, 10 % and 20 % D-psicose content of total hexose concentration. An aliquot of 200 µL of these calibration solutions was supplemented with 1 mM NADH (final concentration) and 25 µg of KpRD in a 96 well microplate (Greiner Bio-One, Germany). The rate of NADH consumption was recorded at 30°C using a Perkin Elmer Wallac 1420 Victor plate reader (Perkin Elmer, MA, USA). This rate was a linear function of the D-psicose concentration (up to 20 % D-psicose) (Figure S6.3).

TnLRI variants were screened essentially as described before (Bosshart et al., Chapter 2) with small modifications. Clones from epPCR library variants transformed into BL21(DE3) were inoculated into 500 µL of LB supplemented with 100 µg mL-1 ampicillin and grown overnight at 37°C with shaking. Three wells were inoculated with WT TnLRI as control (or variant TnLRI 1-6/D12 for screening of the shuffled library). 20 µL of the preculture was used to inoculate 1 mL of ZYM-5052 autoinduction medium [174] supplemented with 100 µg mL-1 ampicillin and grown at 30°C and 160 rpm for 16 h, whereas 160 µL of the preculture was directly frozen at -80°C for later plasmid recovery. Cells were harvested by centrifugation (3220 rcf, 10 min) and the plates were stored at -20°C. Cells were lysed by addition of 100 µL of lysis buffer A (50 mM Tris (pH 8.0), 0.2 mg mL-1 lysozyme), resuspension at room temperature (RT) for 20 min and freezing for 20 min at -20°C. Cells were thawed at RT and 100 µL of lysis buffer B (50 mM Tris (pH 8.0), 1 mM MnCl2, some crystals of DNaseI) was added to reduce viscosity and to saturate TnLRI with Mn2+ ions. 100 µL of the lysate was transferred to a 96-well PCR plate (Vaudaux-Eppendorf), heat-treated for 10 min at 70°C and cooled on ice. Cell debris and precipitated proteins were removed by centrifugation (3220 rcf, 10 min, 4°C). The activity assay was performed in 96-well flat-bottom microplates (Greiner Bio-One) by addition of 20 µL of heat-treated lysate to 100 µL of 20 mM D-allose in 50 mM Tris (pH 8.0). The assay was incubated for 1.5 h (reduced to 40 min for screening of the shuffling library) at 42°C before 120 µL of developing solution (50 mM Tris (pH 8.0), 46 µg mL-1 KpRD, 1 mM NADH) was added and NADH consumption was followed at 340 nm in a Perkin Elmer Wallac 1420 Victor plate reader. The D-psicose concentration could then be calculated from the D-psicose calibration curve (see above). Hits discovered by this qualitative screening assay were regrown in triplicate, cells were lysed and the activity assay was performed as described above, except that 20 µL of the reaction mix was stopped after 1.5 h by addition to 145 µL of 0.1 M HCl; 135 µL of 0.1 M NaOH was added after 5 min and the conversion of D-allose to D-psicose was determined by HPLC using a LC ICS-3000 system (Dionex, Olten, Switzerland) equipped with a CarboPac PA1 column (250 mm x 4 mm I.D.) and a CarboPac PA1 guard column (50 mm x 4 mm I.D.) (both Dionex, Olten, Switzerland). Samples were eluted isocratically with 30 mM NaOH at a flow rate of 2.0 mL min-1 and detected by triple pulsed amperometry using an EC detector with a gold electrode (all Dionex, Olten, Switzerland).
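The calibration described above can be sketched in Python. The slope and intercept are the linear fit reported in Figure S6.3 (y = 0.6809x + 1.2264, with x in % D-psicose and y in A.U. h-1); the function names are hypothetical, and the sketch assumes that a measured rate was recorded under the same conditions as the calibration samples (10 mM total hexose, same amount of KpRD and NADH).

```python
SLOPE = 0.6809      # A.U. h^-1 per % D-psicose (fit reported in Figure S6.3)
INTERCEPT = 1.2264  # A.U. h^-1 (rate of the fit at 0 % D-psicose)

def calibration_mix(psicose_percent, total_mm=10.0):
    """Return (D-allose mM, D-psicose mM) for a calibration sample."""
    psicose_mm = total_mm * psicose_percent / 100.0
    return total_mm - psicose_mm, psicose_mm

def psicose_percent_from_rate(rate_au_per_h):
    """Invert the linear calibration; valid only inside the calibrated range."""
    percent = (rate_au_per_h - INTERCEPT) / SLOPE
    if not 0.0 <= percent <= 20.0:
        raise ValueError("rate lies outside the calibrated range (0-20 % D-psicose)")
    return percent

# The four calibration samples (2, 5, 10 and 20 % D-psicose of 10 mM total hexose)
for pct in (2, 5, 10, 20):
    allose, psicose = calibration_mix(pct)
    print(f"{pct:>2} % D-psicose: {allose:.1f} mM D-allose + {psicose:.1f} mM D-psicose")

# Hypothetical measured rate, for illustration only
print(f"{psicose_percent_from_rate(8.0):.1f} % D-psicose")
```

Rejecting rates outside 0-20 % mirrors the statement that linearity was only established up to 20 % D-psicose.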

6.5.4. Expression and purification of TnLRI and R2TnLRI

TnLRI encoded on plasmid pAB107 and R2TnLRI resulting from directed evolution and encoded on plasmid pAB215 were transformed into chemo-competent BL21(DE3) E. coli cells. TnLRI was produced with a C-terminal 6xHis tag whereas R2TnLRI had an N-terminal 6xHis tag that was introduced due to mutation R378* found during directed evolution that introduced a stop codon at the C-terminal end of the protein and thus removed the C-terminal 6xHis tag. Single colonies of either variant were picked, inoculated into 5 mL of LB with 100 µg mL-1 ampicillin (pAB107) or 50 µg mL-1 kanamycin (pAB215) and grown overnight at 37°C with shaking. 300 mL of ZYM-5052 autoinduction medium [174] supplemented with 100 µg mL-1 ampicillin or 50 µg mL-1 kanamycin was inoculated with 200 µL of the overnight culture and proteins were expressed at 32°C with shaking for 16 h. Cells were harvested by centrifugation (5000 rcf, 20 min, 4°C) and stored at -20°C until further use. Cells from each expression culture were resuspended in 15 mL of lysis buffer A (50 mM Tris (pH 8.0), 0.2 mg mL-1 lysozyme), incubated at room temperature for 20 min, frozen at -20°C for 30 min and thawed again at room temperature. Cell suspensions were sonicated for 10 min in a sonication waterbath before MnCl2 was added to a final amount of 1 mM and some crystals of DNaseI were added to reduce viscosity of the cell lysate. Cell debris were removed by centrifugation (48’384 rcf, 20 min, 4°C), supernatant was loaded onto a column packed with 1 mL Ni-Sepharose 6 Fast Flow resin (GE Healthcare) and the column was washed extensively with wash buffer (50 mM Tris (pH 8.0), 100 mM NaCl, 30 mM imidazole). Protein was eluted with elution buffer (50 mM Tris (pH 8.0), 100 mM NaCl, 200 mM imidazole) and main fractions were pooled and dialyzed 2 times against 2 L of 10 mM Tris (pH 8.0). Protein purity was assessed by SDS-PAGE and found to exhibit >95 % purity by visual inspection of the gel. 
Protein concentration was determined spectrophotometrically at 280 nm (TnLRI: ε = 53’860 M-1 cm-1, MW = 45’759 Da; R2TnLRI: ε = 48’360 M-1 cm-1, MW = 44’191 Da). The dialyzed enzymes were aliquoted and stored at -80°C.
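The spectrophotometric concentration determination follows the Beer-Lambert law, c = A280 / (ε · l). A minimal sketch using the extinction coefficients and molecular weights stated above; the function name, the absorbance value and the default 1 cm path length are illustrative assumptions.

```python
# Extinction coefficients (M^-1 cm^-1) and molecular weights (Da) as given in the text
PROTEINS = {
    "TnLRI": {"epsilon": 53860.0, "mw": 45759.0},
    "R2TnLRI": {"epsilon": 48360.0, "mw": 44191.0},
}

def concentration_from_a280(protein, a280, path_cm=1.0, dilution=1.0):
    """Return (concentration in µM, concentration in mg/mL) from A280."""
    p = PROTEINS[protein]
    molar = a280 * dilution / (p["epsilon"] * path_cm)  # mol L^-1
    return molar * 1e6, molar * p["mw"]                 # µM, g L^-1 (= mg mL^-1)

# Hypothetical reading of A280 = 1.0 in a 1 cm cuvette
for name in PROTEINS:
    micromolar, mg_per_ml = concentration_from_a280(name, 1.0)
    print(f"{name}: {micromolar:.1f} µM = {mg_per_ml:.2f} mg/mL")
```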

6.5.5. Enzyme kinetics

Enzyme kinetic constants kcat and Km were determined for TnLRI and R2TnLRI from progress curves with substrate D-allose at concentrations of 1 mM, 4 mM, 10 mM, 40 mM and 200 mM. 40 µL of purified enzyme (210 µg for TnLRI, 192 µg of R2TnLRI) was added to the different substrate concentrations in 50 mM Tris (pH 8.0) and 1 mM MnCl2 and progress curves were recorded at 30°C by periodically sampling 20 µL of reaction mix and stopping in 145 µL 0.1 M HCl. After 5 min 135 µL of 0.1 M NaOH was added and conversion of D-psicose from D-allose was determined by HPLC as described above. Kinetic parameters Km and kcat were obtained by fitting initial velocities of the progress curves to the Michaelis-Menten kinetic model using SigmaPlot 12.2 (Systat Software Inc., CA, USA).
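For readers without access to SigmaPlot, the fit of initial velocities to the Michaelis-Menten model can be approximated as follows. Note that this sketch swaps in a different technique: a Hanes-Woolf linearization ([S]/v = [S]/Vmax + Km/Vmax) solved by ordinary least squares in pure Python, not the nonlinear regression used in this work. The velocities are synthetic, noise-free values generated from the R2TnLRI constants reported in Table S6.1 and serve only to show that the procedure recovers them; all function names are hypothetical.

```python
def michaelis_menten(s, kcat, km):
    """Turnover rate v/E0 (s^-1) at substrate concentration s (mM)."""
    return kcat * s / (km + s)

def hanes_woolf_fit(substrate, rates):
    """Least-squares line through ([S], [S]/v); returns (kcat, Km)."""
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rates)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                    # = 1 / kcat
    intercept = mean_y - slope * mean_x  # = Km / kcat
    return 1.0 / slope, intercept / slope

# Substrate concentrations used for the progress curves (mM)
substrate_mm = [1.0, 4.0, 10.0, 40.0, 200.0]
# Synthetic initial rates per enzyme (s^-1), generated from kcat = 0.71 s^-1, Km = 44.2 mM
rates = [michaelis_menten(s, 0.71, 44.2) for s in substrate_mm]

kcat_fit, km_fit = hanes_woolf_fit(substrate_mm, rates)
print(f"kcat = {kcat_fit:.2f} s^-1, Km = {km_fit:.1f} mM")
```

With real, noisy data a linearization weights the points differently from a direct nonlinear fit, so the two approaches can give somewhat different estimates.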

6.5.6. Generation of knockout strains

E. coli MG1655 was used as host for the generation of individual knockouts as well as for the final assembly of all knockouts into the final selection strain. Each knockout was prepared in a separate E. coli strain and verified by colony PCR (cPCR) for correctness (see Table 6.4 for a list of all strains generated in this study). Gene knockouts Δmak::kan and ΔptsI::kan were obtained from strains of the Keio collection [253] and checked by cPCR using primers AB17/AB18 and AB460/AB461. The strain ΔrhaBADM::kan was constructed according to the protocol of Datsenko and Wanner [254] using plasmid pKD13 as template for PCR with primers AB15 and AB16, introducing a kanamycin-resistance cassette that is flanked by FRT sites for excision using the FLP-recombinase system. Operon araBAD was deleted using a PCR fragment generated with primers AB4 and AB5 and plasmid pACT3 as template for amplification of the chloramphenicol-resistance cassette, resulting in knockout ΔaraBAD::cam. This cam-resistance cassette was conceived as a convenient chromosomal selection marker of the final selection system and was therefore not flanked by FRT sites. Assembly of all 5 knockouts into one strain was done using P1 phage transduction according to a protocol of Lynn et al. [255]. In short, P1vir phage lysate was prepared using single knockout strains as donors and iteratively integrated into E. coli MG1655. After each step correctness of all introduced knockouts was verified by cPCR, the selection marker (kan-resistance cassette) was removed using the FLP-recombinase system encoded on the pCP20 plasmid [254] and loss of the kan-resistance cassette was again verified by cPCR. Then the next knockout was introduced using the same procedure. The 5 knockouts of the final selection strain SelSysA5.3 were again verified by cPCR (Figure 6.10).

[Figure 6.10, panels a)-f). Fragment lengths shown in the schemes (WT / knockout): b) mak, primers AB17/AB18: 1486 bp / 682 bp (mak::FRT); c) rhaBADM, primers AB23/AB24: 4815 bp / 595 bp (rhaBADM::FRT); d) araBAD, primers AB9/AB8 (WT) and AB9/AB6 (knockout): 5163 bp / 773 bp (araBAD::cam); e) alsEK, primers AB364/AB157: 3466 bp / 823 bp (alsEK::FRT); f) ptsI, primers AB460/AB461: 3296 bp / 1672 bp (ptsI::FRT).]

Figure 6.10| Verification of the knockouts of SelSysA5.3 by colony PCR (cPCR). a) Agarose gel with 2-log marker (labeled in kb) and all 5 individual cPCRs. b)-f) Schematic representation of the cPCR of all knockouts, with the WT depicted on top and the knockout depicted below; bands shown in a) always represent the cPCR fragments from the knockout (lower part of each figure). Primers for each cPCR are indicated (Table 6.2) as well as the fragment lengths of the respective cPCR.

6.5.7. Construction of helper plasmids

In general, helper plasmid assembly was performed according to the BioBricks standard described by Knight [256]. In short, a standard BioBrick can be inserted upstream of another BioBrick that is in a BioBrick assembly vector by cutting the vector EcoRI/XbaI and by cutting the insert EcoRI/SpeI. Due to matching overhangs of SpeI and XbaI restriction sites, ligation is possible and concomitantly destroys both the SpeI and XbaI recognition sequence. Downstream addition of additional building blocks works similarly by cutting the vector SpeI/PstI and cutting the insert XbaI/PstI before subsequent ligation of the two parts. Parts were assembled in the assembly vector pSB103 that has been generated using primers SM103_f and SM103_r as described previously [256]. The general procedure is outlined in Figure S6.4a. Transcription terminator rrnB was amplified with primers AB119 and AB120 from template pKD13, introducing EcoRI/XbaI restriction sites at the 5’-terminus and SpeI/PstI sites at the 3’-terminus. Fragment rrnB was inserted into pSB103 via restriction sites EcoRI/SpeI, forming plasmid AP1.0. The R2TnLRI PCR fragment was generated from template pAB215 with primers AB654 and AB655, the PcDTE WT fragment was generated from template pKTS-PcDTE-C6H with primers AB416 and AB417 and the ZmFRK fragment was generated from template pET30a-ZmFRK with primers AB346 and AB347. Each of the genes was separately inserted upstream of the rrnB terminator in AP1.0 by standard BioBrick cloning procedure [256] as outlined in Figure S6.4a. Ribosome binding site BBa_B0034 was inserted upstream of each of the respective gene sequences before the promoters BBa_J23109 or BBa_J23117 were inserted upstream of the RBS for ZmFRK, promoters BBa_J23110 or BBa_J23116 were inserted upstream of the RBS for PcDTE and promoter BBa_J23110 was inserted upstream of the RBS for R2TnLRI. Fragment 16PcDTE or 10PcDTE was cut from plasmid AP4.3_16 or AP4.3_10, respectively, via EcoRI/SpeI and inserted into plasmid AP4.1_17 via EcoRI/XbaI. Fragment 10R2TnLRI was cut from plasmid AP4.2_10 via EcoRI/SpeI and inserted into plasmid AP4.1_09 via EcoRI/XbaI. The resulting units containing two genes were then cut EcoRI/PstI and inserted into pAB69, resulting in the final helper plasmids HP1, HP2 and HP3.
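The upstream insertion step of BioBrick assembly, and the reason the SpeI/XbaI junction cannot be cut again, can be illustrated with a small top-strand model. This is a simplified sketch: only the top strand is tracked, the prefix and suffix are modeled on the primer design in Table 6.2, and the gene, terminator and backbone sequences are short hypothetical placeholders. EcoRI (G^AATTC), XbaI (T^CTAGA) and SpeI (A^CTAGT) all cut after the first base of their site, and XbaI and SpeI both leave a CTAG overhang.

```python
ECORI, XBAI, SPEI, PSTI = "GAATTC", "TCTAGA", "ACTAGT", "CTGCAG"

PREFIX = "GAATTCACATCTAGAG"  # EcoRI ... XbaI, as in primers AB119/AB346/AB416
SUFFIX = "TACTAGTACACTGCAG"  # SpeI ... PstI, reverse complement of the AB120 5' part

def cut(seq, site, offset=1):
    """Cut the top strand inside the single occurrence of a recognition site."""
    if seq.count(site) != 1:
        raise ValueError(f"expected exactly one {site} site")
    i = seq.index(site) + offset
    return seq[:i], seq[i:]

def insert_upstream(vector, insert):
    """Vector cut EcoRI/XbaI, insert cut EcoRI/SpeI, then ligated."""
    _, rest = cut(insert, ECORI)
    fragment, _ = cut(rest, SPEI)        # fragment ends with ...A (SpeI half-site)
    left, rest_vector = cut(vector, ECORI)
    _, right = cut(rest_vector, XBAI)    # right arm starts with CTAGA... (XbaI half-site)
    return left + fragment + right

# Hypothetical placeholder sequences (not the real genes or backbone)
gene = PREFIX + "ATGAAAGGCTAA" + SUFFIX
vector = "TTGACA" + PREFIX + "CCGGCTTATCGG" + SUFFIX + "GGCATC"

product = insert_upstream(vector, gene)
print(product)
for name, site in (("EcoRI", ECORI), ("XbaI", XBAI), ("SpeI", SPEI), ("PstI", PSTI)):
    print(name, product.count(site))
```

The junction between the two parts reads ACTAGA, which matches neither TCTAGA nor ACTAGT, so the product again carries exactly one site of each enzyme and can enter the next assembly round.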

6.5.8. Growth tests of selection system

Growth tests of selection system strains were performed with minimal M9 medium (12.8 g L-1 Na2HPO4·7H2O, 3 g L-1 KH2PO4, 0.5 g L-1 NaCl, 1 g L-1 NH4Cl) supplemented with 0.1 mL L-1 of trace elements (40 g L-1 FeSO4·7H2O, 10 g L-1 MnSO4·H2O, 10 g L-1 AlCl3·6H2O, 4 g L-1 CoCl2·6H2O, 2 g L-1 ZnSO4·7H2O, 2 g L-1 Na2MoO4·2H2O, 1 g L-1 CuCl2·2H2O, 0.5 g L-1 boric acid, dissolved in 5N HCl), 1 µg mL-1 thiamine, 0.1 mM MgCl2 and 0.1 % (w/v) or 0.2 % (w/v) D-allose. The inducer anhydrotetracycline (aTc) was added to obtain the final concentration indicated in the respective experiment from a 1 mg mL-1 stock dissolved in ethanol. Antibiotics were added at the following concentrations: chloramphenicol 16 µg mL-1 (genomic marker of selection system strain), kanamycin 50 µg mL-1, ampicillin 100 µg mL-1. For growth experiments in M9 liquid cultures 0.02 % (w/v) of casamino acids (CAA) was added to the medium. For M9-agar plates the medium above was solidified by adding 1.5 % agarose. In general, chemically competent cells of the selection system strains (SelSysA4.1 and SelSysA5.3) were transformed with the respective plasmids in successive steps. For growth tests on solid M9-agar plates a single colony of the respective strain was picked and resuspended in 50 µL of 1 x M9 salts. From this suspension cells were streaked on the M9-agar plates using a platinum loop. M9-agar plates were then incubated at 30°C or 37°C. For liquid cultures a single colony of the respective strain was used to inoculate 5 mL of LB supplemented with the required antibiotic(s) and grown overnight at 30°C. Cells were then washed 2 times with 1 x M9 salts before the M9 medium containing 0.1 / 0.2 % D-allose and 0.02 % CAA was inoculated with the washed cells to a starting OD600 of 0.01.

6.5.9. In vitro pathway functionality test

Chemically competent cells of SelSysA5.3 were transformed with p186-alsEK, SP1, HP1 or HP1 and SP1. A single colony of each of the five different strains was used to inoculate 5 mL of LB supplemented with the respective antibiotic(s) and grown overnight at 37°C. These overnight cultures were used to inoculate the expression culture consisting of 25 mL of LB supplemented with the respective antibiotic(s). The cultures were grown for 2 h at 37°C before each of the 5 cultures was induced with 50 ng mL-1 aTc. Expression was performed for 6 h at 37°C before cells were harvested by centrifugation (3220 rcf, 10 min, 4°C) and stored at -20°C. Cells were lysed by addition of 200 µL of lysis buffer A (see above), incubation for 20 min at RT, then for 20 min at -20°C before they were thawed again at RT and 200 µL of lysis buffer B was added. Cell debris was removed by centrifugation (20’000 rcf, 2 min, RT) and the protein concentration of the cleared cell lysate was determined by Bradford protein assay with a standard curve generated with BSA. The protein concentration of each sample was normalized to 5 mg mL-1 total protein content and 5 µL of this normalized lysate was added to 100 µL of 20 mM D-allose in 50 mM Tris (pH 7.4), 1 mM MgCl2 and incubated for 20 min at 30°C. For the positive control 5.5 µg of purified TnLRI, 15.5 µg of purified PcDTE and 3.8 µg of purified ZmFRK were added to 100 µL of substrate solution; for the negative control the same enzymes were added as for the positive control except for TnLRI, which was omitted. After 20 min 100 µL of developing solution was added (11 U mL-1 phosphoglucose isomerase (PGI) from S. cerevisiae (Sigma Aldrich), 1 mM ATP, 0.5 mM NADP+, 1.6 U mL-1 glucose 6-phosphate dehydrogenase from S. cerevisiae (Sigma Aldrich), buffered in 50 mM Tris (pH 8.0), 1 mM MgCl2) and the increase in absorption at 340 nm was recorded in a Perkin Elmer Wallac 1420 Victor plate reader.

6.5.10. Growth inhibition by D-allose

Precultures of the 5 different constructs (SelSysA5.3, SelSysA5.3 + p186-alsEK, SelSysA5.3 + SP1, SelSysA5.3 + HP1, SelSysA5.3 + SP1 + HP1) were prepared as described above and washed twice with 1 x M9 salts. M9 medium was prepared as above except that for one part of the medium glycerol was added to a final concentration of 0.5 % (v/v) and to the other part glycerol (0.5 % (v/v)) and D-allose (0.2 % (w/v)) were added. Inducer aTc was added to a final concentration of 50 ng mL-1 and washed cells were inoculated at a final OD600 of 0.01. Cells were grown at 37°C with shaking and samples for OD measurement were taken periodically (100 µL, OD determined at 595 nm in a Wallac 1420 Victor plate reader).

6.5.11. Substrate specificity of EcMAK and ZmFRK for different hexoses

N-terminally 6xHis tagged ZmFRK and EcMAK were expressed and purified as described above for TnLRI and R2TnLRI except that they were expressed in 200 mL ZYM-5052 autoinduction medium and that lysis buffer and wash buffer were at pH 7.0. Fractions of both IMAC purified proteins were analyzed on SDS-PAGE and found to be virtually pure (> 95 %). The main fractions of each enzyme were pooled and dialyzed twice against 2 L of 20 mM Tris (pH 7.5). Substrate specificity of ZmFRK and EcMAK was determined as described previously [257]. In detail, 25 µg of ZmFRK or 19 µg of EcMAK were added to 50 mM hexose substrate (D-glucose, D-fructose, D-allose, D-psicose, L-sorbose, L-galactose, L-tagatose, D-sorbitol), 50 mM ATP (titrated before to pH 7.0 with NaOH), 50 mM Tris (pH 7.5), 10 mM MgCl2 and incubated for 4.5 – 6 h at 30°C. 2 µL of the reaction were then spotted on a silica gel TLC slide (Sigma Aldrich) and developed with a butanol-ethanol-water (5:3:2 (v/v/v)) mobile phase. Samples were visualized by dipping the plates briefly in methanol containing 2 % H2SO4, before the plates were dried and charred with a hot-air gun.

Table 6.2| Primers used in this work
Nr | Name | Sequence
AB4 | DEaraBAD_f | 5'-CTGGTTTCGTTTGATTGGCTGTGGTTTTATACAGTCTGTATTAACGAAGCGCTAAC-3'
AB5 | DEaraBAD_r | 5'-CCCGTTTTTTTGGATGGAGTGAAACGATGGCGATTGCAGTAAGTTGGCAGCATCAC-3'
AB15 | DErhaBADM_f | 5'-ACTGGTCGTAATGAAATTCAGCAGGATCACATTATGTGTAGGCTGGAGCTGCTTCG-3'
AB16 | DErhaBADM_r | 5'-TGGCACATTGGGCAATTACGGCAGGTAAAACACTTCATTCCGGGGATCCGTCGACC-3'
AB17 | Ck_mak_f | 5'-GAGGAACCCAGCCCATCTTC-3'
AB18 | Ck_mak_r | 5'-GCTGTATGGTCGCTATAAGC-3'
AB23 | Ck_rha_f | 5'-CTCCTGATGTCGTCAACACG-3'
AB24 | Ck_rha_r | 5'-GAGACAGAGTGAAAGGTCAG-3'
AB117 | SB103_for | 5'-GCTTCTAGAGTACTAGTACACTGCAGGCTTCCTCGCTCACTGACTC-3'
AB118 | SB103_rev | 5'-AGTACTCTAGATGTGAATTCTGCCTCGTGATACGCCTAT-3'
AB119 | AP_rrnB_f | 5'-GCAGAATTCACATCTAGAGGATGGTAGTGTGGGGTCTCC-3'
AB120 | AP_rrnB_r | 5'-GAAGCCTGCAGTGTACTAGTAACGCAAAAAGGCCATCCG-3'
AB123 | pKTS_seq_for | 5'-ACCACTCCCTATCAGTGATA-3'
AB157 | Ck_alsEK_f | 5'-CCTGCTCAAACGGAATAACC-3'
AB254 | LRInea_HindIII_for | 5'-ATCATAAGCTTATGATGAACATGGAAGAGATC-3'
AB255 | LRInea_XhoI_rev | 5'-ATCATCTCGAGTCTTCTCTCTCTCCTTCTTTTC-3'
AB286 | New_pSEVA_rev | 5'-TACTCAGGAGAGCGTTCACC-3'
AB346 | FRKmo_BBa_for | 5'-GCAGAATTCACATCTAGAGATGAAAAACGATAAAAAAATTTATG-3'
AB347 | FRKmo_BBa_rev | 5'-AGTGTACTAGTATTATTTATTTTCTGCCGCC-3'
AB364 | Ck_alsEK_r | 5'-GTTCTATTCCGGGATTGACG-3'
AB416 | PcDTE_BBa_f | 5'-GCAGAATTCACATCTAGAGATGAATAAAGTGGGCATG-3'
AB417 | PcDTE_BBa_r | 5'-AGTGTACTAGTATTATGCCAGTTTATCACGAACAAAC-3'
AB449 | alsE_EcoRI_rev | 5'-GACGGAATTCTTATGCTGTTTTTGCATGAG-3'
AB460 | Ck_ptsHIcrr_f | 5'-CCGCATTGTTTGCCGATCTC-3'
AB461 | Ck_ptsHIcrr_r | 5'-TCAGGAGATGCAGGTTGTGG-3'
AB654 | R2-TnLRI_BBa_f | 5'-GTCAGAATTCACATCTAGAGATGATGAACATGGAAGAG-3'
AB655 | R2-TnLRI_BBa_r | 5'-GTGTACTAGTATCATTTCTCCATGTAATTTCC-3'

Table 6.3| Plasmids used in this work
Nr | Name | Description | Reference
1 | pET30a | kan-resistance gene, PT7 promoter, 6xHis tag | Novagen (Merck4Biosciences)
2 | pKD13 | bla-resistance gene, kan-resistance cassette flanked by FRT sites, R6K origin of replication | Datsenko and Wanner [254]
3 | pKD46 | bla-resistance gene, lambda Red system under ParaB promoter control, ori R101 and temperature-sensitive repA101ts replicon | Datsenko and Wanner [254]
4 | pCP20 | bla-resistance gene, cm-resistance gene, temperature-sensitive replicon, FLP-recombinase under control of temperature-sensitive promoter | Datsenko and Wanner [254]
5 | pAB92 | SEVA vector backbone, bla resistance gene, Ptet-PT7 fusion promoter, MCS, ori pBR322 | Chapter 4 of this thesis
6 | pAB107 | derivative of pAB92 with gene rhaA from T. neapolitana (TnLRI) under control of Ptet-PT7 fusion promoter, C-terminal 6xHis-tag | This work
7 | p186-alsEK | derivative of pAB92, E. coli genes alsEK inserted via HindIII/EcoRI under control of Ptet-PT7 promoter | This work
8 | p186-alsE | derivative of p186-alsEK, E. coli genes alsE inserted via HindIII/EcoRI under control of Ptet-PT7 promoter |
9 | SP1 (p186-R2TnLRI) | derivative of pAB107, coding for R2TnLRI (mutations W74G, N223D, R357S, R378*) | This work
10 | SP2 (p186-IDF10) | derivative of pAB92, PcDTE IDF10 under control of Ptet-PT7 fusion promoter, 6xHis tag | Chapter 5 of this thesis
11 | pSEVA231 | pSEVA vector format with MCS, kan-resistance gene, pBBR1 origin of replication | Silva-Rocha et al. [200]
12 | pAB69 | derivative of pSEVA231, origin of replication exchanged against ori p15A via AscI/FseI | This work
13 | pKTS-PcDTE-C6H | pKTS vector, Ptet-PT7, PcDTE-6xHis tag | Bosshart et al. [106]
14 | HP1 (271-17ZmFRK-16PcDTE) | derivative of pAB69, constitutive promoter from BBa_J23116, PcDTE (WT) from pKTS-PcDTE-C6H, rrnB terminator, const. promoter from BBa_J23117, ZmFRK, rrnB terminator | This work
15 | HP2 (271-17ZmFRK-10PcDTE) | as HP1, but PcDTE under control of const. promoter BBa_J23110 | This work
16 | HP3 (271-09ZmFRK-10R2TnLRI) | as HP2; gene for R2TnLRI instead of PcDTE; const. promoter from BBa_J23109 upstream of ZmFRK | This work
17 | pET30a-ZmFRK | pET30a, gene for ZmFRK inserted via NdeI/XhoI, C-terminal 6xHis tag | This work
18 | pET30a-EcMAK | pET30a, gene for EcMAK inserted via NdeI/XhoI, C-terminal 6xHis tag | This work
19 | pAB139 | pSEVA backbone with kan resistance gene, ori pBR322, Ptet-PT7 fusion promoter, N-terminal 6xHis tag, ribitol dehydrogenase gene from K. pneumoniae | Chapter 4 of this thesis
20 | pAB215 | derivative of pAB139, N-terminal 6xHis tag, R2TnLRI gene (mutations W74G, N223D, R357S, R378*) | This work
21 | pSB103 | derivative of pUC18, MCS and promoter removed, EcoRI/XbaI/SpeI/PstI cassette inserted | Knight, T. [256]
22 | AP1.0 | plasmid pSB103 with rrnB terminator | This work

Table 6.4| Strains used in this work
Nr | Strain name | Relevant genotypes | Source or reference
1 | E. coli MG1655 | F-, λ-, rph-1 | CGSC
2 | E. coli Top10 | F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 recA1 araD139 Δ(ara leu) 7697 galU galK rpsL (StrR) endA1 nupG | Invitrogen
3 | E. coli BL21(DE3) | F- ompT hsdSB(rB–, mB–) gal dcm (DE3) | Invitrogen
4 | Zymomonas mobilis | DSM 3580 | DSMZ
5 | Thermotoga neapolitana | DSM 4359 | DSMZ
6 | E. coli JW0385 | BW25113 Δmak::kan (F-, Δ(araD-araB)567, ΔlacZ4787(::rrnB-3), λ-, rph-1, Δ(rhaD-rhaB)568, hsdR514, Δmak::kan) | Keio collection [253]
7 | E. coli JW2409 | BW25113 ΔptsI::kan (F-, Δ(araD-araB)567, ΔlacZ4787(::rrnB-3), λ-, rph-1, Δ(rhaD-rhaB)568, hsdR514, ΔptsI::kan) | Keio collection [253]
8 | E. coli MG1655 ΔalsEK | MG1655 ΔalsEK::kan | This work
9 | E. coli MG1655 ΔalsK | MG1655 ΔalsK::kan | This work
10 | E. coli MG1655 ΔaraBAD | MG1655 ΔaraBAD::kan | This work
11 | E. coli MG1655 ΔrhaBADM | MG1655 ΔrhaBADM::kan | This work
12 | E. coli SelSysA4.1 | MG1655 Δmak::FRT, ΔrhaBADM::FRT, ΔaraBAD::cam, ΔalsK::FRT | This work
13 | E. coli SelSysA4.2 | MG1655 Δmak::FRT, ΔrhaBADM::FRT, ΔaraBAD::cam, ΔalsEK::FRT | This work
14 | E. coli SelSysA5.3 | MG1655 Δmak::FRT, ΔrhaBADM::FRT, ΔaraBAD::cam, ΔalsEK::FRT, ΔptsI::FRT | This work

6.6. Supporting Material

6.6.1. Calculating theoretical growth rates

Table S6.1| Parameters for calculating theoretical growth rates of selection system
Parameter | Symbol | Value | Unit | Source
Cell volume | VC | 1.2 x 10^-15 | L | [258]
Protein content | | 0.55 | g per g CDW | [258]
Total protein concentration inside cell | | 4 | mM | [259]
Total number of proteins | | 2.35 x 10^6 | | [259]
Yield coefficient | YX/S | 0.45 | g CDW per g glucose | [260]
Avogadro constant | NA | 6.022 x 10^23 | mol-1 |
# of cells per L with OD600 = 1.0 | | 10^12 | cells L-1 | [261]
Max. growth rate on D-allose | µmax,All | 0.35 | h-1 | This work
kcat of R2TnLRI | | 0.71 | s-1 | This work
Km of R2TnLRI | | 44.2 | mM | This work
kcat of PcDTE WT | | 32.9 | s-1 | This work
Km of PcDTE WT | | 18.9 | mM | This work
Molecular weight of D-allose | MWAll | 0.18 | g mmol-1 |

We calculated the flux through the respective pathway step based on a Michaelis-Menten enzyme kinetic model:

$$v = \frac{k_{cat} \cdot E_0 \cdot [S]}{K_m + [S]}$$ (Equation 1)

v is the catalytic rate (in mM s-1), kcat (in s-1) and Km (in mM) are kinetic constants of the respective enzyme, [S] is the intracellular substrate concentration that was assumed (in a range from 0.1 mM to 20 mM) and E0 is the concentration of active enzyme, which was calculated based on literature values for intracellular protein concentration (4 mM), multiplied by the fraction of total protein represented by the recombinant enzyme (in a range from 0.001 to 0.1, i.e. 0.1 % to 10 % of total protein). Flux of the intermediate (D-allose or D-psicose) through the pathway per cell can then be calculated according to equation 2:

$$Flux = v \cdot V_C \cdot MW_{All}$$ (Equation 2)

where v is the reaction rate from equation 1 (mM s-1), VC is the cell volume of a single E. coli cell (in L), and MWAll is the molecular weight of D-allose (g mmol-1). By multiplication with 3600 and 10^12 the flux (gAll h-1) per 1 L cell culture with OD = 1 is obtained.

Growth rate µ can then be calculated using equation 3:

( ( ) ) Equation 3

Flux is the amount of D-allose that is converted by 1 L cell culture with OD600 = 1 in a certain time Δt (calculated by equation 2) and YX/S is the yield coefficient (g g-1).

From the growth rate µ the doubling time tdouble can be calculated according to equation 4:

$$t_{double} = \frac{\ln(2)}{\mu}$$ (Equation 4)
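The first steps of this estimate can be followed numerically. The sketch below assumes that equation 1 is the standard Michaelis-Menten rate law with E0 taken as a fraction of the 4 mM total protein concentration, that equation 2 is the product of v, VC and MWAll (as the variable definitions suggest), and that equation 4 is the usual relation between growth rate and doubling time. Equation 3 is not implemented. All parameter values are taken from Table S6.1; the chosen substrate concentration and enzyme fraction are arbitrary points inside the ranges stated in the text, and the function names are hypothetical.

```python
import math

CELL_VOLUME_L = 1.2e-15      # VC, volume of a single E. coli cell
TOTAL_PROTEIN_MM = 4.0       # total intracellular protein concentration
CELLS_PER_L_OD1 = 1e12       # cells per L culture at OD600 = 1
MW_ALLOSE_G_PER_MMOL = 0.18  # MWAll

def reaction_rate(kcat, km, substrate_mm, enzyme_fraction):
    """Assumed equation 1: v in mM s^-1, with E0 = fraction x total protein."""
    e0 = TOTAL_PROTEIN_MM * enzyme_fraction
    return kcat * e0 * substrate_mm / (km + substrate_mm)

def flux_per_litre_od1(v):
    """Assumed equation 2, scaled to g D-allose h^-1 per L culture at OD600 = 1."""
    flux_per_cell = v * CELL_VOLUME_L * MW_ALLOSE_G_PER_MMOL  # g s^-1 per cell
    return flux_per_cell * 3600 * CELLS_PER_L_OD1

def doubling_time(mu_per_h):
    """Assumed equation 4: doubling time in h."""
    return math.log(2) / mu_per_h

# R2TnLRI (kcat = 0.71 s^-1, Km = 44.2 mM) at 10 mM D-allose and 1 % of total protein
v = reaction_rate(0.71, 44.2, 10.0, 0.01)
flux = flux_per_litre_od1(v)
print(f"v = {v:.5f} mM s^-1, flux = {flux:.5f} g h^-1 per L at OD600 = 1")

# Doubling time for the maximal growth rate on D-allose measured in this work
print(f"t_double = {doubling_time(0.35):.2f} h")
```

Scanning substrate_mm from 0.1 to 20 and enzyme_fraction from 0.001 to 0.1 reproduces the parameter ranges considered in the text.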

Table S6.2| Screening of improved TnLRI variants

1st round (epPCR library)
Variant | Improvement over TnLRI WT
1-5/E02 | 2.4
1-5/H05 | 2.9
1-6/D12 | 3.1
1-7/H07 | 2.4
1-8/G04 | 2.6

2nd round (DNA shuffling library)
Variant | Improvement over 1-6/D12 | Improvement over TnLRI WT | Mutations
2-1/F08 | 1.9 | 6.1 |
2-1/F01 | 1.9 | 6.1 |
2-2/F07 | 2.0 | 6.3 | W74G, N223D, R357S, R378*
2-3/C09 | 1.8 | 5.5 |
2-7/E01 | 1.8 | 5.6 |

[Figure S6.1: plot of OD600 [A.U.] against time [h] for E. coli MG1655 and SelSysA4.2 + 186-alsEK. Exponential fits shown in the plot: y = 0.0057e^(0.449x), R² = 0.9972, and y = 0.007e^(0.3476x), R² = 0.9936.]

Figure S6.1| Growth curves of E. coli MG1655 and SelSysA4.2 (Δmak, ΔrhaBADM, ΔaraBAD, ΔalsEK) + p186-alsEK on M9 + 0.2 % D-allose at 37°C. Precultures of the respective strains were inoculated at OD600 = 0.01 and SelSysA4.2 + p186-alsEK was induced from the beginning with 100 ng mL-1 aTc.

[Figure S6.2: operon scheme showing alsI/rpiB (promoter PrpiB), followed by alsR, alsB, alsA, alsC, alsE and alsK; promoters PalsR and PalsB* are indicated.]

Figure S6.2| Organization of the allose operon on the E. coli MG1655 genome. alsR encodes the DNA-binding transcriptional repressor, alsBAC codes for the allose importer, alsE codes for a psicose 6-phosphate 3-epimerase and alsK for the allose kinase. rpiB/alsI is organized in reverse direction to the allose operon and encodes the allose 6-phosphate isomerase (which is also a functional ribose 5-phosphate isomerase [262]). The figure was prepared based on previous reports [230, 231].

* A putative promoter PalsB has been assigned by Zaslaver et al. [263].

[Figure S6.3: plot of ΔAbs340 nm/Δt [A.U. h-1] against % D-psicose in D-allose (10 mM total sugar). Linear fit shown in the plot: y = 0.6809x + 1.2264, R² = 0.9988.]

Figure S6.3| Calibration curve for determination of D-psicose concentration in presence of D-allose using enzyme KpRD. Different ratios of D-psicose and D-allose were mixed (final total hexose concentration of 10 mM) and the slope of NADH consumption by KpRD by concomitant reduction of D-psicose to allitol was recorded and plotted against the amount of D-psicose in the sample. Each measurement is the mean of 3 independent measurements with standard errors depicted.

[Figure S6.4, panel a): assembly scheme starting from AP0.0 (rrnB terminator flanked by E/X/S/P sites); genes ZmFRK, R2TnLRI and PcDTE inserted (AP1.1, AP2.1, AP3.1), followed by the RBS (AP1.2, AP2.2, AP3.2) and the promoters (AP1.3_09/17, AP2.3_10, AP3.3_10/16). Panel b): HP1 (271-16PcDTE-09ZmFRK): J23116, RBS, PcDTE, rrnB, J23117, RBS, ZmFRK, rrnB; HP2 (271-10PcDTE-09ZmFRK): J23110, RBS, PcDTE, rrnB, J23117, RBS, ZmFRK, rrnB; HP3 (271-10R2TnLRI-09ZmFRK): J23110, RBS, R2TnLRI, rrnB, J23109, RBS, ZmFRK, rrnB.]

Figure S6.4| a) Strategy for assembly of the helper plasmids in the assembly plasmids (AP). Letters E/X/S/P designate the restriction sites EcoRI/XbaI/SpeI/PstI, the numbers above the promoter symbol indicate the constitutive promoter variant (last two digits of the promoter name, see text for details). b) Schematic representation of the final helper plasmid architecture HP1-3. Medium-strength constitutive promoter J23116 or strong constitutive promoter J23110 were used for expression of PcDTE WT or R2TnLRI and weak constitutive promoters J23117 or J23109 were used for expression of ZmFRK. Each unit was isolated by an rrnB terminator to prevent read-through of the RNA polymerase.


162 CHAPTER 7: CONCLUSION AND OUTLOOK The work presented in this thesis describes the development of a D-tagatose epimerase enzyme, starting from the wild-type protein via a thermostabilized variant to two variants that can be considered as ready for industrial production of the rare hexoses D-psicose and L- tagatose by integrating the enzymatic reaction with the separation module as discussed in the introduction. This two-step approach agreed nicely with the general assumption that more stable proteins are a good basis for directed evolution, as they exhibit a larger “stability reservoir” that can be exploited in subsequent rounds of directed evolution. More importantly, stability of the biocatalyst is also of decisive importance for successful long-term integration of biocatalysis and separation by SMB due to several reasons. First, successful integration of reaction and separation requires that each module operates continuously and thus under steady-state conditions, which mean that concentrations of each compound at the entrance and exit of each module remain constant. This is at odds with the inevitable decrease of total enzyme activity in the enzyme reactor. A work-around can be achieved in two ways. Firstly, by heavily overloading the reactor such that for an extended time period the conversion of the reaction is effectively at the thermodynamic equilibrium. This approach was used for the first laboratory implementation of the integrated process for the production of the rare sugar D-psicose [197], but it suffers from an excessive loading of biocatalyst to the reactor which is unfavorable both in terms of reactor performance (accelerated reduction of mass transfer due to membrane fouling) as well as in terms of economics, because exponentially more enzyme has to be loaded into the reactor the longer the integrated process is supposed to operate. Clearly, a highly stable biocatalyst is imperative for this mode of operation. 
Secondly, a more thermostable biocatalyst would allow the reactor, an enzyme-membrane reactor in our case, to be run at higher temperatures (> 50°C) without losing disproportionate amounts of biocatalyst throughout the run. These elevated temperatures effectively reduce the risk of microbial contamination, which is otherwise a ubiquitous issue when working with carbohydrates. Furthermore, elevated temperatures reduce the viscosity of the sugar solutions, which can become prohibitive at high sugar concentrations, and they increase the reaction rate of the enzyme according to the well-established Arrhenius law and thus reduce the required residence time in the reactor.

The semi-rational approach that we have described in chapter 3 to thermostabilize the dimeric D-tagatose epimerase from P. cichorii is in line with the current general trend towards smaller and more focused (so-called “smart”) libraries. We could show that the subunit-subunit interface is indeed a hotspot for increasing the thermostability of this multimeric enzyme, which is not altogether surprising considering that thermophilic organisms have an increased percentage of multimeric enzymes compared to their mesophilic counterparts [166]. As discussed in chapter 3, the strengthened interaction between the two subunits is thought to impede the (presumed) first step in unfolding, namely the dissociation of the dimer into monomers. Nevertheless, further examples of interface engineering of other multimeric enzymes are necessary before the presented method can be reliably considered as generally applicable for the stabilization of multimeric proteins.

Apart from the discussed benefit of a thermostable enzyme for the integrated process, thermostability also had a convenient side-effect for the preparation of the enzyme. Whereas the wild-type enzyme had to be purified by chromatography in the first step to obtain sufficiently pure protein for application in the enzyme-membrane reactor (EMR), the thermostable DTE (as well as all variants derived from it) could be purified by simply heat-treating the E. coli lysate containing the overexpressed DTE (10 min at 70°C). The precipitated E. coli host proteins were removed by centrifugation and the enzyme was obtained in purities (> 90%) that were perfectly suitable for application in the EMR. This purification approach can be easily scaled up to industrially relevant quantities, as heating and centrifugation/filtration are standard procedures in the chemical industry.

When it comes to screening libraries in directed evolution, a central question is whether a targeted and knowledge-based mutagenesis approach or a random mutagenesis strategy is preferred. As elaborated in chapter 2, the choice of the mutagenesis method is strongly interconnected with the available screening or selection method. In chapter 4 we have described the development of an enzyme-coupled assay for the detection of D-psicose and L-tagatose, respectively, which allowed the screening of several thousand variants using microtiter plates.
Based on this screening assay, we compared the fraction of variants with improved specific activity per screened clone using (I) a completely random mutagenesis approach (epPCR) and (II) targeted mutagenesis of residues around the active site (site-saturation mutagenesis). The targeted mutagenesis approach turned out to be much more efficient in finding improved variants, corroborating the findings of other researchers on this matter. Moreover, the random mutagenesis approach resulted in beneficial mutations that changed residues in the previously engineered dimer interface, presumably leading to a reduction in thermostability. It is therefore straightforward to conclude that targeted mutagenesis is the method of choice for screening libraries with assays that can handle only up to 10^4 to 10^5 variants.

In chapter 6, the development of a growth-based selection system for the directed evolution of D-tagatose epimerase or L-rhamnose isomerase was summarized. Such a selection system would be ideal to overcome the limitation in throughput of the microtiter plate-based screening approach discussed in chapters 4 and 5. The establishment of the selection system, however, proved to be very challenging. It required substantial efforts, including the deletion of several promiscuous enzymes that interfered with the projected de novo metabolic pathway, to understand and address a broad variety of obstacles that prevented the successful application of the assay. It also revealed that implementation of a novel metabolic pathway into a microorganism like E. coli is still far from trivial due to suspected toxic effects of pathway intermediates and very low catalytic rates of certain pathway enzymes. Nevertheless, we were able to show that growth of the finally assembled selection system was possible in liquid culture, although at a very slow rate.
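To put the throughput limit of roughly 10^4 to 10^5 variants for plate-based screening into perspective, the sketch below estimates how many clones must be screened to cover a site-saturation library. It assumes NNK degenerate codons (32 codons per randomized position) and the standard oversampling relation T = -V ln(1 - F) for V equally likely variants and a desired coverage F. The codon scheme, the 95% coverage target and the function name are illustrative assumptions, not parameters reported in this thesis.

```python
import math


def clones_to_screen(n_positions: int, coverage: float = 0.95,
                     codons_per_position: int = 32) -> int:
    """Number of clones to screen so that a fraction `coverage` of all
    codon combinations is expected to be sampled at least once.

    Assumes all variants are equally likely (ideal NNK library):
    T = -V * ln(1 - F), with V = codons_per_position ** n_positions.
    """
    library_size = codons_per_position ** n_positions
    return math.ceil(-library_size * math.log(1.0 - coverage))


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} randomized position(s): screen ~{clones_to_screen(n)} clones")
```

Under these assumptions, three simultaneously randomized positions already require on the order of 10^5 clones, and four positions require several million, which is beyond plate-based screening and illustrates why a growth-based selection would be attractive.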

The presented results clearly indicate that directed evolution can readily tailor enzymes to their field of application, which is especially important for the implementation of biotransformations in an integrated process. The benefit of the presented variants in the proposed integrated process for the production of rare sugars will become even clearer upon their implementation into an optimized integrated process setup. This is the topic of a complementary second project in the Bioprocess Laboratory that will be reported separately.


CHAPTER 8: REFERENCES

1. Harmsen, J.G., G. Korevaar, and S.M. Lemkowitz, Process intensification contributions to sustainable development. Re-engineering the chemical processing plant, ed. A. Stankiewicz and J.A. Moulijn. 2004, New York: Marcel Dekker Inc.
2. Stankiewicz, A.I. and J.A. Moulijn, Process intensification: transforming chemical engineering. Chemical Engineering Progress, 2000. 96(2): p. 8-8.
3. Stankiewicz, A., Reactive separations for process intensification: an industrial perspective. Chemical Engineering and Processing, 2003. 42(3): p. 137-144.
4. Anderson, N.G., Practical use of continuous processing in developing and scaling up laboratory processes. Organic Process Research & Development, 2001. 5(6): p. 613-621.
5. Straathof, A.J.J., S. Panke, and A. Schmid, The production of fine chemicals by biotransformations. Current Opinion in Biotechnology, 2002. 13(6): p. 548-556.
6. Macauley, S., B. McNeil, and L.M. Harvey, The genus Gluconobacter and its applications in biotechnology. Critical Reviews in Biotechnology, 2001. 21(1): p. 1-25.
7. Granstrom, T.B., et al., Izumoring: a novel and complete strategy for bioproduction of rare sugars. Journal of Bioscience and Bioengineering, 2004. 97(2): p. 89-94.
8. Beerens, K., T. Desmet, and W. Soetaert, Enzymes for the biocatalytic production of rare sugars. Journal of Industrial Microbiology & Biotechnology, 2012. 39(6): p. 823-834.
9. Fessner, W.-D., Aldolases: enzymes for making and breaking C-C bonds, in Asymmetric Organic Synthesis with Enzymes. 2008, Wiley-VCH Verlag GmbH & Co. KGaA. p. 275-318.
10. Izumori, K. Izumoring: A strategy for bioproduction of all hexoses. in 12th European Congress on Biotechnology (ECB 12). 2005. Copenhagen, Denmark: Elsevier Science BV.
11. Kim, H.J., et al., Characterization of an Agrobacterium tumefaciens D-psicose 3-epimerase that converts D-fructose to D-psicose. Applied and Environmental Microbiology, 2006. 72(2): p. 981-985.
12. Ishida, Y., et al., Cloning and characterization of the D-tagatose 3-epimerase gene from Pseudomonas cichorii ST-24. Journal of Fermentation and Bioengineering, 1997. 83(6): p. 529-534.
13. Park, C.-S., et al., Characterization of a recombinant thermostable L-rhamnose isomerase from Thermotoga maritima ATCC 43589 and its application in the production of L-lyxose and L-mannose. Biotechnology Letters, 2010. 32(12): p. 1947-1953.
14. Prabhu, P., et al., Cloning and characterization of a rhamnose isomerase from Bacillus halodurans. Applied Microbiology and Biotechnology, 2010. 89(3): p. 635-644.
15. Leang, K., et al., Novel reactions of L-rhamnose isomerase from Pseudomonas stutzeri and its relation with D-xylose isomerase via substrate specificity. Biochimica et Biophysica Acta (BBA) - General Subjects, 2004. 1674(1): p. 68-77.
16. Cheng, L., W. Mu, and B. Jiang, Thermostable L-arabinose isomerase from Bacillus stearothermophilus IAM 11001 for D-tagatose production: gene cloning, purification and characterisation. Journal of the Science of Food and Agriculture, 2009. 90(8): p. 1327-1333.
17. Kim, B.-C., et al., Cloning, expression and characterization of L-arabinose isomerase from Thermotoga neapolitana: bioconversion of D-galactose to D-tagatose using the enzyme. FEMS Microbiology Letters, 2002. 212(1): p. 121-126.
18. Yoon, S.H., P. Kim, and D.K. Oh, Properties of L-arabinose isomerase from Escherichia coli as biocatalyst for tagatose production. World Journal of Microbiology & Biotechnology, 2003. 19(1): p. 47-51.
19. Mizanur, R.M., G. Takata, and K. Izumori, Cloning and characterization of a novel gene encoding L-ribose isomerase from Acinetobacter sp. strain DL-28 in Escherichia coli. Biochimica et Biophysica Acta - Gene Structure and Expression, 2001. 1521(1-3): p. 141-145.
20. Yoon, R.-Y., et al., Novel substrates of a ribose-5-phosphate isomerase from Clostridium thermocellum. Journal of Biotechnology, 2009. 139(1): p. 26-32.
21. Vieille, C., et al., xylA cloning and sequencing and biochemical characterization of xylose isomerase from Thermotoga neapolitana. Applied and Environmental Microbiology, 1995. 61(5): p. 1867-1875.

22. Bandlish, R.K., et al., Glucose-to-fructose conversion at high temperatures with xylose (glucose) isomerases from Streptomyces murinus and two hyperthermophilic Thermotoga species. Biotechnology and Bioengineering, 2002. 80(2): p. 185-194.
23. Kim, H.J., et al., Novel activity of UDP-galactose-4-epimerase for free monosaccharide and activity improvement by active site-saturation mutagenesis. Applied Biochemistry and Biotechnology, 2011. 163(3): p. 444-451.
24. Takeda, K., et al., X-ray structures of Bacillus pallidus D-arabinose isomerase and its complex with L-fucitol. Biochimica et Biophysica Acta - Proteins and Proteomics, 2010. 1804(6): p. 1359-1368.
25. Bechtold, M., et al. Integrated operation of continuous chromatography and biotransformations for the generic high yield production of fine chemicals. in 2nd BioPerspectives Congress. 2005. Wiesbaden, Germany: Elsevier Science BV.
26. Bechtold, M., et al., Potential of process integration with adsorbers for the production of rare monosaccharides - a model-based study. Chemie Ingenieur Technik, 2010. 82(1-2): p. 65-75.
27. Seidel-Morgenstern, A., L.C. Kessler, and M. Kaspereit, New developments in simulated moving bed chromatography. Chemical Engineering & Technology, 2008. 31(6): p. 826-837.
28. Juza, M., M. Mazzotti, and M. Morbidelli, Simulated moving-bed chromatography and its application to chirotechnology. Trends in Biotechnology, 2000. 18(3): p. 108-118.
29. Borren, T. and H. Schmidt-Traub, Comparison of chromatographic reactor concepts. Chemie Ingenieur Technik, 2004. 76(6): p. 805-814.
30. Bechtold, M., et al., Model-based characterization of an amino acid racemase from Pseudomonas putida DSM 3263 for application in medium-constrained continuous processes. Biotechnology and Bioengineering, 2007. 98(4): p. 812-824.
31. Woodley, J.M., Protein engineering of enzymes for process applications. Current Opinion in Chemical Biology, 2013. 17(2): p. 310-316.
32. Long, N.V.D., et al., Separation of D-psicose and D-fructose using simulated moving bed chromatography. Journal of Separation Science, 2009. 32(11): p. 1987-1995.
33. Bhosale, S.H., M.B. Rao, and V.V. Deshpande, Molecular and industrial aspects of glucose isomerase. Microbiological Reviews, 1996. 60(2): p. 280-300.
34. Bloom, J.D., et al., Protein stability promotes evolvability. Proceedings of the National Academy of Sciences of the United States of America, 2006. 103(15): p. 5869-5874.
35. Dellus-Gur, E., et al., What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs. Journal of Molecular Biology, 2013. 425(14): p. 2609-2621.
36. Besenmatter, W., P. Kast, and D. Hilvert, Relative tolerance of mesostable and thermostable protein homologs to extensive mutation. Proteins - Structure, Function and Bioinformatics, 2007. 66(2): p. 500-506.
37. Jia, M., et al., A D-psicose 3-epimerase with neutral pH optimum from Clostridium bolteae for D-psicose production: cloning, expression, purification, and characterization. Applied Microbiology and Biotechnology, 2014. 98(2): p. 717-725.
38. Zhang, W.L., et al., Characterization of a Metal-Dependent D-Psicose 3-Epimerase from a Novel Strain, Desmospora sp. 8437. Journal of Agricultural and Food Chemistry, 2013. 61(47): p. 11468-11476.
39. Zhang, W.L., et al., Characterization of a novel metal-dependent D-psicose 3-epimerase from Clostridium scindens 35704. PLoS One, 2013. 8(4).
40. Zhu, Y., et al., Overexpression of D-psicose 3-epimerase from Ruminococcus sp. in Escherichia coli and its potential application in D-psicose production. Biotechnology Letters, 2012. 34(10): p. 1901-1906.
41. Mu, W.M., et al., Cloning, expression, and characterization of a D-psicose 3-epimerase from Clostridium cellulolyticum H10. Journal of Agricultural and Food Chemistry, 2011. 59(14): p. 7785-7792.
42. Mu, W., et al., Characterization of a D-psicose-producing enzyme, D-psicose 3-epimerase, from Clostridium sp. Biotechnology Letters, 2013. 35(9): p. 1481-1486.

43. Zhang, L., et al., Characterization of D-tagatose-3-epimerase from Rhodobacter sphaeroides that converts D-fructose into D-psicose. Biotechnology Letters, 2009. 31(6): p. 857-862.
44. Bornscheuer, U.T., et al., Engineering the third wave of biocatalysis. Nature, 2012. 485(7397): p. 185-194.
45. Wang, M., T. Si, and H.M. Zhao, Biocatalyst development by directed evolution. Bioresource Technology, 2012. 115: p. 117-125.
46. Reetz, M.T., Biocatalysis in organic chemistry and biotechnology: past, present, and future. Journal of the American Chemical Society, 2013. 135(34): p. 12480-12496.
47. Luetz, S., L. Giver, and J. Lalonde, Engineered enzymes for chemical production. Biotechnology and Bioengineering, 2008. 101(4): p. 647-653.
48. Rothlisberger, D., et al., Kemp elimination catalysts by computational enzyme design. Nature, 2008. 453(7192): p. 190-195.
49. Vazquez-Figueroa, E., J. Chaparro-Riggers, and A.S. Bommarius, Development of a thermostable glucose dehydrogenase by a structure-guided consensus concept. ChemBioChem, 2007. 8(18): p. 2295-2301.
50. Reetz, M.T., et al., Increasing the stability of an enzyme toward hostile organic solvents by directed evolution based on iterative saturation mutagenesis using the B-FIT method. Chemical Communications, 2010. 46(45): p. 8657-8658.
51. You, L. and F.H. Arnold, Directed evolution of subtilisin E in Bacillus subtilis to enhance total activity in aqueous dimethylformamide (vol 9, pg 78, 1996). Protein Engineering, 1996. 9(8): p. 719-719.
52. Fox, R.J., et al., Improving catalytic function by ProSAR-driven enzyme evolution. Nature Biotechnology, 2007. 25(3): p. 338-344.
53. Goldsmith, M. and D.S. Tawfik, Directed enzyme evolution: beyond the low-hanging fruit. Current Opinion in Structural Biology, 2012. 22(4): p. 406-412.
54. van Rossum, T., S.W.M. Kengen, and J. van der Oost, Reporter-based screening and selection of enzymes. FEBS Journal, 2013. 280(13): p. 2979-2996.
55. Dietrich, J.A., A.E. McKee, and J.D. Keasling, High-throughput metabolic engineering: advances in small-molecule screening and selection, in Annual Review of Biochemistry, Vol 79, R.D. Kornberg, et al., Editors. 2010. p. 563-590.
56. Leemhuis, H., R.M. Kelly, and L. Dijkhuizen, Directed evolution of enzymes: library screening strategies. IUBMB Life, 2009. 61(3): p. 222-228.
57. Kiss, G., et al., Computational enzyme design. Angewandte Chemie International Edition, 2013. 52(22): p. 5700-5725.
58. Savile, C.K., et al., Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science, 2010. 329(5989): p. 305-309.
59. Bloom, J.D. and F.H. Arnold, In the light of directed evolution: pathways of adaptive protein evolution. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106: p. 9995-10000.
60. Dalby, P.A., Strategy and success for the directed evolution of enzymes. Current Opinion in Structural Biology, 2011. 21(4): p. 473-480.
61. Shivange, A.V., et al., Advances in generating functional diversity for directed protein evolution. Current Opinion in Chemical Biology, 2009. 13(1): p. 19-25.
62. Cirino, P.C., K.M. Mayer, and D. Umeno, Generating mutant libraries using error-prone PCR, in Directed Evolution Library Creation, F. Arnold and G. Georgiou, Editors. 2003, Humana Press. p. 3-9.
63. Camps, M., et al., Targeted gene evolution in Escherichia coli using a highly error-prone DNA polymerase I. Proceedings of the National Academy of Sciences of the United States of America, 2003. 100(17): p. 9727-9732.
64. Bershtein, S. and D.S. Tawfik, Ohno's model revisited: measuring the frequency of potentially adaptive mutations under various mutational drifts. Molecular Biology and Evolution, 2008. 25(11): p. 2311-2318.
65. Stemmer, W.P.C., Rapid evolution of a protein in-vitro by DNA shuffling. Nature, 1994. 370(6488): p. 389-391.

66. Crameri, A., et al., DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature, 1998. 391(6664): p. 288-291.
67. Zhao, H., et al., Molecular evolution by staggered extension process (StEP) in vitro recombination. Nature Biotechnology, 1998. 16(3): p. 258-261.
68. Ness, J.E., et al., Synthetic shuffling expands functional protein diversity by allowing amino acids to recombine independently. Nature Biotechnology, 2002. 20(12): p. 1251-1255.
69. Herman, A. and D.S. Tawfik, Incorporating synthetic oligonucleotides via gene reassembly (ISOR): a versatile tool for generating targeted libraries. Protein Engineering, Design & Selection, 2007. 20(5): p. 219-226.
70. Stratagene (La Jolla, CA 92037), QuikChange® site-directed mutagenesis kit. 2005.
71. Liu, H. and J. Naismith, An efficient one-step site-directed deletion, insertion, single and multiple-site plasmid mutagenesis protocol. BMC Biotechnology, 2008. 8(1): p. 91.
72. Reetz, M.T., D. Kahakeaw, and J. Sanchis, Shedding light on the efficacy of laboratory evolution based on iterative saturation mutagenesis. Molecular Biosystems, 2009. 5(2): p. 115-122.
73. Agudo, R., G.D. Roiban, and M.T. Reetz, Induced Axial Chirality in Biocatalytic Asymmetric Ketone Reduction. Journal of the American Chemical Society, 2013. 135(5): p. 1665-1668.
74. Wu, Q., P. Soni, and M.T. Reetz, Laboratory Evolution of Enantiocomplementary Candida antarctica Lipase B Mutants with Broad Substrate Scope. Journal of the American Chemical Society, 2013. 135(5): p. 1872-1881.
75. Reetz, M.T. and J.D. Carballeira, Iterative saturation mutagenesis (ISM) for rapid directed evolution of functional enzymes. Nature Protocols, 2007. 2(4): p. 891-903.
76. Reetz, M.T., D. Kahakeaw, and R. Lohmer, Addressing the numbers problem in directed evolution. ChemBioChem, 2008. 9(11): p. 1797-1804.
77. Reetz, M.T., et al., Iterative saturation mutagenesis accelerates laboratory evolution of enzyme stereoselectivity: rigorous comparison with traditional methods. Journal of the American Chemical Society, 2010. 132(26): p. 9144-9152.
78. Lipovsek, D., et al., Selection of horseradish peroxidase variants with enhanced enantioselectivity by yeast surface display. Chemistry & Biology, 2007. 14(10): p. 1176-1185.
79. Parikh, M.R. and I. Matsumura, Site-saturation mutagenesis is more efficient than DNA shuffling for the directed evolution of beta-fucosidase from beta-galactosidase. Journal of Molecular Biology, 2005. 352(3): p. 621-628.
80. Lesley, S.A., et al., Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline. Proceedings of the National Academy of Sciences, 2002. 99(18): p. 11664-11669.
81. Stevens, R.C., High-throughput protein crystallization. Current Opinion in Structural Biology, 2000. 10(5): p. 558-563.
82. Marks, D.S., T.A. Hopf, and C. Sander, Protein structure prediction from sequence variation. Nature Biotechnology, 2012. 30(11): p. 1072-1080.
83. Varadarajan, N., et al., Construction and flow cytometric screening of targeted enzyme libraries. Nature Protocols, 2009. 4(6): p. 893-901.
84. Sonke, T., et al., Industrial perspectives on assays, in Enzyme Assays. 2006, Wiley-VCH Verlag GmbH & Co. KGaA. p. 95-135.
85. Reetz, M.T., High-throughput screening systems for assaying the enantioselectivity of enzymes, in Enzyme Assays. 2006, Wiley-VCH Verlag GmbH & Co. KGaA. p. 41-76.
86. Yang, G. and S.G. Withers, Ultrahigh-throughput FACS-based screening for directed enzyme evolution. ChemBioChem, 2009. 10(17): p. 2704-2715.
87. Whittle, E. and J. Shanklin, Engineering Delta(9)-16 : 0-acyl carrier protein (ACP) desaturase specificity based on combinatorial saturation mutagenesis and logical redesign of the castor Delta(9)-18 : 0-ACP desaturase. Journal of Biological Chemistry, 2001. 276(24): p. 21500-21505.
88. Neuenschwander, M., et al., A simple selection strategy for evolving highly efficient enzymes. Nature Biotechnology, 2007. 25(10): p. 1145-1147.
89. Boersma, Y.L., et al., A novel genetic selection system for improved enantioselectivity of Bacillus subtilis lipase A. ChemBioChem, 2008. 9(7): p. 1110-1115.

90. Alexeeva, M., et al., Deracemization of alpha-methylbenzylamine using an enzyme obtained by in vitro evolution. Angewandte Chemie International Edition, 2002. 41(17): p. 3177-+.
91. Kim, Y.W., et al., Directed evolution of a glycosynthase from Agrobacterium sp. increases its catalytic activity dramatically and expands its substrate repertoire. Journal of Biological Chemistry, 2004. 279(41): p. 42787-42793.
92. Baker, K., et al., Chemical complementation: a reaction-independent genetic assay for enzyme catalysis. Proceedings of the National Academy of Sciences of the United States of America, 2002. 99(26): p. 16537-16542.
93. Yang, G.Y., et al., Fluorescence activated cell sorting as a general ultra-high-throughput screening method for directed evolution of glycosyltransferases. Journal of the American Chemical Society, 2010. 132(30): p. 10570-10577.
94. Fernandez-Alvaro, E., et al., A combination of in vivo selection and cell sorting for the identification of enantioselective biocatalysts. Angewandte Chemie International Edition, 2011. 50(37): p. 8584-8587.
95. Aharoni, A., et al., High-throughput screening of enzyme libraries: Thiolactonases evolved by fluorescence-activated sorting of single cells in emulsion compartments. Chemistry & Biology, 2005. 12(12): p. 1281-1289.
96. Hardiman, E., et al., Directed Evolution of a Thermophilic beta-glucosidase for Cellulosic Bioethanol Production. Applied Biochemistry and Biotechnology, 2010. 161(1-8): p. 301-312.
97. Ostafe, R., et al., Ultra-High-Throughput Screening Method for the Directed Evolution of Glucose Oxidase. Chemistry & Biology, 2014. 21(3): p. 414-421.
98. Becker, S., et al., Single-cell high-throughput screening to identify enantioselective hydrolytic enzymes. Angewandte Chemie International Edition, 2008. 47(27): p. 5085-5088.
99. Droge, M.J., et al., Directed evolution of Bacillus subtilis lipase A by use of enantiomeric phosphonate inhibitors: Crystal structures and phage display selection. ChemBioChem, 2006. 7(1): p. 149-157.
100. Verhaert, R.M.D., et al., Phage display selects for amylases with improved low pH starch-binding. Journal of Biotechnology, 2002. 96(1): p. 103-118.
101. Griffiths, A.D. and D.S. Tawfik, Directed evolution of an extremely fast phosphotriesterase by in vitro compartmentalization. EMBO Journal, 2003. 22(1): p. 24-35.
102. Kintses, B., et al., Picoliter cell lysate assays in microfluidic droplet compartments for directed enzyme evolution. Chemistry & Biology, 2012. 19(8): p. 1001-1009.
103. Kille, S., et al., Regio- and stereoselectivity of P450-catalysed hydroxylation of steroids controlled by laboratory evolution. Nature Chemistry, 2011. 3(9): p. 738-743.
104. Lauchli, R., et al., High-throughput screening for terpene-synthase-cyclization activity and directed evolution of a terpene synthase. Angewandte Chemie International Edition, 2013. 52(21): p. 5571-5574.
105. Yoshikuni, Y., T.E. Ferrin, and J.D. Keasling, Designed divergent evolution of enzyme function. Nature, 2006. 440(7087): p. 1078-1082.
106. Bosshart, A., S. Panke, and M. Bechtold, Systematic optimization of interface interactions increases the thermostability of a multimeric enzyme. Angewandte Chemie International Edition, 2013. 52(37): p. 9673-9676.
107. Taylor, S.V., P. Kast, and D. Hilvert, Investigating and engineering enzymes by genetic selection. Angewandte Chemie International Edition, 2001. 40(18): p. 3310-3335.
108. Lefurgy, S. and V. Cornish, Chemical complementation, in Enzyme Assays. 2006, Wiley-VCH Verlag GmbH & Co. KGaA. p. 183-219.
109. Kast, P., et al., Exploring the active site of chorismate mutase by combinatorial mutagenesis and selection: The importance of electrostatic catalysis. Proceedings of the National Academy of Sciences of the United States of America, 1996. 93(10): p. 5043-5048.
110. Kleeb, A.C., et al., Metabolic engineering of a genetic selection system with tunable stringency. Proceedings of the National Academy of Sciences of the United States of America, 2007. 104(35): p. 13907-13912.

111. MacBeath, G., P. Kast, and D. Hilvert, Redesigning enzyme topology by directed evolution. Science, 1998. 279(5358): p. 1958-1961.
112. Butz, M., et al., An N-terminal protein degradation tag enables robust selection of highly active enzymes. Biochemistry, 2011. 50(40): p. 8594-8602.
113. Otten, L.G., et al., Altering the substrate specificity of cephalosporin acylase by directed evolution of the beta-subunit. Journal of Biological Chemistry, 2002. 277(44): p. 42121-42127.
114. Cheriyan, M., et al., Directed evolution of a pyruvate aldolase to recognize a long chain acyl substrate. Bioorganic & Medicinal Chemistry, 2011. 19(21): p. 6447-6453.
115. Schwab, T. and R. Sterner, Stabilization of a metabolic enzyme by library selection in Thermus thermophilus. ChemBioChem, 2011. 12(10): p. 1581-1588.
116. Nam, H., et al., Network context and selection in the evolution to enzyme specificity. Science, 2012. 337(6098): p. 1101-1104.
117. Edwards, J.S. and B.O. Palsson, The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proceedings of the National Academy of Sciences of the United States of America, 2000. 97(10): p. 5528-5533.
118. Turner, N.J., Agar plate-based assays, in Enzyme Assays. 2006, Wiley-VCH Verlag GmbH & Co. KGaA. p. 137-161.
119. Mackenzie, L.F., et al., Glycosynthases: mutant glycosidases for oligosaccharide synthesis. Journal of the American Chemical Society, 1998. 120(22): p. 5583-5584.
120. Wenda, S., et al., Industrial biotechnology-the future of green chemistry? Green Chemistry, 2011. 13(11): p. 3007-3047.
121. Michener, J.K. and C.D. Smolke, High-throughput enzyme evolution in Saccharomyces cerevisiae using a synthetic RNA switch. Metabolic Engineering, 2012. 14(4): p. 306-316.
122. Stoltenburg, R., C. Reinemann, and B. Strehlitz, SELEX-A (r)evolutionary method to generate high-affinity nucleic acid ligands. Biomolecular Engineering, 2007. 24(4): p. 381-403.
123. Chang, A.L., J.J. Wolf, and C.D. Smolke, Synthetic RNA switches as a tool for temporal and spatial control over gene expression. Current Opinion in Biotechnology, 2012. 23(5): p. 679-688.
124. Aharoni, A., et al., High-throughput Screens and Selections of Enzyme-encoding Genes, in Enzyme Assays. 2006, Wiley-VCH Verlag GmbH & Co. KGaA. p. 163-181.
125. Plückthun, A., Ribosome display: a perspective. Methods in Molecular Biology, ed. J.A. Douthwaite and R.H. Jackson. 2012. 805: p. 3-28. doi: 10.1007/978-1-61779-379-0_1.
126. Boersma, Y.L., M.J. Droge, and W.J. Quax, Selection strategies for improved biocatalysts. FEBS Journal, 2007. 274(9): p. 2181-2195.
127. Fernandez-Gacio, A., M. Uguen, and J. Fastrez, Phage display as a tool for the directed evolution of enzymes. Trends in Biotechnology, 2003. 21(9): p. 408-414.
128. Droge, M.J., et al., Binding of phage displayed Bacillus subtilis lipase A to a phosphonate suicide inhibitor. Journal of Biotechnology, 2003. 101(1): p. 19-28.
129. Amstutz, P., et al., In vitro selection for catalytic activity with ribosome display. Journal of the American Chemical Society, 2002. 124(32): p. 9396-9403.
130. Steiner, D., et al., Signal sequences directing cotranslational translocation expand the range of proteins amenable to phage display. Nature Biotechnology, 2006. 24(7): p. 823-831.
131. Sieber, V., A. Plückthun, and F.X. Schmid, Selecting proteins with improved stability by a phage-based method. Nature Biotechnology, 1998. 16(10): p. 955-960.
132. Zinchenko, A., et al., One in a Million: Flow Cytometric Sorting of Single Cell-Lysate Assays in Monodisperse Picolitre Double Emulsion Droplets for Directed Evolution. Analytical Chemistry, 2014. 86(5): p. 2526-2533.
133. Griffiths, A.D. and D.S. Tawfik, Miniaturising the laboratory in emulsion droplets. Trends in Biotechnology, 2006. 24(9): p. 395-402.
134. Guo, M.T., et al., Droplet microfluidics for high-throughput biological assays. Lab on a Chip, 2012. 12(12): p. 2146-2155.
135. Miller, O.J., et al., Directed evolution by in vitro compartmentalization. Nature Methods, 2006. 3(7): p. 561-570.

136. Reymond, J.L. and P. Babiak, Screening systems, in White Biotechnology, R. Ulber and D. Sell, Editors. 2007, Springer-Verlag Berlin: Berlin. p. 31-58.
137. Gillam, E.M.J., Engineering cytochrome P450 enzymes. Chemical Research in Toxicology, 2008. 21(1): p. 220-231.
138. Lafferty, M. and M.J. Dycaico, GigaMatrix: a novel ultrahigh throughput protein optimization and discovery platform. Protein Engineering, 2004. 388: p. 119-134.
139. Reetz, M.T., et al., Super-high-throughput screening of enantioselective catalysts by using capillary array electrophoresis. Angewandte Chemie International Edition, 2000. 39(21): p. 3891-+.
140. Ma, S.K., et al., A green-by-design biocatalytic process for atorvastatin intermediate. Green Chemistry, 2010. 12(1): p. 81-86.
141. Truppo, M.D., H. Strotman, and G. Hughes, Development of an Immobilized Transaminase Capable of Operating in Organic Solvent. ChemCatChem, 2012. 4(8): p. 1071-1074.
142. Chen, K.Q. and F.H. Arnold, Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proceedings of the National Academy of Sciences of the United States of America, 1993. 90(12): p. 5618-5622.
143. Lin, H.N. and V.W. Cornish, Screening and selection methods for large-scale analysis of protein function. Angewandte Chemie International Edition, 2002. 41(23): p. 4403-4425.
144. Turner, N.J., Directed evolution drives the next generation of biocatalysts. Nature Chemical Biology, 2009. 5(8): p. 568-574.
145. Voigt, C.A., et al., Protein building blocks preserved by recombination. Nature Structural & Molecular Biology, 2002. 9(7): p. 553-558.
146. Jiang, L., et al., De novo computational design of retro-aldol enzymes. Science, 2008. 319(5868): p. 1387-1391.
147. Siegel, J.B., et al., Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science, 2010. 329(5989): p. 309-313.
148. Blomberg, R., et al., Precision is essential for efficient catalysis in an evolved kemp eliminase. Nature, 2013. 503(7476): p. 418-+.
149. Fox, R.J. and G.W. Huisman, Enzyme optimization: moving from blind evolution to statistical exploration of sequence-function space. Trends in Biotechnology, 2008. 26(3): p. 132-138.
150. Fox, R., et al., Optimizing the search algorithm for protein engineering by directed evolution. Protein Engineering, 2003. 16(8): p. 589-597.
151. Damborsky, J. and J. Brezovsky, Computational tools for designing and engineering biocatalysts. Current Opinion in Chemical Biology, 2009. 13(1): p. 26-34.
152. Levin, I. and A. Aharoni, Evolution in microfluidic droplet. Chemistry & Biology, 2012. 19(8): p. 929-931.
153. Zhu, Y. and Q. Fang, Analytical detection techniques for droplet microfluidics-A review. Analytica Chimica Acta, 2013. 787: p. 24-35.
154. Gibbs, P.R., et al., Accelerated biocatalyst stability testing for process optimization. Biotechnology Progress, 2005. 21(3): p. 762-774.
155. Rogers, T.A. and A.S. Bommarius, Utilizing simple biochemical measurements to predict lifetime output of biocatalysts in continuous isothermal processes. Chemical Engineering Science, 2009. 65(6): p. 2118-2124.
156. Peterson, M.E., et al., The dependence of enzyme activity on temperature: determination and validation of parameters. Biochemical Journal, 2007. 402: p. 331-337.
157. Sheldon, R.A., Enzyme immobilization: the quest for optimum performance. Advanced Synthesis & Catalysis, 2007. 349(8-9): p. 1289-1307.
158. Fernandez, L., et al., Thermal stabilization of trypsin with glycol chitosan. Journal of Molecular Catalysis B: Enzymatic, 2005. 34(1-6): p. 14-17.
159. Reetz, M.T., J.D. Carballeira, and A. Vogel, Iterative saturation mutagenesis on the basis of B factors as a strategy for increasing protein thermostability. Angewandte Chemie International Edition, 2006. 45(46): p. 7745-7751.

173 160. Van den Burg, B., et al., Engineering an enzyme to resist boiling. Proceedings of the National Academy of Sciences of the United States of America, 1998. 95(5): p. 2056- 2060. 161. Miyazaki, K., et al., Directed evolution study of temperature adaptation in a psychrophilic enzyme. Journal of Molecular Biology, 2000. 297(4): p. 1015-1026. 162. Giver, L., et al., Directed evolution of a thermostable esterase. Proceedings of the National Academy of Sciences of the United States of America, 1998. 95(22): p. 12809-12813. 163. Palackal, N., et al., An evolutionary route to xylanase process fitness. Protein Science, 2004. 13(2): p. 494-503. 164. Vazquez-Figueroa, E., et al., Thermostable variants constructed via the structure-guided consensus method also show increased stability in salts solutions and homogeneous aqueous-organic media. Protein Engineering, Design & Selection, 2008. 21(11): p. 673- 680. 165. Lehmann, M., et al., The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Engineering, 2002. 15(5): p. 403-411. 166. Fernandez-Lafuente, R., Stabilization of multimeric enzymes: strategies to prevent subunit dissociation. Enzyme and Microbial Technology, 2009. 45(6-7): p. 405-418. 167. Das, M., et al., Design of disulfide-linked thioredoxin dimers and multimers through analysis of crystal contacts. Journal of Molecular Biology, 2007. 372(5): p. 1278-1292. 168. Kadokura, H., F. Katzen, and J. Beckwith, Protein disulfide bond formation in prokaryotes. Annual Review of Biochemistry, 2003. 72: p. 111-135. 169. Dani, V.S., C. Ramakrishnan, and R. Varadarajan, MODIP revisited: re-evaluation and refinement of an automated procedure for modeling of disulfide bonds in proteins. Protein Engineering, 2003. 16(3): p. 187-193. 170. Krissinel, E. and K. Henrick, Inference of macromolecular assemblies from crystalline state. Journal of Molecular Biology, 2007. 372(3): p. 774-797. 171. Hubbard, S.J. and P. 
Argos, Cavities and packing at protein interfaces. Protein Science, 1994. 3(12): p. 2194-2206. 172. Reetz, M.T., The Importance of Additive and Non-Additive Mutational Effects in Protein Engineering. Angewandte Chemie, International Edition, 2013. 52(10): p. 2658-2666. 173. Sambrook, J. and D.W. Russell, Molecular cloning: a laboratory manual. Molecular cloning: a laboratory manual2001: Cold Spring Harbor Laboratory Press. 174. Studier, F.W., Protein production by auto-induction in high-density shaking cultures. Protein Expression and Purification, 2005. 41(1): p. 207-234. 175. Larkin, M.A., et al., Clustal W and Clustal X version 2.0. Bioinformatics, 2007. 23(21): p. 2947-2948. 176. Arnold, K., et al., The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics, 2006. 22(2): p. 195-201. 177. Ashkenazy, H., et al., ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Research, 2010. 38: p. W529-W533. 178. Sobolev, V., et al., SPACE: a suite of tools for protein structure prediction and analysis based on complementarity and environment. Nucleic Acids Research, 2005. 33: p. W39- W43. 179. Tracewell, C.A. and F.H. Arnold, Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Current Opinion in Chemical Biology, 2009. 13(1): p. 3-9. 180. Gumulya, Y. and M.T. Reetz, Enhancing the thermal robustness of an enzyme by directed evolution: least favorable starting points and inferior mutants can map superior evolutionary pathways. Chembiochem, 2011. 12(16): p. 2502-2510. 181. Tokuriki, N. and D.S. Tawfik, Stability effects of mutations and protein evolvability. Current Opinion in Structural Biology, 2009. 19(5): p. 596-604. 182. Polizzi, K.M., et al., Stability of biocatalysts. Current Opinion in Chemical Biology, 2007. 11(2): p. 220-225. 183. Tokuriki, N. and D.S. Tawfik, Protein dynamism and evolvability. Science, 2009. 324(5924): p. 
203-207. 184. Ito, S., Features and applications of microbial sugar epimerases. Applied Microbiology and Biotechnology, 2009. 84(6): p. 1053-1060.

174 185. Yoshida, H., et al., Crystal structures of D-tagatose 3-epimerase from Pseudomonas cichorii and its complexes with D-tagatose and D-fructose. Journal of Molecular Biology, 2007. 374(2): p. 443-453. 186. Mu, W.M., et al., Recent advances on applications and biotechnological production of D- psicose. Applied Microbiology and Biotechnology, 2012. 94(6): p. 1461-1467. 187. Aharoni, A., et al., The 'evolvability' of promiscuous protein functions. Nature Genetics, 2005. 37(1): p. 73-76. 188. Huwig, A., et al., Enzymatic synthesis of L-tagatose from galactitol with galactitol dehydrogenase from Rhodobacter sphaeroides D. Carbohydrate Research, 1997. 305(3-4): p. 337-339. 189. Takeshita, K., et al., Direct production of allitol from D-fructose by a coupling reaction using D-tagatose 3-epimerase, ribitol dehydrogenase and formate dehydrogenase. Journal of Bioscience and Bioengineering, 2000. 90(5): p. 545-548. 190. Kim, K., et al., Crystal structure of D-psicose 3-epimerase from Agrobacterium tumefaciens and its complex with true substrate D-fructose: a pivotal role of metal in catalysis, an active site for the non-phosphorylated substrate, and its conformational changes. Journal of Molecular Biology, 2006. 361(5): p. 920-931. 191. Koudelakova, T., et al., Engineering enzyme stability and resistance to an organic cosolvent by modification of residues in the access tunnel. Angewandte Chemie International Edition, 2013: p. n/a-n/a. 192. Markley, J.L., Observation of histidine residues in proteins by nuclear magnetic resonance spectroscopy. Accounts of Chemical Research, 1975. 8(2): p. 70-80. 193. Anandakrishnan, R., B. Aguilar, and A.V. Onufriev, H++3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Nucleic Acids Research, 2012. 40(W1): p. W537-W541. 194. Tokuriki, N., et al., Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat Commun, 2012. 3: p. 1257. 
195. Fox, R.J. and M.D. Clay, Catalytic effectiveness, a measure of enzyme proficiency for industrial applications. Trends in Biotechnology, 2009. 27(4): p. 189-189. 196. Pavlova, M., et al., Redesigning dehalogenase access tunnels as a strategy for degrading an anthropogenic substrate. Nat Chem Biol, 2009. 5(10): p. 727-733. 197. Wagner, N., et al., Practical aspects of integrated operation of biotransformation and SMB separation for fine chemical synthesis. Organic Process Research & Development, 2012. 16(2): p. 323-330. 198. Kornberger, P., et al., Modification of galactitol dehydrogenase from Rhodobacter sphaeroides D for immobilization on polycrystalline gold surfaces. Langmuir, 2009. 25(20): p. 12380-12386. 199. Kapust, R.B., et al., Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency. Protein Engineering, 2001. 14(12): p. 993-1000. 200. Silva-Rocha, R., et al., The standard european vector architecture (SEVA): a coherent platform for the analysis and deployment of complex prokaryotic phenotypes. Nucleic Acids Research, 2013. 41(D1): p. D666-D675. 201. Gasteiger, E., et al., ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Research, 2003. 31(13): p. 3784-3788. 202. Khersonsky, O., et al., Evolutionary optimization of computationally designed enzymes: kemp eliminases of the KE07 series. Journal of Molecular Biology, 2010. 396(4): p. 1025- 1042. 203. Leslie, A.G.W., The integration of macromolecular diffraction data. Acta Crystallographica Section D-Biological Crystallography, 2006. 62: p. 48-57. 204. Evans, P., Scaling and assessment of data quality. Acta Crystallographica Section D- Biological Crystallography, 2006. 62: p. 72-82. 205. Brunger, A.T., Free R-value -a novel statistical quantity for assessing the accuracy of crystal structures. Nature, 1992. 355(6359): p. 472-475. 206. Vagin, A. and A. 
Teplyakov, MOLREP: an automated program for molecular replacement. Journal of Applied Crystallography, 1997. 30: p. 1022-1025.

175 207. Murshudov, G.N., A.A. Vagin, and E.J. Dodson, Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallographica Section D-Biological Crystallography, 1997. 53: p. 240-255. 208. Emsley, P., et al., Features and development of Coot. Acta Crystallographica Section D- Biological Crystallography, 2010. 66: p. 486-501. 209. Laskowski, R.A., D.S. Moss, and J.M. Thornton, Main-chain bond lengths and bond angles in protein structures. Journal of Molecular Biology, 1993. 231(4): p. 1049-1067. 210. Holm, L. and J. Park, DaliLite workbench for protein structure comparison. Bioinformatics, 2000. 16(6): p. 566-567. 211. Blaser, H.-U., et al., Comparison of Four Technical Syntheses of Ethyl (R)-2-Hydroxy-4- Phenylbutyrate, in Asymmetric Catalysis on Industrial Scale2004, Wiley-VCH Verlag GmbH & Co. KGaA. p. 91-103. 212. Makart, S., M. Bechtold, and S. Panke, Separation of amino acids by simulated moving bed under solvent constrained conditions for the integration of continuous chromatography and biotransformation. Chemical Engineering Science, 2008. 63(21): p. 5347-5355. 213. Takata, M.K., et al., Neuroprotective effect of D-psicose on 6-hydroxydopamine-induced apoptosis in rat pheochromocytoma (PC12) cells. Journal of Bioscience and Bioengineering, 2005. 100(5): p. 511-516. 214. Vasic-Racki, D., History of Industrial Biotransformations – Dreams and Realities, in Industrial Biotransformations2006, Wiley-VCH Verlag GmbH & Co. KGaA. p. 1-36. 215. Bremus, C., et al., The use of microorganisms in L-ascorbic acid production. Journal of Biotechnology, 2006. 124(1): p. 196-205. 216. Bommarius, A.S. and M.F. Paye, Stabilizing biocatalysts. Chemical Society Reviews, 2013. 42(15): p. 6534-6565. 217. Seelbach, K., et al., Improvement of the total turnover number and space-time yield for chloroperoxidase catalyzed oxidation. Biotechnology and Bioengineering, 1997. 55(2): p. 283-288. 218. Bechtold, M. and S. 
Panke, Model-based characterization of operational stability of multimeric enzymes with complex deactivation behavior: An in-silico investigation. Chemical Engineering Science, 2012. 80(1): p. 435-450. 219. Peterson, M.E., et al., A new intrinsic thermal parameter for enzymes reveals true temperature optima. Journal of Biological Chemistry, 2004. 279(20): p. 20717-20722. 220. Klibanov, A.M., Stabilization of enzymes against thermal inactivation. Advances in Applied Microbiology, 1983. 29: p. 1-28. 221. Kragl, U., et al., Enzyme engineering aspects of biocatalysis: Cofactor regeneration as example. Biotechnology and Bioengineering, 1996. 52(2): p. 309-319. 222. Salagnad, C., et al., Enzymatic large-scale production of 2-keto-3-deoxy-D-glycero-D- galacto-nonopyranulosonic acid in enzyme membrane reactors. Biotechnology Progress, 1997. 13(6): p. 810-813. 223. Yuryev, R., S. Strompen, and A. Liese, Coupled chemo(enzymatic) reactions in continuous flow. Beilstein Journal of Organic Chemistry, 2011. 7: p. 1449-1467. 224. Paramesvaran, J., et al., Distributions of enzyme residues yielding mutants with improved substrate specificities from two different directed evolution strategies. Protein Engineering Design & Selection, 2009. 22(7): p. 401-411. 225. Morley, K.L. and R.J. Kazlauskas, Improving enzyme properties: when are closer mutations better? Trends in Biotechnology, 2005. 23(5): p. 231-237. 226. Kuchner, O. and F.H. Arnold, Directed evolution of enzyme catalysts. Trends in Biotechnology, 1997. 15(12): p. 523-530. 227. Reetz, M.T., Laboratory evolution of stereoselective enzymes: A prolific source of catalysts for asymmetric reactions. Angewandte Chemie-International Edition, 2011. 50(1): p. 138- 174. 228. Stemmer, W.P.C., DNA shuffling by random fragmentation and reassembly - In-vitro recombination for molecular evolution. Proceedings of the National Academy of Sciences of the United States of America, 1994. 91(22): p. 10747-10751. 229. 
Dennig, A., et al., OmniChange: the sequence independent method for simultaneous site- saturation of five codons. PLoS One, 2011. 6(10).

176 230. Kim, C.H., S.G. Song, and C. Park, The D-allose operon of Escherichia coli K-12. Journal of Bacteriology, 1997. 179(24): p. 7631-7637. 231. Poulsen, T.S., Y.Y. Chang, and B. Hove-Jensen, D-Allose catabolism of Escherichia coli: involvement of alsI and regulation of als regulon expression by allose and ribose. Journal of Bacteriology, 1999. 181(22): p. 7126-7130. 232. Sproul, A.A., et al., Genetic control of manno(fructo)kinase activity in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(26): p. 15257-15259. 233. Miller, B.G. and R.T. Raines, Reconstitution of a defunct glycolytic pathway via recruitment of ambiguous sugar kinases. Biochemistry, 2005. 44(32): p. 10776-10783. 234. Sebastian, J. and C. Asensio, Purification and properties of mannokinase from Escherichia coli. Archives of Biochemistry and Biophysics, 1972. 151(1): p. 227-&. 235. Zittan, L., P.B. Poulsen, and S.H. Hemmingsen, Sweetzyme - a new immobilized glucose isomerase. Starke, 1975. 27(7): p. 236-241. 236. Ginsburg, A. and A. Peterkofsky, Enzyme I: The gateway to the bacterial phosphoenolpyruvate:sugar phosphotransferase system. Archives of Biochemistry and Biophysics, 2002. 397(2): p. 273-278. 237. Postma, P.W., J.W. Lengeler, and G.R. Jacobson, Phosphoenolpyruvate:carbohydrate phosphotransferase systems of bacteria. Microbiological Reviews, 1993. 57(3): p. 543-594. 238. García-Alles, L.F., A. Zahn, and B. Erni, Sugar recognition by the glucose and mannose permeases of Escherichia coli. Steady-state kinetics and inhibition studies. Biochemistry, 2002. 41(31): p. 10077-10086. 239. Doelle, H.W., Kinetic characteristics and regulatory mechanisms of glucokinase and fructokinase from Zymomonas mobilis. European Journal of Applied Microbiology and Biotechnology, 1982. 14(4): p. 241-246. 240. 
Scopes, R.K., et al., Simultaneous purification and characterization of glucokinase, fructokinase and glucose-6-phosphate-dehydrogenase from Zymomonas mobilis. Biochemical Journal, 1985. 228(3): p. 627-634. 241. Lin, C.J., W.C. Tseng, and T.Y. Fang, Characterization of a thermophilic L-rhamnose isomerase from Caldicellulosiruptor saccharolyticus ATCC 43494. Journal of Agricultural and Food Chemistry, 2011. 59(16): p. 8702-8708. 242. Takata, G., et al., Characterization of Mesorhizobium loti L-rhamnose isomerase and its application to L-talose production. Bioscience Biotechnology and Biochemistry, 2011. 75(5): p. 1006-1009. 243. Volz, E., Expression of the L-rhamnose isomerase from T. neapolitana in E. coli, purification and subsequent characterization of the enzyme, 2011. 244. Pitera, D.J., et al., Balancing a heterologous mevalonate pathway for improved isoprenoid production in Escherichia coli. Metabolic Engineering, 2007. 9(2): p. 193-207. 245. Martin, V.J.J., et al., Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nature Biotechnology, 2003. 21(7): p. 796-802. 246. Giger, L., et al., A novel genetic selection system for PLP-dependent threonine aldolases. Tetrahedron, 2012. 68(37): p. 7549-7557. 247. Sigrell, J.A., et al., Structure of Escherichia coli ribokinase in complex with ribose and dinucleotide determined to 1.8 angstrom resolution: insights into a new family of kinase structures. Structure, 1998. 6(2): p. 183-193. 248. Sato, M., et al., D-Ribose competitively reverses inhibition by D-psicose of larval growth in Caenorhabditis elegans. Biological & Pharmaceutical Bulletin, 2009. 32(5): p. 950-952. 249. Lim, Y.R. and D.K. Oh, Microbial metabolism and biotechnological production of D-allose. Applied Microbiology and Biotechnology, 2011. 91(2): p. 229-235. 250. Hanka, L.J., Psicofuranine, in Antibiotics, D. Gottlieb and P. Shaw, Editors. 1967, Springer Berlin Heidelberg. p. 457-463. 251. Miller, B.G. and R.T. 
Raines, Identifying latent enzyme activities: Substrate ambiguity within modern bacterial sugar kinases. Biochemistry, 2004. 43(21): p. 6387-6392. 252. Zhao, H.M. and F.H. Arnold, Optimization of DNA shuffling for high fidelity recombination. Nucleic Acids Research, 1997. 25(6): p. 1307-1308.

177 253. Baba, T., et al., Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular Systems Biology, 2006. 2: p. 11. 254. Datsenko, K.A. and B.L. Wanner, One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proceedings of the National Academy of Sciences of the United States of America, 2000. 97(12): p. 6640-6645. 255. Lynn C. Thomason, N.C., Donald L. Court, E. coli genome manipulation by P1 transduction. Current Protocols in Molecular Biology, 2007. 256. Knight, T.F., Idempotent vector design for standard assembly of BioBricks. Tech. rep., MIT Synthetic Biology Working Group Technical Reports. [http://hdl.handle.net/1721.1/21168], 2003. 257. Meyer, D., et al., Molecular characterization of glucokinase from Escherichia coli K-12. Journal of Bacteriology, 1997. 179(4): p. 1298-1306. 258. Valgepea, K., et al., Escherichia coli achieves faster growth by increasing catalytic and translation rates of proteins. Molecular BioSystems, 2013. 9(9): p. 2344-2358. 259. Neidhardt, C.F., Escherichia coli and Salmonella: cellular and molecular biology. Vol. 1. 1996: ASM Press; 2 edition. 260. Link, H., B. Anselment, and D. Weuster-Botz, Leakage of adenylates during cold methanol/glycerol quenching of Escherichia coli. Metabolomics, 2008. 4(3): p. 240-247. 261. Hixson, J.E., Short protocols in molecular biology. American Journal of Human Biology, ed. F. Ausubel, Brent,R, Kingston,RE, Moore,DD, Seidman,JG, Smith,JA, Struhl,K, Wangiverson,P, Bonitz,SG. Vol. 2. 1990. 172-173. 262. Roos, A.K., et al., D-Ribose-5-phosphate isomerase B from Escherichia coli is also a functional D-allose-6-phosphate isomerase, while the Mycobacterium tuberculosis enzyme is not. Journal of Molecular Biology, 2008. 382(3): p. 667-679. 263. Zaslaver, A., et al., A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nature Methods, 2006. 3(8): p. 623-628.

CHAPTER 9: ACKNOWLEDGEMENTS

First of all, I would like to thank the supervisors of my thesis, Prof. Sven Panke and Dr. Matthias Bechtold, for giving me the opportunity to work on this project (which I’m still enthusiastic about) and for their constant support and helpful suggestions throughout the whole thesis. I greatly appreciated the freedom and the confidence I was given to develop my own ideas and pursue experiments in my own way, sometimes in a slightly different manner than I was initially supposed to. I had a very instructive and intense time here in the Bioprocess Lab and will always look back on it with gratitude.

My special thanks go to the members of my thesis committee, Prof. Dr. Sai Reddy and Prof. Dr. Andreas Plückthun, for taking the time to examine this thesis and for acting as co-referees.

Thanks to the “3.32-Crew”, I had a very good time in this lab, with lots of inspiring discussions on scientific (and less scientific) topics, brainstorming on experimental problems and extensive sessions at the whiteboard. Thank you Nina, Markus F, Christian a.k.a. Crissy and Irene.

Further, I want to thank all the present and past members of the Panke Lab for discussions during coffee breaks, for lots of fun during lab hikes and ski weekends, and for suggestions and help throughout my time here in the lab. Thanks Giovanni, Marcel, Alex N, Matthias Bu, Karel, Robert, Martin, René, Andy, Giorgia, Steven, Eva, Alex S, Anke, Johannes, Sabine, Vijay, Michael, Daniel, Corinna, Anne, Tillman, Markus J, Gaspar, Luzi, Christoph, Tania, Katja, Sven D... I’m sure I forgot to mention some people, but please take a “Thanks” nonetheless. I also want to thank all my students who contributed to this thesis in one way or another. Thank you Sebastian Lont, Sebastian Locher, Lei, Ximing and Esther.

I’m very grateful for the great help and friendliness I experienced in the group of Prof. Dr. Tillman Schirmer during the crystallization and structure determination of my proteins. Special thanks go to Chee Seng Hee for introducing me to the secrets of X-ray crystallography. Thanks Chee, I really enjoyed our collaboration and learned a lot from you. Thanks also go to Stephanie, Frédéric, Amit, Badri, Roman and Eddy for help and suggestions during crystallization and measurements at the SLS beamline.

The people from FIS, IT and the department administration of D-BSSE deserve a big thank you for their excellent and very efficient help with minor and major problems around the lab and for providing a very good infrastructure for doing research. A very special thank you goes to the shop team for providing a terrific service and greatly facilitating lab work. Thanks Rolf, Linda, Fabian, Rosita, Ursi, Ester and Jean-Pierre.

The biggest thanks of all go to Nina, for working together with me on this challenging project, for sharing successes and failures, for encouragement and solace in rough times and for inspiring (scientific) discussions during long hikes.

Finally, I want to thank my family for their constant support during my studies.

CHAPTER 10: CURRICULUM VITAE

Andreas Bosshart
Born: 29th November 1982 in Wattwil, Switzerland
Citizenship: Switzerland

Education:

Since 05/2009
PhD studies under the supervision of Prof. Dr. Sven Panke at the Department of Biosystems Science and Engineering (D-BSSE), ETH Zurich
Thesis Title: Enzyme Engineering for Intensified Processes for the Production of Rare Monosaccharides

10/2006 – 02/2008
M.Sc. UZH in Biochemistry under the supervision of Prof. Dr. Andreas Plückthun, Biochemical Institute, University of Zurich
Thesis Title: Designed Mercury Binding Sites in DARPins; The Binder cp34h_15 specific to the Membrane Protein CitS as an Example

10/2003 – 10/2006
B.Sc. UZH in Biochemistry, University of Zurich

08/2002
Matura, Kantonsschule Wattwil, Switzerland

Employment:

04/2008 – 04/2009
Research assistant at the NCCR Structural Biology High-Throughput Lab, University of Zurich

Publications:

Bosshart, A., Panke, S. & Bechtold, M. Systematic Optimization of Interface Interactions Increases the Thermostability of a Multimeric Enzyme. Angew. Chem. Int. Ed. 52, 9673-9676 (2013).

Wagner, N., Fuereder, M., Bosshart, A., Panke, S. & Bechtold, M. Practical Aspects of Integrated Operation of Biotransformation and SMB Separation for Fine Chemical Synthesis. Org. Process Res. Dev. 16, 323-330 (2012).

Conference Participations:

08/2013 Poster presentation at Biotrans 2013, Manchester (UK)
09/2012 Poster presentation at 15th European Congress on Biotechnology, Istanbul (TR)
09/2011 Poster presentation at Biotrans 2011, Giardini Naxos (I)
