<<

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1318

Aldolases for Enzymatic Carboligation

Directed Evolution and Enzyme Structure-Function Relationship Studies

HUAN MA

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-554-9411-7 UPPSALA urn:nbn:se:uu:diva-266902 2015 Dissertation presented at Uppsala University to be publicly examined in B22, BMC, Husargatan3, Uppsala, Friday, 5 February 2016 at 13:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Prof. Dr. Wolf- Dieter Fessner (Organische Chemie, Technische Universität Darmstadt ).

Abstract Ma, H. 2015. Aldolases for Enzymatic Carboligation. Directed Evolution and Enzyme Structure-Function Relationship Studies. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1318. 67 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9411-7.

The research summarized in this thesis focuses on directed evolution and enzyme mechanism studies of two aldolases: 2-deoxyribose-5- aldolase (DERA) and -6- phosphate aldolase (FSA). Aldolases are nature’s own catalysts for one of the most fundamental reactions in organic chemistry: the formation of new -carbon bonds. In biological systems, aldol formation and cleavage reactions play central roles in . In organic synthesis, aldolases attract great attention as environmentally friendly alternative for the synthesis of polyhydroxylated compounds in stereocontrolled manner. However, naturally occurring aldolases can hardly be used directly in organic synthesis mainly due to their narrow substrate scopes, especially phosphate dependency on substrate level. Semi-rational directed evolution was used in order to investigate the possibility of expanding the substrate scope of both DERA and FSA and to understand more about the relationship between protein structure and catalytic properties. The first two projects focus on the directed evolution of DERA and studies of the enzyme mechanism. The directed evolution project aims to alter the acceptor substrate preference from phosphorylated to aryl-substituted aldehydes. Effort has been made to develop screening methods and screen for variants with desired properties. In the study of enzyme mechanism where enzyme steady state kinetic studies were combined with molecular dynamic simulations, we investigated the role of Ser238 and Ser239 in the phosphate binding site and the possible connection between enzyme dynamics and catalytic properties. The other two projects focus on the directed evolution of FSA and the development of a new screening assay facilitating screening for FSA variants with improved activity in catalyzing aldol reaction between phenylacetaldehyde and hydroxyacetone. The new assay is based on a coupled enzyme system using an engineered alcohol dehydrogenase, FucO DA1472, as reporting enzyme. The assay has been successfully used to identify a hit with 9-fold improvement in catalytic efficiency and to determine the steady state kinetic parameters of wild-type FSA as well as the mutants. The results from directed evolution illustrated the high degree malleability of FSA active site. This opens up possibilities to generate FSA variants which could utilize both aryl-substituted donor and acceptor substrates.

Keywords: aldolase, 2-deoxyribose-5-phosphate aldolase, fructose-6-phosphate aldolase, directed evolution, enzyme dynamics

Huan Ma, Department of Chemistry - BMC, , Box 576, Uppsala University, SE-75123 Uppsala, Sweden.

© Huan Ma 2015

ISSN 1651-6214 ISBN 978-91-554-9411-7 urn:nbn:se:uu:diva-266902 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-266902)

       

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Ma, H., Szeler, K., Kamerlin, S. C. L. and Widersten, M. (2015) Linking Coupled Motions and Entropic Effects to the Catalytic Ac- tivity of 2-deoxyribose-5-phosphate Aldolase (DERA). Chem. Sci., in the press, DOI: 10.1039/C5SC03666F.

II Ma, H., Enugala, T. R. and Widersten, M. (2015) A Microplate Format Assay for Real-Time Screening for New Aldolases that Ac- cept Aryl-Substituted Acceptor Substrate. ChemBioChem, in the press, DOI: 10.1002/cbic.201500466.

Ma, H., Enugala, T. R. and Widersten, M. (2015) Directed Evolution of Fructose-6-Phosphate Aldolase Towards Improved Activity with Aryl-Substituted Acceptor Substrates. Manuscript.

Reprints were made with permission from the respective publishers.

Publication not included in this thesis

IV Janfalk Carlsson, Å., Bauer, P., Ma, H. and Widersten, M. (2012) Obtaining Optical Purity for Product Diols in Enzyme-Catalyzed Epoxide Hydrolysis: Contributions from Changes in both Enantio- and Regioselectivity. Biochemistry, 51: 7627-7637.

Contents

Introduction ...... 11 1. Biocatalysis and biocatalysts ...... 11 2. Directed evolution ...... 13 3. Saturation mutagenesis and codon denegeracy ...... 14 4. Molecular dynamics simulation ...... 16 5. Enzyme dynamics and its potential link to enzyme catalysis ...... 17 6. Aldolases ...... 18 6.1 Aldol reaction ...... 18 6.2 Aldolases ...... 18 6.3 Class I aldolases ...... 19 7. Aldolases studied in this thesis ...... 20 7.1 2-Deoxyribose-5-phosphate aldolase: DERA ...... 20 7.2 Fructose-6-phosphate aldolase: FSA ...... 23 Present Investigations ...... 25 Present Investigation I: Directed evolution of DERA- an attempt to expand the acceptor substrate scope to non-phosphorylated aromatic compounds...... 26 1. Aim ...... 26 2. The design of enzyme libraries ...... 27 3. Screening assay using thin layer chromatography and 14C labeled acetaldehyde...... 28 4. Screening results and hits sequence analysis ...... 30 5. Conclusion and discussion ...... 31 Present Investigation II: Enzyme kinetic study and molecular dynamics simulation on wild-type DERA and related mutants...... 33 1. Aim ...... 33 2. Choice of enzymes ...... 33 3. Role of phosphate binding in DERA ...... 34 4. Enzyme steady state kinetic analysis ...... 35 5. Molecular dynamics simulation ...... 38 6. Conclusion ...... 40 Present Investigation III: Development of screening assay for FSA variants with improved activity towards phenylacetaldehyde ...... 41 1. Aim ...... 41

2. Development of a coupled enzyme assay for detecting the formation of phenylacetaldehyde ...... 41 3. Steady state kinetic study of FSA in aldol cleavage reaction ...... 42 4. Test screening on 96-well plate ...... 43 5. Conclusion ...... 45 Present Investigation IV: Directed evolution of FSA ...... 46 1. Aim ...... 46 2. Library design ...... 46 3. Library screening and identification of hits ...... 47 4. Steady state kinetic studies on first generation library hits ...... 49 5. Conclusion and future plan ...... 49 Concluding Remarks and Future Perspectives ...... 51 Populärvetenskaplig Sammanfattning ...... 52 Résumé ...... 55 论文摘要 ...... 58 Acknowledgements ...... 60 References ...... 65

Abbreviations

2dR 2-deoxyribose 2dR5P 2-deoxyribose-5-phosphate CASTing Combinatorial active site testing DCCM Dynamic cross correlation maps DERA 2-deoxyribose-5-phosphate aldolase DHPP 3,4-dihydroxy-5-phenylpentane-2-one E. coli Escherichia coli F6P Fructose-6-phosphate FSA Fructose-6-phosphate aldolase FucO Propanediol oxidoreductase G3P -3-phosphate G3PD Glycerol-3-phosphate dehydrogenase GOL Glycerol HPD 1-hydroxy-pentane-3,4-diol-5-phosphate open form of 2-deoxyribofuranose-5-phosphate ISM Iterative saturation mutagenesis MD Molecular dynamics NADH Nicotinamide dinucleotide, reduced form PEP Phosphoenolpyruvate RMSF Root mean square fluctuations StEH1 Solanum tuberosum epoxide hydrolase 1 TEA Triethanolamine THF Tetrahydrofuran TIM/TPI phosphate isomerase TLC Thin Layer Chromatography

Introduction

1. Biocatalysis and biocatalysts

“Be an enzyme, a catalyst for change” – unknown

There are two fundamental conditions for : firstly the organism must be able to self-replicate and, secondly, it must be able to catalyze a vast variety of chemical reactions efficiently.1 Without doubt, the second point is achieved by hundreds of thousands of different enzymes in living organisms, from the simplest E. coli to human beings with much more complicated structures and advanced functions. Enzymes are proteins with the exception of a small group of catalytic . Enzymes are central to each and every biochemical process. They catalyze thousands of stepwise reactions that keep life going.

The biochemical term of enzyme was first coined by the German physiolo- gist Wilhelm Kühne (1837-1900) as enzym. But origin of the term enzyme dated back to ancient Greek, en-within and zumē-leaven. This reveals the very early understanding of enzyme in the human history, it was something one added to the dough to make it fermented. Human discovered fermenta- tion as an important food processing method as early as Neolithic age. For example, fermentation is used for preservation in a process that produces lactic acid as found in sour foods such as pickled cucumber and kimchi, in making cheese and of course, for producing alcoholic beverages such as wine and beer. These are the very early applications of enzymes, long before the concept and the science behind them were known.

The concept of modern biocatalysis refers to the application of enzymes and microbes in synthetic chemistry.2 It is the application of nature’s own cata- lysts for new purposes, for which they have not been evolved for in their natural context. The first milestone in modern biocatalysis was the enzyme mediated kinetic resolution carried out by Louis Pasteur in 1858 by treating aqueous solution of racemic tartaric acid with Penicillium glaucum culture. Another remarkable event was in 1897, Eduard Buchner reported successful fermentation of sugar by cell-free yeast extract. This was the first proof that the biological transformations do not necessarily require living cells and this

11 leads to the expansion of fermentation from food processing to the produc- tion of a vast array of chiral and achiral chemicals. 3

Enzymes are able to catalyze a variety of different reactions with remarkable selectivity and the reaction range is increasing with more and more enzymes being discovered and characterized.4 Enzymes are produced from renewable sources, biodegradable and non-toxic. Enzymes enable a more environmen- tally friendly manufacturing process at mild pH, moderate temperatures and atmospheric pressure. The strategic importance of biocatalysis in chemical industry lies not only in greener manufacturing processes, more importantly, it opens up new possibilities in chemical manufacturing towards the goal of green chemistry and sustainable development. Firstly, it presents new ap- proaches for utilizing biomass, such as agricultural crops and residues, ani- mal and industrial residues, sewage as industrial feedstock for chemical and fuel production.5-7 Therefore it brings changes to the current heavy depend- ency on the unsustainable petroleum and fossil fuel feedstocks. Secondly, biocatalysis is an enabling technology which provides the access to new types of chemicals that are difficult, if not impossible, to manufacture in traditional chemical industry.8

Over the past decades, with the rapid advances in molecular , bio- technology and structure biology, biocatalysis has overcome many long standing obstacles and been established as an environmentally friendly alter- native to traditional metallo- and organo-catalysis.9 More and more enzymat- ic synthesis processes for different types of bulk chemicals, pharmaceuticals, fine chemicals, agrochemicals and food ingredients have been reported, both in laboratory and at industrial scale.10,11

However, there are still many challenges ahead in expanding the application of biocatalysis. For example, a limited number of enzymes exist in nature that can be used for direct industrial application. Even though, there are en- zymes that catalyze almost any type of reaction, however due to evolution, enzymes are generally specific for their natural substrates, which are often of little industrial interest. The stability and activity of naturally occurring en- zymes are often not applicable enough under industrially conditions, which often require organic solvents and sometimes increased temperatures.3 This is due to enzyme activity is largely depending on its unique three- dimensional structure maintained mainly by weak interactions. The change in temperature, pH, solvent polarity, physical forces disturbs these interac- tions and decrease the enzyme activity. Some enzymes suffer from the sub- strate and/or product inhibition, which prevents them from being used in processing condition with high substrate load. The development cycle is generally longer for new biocatalysts as compared to their chemical counter

12 parts. The developing time for most of the flagship industrial scale biocatal- ysis processes has taken between 10 and 20 years.8

To overcome the above mentioned limitations concerning the catalytic prop- erties of enzymes, more advanced protein engineering strategies are required to facilitate intensive modification of enzyme properties from different as- pects, and to make them fit better with the industrial requirements. The time consuming development cycle is due to the as-yet incomplete knowledge in biocatalysis and protein engineering.8 To solve this problem and convert collections of case studies into engineering principles will require deeper understanding of the relationships between amino acid sequence, protein structure and catalytic function of enzymes, and to develop machine-learning guided protein engineering methods rather than trial-and-error approaches.

2. Directed evolution

“The amino acid sequences of similar proteins in mice and human typically differ around 13%. Today’s advance protein engineering makes similar change in converting a wild-type enzyme into an enzyme suitable for chemi- cal process application. This protein engineering is equivalent to compress the 75,000,000-yr evolution of an early mammal into modern-day mice and human into several months of laboratory work.” - U. T. Bornscheuer2

Directed evolution is a widely used engineering strategy for modifying the physical properties or biochemical functions of proteins.12 It iteratively mim- ics the natural evolution on molecular level in vitro to evolve protein to- wards a certain defined goal. The theoretical foundation for directed evolu- tion is the same as in classical Darwinian evolution, where the competition under a selection pressure tailors the phenotypes and ultimately leads to new species by selecting for beneficial traits.13 The major difference in directed evolution compared to the natural process is the artificial selection pressure. A protein can evolve outside of its biological context and this opens up the possibility to acquire new enzyme properties that are less likely, if not im- possible, to acquire in nature.14 Therefore it provides means for tailoring enzymes for new synthetic tasks as biocatalysts.15 This process also offers empirical lessons about how protein evolves under clearly defined selection pressure and the entire fossil record of the evolutionary intermediates pro- vides valuable information in studying the relationship between amino acid sequence, protein structure and protein function.16

The initial concept and attempt of laboratory directed evolution was first reported in 1980s.17 It is based on the site-specific mutagenesis developed by Michael Smith.18 He showed that the change in amino acid sequences can

13 affect the catalytic property and this concept paved the road for modern di- rected evolution through which the native enzyme catalytic properties can be intensively modified.19 After nearly four decades of development, directed evolution is now a standard and effective approach for tuning the catalytic properties of an enzyme in a targeted manner to achieve a better biocatalyst that can be ultimately used in industrial processes of chemical synthesis.20 There are many enzymatic properties can be changed or improved by di- rected evolution, for example, activity, thermal stability, stability in organic solvent, activity in high substrate concentration, enantioselectivity, broader substrate scope or even novel catalytic functions.21 Numerous successful examples of directed evolution in both laboratory and industry scale have been reported.22-25

In directed evolution, there are many different approaches to generate en- zyme variants depending on the resources. In Table 1, different strategies are summarized according to their randomness and mutation extent.26

Randomness of Extent of residues mutated mutation LOW HIGH LOW Saturation Mutagenesis de novo Design HIGH Error-prone PCR DNA Shuffling

Table 1. Different mutagenesis strategies in directed evolution

In random directed evolution, the mutations are introduced into the protein in a random manner. The big library size usually requires intensive screening effort and it is also difficult to understand, at the molecular level, the role of each and every amino acid residue in altering the enzyme properties. 26

In sequence and/or structure guided semi-rational directed evolution, the targets for mutagenesis are generally chosen based on the known enzyme structures. The libraries are usually smaller. More focused libraries help to reduce the time requirement and cost in screening. The studies of library hits provide valuable insights on protein structure and function relationships.27 With more and more in silico methods (bioinformatics, molecular modeling, de novo design) being incorporated into directed evolution, computational aided directed evolution is becoming an approach with expanding applica- tions.28

14 3. Saturation mutagenesis and codon denegeracy

Knowledge-guided approaches for navigating the protein sequence space

The catalytic activity of an enzyme is largely depending on its unique micro- environment and this particular structure is determined by the primary struc- ture. Therefore the modification of amino acid sequence opens up many possibilities to alter the catalytic properties of enzymes.26 The mutations can be introduced into a parent protein in different ways.

Saturation mutagenesis (SM) refers to the full randomization of amino acid residues at predetermined sites in an enzyme. Iterative saturation mutagene- sis (ISM) is a combined repetition of saturation mutagenesis at different predetermined sites.29 For example, in a directed evolution project that aims at producing enzyme variants which could accept a new substrate S, three sites A, B, C are chosen to form three different enzyme libraries. Each site may involve more than one amino acid residues. Initial rounds of saturation mutagenesis are performed at each chosen site. The hits from the first round of selection from each library are used as templates for a second and a third round of saturation mutagenesis at other two sites. In this process, the choice of the saturation mutagenesis sites is critical for the library quality. Combi- natorial active site testing (CASTing) is such a way to navigate the possible sites for mutagenesis within the enzyme active site. In this method, the po- tential mutation sites are grouped into combination of one to three amino acid residues and are visited by saturation mutagenesis.30

In order to further reduce the protein sequence space, a simplified amino acid alphabet at the mutation sites can be used.31 Among the 64 genetic co- dons, 61 represent amino acids, and three are stop codons. Although each codon is specific for only one amino acid (or one stop codon), the genetic code is described as redundant, because a single amino acid may be coded by more than one codon. NNN or NNK codon encodes for all 20 amino acid but this brings redundant codons and causes uneven distribution of amino acids in the constructed library. To solve the problem of bias and to avoid the stop codons, NDT codon set and a ‘‘20 trick’’ can be used.

NDT codon set encodes for 12 amino acids (FLIVYHNDCRSG) which are a mixture of amino acids with polar, nonpolar, positive and negative charged side chains. This is a fair representation of the functionality found in all 20 amino acids. With this degenerate codon set, for 95% coverage, 430 variants need to be sampled compared to 3066 if use NNK codon with a mutation site of two amino acids.32

15 The ‘‘20c-trick’’ set is a mixture of NDT (12 codons), VMA (6 codons), ATG and TGG. It contains only one codon per amino acid for all twenty amino acids. This allows the full saturation mutagenesis at a chosen site and avoids the possibility of premature stop codons, codon bias and rare codons in E. coli.33

Degenerate base designation Actual base coded N A or T or C or G D A or G or T V A or C or G K G or T M A or C

Table 2. Degenerate base designation and corresponding base coded

The combination of above-mentioned methods, ISM, CASTing and degener- ate codon create smaller and more focused enzyme libraries in a knowledge- guided manner. They serve as general tools to efficiently randomize target at positions of interest and significantly reduce the following screening effort, making enzyme directed evolution more time efficient.

4. Molecular dynamics simulation

“All things are made of atoms and everything that living things can do can be understood in terms of the jigglings and wigglings of atoms.” -R. Feynman

Molecular dynamics (MD) simulation is a computational method to simulate the physical movements of atoms and molecules. The trajectories of the at- oms and molecules in the system are determined by numerically solving the Newton’s equations of motion. For the interacting particles, the force be- tween them is defined by interatomic potentials or molecular mechanics force field. The method was originally developed within theoretical physics in late 1950s.34 After years of development, MD simulations have become important tools for studying and understanding the physical basis of the structures and functions of biological macromolecules. The early view of proteins as relatively rigid structures has been gradually replaced by a more dynamic view with internal motions. The resulting effect of these internal motions such as conformational change has been recognized to play essential role in protein functions.35 MD simulations provide many insights into bio- molecule systems and obtain information that is difficult to access through experimental approaches.36

16 The outcome of a MD simulation is primarily controlled by the expression of the total energy, which is collectively referred as force field.37 The force fields used in MD study presented in this thesis are TIP3P38 for water and OPLS All Atom force field for organic and biomolecular system. 37

5. Enzyme dynamics and its potential link to enzyme catalysis

“The role of the enzyme’ s dynamic motions in catalysis is at the center of heated contemporary debates among both theoreticians and experimentalists. Resolving these apparent disputes is of both intellectual and practical im- portance: incorporation of enzyme dynamics could be critical for any calcu- lation of enzymatic function and may have profound implications for struc- ture-based drug design and the design of biomimetic catalysts.” -A. Kohen51

The precious origin(s) of the remarkable rate enhancement power of en- zymes remains an unsolved problem and a great challenge.39 Even though many different theories have been proposed, from Fisher’s lock and key model40 to the induced fit theory by Pauling and Koshland.41,42 The role of weak interactions, such as electrostatic interactions, hydrogen bonding and desolvation has been long recognized as contributing factor to enzyme catal- ysis.39 A folded protein is a highly dynamic entity, best described as a hierar- chy or ensemble of interconverting conformations on all time scale from femtoseconds to minutes.43 The connection between these dynamic effects and enzyme catalysis, however, remains controversial and debated.44 Inves- tigations involving dihydrofolate reductase (DHFR, EC 1.5.5.5), which is a model system in this field, have suggested the link between enzyme dynam- ics and catalysis. Studies using NMR have suggested that particular residues found throughout the whole protein may act like a coupled network which promotes catalysis.45 Distal mutations from the active site can perturb the active site structure and therefore affect the chemical steps.46 These distal mutations are also suggested to have impacts on the network of motions which correlate to the hydride transfer in DHFR.47 The link between coupled motions of this enzyme and its catalysis was also proposed.48 The anti- correlated motions in horse liver alcohol dehydrogenase (HLADH, EC 1.1.1.1) was also suggested as a driving force in enzyme catalysis.49 The coupled motion in protein refers to whether the fluctuation of one residue is related to the fluctuation of another distant residue. In general, the movement of one residue respect to another appears to be random, but in some cases they move in concert with each other, either in the same direction, correlat- ed, or in the opposite direction, anti-correlated.44 Other case studies also suggest similar links between protein dynamics and catalysis.44,50 This topic has been recently reviewed by A. Kohen.51 The study about the link between

17 enzyme dynamics and catalysis has broad implication for protein engineer- ing since the complexity added by mutation may not only affect the static structure but also the dynamics of the enzyme and the change in structure and dynamics may lead to altered catalytic properties.45

6. Aldolases

Unlocking the potential for enzymatic carboligation.

6.1 Aldol reaction The aldol reaction is a whole class of reactions between enolates and car- bonyl compounds. These are very important reactions in organic synthesis, because when the nucleophilic enolate attacks the electrophilic or ketone, a new carbon-carbon bond is formed.52 In biological systems, aldol formation and cleavage reactions play critical roles in central and peripheral sugar metabolism.53 In synthetic organic chemistry, it is the most commonly applied reaction type for the construction of polyhydroxylated compounds with new chiral centers from smaller and simpler precursors.54 These com- pounds are often of biological interest as active precursors from the pharma- ceutical perspective due to the fact that multiple hydroxyl groups improve not only the water solubility but also aid in molecular recognition.55 Among different types of aldol reactions, aldehydes are of particular interest as do- nor molecules, because the product formed are aldehydes themselves and they can be subjected to further addition and lead to more complex building blocks.56

Aldol reactions have been extensively studied from different perspectives including the development of new catalysts for stereocontrolled synthesis. Catalysts that have been developed for asymmetric aldol reaction include small organic molecules,56,57 main group and transition metal coordination complexes with chiral ligands58 and aldolases.59 The synthetic utility of aldol reactions would be greatly enhanced if the formation of enolate anion and the attack of carbonyl group could be chemically selective in a cross aldol reaction to avoid the formation of complex products mixture. Since enzymes exert high level of selectivity that is not usually possible in simple chemical reaction in the laboratory,52 aldolases have emerged as an environmentally friendly alternative with great application potentials in the synthesis of rare or sugar derivatives such as statins, iminocyclitols, epothilones and sialic acids.60,61

18 6.2 Aldolases Aldolases (EC 4.1.2.X) are lyases that typically catalyze the stereoselective addition of a ketone or aldehyde donor to an aldehyde acceptor. Up until now, there are more than 30 known aldolases present in most, if not all, or- ganisms.61 In a cell, C-C bond formation and cleavage are essential steps in both metabolism and catabolism of wide arrange of and keto acids. Aldolases are key enzymes in many fundamental metabolic processes, glycolysis, gluconeogenesis, fructose metabolism, phosphate path- way, to name a few. Aldolases can be divided into two major groups accord- ing to their catalytic mechanisms.62 Class I aldolases activate the donor mol- ecule via a catalytic Lys residue and form a covalent Schiff base intermedi- ate while Class II aldolases require a divalent metal ion (in most of the cases, Zn2+, Fe2+ or Co2+ in some cases) to stabilize the intermediates by polarizing the substrate carbonyl group.61 Class I aldolases are present mainly in higher with some recent found aldolases in prokaryotes.62 Class II al- dolases appear to be only present in prokaryotes and lower eukaryotes. Class I and II aldolases share little sequence similarity and are of different evolu- tionary origins. This is a typical example of convergent evolution.63

6.3 Class I aldolases Class I aldolases have attracted great attention for their potential as biocata- lysts in organic synthesis. Firstly because these aldolases are co-factor free and secondly the stereochemistry at the newly formed stereocenter is con- trolled by the enzyme which makes it easier to predict the structure of the product.61 Class I aldolases typically are more tolerant regarding the struc- ture of the acceptor molecules while relatively more restricted about the donor molecule structure. Therefore, according to their donor dependency Class I aldolases can be further classified into the following five sub- groups.60

1. phosphate (DHAP) dependent aldolases. They use DHAP as donor to produce 2-keto-3,4-dihydroxy products. This group in- cludes the mostly studied aldolase fructose-1,6-bisphosphate aldolase (FBP aldolase or FruA EC 4.1.2.13) which was isolated 75 years ago. It is also the most used aldolase as biocatalyst in organic synthesis due to its commercial availability.64 The application of FBP in organic synthesis is severely limited by its donor dependency. DHAP is expensive and chemically unstable, and phosphorylated end products from the reaction generally require downstream work-ups for dephosphorylation. DHAP and sugar have similar physicochemical properties and this leads to the difficulties in product puri- fication by chromatographic approaches.60,65

19 2. Dihydroxyacetone (DHA) dependent aldolases. Fructose-6-phosphate aldolase (FSA) is the only know enzyme in this family. It is the first enzyme reported to cleave fructose-6-phosphate generating DHA and glyceraldehyde 3-phosphate (G3P). It exists as two isoenzymes.62,66

3. Phosphoenolpyruvate (PEP) and pyruvate dependent aldolases. They use pyruvate or phosphoenolpyruvate to generate 3-deoxy-2-keto acids. They are almost exclusively of microbial origin. PEP and pyruvate- dependent aldolases have recently become of increasing importance, as sialic acid derivatives are used for cancer therapy and as anti-infectives.65

4. Glycine dependent aldolases. They catalyze the reversible condensation of glycine with an aldehyde acceptor to form a β-hydroxy-α-amino acid. This class of enzymes has been used for enzymatic resolution of β-hydroxy- α-amino acids with few examples to catalyze carboligation.61

5. Acetaldehyde dependent aldolases. The only known member of this class is 2-deoxyribose-5-phosphate alsolase (DERA). It catalyzes the for- mation of 2-deoxyribose-5-phosphate (2dR5P) from G3P and acetaldehyde DERA is also capable of catalyzing a tandem aldol reaction using the prod- uct from the first aldol reaction as an acceptor. In addition to acetaldehyde, DERA can utilize propanal, acetone, and fluoroacetone as donor substrates.61

The protein engineering efforts to modify different properties of the above- mentioned aldolases for biocatalysis purposes have been reviewed recently by A. Berry and coworkers 67

7. Aldolases studied in this thesis 7.1 2-Deoxyribose-5-phosphate aldolase: DERA 7.1.1 Structure, function and catalysis E. coli DERA (EC 4.1.2.4) is a 28 kDa soluble protein consisting of 259 amino acid residues in its monomeric form. It is the only member of the acetaldehyde dependent Class I aldolase and it is one of the only two known aldolases that catalyze aldol reactions between two aldehydes.61 Dynamic light-scattering experiments have confirmed that DERA is present in both monomeric and dimeric form in solution.68 In biological systems, DERA reversibly cleaves D-2-deoxyribose-5-phosphate to generate acetaldehyde and G3P as shown in Figure 1. It is a key enzyme involved in the catabolism of the pentose motif in the pentose phosphate pathway.69

20 OH DERA O OPO -2 -2 + 3 O OPO3 O OH OH

Figure 1. DERA catalyzes the reversible cleavage of D-2-deoxyribose-5-phosphate to form D-glyceraldehyde-3-phosphate and acetaldehyde.

The crystal structure of DERA with a reaction intermediate captured was determined at atomic resolution. DERA displays a typical TIM (α/β)8 barrel fold which is a conserved structural feature among the enzymes in Class I aldolase family as shown in Figure 2. DERA utilizes a catalytic lysine and forms the Schiff base intermediate during catalysis. The catalytic residue 167 Lys is located on the β6 strand of the barrel core. The pKa of a lysine residue in aqueous solution is 10.5. In DERA Lys207 on the β7 strand in close proximity, with the distance of 3.1 Å between Nς atoms, is proposed to be 167 involved in the pKa pertubation of Lys via electrostatic interaction which enable Lys167 to remain uncharged and function as the nucleophile to carry out the catalysis at netural pH.70

Lys167

Figure 2. DERA overall structure. The catalytic residue Lys167 is shown in sticks. PDB entry: 1P1X.70

The catalytic mechanism of DERA in the cleavage of 2dR5P is shown in Figure 3. In the aldol reaction, a general base is also critically needed in addition to the nucleophile. This general base is responsible for the deproto- nation of the imine to form the enamine. Intensive mechanistic investigation involving mutagenesis and 1H NMR studies have been done in order to iden- tify this general base in DERA. It was proposed that it is a proton relay sys- tem involving Asp102 and an extensively hydrogen-bonded water molecule rather than a single amino acid residue that acts as the general base in DE-

21 RA’s case.70 The architecture of DERA phosphate binding site based on the crystal structure was analyzed71 and the attempt of structure based mutagenesis in order to expand the substrate specificity of DERA was reported.72 These two points will be discussed more in detail in a later chapter.

OH OH H+ + K167 NH2 -2 -2 O OPO3 K167 N OPO3 OH H2O H OH

H+

G3P K167 NH O -2 2 K167 N O OPO3 H H2O OH Figure 3. The catalytic mechanism of DERA in cleaving 2dR5P.

7.1.2 Application in biocatalysis As the first known -type of aldolase, the potential of DERA as a tool for enzyme mediated carboligation in stereocontrolled manner was first in- vestigated by Chi-Huey Wong and coworkers in the early 1990s.73 They demonstrated that in addition to acetaldehyde, DERA can utilize acetone, fluoroacetone and propionaldehyde as donor substrates. The catalytic effi- ciency however is extremely low towards unphosphorylated acceptor sub- strates.74 DERA is also capable of catalyzing tandem aldol reaction in which the aldehyde product from the first aldol addition is used as acceptor sub- strate in the second reaction.75

In recent years, a new DERA variant which can withstand higher concentra- tion of acetaldehyde and with higher catalytic efficiency has been isolated and used in industrial synthesis of statin intermediate, for example, 6-chloro- 2,4,6-trideoxy-D-erythrohexapyranoside, an important precursor for satin- type cholesterol-lowering drugs such as Lipitor and Crestor.76 This new DE- RA variant was discovered by high throughput screening of genomic librar- ies prepared directly form environmental DNA, therefore the exact organism source of this particular DERA is unknown. DERA has also been immobi- lized on multiwalled carbon nanotubes and the catalytic performance was evaluated in the self-condensation of acetaldehyde and using chloro- acetaldehyde. DERA in immobilized form has been shown to exhibit toler- ance to higher acetaldehyde concentration and the enzyme activity was maintained after five cycles.77 The current limitation on the acceptor sub- strates hampers the application of DERA in organic synthesis despite the fact that enzyme-mediated carboligation is an appealing alternative to the tradi-

22 tional method in catalyzing aldol reaction between two aldehydes in stere- ocontrolled fashion.

7.2 Fructose-6-phosphate aldolase: FSA 7.2.1 Structure and function FSA (EC 4.1.2.-)is the most recently discovered member of the Class I al- dolase family. It exists as isoenzymes: FSAA and FSAB coded by mipB and talC in E. coli, respectively. FSAA has been cloned from E. coli strain K-12, over-expressed and characterized by G. A. Sprenger and coworkers in 2001.66 FSAB was investigated similarly by M. Lemaire and coworkers in 2012. The two isoenzymes have similar substrate profiles but FSAB shows lower catalytic activity and thermostability.67

FSA is the first reported enzyme capable of catalyzing the cleavage of fruc- tose-6-phosphate to generate DHA and G3P (Figure 4). The exact physio- logical function FSA is still unknown and the gene for both isoenzymes are present but not transcribed under normal conditions in E. coli.66

O OH FSA O HO -2 -2 OPO3 O OPO3 + HO OH OH OH OH Figure 4. The cleavage of fructose-6-phosphate catalyzed by FSA. There is no ob- served cleavage of fructose, fructose 1-phosphate, 6-phosphate, and fructose 1,6-bisphosphate by FSA. In the aldol addition reaction direction, dihydroxyacetone is viewed as the standard donor, hydroxyacetone is a weaker donor in comparison. and are weak acceptors. Dihydroxyacetone phosphate does not serve as donor compound.66

The crystal structure of FSA (Figure 5) at 1.93Å resolution was reported by G. Schneider and coworkers.78 The overall structure of FSA is a 250 kDa decamer of ten identical subunits. The single domained subunit of FSA folds into a classic α/β TIM barrel fold, a conserved structural feature among all the Class I aldolases. The catalytic Lys85 is located on the β4 strand. Five identical subunits of 220 amino acid residues first assemble into a pentamer, and two identical pentamers pack on top of each other forming a doughnut shaped decamer of 105 Å in diameter and 65 Å in thickness. An inner chan- nel of 30 Å in diameter passes through the middle of the decamer. The main interaction to maintain the pentameric structure comes first from the hydro- phobic interaction between the C-terminal helix from one subunit and C- terminal side of the barrel of the neighboring subunit. The interaction be- tween a number of helices and N-terminal loops in the interface of adjacent subunits also contribute to the maintaining of the pentameric structure. The major interactions between two pentameric subunits are the interactions be-

23 tween the residues which are close to the border of the inner channel of the decamer. Overall, the decameric structure of FSA is very tightly packed which contributes to FSA’s high thermostability. 78

B

Lys85

A C

Figure 5. The crystal structure of FSA, a 250 kDa homodecamer. (A) Structure of one subunit, catalytic Lys85 is shown in stick. (B) Overall structure, top view. (C) Overall structure, side view. PDB entry: 1L6W.78

7.2.2 Application in organic synthesis Since its discovery, FSA has been attracting increasing attention for applications in organic synthesis due to its remarkable catalytic potential in asymmetric synthesis of polyhydroxylated compounds. Unlike many other aldolases from the Class I aldolase family, FSA does not show any phosphate dependency on its donor molecule.79,80 A mutant with 17-fold improved affinity to dihydroxyacetone has been generated by rational mutagegesis.81 Wild-type FSA also shows activity towards dimerization of glycolaldehyde and it is the only known aldolase in addition to DERA which could catalyze aldol addition between two aldehydes.82 The catalytic efficiency of glycolaldehyde dimerization has been improved up to 1800- fold by directed evolution.83 Therefore, FSA can be considered both as a - and an aldose-aldolase which opens up more opportunities of utilizing FSA in enzyme mediated carboligation. FSA also shows high thermostability which can be highly advantageous from biocatalysis point of view.

24 Present Investigations

The focuses of the presented investigations were modifications of enzyme properties by directed evolution and enzyme mechanism studies combining experiments and computational simulations.

Investigation I, directed evolution of DERA aimed to generate new variants capable of utilizing aromatic acceptor substrates, by mutating amino acid residues involved in the phosphate binding site. Investigation II aimed to study the link between structural changes in the DERA phosphate binding site, enzyme dynamics and the catalytic properties. Investigation III focused on the development of a new screening assay for FSA directed evolution. The assay was to be used to screen for variants with increased aldol activity with phenylacetaldehyde and hydroxyacetone as substrates. Investigation IV focused on the directed evolution of FSA to increase the above-mentioned aldol activity and to explore the possibility to generate enzyme variants ca- pable of utilizing aryl-substituted compounds as both donor and acceptor substrates.

Enzyme directed evolution has become an effective and standard approach in the development of biocatalysts by modifying properties of natural en- zymes. However, the development of biocatalysts through directed evolution is still to some extent a collection of case studies rather than a quantitative approach, like many other well-established engineering sciences. The appli- cation of successful cases to the problems at hand makes the outcome of a directed evolution project unpredictable2 and this leads to, in general, long development time of a new biocatalyst. One important reason for this is the as-yet incomplete knowledge base in the understanding of enzyme sequence- structure-function relationships.8 From this perspective, directed evolution is not only a useful tool to generate enzyme variants with new catalytic proper- ties, but also an important strategy allowing further insights into protein sequence-activity relationship. Ultimately with an increasing number of available sequence-activity pairs as inputs from numerous similar studies, it could become possible to build mathematical models, just as in classical machine learning process, that could be used to predict which mutations might be most fruitful to test in directed evolution in order to significantly shorten the enzyme tailoring process.

25 Present Investigation I: Directed evolution of DERA- an attempt to expand the acceptor substrate scope to non- phosphorylated aromatic compounds.

“Identifying molecules in a protein population that do something interesting is generally the greatest challenge for directed evolution.” -D. Hilvert31

1. Aim

DERA is one of the only two Class I aldolases which can catalyze the aldol reaction between two aldehydes and this property places DERA as an attrac- tive candidate for enzymatic carboligation. The goal of this project was to explore the possibility of expanding the acceptor substrate scope of DERA from the phosphorylated aldehyde, G3P, to aryl-substituted aldehydes.

In previous studies, DERA has been reported to have a 250-fold decrease in aldol reaction activity when using glyceraldehyde as substrate comparing to phosphorylated G3P. There is no detectable wild-type activity when using benzaldehyde as acceptor. In fact, the wild-type DERA aldol activity has never been reported for any aromatic acceptor substrate.74

One reason that motivated us to explore the possibility of evolving DERA accepting aryl-substituted substrates was the importance of aromatic com- pounds as building blocks in organic synthesis. This also fits in with the general goal of our research group which is to test the possibility of develop- ing synthesis pathways catalyzed by multiple enzymes as shown in Figure 6. DERA is the third enzyme on the pathway from an epoxide to an aldehyde and the preceding enzymes have already been engineered and proved to car- ry out the desired reactions.84,85

OH OH StEH1 FucO DERA spontaneously OH O O O O OH OH O OH OH

Figure 6. A reaction series involving three different engineered enzymes.

However, the aldehyde product from the FucO reaction can not yet be pro- duced in large enough quantities for screening purposes, therefore, phenyla- cetaldehyde was chosen as a surrogate substrate for the development of a screening assay and for library screening.

26 2. The design of enzyme libraries 2.1 The phosphate binding sites targeted for saturation mutagenesis A structure guided semi-rational approach based on a high-resolution crystal structure was chosen in designing the DERA library. A previous study has shown that the site directed mutagenesis within the phosphate binding region lead to the isolation of a DERA variant, S238D, with 2.5-fold activity im- provement in retro-aldol cleavage of the non-phosphorylated 2-deoxyribose (2dR).71

The main residues and interactions involved in the phosphate binding sites are: (1), Ser238 and Ser293 in Ser-Ser loop on the β8 (residue 237-240). They interact with substrate phosphate group via direct side chain interaction and via a bridging water molecule, respectively. This Ser-Ser loop is located at the N-terminus of the last α-helix. It is not a typical GXGXXG phosphate binding motif and is not conserved in the Class I adolase family. A possible conformational change upon substrate binding in this region has also been proposed. (2), The countercharge from Lys172 and the positive dipole mo- ment from the N-terminus of the last helix can also interact with the nega- tively charged phosphate group in 2dR5P. (3), The backbone amide group of Ser238 and Gly205 via a bridging water molecule, the backbone amide of Gly171, Gly204, Val206 and Ser239 are also involved in the substrate phosphate binding.72

In order to alter the acceptor substrate preference, the main targeted residues for mutagenesis are the ones interacting with the phosphate group in 2dR5P. Initially six amino acid residues in three groups around the phosphate bind- ing sites were identified (Figure 7).

2.2 Limited codon degeneracy In order to further limit the library size, the NDT codon set encoding for twelve amino acids (FLIVYHNDCRSG) with different side chain properties was applied in mutagenesis for constructing library A and B. A mixture of mutagenesis primer containing NDT, VMA, ATG, TGG, codons33 were used in the construction of library C. This later codon set allows us to introduce all twenty amino acids into the mutation sites without the presence of codon bias and stop codon.

27 Ser238

Lys167

Ser239

HDP Lys172

Ala203

Gly204

Arg207

Figure 7. Active-site residues mutated to generate libraries of DERA aiming to alter the acceptor substrate preference. The chosen residues were divided into three groups: Library A (blue), Ser238 and Ser239. The main interaction involved is hydro- gen bonding. Library B (gold), Lys172 and Arg207. These two positively charged residues provide counter charge to the negatively charged phosphate group and their removal is expected to relax the phosphate dependency. Library C (green): Ala203 and Gly204. Main interactions from these two residues involve backbone amide.

3. Screening assay using thin layer chromatography and 14C labeled acetaldehyde. There are two main challenges in developing a screening assay for detecting enzyme activity in this particular aldol reaction. Firstly, DERA is a highly specific enzyme with no reported promiscuity towards non-phosphorylated acceptor substrates, any enzyme activity that could possibly emerge from the enzyme libraries would be expected to be extremely low, therefore the screening assay must be highly sensitive. Secondly, both reactants and prod- ucts from this aldol reaction are aldehydes which makes it difficult to devel- op an assay based on different reactivity of the chemical functional groups. Therefore, a screening assay based on changes in molecular structure was developed and tested for screening of library A. If any enzyme variant could catalyze the proposed aldol reaction (Figure 8), 14C-labeled acetaldehyde would become incorporated into the product. The resulting reaction mixture were separated by thin layer chromatography (TLC) and subsequently visu- alized by autoradiography.

28 O O * DERA * + * O * OH

phenylacetaldehyde 3-hydroxy-4-phenylbutanal

Figure 8. Screening reaction for detection of DERA catalyzed addition between phenylacetaldehyde (acceptor) and acetaldehyde (donor). The asterisks indicate positions labeled with 14C.

There are two assumptions concerning this screening approach. Firstly, we assumed that the spot closer to the solvent front is the aldol product between phenylacetaldehyde and acetaldehyde. Because the suspected product from aldol condensation is 3-hydroxy-4-phenylbutanal which contains both a hy- droxyl group and an aldehyde group, therefore its polarity will be between phenylacetaldehyde and 1-phenyl-1,2-ethanediol. Based on TLC result by using phenylacetaldehyde and 1-phenyl-1,2-ethanediol as reference com- pounds and other control experiments, the spot closer to the solvent front was identified to be corresponding to the desired aldol product. Secondly, the intensity of a signal spot on the film is proportional to the amount incor- porated 14C-labeled acetaldehyde. Based on these two assumptions, the in- tensity of each signal spot resulting from reaction catalyzed by different enzyme variants in the library were analyzed by pixel counting software ImageJ (http://imagej.nih.gov/ij/). The intensity of each signal spot was normalized in order to minimize the random error and enzyme variants that had been present in reactions resulting in increased intensities were scored as hits. 72 hits were identified from the screening and their sequence were de- termined by DNA sequencing.

Figure 9 shows an example autoradiograph of TLC-separated reaction mix- tures from aldol reaction between 14C-labeled acetaldehyde and phenyla- cetaldehyde catalyzed by DERA library A variants. The intensity in radioac- tivity of the fastest moving product (at the top of each panel) was used to distinguish hits from background applying a threshold of >2-fold increase over wild-type DERA.

29

Figure 9. Autoradiograph of TLC-separated reaction mixtures of 14C-labeled acetal- dehyde and phenylacetaldehyde catalyzed by DERA library A variants. The concen- tration of acetaldehyde (1,2-14C) was 0.6 mCi/ml (60 mM) with a specific activity of 10 mCi/mmol. The screening was carried out in 96-well plate format. A typical screening reaction was performed as follows: 45 µl cell lysate from each well was mixed with 4 µl of 0.05 M acetaldehyde, 2 µl of 1.3 M phenyl acetaldehyde (dis- solved in acetonitrile), 10 µl 0.06 mCi/mL 14C labeled acetaldehyde and 39 µl of 50 mM triethanolamine (TEA) buffer, pH 8.0. The reaction mixture was incubated with shaking at 100 rpm, 30 °C for 4 hours. Control reactions included lysates from wild- type DERA (WT), alcohol dehydrogenase A (ADHA) and protein extraction buffer only (BLANK). 10 µl from each reaction mixture was spotted on silica gel on TLC AL foils. The mobile phase for TLC was ethylacetate : pentane, 1:1 (v/v). Air-dried TLC plates were then positioned to face the emulsion side of Kodak BioMax MR films. The films were exposed overnight in cassettes in dark, and developed accord- ing to the instructions from the manufacturer. The developed films were then scanned for quantification of products.

4. Screening results and hits sequence analysis After the first round of mutagenesis and library screening, 72 hits were sent for DNA sequencing. As can bee seen from the deduced amino acid compo- sition chart (Figure 10) at mutated sites of DERA library A hits. Ile and to some extent Ser and Val are over-represented at position 238 in the selected variants. Leu, Ile and His are over-represented at position 239. It is notewor- thy that Ser is counter-selected for at position 239 in these DERA variants.

30

Figure 10. Deduced amino acid composition chart at positions 238 (black bars) and 239 (white bars) of DERA library A hits. The bar height is normalized for the ex- pected occurrence of a certain residue as judged from sequencing of clones in the unselected (naive) library.

5. Conclusion and discussion Eight variants after the sequence analysis were picked for re-screening using the same approach. S238I/S239I showed the best improvement compared to wild-type DERA. Therefore, efforts were made in order to isolate the formed product from the aldol reaction between phenylacetaldehyde and acetalde- hyde by using this putative hit. However, the attempts were unsuccessful and the assumed aldol product was not isolated despite many efforts. Rea- sons for why validating the screening assay failed, could either be that there was no DERA variant with high enough activity generated after a single round of mutagenesis and selection, or due to the high assay sensitivity and the unstable chemical feature of the expected product, it was too difficult to isolate and validate its presence by other methods. In a typical directed evo- lution project, desired beneficial improvements are accumulated during the selection process until the goal is reached. The success of such an experi- ment depends on two factors, the feasibility of target function and whether measurable improvements can be accumulated to reach the goal.12 DERA’s lack of basal activity with phenylacetaldehyde as acceptor substrate is a chal- lenging staring point for directed evolution since there is no enzyme promis- cuity which normally provides a good jump-off point for directed evolu- tion.86 It has also been illustrated that in most failed cases, the most probable reason is that the selection bar has been set too high, the fitness improvement required to pass the screening was not possible with single set of mutations. The simple adaptive walk of enzyme variants in such case will fail screen- ing, this happens more often when searching for a new function which the parent enzyme does not exihibit.12,13 From this point of view, the chosen

31 substrate might have been too ambitious, a substrate which is more structur- ally similar to G3P might serve as a better target substrate with intermediate challenge. The expected product from the reaction, 3-hydroxy-4- phenylbutanal is chemically unstable and this makes the detection of this compound particularly difficult after enzymatic reaction in aqueous solution. Efforts have been made, without any success, to chemically synthesize this compound and to validate the aldol activity of putative hit S238I/S239I from the retro-aldol direction. Therefore, the screening assay was not validated and the screening of remaining two libraries was terminated due to the lack of clear evidence that the enzyme hit from library A screening can indeed catalyze the desired reaction. However, this putative hit which behaved dif- ferently in screening was used for the enzyme mechanism study focusing on understanding the link between phosphate binding and catalytic activity of DERA.

32 Present Investigation II: Enzyme kinetic study and molecular dynamics simulation on wild-type DERA and related mutants.

“Serendipity: the occurrence and development of events by chance in a bene- ficial way.” -Oxford dictionary

1. Aim As mentioned in the introduction part, the application of DERA in organic synthesis has been long-time hampered greatly by its strict phosphate de- pendency on the acceptor substrate despite it is one of the only two known aldolases catalyzing aldol reaction between two aldehydes.71 This was illus- trated by the drastic loss of activity when D-glyceraldehyde was used as acceptor compared to its phosphorylated counter part, G3P, in aldol reac- tion73. In this project, the primary goal was to investigate why the substrate phosphate group is so important for efficient DERA catalysis, how the struc- tural change in a non-canonical phosphate binding region consisting of two Ser238 and Ser239 could effect the catalytic properties of DERA, and if there were any other factors that could possibly impact the catalysis. A combina- tion of steady kinetic analysis and molecular dynamics (MD) simulation were used for investigation involving wild-type DERA and four mutants.

2. Choice of enzymes The enzymes involved in this studies were wild-type DERA, single-mutants S238P and S239P, double-mutants S238P/S239P and S238I/S239I. The pro- line substitutions could have dual effects. Firstly, the peptide backbone posi- tion will be altered which may affect both direct side chain and backbone interactions between these residues and the substrate phosphate group. Sec- ondly, Ser238 is placed at the beginning of α-helix 8 and the proline replace- ment is expected to slightly affect the orientation of this helix and thereby decrease the proposed stabilizing effect from the N-terminus of α-helix 8. The S238I/S239I mutant was the putative hit from the previously described directed evolution project.

33 Ser238 Lys167

Ser239

HDP

Figure 11. The structure of DERA non-canonical phosphate binding site consisting of the Ser238-Ser239 loop. The structure shows the reaction intermediate, the Schiff base formed between Lys167 and HDP. HDP is the open chain form of 2- deoxyribo- -5-phosphate.70 PDB entry: 1JCJ.70

3. Role of phosphate binding in DERA Aldol formation and cleavage reactions play very important roles in central and peripheral sugar metabolism.53 In vivo, many intermediates in sugar metabolism intermediates pathway are phosphorylated, for example, all nine intermediates are phosphorylated in glycolysis. In general, the phosphate groups in these intermediates have three functions. Firstly, the cell plasma membrane lacks the transporters for phosphorylated sugars, so no extra en- ergy is needed to retain those intermediates inside the cell. Secondly, phos- phoryl groups are essential parts in the enzymatic conservation of metabolic energy. Last but not the least is the binding energy from the binding of the phosphate groups to the active sites of enzymes lowers the activation energy and increases the specificity of the enzyme reaction.87

In the case of DERA, the phosphate dependency is evolutionarily decided by the physiological function of DERA as a key enzyme involved in pentose phosphate pathway. The most trivial function of the substrate phosphate group is to increase the substrate affinity throught interaction between negatively charged phosphate group and positively charged side chains in DERA. This acts as a anchor and is particular important for a small substrate, like G3P in aldol addition reaction. Secondly, the binding of phos- phate group helps to position the carbonyl group of the acceptor substrate for nucleophilic attack by activated acetaldehyde in the aldol reaction.70,71 Thirdly, phosphate group is essential in catalyzing ring opening reaction of 2dR5P and generate the substrate in the correct form. The only crystal struc- ture with DERA and substrate co-crystallized in the same structure reveals the substrate 2dR5P in a straight chain aldehyde form.70 In solution the pre-

34 dominant form of 2dR is furanose, the same would be expected for 2dR5P. The free aldehyde concentration of 2dR in solution is lower than 0.2% per- cent of the total 2dR.88 The very low percentage of the aldehyde form will be further in equilibrium with hydrated gem-diol form. The ring opening reac- tion of furanose is estimated to be below 1 s-1.89 However, the ring opening rate constants of phosphorylated and are significantly higher than those of their non-phosphorylated analogues. Moreover, the rate con- stants are independent on the concentration of aldolse phosphate over 6-fold range indicating the intramolecular phosphate catalysis predominated inter- molecular phosphate catalysis. The efficiency of the phosphate catalysis depends on the ring structure and phosphate ionization state, so the phos- phorylation of simple sugar enhances the interconversion rate between dif- ferent sugar forms.90

Another important role that the phosphate group could play in catalysis in case of other enzymes is illustrated by extensive works by J. Richard and coworkers. It is proposed that the remote interaction between protein and phosphodianion can be crucial for transition state stabilization by promoting and stabilizing active high energy conformation of the enzyme.91-96

In conclusion, the substrate phosphate group enhances the ring opening rate of 2dR5P and convertes it into open chain aldehyde form and aid in position- ing substrate and contribute to the affinity. These factors could explain the strict phosphate dependency of DERA. However, they do not explain the lack of activity with non-phosphorylated linear aldehydes.

4. Enzyme steady state kinetic analysis The steady state kinetic parameters of wild-type DERA and four mutants 71 were determined by a previously published assay. To determine kcat and KM and kcat/KM, the raw data was fitted to the Michaelis-Menten equation by MMFIT and RFFIT respectively in SIMFIT (http://www.simfit.org).

The critical role of Ser238 in catalysis can be seen from the drastic loss of activity of the S238P and the S238P/S239P double mutant (Table 3). In con- trast, the main effect of mutation in Ser239 position results in minor decrease 239 in kcat and 30 –fold increase in KM, therefore, Ser is less crucial for cataly- 238 sis as compared to Ser . The precise origin of this increase in KM is compli- cated in this multistep reaction. The key steps involved are formation and decay of Michaelis complex, formation of Schiff base intermediate and for- mation of product. Destabilization of any of these enzyme substrate com- plexes would lead to an increase in KM value. Furthermore, the pKa of cata- lytic Lys167 is perturbed in order to function as nucleophile at physiological pH. If the mutation at Ser239 position affects the acidity of Lys167, it could be

35 suspected to raise the KM, however the similar pH profile (Figure 12) of S239P and the wild-type DERA suggests this is unlikely the reason.

-1 -1 -1 Enzyme kcat (s ) kcat/KM (s mM ) KM (mM)

wild-type 15±0.4 130±7 0.096±0.008 S238P - <0.0001 - S239P 11±1 3.6±0.06 3.0±0.6 S238P/S239P - <0.0001 - S238I/S239I 1.4±0.2 0.15±0.006 9.6±1.5

Table 3. The kinetic parameters for aldol cleavage reaction of 2dR5P catalyzed by DERA and mutants. The measurements were assayed in 50 mM TEA buffer, pH 8.0, at 30 °C with increasing concentrations of 2dR5P and in the presence of saturating concentration of NADH (0.5 mM) and excess amount (0.14 units) of α- glycerophosphate dehydrogenase-triosephosphate isomerase enzyme mix. KM value is calculated from the total concentration of 2dR5P.

The temperature dependency of catalytic rates illustrates the effect of the S239P mutation on thermodynamic parameters of the aldol cleavage of 2dR5P (Table 4). The thermodynamic parameters in the presence of wild- type DERA or the S239P mutant, were determined at six different tempera- tures (15 °C, 20 °C, 25 °C, 30 °C, 38 °C and 45 °C). The natural logarithms of the ratios of the determined rates were plotted as functions of the inversed T, shown as the modified Eyring plot in Figure 13. This allows the extrac- tion of the relative changes in transition state enthalpy and entropy, caused by the S239P mutation. The thermodynamic date suggests that the main neg- ative effect caused by S239P mutation on catalysis is primarily a relative loss of activation entropy. The relative effect on the TS enthalpies are favor- able, and of similar magnitude for both EŸ2dR5P /E-2dR5P→E+products (kcat) and E+2dR5P→ E+products reaction (kcat/KM). The observation that the mutation results in relative decrease in activation enthalpies is counter- intuitive since one might assume that the phosphate interaction in the wild- type enzyme would contribute to the TS stabilization.

36

Figure 12. Figure 13.

Figure 12. pH dependence of rates, kcat (squares) and kcat/KM (circles) in the retro- aldol cleavage of 2dR5P catalyzed by wild-type DERA (filled symbols), the S239P mutant (half-filled symbols) and the S238I/S239I mutant (unfilled symbols). pH 7 and 8 measurements were carried out in 50 mM TEA buffer and pH 9 measurements were carried out in 50 mM sodium-borate buffer.

Figure 13. Modified Eyring plot of the effects by the S239P mutation on catalytic rates. The ratios of kcat (circles) and kcat/KM (squares) plotted as functions of tem- perature. The slopes are described by -""H‡/RT and the intercepts by ""S‡/R. The relative differences in activation enthalpies and entropies are given in Table 4.

!!H‡a T!!S‡a,b !!G‡c

(kJ/mol) (kJ/mol) (kJ/mol)

kcat -12±2 -13±2 0.78±0.2

kcat/KM -17±4 -26±4 9.0±0.1

Table 4. Effects by S239P mutation on thermodynamic parameters for aldol cleav- age of 2dR5P. a. Standard errors from linear least square regression. b. Calculated at 30 °C. c. from -RT ln (rateS239P/ratewt). "" indicates the relative difference between wild-type and the S239P.

37 5. Molecular dynamics simulation In order to investigate the effect on protein dynamics of mutations at Ser238 and Ser239 site, MD simulations were performed on wild-type DERA and four mutants. The molecular dynamics simulations were performed using Q simulation package with the OPLS-AA force field.37 Wild-type DERA struc- ture at 0.99 Å resolution (PDB code, 1P1X72 ) was used as the starting point for all simulations. The mutations were introduced into the structure by UCSF Chimera software package.97All residues besides the Schiff base were protonated according to the their preferred protonation states at physiological pH. The overall net charge of the system is −5 which is neutralized by Na+ counterions. The system was solvated by a 35 Å sphere of TIP3P water mol- ecules with 10 kcal mol-1Å-2 positional restrain on the 15 % outer layer of the solvent for stabilization during the simulation.98,99 The system was heated from 0 K to 300 K by using the Berendsen thermostat99 over 1 ns with 1 fs time steps. There was an initial positional restraint of 200 kcal mol-1Å-2 on all heavy atoms in order to allow protons to equilibrate to their optimal posi- tions and remove initial bad contacts in the crystal structure. The restraint was gradually dropped during the simulation as the structure was allowed to relax. Another 3 kcal mol-1Å-2 positional restraint was put on the Na+ coun- terions to keep them inside the simulation sphere. Three replicas of 100 ns production run after initial equilibrium with different initial velocities were performed on each mutant. The last 100 ns of each trajectory was used for further analysis by Gromacs101 and the dynamic cross correlation analysis was performed by using Bio3D package.102

The root mean square fluctuations plot (RMSF) presented in Figure 14 is an average over three MD trajectories for wild-type DERA and the S239P mu- tant. Each peak in the plot corresponds to a different helix color-coded in the same figure. The plot reflects the breathing motion and flexibility of the enzyme. The mutations clearly dampen the flexibility of the enzyme, in par- ticular, the flexibility in the region including residue 25-75, 170-180 and 220-230 was reduced.

38

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#!!

Figure 14. (A) Average root mean square fluctuations. The value was calculated from the average of three MD trajectories for DERA wild-type and the S239P mu- tant. (B) The corresponding structure elements in the wild-type DERA structure.

Figure15. Dynamic cross correlation maps (DCCM) of five DERA variants. (A) wild-type DERA. (B) S239P. (C) S238I/S239I. (D) S238P/S239P. (E) S238P. The blue color and pink color indicate the degree of correlated and anti-correlated motions respectively.

39 A comparison of dynamic cross correlation maps (DCCM) of the same en- zyme pair reveals a substantial loss in anti-correlated motions in S239P (Figure 15). Furthermore, the loss of anti-correlated motions follows a clear trend, the one with highest kcat/KM value has the highest level of anti- correlated motions, and vice versa. This observation agrees with a previous work by T. C. Bruice which proposed anti-correlated motions is linked to enzyme function.49 Therefore we suggested the introduction of mutations at Ser238 and Ser239 position effects the enzyme dynamics and reduced its flexi- bility and level of anti-correlated motion and which leads to fewer catalyti- cally active conformations and ultimately a decrease in kcat/KM value.

6. Conclusion In this project, we combined experimental and computational efforts to in- vestigate the main role of the phosphate group in 2dR5P and the conse- quences of mutating two serine residues in the phosphate binding region of DERA. The phosphate group in 2dR5P enhances the ring opening rate, facil- itates positioning the substrate and increases the substrate affinity. We also illustrated that the local structure in Ser238 and Ser239 sites appears to affect the global dynamics of the enzyme, primarily by decreasing the level of anti- correlated motions within the enzyme which could also be linked to the loss of enzyme activity. These local mutations also affect the motional flexibility in regions that are distant from the mutation sites. The thermodynamic anal- ysis of wild-type DERA and S239P mutants suggested that the observed changes in activity are caused by a relative decrease in the activation entropy upon mutation of this enzyme. This investigation highlights the connection between enzyme dynamics and catalytic activity.

40 Present Investigation III: Development of screening assay for FSA variants with improved activity towards phenylacetaldehyde

“You get what you screen for” - F. Arnold13

1. Aim Since its discovery, FSA has attracted increasing attention as promising bio- catalyst for stereocontrolled carboligation generating polyhydroxylated compounds due to its relaxed phosphate dependency on both acceptor and donor substrates and high thermostability.79 Wild-type FSA has been report- ed to have low but measurable activity catalyzing the aldol reaction between phenylacetaldehyde and hydroxyacetone as shown in Figure 16.80 This basal activity provides a good starting point for improving the catalytic efficiency of FSA towards these two substrates via directed evolution.

OH O O FSA + HO

OH O Figure 16. FSA catalyzed aldol reaction between phenylacetaldehyde and hydroxy- acetone. The formed aldol product is 3,4-dihydroxy-5-phenylpentane-2-one (DHPP)

The directed evolution effort concerning this goal will be discussed in the next chapter. This investigation mainly focused on the critical step in the directed evolution project: developing an efficient, medium-to-high through- put assay for screening for FSA variants that display improved activity with these two substrates.

2. Development of a coupled enzyme assay for detecting the formation of phenylacetaldehyde A good screening assay should provide a clear link between the enzyme activity of interest and a recordable signal. FSA catalysis does not involve any co-factor and there is no spectroscopic change during the catalysis that can be directly monitored. A typical strategy to solve this type of problems is to couple the reaction of interest to another reaction that gives off spectro- scopic signal during reaction for detection. A classic example is the coupled enzyme assay consisting of triose phosphate isomerase and glycerol-3- phosphate dehydrogenase. It has been used to measure activities of aldolas- es, for instance DERA and fructose 1,6-bisphosphate aldolase.71

41 An alcohol dehydrogenase mutant from E. coli 1,2-propanediol oxidoreduc- tase (FucO), namely FucO DA1472, was isolated previously from another directed evolution project by Dr. Cecilia Blikstad in our group.85 This mutant contains two point mutations, N151G and L259V. These mutations made this variant highly active in reducing phenylacetaldehyde and virtually inactive with its natural substrate 2-(S)-1,2-propanediol.103 Furthermore, FucO DA1472 does not utilize secondary alcohols or ketones as substrates,104 there- fore it will not use the aldol product as substrate in either reduction or oxida- tion direction. All the mentioned features make FucO DA1472 a very good reporting enzyme in a coupled enzyme assay monitoring the formation of phenylacetaldehyde during the aldol cleavage of substrate 3,4-dihydroxy-5- phenylpentane-2-one (DHPP) catalyzed by FSA. In this coupled assay, the change in co-factor NADH absorbance at 340 nm provides a quantitative probe with adequate sensitivity for detecting the rate of on-going reaction. The detection mechanism of the coupled assay is shown in Figure 17. FSA catalyzes the reversible aldol reaction between phenylacetaldehyde and hy- droxyacetone, any enzyme variants with improved activity in retro-aldol cleavage direction will also have improved activity in aldol reaction direction.

O HO NADH+H+NAD+ OH O OH FSA FucO DA1472 OH O Figure 17. The detection mechanism of FSA-FucO coupled assay for detecting the formation of phenylacetaldehyde during cleavage reaction of substrate DHPP (syn). The absorption change of NADH can be spectrophotometrically monitored at 340 nm.

3. Steady state kinetic study of FSA in aldol cleavage reaction In order to validate the coupled enzyme assay and gather more information about wild-type activity of FSA in aldol cleavage of DHPP, steady state kinetic measurements were performed with purified wild-type FSA and chemically synthesized DHPP, a mixture of syn and anti.

OH OH O O SmI2 HO (0.1M in THF) + OH O + OH O THF,argon,RT syn 61% anti 39% Figure 18. Chemical synthesis of 3,4-dihydroxy-5-phenylpentane-2-one (DHPP). 105

42 The steady state kinetics were assayed in 50mM TEA buffer, pH 8, at 30 °C in the presence of 0.2 mM NADH, 88 nM FucO DA 1472, 0.83 μM FSA, with substrate range between 0.25 mM and 2 mM. The concentrations of NADH and FucO were chosen as such to ensure the cleavage reaction rate has zero dependence on NADH and FucO and give a reasonable initial ab- sorbance. The raw date was then fitted to the Michaelis-Menten equation in for determination of kinetic parameters. A typical reaction track is shown in Figure 19(A). The kinetic parameters of related enzymes determined by this assay are presented in Table 5. The enzyme kinetic parameters of cleavage reaction of F6P were determined according to a previously published assay. 78

-1 -1 -1 Enzyme Substrate kcat (s ) KM (mM) kcat/KM (s mM )

FSA F6Pa 0.91±0.02 0.56±0.07 1.6±0.2 FSA DHPPb 0.10±0.01 1.2±0.3 0.090±0.02 Phenylace- DA1472 21±1 4.4±0.7 4.8±0.5 taldehydec

Table 5. Steady state kinetic parameters of used enzymes. a. Assayed in 50 mM TEA buffer, pH 8, at 30 °C. b. Based on total concentration of DHPP (61% syn). c. Data from reference 85.

4. Test screening on 96-well plate The assay was further adapted to 96-well format in order to validate the de- tecting sensitivity for desired FSA activity in crude bacterial lysates. The real time kinetics was monitored by a SpectraMAX190 plate reader at 30 °C. In order the to investigate the detection sensitivity of this coupled assay in cell lysate, different amount of purified FSA, from 8.3 to 330 pmol, was added into the lysate from E. coli BL21-AI cells containing plasmid pGT7YNR064c-5H encoding for an unrelated yeast protein.106 The raw ki- netic date was processed with programme SoftMax Pro 5.2. The detected reaction velocities displayed a linear relationship with the amount of added FSA as shown in Figure 19(B).

43 1.35 30

25 1.30 nm 20 340 A

1.25 (mAbs/min) v 15

1.20 10 0 40 80 120 160 200 0 0.2 0.4 0.6 0.8 1.0 1.2 time (s) [FSA in lysate] (µM) A B

Figure 19. (A). Real-time reaction tracks of the coupled assay. The reaction mixture contains 0.2 mM NADH, 1 mM DHPP, 88 nM FucO DA1472, no FSA (dashed line) or 0.83 μM FSA (solid line) in 50 mM TEA buffer, pH 8, at 30 °C. (B). Recorded reaction velocities in 96-well formats in the presence of 0.2 mM NADH, 2 mM DHPP, 0.29 µM FucO, E. coli lysate and purified FSA ranging from 8.3 to 330 pmol per well, in 50 mM TEA buffer, pH 8.0 at 30 °C The solid line is the best-fit linear regression to the data points generated by SIMFIT. (www.simfit.org.uk)

The increase in the observed velocity over the “background” velocities is approximately 10 % in the presence of 28 nM (8.3 pmol) of added wild-type FSA. This level of signal increase can be reproduced and can be assumed as the lower detection limit of the assay. In other words, this assay is able to detect as low as 8.3 pmol of FSA with activity of 0.09 s-1 mM-1. Therefore, any enzyme variants with recorded velocity with 1 mAU/min or more differ- ence over the background (~9 mAU/min under assay condition) will be con- sidered hits during screening.

The same assay was tested by directly using lysates from cells carrying the pGT7FSA-5H plasmid. However they did not show significantly better aldol cleavage activity than control cells and therefore can not be distinguished from the control samples on 96-well plate. SDS-PAGE was used to check the expression level of FSA on 96-well plate. Estimation from Coomassie Brilliant Blue-R250 stained SDS-PAGE showed that the amount of wild- type FSA presented in BL21-AI cell lysate on 96-well plate after standard expression/lysation protocol was approximately 4 pmol per well. This sug- gests the low basal activity of wild-type FSA and the very limited enzyme expression level on 96-well plate are two main factors leading to the inabil- ity to detect a clear signal in these wells. This also suggests that either a threefold increase in the expression level or a three-fold increase in enzyme activity can be robustly detected.

44 5. Conclusion In conclusion, in this project a new coupled enzyme screening assay based on FucO DA1472, an engineered alcohol dehydrogenase which reduces phe- nylacetaldehyde by using NADH as co-factor, was developed and adapted to 96-well plate format with crude bacterial lysate. The steady state kinetic parameters of wild-type FSA towards substrate DHPP was determined by this assay and the detection limit of the assay was estimated to be able to detect as low as 8 pmol FSA with with activity of 0.09 s-1 mM-1 in crude E. coli lysate and should therefore provides adequate sensitivity for a directed evolution protocol. Furthermore, the relatively wide linear range of aldol cleavage velocity versus added amount of FSA also shows that the assay can provide quantitative real time kinetic data over at least two orders of magni- tude. The coupled assay does not involve the other product from the aldol cleavage which corresponds to the donor molecule in the aldol reaction di- rection for detection, therefore this assay is ready to be used in screening for FSA variants displaying novel activity towards alternative donors with phe- nylacetaldehyde as acceptor substrate.

45 Present Investigation IV: Directed evolution of FSA

“Evolution, like politics, is an art of the possible.” - F. A. Kondrashov

1. Aim As discussed in the previous part, FSA has attracted great attention as bio- catalyst for aldol reaction. In order to broaden its application in organic syn- thesis, efforts have been made to expand the substrate scope of FSA.81,83 The long-term goal of our directed evolution effort is to explore the possibility to generate FSA variants that are efficient with both aryl-substituted donor and acceptor substrates. Activity towards aromatic donor substrates has never been reported in the case of FSA. In a directed evolution project, as a general rule, it is important to not demand too much improvement in a single round of mutagenesis and the screen hurdle should not be higher than what can be achieved by a single set of mutations.16 Therefore we broke down the prob- lem into two intermediate challenges. The first goal is to screen for FSA variants with increased catalytic activity for aldol reaction between phenyla- cetaldehyde and hydroxyacetone using wild-type FSA as starting point. The second goal is to screen for variants which could utilize phenylacetaldehyde as acceptor and other aryl-substituted ketones as donor. Ideally, the template for this library will be the hits from the first directed evolution with im- proved activity towards phenylacetaldehyde. The basal activity for aldol reaction between phenylacetaldehyde and hydroxyacetone reported in 200980 provides a good jump-off point for possible improvement via directed evolu- tion. Furthermore, FSA is highly thermostable. Thermostability is correlated with mutational tolerance which refers to the ability of the enzyme to with- stand the degree of mutation without losing its structural integrity. From this point of view, FSA is expected to be able to withstand certain number of mutations and still remain reasonably stable.107,108 The current investigation focuses on the first generation of FSA library.

2. Library design The design of FSA library is a semi-rational approach based on previously published crystal structure by G. Schneider and coworkers (PDB code 1L6W).78 Compared to its native acceptor substrate G3P, phenylacetaldehyde is aromatic, less polar and more bulky. Therefore, we focused around the substrate binding site near the catalytic Lys85. Arg134 and Ser166 were targeted for mutagenesis due to the fact that they are residues with polar and relative big side chains and these residues are expected to be in close proximity with the possible binding site of substrate without direct involvement in the catal- ysis.

46 GOL

Ser166 Lys85

134 Arg Figure 20. FSA library A, consisting of variants mutated at Arg134 and Ser166. The aim was to improve the catalytic activity for the aldol reaction between phenyla- cetaldehyde and hydroxyacetone.

Therefore an FSA library involving saturation mutagenesis at these two posi- tions were constructed by using overlapping PCRs with mutagenesis pri- mers. Full random mutagenesis at these two positions introducing all 20 amino acids was carried out according to a previously published method.33 The constructed enzyme library contained 400 possible combinations, in order to achieve 95% coverage, 1260 variants from the library were screened.

3. Library screening and identification of hits The first round of screening was performed according to the previously de- scribed assay in 96-well plate format. In summary, the screening condition in each well is as following, 0.2 mM NADH, 2 mM substrate, 3,4-dihydroxy-5- phenylpentane-2-one, 0.44 µM FucO, cell lysate 75 µL in 50mM TEA buff- er, pH 8. The kinetic measurements were carried out at 30 °C for 25 minutes and the reaction velocities were calculated by SoftMax Pro 5.2 from the real time kinetic tracks. 28 hits were picked and sequenced out of which 16 se- quence results were relevant. These 16 library hits were then cultivated at 2 ml scale and subjected to re-screening and SDS-PAGE to check the protein expression level. After re-screening, the three most active variants were ex- pressed at 500 mL scale and purified for enzyme kinetic studies.

As observed from sequence analysis, serine was over-represented at position 134 and there was no apparent preference at position 166. From the SDS- PAGE analysis (Figure 21), it is noticeable that the expression level of li- brary hits varied. Most of the FSA variants showed lower expression levels compared to the wild-type FSA.

47 No. Postion 134 Position 166 Wild-type FSA Arg Ser 1 Met Val 2 Met Lys 3 Trp Asn

4/13 Ser Asn H3 5 Phe Ala

6 Gln Asp H1 7 Lys Leu 8 Asn Ala

9 Ser Lys H2* 10 Cys Glu 11 Pro Val 12 Ser Ala 14 Ile Tyr 15 Ala Glu 16 Ser Arg

Table 6. Sequence results from first round of FSA library A screening. * H2 has one extra mutation, I135T.

.)/0 ((/0 &'/0 %0/0

$0/#

#&/&

!""""#"""""$""""%""""&""""'""""("""")""""*""""+,-

.""#0"""##"""#$"""!""+,-""#%"""#&"""#'""#( Figure 21. SDS-PAGE results of 16 library hits. FSA indicates wild-type FSA, its expression level was used as a reference to estimate the expression level of library mutants. Samples were from 2 mL expression culture. The arrows indicate the bands corresponding to FSA variants.

48 4. Steady state kinetic studies on first generation library hits The steady state kinetic parameters of wild-type FSA and three library hits were determined using the same assay presented for screening. The three studied library hits are: H1, R134Q S166D; H2, R134S I135T S166K; H3, R134S S166N. One extra mutation presents in H2 is most probably due to a cloning artifact.

The steady state kinetics was assayed in 50 mM TEA buffer, pH 8 at 30 °C with presence of 0.2 mM NADH, 0.67 µM FucO DA1472 with the substrate range between 0.25 mM and 1 mM. The raw kinetic data were processed by SIMFIT (www.Simfit.org). The kinetic parameters of wild-type FSA and two library hits are listed in Table 7.

-1 -1 -1 Variant kcat (s ) KM (mM) kcat/KM (s mM ) Improvement in kcat/KM

WT ≥ 0.27±0.02* ≥ 0.50* 0.26±0.03 1 H2 1.2±0.3 0.51±0.33 2.4±0.9 9.2 H3 0.076±0.01 0.28±0.14 0.28±0.1 1.1

Table 7. The steady state kinetic parameters of wild-type FSA and first generation library hits, H1 and H2. * Estimated values due to possible substrate inhibition. H1 shows lower activity than wild-type (data not presented).

As seen from the Table 7, kcat/KM of H2 was improved over 9-fold comparing to wild-type FSA during one round of directed evolution. H3 doesn’t display significant improvement in catalytic parameters, however it does not show the same substrate inhibition as the wild-type enzyme.

5. Conclusion and future plan An FSA variant displaying over 9-fold improvement in catalyzing the retro- aldol cleavage of in 3,4-dihydroxy-5-phenylpentane-2-one, the aldol addition product between phenylacetaldehyde and hydroxyacetone, was isolated after a single round of saturation mutagenesis of two FSA active site residues and selection. The improvement in catalytic activity illustrated that the catalytic property of FSA can be modified through structural changes in the active site and that the active site of FSA is malleable. It has also validated the previ- ously described screening assay and demonstrated the sensitivity of the assay is enough to pick up the possible hits from the lysate. The same screening approach can be used to re-screen the library hits and avoid false positives when the protein expression level is higher.

49 Arg134 GOL

Lys85

135 Tyr 131 Ile

Figure 22. Library B (Tyr131) and C (Ile135), aiming to further improve the catalytic activity of FSA in aldol reaction between phenylacetaldehyde and hydroxyacetone.

Asn28 Thr26

Lys85

GOL

Figure 23. FSA library D, consisting of Thr26 and Asn28, aiming at exploring the possibility to utilize bulky aromatic donor.

In order to further improve the above mentioned activity, a second enzyme library consisting of only one residue Tyr131 around the possible acceptor substrate binding region will be constructed and screened using the same approach with wild-type FSA and H2 as template. H2 contains one extra mutation at Ile135 site caused by cloning artifact and it is also worth investi- gating whether the mutation at this site contributes to the improvement in desired activity. Therefore, another library will be constructed at this site using both wild-type FSA and H2 as template. Two other amino acid resi- dues around the donor binding region Thr26 and Asn28 were proposed for construction of another library aiming at altering the donor substrate speci- ficity to aryl-substituted substrates. Ideally, the templates for this library will be the hits from previous generation of libraries with improved activity when utilizing phenylacetaldehyde as acceptor substrate.

50 Concluding Remarks and Future Perspectives

The possibility to alter the substrate preference of two aldolases, DERA and FSA, by directed evolution was investigated in this thesis. Concerning DE- RA, a screening assay based on TLC and 14C labeled acetaldehyde was de- veloped and tested for screening for DERA variants displaying new activity in catalyzing aldol reaction between phenylacetaldehyde and acetaldehyde. The difficulties in validating the assay were also discussed. The role of Ser238 and Ser239 in the phosphate binding site of DERA was investigated by en- zyme kinetics analysis and MD simulation. The link between decreased DE- RA activity and reduced level of anti-correlated motions was proposed. Con- cerning FSA, a new screening strategy based on a coupled enzyme assay was developed and validated for screening for FSA variants with improved cata- lytic activity for aldol reaction between phenylacetaldehyde and hydroxyace- tone. The low initial aldol addtion activity between these two substrates has been improved by 9-fold through one round of directed evolution. The stud- ies in this thesis illustrate that directed evolution is not only an effective strategy to modify enzyme catalytic properties but also provides valuable insights into the relationship between protein structure and enzyme function. Also, with the knowledge and experience learnt in this process, we may thereby be able to optimize the way of navigating the sequence space and screening for new activities. The study between enzyme dynamics and catal- ysis also has wider implications for protein engineering since the mutation maybe not only affect the local structure but also bring changes to the global dynamics of the enzyme which may ultimately affect the catalytic properties.

The next step in the case of DERA is to further investigate the link between enzyme dynamics and catalytic activity and alter the directed evolution strat- egy in order to continue to explore the possibility to change the dependency on substrate phosphate in aldol reaction direction. A better approach may be to use a substrate that is more similar to DERA’s natural substrate G3P to serve as an intermediate challenge. The next step for FSA directed evolution is to further improve the catalytic activity for aldol reaction between phe- nylacetaldehyde and hydroxyacetone. Future generations of FSA libraries are proposed and will be screened using the same assay. Enzyme variants with improved activity from these libraries will serve as starting point to explore the possibility of utilizing aryl-substituted donor molecules in the future.

51 Populärvetenskaplig Sammanfattning

Det finns två fundamentala förutsättningar för liv: för det första måste organ- ismen kunna replikera sig själv och för det andra måste den kunna katalysera en mängd olika kemiska reaktioner effektivt. Den andra förutsättningen ut- förs, utan tvekan, av hundratusentals olika enzymer som finns närvarande i hela spektrumet av levande organismer; från den enkla E. coli till oss männi- skor, med mer avancerade strukturer och funktioner.

Enzymer är proteiner med undantaget att vissa katalytiska RNA. Enzymer är centrala i alla biokemiska processer. De katalyserar hundratals stegvisa re- aktioner från nedbrytning av näring från vårt dagliga matintag, till bevarande och omvandling av energi, till byggande av stora komplexa biomolekyler från simpla startmaterial.

Den biokemiska termen enzym myntades av tysken Wilhelm Kühne (1837- 1900) som Enzym, men ursprunget för termen enzym är från gammalgrekis- kan; en-inom och zumē-surdeg. Detta visar på väldigt tidig användning av enzymer i människans kultur; det är något som sattes till degen så att den kunde fermenteras. Fermentering är en viktig tillredningsmetod för mat lik- värdig med utvecklingen av jordbruket. Människan har använt fermentering för att producera mat och drycker sedan den neolitiska tidsåldern utan kun- skap om koncepten och vetenskapen bakom. Till exempel används fermente- ring för bevarande av mat genom en process som producerar mjölksyra som återfinns i sur mat, så som inlagd gurka, kimchi, produktion av ost och för att tillverka alkoholhaltiga drycker som exempelvis öl och vin, vilket finns do- kumenterat så tidigt som 7000-6600 före kristus.

Louis Pasteur utförde 1858 den första enzym-katalyserade kinetiska resolut- ionen genom att behandla racemiskt vinsyra med en kultur av Penicillium glaucum. Eduard Buchner rapporterade 1897 den framgångsrika nedbryt- ningen av socker genom att använda cellfri jästextrakt. Dessa är viktiga mil- stolpar i fermenteringens historia då de visade att biologiska reaktioner inte behöver levande celler. Detta ledde till expansionen av fermentering från matberedning till produktionen av stor mängd av olika akirala och kirala kemikalier. Detta paradigmshift öppnade dörren för den moderna biokataly- sen, vilken innebär användning av enzymer och mikrober i organisk syntes.

52 Med de utmaningar gällande den globala miljön, så ökar intresset för enzy- mer i den kemiska industrin som miljövänliga, ekonomiska och rena kataly- satorer med applikationer från tvättmedel och papperstillverkning till kemisk syntes och reagens inom diagnostikforskning. Biokatalys bidrar inte enbart med en mer miljövänlig väg för syntes av traditionella kemikalier, viktigare är att det öppnar möjligheten att använda biomassa som råmaterial. Det kommer möjliggöra en minskning av det tunga beroendet av fossila rå- material i den kemiska industrin. Biokatalys möjliggör också nya syntesvä- gar för nya aktiva kemikalier som är svåra, om inte omöjliga, att göra via traditionella metoder. På grund av sina inneboende begränsningar är dock de flesta nativa enzymer inte passande för direkt applikation på industriell skala. För att minska detta gap mellan naturen och industrin så har modifie- ring av enzymer utvecklats som ett kraftfullt verktyg för att artificiellt modi- fiera egenskaperna hos naturliga enzymer. Riktad evolution är en viktig me- tod bland flera olika strategier, som efterliknar naturlig evolution in vivo. Genom att skapa enzymvarianter från samma ursprungsenzym via en process kallad mutagenes och applicera ett artificiellt selektionstryck på enzymvari- anterna, så kan egenskaper hos det naturliga enzymet, såsom termostabilitet, stabilitet för organiska lösningsmedel och även katalytiska egenskaper, mo- difieras avsevärt i avseende på specifika krav. Dessa artificiellt modifierade enzymer är mer anpassade till industriella processer jämfört med deras natur- liga motsvarigheter.

Undersökningarna summerade i denna avhandling fokuserar på riktad evo- lution och enzymmekanismer hos aldolaser. Aldolaser är naturens egen kata- lysator för en av de viktigaste organkemiska reaktionerna; bildandet av nya kol-kol bindningar. Målet med riktad evolution är att öka substratspecificite- ten hos både DERA och FSA, vilket innebär skapa och leta efter nya enzym- varianter som kan katalysera reaktioner med nya substrat vilket deras förfä- der inte kan. Studien gällande DERAs och relaterade mutanters enzym- mekanismer har som mål att förstå i mer detalj relationen mellan aminosyra- sekvensen hos enzymer, proteinstrukturen, katalytisk aktivitet och enzymdynamik, genom en kombination av experimentella och datorbaserade metoder.

Riktad evolution bidrar med en väg för att skapa bättre biokatalysatorer och bidrar till väldigt värdefull information för att förstå kopplingen mellan ett enzyms struktur och funktion. Olikt många andra tekniska vetenskaper, så är “enzyme engineering” ortfarande en process där man får testa sig fram. Det beror mycket på den, än så länge, bristfälliga förståelsen av förhållandet mellan proteinerstrukturs och funktions. Detta innebära att kan insikterna från talrika exempel av riktad evolution, inkluderat de i denna avhandling, hjälper till att etablera en mer generell och effektiv strategi för att skräddarsy enzymer. Det kan slutligen medföra att fler nuvarande kemiska processer

53 kan utföras på renare sätt och öppna upp för fler möjligheter för den kemiska industrin under kommande år.

54 Résumé

Il y a deux conditions fondamentales à la vie: premièrement, l'organisme doit être capable de se reproduire. Deuxièmement, il doit être capable de cataly- ser efficacement différents types de réactions chimiques. Nous savons que cela repose sur de très nombreuses différentes enzymes qui sont présentes dans les organismes vivants, de E. coli, un organisme très simple, jusqu’ à nous-mêmes, les êtres humains, avec des structures beaucoup plus com- plexes et des fonctions avancées.

Les enzymes sont des grandes biomolécules. La plupart sont des protéines à l'exception d'un petit groupe qui sont des ARN catalytiques et qui s’appellent des ribozymes. Elles peuvent accélèrer jusqu’à des millions de fois la vitesse d’une réaction chimique en baissant son énergie d’activation. Les enzymes sont au cœur de chaque processus biochimique en catalysant par étape les centaines de réactions qui se passent à tout moment dans les organismes vivants. Par exemple, la dégradation des nutriments, la conservation et la transformation de l'énergie, la synthèse de biomolécules grandes et com- plexes à partir de précurseurs petits et simples. Une enzyme est générale- ment très spécifique et met en œuvre toujours la même réaction avec les mêmes substrats. Cette spécificité découle de la structure tridimensionnelle unique de leurs sites de liaison.

Le terme biochimique ‘‘enzyme”, de l’allemand Enzym, a été utilisé pour la première fois par Wilhelm Kühne (1837-1900), un physiologiste allemand. Mais l'origine du mot remonte au grec ancien, en signifiant intérieur et zume levain. Cela révèle une utilisation très précoce des enzymes dans la culture humaine : c’est quelque chose qu'on ajoute à la pâte pour la faire fermenter. Les Hommes ont utilisé la fermentation pour produire des aliments et des boissons alcoolisées depuis l'âge néolithique (7000 - 6600 av.jc) sans con- naître le concept de l’enzyme ni la science derrière elle. Aujourd’hui encore, la fermentation est une méthode importante de transformation des aliments : elle sert à produire du fromage, de la bière, du vin...

En 1858, Louis Pasteur a réalisé la première résolution cinétique enzyma- tique en traitant une solution aqueuse d'acide tartrique racémique avec une culture de Penicillium glaucum. En 1897, Eduard Büchner a annoncé la fer- mentation réussie du sucre par un extrait de levure. Ces deux exemples sont

55 des jalons importants dans l'histoire de la fermentation. Ils ont ainsi prouvé que les transformations biologiques ne nécessitent pas obligatoirement des cellules vivantes. Ceci a permis l’application de la fermentation à la produc- tion de molécules de synthèse, chirales ou achirales. Ce changement de pa- radigme a ouvert la voie à la biocatalyse moderne qui se réfère à l'utilisation des enzymes et des microbes dans la synthèse organique.

Face aux défis environnementaux globaux, les enzymes deviennent de plus en plus attractives dans l'industrie chimique comme catalyseurs respectueux de l'environnement et économiques. Les enzymes sont très utilisées dans la production de détergents, le traitement du papier ou la chimie fine, par exemple. La biocatalyse fournit une approche plus écologique pour la syn- thèse de nombreux produits chimiques traditionnels, et plus important en- core, elle offre la possibilité d'utiliser la biomasse comme matière première. Cela aidera à réduire notre dépendance aux ressources fossiles dans l'indus- trie chimique. La biocatalyse permet aussi la production de nouveaux pro- duits chimiques qui sont traditionnellement difficiles à obtenir.

Toutefois, en raison de leurs limitations intrinsèques, la plupart des enzymes naturelles ne sont pas utilisables directement pour une application dans l’industrie chimique. Pour combler l'écart entre la nature et l'industrie, le génie enzymatique a émergé comme un outil puissant afin de modifier artifi- ciellement les propriétés des enzymes naturelles. L'évolution dirigée est une méthode importante pour le génie enzymatique. Elle consiste a imité l'évolu- tion naturelle in vitro. Typiquement, une expérience d’évolution dirigée comporte deux étapes, la première étape est la création d’une diversité géné- tique et la deuxième est le criblage ou sélection.

En pratique, en créant des variantes d'enzymes à partir de la même enzyme ancêtre via un processus appelé mutagénèse et en leurs appliquant une sélec- tion artificielle, les propriétés catalytiques de l'enzyme naturelle peuvent être profondément modifiées en fonction des besoins spécifiques ; par exemple, avoir une meilleure stabilité thermique ou aux solvants organiques. Ces en- zymes modifiées artificiellement sont plus appropriées aux les procédés in- dustriels.

Cette thèse présente l'évolution dirigée et l'étude des mécanismes enzyma- tiques de deux aldolases : la DERA (2-deoxyribose-5-phosphate aldolase) et la FSA (fructose-6-phosphate aldolase). Celles-ci sont des catalyseurs de l’une des plus importantes réactions organiques : la formation d'une nouvelle liaison entre deux atomes de carbone. Le but de l'évolution dirigée est ici d'élargir la gamme de substrats de ces deux enzymes. C’est à dire, chercher des variantes d'enzymes capables de catalyser la réaction entre de nouveaux substrats que leurs enzymes ancêtres ne sont pas capables d'utiliser. L'étude,

56 expérimentale et théorique, du mécanisme enzymatique de la DERA et de ses mutants, nous aidera à comprendre plus en détail la relation entre la sé- quence d'acides aminés, la structure tridimensionnelle, l'activité catalytique et la dynamique de l’enzyme.

L'évolution dirigée fournit un moyen de créer de meilleures biocatalyseurs, ainsi que de très précieuses informations pour comprendre le lien entre la structure et la fonction catalytique d'une enzyme. Contrairement à de nom- breuses autres sciences d'ingénierie bien établies, le génie enzymatique est encore très empirique, car nous n’avons pas encore une vision générale qui lie la structure à l’activité des enzymes. Par conséquent, les exmples présen- tés dans cette thèse vont aider à améliorer notre compréhension. Cela per- mettra d’établir une stratégie plus générale et efficace pour modifier les en- zymes et, finalement, développer des procédés chimiques plus respectueux de l’environnement et offre plus de possibilités dans l'industrie chimique.

57 论文摘要

所有生命的存在依赖于两个前提条件,一是可以进行自我复制;二是 可以高效和有选择性的进行新陈代谢所需的成千上万种不同的化学反 应。毫无疑问,生物体内各种各样的酶是催化新陈代谢反应中不可或 缺的催化剂。

酶泛指具有生物催化功能的高分子物质。自然界中大多数的酶属于蛋 白质,但有少部分的酶属于核糖核酸,称为核酶。 酶反应是一切生命 活动的重要基础,因为几乎所有细胞新陈代谢中的化学反应都需要酶 的催化来提高反应效率。酶甚至可以将其催化的反应速率提高上百万 倍。 酶催化的反应具有高度的专一性,这意味着一种特定的酶只会催 化特定的反应底物,特定的反应类型生成特定构型的产物。 正因为有 了酶的催化,我们才能够消化食物,从中获取能量并合成生命活动所 需要的各种结构复杂的生物大分子。

酶这个概念最早是在 19 世纪由德国的生理学家 Wilhelm Kühne 提出 的,称 enzym。 但是这个词的起源可以追溯到古希腊语, 意思是“在 酵母里的物质”。 这反映出人类应用自然界里天然存在的酶进行发酵 的历史远远长于酶这个概念的提出。 发酵是人类最早掌握的食品处理 方法之一。比如利用无氧发酵所产生的乳酸来保存蔬菜,这就是我们 熟知的泡菜。 同样道理,新鲜牛奶可以被制成奶酪和酸奶。当然应用 最广泛历史最悠久的当属利用发酵来制作含酒精的饮料,可考证的最 早的酿酒记录追溯到我国的商周时期。“古来圣贤多寂寞,惟有饮者 留其名”一首又一首传颂于世的诗词歌赋从另一个层面展示了发酵自 古以来在人类文明史的地位。 在 1897 年,另一位德国化学家 Eduard Buchner 第一次成功地利用酵母的提取物发酵制造糖类。这是发酵史 上一个重要的里程碑,因为这第一次证明了发酵不一定必须要活的微 生物中进行,酵母的萃取物—也就是今天我们常说的酶,在生物体外 只要有合适的条件也可以催化化学反应。 这可以说是开了生物催化在 化学合成工业中应用的先例。从此以后发酵的应用不仅仅局限在保存 和处理食品上,有很多化学品也可以通过微生物发酵的方法合成。

时至今日,我们日常生活的方方面面都依赖与化学合成工业。从医疗 药品到纺织纤维再到最普通不过的洗发水牙膏, 化学合成工业举不胜 举的产品从一定程度上定义了现代生活。 然而在化学合成工业方便我 们日常生活的方方面面的同时,它也在全球范围内带给我们很大的挑 战,那就是日益严峻的环境问题。面对着经济发展和资源环境保护的

58 矛盾, 可持续发展战略的重要性和紧迫性日益突出。在这个背景之下, 在化学合成工业也掀起了绿色化学的革命。所谓绿色化学就是研究如 何在化学产品的设计,开发和加工生产过程中最充分的利用原料和能 源并将对环境的负面影响减少到最低的一门新兴学科。催化剂在化学 合成工业中的地位举足轻重, 如果没有催化剂的催化,很多化学反应 难以发生。然而应用广泛的传统的有机和金属催化剂虽然有成熟的应 用领域但不可否认的是它也有巨大的弊端-对环境的负面影响。因此 寻找能够替代传统催化剂的新型催化剂是绿色化学革命中的非常重要 的环节。酶最为自然界中天然存在的催化剂,自然而然备受关注。其 专一性,高效性,对环境影响的是一般催化剂所不能比拟的,但是天 然酶的许多性质不适应工业生产的条件, 所以天然存在的酶极少能够 被直接应用在工业化的化学合成中。

酶工程正是为了解决这个问题应运而生的, 酶工程最主要的目的在于 用各种不同的策略对天然存在的酶进行改良最终使其能够被运用在大 规模的有机合工业中。 在众多酶工程的策略里,酶定向进化技术是目 前发展最快取得成果最多的一个分支。酶定向进化技术与生物学史上 的一个转折点-达尔文在《物种起源》中提出以自然选择为基础的进 化学说非常类似,在一定程度上可以理解为分子规模上的在试管中的 进化。酶定向进化技术的关键在于 在实验室中模仿自然进化的关键步 骤(突变、重组和筛选),从天然的或人为获得的酶出发,经过基因 突变和重组,构建一个人工突变酶库,通过筛选最终获得预先期望的 具有某些特性的进化酶, 这样可以在较短时间内完成漫长的自然进化过 程,有效改造蛋白质,使之具有一些天然酶所不具有得的性质。比如 天然酶的热稳定性和在有机溶剂中的稳定性就可以通过定向进化而大 幅度提高。再比如,酶的底物专一性也可以通过这个方法改变。利用 酶定向进化技术重新设计及改进天然酶分子的结构和性质,可以逐步 接近和满足将酶催化用于工业生产的需要。

本文所涉及的领域是酶定向进化技术的一个例子,研究的重点是两种 新陈代谢中重要的醛缩酶,2 -脱氧核糖-5-磷酸醛缩酶 和果糖 6-磷酸果 糖醛缩酶。醛缩酶大致可以分为两类,它们催化醛类和酮类之间的醛 缩反应。在生物体内,醛缩酶在糖类的在新陈代谢中地位举足轻重。 有机合成中醛缩酶也十分重要, 因为它们催化合成新的碳碳键,碳碳 键的形成是构建一个有机分子的基本骨架的重要方法。醛缩酶在有机 合成多羟基衍生物中有很多的应用。多羟基衍生物中有很多是药物的 前体分子。论文中提到的 2 -脱氧核糖-5-磷酸醛缩酶在工业上就被应用 于合成阿托伐他汀类降低胆固醇药物的前体, 比如辉瑞制药生产的立 普妥。天然醛缩酶反应底物非常有局限性,所以我们利用酶定向进化 技术改良原有的醛缩酶的底物选择性,同时去进一步了解酶的分子结 构和催化活性之间的联系。 这样的研究对于完善酶分子结构和催化活 性之间的联系的理论有极大的意义。

59 Acknowledgements

Now it is time to stop and thank the people who have made a difference in my life. I wish I could put this chapter in the very beginning of the book because without all the people here, I will not be where I am today.

Frist of all, my supervisor, Prof. Mikael Widersten, thank you for taking me as a project student and then as a Ph.D student, for giving me chances to try out my own ideas and guiding me through many difficult moments, for calming me down and teaching me to focus on the important things whenev- er I get panic and have no idea what to do. Prof. Helena Danielson, thank you for being my co-supervisor, for all the meaningful and helpful conversa- tions.

Lynn and Klaudia, thank you very much for your effort in our collaboration, for patiently guiding me through a completely different enzyme world in the computer. Thilak, thank you for your effort in synthesizing the substrates for aldolase project.

Åsa, thank you for teaching me all the lab survival skills when I was your project student and being patient even at the moments when I run the DNA gel in the opposite direction, when I got sparks during electroporation and when I called you in the evening and told you my SDS gel had melt- ed…Being in the same office for five years, I have a lot to say: the Miffy stickers, our lucky office dog balloon, the conference trips, the forever last row Afrodances, the daily discussions, the daily complains, the daily panic attacks, the daily laughter, and all the accidents with the wardrobes…Thank you for sharing so many things with me during these years, letting me invade your super tidy bench from time to time and helping me out whenever my Swedish is not enough.

Emil, we have shared the same office with a shaky bookshelf in between for four years. Thank you for listening to me patiently and turning you chair to help whenever I ask “Emil, do you happen to know…?” And thank you very much for your help with the Swedish popular science summary.

Cissi, I know you laughed at my tiny hands and short arms since the first day I joined the MW group. Well, nothing has happened to my small hands after

60 all these years, I still struggle to hold the 1 L bottle in one hand. I just want to thank you for sharing so many moments after dark in the lab and on the weekends, for cheering me up in your own way, and for your FucO DA1472. Yepp, as you said before, it all worked out itself (of course, with some strug- gles).

Paul, even though you “abandoned” us some years ago, but I still want to thank you for helping me whenever I’m lost with the computer and don’t know which button to push. Nisse, thank you for all the super useful practi- cal tips! My project students during the years, Enrico and Candice, thank you for your efforts in the lab.

I would also like to thank all the nice biochemists in the department: Gunnar J., Ylar, Doreen, Dirk, Sara S., Vladimir… thank you for all the nice discus- sions during our seminars and around the lunch table. Special thanks to: Erika, for being our social specialist. Eldar, for helping me out whenever I need some emergency Amp plates or some instructions for whatever instru- ment.

Inger and Johanna, thank you for helping me out with the most practical issues and answering my never-ending questions about Primula. Gunnar S., thank you for delivering our packets always with a smile. Bosse, thank you for helping me with all the softwares even when I messed up my Microsoft Office three times in half an year.

Françoise, merci de m'avoir sauvé du calcul des heures d'enseignement avec ton super tableau et d'avoir corrigé mon résumé en français. J'ai beaucoup apprécié ton aide pour me faire pratiquer le suédois puis le français.

I would like to thank all the colleagues in the department for nice lunch table conversations: Aleksandra B., Alexander O., Kathik, Rabia, Kjell, Sandra, Thomas…to name a few. Special thanks to: Maxim, for all the BBQs we had in the not-so-warm Swedish summers and all the drinks at O’Connor’s. Sasha, for understanding that finding the right lipstick is never an easy task and for making me feel not so bad about the number of useless-useful- colorful things I own. Carina, for your professional eyes for all the struc- tures and reaction mechanisms in my thesis and for all the nice after-work drinks we had. Alban, for all the help for my French, even though sometimes I don’t really get your jokes. 谢谢我所有的中国同事: 杨杰,家辉,佳 杰,黄枭,熊哥,黄皓。谢谢你们在吃午饭时的时候听我胡说八道, 谢谢你们积极的和我一起胡说八道,讲着只有我们能听懂的的笑话, 八卦着没有办法翻译的八卦。

61 My entire Uppsala crew: Carmen, Giulia, Fernando, Pieter, Davide, Mahsa, Ana Rosa, Guillermo, Alberto, María, Carol, Ana, José. I will probably still get my Ph.D without all your guys, but my life will never be the same with- out you. I know I have not yet reached the Spanish party standard after all these years’ practice, I still fall asleep when the real party is about to start, but the time I had with all of you in all the our either crazy parties (when we were still young) or not-so-crazy parties (when we are not as young as we use to be) is one of the best parts of my twenties. Thank you for spicing my life up! Special thanks to: Carmen, for introducing me to the Spanish crew and for all the girly and not so girly talks we had. Giulia and Fernando, for being our party specialists and making me laugh so much in all our parties. Alex, my flat mate who reads palms, for tolerating the bathroom BBC news every morning and for all the nice conversations we had in our little kitchen. (I know most of the time you were just listening to all my complaints).

Sonja, thank you for being my friend since once upon a time a crazy Chinese girl came to you and asked: “are you Sonja from Germany?” You said you were trying to practice English, I said I was trying to make friends. Now, you are waiting for the new family member and I am closing my Uppsala chapter…many things have changed in these years, but some has not, like our Sunday swimming. Thank you for being a part of my Uppsala time from the very beginning and I wish you all the best for your journey in the aca- demia and for your young family.

Laura, do you still remember that photo? The one we were both in it without knowing each other. This is such a lovely little surprise we got from life. Thank you for being such a great friend inside and outside of the lab all these years. Thank you for sharing all the ups and downs during these years and cheering me up each time when I’m stuck in the local minimum on the Ph.D landscape. I am so lucky to have a friend like you. Laura, Fernando and Pieter, thank you for our seven-days-in-Tibet trip, full of drama, full of memories…

Eveline et “The King of the Kitchen”. Merci de me gâter à chaque fois que je suis à la Maison Menucourt et merci de me montrer le meilleur de la culture française, le fromage, le vin, la famille et bien sûr, l'amour.

Merci et bisou pour mon bonbon. Merci de m’aime, me calîner et me faire un thé même quand je suis plus ronchonne que le Grumpy Cat. Les meilleures choses à Uppsala pendant mon doctorat sont, un stupide email avec un numéro de téléphone, une salade de saumon avec une sauce comme une soupe…Le meilleur, c’est toi. Merci de m’apporter d’autres possibilities de vie, merci d’être ma nouvelle aventure.

62 谢谢我永远的谢老师, 一直到今天在远在瑞典的实验室里我仍然会想 起当初船中初一(2)班的第一节课。当年那个黄毛丫头在 18 年后已 经慢慢长大。谢谢您这么多年陪伴着我成长,作为我的老师更作为我 的好朋友。  谢谢张珣姐姐, 1998 年咱俩认识的时候,长江发着气势不可挡的大 水,你嘲笑着我头上的三个辫子。呵呵,你近二十年的闺蜜终于要把 以前的三个辫子升级了。记得以前你教我英语的时候,我分不清 in 和 on。 一转眼,这么多年过去了,我竟也凑凑合合用英语写了这本书。 二十年前我是一个什么都不懂的小屁孩,二十年后是依然很多问题没 有想明白的,没钱没房没车没工作的匆匆奔三而去的人。人总要有点 自嘲的精神, 不然骨感的现实里不是一点幽默感都没有了。谢谢姐姐 这些年在我自嘲的时候给我无条件的支持,也谢谢姐姐在这些年我没 心没肺忙着看这花花世界的时候不时陪伴我的老爸老妈,让他们不太 寂寞。

谢谢我最亲爱的老爸老妈, 谢谢你们的爱给了我整个世界。谢谢你们 抚养我长大并一直鼓励我寻找自己的方向,即使这意味着我们相隔万 里。 我知道你们不一定能看得懂这本书在写什么,但是如果没有你们 一路上无条件的爱和支持,我不可能在任性追逐自己梦想的路上走了 这么远。 老爸送给我的几个字我一直记得也将永远记得,自信,勤奋 和宽容。 MERCI MERCI TACK TACK GRACIAS THANK OBRIGADO YOU 谢谢 DZI KUJ TE"EKKÜR EDER#M DANKE ! ! RAHMAT

GRAZIE

!

63 References

1. D. L. Nelson and M. M. Cox, Lehniger Principles of Biochemistry, 5th Edition, W. H. Freeman and Company, 2008, ISBN 9780716771081, pp. 183. 2. U.T. Bornscheuer, G.W. Huisman, R. J. Kazlauskas, S. Lutz, J. C. Moore and K. Robins, Nature, 2012, 485, pp. 185-194 3. Manfred T. Reetz, J. Am. Chem. Soc., 2013,135, pp.12480-12496. 4. V. Gotor and S. Filtsch, Curr. Opin. Chem. Biol., 2011, 15, pp. 185-186. 5. M. G. Adsul, M. S. Singhvi, S. A. Gaikaiwari and D. V. Gokhale. Bioresour. Technol., 2011, 102, pp. 4304-4312. 6. H. Fukuda, A. Kondo and S. Tamalampudi, Biochem. Eng. J., 2009, 44, pp. 2- 12. 7. R. Hatti-Kaul, U. Törnvall, L. Gustafsson and P. Börjesson, Trends Biotechnol., 2007, 25, pp. 119-124. 8. A. S. Bommarius and B. R. Riebel, Biocatalysis, Wiley-VCH, 2004, ISBN 9783527303441. 9. H. E. Schoemaker, D. Mink and M. G. Wubbolts, Science, 2003, 299, pp. 1694- 1697. 10. A. Schmid, J. S. Dordick, B. Hauer, A. Kiener, M. Wubbolts and B. Witholt, Nature, 2001, 409, pp. 258-267. 11. K. M. Koeller and C. H. Wong, Nature, 2001, 499, pp. 232-240. 12. J. D. Bloom and F. H. Arnold, PANS, 2009, 106, pp. 9995-10000. 13. F. H. Arnold, Cold Spring Harb. Symp. Quant. Biol., 2009, 74, pp. 41-46. 14. M. J. Dougherty and F. H. Arnold, Curr. Opin. Biotech., 2009, 20, pp. 486-491. 15. N. J. Turner, Nat. Chem. Biol., 2009, 5, pp. 567-573. 16. P. A. Romero and F. H. Arnold, Nature Review, 2009, 10, pp. 866-876. 17. N. J. Turner and M. D. Truppo, Crurr. Opin. Chem. Biol., 2013, 17, pp. 212- 214. 18. C. A. Hutchison III, S. Philips, M. H. Edgell, S. Gillam, P. Jahnke and M. Smith, J. Biol. Chem., 1978, 253, pp. 6551-6560. 19. M. Smith, Angew. Chem. Int. Ed., 1994, 33, pp. 1214-1221. 20. H. Y. Tao and V. W. Cornish, Curr. Opin. Chem. Biol., 2002, 6, pp. 858-864. 21. E. M. Brustad and F. H. Arnold, Curr. Opin. Chem. Biol., 2011, 15, pp. 201- 210. 22. C. A. Denard, H. Ren and H. M. Zhao, Curr. Opin. Chem. Biol., 25, pp. 55-64. 23. A. S. Bommatius, J. K. Blum and M. J. Abrahamson, Curr. Opin. Chem. Biol., 2011, 15, 194-200. 24. P. Bernhardt and S. E. O’conner, Curr. Opin. Chem. Biol., 2009, 13, pp. 35-42. 25. J. H. Tao and J. H. Xue, Curr. Opin. Chem. Biol., 2009, 13, pp. 43-50. 26. A. Currin, N. Swainston, P. J. Day and D. B. Kell, Chem. Soc. Rev., 2015, 44, pp. 1171-1239. 27. S. Lutz, Curr. Opin. Biotech., 2010, 21, pp. 734-743. 28. J. Damborsky and J. Brezovsky, Curr. Opin. Chem. Biol., 2009, 13, pp. 26-34. 29. D. Kahakeaw, J. Sanchis and M. T. Reetz, Mol. Biosyst., 2009, 5, pp. 115-122.

64 30. M. Bocola, J. D. Carballeira, D. X. Zhao A. Vogel and M. T. Reetz, Angew. Chem. Int. Ed., 2005, 44, pp. 4192-4196. 31. C. Jäckel and D. Hilvert, Curr. Opin. Chem. Biol., 2010, 21, pp. 753-759. 32. M. T. Reetz, D. Kahakeaw and R. Lohmer, ChemBioChem, 2008, 9, pp. 1797- 1804. 33. L. X. Tang, H. Gao, X. C. Zhu, X. Wang, M. Zhou, and R. X. Jiang, BioTech- niques, 2012, 52, pp. 149-158. 34. B. J. Alder and T. E. Wainwright, J. Chem. Phys., 1959, 31, 459-482. 35. M. Karplus and J. A. McCammon, Nature Struct. Biol., 2002, 9, pp. 646-652. 36. A. Kukol, Molecular Modeling of Proteins, 2th Edition, Humana Press, 2015. 37. W. L. Jorgensen, D. S. Maxwell and J. Tirado-Rives, J. Am. Chem. Soc., 1996, 118, pp. 11225-11236. 38. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey and M. L. Klein, J. Chem, Phys., 1983,79, pp. 926-937. 39. S. Hay and N. S. Scrutton, Nature Chemistry, 4, 2012, pp. 161-168 40. E. Fischer, Ber. Dtsch. Chem. Ges., 1894, 27, pp. 2985–2993 41. L. Pauling, Am. Sci., 1948, 36, pp. 51–58. 42. D. E. Koshland, PANS, 1958, 44, pp. 98–104. 43. R. Callender and R. B. Dyer, Acc. Chem. Res., 2015, 48, pp. 407-413. 44. J. L. Radkiewicz and C. L. Brook III, J. Am. Chem. Soc., 2000, 122, pp. 225- 231. 45. P. K. Agarwal, S. R. Billeter, P. T. R. Rajagopalan, S. J. Benkovic and S. Hammes-Schiffer, PNAS, 2002, 99, pp. 2794-2799. 46. T. H. Rod, J. L. Radkiewicz and C. L. Brooks III, PNAS, 2003, 100, pp. 6980- 6985. 47. K. F. Wong, T. Selzer, S. J. Benkovic and S. Hammes-Schiffer, PNAS, 2005, 102, pp. 6807-6812. 48. P. Singh, A. Sen, K. Francis and A. Kohen, J. Am. Chem. Soc., 2014, 136, pp. 2575-2582. 49. J. Luo and T. C. Bruice, PNAS, 2004, 101, pp. 13152-13156. 50. S. Osuna, G. Jiménez-Osés, E. L. Noey and K. N. Houk, Acc. Chem. Res., 2015, 48, pp. 1080-1089. 51. A. Kohen, Acc. Chem. Res., 2015, 48, 466−473. 52. J. Clayden, N. Greeves, S. Warren and P. Wothers, Organic Chemistry, Oxford University Press, 2001, ISBN 9780198503460. 53. A. K. Samland, M. Rale, G. A. Sprenger, and W. D. Fessner, ChemBioChem, 2011, 12, pp.1454-1474. 54. R. Mahrwald, Modern Aldo reactions Vol. 1, Wiley-VCH. Weiheim, 2004, ISBN 3527307141. 55. W. D. Fessner and V. Helaine, Curr. Opin. Biotechn., 2001, 12, pp. 574–586. 56. S. Mukherjee, J. W. Yang, S. Hoffmann, and B. List, Chem. Rev., 2007, 107, pp. 5471-5569. 57. K. N. Houk and B. List, Acc. Chem. Res., 2004, 37, pp. 487–631. 58. B. M. Trost, A. Fettes and B. T. Shireman, J. Am. Chem. Soc., 2004, 126, pp. 2660–2661. 59. A. K. Samland and G. A. Sprenger. Appl. Microbiol. Biotechnol, 2006, 71, pp. 253-264. 60. P. Clapés, W. D. Fessner, G. A. Sprenger and A. K. Samland, Curr. Opin. Chem. Biol., 2010, 14, pp. 154-176. 61. T. D. Machajewski and C. H. Wong, Angew. Chem. Int. Ed., 2000, 39, pp. 1352-1374. 62. M. Schuman and G. A. Sprenger, J. Biol. Chem., 2001, 276, pp. 11055-11061.

65 63. J. J. Marsh and H. G. Lebherz, Trends Biochem. Sci., 17, 1992, pp. 110–113. 64. T. Gefflaut, C. Blonski , J. Perie and M. Willson, Prog. Biophys. Mol. Biol., 1995, 63, pp. 301–340. 65. A. K. Samland and G. A. Sprenger, Appl. Microbiol. Biotechnol., 2006, 71, pp. 253-264. 66. I. Sánchez-Moreno, L. nauton, V. Théry, A. Pinet, J. L. Petit, V. de Berardinis, A. K. Samland, C. Guérard-Hélaine and M. Lemaire, J. Mol. Catal. B: Enzym., 2012, 84, pp. 9-14. 67. C. L. Windle, M. Muller, A. Nelson and A. Berry, Curr. Opin. Chem. Biol., 2014, 19, pp. 25-33. 68. E. A. Stura, S. Ghosh, E. Garcia-Junceda, L. Chen, C. H. Wong and I. A. Wil- son, Proteins: Struc. Func. Genet., 1995, 22, pp. 67-72. 69. D. L. Nelson and M. M. Cox, Lehniger Principles of Biochemistry, 5th Edition, W. H. Freeman and Company, 2008, ISBN 9780716771081, pp. 558-563. 70. A. Heine, G. Desantis, J. G. Luz, M. Mitchell, C. H. Wong and I. A. Wilson, Science, 2001, 294, pp. 369-374. 71. G. DeSantis, J. J. Liu, D. P, Clark, A. Heine, I. A. Wilson and C. H. Wong, Biorg. Med. Chem., 2003, 11, pp. 43-52. 72. A. Heine, J. G. Luz, C. H. Wong and I. A. Wilson, J. Mol. Biol. 2004, 343, pp. 1019-1034. 73. C. F. Barbas III, Y. F. Wang and C. H. Wang , J. Am. Chem. Soc., 1990, 112, pp. 2013-2014. 74. L. Chen, D. P. Dumas and C. H. Wong, J. Am. Chem. Soc., 1992, 114, pp. 741- 748. 75. H. J. M. Gijsen and C. H. Wong, J. Am. Chem. Soc., 1994,116, pp. 8422-8423. 76. W. A. Greenberg, A. Varvak S. R. Hanson, K. Wong, H. J. Huang, P. Chen and Mark J. Burk, PNAS, 2004, 101, pp. 5788-5793. 77. F. Subrizi, M. Crucianelli, V. Grossi, M. Passacantando, G. Botta, R. Antiochia and R. Saladino, ACS Catal., 2014, 4, pp. 3059-3068. 78. S. Thorell, M. Schürmann, G. A. Sprenger and G. Schneider, J. Mol. Biol., 2001, 319, pp. 161-171. 79. J. A. Castillo, J. Calveras, J. Cassas, M. Mitjans, M. P. Vinardell, T. Parella, T. Inoue, G. A. Sprenger, J. Joglar and P. Clapés, Org. Lett., 2006, 8, pp. 6067- 6070. 80. A. L. Concia, C. Lozano, J. A. Castillo, T. Parella, J. Joglar and P. Clapés, Chem. Eur. J., 2009, 15, pp. 3808-3816. 81. J. A. Castillo, C. Guérnard-Hélaine, M. Gutiérrez, X. Garrabou, M. Sancelme, M. Schürmann, T. Inoue, V. Hélaine, F. Charmantray, T. Gefflaut, L. Hecquet, J. Joglar, P. Clapés, G. A. Sprenger and M. Lemaire, Adv. Synth. Catal., 2010, 352, pp. 1039-1046. 82. X. Garrabou, J. A. Castillo, C. Guérnard-Hélaine, T. Parella, J. Joglar, M. Lemaire and P. Clapés, Angew. Chem. Int. Ed., 2009, 48, pp. 5521-5525. 83. A. Szekrenyi, A. Soler, X. Garrabou, C. Guérnard-Hélaine, T. Parella, J. Joglar, M. Lemaire, J. Bujons and P. Clapés, Chem. Eur. J., 2014, 20, pp. 12572- 12583. 84. Å. Janfalk Carlsson, P. Bauer, H. Ma and M. Widersten, Biochemistry, 2012, 51, pp. 7627-7637. 85. C. Blikstad, K. M. Dahlström, T. A. Salminen, M. Widerstern, ACS Catal., 2013, 3, pp. 3016-3025. 86. O. Khersonsky, C. Roodveldt and D. S. Tawfik, Curr. Opin. Chem. Biol., 2006, 10, pp. 498-508.

66 87. D. L. Nelson and M. M. Cox, Lehniger Principles of Biochemistry, 5th Edition, W. H. Freeman and Company, 2008, ISBN 9780716771081, pp. 530-531. 88. J. P. Dworkin and S. L. Miller, Research, 2000, 329, pp.359-365. 89. A. S. Seranni, J. Pierce, S. G. Huang and R. Barker, J.Am. Chem. Soc., 1982, 104, pp.4037-4044. 90. J. Pierce, A. S. Serianni and R. Barker, J. Am. Chem. Soc., 1985, 107, 2448- 2456. 91. K. Spong, T. L. Amyes and J. P. Richard, J. Am. Chem. Soc., 2013, 135, pp.18343-18346. 92. A. C. Reyes, X. Zhai, K. T. Morgan, C. J. Reinhardt, T. L. Amyes and J. P. Richard, J. Am. Chem. Soc., 2015, 137, pp. 1372-1382. 93. B. Goryanova, L. M. Goldman, S. Ming, T. L. Amyes, J. A. Gerit and J. P. Richard, Biochemistry, 2015, 54, pp. 4555-4564. 94. X. Zhai, T. L. Amyes, R. K. Wierenga, J. P. Loria and J. P. Richard, Biochemis- try, 2013, 52, pp. 5928-5940. 95. X. Zhai, M. K. Go, A. C. O’Donoghue, T. L. Amyes, S. D. Pegan, Y. F. Wang, J. P. Loria, A. D. Mesecar and J. P. Richard, Biochemistry, 2014, 53, pp. 3486- 3501. 96. X. Zhai, T. L. Amyes and J. P. Richard, J. Am. Chem. Soc., 2014, 136, pp. 4145-4148. 97. S. C. Lovell, J. M. Word, J. S. Richardson and D. C. Richardson, Proteins: Struct. Func. Genet., 2000, 40, pp. 389-408. 98. A. Barrozo, F. Duarte, P. Bauer, A. T. Carvalho and S. C. Kamerlin, J. Am. Chem. Soc., 2015, 137, pp. 9061-9076. 99. B. A. Amrein, P. Bauer, F, Duarte, Å. Janfalk Carlsson, A. Naworyta, S. L. Mowbray, M. Widersten and S. C. L. Kamerlin, ACS Catal., 2015, 5, pp. 5702- 5713. 100. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola and J. R. Haak, J. Chem. Phys., 1984, 81, pp. 3684-3690. 101. D. van der Spoel, E. Lindahl, B. Hess, G. Groehnhof, A. E. Mark and H. J. C. Berendsen, J. Comput. Chem., 2005, 26, pp. 1701-1718. 102. B. J. Grant, A. P. Rodrigues, K. M. ElSawy, J. A. McCammon and L. S. Caves, Bioinformatics, 2006, 22, pp. pp. 2695-2696. 103. C. Blikstad, K. M. Dahlström, T. A. Salminen, M. Widstern, ACS Catal., 2013, 3, pp. 3016-3025. 104. C. Blikstad, K. M. Dahlström, T. A. Salminen, M. Widstern, FEBS J., 2014, 281, pp. 2387-2398. 105. C. Blikstad and M. Widersten. J. Mol. Catal. B: Enzym., 2010, 66, pp. 148-155. 106. N. Miyoshi, S. Takeuchi and Y. Ohgo. Chem. Lett., 1993, 22, pp. 2129-2132. 107. L. T Elfström and M. Widersten, Biochim. Biophy. Acta, 2005, 1748, 231-221. 108. S. Bershtein and D. S. Tawfik, Curr. Opin. Chem. Biol., 2008, 12, pp. 151-158. 109. J. D. Bloom, S. T. Labthavikul, C. R. Otey and F. H. Arnold, PNAS, 2006, 103, pp. 5869-5874.

67 Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1318 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-266902 2015